Danger
Nothing here should be used for any security purposes.
Usage#
This document walks through a concrete example. It assumes that you have obtained the Wycheproof data as discussed in Obtaining the Wycheproof data, and that you have at least glanced at the data overview section.
Structure of use#
The structure of one way to use this module might look something like
from toy_crypto import wycheproof
# import ... # the modules with the things you will be testing
# WP_ROOT: Path = ... # a pathlib.Path for the root wycheproof directory
loader = wycheproof.Loader(WP_ROOT)  # This only needs to be done once
...
# Get test data from one of the data files
test_data = loader.load("SOME_WYCHEPROOF_DATA_FILE_test.json")
...  # May wish to get some information from test_data
     # for logging or reporting.
for group in test_data.groups:
    ...  # Per TestGroup setup
    for test in group.tests:
        ...  # set up for specific test
        ...  # perform computation with thing you are testing
        ...  # Check that your results meet expectations
For the example below, we will step through parts of that, but will sometimes need to use a different flow so that each of the parts actually runs when constructing this document.
An example#
We will be testing RSA decryption from PyCryptodome
against the Wycheproof OAEP test data for 2048-bit keys with SHA1 as the
hash algorithm and MGF1SHA1 as the mask generation function.
The data file for those tests is in
testvectors_v1/rsa_oaep_2048_sha1_mgf1sha1_test.json
relative to WP_ROOT.
In what follows, we assume that you have already set up WP_ROOT
as a pathlib.Path
with the appropriate file system location.
See Obtaining the Wycheproof data for discussion of ways to do that.
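One way to set up WP_ROOT and confirm early that the data file is where this example expects it might look like the sketch below. The helper name data_file_path is hypothetical and not part of toy_crypto; only the testvectors_v1 subdirectory and the file name come from the text above.

```python
from pathlib import Path

# Subdirectory and file name taken from the example above.
DATA_DIR = "testvectors_v1"
DATA_FILE = "rsa_oaep_2048_sha1_mgf1sha1_test.json"


def data_file_path(wp_root: Path, name: str = DATA_FILE) -> Path:
    """Return the path to a data file, raising early if it is missing.

    Hypothetical helper; not part of toy_crypto.
    """
    path = wp_root / DATA_DIR / name
    if not path.is_file():
        raise FileNotFoundError(f"Expected Wycheproof data file at {path}")
    return path
```

With WP_ROOT pointing at your own checkout, calling data_file_path(WP_ROOT) before creating the loader gives a clearer error than a failure later on.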
Set up loader#
This assumes that you have already set up WP_ROOT
(or whatever you wish to call it)
as a pathlib.Path
with the appropriate file system location
as discussed in Obtaining the Wycheproof data.
To be able to load a wycheproof JSON data file a loader must first be set up.
The Loader you create will not only know where the data files are,
but will also have internal mechanisms set up for constructing the schemata
used for validating the loaded JSON.
from pathlib import Path
from toy_crypto import wycheproof
# These imports include the function we will be testing
from Crypto.PublicKey import RSA
from Crypto.Cipher import PKCS1_OAEP
# WP_ROOT: Path = ... # set up elsewhere
loader = wycheproof.Loader(WP_ROOT)
Loading the test data#
Now that we have loader, we can use it to load Wycheproof data.
The data is loaded using Loader.load().
The loaded TestData instance is not the
raw result of loading JSON, but many of its internals
still reflect its origins.
test_data = loader.load("rsa_oaep_2048_sha1_mgf1sha1_test.json")
assert test_data.header == "Test vectors of type RsaOeapDecrypt check decryption with OAEP."
If for some reason the JSON does not validate against the expected schema,
warnings will be logged at the
logging.WARNING
level.
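Those warnings only help if they are visible. The sketch below uses only the standard logging module to collect WARNING-level records in a list so that a test run can inspect them. The logger name "example.loader" is a placeholder, since the name of the logger that toy_crypto uses is not given here; attaching the handler to the root logger is one way to avoid needing to know it.

```python
import logging


class ListHandler(logging.Handler):
    """Collect log records in a list so they can be inspected later."""

    def __init__(self) -> None:
        super().__init__(level=logging.WARNING)
        self.records: list[logging.LogRecord] = []

    def emit(self, record: logging.LogRecord) -> None:
        self.records.append(record)


handler = ListHandler()
# Attach to the root logger so records from any module propagate here.
logging.getLogger().addHandler(handler)

# Stand-in for a library emitting a validation warning.
# "example.loader" is a placeholder name, not the library's logger.
logging.getLogger("example.loader").warning("schema validation failed")
logging.getLogger("example.loader").info("this is below WARNING")

messages = [r.getMessage() for r in handler.records]
logging.getLogger().removeHandler(handler)
```

In real use, the stand-in logging calls would be replaced by the call to Loader.load(), and a non-empty messages list would signal that something needs a closer look.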
For each TestGroup#
Test cases are organized into test groups within the raw data.
See Data overview for more information about
what kinds of things are typically found in test groups.
TestData.groups returns an Iterator of TestGroups.
In the case of this test data, each TestGroup specifies
the parameters needed to construct a private RSA key
that is to be used for all tests in the group.
The private key is offered in several formats.
In this example,
I will use the Crypto.PublicKey.RSA.import_key() function
to get the key information from the PEM format.
for group in test_data.groups:
    pem = group.other_data["privateKeyPem"]
    sk = RSA.import_key(pem)
    # Let's do some sanity checks on the private keys
    assert sk.size_in_bits() == 2048
    assert sk.has_private()
Each group also has the parameters used for our RSA decryption. These are the same for all test groups in this particular data set, so let’s do a sanity check on them for demonstration purposes.
for g in test_data.groups:
    assert g["keySize"] == 2048
    assert g["sha"] == "SHA-1"
    assert g["mgf"] == "MGF1"
    assert g["mgfSha"] == "SHA-1"
For each TestCase#
We are finally ready for our actual tests.
In addition to the properties that all Wycheproof test cases have, the test cases here have:
- “msg”: The plaintext message
- “ct”: The ciphertext
- “label”: The OAEP label, which is rarely used
These are accessible as keys to the dictionary TestCase.other_data.
Fortunately, the defaults that PyCryptodome’s
Crypto.Cipher.PKCS1_OAEP.new()
uses for the hash algorithm and the mask generation function are the ones we
are testing here, so we won’t have to specify them.
We can create the cryptor we wish to test with
cryptor = PKCS1_OAEP.new(key = sk, label = label)
where sk is the private key we set up for the test group,
and label is from each test.
test_count = 0
group_count = 0
for g in test_data.groups:
    group_count += 1
    # Use this group's key (g), not a variable left over from an earlier loop
    pem = g.other_data["privateKeyPem"]
    sk = RSA.import_key(pem)
    for case in g.tests:
        test_count += 1
        label: bytes = case.other_data["label"]
        ciphertext: bytes = case.other_data["ct"]
        message: bytes = case.other_data["msg"]
        cryptor = PKCS1_OAEP.new(key=sk, label=label)
        decrypted: bytes
        try:
            decrypted = cryptor.decrypt(ciphertext)
        except ValueError:
            assert case.invalid
        else:
            assert case.valid
            assert decrypted == message
assert test_count == test_data.test_count
print(f"Completed a total {test_count} tests in {group_count} group(s).")
Completed a total 36 tests in 1 group(s).
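The loop above treats every case as either valid or invalid. Wycheproof test vectors in general can also carry the result "acceptable", where either outcome is tolerated. The following sketch shows one way to factor the check into a helper that handles all three results. The names check_decryption and toy_decrypt are hypothetical, and toy_decrypt is only a stand-in so that the example runs without any key material.

```python
from typing import Callable


def check_decryption(
    decrypt: Callable[[bytes], bytes],
    ciphertext: bytes,
    expected: bytes,
    result: str,
) -> bool:
    """Return True if the outcome of decrypt() is consistent with result.

    result is the Wycheproof result string: "valid", "invalid",
    or "acceptable".
    """
    try:
        decrypted = decrypt(ciphertext)
    except ValueError:
        # Rejecting the ciphertext is fine unless it had to be accepted.
        return result != "valid"
    if result == "invalid":
        # Decryption succeeded when it should have been rejected.
        return False
    return decrypted == expected


def toy_decrypt(ciphertext: bytes) -> bytes:
    """Stand-in for a real decryptor: reverses bytes, rejects empty input."""
    if not ciphertext:
        raise ValueError("empty ciphertext")
    return ciphertext[::-1]


assert check_decryption(toy_decrypt, b"cba", b"abc", "valid")
assert check_decryption(toy_decrypt, b"", b"", "invalid")
assert check_decryption(toy_decrypt, b"", b"", "acceptable")
assert not check_decryption(toy_decrypt, b"cba", b"abc", "invalid")
```

Returning a boolean rather than asserting inside the helper leaves it to the caller to decide how failures are reported.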
Data conversion#
The TLDR for this section is that you are advised to make sure that
things like case.other_data["ct"] are of the data types you expect
when you run tests.
Be familiar with the data you are importing, and do not rely
on the fully automatic conversion from hex strings to bytes or integers
to always get things right.
We will continue with the same example as above for this discussion.
In some of the test cases in the test data we used,
the "ct", "msg", and "label" JSON keywords
have values that are strings.
In all of those cases, the strings are hex encoded byte sequences.
Consider this excerpt from test case 9:
{
  "tcId" : 9,
  "comment" : "",
  "flags" : [
    "EncryptionWithLabel"
  ],
  "msg" : "313233343030", // That is actually hex encoded
  "ct" : ..., // A longer string of hex digits was here
  "label" : "000102030405060708090a0b0c0d0e0f10111213",
  "result" : "valid"
}
But when we ran our tests we were able to use code like
label: bytes = case.other_data["label"]
ciphertext: bytes = case.other_data["ct"]
message: bytes = case.other_data["msg"]
and those things really were bytes.
The initializers for TestGroup and TestCase
automatically perform some necessary conversions from hexadecimal
strings to bytes or int as appropriate.
They do this using the data from TestData.formats,
which is a mapping from JSON keywords to information about how
the string is formatted.
# We have already loaded test_data with:
# test_data = loader.load("rsa_oaep_2048_sha1_mgf1sha1_test.json")
formats: dict[str, str] = test_data.formats
assert formats["ct"] == "HexBytes"
assert formats["publicExponent"] == "BigInt" # Not used yet
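To make the idea concrete, here is a minimal sketch of converting a single value given its format name. It is not the module's implementation, and the function name convert_value is hypothetical. It assumes that "BigInt" values are big-endian two's complement hex strings with an even number of hex digits, which is how the Wycheproof format notes describe them as far as can be told here.

```python
from typing import Union


def convert_value(value: str, fmt: str) -> Union[bytes, int, str]:
    """Convert a hex string according to a format name.

    Unknown format names leave the value unchanged.
    """
    if fmt == "HexBytes":
        return bytes.fromhex(value)
    if fmt == "BigInt":
        # Assumption: big-endian two's complement, even number of digits.
        return int.from_bytes(bytes.fromhex(value), "big", signed=True)
    return value


# Values taken from the excerpts in this document.
assert convert_value("313233343030", "HexBytes") == b"123400"
assert convert_value("010001", "BigInt") == 65537
```

Leaving unknown formats untouched mirrors the conservative approach described in this section: a value is only converted when there is a clear reason to.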
TestData.formats is constructed by inspecting the schema associated
with the JSON file requested during loading.
That may give incorrect results when there are multiple places a particular
JSON keyword might exist in the data.
As a consequence, the automatic conversion is conservative and only
acts on the keywords in the top-most level of a test group or test case.
Additionally, the TestData.formats dictionary will be empty
when the loaded JSON could not be validated against its schema during loading.
Users can use TestData.schema_is_valid() to check
whether the JSON test file was successfully validated
against its JSON schema.
When that validation fails, no automatic conversion is attempted.
Semi-automatic conversion#
As mentioned above,
the fully automatic conversion using TestData.formats
is only performed at the top level of
each TestGroup and TestCase.
Suppose in our OAEP test, instead of creating the
private key from the PEM format in each test group,
we created it through the information in other_data["privateKey"].
"testGroups" : [
  {
    ...
    "privateKey" : {
      "privateExponent" : ...,
      "publicExponent" : "010001", // that is a hex string
      "prime1" : ...,
      "prime2" : ...,
      ...
    },
    "privateKeyPem" : ...,
    ...
    "tests" : [ ... ]
  }
]
In that case, we might need to extract the values of
"publicExponent", "prime1", and "prime2"
in each test group.
But these are not top-level keys within the test group,
and so they will remain as hex strings.
# We will test for just the first group
group = next(test_data.groups)
priv_key_data = group.other_data["privateKey"]
e = priv_key_data["publicExponent"]
assert isinstance(e, str)
print(e)
010001
But because formats does contain information for the relevant
members of group.other_data["privateKey"], we can manually
call automatic conversion using deserialize_top_level().
Note that this mutates the dictionary it is given.
group = next(test_data.groups)
priv_key_data = group.other_data["privateKey"]
wycheproof.deserialize_top_level(priv_key_data, test_data.formats)
e = priv_key_data["publicExponent"]
p = priv_key_data["prime1"]
q = priv_key_data["prime2"]
N = priv_key_data["modulus"]
assert p * q == N
assert isinstance(e, int)
print(e)
65537
Note again that schema loading and validation can fail, in which case TestData.formats will be empty.
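The top-level-only behavior and the in-place mutation can be illustrated with a small stand-alone sketch. The function convert_top_level_in_place below is hypothetical and is not the library's deserialize_top_level(); it also simplifies "BigInt" handling by treating every such hex string as non-negative.

```python
def convert_top_level_in_place(data: dict, formats: dict[str, str]) -> None:
    """Convert string values of data whose keys appear in formats.

    Only the top level is touched; nested dictionaries are left alone.
    The dictionary is modified in place and nothing is returned.
    """
    for key, value in data.items():
        fmt = formats.get(key)
        if not isinstance(value, str) or fmt is None:
            continue
        if fmt == "HexBytes":
            data[key] = bytes.fromhex(value)
        elif fmt == "BigInt":
            # Simplification: treats the hex string as non-negative.
            data[key] = int(value, 16)


formats = {"publicExponent": "BigInt", "label": "HexBytes"}
outer = {
    "label": "0a0b",
    "privateKey": {"publicExponent": "010001"},
}

convert_top_level_in_place(outer, formats)
assert outer["label"] == b"\x0a\x0b"
# The nested value is untouched until we pass the inner dictionary.
assert outer["privateKey"]["publicExponent"] == "010001"

convert_top_level_in_place(outer["privateKey"], formats)
assert outer["privateKey"]["publicExponent"] == 65537
```

With an empty formats mapping the function does nothing, which matches the point above about what happens when no format information is available.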
Gotchas#
Pretty much all of the many ways this can break are a consequence of the fact that I have not found a way to make use of JSON schemata the way I feel they should be able to be used. When I started working on this module, I had assumed that there would be a fairly straightforward way to make use of the JSON schema loaded for each test file to reason about the loaded JSON within Python code.
Hard-coded data assumptions#
The code here makes assumptions about things that will be common
to all of the wycheproof JSON test files. Similarly it makes assumptions
about what each test group within those files will have.
I have yet to write tests to see if those actually hold for each test file.
It is more likely that these assumptions will fail with test vectors from the
older wycheproof/testvectors directory than in the (default)
wycheproof/testvectors_v1.
If you find that the assumptions fail for things in wycheproof/testvectors_v1,
definitely let me know.
Data conversion#
As discussed in Data conversion,
the automatic data conversion of hexadecimal strings
to bytes or ints will miss things
that you will need to manually handle, perhaps with the help of
deserialize_top_level().
Additionally there are currently (August 29, 2025)
52 test files in the wycheproof project that are missing schemas.
In these cases, no automatic conversion will be attempted
and TestData.formats will be empty.
TestData.schema_is_valid() can be used to check
if there was a problem during schema loading and validation.
TestData.schema_file can be used for further debugging.
It is also possible that it will attempt to convert things it shouldn’t or be mistaken about which conversion to use. If you find that this occurs, please let me know.
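Given these caveats, one defensive option is to normalize values yourself before using them. The helper below is a hypothetical sketch (as_bytes is not part of the module): it accepts either bytes that were already converted or a hex string that was left unconverted.

```python
from typing import Union


def as_bytes(value: Union[bytes, str]) -> bytes:
    """Return value as bytes, decoding it from hex if it is still a string."""
    if isinstance(value, bytes):
        return value
    if isinstance(value, str):
        return bytes.fromhex(value)
    raise TypeError(f"expected bytes or hex string, got {type(value).__name__}")


assert as_bytes("313233343030") == b"123400"
assert as_bytes(b"123400") == b"123400"
```

In the test loop from the example, this could be used as ciphertext = as_bytes(case.other_data["ct"]), so that the test behaves the same whether or not the automatic conversion took place.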
Other data is just the leftovers#
The dictionaries TestData.other_data, TestGroup.other_data,
and TestCase.other_data
exclude things that those classes automatically offer as properties.
For example, the existence of TestData.algorithm means that
trying something like test_data.other_data["algorithm"] will result
in a KeyError.
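The pattern can be pictured with a small stand-alone sketch. The class below is hypothetical and far simpler than the module's real classes; it only illustrates how promoting a keyword to a property removes it from the leftover dictionary. The key names and values in the sample data are made up.

```python
from typing import Any


class ToyTestData:
    """Toy container: "algorithm" becomes a property, the rest is left over."""

    def __init__(self, raw: dict[str, Any]) -> None:
        remaining = dict(raw)  # copy so the caller's dictionary is untouched
        self._algorithm: str = remaining.pop("algorithm")
        self.other_data: dict[str, Any] = remaining

    @property
    def algorithm(self) -> str:
        return self._algorithm


toy = ToyTestData({"algorithm": "EXAMPLE-ALG", "exampleKey": "value"})
assert toy.algorithm == "EXAMPLE-ALG"
assert toy.other_data == {"exampleKey": "value"}

try:
    toy.other_data["algorithm"]
except KeyError:
    print("'algorithm' is not in other_data")
```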
There are reasons for my choice here. Perhaps not good reasons, but reasons nonetheless.