Danger

Nothing here should be used for any security purposes.

  • If you need cryptographic tools in a Python environment use pyca or PyNaCl.

  • If you need efficient and reliable abstract math utilities in a Python-like environment consider using SageMath.

Usage#

This document walks through a concrete example. It assumes that you have obtained the wcyheproof data as discussed in Obtaining the Wycheproof data, and that you have at least glanced the data overview section.

Structure of use#

The structure of one way to use this module might look something like

from toy_crypto import wycheproof
# import ... # the modules with the things you will be testing

# WP_ROOT: Path = ... # a pathlib.Path for the root wycheproof directory
loader = wycheproof.Loader(WP_ROOT)  # This only needs to be done once
...

# Get test data from one of the data files

test_data = loader.load("SOME_WYCHEPROOF_DATA_FILE_test.json")
... # May wish to get some information from test_data
    # for loggging or reporting.

for group in test.groups:
    ... # Per TestGroup setup
    for test in group.tests:
        ... # set up for specific test
        ... # perform computation with thing you are testing
        ... # Check that your results meet expectations

For the example below, we will step through parts of that, but will sometimes need to use a different flow so that each of the parts actually runs when constructing this document.

An example#

We will be testing RSA decryption from PyCryptodome against the Wycheproof OAEP test data for 2048-bit keys with SHA1 as the hash algorithm and MGF1SHA1 as the mask generation function. The data file for those tests is in testvectors_v1/rsa_oaep_2048_sha1_mgf1sha1_test.json relative to WP_ROOT.

In what follows, we assume that you have already set up WP_ROOT as a pathlib.Path with the appropriate file system location. See Obtaining the Wycheproof data for discussion of ways to do that.

Set up loader#

This assumes that you have already set up WP_ROOT (or whatever you wish to call it) as a pathlib.Path with the appropriate file system location as discussed Obtaining the Wycheproof data.

To be able to load a wycheproof JSON data file a loader must first be set up. The Loader` you create will not only know where the data files are, but it will have internal mechanisms set up for constructing the schemata used for validating the loaded JSON.

from pathlib import Path
from toy_crypto import wycheproof

# These imports include the function we will be testing
from Crypto.PublicKey import RSA
from Crypto.Cipher import PKCS1_OAEP

# WP_ROOT: Path = ... # set up elsewhere
loader = wycheproof.Loader(WP_ROOT)

Loading the test data#

Now what we have loader, we can use it to load Wycheproof data.

The data is loaded using Loader.load(). The loaded TestData instance is not the raw result of loading JSON, but many of its internals still reflect its origins.

test_data = loader.load("rsa_oaep_2048_sha1_mgf1sha1_test.json")

assert test_data.header == "Test vectors of type RsaOeapDecrypt check decryption with OAEP."

If for some reason the JSON does not validate against the expected schema, warnings will be logged at the logging.WARNING level.

For each TestGroup#

Test cases are organized into test groups within the raw data. See Data overview for more information about what kinds of things are typically found in test groups. TestData.groups returns an Iterator of TestGroups.

In the case of this test data each TestGroup specifies the parameters needed to construct a private RSA key that is to be used for all tests in the group.

The private key is offered in several formats. In this example, I will use the Crypto.PublicKey.RSA.import_key() method to get the key information from the PEM format.

for group in test_data.groups:
    pem = group.other_data["privateKeyPem"]
    sk = RSA.import_key(pem)

    ## Let's do some sanity checks on the private keys

    assert sk.size_in_bits() == 2048
    assert sk.has_private()

Each group also has the parameters used for our RSA decryption. These are the same for all test groups in this particular data set. So let’s just do a sanity check on this just for demonstration purposes.

for g in test_data.groups:
    assert g["keySize"] == 2048
    assert g["sha"] == "SHA-1"
    assert g["mgf"] == "MGF1"
    assert g["mgfSha"] == "SHA-1"

For each TestCase#

We are finally ready for our actual tests.

In addition to the properties that all Wycheproof test cases have, the test cases here have.

“msg”

The plaintext message

“ct”

The ciphertext

“label”

The OAEP label that is rarely ever used.

These are accessible as keys to the dictionary TestCase.other_data.

Fortunately the defaults for creating a cryptor, Crypto.Cipher.PKCS1_OAEP.new() cryptor with PyCryptodome uses as hash algorithm, mask generation function are the ones we are testing here, so we won’t have to specify them. We can create the cryptor we wish to test with

cryptor = PKCS1_OAEP.new(key = sk, label = label)

where sk is the private key we set up for the test group, and label is from each test.

test_count = 0
group_count = 0
for g in test_data.groups:
    group_count += 1
    pem = group.other_data["privateKeyPem"]
    sk = RSA.import_key(pem)

    for case in g.tests:
        test_count += 1

        label: bytes = case.other_data["label"]
        ciphertext: bytes = case.other_data["ct"]
        message: bytes = case.other_data["msg"]

        cryptor = PKCS1_OAEP.new(key=sk, label=label)

        decrypted: bytes
        try:
            decrypted = cryptor.decrypt(ciphertext)
        except ValueError:
            assert case.invalid
        else:
            assert case.valid
            assert decrypted == message

assert test_count == test_data.test_count
print(f"Completed a total {test_count} tests in {group_count} group(s).")
Completed a total 36 tests in 1 group(s).

Data conversion#

The TLDR for this section is that you are advised to make sure that things like case.other_data["ct"] are of the data types you expect when you run tests. Be familiar with the data you are importing, and do not rely on the fully automatic conversion from hex strings to bytes or integers to always get things right.

We will continue with the same example as above for this discussion.

In some of the test cases in the test data we used, the "ct", "msg", and "label" JSON keywords have values that are strings. In all of those cases, the strings are hex encoded byte sequences. Consider this excerpt from test case 9:

{
    "tcId" : 9,
    "comment" : "",
    "flags" : [
        "EncryptionWithLabel"
    ],
    "msg" : "313233343030",  // That is actually hex encoded
    "ct" : ..., // A longer string of hex digits was here
    "label" : "000102030405060708090a0b0c0d0e0f10111213",
    "result" : "valid"
}

But when we ran our tests we were able to use code like

label: bytes = case.other_data["label"]
ciphertext: bytes = case.other_data["ct"]
message: bytes = case.other_data["msg"]

and those things really were bytes.

The initializers for TestGroup and TestCase automatically perform some necessary conversions from hexadecimal strings to bytes or int as appropriate. It does this using the data from TestData.formats, which is a mapping from JSON keywords to information about how the string is formatted.

# We have already loaded test_data with:
# test_data = loader.load("rsa_oaep_2048_sha1_mgf1sha1_test.json")

formats: dict[str, str] = test_data.formats

assert formats["ct"] == "HexBytes"
assert formats["publicExponent"] == "BigInt" # Not used yet

TestData.formats is constructed by inspecting the schema associated with with the JSON file requesting during loading. That may give incorrect results when there are multiple places a particular JSON keyword might exist in the data. As a consequence, the automatic conversion is conservative and only acts on the keywords in the top-most level of a test group or test case.

Additionally, the TestData.formats dictionary will be empty when the loaded JSON was successfully validated when the JSON was loaded. Users can use TestData.schema_is_valid() to check whether the JSON test file was successfully validated against its JSON schema. When that validation fails,

Semi-automatic conversion#

As mentioned above, the fully automatic conversation using TestData.formats is only performed at the top level of each TestGroup and TestCase.

Suppose in our OAEP test, instead of creating the private key from the PEM format in each test group we created it through the information in other_data["privateKey"].

    "testGroups" : [
        {
            ...
            "privateKey" : {
                "privateExponent" : ...,
                "publicExponent" : "010001", // that is a hex string
                "prime1" : ...,
                "prime2" : ...,
                ...
            },
            "privateKeyPem" : ...,
            ...
            "tests" : [ ... ]
        }
    ]

we might need to extract the values of "publicExponent", "prime1", and "prime2" in each test group. But these are not top-level keys within the test group, and so they will remain as hex strings.

# We will test for just the first group
group = next(test_data.groups)

priv_key_data = group.other_data["privateKey"]
e = priv_key_data["publicExponent"]

assert isinstance(e, str)
print(e)
010001

But because formats does contain information for the relevant members of group.other_data["privateKey"] we can manually call automatic conversion using deserialize_top_level(). Note that this mutates the dictionary it is given.

group = next(test_data.groups)

priv_key_data = group.other_data["privateKey"]

wycheproof.deserialize_top_level(priv_key_data, test_data.formats)

e = priv_key_data["publicExponent"]
p = priv_key_data["prime1"]
q = priv_key_data["prime2"]
N = priv_key_data["modulus"]

assert p * q == N

assert isinstance(e, int)

print(e)
65537

Note again that schema loading and validation can fail

Gotchas#

Pretty much all of the many ways this can break are a consequence of the fact that I have not found a way to make use of JSON schemata the way I feel they should be able to be used. When I started working on this module, I had assumed that there would be a fairly straightforward way to make use of the JSON schema loaded for each test file to reason about the loaded JSON within Python code.

Hard-coded data assumptions#

The code here makes assumptions about things that will be common to all of the wycheproof JSON test files. Similarly it makes assumptions about what each test group within those files will have. I have yet to write tests to see if those actually hold of each test file. It is more likely that these assumptions will fail in with test vectors from the older wycheproof/testvectors directory than in the (default) wycheproof/testvectors.

If you find that the assumptions fail for things in wycheproof/testvectors definitely let me know.

Data conversion#

As discussed in Data conversion the automatic data conversion of hexadecimal strings to bytes or ints will miss things that you will need to manually handle, perhaps with the help of deserialize_top_level().

Additionally there are currently (August 29, 2025) 52 test files in the wycheproof project that are missing schemas. In these cases, no automatic conversion will be attempted and TestData.formats will be empty. TestData.schema_is_valid() can be used to check if there was a problem during schema loading and validation. TestData.schema_file can be used for further debugging.

It is also possible that it will attempt to convert things it shouldn’t or be mistaken about which conversion to use. If you find that this occurs, please let me know.

Other data is just the left overs#

The dictionaries TestData.other_data, TestGroup.other_data, and TestCase.other_data exclude things for which those classes automatically offer as properties. For example, the existence of TestGroup.algorithm means that trying something like test_data.other_data["algorithm"] will result in a KeyError.

There are reasons for my choice here. Perhaps not good reasons, but reasons none the less.