.. include:: /../common/unsafe.rst

.. _PyCryptodome: https://www.pycryptodome.org


Usage
=================

.. currentmodule:: toy_crypto.wycheproof

This document walks through a concrete example.
It assumes that you have
obtained the Wycheproof data as discussed in
:ref:`sec-wycheproof-obtain`,
and that you have at least glanced at the
:ref:`data overview section <sec_wycheproof_data_overview>`.


Structure of use
+++++++++++++++++

The structure of one way to use this module might look something like

.. code-block:: python

    from toy_crypto import wycheproof
    # import ... # the modules with the things you will be testing

    # WP_ROOT: Path = ... # a pathlib.Path for the root wycheproof directory
    loader = wycheproof.Loader(WP_ROOT)  # This only needs to be done once
    ...

    # Get test data from one of the data files

    test_data = loader.load("SOME_WYCHEPROOF_DATA_FILE_test.json")
    ... # May wish to get some information from test_data
        # for logging or reporting.

    for group in test_data.groups:
        ... # Per TestGroup setup
        for test in group.tests:
            ... # set up for specific test
            ... # perform computation with thing you are testing
            ... # Check that your results meet expectations

For the example below, we will step through parts of that,
but will sometimes need to use a different flow so that each
of the parts actually runs when constructing this document.

An example
+++++++++++

We will be testing RSA decryption from PyCryptodome_
against the Wycheproof OAEP test data for 2048-bit keys with SHA1 as the
hash algorithm and MGF1SHA1 as the mask generation function.
The data file for those tests is in
``testvectors_v1/rsa_oaep_2048_sha1_mgf1sha1_test.json`` relative to WP_ROOT.

In what follows, we assume that you have already set up ``WP_ROOT``
as a :py:class:`pathlib.Path` with the appropriate file system location.
See :ref:`sec-wycheproof-obtain` for discussion of ways to do that.

Set up loader
--------------

.. testsetup:: 

    # Use the str BASE_TEST_DATA from doctest_global_setup

    from pathlib import Path

    WP_ROOT = Path(BASE_TEST_DATA) / "resources" / "wycheproof"
    assert WP_ROOT.is_dir(), str(WP_ROOT)


This assumes that you have already set up ``WP_ROOT``
(or whatever you wish to call it)
as a :py:class:`pathlib.Path` with the appropriate file system location
as discussed in :ref:`sec-wycheproof-obtain`.

Before a Wycheproof JSON data file can be loaded, a loader must be set up.
The :class:`Loader` you create will not only know where the data files are,
but it will also have internal mechanisms set up for constructing the schemata
used for validating the loaded JSON.

..  testcode::

    from pathlib import Path
    from toy_crypto import wycheproof

    # These imports include the function we will be testing
    from Crypto.PublicKey import RSA
    from Crypto.Cipher import PKCS1_OAEP

    # WP_ROOT: Path = ... # set up elsewhere
    loader = wycheproof.Loader(WP_ROOT)

Loading the test data
----------------------

Now that we have ``loader``, we can use it
to load Wycheproof data.

The data is loaded using :meth:`Loader.load`.
The loaded :class:`TestData` instance is not the
raw result of loading JSON, but many of its internals
still reflect its origins.

..  testcode::
   
    test_data = loader.load("rsa_oaep_2048_sha1_mgf1sha1_test.json")

    assert test_data.header == "Test vectors of type RsaOeapDecrypt check decryption with OAEP."

If for some reason the JSON does not validate against the expected schema,
warnings will be logged at the
|WARNING|_ level.

.. |WARNING| replace:: ``logging.WARNING``
.. _WARNING: https://docs.python.org/3/library/logging.html#logging.WARNING
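
Whether such warnings are actually seen depends on how logging is configured
in your test run.
The following is a minimal sketch using only the standard library.
The logger name ``"example.wycheproof"`` is a placeholder chosen for this sketch,
not necessarily the name the module logs under,
and the warning itself is a stand-in emitted by the example.

.. code-block:: python

    import logging

    class ListHandler(logging.Handler):
        """Collects log records in a list so they can be inspected later."""

        def __init__(self) -> None:
            super().__init__(level=logging.WARNING)
            self.records: list[logging.LogRecord] = []

        def emit(self, record: logging.LogRecord) -> None:
            self.records.append(record)

    # Placeholder logger name; substitute the logger you care about,
    # or attach the handler to the root logger to catch everything.
    logger = logging.getLogger("example.wycheproof")
    handler = ListHandler()
    logger.addHandler(handler)

    logger.info("below the threshold, so not collected")
    logger.warning("schema validation failed")  # stand-in for a real warning

    collected = [r.getMessage() for r in handler.records]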

For each :class:`TestGroup`
-----------------------------------

Test cases are organized into test groups within the raw data.
See :ref:`sec_wycheproof_data_overview` for more information about
what kinds of things are typically found in test groups.
:attr:`TestData.groups` returns an
iterator of :class:`TestGroup`\s.

In the case of this test data each
:class:`TestGroup` specifies
the parameters needed to construct a private RSA key
that is to be used for all tests in the group.

The private key is offered in several formats.
In this example,
I will use the :external+crypto:func:`Crypto.PublicKey.RSA.import_key` method
to get the key information from the PEM format.


..  testcode::

    for group in test_data.groups:
        pem = group.other_data["privateKeyPem"]
        sk = RSA.import_key(pem)

        ## Let's do some sanity checks on the private keys
    
        assert sk.size_in_bits() == 2048
        assert sk.has_private()

Each group also has the parameters used for our RSA decryption.
These are the same for all test groups in this particular data set,
so we will only do a sanity check on them for demonstration purposes.

..  testcode::

    for g in test_data.groups:
        assert g["keySize"] == 2048
        assert g["sha"] == "SHA-1"
        assert g["mgf"] == "MGF1"
        assert g["mgfSha"] == "SHA-1"
        

For each :class:`TestCase`
--------------------------------------

We are finally ready for our actual tests.

In addition to the properties that all Wycheproof test cases have,
the test cases here have the following:

"msg"
    The plaintext message

"ct"
    The ciphertext

"label"
    The OAEP label that is rarely ever used.

These are accessible as keys to the dictionary
:attr:`TestCase.other_data`.

Fortunately, the defaults that PyCryptodome_ uses for the hash algorithm
and the mask generation function when creating a cryptor with
:external+crypto:func:`Crypto.Cipher.PKCS1_OAEP.new`
are the ones we are testing here, so we won't have to specify them.
We can create the cryptor we wish to test with

.. code-block:: python

    cryptor = PKCS1_OAEP.new(key = sk, label = label)

where ``sk`` is the private key we set up for the test group,
and ``label`` is from each test.

..  testcode::

    test_count = 0
    group_count = 0
    for g in test_data.groups:
        group_count += 1
        pem = g.other_data["privateKeyPem"]
        sk = RSA.import_key(pem)

        for case in g.tests:
            test_count += 1
        
            label: bytes = case.other_data["label"]
            ciphertext: bytes = case.other_data["ct"]
            message: bytes = case.other_data["msg"]

            cryptor = PKCS1_OAEP.new(key=sk, label=label)

            decrypted: bytes
            try:
                decrypted = cryptor.decrypt(ciphertext)
            except ValueError:
                assert case.invalid
            else:
                assert case.valid
                assert decrypted == message

    assert test_count == test_data.test_count
    print(f"Completed a total of {test_count} tests in {group_count} group(s).")

.. testoutput::

    Completed a total of 36 tests in 1 group(s).

.. _sec_wycheproof_data_conversion:

Data conversion
++++++++++++++++++++++++++++

The TLDR for this section is that you are advised to make sure that
things like ``case.other_data["ct"]`` are of the data types you expect
when you run tests.
Be familiar with the data you are importing, and do not rely
on the fully automatic conversion from hex strings to bytes or integers
to always get things right.
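
One way to act on that advice is to normalize each value explicitly
before using it.
The helper below is a hypothetical sketch and not part of the module:
it accepts a value that is either already :py:class:`bytes` or still
a hex string, and returns :py:class:`bytes` either way.

.. code-block:: python

    from typing import Union

    def as_bytes(value: Union[bytes, str]) -> bytes:
        """Return value as bytes, decoding it from hex if it is still a str."""
        if isinstance(value, bytes):
            return value  # already converted
        if isinstance(value, str):
            return bytes.fromhex(value)  # raises ValueError on non-hex input
        raise TypeError(f"expected bytes or hex str, got {type(value).__name__}")

    assert as_bytes("deadbeef") == b"\xde\xad\xbe\xef"
    assert as_bytes(b"\xde\xad\xbe\xef") == b"\xde\xad\xbe\xef"

With such a helper, ``as_bytes(case.other_data["ct"])`` gives the same result
whether or not the automatic conversion took place.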

We will continue with the same example as above for this discussion.

In some of the test cases in the test data we used,
the ``"ct"``, ``"msg"``, and ``"label"`` JSON keywords
have values that are strings.
In all of those cases, the strings are hex encoded byte sequences.
Consider this excerpt from test case 9:

.. code-block:: json
    :force:

    {
        "tcId" : 9,
        "comment" : "",
        "flags" : [
            "EncryptionWithLabel"
        ],
        "msg" : "313233343030",  // That is actually hex encoded
        "ct" : ..., // A longer string of hex digits was here
        "label" : "000102030405060708090a0b0c0d0e0f10111213",
        "result" : "valid"
    }

But when we ran our tests we were able to use code like

.. code-block:: python

    label: bytes = case.other_data["label"]
    ciphertext: bytes = case.other_data["ct"]
    message: bytes = case.other_data["msg"]

and those things really were bytes.
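
As a point of reference, the hex strings in the excerpt decode as follows
with nothing but the standard library.
This only illustrates what the values mean;
it is not a description of how the module performs its conversion.

.. code-block:: python

    # Values copied from the excerpt of test case 9
    msg_hex = "313233343030"
    label_hex = "000102030405060708090a0b0c0d0e0f10111213"

    msg = bytes.fromhex(msg_hex)
    label = bytes.fromhex(label_hex)

    assert msg == b"123400"            # six ASCII characters
    assert label == bytes(range(20))   # 20 bytes, 0x00 through 0x13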

The initializers for :class:`TestGroup` and :class:`TestCase`
automatically perform *some* necessary conversions from hexadecimal
strings to :py:class:`bytes` or :py:class:`int` as appropriate.
They do this using the data from :attr:`TestData.formats`,
which is a mapping from JSON keywords to information about how
the string is formatted.

.. testcode::

    # We have already loaded test_data with:
    # test_data = loader.load("rsa_oaep_2048_sha1_mgf1sha1_test.json")

    formats: dict[str, str] = test_data.formats

    assert formats["ct"] == "HexBytes"
    assert formats["publicExponent"] == "BigInt" # Not used yet

:attr:`TestData.formats` is constructed by inspecting the schema associated
with the JSON file requested during loading.
That may give incorrect results when there are multiple places a particular
JSON keyword might exist in the data.
As a consequence, the automatic conversion is conservative and only
acts on the keywords in the top-most level of a test group or test case.
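
To illustrate what "top-most level only" means in practice,
here is a simplified stand-in for that behavior.
The function ``convert_top_level`` and its two converters are assumptions
made for this sketch, not the module's actual implementation,
and the ``BigInt`` converter here only handles non-negative values.

.. code-block:: python

    from typing import Any, Callable

    # Hypothetical converters keyed by format name
    CONVERTERS: dict[str, Callable[[str], Any]] = {
        "HexBytes": bytes.fromhex,
        "BigInt": lambda s: int(s, 16),  # non-negative values only
    }

    def convert_top_level(data: dict[str, Any], formats: dict[str, str]) -> None:
        """Convert string values of data in place; nested dicts are not visited."""
        for key, value in data.items():
            converter = CONVERTERS.get(formats.get(key, ""))
            if converter is not None and isinstance(value, str):
                data[key] = converter(value)

    formats = {"ct": "HexBytes", "publicExponent": "BigInt"}
    case = {"ct": "00ff", "comment": "", "nested": {"publicExponent": "010001"}}

    convert_top_level(case, formats)

After the call, ``case["ct"]`` is ``b"\x00\xff"``,
while ``case["nested"]["publicExponent"]`` is still the string ``"010001"``
because it does not sit at the top level of ``case``.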

Additionally, the :attr:`TestData.formats` dictionary will be empty
when the loaded JSON could not be validated against its schema.
Users can use :func:`TestData.schema_is_valid` to check
whether the JSON test file was successfully validated
against its JSON schema.
When that validation fails, no automatic conversion is attempted.


Semi-automatic conversion
-------------------------

As mentioned above,
the fully automatic conversion using :attr:`TestData.formats`
is only performed at the top level of
each :class:`TestGroup` and :class:`TestCase`.

Suppose in our OAEP test, instead of creating the
private key from the PEM format in each test group
we created it through the information in ``other_data["privateKey"]``.

.. code-block:: json
    :force:

        "testGroups" : [
            {
                ...
                "privateKey" : {
                    "privateExponent" : ...,
                    "publicExponent" : "010001", // that is a hex string
                    "prime1" : ...,
                    "prime2" : ...,
                    ...
                },
                "privateKeyPem" : ...,
                ...
                "tests" : [ ... ]
            }
        ]

We might then need to extract the values of
``"publicExponent"``, ``"prime1"``, and ``"prime2"``
in each test group.
But these are not top-level keys within the test group,
and so they will remain as hex strings.

.. testcode::

    # We will test for just the first group
    group = next(test_data.groups)

    priv_key_data = group.other_data["privateKey"]
    e = priv_key_data["publicExponent"]

    assert isinstance(e, str)
    print(e)

.. testoutput::

    010001

But because ``formats`` does contain information for the relevant
members of ``group.other_data["privateKey"]`` we can manually
call automatic conversion using :func:`deserialize_top_level`.
Note that this mutates the dictionary it is given.

.. testcode::

    group = next(test_data.groups)

    priv_key_data = group.other_data["privateKey"]
    
    wycheproof.deserialize_top_level(priv_key_data, test_data.formats)

    e = priv_key_data["publicExponent"]
    p = priv_key_data["prime1"]
    q = priv_key_data["prime2"]
    N = priv_key_data["modulus"]

    assert p * q == N

    assert isinstance(e, int)

    print(e)

.. testoutput::

    65537
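
Once the values are integers, further consistency checks on the key
become possible.
The sketch below uses deliberately tiny textbook numbers rather than
Wycheproof data, and ``d`` stands for what the JSON calls ``"privateExponent"``.
It assumes Python 3.9 or later for :py:func:`math.lcm`.

.. code-block:: python

    import math

    # Toy values for illustration only; real keys are vastly larger
    p, q = 61, 53
    N = p * q
    e = 17
    d = pow(e, -1, (p - 1) * (q - 1))  # modular inverse of e

    assert p * q == N
    # e and d must be inverses modulo lcm(p - 1, q - 1). This holds whether
    # d was computed modulo lcm(p - 1, q - 1) or modulo (p - 1) * (q - 1).
    assert (e * d) % math.lcm(p - 1, q - 1) == 1

    # Round trip: decrypting an encrypted message recovers the message
    m = 65
    assert pow(pow(m, e, N), d, N) == m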

Note again that schema loading and validation can fail,
in which case :attr:`TestData.formats` will be empty.


Gotchas
++++++++

Pretty much all of the many ways this can break are a consequence
of the fact that I have not found a way to make use of JSON schemata
the way I feel they should be able to be used.
When I started working on this module, I had assumed that there
would be a fairly straightforward way to make use of the JSON schema
loaded for each test file to reason about the loaded JSON within Python code.

Hard-coded data assumptions
---------------------------

The code here makes assumptions about things that will be common
to all of the wycheproof JSON test files. Similarly it makes assumptions
about what each test group within those files will have.
I have yet to write tests to see if those actually hold for each test file.
It is more likely that these assumptions will fail with test vectors from the
older ``wycheproof/testvectors`` directory than with those in the (default)
``wycheproof/testvectors_v1``.

If you find that the assumptions fail for things in ``wycheproof/testvectors_v1``, definitely let me know.

Data conversion
----------------

As discussed in :ref:`sec_wycheproof_data_conversion`
the automatic data conversion of hexadecimal strings
to :py:class:`bytes` or :py:class:`int`\s will miss things
that you will need to manually handle, perhaps with the help of
:func:`deserialize_top_level`.

Additionally there are currently (August 29, 2025)
52 test files in the wycheproof project that are missing schemas.
In these cases, no automatic conversion will be attempted
and :attr:`TestData.formats` will be empty.
:func:`TestData.schema_is_valid` can be used to check
if there was a problem during schema loading and validation.
:attr:`TestData.schema_file` can be used for further debugging.
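
A guard along the following lines can make such failures visible early
in a test run.
It is a sketch under assumptions: ``StubData`` is a hypothetical stand-in
so that the example runs on its own, and it is assumed here that
``schema_is_valid`` is called without arguments and returns a boolean.

.. code-block:: python

    from dataclasses import dataclass, field

    @dataclass
    class StubData:
        """Hypothetical stand-in mimicking the two members used below."""

        formats: dict[str, str] = field(default_factory=dict)
        valid: bool = False

        def schema_is_valid(self) -> bool:
            return self.valid

    def conversion_expected(data) -> bool:
        """True if automatic conversion can be expected for this data."""
        return data.schema_is_valid() and bool(data.formats)

    good = StubData(formats={"ct": "HexBytes"}, valid=True)
    bad = StubData()  # as if no schema was found: formats is empty

    assert conversion_expected(good)
    assert not conversion_expected(bad)

In a real test you might skip the test, fail it, or fall back to manual
conversion when the guard returns ``False``.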

It is also possible that it will attempt to convert things
it shouldn't or be mistaken about which conversion to use.
If you find that this occurs, please let me know.

Other data is just the leftovers
----------------------------------

The dictionaries
:attr:`TestData.other_data`,
:attr:`TestGroup.other_data`,
and :attr:`TestCase.other_data`
exclude things that those classes already offer as properties.
For example, the existence of :attr:`TestGroup.algorithm` means that
trying something like ``test_data.other_data["algorithm"]`` will result
in a :py:class:`KeyError`.
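
The effect can be pictured with a small stand-in class.
``Container`` below is hypothetical and much simpler than the real classes;
it only shows how promoting a key to an attribute removes it from the
leftover dictionary.

.. code-block:: python

    from typing import Any

    class Container:
        """Hypothetical stand-in: promotes "algorithm" and keeps the rest."""

        def __init__(self, raw: dict[str, Any]) -> None:
            leftovers = dict(raw)  # copy, so that raw is not mutated
            self.algorithm = leftovers.pop("algorithm")  # promoted
            self.other_data = leftovers  # everything not promoted

    c = Container({"algorithm": "RSAES-OAEP", "keySize": 2048})

    assert c.algorithm == "RSAES-OAEP"
    assert c.other_data == {"keySize": 2048}
    # c.other_data["algorithm"] would raise KeyError
    assert "algorithm" not in c.other_data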

There are reasons for my choice here.
Perhaps not good reasons, but reasons nonetheless.