Danger
Nothing here should be used for any security purposes.
Birthday Paradox Computations¶
This module is imported with:
import toy_crypto.birthday
The classic Birthday problem example is to estimate the number of individuals (whose birthdays are uniformly distributed among 365 days of the year) for there to be at least a 0.5 probability of there being at least one pair of individuals who share the same birthday.
The function that returns a propbably is named P()
,
and the one that returns a quantile is named Q()
.
This follows the R conventions
[RCTeam24]
as reflected in R’s qbirthday
and pbirthday
functions.
from toy_crypto import birthday
computed_n = birthday.Q(0.5, 367)
print(computed_n)
23
Birthday computations are useful for computing collision probabilites. Suppose you had a hash function (truncated to ) returning 32 bit hashes and you wished to know the probability of a collision if you hashed ten thousand items.
from toy_crypto import birthday
from math import isclose
n = 10_000
c = 2 ** 32
p = birthday.P(n, c)
assert isclose(p, 0.011574013876)
The birthday
functions¶
- toy_crypto.birthday.EXACT_THRESHOLD = 1000¶
With auto mode, the threshold for using exact or approximate modes.
- toy_crypto.birthday.MAX_QBIRTHDAY_P = 0.99999999¶
Maximum probability that Q can handle.
- toy_crypto.birthday.P(n: int, classes: int = 365, coincident: int = 2, mode: str = 'auto') Prob [source]¶
probability of at least 1 collision among n individuals for c classes”.
The “exact” method still involves floating point approximations and may be very slow for large n.
- toy_crypto.birthday.Q(prob: float = 0.5, classes: int = 365, coincident: int = 2) int [source]¶
Returns minimum number n to get a probability of p for c classes.
- Raises:
ValueError – if
prob
is less than 0 or greater than 1.ValueError – if
classes
is less than 1.ValueError – if
coincident
is less than 1.
- Parameters:
- Return type:
Implementation ported from R¶
It was not a simple matter for me to find or construct an algorithm that produces reasonable approximations in a reasonably efficient way for the ranges of numbers I wished to consider. Eventually I found the solution used by R Core Team [2024] in R’s birthday.R source, which credits Diaconis and Mosteller [1989].
My port of that implementation to Python only changes the conditions underwhich the approximate method is used along with some internal variable naming and commenting to help me better understand the algorithm and its implementation.