Chi-Squared Test: Statistical Validation of RNG Fairness

Pearson's chi-squared (χ²) test is the foundational instrument for verifying whether observed frequencies conform to the values expected under a specified theoretical distribution. The test statistic is computed as χ² = Σᵢ(Oᵢ − Eᵢ)² / Eᵢ, where Oᵢ is the observed frequency in the i-th category and Eᵢ is the expected frequency under the null hypothesis. For evaluating pseudo-random number generators, the null hypothesis H₀ is that outputs are independent and uniformly distributed across all k possible categories, so that Eᵢ = n/k for a sample of size n. As n → ∞, the statistic follows a χ² distribution with (k − 1) degrees of freedom; for finite samples, the usual rule of thumb is that the approximation is adequate when Eᵢ ≥ 5 for all categories.
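The statistic defined above can be sketched in a few lines of Python. This is a minimal illustration rather than a reference implementation: the function name `chi_squared_uniform` is hypothetical, and the samples are assumed to be integers in the range 0 to k − 1.

```python
from collections import Counter

def chi_squared_uniform(samples, k):
    """Pearson chi-squared statistic against a uniform distribution.

    Assumes every sample is an integer in range(k); the name and
    signature are illustrative, not taken from any library.
    """
    n = len(samples)
    expected = n / k  # E_i = n / k under the uniform null hypothesis
    counts = Counter(samples)
    # Iterate over all k categories so that categories which never
    # occur still contribute (0 - E)^2 / E to the sum.
    return sum((counts.get(i, 0) - expected) ** 2 / expected for i in range(k))

# Perfectly balanced byte counts give a statistic of zero.
balanced = list(range(256)) * 10
print(chi_squared_uniform(balanced, 256))
```

Summing over `range(k)` rather than over the observed keys matters: a generator that never emits some value would otherwise look more uniform than it is.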

The number of degrees of freedom (df) determines the shape of the χ² distribution and the critical values for decision-making. When testing a k-category generator, df = k − 1, as one degree of freedom is consumed by the constraint on total frequency sum. The critical value χ²_α at significance level α = 0.05 with df = 255 (typical for an 8-bit generator with 256 possible outcomes) is 293.25. If the observed statistic exceeds this threshold, H₀ is rejected, concluding that the generator's distribution is non-uniform. Test power increases with growing sample size n and the magnitude of actual departure from uniformity, which is formalized through the non-centrality parameter δ = n · Σᵢ(pᵢ − 1/k)²/(1/k).
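The critical value and the non-centrality parameter above can be checked numerically. The sketch below uses only the Python standard library and swaps in the Wilson-Hilferty normal approximation to the χ² quantile, so its result is approximate rather than exact; the function names `chi2_critical_approx` and `noncentrality` are hypothetical.

```python
from statistics import NormalDist

def chi2_critical_approx(df, alpha=0.05):
    """Approximate upper-tail chi-squared critical value.

    Uses the Wilson-Hilferty approximation, which is accurate for
    large df but is not an exact quantile computation.
    """
    z = NormalDist().inv_cdf(1.0 - alpha)  # standard normal quantile
    h = 2.0 / (9.0 * df)
    return df * (1.0 - h + z * h ** 0.5) ** 3

def noncentrality(n, probs):
    """delta = n * sum((p_i - 1/k)^2 / (1/k)) for the true probabilities p_i."""
    k = len(probs)
    p0 = 1.0 / k
    return n * sum((p - p0) ** 2 / p0 for p in probs)

# For df = 255 and alpha = 0.05 this lands close to the 293.25 quoted above.
print(chi2_critical_approx(255))

# A two-outcome generator with probabilities 0.6 / 0.4, sampled 100 times.
print(noncentrality(100, [0.6, 0.4]))
```

Because δ is proportional to n, doubling the sample size doubles the non-centrality for the same underlying bias, which is the mechanism behind the growth in test power.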

Correct interpretation of χ² test results requires accounting for multiple testing and for sample size. When the test is applied repeatedly to thousands of generated data blocks, some rejections at α = 0.05 are expected by chance alone, so the Bonferroni correction (which controls the family-wise error rate) or the Benjamini–Hochberg procedure (which controls the false discovery rate, FDR) should be employed to prevent Type I error inflation. Excessively large samples create the inverse problem: the test becomes hypersensitive, rejecting H₀ for statistically significant but practically negligible deviations. To quantify this effect, Cramér's V effect size is calculated as V = √(χ²/(n·(k−1))), which helps distinguish substantive deviations from statistical artifacts.
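The two corrections and the effect size can be sketched as follows. This is an illustrative, standard-library-only version; the function names are hypothetical, and the p-values in the usage lines are made-up numbers chosen to show the difference between the two procedures, not measurements.

```python
import math

def bonferroni_rejections(p_values, alpha=0.05):
    """Indices rejected when each test is compared against alpha / m."""
    m = len(p_values)
    return [i for i, p in enumerate(p_values) if p <= alpha / m]

def benjamini_hochberg_rejections(p_values, q=0.05):
    """Indices rejected by the Benjamini-Hochberg step-up procedure."""
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])
    cutoff = 0
    # Find the largest rank r (1-based) with p_(r) <= r * q / m.
    for rank, idx in enumerate(order, start=1):
        if p_values[idx] <= rank * q / m:
            cutoff = rank
    # Every hypothesis up to that rank is rejected.
    return sorted(order[:cutoff])

def cramers_v(chi2, n, k):
    """Effect size V = sqrt(chi2 / (n * (k - 1))) for a k-category test."""
    return math.sqrt(chi2 / (n * (k - 1)))

# Illustrative p-values from four hypothetical data blocks.
p_values = [0.001, 0.02, 0.03, 0.5]
print(bonferroni_rejections(p_values))
print(benjamini_hochberg_rejections(p_values))

# A statistic right at the df = 255 critical value, from a million bytes.
print(cramers_v(293.25, 1_000_000, 256))
```

Bonferroni is the more conservative of the two, so it flags fewer blocks on the same input. The last line illustrates the large-sample point: a result on the rejection boundary can still correspond to a very small V.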

In cryptographic generator auditing, the χ² test is deployed as part of a statistical test battery alongside runs tests, gap tests, and spectral tests. An automated PRNG validation pipeline executes the χ² test at multiple aggregation levels: byte-level, block-level (blocks of 16, 32, and 64 bits), and at the level of composite patterns. A combined p-value computed via Fisher's method provides the final generator quality metric; because Fisher's method assumes independent tests, the component tests should be run on disjoint data, otherwise the combined value is only approximate. A p-value below 10⁻⁶ is treated as strong evidence of systematic non-uniformity and triggers immediate server seed rotation with full entropy pool reinitialization. This protocol checks output sequences against statistical criteria of the kind defined in NIST SP 800-22; passing such tests is a necessary condition for, but does not by itself guarantee, the cryptographic strength of the audited generator.
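Fisher's method combines m p-values through the statistic X = −2 Σ ln pᵢ, which under H₀ follows a χ² distribution with 2m degrees of freedom when the tests are independent. A minimal standard-library sketch follows; the function name `fisher_combined_p` is hypothetical, and the code relies on the closed-form tail probability that exists for even degrees of freedom.

```python
import math

def fisher_combined_p(p_values):
    """Combined p-value via Fisher's method.

    Assumes the input p-values come from independent tests and are
    strictly positive (log(0) is undefined).
    """
    m = len(p_values)
    half_x = -sum(math.log(p) for p in p_values)  # X / 2, with X = -2 * sum(ln p)
    # For 2m degrees of freedom the upper tail has a closed form:
    # P(chi2 > X) = exp(-X/2) * sum_{j=0}^{m-1} (X/2)^j / j!
    term = 1.0
    total = 1.0
    for j in range(1, m):
        term *= half_x / j
        total += term
    return math.exp(-half_x) * total

# Three illustrative p-values from hypothetical aggregation levels.
print(fisher_combined_p([0.04, 0.20, 0.65]))
```

With a single input the function returns that p-value unchanged, which is a quick sanity check. If the component tests share the same underlying data, the independence assumption fails and the combined value should be read as indicative only.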
