
Conference Paper · December 2015


NIST Statistical Test Suite – result interpretation and
optimization
Marek Sýs, Zdeněk Říha, Vashek Matyáš

{syso, zriha, matyas}@fi.muni.cz

Masaryk University, Brno, Czech Republic

Abstract
In cryptography, randomness is tested using a battery consisting of many tests of randomness, each
focusing on a different feature. The probability that data produced by a good generator pass all the
tests is small when the number of tests is large. Therefore, the results of multiple tests should be
interpreted with this issue in mind. The Šidák correction is a statistical method that can be used for
evaluating multiple independent tests. Since tests of randomness are usually correlated, we analyzed the
accuracy of the Šidák correction for the NIST Statistical Test Suite. The results show that the
correlation of tests of randomness has a small influence on the accuracy of the Šidák correction. We also
provide a speed-optimized version of NIST STS, which produces test results more than 30 times faster.

Keywords: NIST STS, randomness analysis, Šidák correction.

1 Introduction
Randomness plays an important role in many areas of cryptography. Generating random numbers is a
difficult task and so is the quality evaluation of the generated data. In practice randomness assessment
relies heavily on empirical tests of randomness. Each test examines the randomness quality of data
from a specific point of view, testing certain statistical features, such as the frequency of ones or m-bit
blocks in the data, etc. Tests are usually grouped into testing suites (also called batteries) to provide a
more comprehensive randomness analysis. There are three commonly used testing suites for randomness
analysis: NIST Statistical Test Suite (NIST STS), Dieharder (a newer version of the Diehard battery) and
TestU01. The NIST STS is of special importance since it was published as a NIST standard (it was also
used in the selection of AES) and it is used in the preparation of many formal certifications or approvals.

2 NIST Statistical Test Suite


The NIST STS battery consists of 15 empirical tests specially designed to analyse binary sequences
(bitstreams). The tests examine randomness of data according to various statistics of bits or statistics of
blocks of bits. All NIST STS tests examine randomness for the whole bitstream. Several tests are also
able to detect local non-randomness and these tests divide the bitstream into several typically large parts
and they compute a characteristic of bits for each part. All these partial characteristics are then used for
the computation of the test statistic. Each NIST STS test is defined by the test statistic of one of the
following three types and examines randomness of the sequence according to:

1. bits – these tests analyse various characteristics of bits, such as the proportion of ones and zeros,
the frequency of bit changes (runs) and cumulative sums,
2. m-bit blocks – these tests analyse the distribution of m-bit blocks (m is typically smaller than 30
bits) within the sequence or its parts,
3. M-bit parts – these tests analyse a complex property of M-bit parts (M is typically larger than 1000
bits) of the sequence, such as the rank of the sequence viewed as a matrix, the spectrum of the sequence
or the linear complexity of the bitstream.
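
As an illustration of a test of the first type, the Frequency (Monobit) test can be sketched in a few
lines of Python. The sketch follows the p-value formula given in the NIST documentation [1]; the function
name and the list-of-bits input are illustrative assumptions and are not taken from the NIST STS source
code.

```python
import math


def monobit_p_value(bits):
    """P-value of the Frequency (Monobit) test for a sequence of 0/1 values.

    Hypothetical helper for illustration only; NIST recommends n >= 100,
    which this sketch does not enforce.
    """
    n = len(bits)
    # Map each one to +1 and each zero to -1 and sum the result.
    s = sum(1 if b else -1 for b in bits)
    # Under the randomness hypothesis s / sqrt(n) is approximately standard
    # normal, so the two-sided tail probability is erfc(|s| / sqrt(2n)).
    return math.erfc(abs(s) / math.sqrt(2 * n))


if __name__ == "__main__":
    example = [int(c) for c in "1011010101"]
    print(monobit_p_value(example))
```

For the 10-bit sequence 1011010101 the sketch gives a p-value of approximately 0.527; a perfectly
balanced sequence gives a p-value of 1.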

All tests are parametrised by n, which denotes the bit length of the binary sequence to be tested. Several
tests are also parametrised by a second parameter denoted by m or M. Since the reference distributions
of NIST STS test statistics are approximated by asymptotic distributions (χ² or normal), the tests give

14 Mikulášská kryptobesídka / SantaCrypt 2015


accurate results (p-values) only for certain values of their parameters. Table 1 summarizes appropriate
values of the parameters for each particular test recommended by NIST [1].
Test #  Test name                   n              m or M                    # sub-tests
1.      Frequency                   n ≥ 100        -                         1
2.      Frequency within a Block    n ≥ 100        20 ≤ M ≤ n/100            1
3.      Runs                        n ≥ 100        -                         1
4.      Longest run of ones         n ≥ 128                                  1
5.      Rank                        n > 38 912     -                         1
6.      Spectral                    n ≥ 1000       -                         1
7.      Non-overlapping T. M.       n ≥ 8m − 8     2 ≤ m ≤ 21                148*
8.      Overlapping T. M.           n ≥ 10^6                                 1
9.      Maurer’s Universal          n > 387 840                              1
10.     Linear complexity           n ≥ 10^6       500 ≤ M ≤ 5000            1
11.     Serial                                     2 < m < ⌊log2 n⌋ − 2      2
12.     Approximate Entropy                        m < ⌊log2 n⌋ − 5          1
13.     Cumulative sums             n ≥ 100                                  2
14.     Random Excursions           n ≥ 10^6                                 8
15.     Random Excursions Variant   n ≥ 10^6                                 18

Table 1: The recommended size n of the bitstream for each test. Some tests are parameterised by a second
parameter, m or M. The table shows meaningful settings for the second parameter and the number of
sub-tests executed by each test.

Several of the NIST STS tests are performed in several variants, i.e., they execute several sub-tests and
examine several properties of the sequence of the same type. For instance, the Cumulative sums test
examines a sequence according to both the forward and the backward cumulative sum. Table 1 also
summarizes the number of sub-tests performed by each test. The Non-overlapping template matching test is
marked with an asterisk since the number of its sub-tests is not fixed and depends on the value chosen
for the parameter m (the number 148 in Table 1 corresponds to the default value m = 9).

3 Optimizations
To assess the quality of a generator, a large amount of data has to be tested by a battery. It takes
almost 50 minutes to analyze the randomness of 1 GB of data using NIST STS (default settings) on a
standard computer. With regard to efficiency, NIST STS is troublesome. The main problem is that NIST STS
transforms the data into a byte array in which each byte stores a single bit of the data. With this
representation, NIST STS works on both little-endian and big-endian systems, but the cost of this
universality is the poor performance of the battery. We changed the data representation and
re-implemented all tests of the battery [3]. Moreover, the standard (packed) data representation allows
us to use other approaches to speed up the NIST STS tests. The improvements can be divided into three
classes corresponding to the type of tests (tests computing statistics of bits, of m-bit blocks, and of
large M-bit parts):

• The simple and fastest tests, which compute statistics of single bits, are optimized by Look-Up Tables
(LUTs). The tests use LUTs that consist of precomputed values for all 8-bit blocks, indexed by the
block interpreted as an integer value.
• Simple but slower tests that compute statistics of m-bit blocks of the bitstream are optimized by a
new and fast function get_nth_block that can extract an arbitrary m-bit block (m ≤ 25) from a given
bitstream (byte array). The function is able to return all m-bit blocks of a 100 MB bitstream within
a second on a standard modern computer. All tests of this class (except Universal) compute statistics
of m-bit blocks that can be derived from a single histogram of the overlapping m-bit blocks of the
bitstream. Using get_nth_block, we are able to compute the histogram of all m-bit blocks within
seconds.

Remark 1. Although the Serial test computes statistics of m-bit, (m + 1)-bit and (m + 2)-bit blocks, it is
sufficient to compute a single histogram of (m + 2)-bit blocks, since the frequencies of the smaller
blocks (m-bit and (m + 1)-bit) can be computed from the frequencies of the larger (m + 2)-bit blocks. For
instance, the frequency of the 2-bit block "11" can be computed as the sum of the frequencies of the 3-bit
blocks "111" and "110".

• The third class consists of complex and very slow tests. We modified the tests in this class so that
word-by-word operations can be used instead of the original bit-by-bit operations. This optimization
sped up the well-known Berlekamp-Massey algorithm in the Linear complexity test by 64x. To speed up
the Spectral test (based on the Fast Fourier Transform), we incorporated the FFTW library into the
NIST STS battery. FFTW is slightly slower than the original FFT if n (the size of the bitstream) is
of the form n = 2^k. As the number of large factors of n rises, FFTW performs increasingly better. For
very large factors of n, FFTW is slow, but the original FFT is unable to return a result within hours.
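
The first two optimizations and Remark 1 can be illustrated by the following Python sketch. It is not the
code of the optimized battery [3]: the helper names (count_ones, get_block, histogram, shorten), the
MSB-first bit order within a byte and the cyclic wrap-around of the bitstream are assumptions made for
illustration only.

```python
# Illustrative sketch only; all names below are hypothetical.

# LUT with the number of ones for every possible 8-bit block (class 1 tests).
ONES_IN_BYTE = [bin(value).count("1") for value in range(256)]


def count_ones(data: bytes) -> int:
    """Count the ones in a packed bitstream with one table look-up per byte."""
    return sum(ONES_IN_BYTE[byte] for byte in data)


def get_block(data: bytes, bit_offset: int, m: int) -> int:
    """Return the m-bit block (m <= 25) starting at bit_offset as an integer.

    Bits are assumed to be stored MSB-first within each byte.
    """
    byte_index, shift = divmod(bit_offset, 8)
    # Four bytes always cover shift (<= 7) plus m (<= 25) bits.
    chunk = data[byte_index:byte_index + 4].ljust(4, b"\x00")
    window = int.from_bytes(chunk, "big")
    return (window >> (32 - shift - m)) & ((1 << m) - 1)


def histogram(data: bytes, m: int) -> list:
    """Histogram of all overlapping m-bit blocks, wrapping around cyclically."""
    n = 8 * len(data)
    # Append the start of the stream so that blocks near the end can wrap.
    extended = data + (data * 4)[:4]
    counts = [0] * (1 << m)
    for i in range(n):
        counts[get_block(extended, i, m)] += 1
    return counts


def shorten(counts: list) -> list:
    """Derive the (m - 1)-bit histogram from an m-bit histogram (Remark 1).

    The count of a block b is the sum of the counts of b0 and b1.
    """
    return [counts[2 * b] + counts[2 * b + 1] for b in range(len(counts) // 2)]


if __name__ == "__main__":
    stream = bytes([0b10110010, 0x5A, 0xFF, 0x00, 0xC3])
    three_bit = histogram(stream, 3)
    # The frequency of "11" equals the frequencies of "110" plus "111".
    print(shorten(three_bit)[0b11], three_bit[0b110] + three_bit[0b111])
```

With the cyclic wrap-around every block length has the same number of positions, so the shortened
histogram agrees exactly with the histogram computed directly for the shorter blocks.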

For a more detailed description of the optimized tests, see the source code available at [3] or the
paper [2]. The speed improvements are summarized in Table 2. The table shows that our optimized
implementation [2] is about 30x faster than the original NIST STS (default parameters) and that 1 GB of
data can now be tested within minutes on a standard computer.
Test                              m, M    Original (ms)   New (ms)   Speedup (ours vs. NIST)
Frequency (Monobit)                                 203         15      13.5
Frequency within a Block          128                94         31       3.0
Runs                                               1140         31      36.8
Longest run of ones in a block                      656         31      21.2
Binary Matrix Rank                                 3781        297      12.7
Spectral                                          24625      25062       0.98
Non-overlapping Template          9              139641        343     407.1
Overlapping Template              9                1359        406       3.3
Maurer’s Universal                                 2843        156      18.2
Linear complexity                 5000          1187453      18421      64.5
Serial                            9               24078        313      76.9
Approximate Entropy               8               16484        312      52.8
Cumulative sums                                     984         31      31.7
Random Excursions                                   562        515       1.1
Random Excursions Variant                          2125        515       4.3
Total                                           1406028      46464      30.3

Table 2: Run times (for 20 MB of data) of the original NIST STS implementation and our new
implementation [2] for default parameters.

4 Results and interpretation


Empirical tests of randomness are based on statistical hypothesis testing. Each test compares a certain
characteristic of the data (frequency of ones, frequency of m-bit blocks, etc.) with its expected value
(0.5, 2^(−m), etc.), which is precomputed for infinite random sequences. In this context, randomness is a
probabilistic property and can be characterized and described in terms of probability. The results of
statistical tests of randomness typically take the form of a p-value, which represents the probability
that a perfect random number generator (RNG) would produce a less random sequence than the sequence being
tested. A small p-value (e.g., 0.01) means that the data are extreme: if the hypothesis is true (e.g., the
data are random), comparable or worse results occur only with a small probability (1%). To evaluate
a test, the p-value is compared with the significance level α chosen by the tester (for cryptography,
α is usually set to 0.01). If the p-value is smaller than α, the hypothesis is rejected according to the
given test; otherwise it is accepted. Although the p-value of a randomness test focusing on a single
characteristic has a clear statistical interpretation, the interpretation of the results of testing
suites (comprising multiple tests) is problematic, since with a rising number of tests even a good RNG is
more likely to fail some of them. The probability that a sequence produced by a perfect RNG passes all
tests of a test suite is small (15% for NIST STS, 78% for Diehard, 34% for TestU01, for α = 0.01). The
Šidák correction is a statistical method that can be used for evaluating multiple independent tests, but
tests of randomness are sometimes mutually dependent (correlated). To analyse the interdependency of the
NIST STS tests, we analysed 819200 sequences (100 GB of data) produced by a physical source of randomness
(a quantum random number generator). The analysis shows that some tests are significantly dependent and
that a sequence produced by a perfect RNG usually fails fewer tests than expected (a sequence passes the
NIST STS battery with probability 20%). In [5], we analyzed the accuracy of the Šidák correction in the
context of dependent NIST STS tests. The results show that the dependency of tests has a small influence
on the accuracy of the Šidák correction and that this method can be used for the evaluation of multiple
NIST STS tests with significance level α > 0.001.
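
The effect of the number of tests can be reproduced with a short Python sketch. It assumes independent
tests and uses the standard form of the Šidák correction; the function names are hypothetical and the
exact evaluation procedure of [5] may differ. The sub-test counts are those listed in Table 1.

```python
# Sub-test counts of the 15 NIST STS tests as listed in Table 1
# (148 corresponds to the default m = 9 of the Non-overlapping T. M. test).
SUB_TESTS = [1, 1, 1, 1, 1, 1, 148, 1, 1, 1, 2, 1, 2, 8, 18]


def pass_all_probability(alpha, k):
    """Probability that a perfect RNG passes all k independent tests."""
    return (1.0 - alpha) ** k


def sidak_alpha(family_alpha, k):
    """Per-test level giving an overall level family_alpha for k independent tests."""
    return 1.0 - (1.0 - family_alpha) ** (1.0 / k)


if __name__ == "__main__":
    k = sum(SUB_TESTS)
    print("number of sub-tests:", k)
    print("P(pass all) at alpha = 0.01:", pass_all_probability(0.01, k))
    print("Sidak per-test level for overall 0.01:", sidak_alpha(0.01, k))
```

With the 188 sub-tests of Table 1 and α = 0.01, the probability of passing all tests is about 0.15, which
is consistent with the 15% quoted above for NIST STS under the independence assumption.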

References
[1] A. Rukhin, J. Soto, J. Nechvatal, M. Smid, E. Barker, S. Leigh, M. Levenson, M. Vangel, D. Banks, A.
Heckert, J. Dray, S. Vo: A Statistical Test Suite for the Validation of Random Number Generators and
Pseudo Random Number Generators for Cryptographic Applications, Version STS-2.1, NIST Special
Publication 800-22rev1a, April 2010.
[2] M. Sýs, Z. Říha: Faster Randomness Testing with the NIST Statistical Test Suite, Security, Privacy,
and Applied Cryptography Engineering, LNCS 8804, pp. 272-284, 2014.
[3] M. Sýs, Z. Říha: Optimised implementation of NIST STS, 2014,
https://github.com/sysox/NIST-STS-optimised.
[4] M. Sýs, Z. Říha, V. Matyáš, K. Márto, A. Suciu: On the Interpretation of Results from the NIST
Statistical Test Suite, Romanian Journal of Information Science and Technology, manuscript in
preparation.
[5] M. Sýs, V. Matyáš: Randomness testing: result interpretation and speed, LNCS, manuscript in
preparation.
