Correlation attacks are a class of cryptographic known-plaintext attacks for breaking stream ciphers whose keystreams are generated by combining the output of several linear-feedback shift registers (LFSRs) using a Boolean function. Correlation attacks exploit a statistical weakness that arises from the specific Boolean function chosen for the keystream. While some Boolean functions are vulnerable to correlation attacks, stream ciphers generated using such functions are not inherently insecure.
Correlation attacks become possible when a significant correlation exists between the output state of an individual LFSR in the keystream generator and the output of the Boolean function that combines the output states of all the LFSRs. These attacks are employed in combination with partial knowledge of the keystream, which is derived from partial knowledge of the plaintext. The two are then compared using an XOR logic gate. This vulnerability allows an attacker to brute-force the key for the individual LFSR and the rest of the system separately. For instance, in a keystream generator where four 8-bit LFSRs are combined to produce the keystream, if one of the registers is correlated with the Boolean function output, it becomes possible to brute-force that register first, followed by the remaining three LFSRs. As a result, the total attack complexity becomes $2^8 + 2^{24}$.
Compared to the cost of launching a brute-force attack on the entire system, with complexity $2^{32}$, this represents an attack effort saving factor of just under 256. If a second register is correlated with the function, the process may be repeated, decreasing the attack complexity to $2^8 + 2^8 + 2^{16}$ for an effort saving factor of just under 65028.
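The saving factors quoted here can be checked with a few lines of arithmetic. The sketch below assumes the setup from the text (four 8-bit LFSRs); the variable names are illustrative only.

```python
# Key-space sizes for a generator built from four 8-bit LFSRs.
REGISTER_BITS = 8
NUM_REGISTERS = 4

# Exhaustive search over all registers at once: 2^32 keys.
brute_force = 2 ** (REGISTER_BITS * NUM_REGISTERS)

# One correlated register: search it alone, then the other three together.
one_correlated = 2 ** 8 + 2 ** 24

# Two correlated registers: search each alone, then the remaining two together.
two_correlated = 2 ** 8 + 2 ** 8 + 2 ** 16

saving_one = brute_force / one_correlated   # just under 256
saving_two = brute_force / two_correlated   # just under 65028

print(f"{saving_one:.3f} {saving_two:.3f}")
```

The savings fall slightly short of 256 and 65028 because the cost of searching the correlated registers is added to, rather than removed from, the total.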
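One example is the Geffe generator, which consists of three LFSRs: LFSR-1, LFSR-2, and LFSR-3. Let the outputs of these registers be denoted $x_1$, $x_2$, and $x_3$, respectively. The Boolean function that combines the three registers to produce the generator output is
$$F(x_1, x_2, x_3) = (x_1 \wedge x_2) \oplus (\neg x_1 \wedge x_3),$$
that is, ($x_1$ AND $x_2$) XOR (NOT $x_1$ AND $x_3$). The value of $F$ for each of the $2^3 = 8$ possible combinations of register outputs is shown in the table below: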
$x_1$ | $x_2$ | $x_3$ | $F(x_1, x_2, x_3)$
---|---|---|---
0 | 0 | 0 | 0
0 | 0 | 1 | 1
0 | 1 | 0 | 0
0 | 1 | 1 | 1
1 | 0 | 0 | 0
1 | 0 | 1 | 0
1 | 1 | 0 | 1
1 | 1 | 1 | 1
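Consider the output of the third register, $x_3$. The table above shows that $x_3$ equals the generator output $F(x_1, x_2, x_3)$ in 6 of the 8 possible cases; that is, $x_3 = F(x_1, x_2, x_3)$ 75% of the time. LFSR-3 is therefore said to be correlated with the generator, a weakness that can be exploited as follows.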
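An interception can be made on the ciphertext $c_1, c_2, c_3, \ldots, c_n$ of a plaintext $p_1, p_2, p_3, \ldots$ that has been encrypted by a stream cipher using a Geffe generator as its keystream generator, i.e. $c_i = p_i \oplus F(x_{1i}, x_{2i}, x_{3i})$ for $i = 1, 2, 3, \ldots, n$, where $x_{1i}$ is the output of LFSR-1 at time $i$, and similarly for the other registers. Suppose further that part of the plaintext is known, say its first 32 bits $p_1, p_2, p_3, \ldots, p_{32}$. Given the intercepted $c_1, c_2, c_3, \ldots, c_{32}$ and the known $p_1, p_2, p_3, \ldots, p_{32}$, we can recover $F(x_{1i}, x_{2i}, x_{3i})$ for $i = 1, 2, 3, \ldots, 32$ by XORing the two together. We then know the first 32 bits of the generator output.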
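Recovering keystream bits from known plaintext is a bitwise XOR, since $c_i \oplus p_i$ cancels the plaintext and leaves the generator output. A minimal sketch follows; the 8-bit values are made up for illustration and the function name is hypothetical.

```python
def recover_keystream(plaintext_bits, ciphertext_bits):
    """XOR each known plaintext bit with the matching ciphertext bit."""
    return [p ^ c for p, c in zip(plaintext_bits, ciphertext_bits)]

# Illustrative values only: a short keystream and plaintext.
keystream = [1, 0, 1, 1, 0, 0, 1, 0]
plaintext = [0, 1, 1, 0, 1, 0, 0, 1]

# Encryption as described in the text: c_i = p_i XOR keystream_i.
ciphertext = [p ^ k for p, k in zip(plaintext, keystream)]

# An attacker who knows the plaintext gets the keystream back.
recovered = recover_keystream(plaintext, ciphertext)
print(recovered == keystream)
```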
This enables a brute-force search of the space of possible keys (initial values) for LFSR-3, assuming we know the tapped bits of LFSR-3, an assumption in line with Kerckhoffs' principle. For any given key in the key space, we can quickly generate the first 32 bits of LFSR-3's output and compare them with the 32 recovered bits of the generator's output. Because the output of LFSR-3 agrees with the generator output 75% of the time, a correct guess for the key of LFSR-3 should match approximately 24 of the 32 recovered bits, whereas an incorrect guess should match roughly half of them, or 16. Thus we can recover the key for LFSR-3 independently of the keys of LFSR-1 and LFSR-2. At this stage we have reduced the problem of brute-forcing a system of three LFSRs to the problem of brute-forcing a single LFSR and then a system of two LFSRs. The amount of effort saved depends on the length of the LFSRs; for realistic lengths the saving is substantial and can make brute-force attacks practical.
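The search over LFSR-3 keys can be sketched as a small simulation. Everything specific here is an assumption made for illustration and does not come from the text: the register lengths (5, 6, and 7 bits), the tap positions, the secret seeds, and the use of 1024 recovered keystream bits rather than 32 so that the correct key stands out clearly.

```python
def lfsr_bits(seed, taps, nbits, count):
    """Return `count` output bits of a Fibonacci LFSR with an `nbits`-bit state."""
    state, out = seed, []
    for _ in range(count):
        out.append(state & 1)                 # the low bit is the output
        feedback = 0
        for t in taps:                        # XOR the tapped state bits
            feedback ^= (state >> t) & 1
        state = (state >> 1) | (feedback << (nbits - 1))
    return out

def geffe(x1, x2, x3):
    """Geffe combining function: x2 when x1 is 1, otherwise x3."""
    return (x1 & x2) ^ ((x1 ^ 1) & x3)

def agreement(a, b):
    """Number of positions at which two bit sequences agree."""
    return sum(x == y for x, y in zip(a, b))

# Hypothetical parameters chosen for this sketch.
SIZES = (5, 6, 7)
TAPS = ((0, 2), (0, 1), (0, 1))
SECRET_SEEDS = (0b10110, 0b011001, 0b1010011)
N = 1024  # number of recovered keystream bits

# The generator output the attacker is assumed to have recovered.
registers = [lfsr_bits(s, t, n, N) for s, t, n in zip(SECRET_SEEDS, TAPS, SIZES)]
keystream = [geffe(a, b, c) for a, b, c in zip(*registers)]

# Try every nonzero LFSR-3 seed and score it against the keystream alone.
scores = {
    seed: agreement(lfsr_bits(seed, TAPS[2], SIZES[2], N), keystream)
    for seed in range(1, 2 ** SIZES[2])
}
best_seed = max(scores, key=scores.get)
print(best_seed == SECRET_SEEDS[2], scores[best_seed] / N)
```

Note that the search never touches the seeds of LFSR-1 or LFSR-2: only 127 candidates are scored, rather than every combination of the three registers.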
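Observe in the table above that $x_2$ also agrees with the generator output in 6 of the 8 cases, so there is again a 75% correlation, this time between $x_2$ and the generator output. LFSR-2 can therefore be brute-forced independently of the keys of LFSR-1 and LFSR-3, in the same way as LFSR-3.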
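Note from the table above that $x_1$ agrees with the generator output $F(x_1, x_2, x_3)$ in only 4 of the 8 cases, a 50% correlation. This cannot be used to brute-force LFSR-1 independently of the others: the correct key yields output that agrees with the generator output 50% of the time, but so, on average, does an incorrect key.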
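The agreement counts read off the truth table can be reproduced by enumerating all eight inputs. This is a minimal sketch; the function and variable names are illustrative.

```python
from itertools import product

def geffe(x1, x2, x3):
    """Geffe combining function from the text."""
    return (x1 & x2) ^ ((x1 ^ 1) & x3)

# agreements[k] counts the inputs for which register k+1 equals the output.
agreements = [0, 0, 0]
for bits in product((0, 1), repeat=3):
    out = geffe(*bits)
    for k in range(3):
        agreements[k] += bits[k] == out   # True counts as 1

for k, count in enumerate(agreements, start=1):
    print(f"x{k}: {count}/8 = {count / 8:.0%}")
```

The counts come out as 4, 6, and 6 of 8, matching the 50%, 75%, and 75% figures for $x_1$, $x_2$, and $x_3$.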
While the above example illustrates the relatively simple concepts behind correlation attacks, it somewhat oversimplifies how the brute-forcing of individual LFSRs proceeds. Incorrectly guessed keys generate LFSR output that agrees with the generator output roughly 50% of the time because, given two random bit sequences of the same length, the probability of agreement at any particular bit is 0.5. However, individual incorrect keys may well generate LFSR output that agrees with the generator output more or less often than exactly 50% of the time. This matters most for LFSRs whose correlation with the generator is not especially strong; for small enough correlations, an incorrectly guessed key may also produce LFSR output that agrees with the expected number of bits of the generator output. It may then be impossible to identify the unique key to that LFSR, although narrowing the search down to a number of candidate keys is still a significant breach of the cipher's security. Given a megabyte of known plaintext, the situation would be substantially different. An incorrect key may generate LFSR output that agrees with more than 512 kilobytes of the generator output, but it is unlikely to agree with as much as 768 kilobytes, as a correctly guessed key with a 75% correlation would. As a rule, the weaker the correlation between an individual register and the generator output, the more known plaintext is required to find that register's key with a high degree of confidence. Estimates of the length of known plaintext required for a given correlation can be calculated using the binomial distribution.
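The binomial estimate can be sketched as follows. The model treats each bit of a wrong key's output as an independent fair coin and each bit of the right key's output as agreeing with probability 0.75; both are idealizing assumptions, and the thresholds (24 of 32 and 160 of 256) are chosen here only for illustration.

```python
from math import comb

def tail_probability(n, k, p):
    """P[X >= k] for X ~ Binomial(n, p)."""
    return sum(comb(n, i) * p**i * (1 - p) ** (n - i) for i in range(k, n + 1))

# 32 known bits, accept a key if at least 24 bits agree.
false_alarm_32 = tail_probability(32, 24, 0.5)    # a wrong key passes
detection_32 = tail_probability(32, 24, 0.75)     # the right key passes

# 256 known bits, threshold halfway between 50% and 75% agreement.
false_alarm_256 = tail_probability(256, 160, 0.5)
detection_256 = tail_probability(256, 160, 0.75)

print(false_alarm_32, detection_32)
print(false_alarm_256, detection_256)
```

Under this model a wrong key reaches 24 of 32 matching bits with probability of roughly 0.35%, so a search over many candidate keys can be expected to produce false positives at 32 bits; with 256 bits the false-alarm probability is far smaller while the correct key still passes the threshold almost always.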
The correlations exploited in the example attack on the Geffe generator are examples of what are called first-order correlations: correlations between the value of the generator output and an individual LFSR. Higher-order correlations can be defined in addition to these. For instance, while a given Boolean function may have no strong correlation with any of the individual registers it combines, a significant correlation may exist between the generator output and some Boolean function of two of the registers, e.g. $x_1 \oplus x_2$. This would be an example of a second-order correlation.
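A small example makes the idea concrete. The four-input combiner below is hypothetical, chosen only because it has no first-order correlation yet agrees with $x_1 \oplus x_2$ more often than chance; it is not taken from any real cipher.

```python
from itertools import product

def combiner(x1, x2, x3, x4):
    """Hypothetical combining function used only for illustration."""
    return x1 ^ x2 ^ (x3 & x4)

inputs = list(product((0, 1), repeat=4))

# Agreement of each single input with the output (first-order).
agree_single = [sum(x[k] == combiner(*x) for x in inputs) for k in range(4)]

# Agreement of x1 XOR x2 with the output (second-order).
agree_pair = sum((x[0] ^ x[1]) == combiner(*x) for x in inputs)

print(agree_single, agree_pair)
```

Each single input agrees with the output in 8 of 16 cases (50%), while $x_1 \oplus x_2$ agrees in 12 of 16 (75%), so an attacker would have to search the keys of two registers jointly to exploit it.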
Higher-order correlation attacks can be more powerful than first-order correlation attacks; however, this effect is subject to a "law of limiting returns". The table below shows a measure of the computational cost of various attacks on a keystream generator consisting of eight 8-bit LFSRs combined by a single Boolean function. The cost is calculated as a sum: the left-hand term is the size of the key space for the correlated registers, and the right-hand term is the size of the key space for the remaining registers.
Attack | Effort (size of keyspace)
---|---
Brute force | $2^{64} = 18446744073709551616$
Single 1st order correlation attack | $2^{8} + 2^{56} = 72057594037928192$
Single 2nd order correlation attack | $2^{16} + 2^{48} = 281474976776192$
Single 3rd order correlation attack | $2^{24} + 2^{40} = 1099528404992$
Single 4th order correlation attack | $2^{32} + 2^{32} = 8589934592$
Single 5th order correlation attack | $2^{40} + 2^{24} = 1099528404992$
Single 6th order correlation attack | $2^{48} + 2^{16} = 281474976776192$
Single 7th order correlation attack | $2^{56} + 2^{8} = 72057594037928192$
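
The effort figures in the table follow from one formula: with eight 8-bit registers, a $k$-th order attack searches $2^{8k}$ keys for the correlated registers and $2^{8(8-k)}$ keys for the rest. A short sketch, with illustrative names, reproduces them:

```python
REGISTER_BITS = 8
NUM_REGISTERS = 8

def attack_effort(order):
    """Key-space size for a single correlation attack of the given order."""
    correlated = 2 ** (REGISTER_BITS * order)
    remaining = 2 ** (REGISTER_BITS * (NUM_REGISTERS - order))
    return correlated + remaining

brute_force = 2 ** (REGISTER_BITS * NUM_REGISTERS)
efforts = {order: attack_effort(order) for order in range(1, NUM_REGISTERS)}

print("brute force:", brute_force)
for order, effort in efforts.items():
    print(order, effort)
```

The effort is smallest at order 4 and rises again symmetrically, which is the limiting-returns effect described above.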
While higher-order correlations lead to more powerful attacks, they are also more difficult to find, as the space of available Boolean functions to correlate against the generator output increases as the number of arguments to the function does.
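A Boolean function $F(x_1, \ldots, x_n)$ of $n$ variables is said to be $m$-th order correlation immune, or to have $m$-th order correlation immunity, for some integer $m$ if no significant correlation exists between the function's output and any Boolean function of $m$ of its inputs.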
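Siegenthaler showed that the correlation immunity $m$ of a Boolean function of algebraic degree $d$ of $n$ variables satisfies $m + d \leq n$; for a given number of input variables, a high algebraic degree therefore limits the maximum possible correlation immunity. Furthermore, if the function is balanced, then $m \leq n - 1$.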
It follows that it is impossible for a function of $n$ variables to be $n$-th order correlation immune. This also follows from the fact that any such function can be written using a Reed-Muller basis as a combination of XORs of the input functions.
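For small functions the order of correlation immunity can be computed by brute force. The sketch below uses a technique not described in the text, the Xiao-Massey spectral characterization: a function is $m$-th order correlation immune exactly when its Walsh coefficients vanish for every mask of Hamming weight 1 to $m$. It tests for exactly zero correlation rather than "no significant" correlation, and all function names are illustrative.

```python
from itertools import product

def walsh(f, n, mask):
    """Sum over all inputs x of (-1)^(f(x) XOR <mask, x>)."""
    total = 0
    for x in product((0, 1), repeat=n):
        dot = 0
        for xi, mi in zip(x, mask):
            dot ^= xi & mi
        total += -1 if (f(*x) ^ dot) else 1
    return total

def correlation_immunity_order(f, n):
    """Largest m such that every Walsh coefficient of weight 1..m is zero."""
    order = 0
    for m in range(1, n + 1):
        masks = [w for w in product((0, 1), repeat=n) if sum(w) == m]
        if any(walsh(f, n, w) != 0 for w in masks):
            break
        order = m
    return order

def geffe(x1, x2, x3):
    return (x1 & x2) ^ ((x1 ^ 1) & x3)

def parity(x1, x2, x3):
    return x1 ^ x2 ^ x3

print(correlation_immunity_order(geffe, 3))   # Geffe combiner
print(correlation_immunity_order(parity, 3))  # XOR of all three inputs
```

The Geffe function comes out with order 0, reflecting its first-order correlations with $x_2$ and $x_3$, while the XOR of all three inputs reaches order 2. The latter has algebraic degree 1, so it meets Siegenthaler's bound $m + d \leq n$ with equality for $n = 3$.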
Given the potentially severe impact of a correlation attack on a stream cipher's security, it is essential to test a candidate Boolean combining function for correlation immunity before using it in a stream cipher. However, high correlation immunity is a necessary but not sufficient condition for a Boolean function to be suitable for use in a keystream generator. Other properties must also be considered, for example whether the function is balanced, that is, whether it outputs as many, or roughly as many, 1s as 0s when all possible inputs are considered.
Research has been conducted into methods for easily generating Boolean functions of a given size which are guaranteed to have at least some particular order of correlation immunity. This research has uncovered links between correlation immune Boolean functions and error correcting codes.[2]