Note: In this section, all congruences are mod 2.
In many situations involving encryption, there is a trade-off between speed and security. If one wants a very high level of security, speed is often sacrificed, and vice versa. For example, in cable television, many bits of data are being transmitted, so speed of encryption is important. On the other hand, security is not usually as important since there is rarely an economic advantage to mounting an expensive attack on the system.
In this section, we describe a method that could be used when speed is more important than security. However, the real use is as one building block in more complex systems.
The sequence
can be described by giving the initial values
and the linear recurrence relation
This sequence repeats after 31 terms.
For another example, see Example 18 in the Computer Appendices.
More generally, consider a linear recurrence relation of length :
where the coefficients are integers. If we specify the initial values
then all subsequent values of can be computed using the recurrence. The resulting sequence of 0s and 1s can be used as the key for encryption. Namely, write the plaintext as a sequence of 0s and 1s, then add an appropriate number of bits of the key sequence to the plaintext mod 2, bit by bit. For example, if the plaintext is and the key sequence is the example given previously, we have
Decryption is accomplished by adding the key sequence to the ciphertext in exactly the same way.
One advantage of this method is that a key with large period can be generated using very little information. The long period gives an improvement over the Vigenère method, where a short period allowed us to find the key. In the above example, specifying the initial vector and the coefficients yielded a sequence of period 31, so 10 bits were used to produce 31 bits. It can be shown that the recurrence
and any nonzero initial vector will produce a sequence with period . Therefore, 62 bits produce more than two billion bits of key. This is a great advantage over a one-time pad, where the full two billion bits must be sent in advance.
This method can be implemented very easily in hardware using what is known as a linear feedback shift register (LFSR) and is very fast. In Figure 5.2 we depict an example of a linear feedback shift register in a simple case. More complicated recurrences are implemented using more registers and more XORs.
For each increment of a counter, the bit in each box is shifted to other boxes as indicated, with denoting the addition mod 2 of the incoming bits. The output, which is the bit , is added to the next bit of plaintext to produce the ciphertext. The diagram in Figure 5.2 represents the recurrence . Once the initial values are specified, the machine produces the subsequent bits very efficiently.
Unfortunately, the preceding encryption method succumbs easily to a known plaintext attack. More precisely, if we know only a few consecutive bits of plaintext, along with the corresponding bits of ciphertext, we can determine the recurrence relation and therefore compute all subsequent bits of the key. By subtracting (or adding; it’s all the same mod 2) the plaintext from the ciphertext mod 2, we obtain the bits of the key. Therefore, for the rest of this discussion, we will ignore the ciphertext and plaintext and assume we have discovered a portion of the key sequence. Our goal is to use this portion of the key to deduce the coefficients of the recurrence and consequently compute the rest of the key.
For example, suppose we know the initial segment of the sequence , which has period 15, and suppose we know it is generated by a linear recurrence. How do we determine the coefficients of the recurrence? We do not necessarily know even the length, so we start with length 2 (length 1 would produce a constant sequence). Suppose the recurrence is . Let and and use the known values . We obtain the equations
In matrix form, this is
The solution is , so we guess that the recurrence is . Unfortunately, this is not correct since Therefore, we try length 3. The resulting matrix equation is
The determinant of the matrix is 0 mod 2; in fact, the equation has no solution. We can see this because every column in the matrix sums to 0 mod 2, while the vector on the right does not.
Now consider length 4. The matrix equation is
The solution is . The resulting recurrence is now conjectured to be
A quick calculation shows that this generates the remaining bits of the piece of key that we already know, so it is our best guess for the recurrence that generates the key sequence.
What happens if we try length 5? The matrix equation is
The determinant of the matrix is 0 mod 2. Why? Notice that the last row is the sum of the first and second rows. This is a consequence of the recurrence relation: , etc. As in linear algebra with real or complex numbers, if one row of a matrix is a linear combination of other rows, then the determinant is 0.
Similarly, if we look at the matrix, we see that the 5th row is the sum of the first and second rows, and the 6th row is the sum of the second and third rows, so the determinant is 0 mod 2. In general, when the size of the matrix is larger than the length of the recurrence relation, the relation forces one row to be a linear combination of other rows, hence the determinant is 0 mod 2.
The general situation is as follows. To test for a recurrence of length , we assume we know . The matrix equation is
We show later that the matrix is invertible mod 2 if and only if there is no linear recurrence of length less than that is satisfied by .
A strategy for finding the coefficients of the recurrence is now clear. Suppose we know the first 100 bits of the key. For , form the matrix as before and compute its determinant. If several consecutive values of yield 0 determinants, stop. The last to yield a nonzero (i.e., 1 mod 2) determinant is probably the length of the recurrence. Solve the matrix equation to get the coefficients . It can then be checked whether the sequence that this recurrence generates matches the sequence of known bits of the key. If not, try larger values of .
Suppose we don’t know the first 100 bits, but rather some other 100 consecutive bits of the key. The same procedure applies, using these bits as the starting point. In fact, once we find the recurrence, we can also work backwards to find the bits preceding the starting point.
Here is an example. Suppose we have the following sequence of 100 bits:
The first 20 determinants, starting with , are
A reasonable guess is that gives the last nonzero determinant. When we solve the matrix equation for the coefficients we get
so we guess that the recurrence is
This recurrence generates all 100 terms of the original sequence, so we have the correct answer, at least based on the knowledge that we have.
Suppose that the 100 bits were in the middle of some sequence, and we want to know the preceding bits. For example, suppose the sequence starts with , so . Write the recurrence as
(it might appear that we made some sign errors, but recall that we are working mod 2, so and ). Letting yields
Continuing in this way, we successively determine .
For more examples, see Examples 19 and 20 in the Computer Appendices.
We now prove the result we promised.
Let be a sequence of bits produced by a linear recurrence mod 2. For each , let
Let be the length of the shortest recurrence that generates the sequence . Then and for all .
Proof. We first make a few remarks on the length of recurrences. A sequence could satisfy a length 3 relation such as . It would clearly then also satisfy shorter relations such as (at least for ). However, there are less obvious ways that a sequence could satisfy a recurrence of length less than expected. For example, consider the relation . Suppose the initial values of the sequence are . The recurrence allows us to compute subsequent terms: . It is easy to see that the sequence satisfies .
If there is a recurrence of length and if , then one row of the matrix is congruent mod 2 to a linear combination of other rows. For example, if the recurrence is , then the fourth row is the sum of the first and third rows. Therefore, for all .
Now suppose . Then there is a nonzero row vector such that . We’ll show that this gives a recurrence relation for the sequence and that the length of this relation is less than . This contradicts the assumption that is smallest. This contradiction implies that .
Let the recurrence of length be
For each , let
Then . The recurrence relation implies that
which is the last column of .
By the choice of , we have . Suppose that we know that for some . Then
Therefore, annihilates the last column of . Since the remaining columns of are columns of , we find that . By induction, we obtain for all .
Let . The first column of yields
Since is not the zero vector, for at least one . Let be the largest such that , which means that . We are working mod 2, so . Therefore, we can rearrange the relation to obtain
This is a recurrence of length . Since , and is assumed to be the shortest possible length, we have a contradiction. Therefore, the assumption that must be false, so . This completes the proof.
Finally, we make a few comments about the period of a sequence. Suppose the length of the recurrence is . Any consecutive terms of the sequence determine all future elements, and, by reversing the recurrence, all previous values, too. Clearly, if we have consecutive 0s, then all future values are 0. Also, all previous values are 0. Therefore, we exclude this case from consideration. There are strings of 0s and 1s of length in which at least one term is nonzero. Therefore, as soon as there are more than terms, some string of length must occur twice, so the sequence repeats. The period of the sequence is at most .
Associated to a recurrence , there is a polynomial
If is irreducible mod 2 (this means that it is not congruent to the product of two lower-degree polynomials), then it can be shown that the period divides . An interesting case is when is prime (these are called Mersenne primes). If the period isn’t 1, that is, if the sequence is not constant, then the period in this special case must be maximal, namely (see Section 3.11). The example where the period is is of this type.
Linear feedback shift register sequences have been studied extensively. For example, see [Golomb] or [van der Lubbe].
One way of thwarting the above attack is to use nonlinear recurrences, for example,
Moreover, a look-up table that takes inputs and outputs a bit could be used, or several LFSRs could be combined nonlinearly and some of these LFSRs could have irregular clocking. Generally, these systems are somewhat harder to break. However, we shall not discuss them here.