In practice, the RSA algorithm has proven to be effective, as long as it is implemented correctly. We give a few possible implementation mistakes in the Exercises. Here are a few other potential difficulties. For more about attacks on RSA, see [Boneh].
Let have digits. If we know the first , or the last , digits of , we can efficiently factor .
In other words, if and have 300 digits, and we know the first 150 digits, or the last 150 digits, of , then we can factor . Therefore, if we choose a random starting point to choose our prime , the method should be such that a large amount of is not predictable. For example, suppose we take a random 150-digit number and test numbers of the form , , for primality until we find a prime (which should happen for ). An attacker who knows that this method is used will know 147 of the last 150 digits (they will all be 0 except for the last three or four digits). Trying the method of the theorem for the various values of will eventually lead to the factorization of .
For details of the preceding result, see [Coppersmith2]. A related result is the following.
Suppose is an RSA public key and has digits. Let be the decryption exponent. If we have at least the last digits of , we can efficiently find in time that is linear in .
This means that the time to find is bounded as a function linear in . If is small, it is therefore quite fast to find when we know a large part of . If is large, perhaps around , the theorem is no better than a case-by-case search for . For details, see [Boneh et al.].
Low encryption or decryption exponents are tempting because they speed up encryption or decryption. However, there are certain dangers that must be avoided. One pitfall of using is given in Computer Problem 14. Another difficulty is discussed in Chapter 23 (Lattice Methods). These problems can be avoided by using a somewhat higher exponent. One popular choice is . This is prime, so it is likely that it is relatively prime to . Since it is one more than a power of 2, exponentiation to this power can be done quickly: To calculate , square sixteen times, then multiply the result by .
The decryption exponent should of course be chosen large enough that brute force will not find it. However, even more care is needed, as the following result shows. One way to obtain desired properties of is to choose first, then find with .
Suppose Bob wants to be able to decrypt messages quickly, so he chooses a small value of . The following theorem of M. Wiener [Wiener] shows that often Eve can then find easily. In practice, if the inequalities in the hypotheses of the proposition are weakened, then Eve can still use the method to obtain in many cases. Therefore, it is recommended that be chosen fairly large.
Suppose are primes with . Let and let satisfy . If , then can be calculated quickly (that is, in time polynomial in ).
Proof. Since , we have . Therefore, since ,
Write for some integer . Since , we have
so . Therefore,
Also, since , we have . Dividing by yields
since by assumption.
We now need a result about continued fractions. Recall from Section 3.12 that if is a positive real number and and are positive integers with
then arises from the continued fraction expansion of . Therefore, in our case, arises from the continued fraction expansion of . Therefore, Eve does the following:
Computes the continued fraction of . After each step, she obtains a fraction .
Eve uses and to compute . (Since , this value of is a candidate for .)
If is not an integer, she proceeds to the next step of the continued fraction.
If is an integer, then she finds the roots of . (Note that this is possibly the equation from earlier.) If and are integers, then Eve has factored . If not, then Eve proceeds to the next step of the continued fraction algorithm.
Since the number of steps in the continued fraction expansion of is at most a constant times , and since the continued fraction algorithm stops when the fraction is reached, the algorithm terminates quickly. Therefore, Eve finds the factorization of quickly.
Recall that the rational approximations to a number arising from the continued fraction algorithm are alternately larger than and smaller than . Since , we only need to consider every second fraction arising from the continued fraction.
What happens if Eve reaches without finding the factorization of ? This means that the hypotheses of the proposition are not satisfied. However, it is possible that sometimes the method will yield the factorization of even when the hypotheses fail.
Let and . The continued fraction of is
The first fraction is , so we try . Since must be odd, we discard this possibility.
By the remark, we may jump to the third fraction:
Again, we discard this since must be odd.
The fifth fraction is . This gives , which is not an integer.
The seventh fraction is This gives as the candidate for . The roots of
are and , to several decimal places of accuracy. Since
we have factored .
A common use of RSA is to transmit keys for use in DES, AES, or other symmetric cryptosystems. However, a naive implementation could lead to a loss of security. Suppose a 56-bit DES key is written as a number . This is encrypted with RSA to obtain . Although is small, the ciphertext is probably a number of the same size as , so perhaps around 200 digits. However, Eve attacks the system as follows. She makes two lists:
for all with .
for all with .
She looks for a match between an element on the first list and an element on the second list. If she finds one, then she has for some . This yields
so . Is this attack likely to succeed? Suppose is the product of two integers and , both less than . Then these will yield a match for Eve. Not every will have this property, but many values of are the product of two integers less than . For these, Eve will obtain .
This attack is much more efficient than trying all possibilities for , which is nearly impossible on one computer, and would take a long time even with several computers working in parallel. In the present attack, Eve needs to compute and store a list of length , then compute the elements on the other list and check each one against the first list. Therefore, Eve performs approximately computations (and compares with the list up to times). This is easily possible on a single computer. For more on this attack, see [Boneh-Joux-Nguyen].
It is easy to prevent this attack. Instead of using a small value of , adjoin some random digits to the beginning and end of so as to form a much longer plaintext. When Bob decrypts the ciphertext, he simply removes these random digits and obtains .
A more sophisticated method of preprocessing the plaintext, namely Optimal Asymmetric Encryption Padding (OAEP), was introduced by Bellare and Rogaway [Bellare-Rogaway2] in 1994. Suppose Alice wants to send a message to Bob, whose RSA public key is , where has bits. Two positive integers and are specified in advance, with . Alice’s message is allowed to have bits. Typical values are , , . Let be a function that inputs strings of bits and outputs strings of bits. Let be a function that inputs bits and outputs bits. The functions and are usually constructed from hash functions (see Chapter 11 for a discussion of hash functions). To encrypt , Alice first expands it to length by adjoining zero bits. The result is denoted . She then chooses a random string of bits and computes
If the concatenation is a binary number larger than , Alice chooses a new random number and computes new values for and . As soon as she obtains (this has a probability of at least 1/2 of happening for each , as long as produces fairly random outputs), she forms the ciphertext
To decrypt a ciphertext , Bob uses his private RSA decryption exponent to compute . The result is written in the form
where has bits and has bits. Bob then computes
The correctness of this decryption can be justified as follows. If the ciphertext is the encryption of , then
Therefore,
and
Bob removes the zero bits from the end of and obtains . Also, Bob has check on the integrity of the ciphertext. If there are not zeros at the end, then the ciphertext does not correspond to a valid encryption.
This method is sometimes called plaintext-aware encryption. Note that the padding with depends on the message and on the random parameter . This makes chosen ciphertext attacks on the system more difficult. It also is used for ciphertext indistinguishability. See Section 4.5.
For discussion of the security of OAEP, see [Shoup].
Another type of attack on RSA and similar systems was discovered by Paul Kocher in 1995, while he was an undergraduate at Stanford. He showed that it is possible to discover the decryption exponent by carefully timing the computation times for a series of decryptions. Though there are ways to thwart the attack, this development was unsettling. There had been a general feeling of security since the mathematics was well understood. Kocher’s attack demonstrated that a system could still have unexpected weaknesses.
Here is how the timing attack works. Suppose Eve is able to observe Bob decrypt several ciphertexts . She times how long this takes for each . Knowing each and the time required for it to be decrypted will allow her to find the decryption exponent . But first, how could Eve obtain such information? There are several situations where encrypted messages are sent to Bob and his computer automatically decrypts and responds. Measuring the response times suffices for the present purposes.
We need to assume that we know the hardware being used to calculate . We can use this information to calculate the computation times for various steps that potentially occur in the process.
Let’s assume that is computed by an algorithm given in Exercise 56 in Chapter 3, which is as follows:
Let be written in binary (for example, when , we have ). Let and be integers. Perform the following procedure:
Start with and .
If , let . If , let .
Let .
If , stop. If , add 1 to and go to (2).
Then .
Note that the multiplication occurs only when the bit . In many situations, there is a reasonably large variation in how long this multiplication takes. We assume this is the case here.
Before we continue, we need a few facts from probability. Suppose we have a random process that produces real numbers as outputs. For us, will be the time it takes for the computer to complete a calculation, given a random input . The mean is the average value of these outputs. If we record outputs , the mean should be approximately . The variance for the random process is approximated by
The standard deviation is the square root of the variance and gives a measure of how much variation there is in the values of the ’s.
The important fact we need is that when two random processes are independent, the variance for the sum of their outputs is the sum of the variances of the two processes. For example, we will break the computation done by the computer into two independent processes, which will take times and . The total time will be . Therefore, should be approximately .
Now assume Eve knows ciphertexts and the times that it took to compute each . Suppose she knows bits of the exponent . Since she knows the hardware being used, she knows how much time was used in calculating in the preceding algorithm. Therefore, she knows, for each , the time that it takes to compute .
Eve wants to determine . If , a multiplication will take place for each ciphertext that is processed. If , there is no such multiplication.
Let be the amount of time it takes the computer to perform the multiplication , though Eve does not yet know whether this multiplication actually occurs. Let . Eve computes and . If , then Eve concludes that . If not, . After determining , she proceeds in the same manner to find all the bits.
Why does this work? If the multiplication occurs, is the amount of time it takes the computer to complete the calculation after the multiplication. It is reasonable to assume and are outputs that are independent of each other. Therefore,
If the multiplication does not occur, is the amount of time for an operation unrelated to the computation, so it is reasonable to assume and are independent. Therefore,
Note that we couldn’t use the mean in place of the variance, since the mean of would be negative, so the last inequality would not hold. All that can be deduced from the mean is the total number of nonzero bits in the binary expansion of .
The preceding gives a fairly simple version of the method. In practice, various modifications would be needed, depending on the specific situation. But the general strategy remains the same. For more details, see [Kocher]. For more on timing attacks, see [Crosby et al.].
A similar attack on RSA works by measuring the power consumed during the computations. See [Kocher et al.]. Another method, called acoustic cryptanalysis, obtains information from the high-pitched noises emitted by the electronic components of a computer during its computations. See [Genkin et al.]. Attacks such as these and the timing attack can be prevented by appropriate design features in the physical implementation.
Timing attacks, power analysis, and acoustic cryptanalysis are examples of side-channel attacks, where the attack is on the implementation rather than on the basic cryptographic algorithm.