12.2 Multicollisions

In this section, we show that the iterative nature of hash algorithms based on the Merkle-Damgård construction makes them less resistant than expected to finding multicollisions, namely inputs x1, , xn all with the same hash value. This was pointed out by Joux [Joux], who also gave implications for properties of concatenated hash functions, which we discuss below.

Suppose there are r people and there are N possible birthdays. It can be shown that if rN(k1)/k, then there is a good chance of at least k people having the same birthday. In other words, we expect a k-collision. If the output of a hash function is random, then we expect that this estimate holds for k-collisions of hash function values. Namely, if a hash function has n-bit outputs, hence N=2n possible values, and if we calculate r=2n(k1)/k values of the hash function, we expect a k-collision. However, in the following, we’ll show that often we can obtain collisions much more easily.

In many hash functions, for example, SHA-256, there is a compression function f that operates on inputs of a fixed length. Also, there is a fixed initial value IV. The message is padded to obtain the desired format, then the following steps are performed:

  1. Split the message M into blocks M1, M2, , M.

  2. Let H0 be the initial value IV.

  3. For i=1, 2, , , let Hi=f(Hi1, Mi).

  4. Let H(M)=H.

In SHA-256, the compression function is described in Section 11.4. For each iteration, it takes a 256-bit input from the preceding iteration along with a message block Mi of length 512 and outputs a new string of length 256.

Suppose the output of the function f, and therefore also of the hash function H, has n bits. A birthday attack can find, in approximately 2n/2 steps, two blocks m0 and m0 such that f(H0, m0)=f(H0, m0). Let h1=f(H0, m0). A second birthday attack finds blocks m1 and m1 with f(h1, m1)=f(h1, m1). Continuing in this manner, we let

hi=f(hi1, mi1)

and use a birthday attack to find mi and mi with

f(hi, mi)=f(hi, mi).

This process is continued until we have t pairs of blocks m0, m0, m1, m1, , mt1, mt1, where t is some integer to be determined later.

We claim that each of the 2t messages

m0m1mt1m0m1mt1m0m1mt1m0m1mt1 m0m1mt1m0m1mt1m0m1mt1

(all possible combinations with mi and mi) has the same hash value. This is because of the iterative nature of the hash algorithm. At each calculation hi=f(m, hi1), the same value hi is obtained whether m=mi1 or m=mi1. Therefore, the output of the function f during each step of the hash algorithm is independent of whether an mi1 or an mi1 is used. Therefore, the final output of the hash algorithm is the same for all messages. We thus have a 2t-collision.

This procedure takes approximately t2n/2 steps and has an expected running time of approximately a constant times tn2n/2 (see Exercise 13). Let t=2, for example. Then it takes only around twice as long to find four messages with same hash value as it took to find two messages with the same hash. If the output of the hash function were truly random, rather than produced, for example, by an iterative algorithm, then the above procedure would not work. The expected time to find four messages with the same hash would then be approximately 23n/4, which is much longer than the time it takes to find two colliding messages. Therefore, it is easier to find multicollisions with an iterative hash algorithm.

An interesting consequence of the preceding discussion relates to attempts to improve hash functions by concatenating their outputs. Suppose we have two hash functions H1 and H2. Before [Joux] appeared, the general wisdom was that the concatenation

H(M)=H1(M)||H2(M)

should be a significantly stronger hash function than either H1 or H2 individually. This would allow people to use somewhat weak hash functions to build much stronger ones. However, it now seems that this is not the case. Suppose the output of Hi has ni bits. Also, assume that H1 is calculated by an iterative algorithm, as in the preceding discussion. No assumptions are needed for H2. We may even assume that it is a random oracle, in the sense of Section 12.3. In time approximately 12n2n12n1/2, we can find 2n2/2 messages that all have the same hash value for H1. We then compute the value of H2 for each of these 2n2/2 messages. By the birthday paradox, we expect to find a match among these values of H2. Since these messages all have the same H1 value, we have a collision for H1||H2. Therefore, in time proportional to 12n2n12n1/2+n22n2/2 (we’ll explain this estimate shortly), we expect to be able to find a collision for H1||H2. This is not much longer than the time a birthday attack takes to find a collision for the longer of H1 and H2, and is much faster than the time 2(n1+n2)/2 that a standard birthday attack would take on this concatenated hash function.

How did we get the estimate 12n2n12n1/2+n22n2/2 for the running time? We used 12n2n12n1/2 steps to get the 2n2/2 messages with the same H1 value. Each of these messages consisted of n2 blocks of a fixed length. We then evaluated H2 for each of these messages. For almost every hash function, the evaluation time is proportional to the length of the input. Therefore, the evaluation time is proportional to n2 for each of the 2n2/2 messages that are given to H2. This gives the term n22n2/2 in the estimated running time.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset