
12.2 Multicollisions

In this section, we show that the iterative nature of hash algorithms based on the Merkle-Damgård construction makes them less resistant than expected to multicollisions, namely inputs $x_{1}, \dots, x_{n}$ that all have the same hash value. This was pointed out by Joux [Joux], who also derived consequences for concatenated hash functions, which we discuss below.

Suppose there are $r$ people and there are $N$ possible birthdays. It can be shown that if $r \approx N^{(k-1)/k}$, then there is a good chance that at least $k$ people have the same birthday. In other words, we expect a $k$-collision. If the output of a hash function is random, then we expect this estimate to hold for $k$-collisions of hash values. Namely, if a hash function has $n$-bit outputs, hence $N = 2^{n}$ possible values, and we calculate $r = 2^{n(k-1)/k}$ values of the hash function, we expect a $k$-collision. However, as we show below, we can often obtain multicollisions much more easily.
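The estimate $r \approx N^{(k-1)/k}$ can be checked on a small scale. The sketch below is only an illustration under assumptions: it models a random $n$-bit hash by the top $n$ bits of SHA-256 (the helper names `small_hash` and `find_k_collision` are hypothetical) and hashes distinct inputs until some output value has occurred $k$ times.

```python
import hashlib
from collections import defaultdict


def small_hash(data: bytes, n_bits: int) -> int:
    """Toy n-bit hash (assumption): the top n_bits of SHA-256, standing in for a random function."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - n_bits)


def find_k_collision(n_bits: int, k: int) -> tuple[int, list[bytes]]:
    """Hash the distinct inputs 0, 1, 2, ... until some output value has been produced k times.

    Returns the number of hash evaluations used and the k inputs sharing that output.
    """
    buckets: dict[int, list[bytes]] = defaultdict(list)  # output value -> inputs that gave it
    evaluations = 0
    while True:
        x = evaluations.to_bytes(8, "big")
        evaluations += 1
        bucket = buckets[small_hash(x, n_bits)]
        bucket.append(x)
        if len(bucket) == k:
            return evaluations, bucket


if __name__ == "__main__":
    n = 20
    for k in (2, 3, 4):
        used, _ = find_k_collision(n, k)
        estimate = 2 ** (n * (k - 1) / k)
        print(f"k = {k}: {used} evaluations (estimate 2^(n(k-1)/k) = {estimate:.0f})")
```

The measured counts should be of the same order as the estimate; the formula ignores constant factors, so exact agreement is not expected.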

In many hash functions, for example SHA-256, there is a compression function $f$ that operates on inputs of a fixed length, together with a fixed initial value $IV$. The message is padded to obtain the required format, and then the following steps are performed:

Split the message $M$ into blocks $M_{1}, M_{2}, \dots, M_{\ell}$.
Let $H_{0}$ be the initial value $IV$.
For $i = 1, 2, \dots, \ell$, let $H_{i} = f(H_{i-1}, M_{i})$.
Let $H(M) = H_{\ell}$.
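The four steps above translate directly into a loop. In this sketch the compression function is a hypothetical stand-in (`toy_compress`: SHA-256 of the chaining value followed by the block, truncated to the chaining-value length), and padding is assumed to have been done already, so `blocks` holds $M_{1}, \dots, M_{\ell}$.

```python
import hashlib
from typing import Callable, Sequence

# A compression function maps (chaining value H_{i-1}, block M_i) to H_i.
Compression = Callable[[bytes, bytes], bytes]


def toy_compress(h: bytes, m: bytes) -> bytes:
    """Hypothetical compression function for illustration only:
    SHA-256(h || m) truncated to the length of the chaining value h."""
    return hashlib.sha256(h + m).digest()[: len(h)]


def iterated_hash(blocks: Sequence[bytes], iv: bytes, f: Compression = toy_compress) -> bytes:
    """Compute H(M) from the already-padded blocks M_1, ..., M_l."""
    h = iv  # H_0 = IV
    for block in blocks:  # H_i = f(H_{i-1}, M_i) for i = 1, ..., l
        h = f(h, block)
    return h  # H(M) = H_l


if __name__ == "__main__":
    iv = bytes(4)  # hypothetical 32-bit initial value (all zeros)
    print(iterated_hash([b"block-01", b"block-02", b"block-03"], iv).hex())
```

Passing `f` as a parameter keeps the iteration separate from the choice of compression function, which is exactly the structure the multicollision attack below exploits.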

The compression function of SHA-256 is described in Section 11.4. At each iteration, it takes the 256-bit output of the preceding iteration together with a 512-bit message block $M_{i}$ and outputs a new 256-bit string.
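To see where the 512-bit blocks $M_{i}$ come from, here is a sketch of the SHA-256 padding rule from FIPS 180-4, restricted to messages that are a whole number of bytes: append a 1 bit, then 0 bits, then the original length in bits as a 64-bit big-endian integer, so that the total length is a multiple of 512 bits. The function name `sha256_pad` is our own.

```python
def sha256_pad(message: bytes) -> list[bytes]:
    """Pad a byte-aligned message as SHA-256 does and split it into 64-byte (512-bit) blocks."""
    bit_length = 8 * len(message)
    padded = message + b"\x80"  # a single 1 bit followed by seven 0 bits
    # Zero bytes until the length is 56 mod 64, leaving exactly 8 bytes for the length field.
    padded += b"\x00" * ((56 - len(padded)) % 64)
    padded += bit_length.to_bytes(8, "big")  # original length in bits, 64-bit big-endian
    return [padded[i : i + 64] for i in range(0, len(padded), 64)]


if __name__ == "__main__":
    for size in (0, 3, 55, 56, 64):
        print(size, "bytes ->", len(sha256_pad(b"x" * size)), "block(s)")
```

Because the length is encoded in the last block, two messages of the same length always receive identical padding.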

Suppose the output of the function $f$, and therefore also of the hash function $H$, has $n$ bits. A birthday attack can find, in approximately $2^{n/2}$ steps, two blocks $m_{0}$ and $m_{0}'$ such that $f(H_{0}, m_{0}) = f(H_{0}, m_{0}')$. Let $h_{1} = f(H_{0}, m_{0})$. A second birthday attack finds blocks $m_{1}$ and $m_{1}'$ with $f(h_{1}, m_{1}) = f(h_{1}, m_{1}')$. Continuing in this manner, with $h_{0} = H_{0}$, we let

$h_{i} = f(h_{i-1}, m_{i-1})$

and use a birthday attack to find $m_{i}$ and $m_{i}'$ with

$f(h_{i}, m_{i}) = f(h_{i}, m_{i}').$

This process is continued until we have $t$ pairs of blocks $m_{0}, m_{0}', m_{1}, m_{1}', \dots, m_{t-1}, m_{t-1}'$, where $t$ is some integer to be determined later.
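Each step of this process is an ordinary birthday attack on $f$ with its first argument held fixed. A minimal sketch, assuming a toy 24-bit compression function (`toy_f`, truncated SHA-256, hypothetical) so that the roughly $2^{12}$ evaluations finish instantly: record each output $f(h, m)$ in a table until a value repeats.

```python
import hashlib

N_BYTES = 3  # toy output length: n = 24 bits (an assumption chosen for speed)


def toy_f(h: bytes, m: bytes) -> bytes:
    """Hypothetical n-bit compression function: SHA-256(h || m) truncated to N_BYTES bytes."""
    return hashlib.sha256(h + m).digest()[:N_BYTES]


def birthday_step(h: bytes, block_len: int = 8) -> tuple[bytes, bytes, int]:
    """Find distinct blocks m, m' with f(h, m) = f(h, m').

    Returns (m, m', number of evaluations of f); the count is typically on the order of 2^(n/2).
    """
    seen: dict[bytes, bytes] = {}  # output of f(h, .) -> block that produced it
    evaluations = 0
    while True:
        m = evaluations.to_bytes(block_len, "big")  # distinct candidate blocks
        evaluations += 1
        out = toy_f(h, m)
        if out in seen:
            return seen[out], m, evaluations
        seen[out] = m


if __name__ == "__main__":
    H0 = bytes(N_BYTES)  # hypothetical all-zero initial value
    m0, m0_prime, cost0 = birthday_step(H0)
    h1 = toy_f(H0, m0)  # equals toy_f(H0, m0_prime)
    m1, m1_prime, cost1 = birthday_step(h1)
    print(cost0, cost1, "evaluations; 2^(n/2) =", 2 ** (8 * N_BYTES // 2))
```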

We claim that each of the $2^{t}$ messages

$\begin{matrix} m_{0} \| m_{1} \| \cdots \| m_{t-1} \\ m_{0}' \| m_{1} \| \cdots \| m_{t-1} \\ m_{0} \| m_{1}' \| \cdots \| m_{t-1} \\ m_{0}' \| m_{1}' \| \cdots \| m_{t-1} \\ \vdots \\ m_{0}' \| m_{1} \| \cdots \| m_{t-1}' \\ m_{0} \| m_{1}' \| \cdots \| m_{t-1}' \\ m_{0}' \| m_{1}' \| \cdots \| m_{t-1}' \end{matrix}$

(all possible combinations with $m_{i}$ and $m_{i}'$) has the same hash value. This is because of the iterative nature of the hash algorithm. At each calculation $h_{i} = f(h_{i-1}, m)$, the same value $h_{i}$ is obtained whether $m = m_{i-1}$ or $m = m_{i-1}'$. Starting from $h_{0} = H_{0}$, it follows step by step that every intermediate value $h_{i}$ is independent of whether $m_{i-1}$ or $m_{i-1}'$ (or any earlier choice) was used. Therefore, the final output of the hash algorithm is the same for all the messages; since they all consist of $t$ blocks, any length-dependent padding appended to them is identical as well. We thus have a $2^{t}$-collision.
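The whole construction, together with a check of the claim, fits in a short script. As before, this is a sketch under assumptions: the compression function `f` is a hypothetical 24-bit truncation of SHA-256, the helper names are ours, and padding is omitted because all $2^{t}$ messages have the same number of blocks. `itertools.product` enumerates every choice of $m_{i}$ or $m_{i}'$ at each position.

```python
import hashlib
import itertools

N_BYTES = 3  # toy chaining value / output size: n = 24 bits (assumption, chosen for speed)


def f(h: bytes, m: bytes) -> bytes:
    """Hypothetical compression function: SHA-256(h || m) truncated to N_BYTES bytes."""
    return hashlib.sha256(h + m).digest()[:N_BYTES]


def iterated_hash(blocks: list[bytes], iv: bytes) -> bytes:
    """h_0 = IV and h_i = f(h_{i-1}, block); returns the final chaining value."""
    h = iv
    for m in blocks:
        h = f(h, m)
    return h


def block_collision(h: bytes) -> tuple[bytes, bytes, bytes]:
    """Birthday attack on f(h, .): return (m, m', f(h, m)) with m != m' and f(h, m) = f(h, m')."""
    seen: dict[bytes, bytes] = {}
    counter = 0
    while True:
        m = counter.to_bytes(8, "big")
        counter += 1
        out = f(h, m)
        if out in seen:
            return seen[out], m, out
        seen[out] = m


def joux_multicollision(iv: bytes, t: int) -> list[list[bytes]]:
    """Run t birthday attacks in sequence; return all 2^t messages built from the pairs."""
    pairs = []
    h = iv  # h_0 = H_0
    for _ in range(t):
        m, m_prime, h = block_collision(h)  # h now holds the next chaining value
        pairs.append((m, m_prime))
    # Every combination picks m_i or m_i' at position i.
    return [list(choice) for choice in itertools.product(*pairs)]


if __name__ == "__main__":
    iv = bytes(N_BYTES)
    messages = joux_multicollision(iv, t=4)
    digests = {iterated_hash(msg, iv) for msg in messages}
    print(len(messages), "messages,", len(digests), "distinct hash value(s)")
```

With `t=4` the script should report 16 distinct messages sharing a single hash value, at the cost of only four birthday attacks.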

This procedure takes approximately $t\,2^{n/2}$ steps and has an expected running time of approximately a constant times $t\,n\,2^{n/2}$ (see Exercise 13). For example, let $t = 2$. Then finding four messages with the same hash value takes only about twice as long as finding two messages with the same hash value. If the output of the hash function were truly random, rather than produced by an iterative algorithm, the above procedure would not work. The expected time to find four messages with the same hash would then be approximately $2^{3n/4}$, which is much longer than the time needed to find two colliding messages. Therefore, multicollisions are much easier to find for an iterative hash algorithm.
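To put numbers on this comparison, take $n = 256$, the output length of SHA-256, and ignore constant factors and the extra factor of $n$ in the running times:

```latex
\begin{aligned}
\text{two colliding messages (one birthday attack):}\quad & 2^{n/2} = 2^{128},\\
\text{four messages, iterative hash } (t = 2):\quad & t\,2^{n/2} = 2 \cdot 2^{128} = 2^{129},\\
\text{four messages, random hash } (k = 4):\quad & 2^{n(k-1)/k} = 2^{3n/4} = 2^{192}.
\end{aligned}
```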

An interesting consequence of the preceding discussion concerns attempts to improve hash functions by concatenating their outputs. Suppose we have two hash functions $H_{1}$ and $H_{2}$. Before [Joux] appeared, the general wisdom was that the concatenation

$H(M) = H_{1}(M) \| H_{2}(M)$

should be a significantly stronger hash function than either $H_{1}$ or $H_{2}$ individually. This would allow people to build much stronger hash functions out of somewhat weak ones. However, it now seems that this is not the case. Suppose the output of $H_{i}$ has $n_{i}$ bits. Also, assume that $H_{1}$ is calculated by an iterative algorithm, as in the preceding discussion. No assumptions are needed for $H_{2}$; we may even assume that it is a random oracle, in the sense of Section 12.3. Taking $t = n_{2}/2$ in the construction above, in time approximately $\frac{1}{2} n_{2} n_{1} 2^{n_{1}/2}$ we can find $2^{n_{2}/2}$ messages that all have the same hash value for $H_{1}$. We then compute the value of $H_{2}$ for each of these $2^{n_{2}/2}$ messages. By the birthday paradox, we expect to find a match among these values of $H_{2}$. Since these messages all have the same $H_{1}$ value, we have a collision for $H_{1} \| H_{2}$. Therefore, in time proportional to $\frac{1}{2} n_{2} n_{1} 2^{n_{1}/2} + n_{2} 2^{n_{2}/2}$ (we'll explain this estimate shortly), we expect to be able to find a collision for $H_{1} \| H_{2}$. This is not much longer than the time a birthday attack takes to find a collision for the longer of $H_{1}$ and $H_{2}$, and it is much shorter than the time $2^{(n_{1}+n_{2})/2}$ that a standard birthday attack would take on the concatenated hash function.
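The attack on $H_{1} \| H_{2}$ can be sketched at toy scale. Everything here is an assumption for illustration: $H_{1}$ iterates a hypothetical 16-bit compression function `f1` (truncated SHA-256), and $H_{2}$ is a 16-bit truncation of BLAKE2b that the attack only calls as a black box, matching the text's point that nothing is assumed about $H_{2}$. With exactly $2^{n_{2}/2}$ messages the birthday paradox gives only a constant chance (roughly 40 percent) of an $H_{2}$ match, so the sketch builds three extra block pairs, that is, eight times as many messages, which makes a match all but certain.

```python
import hashlib
import itertools

N1_BYTES = 2  # toy H_1 output: n_1 = 16 bits (assumption, for speed)
N2_BYTES = 2  # toy H_2 output: n_2 = 16 bits (assumption, for speed)
BLOCK_LEN = 8  # fixed block length in bytes (assumption)


def f1(h: bytes, m: bytes) -> bytes:
    """Hypothetical compression function of the iterative hash H_1."""
    return hashlib.sha256(h + m).digest()[:N1_BYTES]


def H1(blocks: list[bytes], iv: bytes = bytes(N1_BYTES)) -> bytes:
    """Iterative hash: h_0 = IV, h_i = f1(h_{i-1}, block)."""
    h = iv
    for m in blocks:
        h = f1(h, m)
    return h


def H2(blocks: list[bytes]) -> bytes:
    """Stand-in for an arbitrary second hash; the attack treats it as a black box."""
    return hashlib.blake2b(b"".join(blocks), digest_size=N2_BYTES).digest()


def block_collision(h: bytes) -> tuple[bytes, bytes, bytes]:
    """Birthday attack on f1(h, .): return (m, m', f1(h, m)) with m != m'."""
    seen: dict[bytes, bytes] = {}
    counter = 0
    while True:
        m = counter.to_bytes(BLOCK_LEN, "big")
        counter += 1
        out = f1(h, m)
        if out in seen:
            return seen[out], m, out
        seen[out] = m


def concat_collision(extra_pairs: int = 3):
    """Return two distinct messages (lists of blocks) with equal H_1 and equal H_2, or None."""
    t = (8 * N2_BYTES) // 2 + extra_pairs
    # Step 1: a 2^t-multicollision for H_1 (Joux's construction).
    pairs, h = [], bytes(N1_BYTES)
    for _ in range(t):
        m, m_prime, h = block_collision(h)
        pairs.append((m, m_prime))
    # Step 2: birthday search for an H_2 match among those messages.
    seen: dict[bytes, list[bytes]] = {}
    for choice in itertools.product(*pairs):
        blocks = list(choice)
        key = H2(blocks)
        if key in seen:
            return seen[key], blocks
        seen[key] = blocks
    return None  # no H_2 match found; rerun with a larger extra_pairs


if __name__ == "__main__":
    result = concat_collision()
    if result is not None:
        a, b = result
        print("H1:", H1(a).hex(), H1(b).hex(), " H2:", H2(a).hex(), H2(b).hex())
```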

How did we get the estimate $\frac{1}{2} n_{2} n_{1} 2^{n_{1}/2} + n_{2} 2^{n_{2}/2}$ for the running time? We used $\frac{1}{2} n_{2} n_{1} 2^{n_{1}/2}$ steps to get the $2^{n_{2}/2}$ messages with the same $H_{1}$ value; this is the multicollision cost $t\,n_{1}\,2^{n_{1}/2}$ with $t = n_{2}/2$. Each of these messages consists of $t = n_{2}/2$ blocks of a fixed length. We then evaluated $H_{2}$ on each of these messages. For almost every hash function, the evaluation time is proportional to the length of the input. Therefore, the evaluation time is proportional to $n_{2}$ for each of the $2^{n_{2}/2}$ messages given to $H_{2}$. This gives the term $n_{2} 2^{n_{2}/2}$ in the estimated running time.
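Plugging numbers into the estimate shows how large the gap is. The sketch below (function names hypothetical) works with base-2 logarithms so that realistic sizes stay readable; like the estimate in the text, it ignores constant factors.

```python
import math


def log2_joux_cost(n1: int, n2: int) -> float:
    """log2 of (1/2) n2 n1 2^(n1/2) + n2 2^(n2/2), the running-time estimate from the text."""
    multicollision = 0.5 * n2 * n1 * 2 ** (n1 / 2)  # 2^(n2/2) messages with equal H_1
    h2_evaluations = n2 * 2 ** (n2 / 2)  # H_2 on each message, cost proportional to n2
    return math.log2(multicollision + h2_evaluations)


def log2_generic_cost(n1: int, n2: int) -> float:
    """log2 of 2^((n1 + n2)/2), a standard birthday attack on the (n1 + n2)-bit output."""
    return (n1 + n2) / 2


if __name__ == "__main__":
    for n1, n2 in [(128, 128), (160, 256), (256, 256)]:
        print(n1, n2, round(log2_joux_cost(n1, n2), 2), log2_generic_cost(n1, n2))
```

For $n_{1} = n_{2} = 128$, the estimate is $\frac{1}{2} \cdot 128 \cdot 128 \cdot 2^{64} + 128 \cdot 2^{64} = 8320 \cdot 2^{64} \approx 2^{77}$ steps, compared with $2^{128}$ for a standard birthday attack on the 256-bit concatenation.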
