Chapter 11 Hash Functions

11.1 Hash Functions

A basic component of many cryptographic algorithms is what is known as a hash function. When a hash function satisfies certain non-invertibility properties, it can be used to make many algorithms more efficient. In the following, we discuss the basic properties of hash functions and attacks on them. We also briefly discuss the random oracle model, which is a method of analyzing the security of algorithms that use hash functions. Later, in Chapter 13, hash functions will be used in digital signature algorithms. They also play a role in security protocols in Chapter 15, and in several other situations.

A cryptographic hash function $h$ takes as input a message of arbitrary length and produces as output a message digest of fixed length, for example, 256 bits as depicted in Figure 11.1. Certain properties should be satisfied:

1. Given a message $m$, the message digest $h(m)$ can be calculated very quickly.
2. Given a $y$, it is computationally infeasible to find an $m'$ with $h(m') = y$ (in other words, $h$ is a one-way, or preimage resistant, function). Note that if $y$ is the message digest of some message, we are not trying to find this message. We are only looking for some $m'$ with $h(m') = y$.
3. It is computationally infeasible to find distinct messages $m_{1}$ and $m_{2}$ with $h(m_{1}) = h(m_{2})$ (in this case, the function $h$ is said to be strongly collision resistant).
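As a concrete illustration of the fixed-length output and of property (1), the following minimal sketch uses SHA-256 from Python's standard `hashlib` module as a stand-in for $h$. The choice of SHA-256 and the sample messages are assumptions made for illustration only.

```python
import hashlib

def h(message: bytes) -> bytes:
    """Stand-in for h: the 256-bit SHA-256 digest of the message."""
    return hashlib.sha256(message).digest()

short = b"abc"
long_msg = b"a" * 1_000_000   # a much longer, hypothetical message

# Inputs of very different lengths give digests of the same fixed length.
print(len(h(short)) * 8)      # 256
print(len(h(long_msg)) * 8)   # 256

# Changing one character of the input gives a different digest.
print(h(b"abc").hex())
print(h(b"abd").hex())
```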

Figure 11.1 A Hash Function. A long message is mapped to a 256-bit message digest.

Note that since the set of possible messages is much larger than the set of possible message digests, there should always be many examples of distinct messages $m_{1}$ and $m_{2}$ with $h(m_{1}) = h(m_{2})$. Requirement (3) says that it should be hard to find examples. In particular, if Bob produces a message $m$ and its hash $h(m)$, Alice wants to be reasonably certain that Bob does not know another message $m'$ with $h(m') = h(m)$, even if both $m$ and $m'$ are allowed to be random strings of symbols.
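The counting argument can be made concrete with a deliberately weak toy hash. The sketch below truncates SHA-256 to 16 bits (the name `h16` and the choice of messages are hypothetical) and hashes the strings "0", "1", "2", ... until two of them share a digest. Because there are only $2^{16}$ possible digests, a collision must appear within the first $2^{16} + 1$ messages. For a real 256-bit digest the same search is expected to be hopelessly long, which is what requirement (3) asks for.

```python
import hashlib

def h16(message: bytes) -> bytes:
    """Toy hash: the first 2 bytes (16 bits) of SHA-256. Not secure."""
    return hashlib.sha256(message).digest()[:2]

def find_collision():
    """Hash b"0", b"1", b"2", ... until two digests coincide."""
    seen = {}                      # digest -> first message producing it
    i = 0
    while True:
        m = str(i).encode()
        digest = h16(m)
        if digest in seen:
            return seen[digest], m, i + 1
        seen[digest] = m
        i += 1

m1, m2, tried = find_collision()
print(m1, m2, tried)               # two distinct messages, same 16-bit digest
```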

Preimage resistance and collision resistance are closely related, but we list them separately because they are used in slightly different circumstances. The following argument shows that, for our hash functions, collision resistance implies preimage resistance: Suppose $H$ is not preimage resistant. Take a random $x$ and compute $y = H(x)$. If $H$ is not preimage resistant, we can quickly find $x'$ with $H(x') = y = H(x)$. Because $H$ is many-to-one, it is likely that $x \neq x'$, so we have a collision, contradicting the collision resistance of $H$. However, there are examples that show that for arbitrary functions, collision resistance does not imply preimage resistance. See Exercise 12.

In practice, it is sometimes sufficient to weaken (3) to require $H$ to be weakly collision resistant. This means that given $x$, it is computationally infeasible to find $x' \neq x$ with $H(x') = H(x)$. This property is also called second preimage resistance.

Requirement (3) is the hardest one to satisfy. In fact, in 2004, Wang, Feng, Lai, and Yu (see [Wang et al.]) found many examples of collisions for the popular hash functions MD4, MD5, HAVAL-128, and RIPEMD. The MD5 collisions have been used by Ondrej Mikle [Mikle] to create two different and meaningful documents with the same hash, and the paper [Lenstra et al.] shows how to produce examples of X.509 certificates (see Section 15.5) with the same MD5 hash (see also Exercise 15). This means that a valid digital signature (see Chapter 13) on one certificate is also valid for the other certificate, hence it is impossible for someone to determine which is the certificate that was legitimately signed by a Certification Authority. It has been reported that weaknesses in MD5 were part of the design of the Flame malware, which attacked several computers in the Middle East, including Iran’s oil industry, from 2010 to 2012.

In 2005, Wang, Yin, and Yu [Wang et al. 2] predicted that collisions could be found for the hash function SHA-1 with around $2^{69}$ calculations, which is much better than the expected $2^{80}$ calculations required by the birthday attack (see Section 12.1). In addition, they found collisions in a smaller 60-round version of SHA-1. These weaknesses were a cause for concern for using these hash algorithms and led to research into replacements. Finally, in 2017, a joint project between CWI Amsterdam and Google Research found collisions for SHA-1 [Stevens et al.]. Although SHA-1 is still common, it is starting to be used less and less.

One of the main uses of hash functions is in digital signatures. Since the length of a digital signature is often at least as long as the document being signed, it is much more efficient to sign the hash of a document rather than the full document. This will be discussed in Chapter 13.

Hash functions may also be employed as a check on data integrity. The question of data integrity comes up in basically two scenarios. The first is when the data (encrypted or not) are being transmitted to another person and a noisy communication channel introduces errors to the data. The second occurs when an observer rearranges the transmission in some manner before it gets to the receiver. Either way, the data have become corrupted.

For example, suppose Alice sends Bob long messages about financial transactions with Eve and encrypts them in blocks. Perhaps Eve deduces that the tenth block of each message lists the amount of money that is to be deposited to Eve’s account. She could easily substitute the tenth block from one message into another and increase the deposit.

In another situation, Alice might send Bob a message consisting of several blocks of data, but one of the blocks is lost during transmission. Bob might never realize that the block is missing.

Here is how hash functions can be used. Say we send $(m, h(m))$ over the communications channel and it is received as $(M, H)$. To check whether errors might have occurred, the recipient computes $h(M)$ and sees whether it equals $H$. If any errors occurred, it is likely that $h(M) \neq H$, because of the collision-resistance properties of $h$.
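The check just described can be sketched as follows. SHA-256 again stands in for $h$, and the function names `send` and `check`, the sample message, and the simulated one-bit error are all hypothetical choices made for illustration.

```python
import hashlib

def h(message: bytes) -> bytes:
    """Stand-in for h: the SHA-256 digest."""
    return hashlib.sha256(message).digest()

def send(m: bytes):
    """The sender transmits the pair (m, h(m))."""
    return m, h(m)

def check(M: bytes, H: bytes) -> bool:
    """The recipient recomputes h(M) and compares it with the received H."""
    return h(M) == H

m, digest = send(b"Deposit 100 dollars to account 12345")
print(check(m, digest))                   # True: nothing changed in transit

corrupted = bytearray(m)
corrupted[8] ^= 0x01                      # simulate a one-bit channel error
print(check(bytes(corrupted), digest))    # False: the error is detected
```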

Example

Let $n$ be a large integer. Let $h(m) = m \pmod{n}$ be regarded as an integer between 0 and $n - 1$. This function clearly satisfies (1). However, (2) and (3) fail: Given $y$, let $m = y$. Then $h(m) = y$. So $h$ is not one-way. Similarly, choose any two values $m_{1}$ and $m_{2}$ that are congruent mod $n$. Then $h(m_{1}) = h(m_{2})$, so $h$ is not strongly collision resistant.
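Both failures are easy to exhibit. In this small sketch the modulus `n` and the sample values are arbitrary assumptions; any large integer would serve equally well.

```python
n = 2**61 - 1          # hypothetical "large integer" modulus

def h(m: int) -> int:
    """The insecure example hash: reduce m modulo n."""
    return m % n

# (2) fails: for a digest y with 0 <= y < n, m = y is itself a preimage.
y = 123456789
m = y
print(h(m) == y)                      # True

# (3) fails: any two values congruent mod n collide.
m1 = 42
m2 = 42 + 5 * n
print(m1 != m2 and h(m1) == h(m2))    # True
```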

Example

The following example, sometimes called the discrete log hash function, is due to Chaum, van Heijst, and Pfitzmann [Chaum et al.]. It satisfies (2) and (3) but is much too slow to be used in practice. However, it demonstrates the basic idea of a hash function.

First we select a large prime number $p$ such that $q = (p - 1)/2$ is also prime (see Exercise 15 in Chapter 13). We now choose two primitive roots $\alpha_{1}$ and $\alpha_{2}$ for $p$. Since $\alpha_{1}$ is a primitive root, there exists $a$ such that $\alpha_{1}^{a} \equiv \alpha_{2} \pmod{p}$. However, we assume that $a$ is not known (finding $a$, if it is not given in advance, involves solving a discrete log problem, which we assume is hard).

The hash function $h$ will map integers mod $q^{2}$ to integers mod $p$. Therefore, the message digest usually contains approximately half as many bits as the message. This is not as drastic a reduction in size as is usually required in practice, but it suffices for our purposes.

Write $m = x_{0} + x_{1} q$ with $0 \leq x_{0}, x_{1} \leq q - 1$. Then define

$h(m) \equiv \alpha_{1}^{x_{0}} \alpha_{2}^{x_{1}} \pmod{p}.$
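A minimal sketch of this definition with toy parameters may help fix ideas. The prime $p = 2579$ (with $q = 1289$) and the rule of taking the two smallest primitive roots are assumptions chosen for illustration; with a prime this small the discrete log is trivial to compute, so the sketch shows only the mechanics, not the security. The helper `is_primitive_root` relies on the fact that, when $p - 1 = 2q$ with $q$ prime, $g$ is a primitive root exactly when neither $g^{2}$ nor $g^{q}$ is congruent to 1 mod $p$.

```python
p = 2579               # toy prime (assumption); real use needs a very large p
q = (p - 1) // 2       # 1289, which is also prime

def is_primitive_root(g: int) -> bool:
    """Since p - 1 = 2q, g has order p - 1 iff g^2 != 1 and g^q != 1 mod p."""
    return pow(g, 2, p) != 1 and pow(g, q, p) != 1

# Take the two smallest primitive roots as alpha1 and alpha2 (arbitrary choice).
roots = [g for g in range(2, p) if is_primitive_root(g)]
alpha1, alpha2 = roots[0], roots[1]

def h(m: int) -> int:
    """Discrete log hash: m = x0 + x1*q maps to alpha1^x0 * alpha2^x1 mod p."""
    if not 0 <= m < q * q:
        raise ValueError("message must satisfy 0 <= m < q^2")
    x1, x0 = divmod(m, q)
    return pow(alpha1, x0, p) * pow(alpha2, x1, p) % p

print(alpha1, alpha2)
print(h(1_000_000))    # a digest between 1 and p - 1
```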

The following shows that the function $h$ is probably strongly collision resistant.

Proposition

If we know messages $m \neq m'$ with $h(m) = h(m')$, then we can determine the discrete logarithm $a = L_{\alpha_{1}}(\alpha_{2})$.

Proof

Write $m = x_{0} + x_{1} q$ and $m' = x_{0}' + x_{1}' q$. Suppose

$\alpha_{1}^{x_{0}} \alpha_{2}^{x_{1}} \equiv \alpha_{1}^{x_{0}'} \alpha_{2}^{x_{1}'} \pmod{p}.$

Using the fact that $\alpha_{2} \equiv \alpha_{1}^{a} \pmod{p}$, we rewrite this as

$\alpha_{1}^{a (x_{1} - x_{1}') - (x_{0}' - x_{0})} \equiv 1 \pmod{p}.$

Since $\alpha_{1}$ is a primitive root mod $p$, we know that $\alpha_{1}^{k} \equiv 1 \pmod{p}$ if and only if $k \equiv 0 \pmod{p - 1}$. In our case, this means that

$a (x_{1} - x_{1}') \equiv x_{0}' - x_{0} \pmod{p - 1}.$

Let $d = \gcd(x_{1} - x_{1}', p - 1)$. There are exactly $d$ solutions to the preceding congruence (see Subsection 3.3.1), and they can be found quickly. By the choice of $p$, the only factors of $p - 1$ are $1, 2, q, p - 1$. Since $0 \leq x_{1}, x_{1}' \leq q - 1$, it follows that $-(q - 1) \leq x_{1} - x_{1}' \leq q - 1$. Therefore, if $x_{1} - x_{1}' \neq 0$, then it is a nonzero multiple of $d$ of absolute value less than $q$. This means that $d \neq q, p - 1$, so $d = 1$ or 2. Therefore, there are at most two possibilities for $a$. Calculate $\alpha_{1}^{a}$ for each possibility; only one of them will yield $\alpha_{2}$. Therefore, we obtain $a$, as desired.

On the other hand, if $x_{1} - x_{1}' = 0$, then the preceding yields $x_{0}' - x_{0} \equiv 0 \pmod{p - 1}$. Since $-(q - 1) \leq x_{0}' - x_{0} \leq q - 1$, we must have $x_{0}' = x_{0}$. Therefore, $m = m'$, contrary to our assumption.
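The proof is constructive, and its steps can be traced in a short sketch. The parameters are toy assumptions ($p = 2579$, the two smallest primitive roots), and the collision is found by exhaustive search, which is feasible only because $p$ is tiny. The function `recover_log` then solves $a(x_{1} - x_{1}') \equiv x_{0}' - x_{0} \pmod{p - 1}$ and tests the at most two candidates for $a$. The names `find_collision` and `recover_log` are hypothetical.

```python
from math import gcd

p = 2579                          # toy prime (assumption), q = 1289 is prime
q = (p - 1) // 2
roots = [g for g in range(2, p) if pow(g, 2, p) != 1 and pow(g, q, p) != 1]
alpha1, alpha2 = roots[0], roots[1]

def h(m: int) -> int:
    x1, x0 = divmod(m, q)
    return pow(alpha1, x0, p) * pow(alpha2, x1, p) % p

def find_collision():
    """Exhaustive search for m != m' with h(m) = h(m'); toy sizes only."""
    seen = {}
    for m in range(q * q):
        digest = h(m)
        if digest in seen:
            return seen[digest], m
        seen[digest] = m
    raise RuntimeError("no collision found")

def recover_log(m: int, m_prime: int) -> int:
    """Given a collision, return a with alpha1^a = alpha2 (mod p)."""
    x1, x0 = divmod(m, q)
    x1p, x0p = divmod(m_prime, q)
    n = p - 1
    A = (x1 - x1p) % n            # coefficient of a in the congruence
    B = (x0p - x0) % n            # right-hand side
    if A == 0:
        raise ValueError("x1 = x1' forces m = m', so this is not a collision")
    d = gcd(A, n)                 # the proof shows that d is 1 or 2
    n_d = n // d
    a0 = (B // d) * pow(A // d, -1, n_d) % n_d
    for k in range(d):            # the d solutions are a0 + k*(p-1)/d
        a = a0 + k * n_d
        if pow(alpha1, a, p) == alpha2:
            return a
    raise ValueError("no candidate matched")

m, m_prime = find_collision()
a = recover_log(m, m_prime)
print(m, m_prime, a)
```

The modular inverse uses the three-argument `pow` with exponent `-1`, which requires Python 3.8 or later.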

It is now easy to show that $h$ is preimage resistant. Suppose we have an algorithm $g$ that starts with a message digest $y$ and quickly finds an $m$ with $h(m) = y$. In this case, it is easy to find $m_{1} \neq m_{2}$ with $h(m_{1}) = h(m_{2})$: Choose a random $m$ and compute $y = h(m)$, then compute $m' = g(y)$. Since $h$ maps $q^{2}$ messages to $p - 1 = 2q$ message digests, there are many messages $m'$ with $h(m') = h(m)$. It is therefore not very likely that the particular $m'$ returned by $g$ equals $m$. If it does, try another random $m$. Soon, we should find a collision, that is, messages $m_{1} \neq m_{2}$ with $h(m_{1}) = h(m_{2})$. The preceding proposition shows that we can then solve a discrete log problem. Therefore, it is unlikely that such an algorithm $g$ exists.
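This reduction can be simulated with toy parameters. In the sketch below, the inverter `g` is a lookup table built by brute force, which is possible only because the assumed prime $p = 2579$ is tiny; for realistic sizes no such `g` is expected to exist. The names `g` and `collision_from_inverter` and the fixed random seed are hypothetical.

```python
import random

p = 2579                          # toy prime (assumption), q = 1289 is prime
q = (p - 1) // 2
roots = [r for r in range(2, p) if pow(r, 2, p) != 1 and pow(r, q, p) != 1]
alpha1, alpha2 = roots[0], roots[1]

def h(m: int) -> int:
    x1, x0 = divmod(m, q)
    return pow(alpha1, x0, p) * pow(alpha2, x1, p) % p

# Brute-force inverter: remember the first preimage seen for each digest.
table = {}
for candidate in range(q * q):
    table.setdefault(h(candidate), candidate)
    if len(table) == p - 1:       # every possible digest now has a preimage
        break

def g(y: int) -> int:
    """Hypothetical preimage finder: returns some m with h(m) = y."""
    return table[y]

def collision_from_inverter(rng: random.Random):
    """Pick random m, invert h(m), and retry in the rare case g returns m."""
    while True:
        m = rng.randrange(q * q)
        m_prime = g(h(m))
        if m_prime != m:
            return m, m_prime

m1, m2 = collision_from_inverter(random.Random(0))
print(m1, m2, h(m1) == h(m2))
```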

As we mentioned earlier, this hash function is good for illustrative purposes but is impractical because of its slow nature. Although it can be computed efficiently via repeated squaring, it turns out that even repeated squaring is too slow for practical applications. In applications such as electronic commerce, the extra time required to perform the multiplications in software is prohibitive.
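For reference, repeated squaring can be sketched as follows. The function `power_mod` is a hypothetical helper that also counts modular multiplications, to make visible that each evaluation costs on the order of the bit length of the exponent in multiplications of numbers as large as the modulus. The modulus $2^{127} - 1$ is used here only as a conveniently large prime and is an assumption, not a parameter from the text.

```python
def power_mod(base: int, exponent: int, modulus: int):
    """Compute base**exponent mod modulus by repeated squaring.

    Assumes exponent >= 0. Returns (result, number of modular multiplications).
    """
    result = 1 % modulus
    base %= modulus
    multiplications = 0
    while exponent > 0:
        if exponent & 1:                  # current binary digit is 1
            result = result * base % modulus
            multiplications += 1
        base = base * base % modulus      # square for the next binary digit
        multiplications += 1
        exponent >>= 1
    return result, multiplications

p = 2**127 - 1                            # a large prime, for illustration
value, count = power_mod(3, p - 2, p)
print(value == pow(3, p - 2, p))          # True: agrees with the built-in pow
print(count)                              # at most twice the exponent's bit length
```

The discrete log hash needs two such exponentiations per message block, whereas practical hash functions are built from much cheaper bit operations.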
