One variation of the concept of a hash chain is the blockchain. Blockchains are a technology that has garnered a lot of attention since they provide a convenient way to keep track of information in a secure and distributed manner.
Hash chains are iterations of hash functions, while blockchains are hash chains that have extra structure. In order to understand blockchains, we need to look at several different building blocks.
Let us start with hash pointers. A hash pointer is simply a pointer to where some data is stored, combined with a cryptographic hash of the value of the data that is being pointed at. We may visualize a hash pointer as something like Figure 12.1.
The hash pointer is useful in that it gives a means to detect alterations of the block of data. If someone alters the block, then the hash contained in the hash pointer will not match the hash of the altered data block, assuming of course that no one has altered the hash pointer.
If we want to make it harder for someone to alter the hash, then we can create an ordered collection of data blocks, each with a hash pointer to a previous block. This is precisely the idea behind a blockchain. A blockchain is a collection of data blocks and hash pointers arranged in a data structure known as a linked list. Normal linked lists involve series of blocks of data that are each paired with a pointer to a previous block of data and its pointer. Blockchains, however, replace the pointer in a linked list with a hash pointer, as depicted in Figure 12.2.
Blockchains are useful because they allow entities to create a ledger of data that can be continually updated by one or more users. An initial version of the ledger is created, and then subsequent updates to the ledger reference previous updates to the ledger. In order to accomplish this, the basic structure of a blockchain consists of three main components: data, which is the ledger update; a pointer that tells where the previous block is located; and a digest (hash) that allows one to verify that the entire contents of the previous block have not changed. Suppose that a large record of blocks has been stored in a blockchain, such as in Figure 12.2, with the final hash value not paired with any data. What happens, then, if an adversary comes along and wants to modify the data in block ? If the data in block is altered, then the hash contained in block will not match the hash of the modified data in block . This forces the adversary to have to modify the hash in block . But the hash of block was calculated based on the entire block (i.e. including the -th hash), and therefore there will now be a mismatch between the hash of block and the hash pointer in block . This process forces the adversary to continue modifying blocks until the end of the blockchain is reached. This requires a significant effort on the part of the adversary, but is possible since the adversary can calculate the hash values that he needs to replace with. Ultimately, in order to prevent the adversary from succeeding, we require that the final hash value is stored in a manner that prevents the adversary from modifying it, thereby providing a final form of evidence preventing modification to the blockchain.
In practice, the data contained in each block can be quite large, consisting of many data records, and therefore one will often find another hash-based data structure used in blockchains. This data structure uses hash pointers arranged in a binary tree, and is known as a Merkle tree. Merkle trees are useful as they make it easy for someone to prove that a certain piece of data exists within a particular block of data.
In a Merkle tree, one has a collection of records of data that have been arranged as the leaves of a binary tree, such as shown in Figure 12.3. These data records are then grouped in pairs of two, and for each pair two hash pointers are created, one pointing to the left data record and another to the right data record. The collection of hash pointers then serve as the data record in the next level of the tree, which are subsequently grouped in pairs of two. Again, for each pair two hash pointers are created, one pointing to the left record and the other to the right data record. We proceed up the tree until we reach the end, which is a single record that corresponds to the tree’s root. The hash of this record is stored, giving one the ability to make certain that the entire collection of data records contained within the Merkle tree has not been altered.
Now suppose that someone wants to prove to you that a specific data record exists within a block. To do this, all that they need to show you is the data record, along with the hashes on the path from the data record to the root of the Merkle tree. In particular, one does not need to show all of the other data records, one only needs to show the hashes at the higher levels. This process is efficient, requiring roughly items from the Merkle tree to be shown, where is the number of blocks of data recorded. For example, to verify that is in Figure 12.3, someone needs to show you , , (but not ), , the two inputs and the hash at the next level up, and the two inputs and the hash at the top. You can check these hash computations, and, since the hash function is collision-resistant, you can be sure that is there and has not been changed. Otherwise, at some level, a hash value has been changed and a collision has been found.