Chapter IV.5. Encryption Algorithms

Encryption involves scrambling information, or plaintext, and converting it into another format — ciphertext — essentially turning ordered data into seemingly random gibberish. By encrypting information, you can keep data information out of the hands of other people, which can be useful for sending coded messages for military use, sending credit card information over the Internet to online shopping Web sites, or just hiding your personal documents from the prying eyes of family members, co-workers, or strangers.

The simplest form of encryption is substitution cipher, which basically replaces each letter with a specific symbol, such as another letter. A substitution cipher simply replaces one letter with another letter from the alphabet a fixed distance away, such as replacing the letter A with the letter Z, the letter B with the letter A, the letter C with the letter B, and so on.

In this case, each letter gets replaced by the previous letter in the alphabet, like this:

I AM HOT

Replacing the letter I with the letter H, the letter A with the letter Z, and so on creates the following ciphertext:

H ZL GNS

This information may be scrambled, but after someone discovers that each letter in the ciphertext actually represents the next letter in the alphabet, this simple substitution cipher can be cracked easily. When an encryption method can be broken easily, it's weak encryption. If an encryption method can't be broken easily, it's strong encryption.

The key to deciphering the substitution cipher is recognizing both the method it's using (replacing one letter with another) and the specific way it implements that method (replacing each letter with the previous letter in the alphabet). A slightly more-complicated substitution cipher might replace each letter with the third letter from the current letter. So the letter A would be replaced by the letter D, the letter B by the letter E, and so on. In this case, the method is the same, but the implementation is slightly different while being only marginally harder to decipher.

Although substitution ciphers are easy to implement, they're also easy to break. After you know to replace a letter in the ciphertext by another letter that's shifted by a specific distance in the alphabet (such as the third letter), you can easily break the code. One way to avoid this problem is to use a onetime pad, which consists of a series of random numbers that tell how far to shift the next letter in a message. So a one-time pad might contain three random numbers, like this:

27 −3

The first number (2) tells the algorithm to shift the first letter of the text by two letters. So if the first three letters of the message are SAM, the first letter, S, would get replaced by the second letter from S in the alphabet, which is U.

The second number (7) tells the algorithm to shift the second letter by seven letters. So the letter A gets replaced by the seventh letter down, which is H. Finally, the third number (3) tells the algorithm to shift the third letter by the third letter down, so the letter M gets replaced by the letter P. Now the entire message SAM gets encrypted as the ciphertext UHP.

The one-time pad gets its name because the random series of numbers are used only once. Now it's virtually impossible for anyone to discover how the letters are substituted because the replacement letters are picked at random. The only way to decipher this ciphertext is to get a copy of the one-time pad.

Of course, the one-time pad has its drawbacks. To work, both parties need a copy of the same one-time pad. If you could transfer a copy of the one-time pad securely, you might as well transfer the message you're delivering instead. Also, one-time pads can be used only once. If they're used more than once, someone can eventually guess the random pattern of letters.

Even worse is that a one-time pad must specify how far to shift each letter in a message. If you're encrypting a message consisting of 1,000 letters, you need a one-time pad to specify how to shift all 1,000 letters. If you're encrypting a message consisting of 10,000 letters, you need a one-time pad that specifies how to shift all 10,000 letters.

Given these problems, one-time pads are generally impractical for normal use. A slight variation of the one-time pad is the use of a password. A password acts like a one-time pad; instead of defining how to alter each individual character in a message, the password determines how to scramble data. Even if you know how data is being scrambled, you won't know how to read the scrambled data without knowing the right password. Passwords are simply smaller and more convenient versions of one-time pads.

The Basics of Encryption

Encryption involves three parts:

  • The encryption algorithm

  • The implementation of the encryption algorithm

  • The length of the encryption key

The encryption algorithm defines the specific method for scrambling data. Some people try to invent their own, obscure encryption algorithms under the theory that if no one knows how the algorithm works, he or she won't know how to break the encryption. This theory is security through obscurity, and it usually fails because a single flaw can leave the encryption vulnerable, much like how locking a bank is useless if a single door is left unlocked. Because it's nearly impossible for a single person to spot all possible flaws in an encryption algorithm, most encryption algorithms are published for anyone to see.

The idea behind publishing an encryption algorithm is to let as many people as possible examine an encryption algorithm for flaws. The more people examining an encryption algorithm, the more likely any flaws will be discovered and patched before people start using the algorithm to encrypt critical information.

Two common ways to encrypt data involve substitution and permutation. Substitution involves replacing data with another chunk of data. The group of algorithms that substitutes data is typically called a substitution box or S-box. Permutation involves altering bits of data, usually represented as a binary number. The group of algorithms that performs this permutation is typically called a permutation box or P-box. Most encryption algorithms use a combination of S-boxes and P-boxes to scramble data.

After an encryption algorithm is deemed mathematically sound and secure, the second step is correctly implementing that algorithm in a particular programming language. Because there are virtually millions of different ways to accomplish the same task in any programming language, the encryption algorithm may be secure but the implementation of the encryption algorithm may not be secure.

After you have a valid encryption algorithm that's been implemented properly in a particular programming language, the final step to creating a secure encryption algorithm is the key length used to scramble the data.

In a simple substitution cipher, the key length could be considered as the value 1 because it offers only one way of replacing letters with another letter, such as shifting each letter by a fixed position in the alphabet. To encrypt a 1,000-character message, a one-time pad would need 1,000 different random numbers for shifting each letter in the message, so the key length could be considered as 1,000.

The key length is crucial because the details of an encryption algorithm are often published for anyone to examine. As a result, the security of most encryption algorithms rests solely on the key length used for the password. You don't need to create a long password of 1,000 or more characters; the encryption algorithm needs to use more bits of data to store any password whether the password consists of 1 character or 100.

Warning

The length of the password simply makes it harder for other people to guess. A 1-letter password means someone needs only 26 guesses. A 100-letter password forces someone to try all possible combinations of letters, making guessing much more difficult. The key length simply defines the amount of space used to store the password but doesn't specify the physical length of the password.

As a simple analogy, think of encryption key lengths like the physical key to your front door. A physical key consists of rods that drop down to prevent a doorknob from turning. The more rods used, the harder it is to pick the lock. The fewer rods used, the easier it is to pick the lock.

In the same way, encryption keys are used to hold passwords. The shorter the key length (measured in bits), the fewer possibilities exist and the weaker the encryption, making it more vulnerable to being broken. The longer the encryption key length, the less likely the encryption will break.

No encryption is considered unbreakable, but the goal of every encryption algorithm is to make unscrambling data so difficult that the time needed to read the encrypted message takes too long. Typically an encrypted message might take the world's fastest computer a million years to break, which effectively makes the encryption "unbreakable."

Note

At one time, a 56-bit key was considered unbreakable, but with today's computers, the smallest secure key length is 128-bits, although many people prefer using 256-bit or 512-bit keys for added security.

Stream ciphers

Encryption algorithms generally fall into two categories — stream and block ciphers. A stream cipher encrypts data one item at a time, such as individual characters in a message. A block cipher encrypts data in fixed chunks or blocks. So rather than encrypt individual characters, a block cipher might encrypt text in ten-character blocks.

Generally, stream ciphers are used when encrypting data of unknown length, such as voice messages, whereas block ciphers are used to encrypt data of fixed lengths, such as a file.

A stream cipher borrows the features of the one-time pad. Whereas a onetime pad must be as long as the message being encrypted, a stream cipher uses smaller keys of fixed lengths, such as 128-bits. Another major difference is that a one-time pad consists of truly random numbers whereas a stream cipher generates a list of random numbers based on a key (password). Computers can't generate truly random numbers, so computer-generated random numbers are often called pseudorandom numbers.

Note

A computer uses an algorithm to generate random numbers, but these aren't true random numbers because the algorithm generates the same list of random numbers over and over again. To alter the way computers generate random numbers, give the computer a value, or a seed. The computer uses this seed to generate a list of numbers; so by giving the computer different values for its seed, a computer can generate a different list of random numbers. Because this list always changes based on the seed value, any computer-generated random numbers are pseudorandom numbers.

A stream cipher uses a key to generate a list of pseudorandom numbers. Then it uses this generated list of pseudorandom numbers to encrypt each character, as shown in Figure 5-1.

How a stream cipher works.

Figure IV.5-1. How a stream cipher works.

Stream ciphers use two different methods to generate a list of pseudo-random numbers:

  • A synchronous stream cipher generates pseudorandom numbers independent of the plaintext data.

  • A self-synchronizing stream cipher generates pseudorandom numbers based on part of the plaintext.

    By creating pseudorandom numbers based on the plaintext, a stream cipher can further randomize the encryption process because no two messages are ever encrypted the exact same way.

Note

The most popular stream cipher is RC4, named after its creator, Ron Rivest. RC4 is used in the two wireless encryption standards — Wired Equivalent Privacy (WEP) and Wi-Fi Protected Access (WPA), which protects wireless Internet connections.

Block ciphers

Block ciphers encrypt data in chunks, although you can think of a stream cipher as a block cipher with each character representing a single data chunk. A typical block size is 64- or 128-bits. Because most data doesn't fit into neat 64- or 128-bit blocks, a block cipher must pad the last chunk of data with information, such as zeroes.

Electronic codebook (ECB)

After a block cipher divides plaintext into blocks, it has several different ways to encrypt that data. The simplest way to encrypt data is to encrypt each block of data separately with the same key, which is the electronic code-book method, as shown in Figure 5-2.

The electronic codebook encrypts blocks of data separately with the same key.

Figure IV.5-2. The electronic codebook encrypts blocks of data separately with the same key.

Encrypting with the electronic codebook method is simple and fast, but because it uses the same key to encrypt data, it tends to encrypt redundant data in identical chunks. So the message I am Sam. Sam I am might create two blocks of encrypted data that would look nearly identical, such as X*4d*34d^ and 34d*X*4d^. A cursory examination of these two encrypted blocks can reveal that X represents the letter I, * represents a space, 4d represents am, 3 represents S, and ^ represents a period.

Cipher-block chaining (CBC)

The ideal encryption algorithm takes identical data and scrambles it in two different ways to avoid revealing any redundant data. So the idea behind the cipher-block chaining (CBC) method is to use the encrypted output from one block as input to encrypt a second block. Because the output from one encrypted block directly affects the encryption of another block, identical plaintext data gets converted into completely different ciphertext, as shown in Figure 5-3.

Cipher-block chaining uses the output from one block as the input for encrypting a second block.

Figure IV.5-3. Cipher-block chaining uses the output from one block as the input for encrypting a second block.

Symmetric/Asymmetric Encryption Algorithms

The most common type of encryption algorithm is a symmetric algorithm, which uses the same password to encrypt and decrypt data. Basically, this means that the password that scrambles the data can also reverse the process and unscramble the data, as shown in Figure 5-4.

A single password can encrypt and decrypt a message.

Figure IV.5-4. A single password can encrypt and decrypt a message.

The biggest problem with symmetric encryption is that both parties need the same password to encrypt and decrypt data, so if you can't securely transfer the password to someone else, that person can never read the message.

A second problem with symmetric encryption is that the weakest link is the password itself. The encryption algorithm could be the strongest in the world, but if someone steals the password, that's like giving someone the key to unlock the ten-foot-thick steel doors guarding all the gold in the vault of Fort Knox.

Note

Some popular symmetric encryption algorithms include the Data Encryption Standard (DES) and the Advanced Encryption Standard (AES). DES was the original government encryption standard approved in 1976. After computers became fast enough, they could crack DES encryption; so after a five-year contest between cryptographers, the government selected a new encryption standard — AES.

Symmetric encryption is often called private-key encryption because both the sender and the receiver need an identical copy of the key to encrypt and decrypt a message. Another type of encryption algorithm is the asymmetric, or public-key encryption. Unlike symmetric encryption, asymmetric encryption requires two keys for both the sender and the receiver.

These two keys are the public key and the private key. You can make a million copies of your public key and give them out, but you want only one copy of your private key. If someone wants to send you a message, he needs to encrypt a message with your public key. After someone encrypts a message with your public key, the only way to decrypt that message is to use your private key, as shown in Figure 5-5.

Public keys encrypt data, and Private keys decrypt data.

Figure IV.5-5. Public keys encrypt data, and Private keys decrypt data.

Public key encryption is commonly used in digital signatures to verify who actually sent an encrypted message. When you encrypt a message with your private key, that message can be decrypted only with your public key. Because you're the only person with a copy of your private key, the only possible way a message can be decrypted with your public key is if it was originally encrypted with your private key. (Unless, of course, someone steals your private key. In that case, he can mimic you online.)

Public key and private key encryption is commonly used together in programs, such as Pretty Good Privacy (PGP), which are designed for sending encrypted messages. First, you encrypt your message with private key encryption. Then you use public key encryption to send the password (private key) to another person. The receiver unlocks the password using her private key and then uses this password to unlock the actual message, as shown in Figure 5-6.

The reason for using both private key and public key encryption is that public key encryption tends to run much slower than private key encryption. That's because with public key encryption, you need to encrypt data using the combination of the sender's private key with the receiver's public key. With private key encryption, you need only one key to encrypt data.

Note

Public key encryption is used in SSL (Secure Sockets Layer), which is how you can connect to a secure shopping Web site and safely transfer your credit card numbers over the Internet. The shopping Web site basically gives your computer its public key so you can encrypt your credit card number and send it over the Internet. Now the only one who can decrypt your credit card number is the shopping Web site holding the private key.

Public key and private key encryption can work together.

Figure IV.5-6. Public key and private key encryption can work together.

Cracking Encryption

Encryption works by scrambling data, but anything scrambled can always be unscrambled. What makes the difference between strong and weak encryption is how many possible ways exist to unscramble the encrypted data.

If only ten possible ways exist to scramble data, that's much easier to crack than a message that offers ten million different ways to scramble data. To unscramble data that offers ten possible ways of scrambling a message, you can just use a brute force attack.

Brute force attack

Basically, a brute force attack tries every possible combination of ways a message can be scrambled. Think of a combination lock that opens only if you align the right number. If the combination lock offers 36 numbers, you can use a brute force attack and exhaustively try all 36 numbers until you find the one that opens the lock.

Now consider a more complicated combination lock that not only displays 36 numbers but forces you to choose three different numbers in the correct order. You can still exhaustively try every possible number combination, but the time needed to do this is likely more than most people are willing to take, which effectively makes the lock secure.

That's the same idea behind encryption. Every form of encryption can eventually be cracked with a brute force attack, but the time needed to exhaustively try every possibility takes too much time. It's possible to crack even the toughest encryption algorithm with a brute force attack, but you might need a room full of million-dollar supercomputers running 24 hours a day for the next million years to eventually crack the encryption. By making the costs in resources and time too high, encryption algorithms are essentially unbreakable through a brute force attack alone.

A variation of the brute force attack is the Chinese lottery. The idea is that if you gave each person in China (with its billion+ population) a computer and assigned each computer a different range of brute force attacks on the same encrypted data, eventually one of them will crack the encryption and hence "win" the lottery.

Instead of performing a brute force attack sequentially, the Chinese lottery attack performs the same brute force attack in parallel, drastically cutting down the time needed to crack the encryption.

A second improvement of the Chinese lottery attack involves reducing the cost of resources necessary to conduct the brute force attack. A typical brute force attack requires a fast computer to exhaustively search all possible combinations. The Chinese lottery attack requires a vast network of much slower and less expensive computers because each computer needs only to exhaustively brute force attack a much smaller range of possibilities.

Although the Chinese lottery attack is mostly theoretical, it's possible for someone to write a computer worm that can spread and infect computers all over the world and conduct a brute force attack on a problem. The worm that finally cracks the problem can then send its winning ticket (the cracked message) to the original programmer of the worm.

Dictionary attacks

A brute force attack is the simplest encryption cracking method, but it's never the fastest. Because the strength of any encryption algorithm relies solely on the password used, it's often much simpler just to guess the password instead.

Most people use simple passwords that they can remember, such as PASSWORD, SEX, LOVE, 123, or names, such as their own name or the name of their favorite movie stars. Because passwords can vary in length, a simple brute force attack is impractical because not only do you need to exhaustively check all five-character passwords, but also all six-, seven-, eight-, nine-, and ten-character passwords.

So a dictionary attack simply combines a brute force attack, but rather than try all possible character combinations, it tries the most common passwords. Besides trying the previously mentioned common passwords, like LOVE and 123, a dictionary attack tries common words from Star Trek, Shakespeare, sports, and popular culture.

Because many people use a common password along with an additional character, such as PASSWORD5, a dictionary attack combines its dictionary with a brute force attack by picking a common word and trying different variations of that word, such as adding a different character at the beginning or end of the password or spelling the password backward.

Think of a dictionary attack as a smarter and faster version of a brute force attack. The odds of someone choosing a password, like S&$J#, is much less than someone choosing a password of SONYA, which is why dictionary attacks are so often successful.

Plaintext and ciphertext attacks

The easiest way to defeat any form of encryption is to steal the original plaintext message. Although this lets you read a single message, it doesn't help you read any additional messages encrypted with the same password. However, after you have the plaintext version of a message along with the encrypted version of that same message, you can deduce the password used to encrypt that message.

Comparing the plaintext version of a message with its encrypted version is a plaintext attack. Because it's rarely possible to retrieve the plaintext of an entire message, a more common code-breaking technique is to examine the ciphertext for patterns with frequency analysis.

The idea behind frequency analysis is that certain letters (such as e) or words (such as and) are more likely to appear in a message. A poor encryption algorithm encrypts the letter e and the word and with identical characters in different parts of the encrypted message. From this simple clue, it's possible to gradually deduce the encrypted symbols that represent the second-most-frequently used letters and words.

Although no form of encryption is unbreakable, the goal of every encryption algorithm is to resist all known forms of attack so as to make cracking the encryption unfeasible due to the lack of time or resources. As computers get faster and more powerful, today's encryption algorithms will only get weaker and easier to crack. By the time that occurs, mathematicians and computer scientists will have created newer and better encryption algorithms until those age and become easily broken all over again.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset