6.4 Attacks on Password Bias

If our computer user community commits to using long, random passwords, attackers won’t succeed very often at guessing passwords, either online or offline. However, attackers can probably retrieve some passwords if they focus on likely passwords. This is the dictionary attack.

Researchers discussed dictionary attacks in the 1970s and 1980s, and a few systems applied simple tricks to make such attacks more difficult. In late 1988, someone actually used a dictionary attack to crack passwords and exploit them. It was, again, the Morris worm.

Earlier in the text, we saw how the Morris worm used a flaw in the “finger” process to penetrate a vulnerable computer. Not all computers ran that version of “finger,” so that attack vector did not always succeed. If the “finger” attack failed, the worm would retrieve a computer’s password file. The worm attacked Unix systems, and at that time the Unix password file was still readable to all users. The worm would try to crack any user password it could (FIGURE 6.11). If it succeeded, it would infest the computer via that user’s login. Although no reliable numbers were collected regarding the actual achievements of the worm, observers at the time estimated that its password cracking succeeded 50 percent of the time. (See Section 2.3.)

A procedure diagram depicts steps in the dictionary attack by Morris worm.

FIGURE 6.11 Dictionary attack by the Morris worm.

Following the worm, several researchers analyzed dictionary attacks. They constructed dictionaries, calculated hash values for each word, and compared them against genuine hashed passwords. The researchers’ attacks didn’t crack every password, but they often succeeded. The rates of success varied with the researcher and dictionary, but ranged from about 20 percent to about 35 percent.

Programmers used these strategies to create “password cracking utilities.” These programs would take a hashed password file from a well-known system and try to crack the passwords. Typical programs started with a list of likely passwords, hashed them, and compared the hashed results to those in the password file. Effective programs could crack common passwords in a matter of seconds.

Many organizations recognized this as a serious risk. In response, they often passed rules to force—or at least encourage—users to select passwords that were hard to crack. With Windows 2000, Microsoft introduced tools to restrict password selection by requiring users to include a variety of uppercase and lowercase letters, digits, and punctuation characters.

The point of Microsoft’s restrictions, and of similar rules enforced by other systems, is to keep people from choosing simple words from a small collection of possibilities. The rules try to force users to choose passwords containing a broad range of randomly chosen characters, instead of a word consisting entirely of letters. The rules attempt to make passwords harder to guess in order to make dictionary attacks impractical.

Bias and Entropy

Strictly speaking, a password’s strength depends entirely on its uncertainty. When we measure the uncertainty in the value of a data item, we measure its entropy.

In physics, entropy measures the disorder in a system. In computing, it represents the amount of information you don’t know if you don’t know the value of a variable. This is also called Shannon entropy after the author of information theory. The entropy in a password reflects the number of different passwords that might exist in proportion to the likelihood of each one actually occurring. For example, if there are 817 billion possible passwords and all are equally likely, then the entropy is very large. But if everyone chooses from a very small collection of words instead, then the entropy is very small. We focus on the likelihood of particular passwords and not just on the search space.

It is hard to calculate entropy. Password selection relies on hard-to-predict personal behavior. Instead of calculating the entropy itself, we estimate password strength by estimating biases in password selection and combining it with search space calculations.

6.4.1 Biased Choices and Average Attack Space

When we calculate the search space for a particular password, we find the upper bound for the number of guesses required. This represents the attacker’s worst-case scenario when trying to guess a password. However, attackers don’t have to try all possibilities to crack some passwords. Attackers reduce their search by exploiting biases in password selection.

Average Attack Space

When attackers exploit biases in password selection, they aren’t targeting individual users. Instead, they are targeting a whole community. They are relying on the fact that some password choices will reflect well-known biases. This is a statistical attack on passwords; it tries to find any password that it can crack without focusing on a specific user’s password.

As defenders, we need to estimate the danger such attacks might pose. If the search space poses a worst-case scenario for the password attacker, we also need to estimate the work involved in the best-case scenario. The average attack space estimates the number of guesses required before success is likely. This estimates the lower bound for a successful attack.

For our purposes, we treat a likelihood of 50-50 or better as being likely to succeed. If we apply this to our earlier search space calculations, an attacker achieves the 50-50 likelihood after trying half of all possible passwords. In other words:

If the search space contains all possibilities, and they are equally likely, then the Average Attack Space is half of the search space.

Biased Password Selection

When people are biased in their password selection, they choose passwords from only part of the total possible search space. We see a similar problem if we try to keep food stocked without actually counting cartons in the kitchen. We can estimate how quickly we use up our food by measuring the average amount we eat. In practice, however, we can’t assume that we use everything at the same rate; some foods disappear much more quickly. To keep food stocked effectively, we need to replace the food most likely to be used up.

In passwords, we search most efficiently if we focus on the passwords people are most likely to choose. We can either make a list of those words or theorize about peoples’ choices, and then estimate the number of passwords most likely to be chosen. This represents a “dictionary” of the likely passwords. Our biased search space S equals the size of this dictionary.

In practice, not everyone will choose from this biased collection. To estimate the number of guesses needed to find a biased password, we adjust the search to consider the fraction of people likely to choose from that set. We represent the fraction with L. This yields the average attack space (V ).

images

Here is an example: Imagine that research at a particular site found that 1 out of 10 people chose the name of a United Nations member country as their password. There are 192 member countries. If we include the names both capitalized and uncapitalized, that yields a search space of 192 × 2 = 384 passwords.

If those are the only passwords allowed, then the average attack space is half of the search space, or 192.

Because only 1 out of 10 people choose from that search space, we must try 10 times as many passwords so that, on average, we have a better chance of guessing at least one password from that smaller group. That yields an average attack space of 1,920.

Measuring Likelihood, Not Certainty

The result V estimates the average number of trial-and-error guesses the attacker must make to guess a password. The attacker may guess the password sooner or later than that, or not at all. The average attack space simply says that statistically the attacker is likely to find a password after performing that many trials.

If we need to positively guess a password, then we need to search the entire search space of possible passwords. We can’t focus our attention on only the likely passwords. On average, the search may involve only guessing half of the possible passwords. On the other hand, it will never require searching more than the entire search space.

Making Independent Guesses

Our average attack space estimate only applies, however, if we respect a fundamental assumption: the estimate only applies to a large number of independent guesses. Each guess should randomly select a target (a victim user’s password) and a candidate (a legal, randomly chosen password). If the candidate password’s hash matches the target’s hash, then we found a pass-word. If not, then we randomly choose a different target and different candidate to test. We should choose both the target and the candidate randomly each time. This helps ensure that each trial is independent.

Example: Four-Digit Luggage Lock

FIGURE 6.12 shows a combination lock for a lock box or a storage case. There are four digits, and the owner chooses the combination. The combination can be any number between 0000 and 9999, yielding 10,000 possible combinations.

A photograph of a four-digit luggage lock which is set to the number 1225 is shown.

FIGURE 6.12 A four-digit luggage lock.

Courtesy of Dr. Richard Smith.

If we assume that all combinations are equally likely, here is the average attack space calculation:

images

If we take the attacker’s point of view, we have to try at least half of the search space (5000) before it becomes likely that we guess the right combination. If we try one number every 5 seconds or so, it will take almost 7 hours on average to guess a combination.

A smart attacker pursues low-hanging fruit. In this case, we reduce the amount of guessing if we focus on likely combinations. For example, we can easily code a date consisting of month and a day into four digits. Most people have several personal dates they remember: birthdays, anniversaries, and so on. Although we won’t necessarily know which dates are significant to particular people, we dramatically reduce our guessing if we focus on the 366 possible 4-digit dates.

In the previous calculation, we could confidently say that everyone has to choose a four-digit combination from the range of 10,000 possible values. We don’t really know how many people use a date as a four-digit combination, but we can make a pessimistic but plausible estimate. If we assume that only one out of four owners uses a date as a four-digit combination, we have the following average attack space:

images

At 5 seconds per trial, it will take slightly over an hour, on average, to open one of these combination locks.

Remember how the statistics work: We can’t focus on a single lock or a single guessed combination. We need to randomly guess a different date each time. Likewise, we must randomly try different locks from a large collection.

For example, if we are in a room with hundreds of four-digit combination locks, it will take us an hour on average to open one, and we can’t predict which one ahead of time. To apply our estimate correctly, we must try one lock and move to another every 5 seconds. Even then we might not succeed at the end of the hour; the statistic simply says that success is likely.

Moreover, the attack assumes that at least one in four owners uses a date as a combination. If all owners were vigorously indoctrinated to avoid using anything except a truly random number, and fewer than 25 percent of them ignored that advice, then the average time will be longer. If our goal is to open a specific lock and we have no reason to know how the owner chose the combination, then we can’t ensure success except by planning to try every possible combination.

6.4.2 Estimating Language-Based Password Bias

We see biases in password selection because people naturally tend to choose words as passwords. Written words carry far less entropy than random collections of letters. For example, look at the first letter in this sentence (“F”) and ask, “Which letters can possibly come next?” Spelling rules limit us to only a few choices, mostly lowercase vowels. Once we include the second letter (“Fo”), there are more possibilities, but some are much more likely than others.

If we systematically analyze typical English, we find an average entropy of perhaps three letters per character in a typical word. Thus, the practical entropy in an eight-letter English password is closer to 38 (3 × 103). Compare that to 1011, the entropy in a password made up of eight randomly chosen lowercase letters. Even when we permit longer words, we clearly lose a great deal of entropy by choosing words.

This loss of entropy opens the way for a low-hanging fruit attack. The attackers don’t bother testing for all possible passwords. Instead, they use a dictionary or other list of words. If the dictionary contains lots of commonly used words, there’s a good chance that the words will match some user passwords. This is the dictionary attack.

How dangerous can a dictionary attack be? The Morris worm used a dictionary and, according to legend, cracked about half of the passwords it examined. The legend was based on guesswork and we need better information to accurately estimate the risk.

Attacking Password Hashes

In 1990, researcher Daniel Klein completed a comprehensive study of the use of words as passwords. Klein collected over 15,000 hashed passwords collected in password files from Unix systems in the United States and the United Kingdom. Klein developed a series of password guesses and tried each one by hashing the passwords in the files. First, Klein hashed the user name to see if the user name matched the password. This recovered over 300 passwords. He also tried permutations of each user’s proper name and user name. Next, Klein collected lists of proper names, place names, literary titles, mythological references, biblical terms, character names, and so on. From that list he created permutations by substituting upper- and lowercase letters or digits for letters. This yielded a dictionary of over 3 million words to test. Klein’s dictionary search cracked 24.2% of the 15,000 hashed passwords.

To estimate the average attack space, we need two pieces of information. First, we need S, the range of possibilities from which users choose their passwords. Second, we need L, the likelihood that users choose from that particular range of possibilities. Klein’s work provides both of those values.

With the passwords shown in Table 6.4, the likelihood is 100 percent that members of the community will be required to choose passwords from a particular set: If the system only accepts single-case letters, then that’s what the passwords look like. By the same token, if it allows all 94 printable characters, some users may choose a mixture of punctuation, digits, and random letters. It is easy to calculate the average attack space for these passwords; we simply divide the search space by 2. Here is the average attack space calculation based on Klein’s experiments:

images

Klein’s work was just a starting point. Similar academic experiments at the University of Cambridge a decade later achieved a 35% success rate. Today, there is password cracking available websites and downloadable software. The simplest cracking services accept a hashed password and perform a dictionary lookup using a selected hash function. More sophisticated software and services apply a set of rules to test multiword passwords and to automatically substitute typical digits, special characters, and different cases.

In 2013, the news site Ars Technica reported on a series of password cracking tests. One of their editors, who had no prior experience with password cracking, downloaded a high-performance version of the “Hashcat” cracking tool and applied it to a list of 15,000 hashed passwords. The editor, Nate Anderson, cracked 47% of the hashes in a matter of hours. Anderson then contacted experts in password cracking and asked them to try to crack the hashes. One expert cracked over 80% of the passwords in only 1 hour. Another spent 20 hours and cracked 90% of the passwords.

Part of their success arose from improved hardware performance. High-performance cracking software uses the computer’s graphics system to do hash calculations by running numerous computations in parallel. Success also relied on low-quality hashing. Many systems still use the MD5 hash, one of the earliest listed in Table 6.3. Anderson’s list used the MD5 hash. A modern hash like SHA-512 requires at least a hundred times as much processing to check a single hash value.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset