Appendix II: Information Theory

A methodological tool that played a key role in the rise of the human information-processing approach is information theory, developed by Claude Shannon (1948), which Gleick (2011) characterizes as one of the most important technological developments of the mid-20th century:

Shannon’s theory made a bridge between information and uncertainty; between information and entropy and between information and chaos. It led to compact discs and fax machines, computers, and cyberspace …  and all the world’s Silicon Alleys. Information processing was born, along with information storage and information retrieval.

Information theory was developed by communication engineers to quantify the flow of information through communication channels such as telephone lines and computer systems. In the 1950s, contemporaneous with the development of signal-detection theory, psychologists began to apply the concepts of information theory to human performance (Fitts & Posner, 1967; Garner, 1962).

Information theory does not play the prominent role in human factors today that it once did, but it is still useful in many circumstances. As one example, Kang and Seong (2001) used an information-theoretic approach to quantify the perceived complexity of control room interfaces for nuclear power plants and to estimate the extent to which the interface would overload the operator’s capacity for processing information. As another, Strange et al. (2005) used information theory to perform a quantitative analysis showing that activity in a part of the brain involved in learning and memory, the hippocampus, is a function of event uncertainty. More generally, Castro (2009) stated with respect to human factors in driving,

We believe a description of the road environment in information theory terms— a theory in decline over the past decades— is enriching and useful in order to understand more about driver limitations. It enables us to quantify and assess the usefulness of traffic elements with regard to the amount of information they can transmit, so that we can estimate the guarantees of such information being received. (p. 7)

Information theory is not a scientific theory. It is a system of measurement for quantifying information, as implied in the last sentence of Castro’s (2009) quote. The amount of information conveyed by the occurrence of an event (a stimulus, response, or the like) is a function of the number of possible events and their probabilities of occurring. If an event is sure to occur, then its occurrence conveys no information. For example, if I know that my car’s engine is not working, then I gain no information by turning the key in the ignition and observing that the car will not start. On the other hand, if I am uncertain about whether the engine is working, say, on a cold winter morning, then I gain information from the same event. The uncertainty of the event is the amount of information that we gain by observing it.

The general idea behind information theory is that the most efficient way to uniquely identify one of a set of events is to ask a series of binary questions. For example, if I told you that I was thinking of a number between 1 and 16, and you were to identify that number by asking me questions, you could proceed in several ways. One way would be to guess each of the numbers at random until I said yes. Though you occasionally might guess the number on the first try, on average it would take about eight questions to determine the correct number.

It would make more sense to systematically restrict the number of possibilities by asking yes-no questions. There are many ways that you could do this, but the most efficient would be to ask the questions in such a way that each reduced the number of possible alternatives by half. For identifying one of 16 numbers, your first question might be, “Is it between 1 and 8?” If my answer is yes, your next question should be, “Is it between 1 and 4?” Proceeding in this manner, you always would identify the correct number with four questions. In fact, of all possible guessing strategies you could use, four is the minimum number of questions that would have to be asked on average to correctly identify the number.
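This halving strategy can be written as a short search routine. The Python sketch below is illustrative only (the function name and the 1-to-16 range are assumptions for the demonstration, not from the text); it counts how many yes-no questions are needed to pin down a secret number when each question splits the remaining range in half.

```python
def count_questions(secret: int, low: int = 1, high: int = 16) -> int:
    """Count the yes/no questions needed to identify `secret` by halving."""
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1            # ask: "Is it between low and mid?"
        if secret <= mid:         # answer "yes": keep the lower half
            high = mid
        else:                     # answer "no": keep the upper half
            low = mid + 1
    return questions

# Every number from 1 to 16 is identified in exactly 4 questions.
print({n: count_questions(n) for n in range(1, 17)})
```

Because 16 is a power of 2, the strategy always terminates in exactly four questions, which is the four bits discussed next.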

This idea of binary questions underlies the information theory definition of information. The number of binary questions required to decode a message provides the measure of information.

When all alternatives are equally likely, the amount of information (H) is given by

H = \log_2 N

where N is the number of possible events. The basic unit of information is the bit, or binary digit. Thus, an event conveys 1 bit of information when there are 2 equally likely possibilities, 2 bits when there are 4 possibilities, 3 bits when there are 8 possibilities, and, as we have demonstrated, 4 bits when there are 16 possibilities. In other words, each item in a set of 16 can be represented by a unique 4-digit binary code.
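As a quick check, the formula can be evaluated directly. This small snippet is illustrative and not part of the original text:

```python
import math

# H = log2(N) bits for N equally likely alternatives.
for n in (2, 4, 8, 16):
    print(f"{n:2d} alternatives -> {math.log2(n):.0f} bits")
# 2 -> 1, 4 -> 2, 8 -> 3, 16 -> 4
```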

The amount of uncertainty, and thus the average information conveyed by the occurrence of one of N  possible events, is a function of the probability for each event. The maximum amount of information is conveyed when the N  events are equally likely. The average amount of information is less when the events are not equally likely. To understand this, think back to the problem of knowing whether your car’s engine is working on a cold morning. If you know that your car has problems in cold weather, so that the probability of it not starting is greater than the probability of it starting, then the car’s failure to start when you turn the key does not transmit as much information as it might have.

The uncertainty of a single event i that occurs with probability $p_i$ is $-\log_2 p_i$; thus, the average uncertainty over all possible events is

H = -\sum_{i=1}^{N} p_i \log_2 p_i

The equation for H when all events are equally likely, that is, $p_i = 1/N$, can be easily derived from the more general equation by noting that $-\log_2 p_i = \log_2(1/p_i)$: substituting $p_i = 1/N$ gives $H = -\sum_{i=1}^{N} (1/N)\log_2(1/N) = \log_2 N$.
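The average-uncertainty formula is easy to compute directly. The sketch below is illustrative (the probabilities for the cold-morning car example are invented for the demonstration); it shows that a uniform distribution over four events yields the maximum of 2 bits, while a lopsided two-event distribution yields well under 1 bit.

```python
import math

def average_information(probs):
    """H = -sum(p_i * log2(p_i)); zero-probability events contribute nothing."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

# Four equally likely events: maximum uncertainty, log2(4) = 2 bits.
print(average_information([0.25, 0.25, 0.25, 0.25]))    # 2.0

# The unreliable car on a cold morning: if failure is expected (p = 0.9),
# observing the outcome conveys much less than 1 bit on average.
print(round(average_information([0.9, 0.1]), 2))         # 0.47
```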

The importance of information theory is in analyzing the amount of information transmitted through a system. Because a person can be regarded as a communication system, computing the information input H(S) (stimulus information) and the information output H(R) (response information) will tell us about the properties of the human system. Suppose that a person’s task is to identify a letter spoken over headphones. If four equally likely stimuli can occur (say, the letters A, B, C, and D), then there are 2 bits of stimulus information. If there are four response categories, again A, B, C, and D, each used equally often, then there are 2 bits of response information.

In most communication systems, we are interested in the output that results from a particular input. Given that, say, the stimulus A is input into the system, we can record the number of times that the responses A, B, C, and D are output. The frequency of each stimulus-response pair can be counted, thus forming a bivariate frequency distribution (see Table AII.1). From such a table, the joint information can be computed by the equation

H(S,R) = -\sum_{j=1}^{N} \sum_{i=1}^{N} p_{ij} \log_2 p_{ij}

where $p_{ij}$ equals the relative frequency of response j to stimulus i.
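The joint information is computed from the cell proportions of the stimulus-response table. The sketch below is illustrative (the 24-presentations-per-stimulus frequencies correspond to perfect identification and are assumed, not copied from the text); it returns 2 bits, the same as the stimulus information, because all of the frequency lies on the diagonal.

```python
import math

def joint_information(freq):
    """H(S,R) = -sum over all cells of p_ij * log2(p_ij),
    where p_ij is the cell frequency divided by the grand total."""
    total = sum(sum(row) for row in freq)
    return -sum((f / total) * math.log2(f / total)
                for row in freq for f in row if f > 0)

# Perfect identification of four equally likely stimuli (cf. the top panel
# of Table AII.1); the exact frequencies here are illustrative.
perfect = [[24, 0, 0, 0],
           [0, 24, 0, 0],
           [0, 0, 24, 0],
           [0, 0, 0, 24]]
print(joint_information(perfect))   # 2.0 bits, equal to H(S)
```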

Using the joint information in a system, we can determine the amount of information transmitted through the system, or the ability of the system to carry information. If the responses correlate perfectly with the stimuli, for example, if the stimulus A is classified as A every time, then all the information in the stimuli is maintained in the responses, and the information transmitted is 2 bits (see the top panel of Table AII.1). Although the example in the table shows all of the responses as correct, note that the responses only have to be consistent. In other words, if stimulus A were always identified as B, and vice versa, the information transmitted would still be the same. If the responses are distributed equally across the four response categories regardless of the stimulus, as in the center panel of Table AII.1, then no information is transmitted. When there is a less than perfect, nonzero correlation between stimuli and responses, as in the bottom panel of Table AII.1, then the information transmitted is between 0 and 2 bits. To determine the amount of information transmitted, we must calculate the stimulus information, response information, and joint information. Transmitted information is then given by

T(S,R) = H(S) + H(R) - H(S,R)

For the data in the bottom panel of Table AII.1, the amount of transmitted information is computed as follows (see Table AII.2). By summing across the frequencies of the responses to each stimulus, we can determine that each stimulus was presented 24 times. Because the four stimuli were equally likely, we compute the stimulus information to be $\log_2 4$, or 2.00 bits (see Appendix III for values of $\log_2 N$ and $p \log_2 p$). By summing across the stimuli, we can determine that the responses were not made equally often. Thus, we must use the second equation, where $p_i$ is the relative frequency of response i, to calculate the response information to be 1.92 bits. Similarly, the joint information is found by using the third equation to be 3.64 bits. The information transmitted then can be found by adding the stimulus information and response information (2.00 + 1.92 = 3.92) and subtracting the joint information from the result (3.92 − 3.64 = 0.28). Thus, in this example, the transmitted information is 0.28 bits.
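The same bookkeeping can be packaged in a few lines of code. The confusion matrix below is hypothetical (it is not the data of Table AII.1, which is not reproduced here); it simply shows the full pipeline of computing H(S), H(R), and H(S,R) from a stimulus-by-response frequency table and combining them into transmitted information.

```python
import math

def entropy(probs):
    """H = -sum(p * log2(p)) over nonzero probabilities."""
    return -sum(p * math.log2(p) for p in probs if p > 0)

def transmitted_information(freq):
    """T(S,R) = H(S) + H(R) - H(S,R) for a stimulus-by-response frequency table."""
    total = sum(sum(row) for row in freq)
    h_s = entropy([sum(row) / total for row in freq])           # stimulus information
    h_r = entropy([sum(col) / total for col in zip(*freq)])     # response information
    h_sr = entropy([f / total for row in freq for f in row])    # joint information
    return h_s + h_r - h_sr

# Hypothetical confusion data: four stimuli, 24 presentations each,
# with most responses on the diagonal but some confusions.
confusions = [[14,  4,  3,  3],
              [ 5, 13,  3,  3],
              [ 4,  4, 12,  4],
              [ 3,  3,  4, 14]]
print(round(transmitted_information(confusions), 2))   # about 0.31 of the 2 bits available
```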

Among other things, information theory has been applied to the measurement of the human’s ability to make absolute judgments, like the letter identification task we used for our example. This ability is important when an operator needs to identify displayed signals accurately. Usually, as the amount of stimulus information increases, the amount of information a person transmits increases and then levels off. This asymptotic value of information transmitted can be seen as the channel capacity of the human information-processing system. For example, the channel capacity for discriminating distinct pitches of tones is approximately 2.3 bits, or five pitches (Pollack, 1952). This means that in situations that require a listener to distinguish between six or more pitches, the listener will make mistakes. Across a variety of sensory dimensions, the channel capacity is approximately 2.5 bits of information. This point was stressed in a classic article by George Miller (1956) on limitations in perception and memory, called “The Magical Number Seven, Plus or Minus Two.”

Perhaps of most concern to human factors specialists is the fact that this limit in the number of stimuli that a person can identify accurately applies only to stimuli that vary on one dimension. When two dimensions, for example pitch and location, are varied simultaneously, the capacity for transmitting information increases. So, if we present a listener with a series of tones to identify, her channel capacity will probably not exceed 2.3 bits. But if we present the same tones, half originating from the left headphone and half originating from the right, her channel capacity will increase. This tells us that we should use multidimensional stimuli in situations where more than just a few potential signals can occur.

As discussed in Chapters 13 and 14, information theory has been used to describe the relationship between uncertainty and response time, as well as movement time (Schmidt & Lee, 2011). However, in recent years, research in human information processing has become less concerned with information theory and more concerned with information flow. Modern research emphasizes developing models of the processes and representations that intervene between stimuli and responses, rather than just looking at the correspondences between them. Nevertheless, information theory’s emphasis on uncertainty continues to play an important role in contemporary human performance.
