Using the 16-bit Register Names

Before we took this detour into the binary and hexadecimal number systems, I promised to explain what it means to say that the instructions using the 16-bit register names “behave exactly as though they were accessing the 16-bit registers on earlier machines”. After a bit more preparation, we'll be ready for that explanation.

First, let's take a look at some characteristics of the human-readable version of machine instructions: assembly language instructions. The assembly language instructions we will look at have a fairly simple format.[33] The name of the operation is given first, followed by one or more spaces. The next element is the “destination”, which is the register or RAM location that will be affected by the instruction's execution. The last element in an instruction is the “source”, which represents another register, a RAM location, or a constant value to be used in the calculation. The source and destination are separated by a comma.[34] Here's an example of a simple assembly language instruction:

[33] I'm simplifying here. There are instructions that follow other formats, but we'll stick with the simple ones for the time being.

[34] The actual machine instructions being executed in the CPU don't have commas, register names, or any other human-readable form; they consist of fixed-format sequences of bits stored in RAM. The CPU actually executes machine language instructions rather than assembly language ones; a program called an assembler takes care of translating the assembly language instructions into machine instructions. However, we can usually ignore this step, because each assembly language instruction corresponds to one machine instruction. This correspondence is quite unlike the relationship between C++ statements and machine instructions, which is far more complex.

add  ax,1

In this instruction, add is the operation, ax is the destination, and the constant value 1 is the source. Thus, add ax,1 means to add 1 to the contents of ax, replacing the old contents of ax with the result.

Let's see what Susan has to say about the makeup of an assembly language instruction:

Susan: Why is the destination first and the source last? That seems backward to me.

Steve: I agree, it does seem backward. That's just the way Intel did it in their assembly language. Other machines and other assemblers have different arrangements; for example, Motorola 68000 assembly language has the source on the left and the destination on the right.

Susan: So the destination can be a register, cache, or RAM?

Steve: Yes, that's right. However, the cache is transparent to the programmer. That is, you don't say “write to the cache” or “read from the cache”; you just use RAM addresses and the hardware uses the cache as appropriate to speed up access to frequently used locations. On the other hand, you do have to address registers explicitly when writing an assembly language program.

Now we're finally ready to see what the statement about using the 16-bit register names on a 32-bit machine means. Suppose we have the register contents shown in Figure 2.8 (indicated in hexadecimal). If we were to add 1 to register ax, by executing the instruction add ax,1, the result would be as shown in Figure 2.9.

Figure 2.8. 32- and 16-bit registers, before add ax,1


Figure 2.9. 32- and 16-bit registers, after add ax,1


In case this makes no sense, consider what happens when you add 1 to 9999 on a four-digit counter such as an odometer. It “turns over” to 0000, doesn't it? The same applies here: ffff is the largest number that can be represented in four hex digits, so if you add 1 to a register that has only four (hex) digits of storage available, the result is 0000.

As you might imagine, Susan was quite intrigued with the above detail; here is her reaction.

Susan: I have an understanding retention half-life of about 30 nanoseconds, but while I was reading this I was understanding it except I am boggled as to how adding 1 to ffff makes 0000, see, I am still not clear on hex. Question: When you show the contents of a 32-bit register as being 12350000, then is the 1235 the upper half and the 0000 the lower half? Is that what you are saying?

Steve: That's right!

As this illustrates, instructions that refer to ax have no effect whatever on the upper part of eax; they behave exactly as though the upper part of eax did not exist. However, if we were to execute the instruction add eax,1 instead of add ax,1, the result would look like Figure 2.10.

Figure 2.10. 32- and 16-bit registers, after add eax,1


In this case, eax is treated as a whole. Similar results apply to the other 32-bit registers and their 16-bit counterparts.

Revisiting the Memory Hierarchy

Unfortunately, it isn't possible to use only registers and avoid references to RAM entirely, if only because we'll run out of registers sooner or later. This is a good time to look back at the diagram of the “memory hierarchy” (Figure 2.2) and examine the relative speed and size of each different kind of memory.

The “size” attributes of the disk and RAM are specified in megabytes, whereas the size of an external cache is generally in the range from 64 kilobytes to a few megabytes. As I mentioned before, the internal cache is considerably smaller, usually in the 8 to 16 kilobyte range. The general registers, however, provide a total of 28 bytes of storage; this should make clear that they are the scarcest memory resource. To try to clarify why the registers are so important to the performance of programs, I've listed the “speed” attribute in number of accesses per second, rather than in milliseconds, nanoseconds, and so forth. In the case of the disk, this is about 100 accesses per second. Normal RAM can be accessed about 100 million times per second, while the external cache allows about 250 million accesses per second. The clear winners, though, are the internal cache and the registers, which can be accessed 500 million times per second and 1 billion times per second, respectively.

The Advantages and Disadvantages of Using Registers

In a way, the latter figure (1 billion accesses per second for registers) overstates the advantages of registers relative to the cache. You see, any given register can be accessed only 500 million times per second[35]; however, many instructions refer to two registers and still execute in one CPU cycle. Therefore, the maximum number of references per second is more than the number of instructions per second.

[35] As before, we are ignoring the ability of the CPU to execute more than one instruction simultaneously.

However, this leads to another question: Why not use instructions that can refer to more than one memory address (known as memory-to-memory instructions) and still execute in one CPU cycle? In that case, we wouldn't have to worry about registers; since there's (relatively) a lot of cache and very few registers, it would seem to make more sense to eliminate the middleman and simply refer to data in the cache.[36] Of course, there is a good reason for the provision of both registers and cache. The main drawback of registers is that there are so few of them; on the other hand, one of their main advantages is also that there are so few of them. Why is this?

[36] Perhaps I should remind you that the programmer doesn't explicitly refer to the cache; you can just use normal RAM addresses and let the hardware take care of making sure that the most frequently referenced data ends up in the cache.

The main reason to use registers is that they make instructions shorter: since there are only a few registers, we don't have to use up a lot of bits specifying which register(s) to use. That is, with eight registers, we only need 3 bits to specify which register we need. In fact, there are standardized 3-bit codes that might be thought of as “register addresses”, which are used to specify each register when it is used to hold a variable. Figure 2.11 is the table of these register codes.[37]

[37] Don't blame me for the seemingly scrambled order of the codes; that's the way Intel's CPU architects assigned them to registers when they designed the 8086 and it's much too late to change them now. Luckily, we almost never have to worry about their values, because the assembler takes care of the translation of register names to register addresses.

Figure 2.11. 32- and 16-bit register codes


By contrast, with a “memory-to-memory” architecture, each instruction would need at least 2 bytes for the source address and 2 bytes for the destination address.[38] Adding 1 byte to specify what the instruction is going to do brings the minimum instruction size to 5 bytes, whereas some instructions that use only registers can be as short as 1 byte. This makes a big difference in performance because the caches are quite limited in size; big programs don't fit in the caches and therefore require a large number of RAM accesses. As a result, they execute much more slowly than small programs.

[38] If we want to be able to access more than 64 kilobytes worth of data, which is necessary in most modern programs, we'll need even more room to store addresses.

The Effect of Registers on Program Size

This explains why we want our programs to be smaller. However, it may not be obvious why using registers reduces the size of instructions, so here's an explanation.

Most of the data in use by a program are stored in RAM. When using a 32-bit CPU, it is theoretically possible to have over 4 billion bytes of memory (2^32 bytes, to be exact). Therefore, that many distinct addresses for a given byte of data are possible, and specifying any one of them requires 32 bits. Since there are only a few registers, specifying which one you want to use takes only a few bits; therefore, programs use register addresses instead of memory addresses wherever possible, to reduce the number of bits each instruction needs to specify its addresses.

I hope this is clear, but it might not be. It certainly wasn't to Susan. Here's the conversation we had on this topic:

Susan: I see that you are trying to make a point about why registers are more efficient in terms of making instructions shorter, but I just am not picturing exactly how they do this. How do you go from “make the instructions much shorter” to “we don't have to use up a lot of bits specifying which registers to use”?

Steve: Let's suppose that we want to move data from one place to another in memory. In that case, we'll have to specify two addresses: the “from” address and the “to” address. One way to do this is to store the addresses in the machine language instruction. Since each address is at least 16 bits, an instruction that contains two addresses needs to occupy at least 32 bits just for the addresses, as well as some more bits to specify exactly what instruction we want to perform. Of course, if we're using 32-bit addresses, then a “two-address” instruction would require 64 bits just for the two addresses, in addition to whatever bits were needed to specify the type of instruction.

Susan: OK. . . think I got this. . .

Steve: On the other hand, if we use registers to hold the addresses of the data, we need only enough bits to specify each of two registers. Since there aren't that many registers, we don't need as many bits to specify which ones we're referring to. Even on a machine that has 32 general registers, we'd need only 10 bits to specify two registers; on the Intel machines, with their shortage of registers, even fewer bits are needed to specify which register we're referring to.

Susan: Are you talking about the bits that are needed to define the instruction?

Steve: Yes.

Susan: How would you know how many bits are needed to specify the two registers?

Steve: If you have 32 different possibilities to select from, you need 5 bits to specify one of them, because 32 is 2 to the fifth power. If we have 32 registers, and any of them can be selected, that takes 5 bits to select any one of them. If we have to select two registers on a CPU with 32 registers, we need 10 bits to specify both registers.

Susan: So what does that have to do with it? All we are talking about is the instruction that indicates “select register” right? So that instruction should be the same and contain the same number of bits whether you have 1 or 32 registers.

Steve: There is no “select register” instruction. Every instruction has to specify whatever register or registers it uses. It takes 5 bits to select 1 of 32 items and only 3 bits to select 1 of 8 items; therefore, a CPU design that has 32 registers needs longer instructions than one that has only 8 registers.

Susan: I don't see why the number of registers should have an effect on the number of bits one instruction should have.

Steve: If you have two possibilities, how many bits does it take to select one of them? 1 bit. If you have four possibilities, how many bits does it take to select one of them? 2 bits. Eight possibilities require 3 bits; 16 possibilities require 4 bits; and finally 32 possibilities require 5 bits.

Susan: Some machines have 32 registers?

Steve: Yes. The PowerPC, for example. Some machines have even more registers than that.

Susan: If the instructions to specify a register are the same, then why would they differ just because one machine has more registers than another?

Steve: They aren't the same from one machine to another. Although every CPU that I'm familiar with has registers, each type of machine has its own way of executing instructions, including how you specify the registers.

Susan: OK, and in doing so it is selecting a register, right? An instruction should contain the same number of bits no matter how many registers it has to call on.

Steve: Let's take the example of an add instruction, which as its name implies, adds two numbers. The name of the instruction is the same length, no matter how many registers there are; that's true. However, the actual representation of the instruction in machine language has to have room for enough bits to specify which register(s) are being used in the instruction.

Susan: They are statements right? So why should they be bigger or smaller if there are more or fewer registers?

Steve: They are actually machine instructions, not C++ statements. The computer doesn't know how to execute C++ statements, so the C++ compiler is needed to convert C++ statements into machine instructions. Machine instructions need bits to specify which register(s) they are using; so, with more registers available, more bits in the instructions have to be used to specify the register(s) that the instructions are using.

Susan: Do all the statements change the values of bits they contain depending on the number of registers that are on the CPU?

Steve: Yes, they certainly do. To be more precise, the machine language instructions that execute a statement are larger or smaller depending on the number of registers in the machine because they need more bits to specify one of a larger number of registers.

Susan: “It takes five bits to select one of 32 items...”

“...and only three bits to select one of eight items.” Why?

Steve: What is a bit? It is the amount of information needed to select one of two alternatives. For example, suppose you have to say whether a light is on or off. How many possibilities exist? Two. Since a single bit has two possible states, 0 or 1, we can represent “on” by 1 and “off” by 0 and thus represent the possible states of the light by one bit.

Now suppose that we have a fan that has four settings: low, medium, high, and off. Is one bit enough to specify the current setting of the fan? No, because one bit has only two possible states, while the fan has four. However, if we use two bits, then it will work. We can represent the states by bits as follows:

bits  state
-----------
00    off
01    low
10    medium
11    high

Note that this is an arbitrary mapping; there's no reason that it couldn't be like this instead:

bits  state
-----------
00    medium
01    high
10    off
11    low

However, having the lowest “speed” (that is, off) represented by the lowest binary value (00) and the increasing speeds corresponding to increasing binary values makes more sense and therefore is easier to remember.

This same process can be extended to represent any number of possibilities. If we have eight registers, for example, we can represent each one by 3 bits, as noted previously in Figure 2.11 on page 53. That is the actual representation in the Intel architecture; however, whatever representation might have been used, it would require 3 bits to select among eight possibilities. The same is true for a machine that has 32 registers, except that you need 5 bits instead of 3.

Susan: Okay, so then does that mean that more than one register can be in use at a time? Wait, where is the room that you are talking about?

Steve: Some instructions specify only one register (a “one-register” instruction), while others specify two (a “two-register” instruction); some don't specify any registers. For example, there are certain instructions whose effect is to determine which instruction will be executed next (the so-called “branch” instructions). These instructions often do not contain any register references at all, but rather specify the address of the next instruction directly. These instructions are used to implement if statements, for loops, and other flow control statements.

Susan: So, when you create an instruction you have to open up enough “room” to talk to all the registers at once?

Steve: No, you have to have enough room to specify any one register, for a one-register instruction, or any two registers for a two-register instruction.

Susan: Well, this still has me confused. If you need to specify only one register at any given time, then why do you always need to have all the room available? Anyway, where is this room? Is it in RAM or is it in the registers themselves? Let's say you are going to specify an instruction that uses only 1 of 32 registers. Are you saying that even though you are going to use just one register you have to make room for all 32?

Steve: The “room” that I'm referring to is the bits in the instruction that specify which register the instruction is using. That is, if there are eight registers and you want to use one of them in an instruction, 3 bits need to be set aside in the instruction to indicate which register you're referring to.

Susan: So you need the bits to represent the address of a register?

Steve: Right. However, don't confuse the “address of a register” with a memory address. They have nothing to do with one another, except that they both specify one of a number of possible places to store information. That is, register ax doesn't correspond to memory address 0, and so on.

Susan: Yes, I understand the bit numbers in relation to the number of registers.

Steve: That's good.

Susan: So the “address of a register” is just where the CPU can locate the register in the CPU, not an address in RAM. Is that right?

Steve: Right. The address of a register merely specifies which of the registers you're referring to; all of them are in the CPU.

After that comedy routine, let's go back to Susan's reaction to something I said earlier about registers and variables:

Susan: The registers hold only variables... Okay, I know what is bothering me! What else is there besides variables? Besides nonvariables, please don't tell me that. (Actually that would be good, now that I think of it.) But this is where I am having problems. You are talking about data, and a variable is a type of data. I need to know what else is out there so I have something else to compare it with. When you say a register can hold a variable, that is meaningless to me, unless I know what the alternatives are and where they are held.

Steve: What else is there besides variables? Well, there are constants, like the number 5 in the statement x = 5;. Constants can also be stored in registers. For example, let's suppose that the variable x, which is a short, is stored in location 1237. In that case, the statement x = 5; might generate an instruction sequence that looks like this:

mov ax,5
mov [1237],ax

where the number in the [] is the address of the variable x. The first of these instructions loads 5 into register ax, and the second one stores the contents of ax (5, in this case) into the memory location 1237.

Sometimes, however, constants aren't loaded into registers as in this case but are stored in the instructions that use them. This is the case in the following instruction:

add ax,3

This means to add 3 to whatever was formerly in register ax. The 3 never gets into a register but is stored as part of the instruction.[39]

[39] We'll go into this whole notion of using registers to represent and manipulate variables in grotesque detail in Chapter 3.

Reading Instructions into Memory in Advance

Another way of reducing overhead is to read instructions from RAM in chunks, rather than one at a time, and feed them into the CPU as it needs them; this is called prefetching. This mechanism operates in parallel with instruction execution, loading instructions from RAM into special dedicated registers in the CPU before they're actually needed; these registers are known collectively as the prefetch queue. Since the prefetching is done by a separate unit in the CPU, the time to do the prefetching doesn't increase the time needed for instruction execution. When the CPU is ready to execute another instruction, it can get it from the prefetch queue almost instantly, rather than having to wait for the slow RAM to provide each instruction. Of course, it does take a small amount of time to retrieve the next instruction from the prefetch queue, but that amount of time is included in the normal instruction execution time.

Susan: I don't understand prefetching. What are “chunks”? I mean I understand what you have written, but I can't visualize this. So, there is just no time used to read an instruction when something is prefetched?

Steve: A separate piece of the CPU does the prefetching at the same time as instructions are being executed, so instructions that have already been fetched are available without delay when the execution unit is ready to “do” them.

The effect of combining the use of registers and prefetching the instructions can be very significant. In our example, if we use an instruction that has already been prefetched and that reads data from and writes data only to registers, the timing is reduced to that shown in Figure 2.12.

As I indicated near the beginning of this chapter, the manufacturers aren't lying to us: if we design our programs to take advantage of these and other similar efficiency measures, we can often approach the maximum theoretical performance figures. You've just been subjected to a barrage of information on how a computer works. Let's review before continuing.
