How the CPU Stores and Manipulates Data in Memory

In a moment we're going to dive a little deeper into how the CPU accomplishes its task of manipulating data, such as we are doing here with our arithmetic program. First, though, it's time for a little pep talk for those of you who might be wondering exactly why this apparent digression is necessary. It's because if you don't understand what is going on under the surface, you won't be able to get past the “Sunday driver” stage of programming in C++. In some languages it's neither necessary nor perhaps even possible to find out what the computer actually does to execute your program, but C++ isn't one of them. A good C++ programmer needs an intimate acquaintance with the internal workings of the language, for reasons which will become very apparent when we get to Chapter 6. For the moment, you'll just have to take my word that working through these intricacies is essential; the payoff for a thorough grounding in these fundamental concepts of computing will be worth the struggle.

Now let's get to the task of exploring how the CPU actually stores and manipulates data in memory. As we saw previously, each memory location in RAM has a unique memory address; machine instructions that refer to RAM use this address to specify which byte or bytes of memory they wish to retrieve or modify. This is fairly straightforward in the case of a 1-byte variable, where the instruction merely specifies the byte that corresponds to the variable. On the other hand, the situation isn't quite as simple in the case of a variable that occupies more than 1 byte. Of course, no law of nature says that an instruction couldn't contain a number of addresses, one for each byte of the variable. However, this solution is never adopted in practice, as it would make instructions much longer than they need to be. Instead, the address in such an instruction specifies the first byte of RAM occupied by the variable, and the other bytes are assumed to follow immediately after the first one. For example, in the case of a short variable, which as we have seen occupies 2 bytes of RAM, the instruction would specify the address of the first byte of the area of RAM in which the variable is stored.[12] However, there's one point that I haven't brought up yet: how the data for a given variable are actually arranged in memory. For example, suppose that the contents of a small section of RAM (specified as two hex digits per byte) look like Figure 3.2.

[12] Actually, the C++ language does not require that a short variable contain exactly two bytes. However, it does on current Intel CPUs and other current CPUs of which I am aware.

Figure 3.2. A small section of RAM


Also suppose that a short variable i is stored starting at address 1000. To do much with a variable, we're going to have to load it into a general register, one of the small number of named data storage locations in the CPU intended for general use by the programmer; because the registers are part of the CPU itself, the CPU can operate on the data in them at maximum speed. You may recall that there are seven general registers in the 386 CPU (and its successors); they're named eax, ebx, ecx, edx, esi, edi, and ebp.[13] Unfortunately, there's another complication here: these registers are designed to operate on 4-byte quantities, while our variable i, being of type short, is only 2 bytes long. Are we out of luck? No, but we do have to specify the length of the variable that we want to load. This problem is not unique to Intel CPUs, since any CPU has to be able to load different-sized variables into registers. Different CPUs use different methods of specifying this important piece of information; in the Intel CPUs, one way to do this is to alter the register name.[14] As we saw in the discussion of the development of Intel machines, we can remove the leading e from the register name to specify that we're dealing with 2-byte values; the resulting name refers to the lower 2 bytes of the 4-byte register. Therefore, if we wanted to load the value of i into register ax (that is, the lower half of register eax), the instruction could be written as follows:[15]

[13] Besides these general registers, a dedicated register called esp plays an important role in the execution of real programs. We'll see how it does this in Chapter 5.

[14] This is not the only possible solution to this problem, nor necessarily the best one. For example, in many Motorola CPUs, you specify the length of the variable directly in the instruction, so loading a word (i.e., 2-byte) variable might be specified by the instruction move.w, where the .w means “word”. Similarly, a longword (i.e., 4-byte) load might be specified as move.l, where the .l means “long word”.

[15] It's also possible to load a 2-byte value into a 32-bit (i.e., 4-byte) register such as eax and have the high part of that register set to 0 in one instruction, by using an instruction designed specifically for that purpose. This approach has the advantage that further processing can be done with the 32-bit registers.

mov ax,[1000] [16]

[16] The number inside the brackets [ ] represents a memory address.

As usual, our resident novice Susan had some questions on this topic. Here is our conversation:

Susan: If you put something into 1000 that is “too big” for it, then it spills over to the next address?

Steve: Sort of. When you “put something into 1000”, you have to specify exactly what it is you're “putting in”. That is, it must be either a short, a char, or some other type of variable that has a defined size.

Susan: Is that how it works? Why then is it not necessary to specify that it is going to have to go into 1000 and 1001? So what you put in is not really in 1000 anymore, it is in 1000 and 1001? How do you refer to its REAL address? What if there is no room in 1001? Would it go to 2003 if that is the next available space?

Steve: Because the rule is that you always specify the starting address of any item (variable or constant) that is too big to fit in 1 byte. The other bytes of the item are always stored immediately following the address you specify. No bytes will be skipped when storing (or loading) one item; if the item needs 4 bytes and is to be stored starting at 1000, it will be stored in 1000–1003.

Susan: I see. In other words, the compiler will always use the next bytes of RAM, however many need to be used to store the item?

Steve: Right.
