Exercises, First Set

1. Assume that a short variable named z starts at location 1001 in a little-endian machine. Using Figure 3.5 for the contents of memory, what is the value of z, in hex?

Figure 3.5. Exercise 1


Playing Compiler

I can almost hear the wailing and tooth gnashing out there. Do I expect you to deal with these instructions and addresses by yourself? You'll undoubtedly be happy to know that this isn't necessary, as the compiler takes care of these details. However, if you don't have some idea of how a compiler works, you'll be at a disadvantage when you're trying to figure out how to make it do what you want. Therefore, we're going to spend the next few pages “playing compiler”; that is, I'll examine each statement in a small program fragment and indicate what action the compiler might take as a result. I'll simplify the statements a bit to keep the explanation manageable; you should still get the idea. Figure 3.6 illustrates the set of statements that I'll compile:[18]

[18] As I've mentioned previously, blank lines are ignored by the compiler; you can put them in freely to improve readability.

Figure 3.6. A really little numeric calculation
short i;
short j;

i = 5;
j = i + 3;

Here are the rules of this game:

  1. All numbers in the C++ program are decimal; all addresses and numbers in the machine instructions are hexadecimal.[19]

    [19] However, I've cheated here by using small enough numbers in the C++ program that they are the same in hex as in decimal.

  2. All addresses are 2 bytes long.[20]

    [20] The real compiler on the CD actually uses 4-byte addresses, but this doesn't change any of the concepts involved.

  3. Variables are stored at addresses starting at 1000.

  4. Machine instructions are stored at addresses starting at 2000.[21]

    [21] These addresses are arbitrary; a real compiler will assign addresses to variables and machine instructions by its own rules.

  5. A number not enclosed in [] is a literal value, which represents itself. For example, the instruction mov ax,1000 means to move the value 1000 into the ax register.

  6. A number enclosed in [] is an address, which specifies where data are to be stored or retrieved. For example, the instruction mov ax,[1000] means to move 2 bytes of data starting at location 1000, not the value 1000 itself, into the ax register.

  7. All the data values are shown as “?? ??” to indicate that the variables have not had values assigned to them yet.

Now let's start compiling. The first statement, short i; says to allocate storage for a 2-byte variable called i that will be treated as signed (because that's the default). Since no value has been assigned to this variable yet, the resulting “memory map” looks like Figure 3.7.

Figure 3.7. Compiling, part 1


As you might have guessed, this exercise was the topic of a considerable amount of discussion with Susan. Here's how it started:

Susan: So the first thing we do with a variable is to tell the address that its name is i, but no one is home, right? It has to get ready to accept a value. Could you put a value in it without naming it, just saying address 1000 has a value of 5? Why does it have to be called i first?

Steve: The reason that we use names instead of addresses is that it's much easier for people to keep track of names than it is to keep track of addresses. Thus, one of the main functions of a compiler is to allow us to use names that are translated into addresses for the computer's use.

The second statement, short j; tells me to allocate storage for a 2-byte variable called j that will be treated as signed (because that's the default). Since no value has been assigned to this variable yet, the resulting “memory map” looks like Figure 3.8.

Figure 3.8. Compiling, part 2


Here's the exchange about this step:

Susan: Why isn't the address for j 1001?

Steve: Because a short is 2 bytes, not 1. Therefore, if i is at address 1000, j can't start before 1002; otherwise, the second byte of i would have the same address as the first byte of j, which would cause chaos in the program. Imagine changing i and having j change by itself!

Susan: Okay. I just thought that each address represented 2 bytes for some reason. Then in reality each address always has just 1 byte?

Steve: Every byte of RAM has a distinct address, and there is one address for each byte of RAM. However, it is often necessary to read or write more than one byte at a time, as in the case of a short, which is 2 bytes in length. The machine instructions that read or write more than 1 byte specify only the address of the first byte of the item to be read or written; the other byte or bytes of that item follow the first byte immediately in memory.

Susan: Okay, this is why I was confused. I thought when you specified that the RAM address 1000 was a short (2 bytes), it just made room for 2 bytes. So when you specify address 1000 as a short, you know that 1001 will also be occupied with what you put in 1000.

Steve: Or to be more precise, location 1001 will contain the second byte of the short value that starts in byte 1000.

The next line is blank, so we skip it. This brings us to the statement i = 5; which is an executable statement, so we need to generate one or more machine instructions to execute it. We have already assigned address 1000 to i, so we have to generate instructions that will set the 2 bytes starting at address 1000 to the value that represents 5. One way to do this is to start by setting ax to 5, by the instruction mov ax,5, then storing the contents of ax (5, of course) into the two-byte location where the value of i is kept, namely the two bytes starting at address 1000, via the instruction mov [1000],ax.

Figure 3.9 shows what our “memory map” looks like so far.

Figure 3.9. Compiling, part 3


Here's the next installment of my discussion with Susan on this topic:

Susan: When you use ax in an instruction, that is a register, not RAM?

Steve: Yes.

Susan: How do you know you want that register and not another one? What are the differences in the registers? Is ax the first register that data will go into?

Steve: For our current purposes, all of the 16-bit general registers (ax, bx, cx, dx, si, di, bp) are the same. Some of them have other uses, but all of them can be used for simple arithmetic such as we're doing here.

Susan: How do you know that you are not overwriting something more important than what you are presently writing?

Steve: In assembly language, the programmer has to keep track of that; in the case of a compiled language, the compiler takes care of register allocation for you, which is another reason to use a compiler rather than writing assembly language programs yourself.

Susan: If it overwrites, you said important data will go somewhere else. How will you know where it went? How does it know whether what is being overwritten is important? Wait. If something is overwritten, it isn't gone, is it? It is just moved, right?

Steve: The automatic movement of data that you're referring to applies only to cached data being transferred to RAM. That is, if a slot in the cache is needed, the data that it previously held is written out to RAM without the programmer's intervention. However, the content of registers is explicitly controlled by the programmer (or the compiler, in the case of a compiled language). If you write something into a register, whatever was there before is gone. So don't do that if you need the previous contents!

Susan: OK. Now I have another question: How do you know that the value 5 will require 2 bytes?

Steve: In C++, because it's a short. In assembly language, because I'm loading it into ax, which is a 2-byte register.

Susan: That makes sense. Now why do the variable addresses start at 1000 and the machine addresses start at 2000?

Steve: It's arbitrary; I picked those numbers out of the air. In a real program, the compiler decides where to put things.

Susan: What do you mean by machine address? What is the machine? Where are the machine addresses?

Steve: A machine address is a RAM address. The machine is the CPU. Machine addresses are stored in the instructions so the CPU knows which RAM location we're referring to.

Susan: We talked about storing instructions before; is this what we are doing here? Are those instructions the “machine instructions”?

Steve: Yes.

Susan: Now, this may sound like a very dumb question, but please tell me where 5 comes from? I mean if you are going to move the value of 5 into the register ax, where is 5 hiding to take it from and to put it in ax? Is it stored somewhere in memory that has to be moved, or is it simply a function of the user just typing in that value?

Steve: It is stored in the instruction as a literal value. If you look at the assembly language illustration on page 86, you will see that the mov ax,5 instruction translates into the three bytes b8 05 00; the 05 00 is the 5 in “little-endian” notation.

Susan: Now, what is so magical about ax (or any register for that matter) that will transform the address 1000 to hold the value of 5?

Steve: The register doesn't do it; the execution of the instruction mov [1000],ax is what sets the memory starting at address 1000 to the value 5.

Susan: What are those numbers supposed to be in the machine instruction box? Those are bytes? Bytes of what? Why are they there? What do they do?

Steve: They represent the actual machine language program as it is executed by the CPU. This is where “the rubber meets the road”. All of our C++ or even assembly language programs have to be translated into machine language before they can be executed by the CPU.

Susan: So this is where 5 comes from? I can't believe that there seems to be more code. What is b8 supposed to be? Is it some other type of machine language?

Steve: Machine language is exactly what it is. The first byte of each instruction is the “operation code”, or “op code” for short. That tells the CPU what kind of instruction to execute; in this case, b8 specifies a “load register ax with a literal value” instruction. The literal value is the next 2 bytes, which represent the value 5 in “little-endian” notation; therefore, the full translation of the instruction is “load ax with the literal value 5”.

Susan: So that is the “op code”? Okay, this makes sense. I don't like it, but it makes sense. Will the machine instructions always start with an op code?

Steve: Yes, there's always an op code first; that's what tells the CPU what the rest of the bytes in the instruction mean.

Susan: Then I noticed that the remaining bytes seem to hold either a literal value or a variable address. Are those the only possibilities?

Steve: Those are the ones that we will need to concern ourselves with.

Susan: I don't understand why machine addresses aren't in 2-byte increments like variable addresses.

Steve: Variable addresses aren't always in 2-byte increments either; it just happens that short variables take up 2 bytes. Other kinds of variables can and often do have other lengths.

Susan: So even though variable addresses are the same as instruction addresses they really aren't because they can't share the same actual address. That is why you distinguish the two by starting the instruction addresses at 2000 in the example and variable addresses at 1000, right?

Steve: Right. A particular memory location can hold only one data item at a time. As far as RAM is concerned, machine instructions are just another kind of data. If a particular location is used to store one data item, you can't store anything else there at the same time, whether it's instructions or data.

The last statement, j = i + 3; is the most complicated statement in our program, and it's not that complicated. As with the previous statement, it's executable, which means we need to generate machine instructions to execute it. Because we haven't changed ax since we used it to initialize the variable i with the value 5, it still has that value. Therefore, to calculate the value of j, we can just add 3 to the value in ax by executing the instruction add ax,3. After the execution of this instruction, ax will contain i + 3. Now all we have to do is to store that value in j. As indicated in the translation of the statement short j; the address used to hold the value of j is 1002. Therefore, we can set j to the value in ax by executing the instruction mov [1002],ax.

By the way, don't be misled by this example into thinking that all machine language instructions are 3 bytes in length. It's just a coincidence that all of the ones I've used here are of that length. The actual size of an instruction on the Intel CPUs can vary considerably, from 1 byte to an architectural maximum of 15 bytes. Most instructions in common use, however, range from 1 to 5 bytes.

Figure 3.10 shows what the “memory map” looks like now.

Figure 3.10. Compiling, part 4


Here's the rest of the discussion that we had about this little exercise:

Susan: In this case mov means add, right?

Steve: No, mov means “move” and add means “add”. When we write mov ax,5, it means “move the value 5 into the ax register”. The instruction add ax,3 means “add 3 to the current contents of ax, replacing the old contents with this new value”.

Susan: So you're moving 5 but adding 3? How do you know when to use mov and when to use add if they both kind of mean the same thing?

Steve: It depends on whether you want to replace the contents of a register without reference to whatever the contents were before (mov) or add something to the contents of the register (add).

Susan: OK, here is what gets me: how do you get from address 1000 and i=5 to ax? No, that's not it; I want you to tell me what is the relationship between ax and address 1000. I see ax as a register and that should contain the addresses, but here you are adding ax to the address. This doesn't make sense to me. Where are these places? Is address 1000 in RAM?

Steve: The ax register doesn't contain an address. It contains data. After the instruction mov ax,5, ax contains the number 5. After the instruction mov [1000],ax, memory location 1000 contains a copy of the 2-byte value in register ax; in this case, that is the value of the short variable i.

Susan: So do the machine addresses represent actual bytes?

Steve: The machine addresses specify the RAM locations where data (and programs) are stored.

Executing Our Little Program Fragment

Having examined what the compiler does at compile time with the preceding little program fragment, let's see what happens when the compiled program is executed at run time. When we start out, the sections of RAM we're concerned with will look like Figure 3.11; in each of these figures, the italic text indicates the next instruction to be executed. Now let's start executing the program. The first instruction, mov ax,5, as we saw earlier, means “set the contents of ax to the value 5”.

Figure 3.11. Before execution


Figure 3.12 shows the situation after mov ax,5 is executed. As you can see, executing mov ax,5 has updated the contents of ax, and we've advanced to the next instruction, mov [1000],ax. When we have executed that instruction, the situation looks like Figure 3.13, with the variable i set to 5. Figure 3.14 shows the result after the following instruction, add ax,3, is executed.

Figure 3.12. After the first instruction is executed


Figure 3.13. After execution of second instruction


Figure 3.14. After execution of third instruction


As expected, add ax,3 has increased the contents of ax by the value 3, producing 8. Now we're ready for the final instruction.

Figure 3.15 shows the situation after the final instruction, mov [1002],ax, has been executed.

Figure 3.15. After execution of final instruction


After executing the final instruction, mov [1002],ax, the variable i has the value 5 and the variable j has the value 8.
