int main()

The next construct we have to examine is the line int main(), which has two new components. The first is the “return type”, which specifies the type of value that will be returned from the program when it ends. In this case, that type is int, which is an integral type exactly like short, except that its size depends on the compiler that you're using.[25] With the 32-bit compiler on the CD that comes with this book, an int is 32 bits, or twice the size of a short. With a 16-bit compiler such as Borland C++ version 3.1, an int is the same size as a short, whereas on a 64-bit compiler an int could be 64 bits in length (although, as it turns out, most 64-bit compilers still use a 32-bit int). I don't like to use ints where a short will do, because I want to minimize the changes in behavior of my code with compilers that use different word lengths. While it's true that there isn't much new development that uses 16-bit compilers anymore, it is also true that one of these days we'll all probably be using 64-bit compilers, and I would like my code to be as portable to them as possible.[26] However, we don't have much choice here, because the C++ language specifies that main has to have the return type int.

[25] An integral type is a type of data that can hold an integer. This includes short, int, and char, the last of these mostly for historical reasons.

[26] Unfortunately, there is no guarantee that shorts will still be 16 bits on a 64-bit compiler, but there isn't much I can do about that.

This brings us to the meaning of main(). The name main tells the compiler where to start executing the code: C++ has a rule that execution always starts at the place called main.[27]

[27] Actually, this is an oversimplification. Some code that we write may be executed before the beginning of main, but only under unusual circumstances that we will not encounter in this book. If you're burning with curiosity as to how this can be done (and why), there's an explanation in the section entitled “Executing Code before the Beginning of main” on page 740.

We'll get into this in more detail later in this chapter and in Chapter 5. For now, you'll just have to take my word that this is necessary; I promise I'll explain what it really means when you have enough background to understand the explanation.

There's one more construct I should tell you about here: the curly braces, { and }. The first one of these starts a section of code, in this case the code that belongs to main, and the second one ends that section of code. This is needed because otherwise the compiler wouldn't be able to tell where the code for main begins and ends. The curly braces also have other uses, one of which we'll get to later in this chapter.[28]

[28] If you look at someone else's C++ program, you're likely to see a different style for lining up the {} to indicate where a section of code begins and ends. As you'll notice, my style puts the { and } on separate lines rather than running them together with the code they enclose, to make them stand out, and indents them further than the conditional statement that controls the section of code. I find this the clearest, but this is a matter where there is no consensus. The compiler doesn't care how you indent your code or whether you do so at all; it's a stylistic issue.

You may also be puzzled by the function of the other statements in this program. If so, you're not alone. Let's see the discussion that Susan and I had about that topic.

Susan: Okay, in the example why did you have to write c2 = c1;? Why not B? Why make one thing the same thing as the other? Make it different. Why would you even want c2=c1; and not just say c1 twice, if that is what you want?

Steve: It's very hard to think up examples that are both simple enough to explain and realistic enough to make sense. You're right that this example doesn't do anything useful; I'm just trying to introduce what both the char type and the string type look like.

Susan: Also, do the names of the variables mean anything? Do the c in c1 and the s in s1 stand for anything?

Steve: Yes, the c and s stand for “char” and “string”, respectively.

Susan: Okay, that makes more sense now. But come to think of it, what does c1='A'; have to do with the statement s1= "This is a test ";? I don't see any relationship between one thing and the other.

Steve: This is the same problem as the last one. They have nothing to do with one another; I'm using an admittedly contrived example to show how these variables are used.

Susan: I am glad now that your example of chars and strings (put together) didn't make sense to me. That is progress; it wasn't supposed to.

What does this useless but hopefully instructive program do? As is always the case, we have to tell the compiler what the types of our variables are before we can use them. In this case, c1 and c2 are of type char, whereas s1 and s2 are strings. After taking care of these formalities, we can start to use the variables. In the first executable statement, c1 = 'A'; we set the char variable c1 to a literal value, in this case a capital A; we need to surround this with single quotation marks (') to tell the compiler that we mean the letter A rather than a variable named A. In the next line, c2 = c1; we set c2 to the same value as c1, which of course is 'A' in this case. The next executable statement, s1 = "This is a test "; as you might expect, sets the string variable s1 to the value "This is a test ",[29] which is a kind of literal called a C string literal. Don't confuse a C string literal with a string. A C string literal is a type of literal that we use to assign values to variables of type string. In the statement s1 = "This is a test "; we use a quotation mark, in this case the double quote ("), to tell the compiler where the literal value starts and ends.

[29] Please note that there is a space (blank) character at the end of that C string literal, after the word "test". That space is part of the literal value.

You may be wondering why we need two different kinds of quotes in these two cases. The reason is that there are actually two types of nonnumeric data, fixed-length data and variable-length data. Fixed-length data are relatively easy to handle in a program, as the compiler can set aside the correct amount of space in advance. Variables of type char are 1 byte long and can thus contain exactly one character; as a result, when we set a char to a literal value, as we do in the line c1 = 'A'; the code that executes that statement has the simple task of copying exactly 1 byte representing the literal 'A' to the address reserved for variable c1.[30]

[30] Warning: Every character inside the quotes has an effect on the value of the literal, whether the quotes are single or double; even “invisible” characters such as the space (' ') will change the literal's value. In other words, the line c1 = 'A'; is not the same as the line c1 = 'A ';. The latter statement may or may not be legal, depending on the compiler you're using, but it is virtually certain not to give you what you want, which is to set the variable c1 to the value equivalent to the character 'A'. Instead, c1 will have some weird value resulting from combining the 'A' and the space character. In the case of a string value contained in double quotes, multiple characters are allowed, so "A B" and "AB" both make sense, but the space still makes a difference; namely, it keeps the 'A' and 'B' from being next to one another.

However, C string literals such as "This is a test " are variable-length data, and dealing with such data isn't so easy. Since there could be any number of characters in a C string literal, the code that does the assignment of a literal value like "This is a test " to a string variable has to have some way to tell where the literal value ends. One possible way to provide this needed information would be for the compiler to store the length of the C string literal in memory somewhere, possibly in the location immediately before the first character in the literal. I would prefer this method; however, it is not the method used in the C language (and its descendant, the C++ language). To be fair, the inventors of C didn't make an arbitrary choice; they had reasons for their decision on how to indicate the length of a string. You see, if we were to reserve only 1 byte to store the actual length in bytes of the character data in the string, then the maximum length of a string would be limited to 255 bytes. This is because the maximum value that could be stored in the length byte, as in any other byte, is 255. Thus, if we had a string longer than 255 bytes, we would not be able to store the length of the string in the 1 byte reserved for that purpose. On the other hand, if we were to reserve 2 bytes for the length of each string, then programs that contain many strings would take more memory than they otherwise would.

While the extra memory consumption that would be caused by using a 2-byte length code may not seem significant today, the situation was considerably different when C was invented. At that time, conserving memory was very important; the inventors of C therefore chose to mark the end of a C string literal by a byte containing the value 0, which is called a null byte.[31] This solution has the advantage that only one extra byte is needed to indicate the end of a C string literal of any length. However, it also has some serious drawbacks. First, this solution makes it impossible to have a byte containing the value 0 in the middle of a C string literal, as all of the C string literal manipulation routines would treat that null byte as being the end of the C string literal. Second, it is a nontrivial operation to determine the length of a C string literal; the only way to do it is to scan through the C string literal until you find a null byte. As you can probably tell, I'm not particularly impressed with this mechanism; nevertheless, as it has been adopted into C++ for compatibility with C, we're stuck with it for literal strings in our programs.[32] Therefore, the literal string "ABCD" would occupy 5 bytes, 1 for each character, and 1 for the null byte that the compiler adds automatically at the end of the literal. But we've skipped one step: How do we represent characters in memory? There's no intuitively obvious way to convert the character 'A' into a value that can be stored in 1 byte of memory.

[31] I don't want to mislead you about this notion of a byte having the value 0; it is not the same as the representation of the decimal digit "0". As we'll see, each displayable character (and a number of invisible ones) is assigned a value to represent it when it's part of a string or literal value (i.e., a C string literal or char literal). The 0 byte I'm referring to is a byte with the binary value 0.

[32] Happily, we can improve on it in most other circumstances, as you'll see later.

The answer, at least for our purposes in English, is called the ASCII code standard. This stands for American Standard Code for Information Interchange, which as the name suggests was invented to allow the interchange of data between different programs and makes of computers. Before the invention of ASCII, such interchange was difficult or impossible, since every manufacturer made up its own code or codes. Here are the specific character codes that we have to be concerned with for the purposes of this book:

  1. The codes for the capital letters start with hex 41 for 'A', and run consecutively to hex 5a for 'Z'.

  2. The codes for the lower case letters start with hex 61 for 'a', and run consecutively to hex 7a for 'z'.[33]

    [33] You may wonder why I have to specify that the codes for each case of letters run consecutively. Believe it or not, there are a number of slightly differing codes collectively called EBCDIC (Extended Binary Coded Decimal Interchange Code), in which this is not true! Eric Raymond's amusing and interesting book, The New Hacker's Dictionary, has details on this and many other historical facts.

  3. The codes for the numeric digits start with hex 30 for '0', and run consecutively to hex 39 for '9'.

Given these rules, the memory representation of the string "ABCD" might look something like Figure 3.17.

Figure 3.17. Yet another small section of RAM


Now that we see how strings are represented in memory, I can explain why we need two kinds of quotes. The double quotes tell the compiler to add the null byte at the end of the string literal, so that when the assignment statement s1 = "This is a test "; is executed, the program knows when to stop copying the value to the string variable.

A Byte by Any Other Name...

Have you noticed that I've played a little trick here? The illustration of the string "ABCD" should look a bit familiar; its memory contents are exactly the same as in Figure 3.2, where we were discussing numeric variables. I did this to illustrate an important point: the contents of memory actually consist of uninterpreted bytes, which have meaning only when used in a particular way by a program. That is, the same bytes can represent numeric data or characters, depending on how they are referred to.

This is one of the main reasons why we need to tell the C++ compiler what types our variables have. Some languages allow variables to be used in different ways at different times, but in C++ any given variable always has the same type; for example, a char variable can't change into a short. At first glance, it seems that it would be much easier for programmers to be able to use variables any way they like; why is C++ so restrictive?

The C++ type system, as this feature of a language is called, is specifically designed to minimize the risk of misinterpreting or otherwise misusing a variable. It's entirely too easy in some languages to change the type of a variable without meaning to, and the resulting errors can be very difficult to find, especially in a large program. In C++, the usage of a variable can be checked by the compiler. Such static type checking allows the compiler to tell you about many errors that otherwise would not be detected until the program is running, as happens in languages that rely on dynamic type checking. This is particularly important in systems that need to run continuously for long periods of time. While you can reboot your machine if your word processor crashes due to a run-time error, this is not acceptable as a solution for errors in the telephone network, for example.

Of course, you probably won't be writing programs demanding that degree of reliability any time soon, but strict static type checking is still worthwhile in helping eliminate errors at the earliest possible stage in the development of our programs.

Nonprinting Characters

After that infomercial for the advantages of static type checking, we can resume our examination of strings. You may have noticed that there's a space character at the end of the string "This is a test ". That's another reason why we have to use a special character like " (the double quote) to mark the beginning and end of a string; how else would the compiler know whether that space is supposed to be part of the string? The space character is one of the nonprinting characters (or nondisplay characters) that control the format of our displayed or printed information; imagine how hard it would be to read this book without space characters! While we're on the subject, I should also tell you about some other characters that have special meaning to the compiler. They are listed in Figure 3.18.

Figure 3.18. Special characters for program text


I'll be more specific about the uses of parentheses later, when we have seen some examples. As for the backslash, if you wanted to (for example) insert a " in a string, you would have to use \", because just a plain " would indicate the end of the string. That is, if you wanted to display This is a "string"., you would have to write the value of the string as "This is a \"string\"."

I compiled Figure 3.18 at the instigation of guess who:

Susan: How about you line up all your cute little " ' ; things and just list their meanings? I forget what they are by the time I get to the next one. Your explanations of them are fine, but they are scattered all over; I want one place with all the explanations.

Steve: That's a good idea. As usual, you're doing a good job representing the novices; keep up the good work!

Our next task, after a little bit of practice with the memory representation of a C string literal, will be to see how we get the values of our strings to show up on the screen.
