Initialization vs. Assignment

This is another of the places where it's important to differentiate between initialization and assignment. We can't assign a value to a const, but we can initialize it; in fact, because an uninitialized const is useless, the attempt to define a const without specifying its initial (and only) value is a compile-time error. In this case, we're initializing it to the value 256; if we just wrote const short BUFLEN;, we'd get an error report something like the one in Figure 8.35 when we tried to compile it.

Figure 8.35. Error from an uninitialized const (codestring5x.err)
STRING5X.cpp:
Error E2304 STRING5X.cpp 82: Constant variable 'BUFLEN' must be initialized in function
 operator >>(_STL::istream &,string &)
Error E2313 STRING5X.cpp 84: Constant expression required in function operator >>(_STL::istream &,string &)
*** 2 errors in Compile ***

Susan wanted some further explanation.

Susan: I still don't get why const is used here.

Steve: This is a different use of const than we've seen before; in this case, it's an instruction to the compiler meaning "the following 'variable' isn't really variable, but constant. Don't allow it to be modified." This allows us to use it where we would otherwise have to use a literal constant, like 256 itself. The reason that using a const is better than using a literal constant is that it makes it easier to change all the occurrences of that value. In the present case, for example, we use BUFLEN three times after its definition; if we used the literal constant 256 in all of those places, we'd have to change all of them if we decided to make the buffer larger or smaller. As it is, however, we only have to change the definition of BUFLEN and all of the places where it's used will use the new value automatically.

Susan: Okay, I think I have it now.

Now that we've disposed of that detail, let's continue with our examination of the implementation of operator >>. The next nonblank line is char Buf[BUFLEN];. This is a little different from any variable definition we've seen before; however, you might be able to guess something about it from its appearance. It seems to be defining a variable called Buf whose type is related in some way to char. But what about the [BUFLEN] part?[10]

[10] This is another common C practice: using "buf" as shorthand for "buffer", or "place to store stuff while we're working on it".

This is a definition of a variable of that dreaded type, the array; specifically, we're defining an array called Buf, which contains BUFLEN chars. As you may recall, this is somewhat like the Vec type that we've used before, except that it has absolutely no error checking; if we try to access a char that is past the end of the array, something will happen, but not anything good.[11] In this case, as in our previous use of pointers, we'll use this dangerous construct only in a very small part of our code, under controlled circumstances; the user of our string class won't be exposed to the array.

[11] See the discussion of arrays starting on page 485.
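To make the contrast concrete, here is a minimal sketch of what "error checking" on an index looks like. The Vec type isn't shown in this section, so the sketch uses the at member function of the standard library vector as a stand-in; that substitution, and the helper name checkedAccessFails, are assumptions made for illustration.

```cpp
#include <cstddef>
#include <stdexcept>
#include <vector>

// Returns true if reading element `index` of a checked container is rejected.
// std::vector::at stands in for the book's Vec type (an assumption).
bool checkedAccessFails(std::size_t index)
{
  const short BUFLEN = 256;
  std::vector<char> checked(BUFLEN, 0);

  try
    {
    char ch = checked.at(index); // at() verifies that index < size()
    (void)ch;                    // the value itself isn't needed here
    return false;
    }
  catch (const std::out_of_range&)
    {
    return true;                 // the bad index was caught, not silently used
    }
}
```

Calling checkedAccessFails(255) returns false, while checkedAccessFails(256) returns true. The equivalent access on a plain array, Buf[256], would compile without complaint and then have undefined behavior at run time.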

Before we continue analyzing this function, I should point out that C++ has a rule that the number of elements of an array must be known at compile time. That is, the program in Figure 8.36 isn't legal C++.

Figure 8.36. Use of a non-const array size (codestring5y.cpp)
int main()
{
  short BUFLEN = 256;
  char ch;

  char Buf[BUFLEN];

  ch = Buf[0];
}

I'll admit that I don't understand exactly why using a non-const array size is illegal; a C++ compiler has enough information to create and access an array whose length is known at run time. In fact, some compilers do allow it.[12] But it is not compliant with the standard, so we won't use it in our programs. Instead, we'll use the const value BUFLEN to specify the number of chars in the array Buf in the statement char Buf[BUFLEN];.

[12] According to Eric Raymond, there is no good reason for this limitation; it's a historical artifact. In fact, it may be removed in a future revision of the C++ standard, but for now we'll have to live with this limitation of C++.
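For comparison with Figure 8.36, here is a sketch of the legal version, in which the only change that matters is the const in the definition of BUFLEN. The function name bufferSize is hypothetical; it exists only so that the result can be inspected.

```cpp
#include <cstddef>

// Returns the number of bytes in an array whose size is given by a const.
std::size_t bufferSize()
{
  const short BUFLEN = 256; // const and initialized with a literal

  char Buf[BUFLEN] = {0};   // legal: the element count is known at compile time
  char ch = Buf[0];
  (void)ch;                 // ch is read only to mirror Figure 8.36

  return sizeof(Buf);       // a char occupies one byte, so this is BUFLEN
}
```

If we later decide that the buffer should be larger or smaller, the definition of BUFLEN is the only line that has to change.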

The memset Standard Library Function

Now we're up to the first line of the executable part of the operator >> function in Figure 8.34 on page 537: memset(Buf,0,BUFLEN);. This is a call to a function called memset (short for “memory set”), which is in the standard C library. You may be able to guess from its name that it is related to the function memcmp that we used to compare two arrays of chars. If so, your guess would be correct; memset is C-talk for "set all the bytes in an area of memory to the same value". The first argument is the address of the area of memory to be set to a specified value, the second argument is the char value to which all the bytes will be set, and the third argument is the number of bytes to be set to that value, starting at the address given in the first argument. In other words, this statement will set all of the bytes in the array called Buf to 0. This is important because we're going to treat that array as a C string later. As you may recall, a C string is terminated by a null byte, so we want to make sure that the array Buf doesn't contain any junk that might be misinterpreted as part of the data we're reading in from the istream.
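Here is a small sketch of why clearing the array matters. It copies three characters into the array without a terminating null byte, so the result is a valid C string only because memset has already set every other byte to 0. The function name lengthAfterPartialFill is made up for this illustration.

```cpp
#include <cstring>

// Clears a buffer, copies three chars into it with no terminator, and
// returns the length of the C string that results.
std::size_t lengthAfterPartialFill()
{
  const short BUFLEN = 256;
  char Buf[BUFLEN];

  std::memset(Buf, 0, BUFLEN); // address, value, number of bytes to set
  std::memcpy(Buf, "abc", 3);  // copies 'a', 'b', 'c' but no null byte

  return std::strlen(Buf);     // stops at Buf[3], which memset set to 0
}
```

Without the memset call, Buf[3] and the bytes after it would contain whatever happened to be in that memory, and strlen could run past the end of the data.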

Next, we have an if statement controlling a function called ignore:

if (is.peek() == '\n')
  is.ignore();

What exactly does this sequence do? It solves a problem with reading C string data from a file; namely, where do we stop reading? With a numeric variable, that's easy; the answer is "whenever we see a character that doesn't look like part of a number". However, with a data type like our string that can take just about any characters as part of its value, it's more difficult to figure out where we should stop reading. The solution I've adopted is to stop reading when we get to a newline ('\n') character; that is, the end of a line.[13] This is no problem when reading from the keyboard, as long as each data item is on its own line, but what about reading from a file?

[13] Note that this is different from the behavior of the standard library string class, which won't keep reading data for a string from a stream past a blank.

When we read a C string from a file via the standard function getline (described in detail below), as we do in our operator >> implementation, the newline at the end of the line is discarded. As a result, the next C string to be read in starts at the beginning of the next line of the file, as we wish. This approach to handling newline characters works well as long as all of the variables being read in are strings. However, in the case of the StockItem class (for example), we needed to be able to mix shorts and strings in the file. In that case, reading a value for a short stops at the newline, because that character isn't a valid part of a numeric value. This is OK as long as the next variable to be read is also a short, because spaces and newlines at the beginning of the input data are ignored when we're reading a numeric value. However, when the next variable to be read after a short is a string, the leftover newline from the previous read is interpreted as the beginning of the data for the string, which terminates input for the string before we ever read anything into it. Therefore, we have to check whether the next available char in the input stream is a newline, in which case we have to skip it. On the other hand, if the next character to be read in is something other than a newline, we want to keep it as the first character of our string. That's what the if statement does. First, the is.peek() function call returns the next character in the input stream without removing it from the stream; then, if it turns out to be a newline, we tell the input stream to ignore it, so it won't mess up our reading of the actual data in the next line.
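The leftover-newline problem is easy to reproduce with a stream that holds a number on one line and text on the next. The following sketch reads such input with and without the peek and ignore step. The names readNumberThenLine and demoMixedInput are hypothetical, and an istringstream stands in for the file.

```cpp
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Reads a short and then a line of text. When skipLeftoverNewline is true,
// the newline that the numeric read leaves behind is discarded first.
std::string readNumberThenLine(std::istream& is, short& number,
                               bool skipLeftoverNewline)
{
  const short BUFLEN = 256;
  char Buf[BUFLEN];
  std::memset(Buf, 0, BUFLEN);

  is >> number;                  // stops at the newline after the digits

  if (skipLeftoverNewline)
    {
    if (is.peek() == '\n')       // look at the next char without removing it
      is.ignore();               // it's the leftover newline, so discard it
    }

  is.getline(Buf, BUFLEN, '\n'); // read up to the next newline
  return Buf;
}

// Feeds the helper two lines of input, as a data file might contain.
std::string demoMixedInput(bool skipLeftoverNewline)
{
  std::istringstream in("42\nwidget\n");
  short number = 0;
  return readNumberThenLine(in, number, skipLeftoverNewline);
}
```

With the check in place, demoMixedInput(true) returns "widget". Without it, demoMixedInput(false) returns an empty string, because getline sees the leftover newline immediately and stops.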

You won't be surprised to hear that Susan had a couple of questions about this function.

Susan: Where do peek and ignore come from?

Steve: They're defined in the iostream header file <iostream>.

Susan: How did you know that they were available?

Steve: By reading a book called C++ IOstreams Handbook by Steve Teale. However, that book is obsolete now that the standard library has been adopted. A newer book that I understand is very good is Standard C++ IOStreams and Locales Advanced Programmer's Guide and Reference by Angelika Langer & Klaus Kreft (Addison-Wesley, January 2000, ISBN 0-201-18395-1). Of course, there is also a fair amount of coverage of streams in both The C++ Programming Language and The C++ Standard Library, but those are both quite technical and not terribly well suited for beginning programmers.

Now that we've dealt with that detail, we're ready to read the data for our string. That's the job of the next line in the function: is.getline(Buf,BUFLEN,'\n');. Since is is an istream, this is a member function of istream. To be precise, it's the member function that reads a number of characters into a char array. The arguments are as follows:

  1. The array into which to read characters

  2. The number of characters that the array can contain

  3. The "terminating character", where getline should stop reading characters

This function will read characters into the array (in this case Buf) until one of two events occurs:

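  1. The size of the array is reached (getline stores at most one fewer character than that size, leaving room for the terminating null byte)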

  2. The "terminating character" is the next character to be read

Note that the terminating character is removed from the stream but is not stored in the array.
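Both stopping conditions can be seen in a short sketch. The array here holds only 8 chars, a size chosen purely to make the limit visible, and the function name readShortLine is hypothetical.

```cpp
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Reads one line into a deliberately tiny array and returns what was stored.
std::string readShortLine(std::istream& is)
{
  const short BUFLEN = 8;
  char Buf[BUFLEN];
  std::memset(Buf, 0, BUFLEN);

  // Stops at '\n', or after storing BUFLEN - 1 chars, whichever comes first.
  is.getline(Buf, BUFLEN, '\n');
  return Buf;
}

// Returns the first section of a line that is too long for the array.
std::string demoTruncation()
{
  std::istringstream in("abcdefghijkl\n");
  return readShortLine(in);
}
```

Given the input "abc\ndef\n", two successive calls return "abc" and "def"; the newlines never appear in the results. Given "abcdefghijkl\n", as in demoTruncation, the call returns only "abcdefg", and getline also puts the stream into a failed state to report that the line didn't fit.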

Before continuing with the rest of the code for operator >>, let's take a closer look at the following two lines, so we can see why it's a bad idea to use the C string and memory manipulation library any more than we have to. The lines in question are

memset(Buf,0,BUFLEN);
is.getline(Buf,BUFLEN,'\n');

The problem is that we have to specify the length of the array Buf explicitly (as BUFLEN, in this case). In this small function, we can keep track of that length without much effort, but in a large program with many references to Buf, it would be all too easy to make a mistake in specifying its length. As we've already seen, the result of specifying a length that is greater than the actual length of the array would be a serious error in the functioning of the program; namely, some memory belonging to some other variable would be overwritten. Whenever we use the mem functions in the C library, we're liable to run into such problems. That's an excellent reason to avoid them except in strictly controlled situations, such as the present one, where the definition of the array is in the same small function as the uses of the array. By no coincidence, this is the same problem caused by the indiscriminate use of pointers; the difficulty with the C memory manipulation functions is that they use pointers (or arrays, which are essentially interchangeable with pointers), with all of the hazards that such use entails.

Now that I've nagged you sufficiently about the dangers of arrays, let's look at the rest of the operator >> code. The next statement is Str = Buf;, which sets the argument Str to the contents of the array Buf. Buf is an array of chars, but when it's used in an expression like this one it is converted to the address of its first char, so its type is effectively char*; Str, on the other hand, is a string. Therefore, this apparently innocent assignment statement actually calls string::string(char*) to make a temporary string, and then calls string::operator=(const string&) to copy that temporary string to Str. Because Str is a reference argument, this causes the string that the caller provided on the right of the >> to be set to the value of the temporary string that was just created.
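To watch those two calls happen, we can use a tiny class that records which of its member functions run. MiniString below is a hypothetical stand-in, not our actual string class: it borrows std::string for storage, and its constructor takes a const char* rather than a char*.

```cpp
#include <string>

// MiniString is a hypothetical stand-in that logs its member function calls.
struct MiniString
{
  std::string text;       // storage borrowed from the standard library
  static std::string log; // names of the functions that have run

  MiniString() {}

  MiniString(const char* p) : text(p)
  {
    log += "ctor(char*) ";
  }

  MiniString& operator=(const MiniString& other)
  {
    text = other.text;
    log += "operator= ";
    return *this;
  }
};

std::string MiniString::log;

// Performs Str = Buf and returns the record of what was called.
std::string traceAssignment()
{
  char Buf[8] = "hello";
  MiniString Str;

  MiniString::log.clear();
  Str = Buf; // builds a temporary from Buf, then assigns it to Str

  return MiniString::log;
}
```

The log returned by traceAssignment lists the constructor first and the assignment operator second, which is the order described above.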

Finally, we have the statement return is;. This simply returns the same istream that we got as an argument, so that the next input operator in the same statement can continue reading from the istream where we left off. Now our strings can be read from an input stream (such as cin) and written to an output stream (such as cout), just like variables known to the standard library. This allows our program that sorts strings to do some useful work.[14]
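Putting the statements we've examined together gives something like the following sketch. It is a reconstruction from the discussion in this section rather than a copy of Figure 8.34, and it reads into a hypothetical struct called Line instead of our string class so that it can stand alone.

```cpp
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Line is a hypothetical stand-in for the book's string class.
struct Line
{
  std::string text;
};

std::istream& operator>>(std::istream& is, Line& Str)
{
  const short BUFLEN = 256;
  char Buf[BUFLEN];

  std::memset(Buf, 0, BUFLEN);   // make sure Buf contains no junk

  if (is.peek() == '\n')         // skip a newline left by a previous read
    is.ignore();

  is.getline(Buf, BUFLEN, '\n'); // read up to the end of the line
  Str.text = Buf;

  return is;                     // lets the next >> continue with this stream
}

// Reads two lines in a single statement and joins them with a '|'.
std::string demoChainedRead()
{
  std::istringstream in("first\nsecond\n");
  Line a;
  Line b;

  in >> a >> b; // works only because operator >> returns the stream

  return a.text + "|" + b.text;
}
```

The expression in >> a >> b is evaluated as (in >> a) >> b, so the stream returned by the first call becomes the left-hand argument of the second.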

[14] The implementation of operator << will also work for any other output destination, such as a file; however, our current implementation of operator >> isn't really suitable for reading a string from an arbitrary input source. The reason is that we're counting on the input data being able to fit into the Buf array, which is 256 bytes in length. This is fine for input from the keyboard, at least under DOS, because the maximum line length in that situation is 128 characters. It will also work for our inventory file, because the lines in that file are shorter than 256 bytes. However, there's no way to limit the length of lines in any arbitrary data file we might want to read from, so this won't do as a general solution. Of course, increasing the size of the Buf array wouldn't solve the problem; no matter how large we make it, we couldn't be sure that a line from a file wouldn't be too long. One solution would be to handle long lines in sections.
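Here is one possible sketch of handling a long line in sections. It relies on the fact that getline puts the stream into a failed state when it fills the array before reaching the newline; clearing that state lets us call getline again for the next section. The helper names readLongLine and demoLongLine are hypothetical, std::string is used only to collect the sections, and the tiny BUFLEN is chosen so that even a short line needs several sections.

```cpp
#include <cstring>
#include <istream>
#include <sstream>
#include <string>

// Reads one complete line, however long, a section at a time.
std::string readLongLine(std::istream& is)
{
  const short BUFLEN = 8;
  char Buf[BUFLEN];
  std::string result;

  for (;;)
    {
    std::memset(Buf, 0, BUFLEN);
    is.getline(Buf, BUFLEN, '\n');
    result += Buf;

    // True only when the array filled up before the newline was reached,
    // which means more of the same line is still waiting in the stream.
    bool filledWithoutNewline =
      is.fail() && !is.bad() && !is.eof() && is.gcount() == BUFLEN - 1;

    if (!filledWithoutNewline)
      break;

    is.clear(); // reset the failed state so the next getline can proceed
    }

  return result;
}

// Reads a line that is much longer than the 8-char array.
std::string demoLongLine()
{
  std::istringstream in("a line longer than the buffer\nnext\n");
  return readLongLine(in);
}
```

In demoLongLine, the first line takes five calls to getline to read, yet the caller receives it in one piece.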

Assuming that you've installed the software from the CD in the back of this book, you can try out this program. First, you have to compile it by following the compilation instructions on the CD. Then type strsort1 to run the program. You can also run it under the debugger, by following the usual instructions for that method.

Now that we've finished our upgrades to the string class, let's look back at what we've covered since our first review in this chapter.
