Adding Further Facilities to our string class

At this point, we have a fairly minimal string class. We can create a string, assign it a literal value in the form of a C string literal, and copy the value of one string to another; we can even pass a string as a value argument. Now we'll use our existing techniques along with some new ones to improve the facilities that the string class provides.

To make this goal more concrete, let's suppose that we want to modify the sorting program of Chapter 4 to sort strings, rather than shorts. To use the sorting algorithm from that program, we'll need to be able to compare two strings to see which would come after the other in the dictionary, as we can compare two shorts to see which is greater. We also want to be able to use cout and << to display strings on the screen and cin and >> to read them from the keyboard.

Before we go into the changes needed in the string class to allow us to write a string sorting program, Figure 8.20 shows our goal: the selection sort adapted to sort a Vec of strings instead of one of shorts.

Assuming that you've installed the software from the CD in the back of this book, you can compile this program in the usual way, then run it by typing strsort1. You'll see that it does indeed ask for five names and then display them in alphabetical order. You can also run it under the debugger, by following the usual instructions for that method.

Figure 8.20. Sorting a Vec of strings (code\strsort1.cpp)
#include <iostream>
using std::cin;
using std::cout;
using std::endl;

#include "string5.h"
#include "Vec.h"

int main()
{
  Vec<string> Name(5);
  Vec<string> SortedName(5);
  string LowestName;
  short FirstIndex;
  short i;
  short k;
  string HighestName = "zzzzzzzz";

  cout << "I'm going to ask you to type in five last names." << endl;

  for (i = 0; i < 5; i ++)
   {
   cout << "Please type in name #" << i+1 << ": ";
   cin >> Name[i];
   }

  for (i = 0; i < 5; i ++)
     {
     LowestName = HighestName;
     FirstIndex = 0;
     for (k = 0; k < 5; k ++)
        {
        if (Name[k] < LowestName)
            {
            LowestName = Name[k];
            FirstIndex = k;
            }
        }
     SortedName[i] = LowestName;
     Name[FirstIndex] = HighestName;
     }

  cout << "Here are the names, in alphabetical order: " << endl;
  for (i = 0; i < 5; i ++)
     cout << SortedName[i] << endl;

  return 0;
}

Susan had a couple of comments and questions about this program:

Susan: Why aren't you using caps when you initiate your variable of HighestName; I don't understand why you use "zzzzzzzzzz" instead of "ZZZZZZZZZZ"? Are you going to fix this later so that caps will work the same way as lower case letters?

Steve: If I were to make that change, the program wouldn't work correctly if someone typed their name in lower case letters because lower case letters are higher in ASCII value than upper case letters. That is, "abc" is higher than "ZZZ". Thus, if someone typed in their name in lower case, the program would fail to find their name as the lowest name. Actually, the way the string sorting function works, "ABC" is completely different from "abc"; they won't be next to one another in the sorted list. We could fix this by using a different method of comparing the strings that would ignore case, if that were necessary.
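To make Steve's point concrete, here is a minimal sketch, not part of the book's code, that checks the ordering of upper and lower case letters and shows one possible case-insensitive comparison. It assumes the ASCII character set; the function names LessIgnoringCase and CheckCaseOrdering are hypothetical, and the sketch works on C strings rather than on our string class.

```cpp
#include <cassert>
#include <cctype>
#include <cstring>

// Hypothetical helper (not part of the string class in this chapter):
// returns true if a precedes b when upper and lower case are treated alike.
bool LessIgnoringCase(const char* a, const char* b)
{
  std::size_t i = 0;
  while (a[i] != 0 && b[i] != 0)
    {
    // std::tolower expects a value representable as unsigned char.
    int ca = std::tolower(static_cast<unsigned char>(a[i]));
    int cb = std::tolower(static_cast<unsigned char>(b[i]));
    if (ca != cb)
      return ca < cb;
    i ++;
    }
  // Same up to the end of the shorter one: the shorter one comes first.
  return a[i] == 0 && b[i] != 0;
}

// Checks illustrating the points in the dialog (ASCII assumed).
void CheckCaseOrdering()
{
  assert('Z' < 'a');                        // upper case has lower codes
  assert(std::strcmp("ZZZ", "abc") < 0);    // so "abc" is higher than "ZZZ"
  assert(LessIgnoringCase("ABC", "abd"));   // case no longer matters
  assert(!LessIgnoringCase("abc", "ABC"));  // equal when case is ignored
  assert(!LessIgnoringCase("ABC", "abc"));
}
```

Calling CheckCaseOrdering from main runs the checks; if any of them were wrong, the program would stop with an assertion failure.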

If you compare this program to the original one that sorts short values (Figure 4.6 on page 160), you'll see that they are very similar. This is good because that's what we wanted to achieve. Let's take a look at the differences between these two programs.

  1. First, we have several using declarations to tell the compiler what we mean by cin, cout, and endl. In the original program, we used a blanket using namespace std; declaration, so we didn't need specific using declarations for these names. Now we're being more specific about which names we want to use from that library and which are ours.

  2. The next difference is that we're sorting the names in ascending alphabetical order, rather than descending order of weight as with the original program. This means that we have to start out by finding the name that would come first in the dictionary (the "lowest" name). By contrast, in the original program we were looking for the highest weight, not the lowest one; therefore, we have to do the sort "backward" from the previous example.

  3. The third difference is that the Vecs Name and SortedName are collections of strings, rather than the corresponding Vecs of shorts in the first program: Weight and SortedWeight.

  4. The final difference is that we've added a new variable called HighestName, which plays the role of the value 0 that was used to initialize HighestWeight in the original program. That is, it is used to initialize the variable LowestName to a value that will certainly be replaced by the first name we find, just as 0 was used to initialize the variable HighestWeight to a value that had to be lower than the first weight we would find. The reason we need a "really high" name rather than a "really low" one is that we're sorting the "lowest" name to the front, rather than sorting the highest weight to the front as we did originally.

You may think these changes to the program aren't very significant. That's a correct conclusion; we'll spend much more time on the changes we have to make to our string class before this program will run, or even compile. The advantage of making up our own data types (like strings) is that we can make them behave in any way we like. Of course, the corresponding disadvantage is that we have to provide the code to implement that behavior and give the compiler enough information to use that code to perform the operations we request. In this case, we'll need to tell the compiler how to compare strings, read them in via >> and write them out via <<. Let's start with Figure 8.21, which shows the new interface specification of the string class, including all of the new functions needed to implement the comparison and I/O operators, as well as operator ==, which we'll implement later in the chapter.

Figure 8.21. The updated string class interface, including comparison and I/O operators (code\string5.h)
class string
{
friend std::ostream& operator << (std::ostream& os, const string& Str);
friend std::istream& operator >> (std::istream& is, string& Str);

public:
  string();
  string(const string& Str);
  string& operator = (const string& Str);
  ~string();

  string(char* p);
  short GetLength();
  bool operator < (const string& Str);
  bool operator == (const string& Str);

private:
  short m_Length;
  char* m_Data;
};

I strongly recommend that you print out the files that contain this interface and its implementation, as well as the test program, for reference as you are going through this part of the chapter; those files are string5.h, string5.cpp, and strtst5.cpp, respectively.

Implementing operator <

Our next topic is operator < (the "less than" operator), which we need so that we can use the selection sort to arrange strings by their dictionary order. The declaration of this operator is similar to that of operator =, except that rather than defining what it means to say x = y; for two strings x and y, we are defining what it means to say x < y. Of course, we want our operator < to act analogously to the < operator for short values; that is, our operator will compare two strings and return true if the first string would come before the second string in the dictionary and false otherwise, as needed for the selection sort.

All right, then, how do we actually implement this undoubtedly useful facility? Let's start by examining the function declaration bool string::operator < (const string& Str); a little more closely. This means that we're declaring a function that returns a bool and is a member function of class string; its name is operator <, and it takes a constant reference to a string as its argument. As we've seen before, operators don't look the same when we use them as when we define them. In the sorting program in Figure 8.20, the line if (Name[k] < LowestName) actually means if (Name[k].operator < (LowestName)). In other words, if the return value from the call to operator < is false, then the if expression will also be considered false and the controlled block of the if won't be executed. On the other hand, if the return value from the call to operator < is true, then the if expression will also be considered true and the controlled block of the if will be executed. To make this work correctly, our version of operator < will return the value true if the first string is less than the second and false otherwise.
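If the equivalence between the two ways of writing the call seems abstract, the following minimal sketch may help. It uses a hypothetical class called Weight rather than our string class, just to keep the example short, and its operator < is declared const, which the interface in Figure 8.21 does not do.

```cpp
#include <cassert>

// Hypothetical stand-in for a class with a member operator <;
// it compares a single number so the example stays short.
class Weight
{
public:
  Weight(short Value) : m_Value(Value) {}
  bool operator < (const Weight& Other) const
  {
    return m_Value < Other.m_Value;
  }

private:
  short m_Value;
};

void CheckOperatorCallForms()
{
  Weight Light(3);
  Weight Heavy(7);

  // The operator form and the explicit member function call
  // are two spellings of the same function call.
  assert((Light < Heavy) == Light.operator < (Heavy));
  assert((Heavy < Light) == Heavy.operator < (Light));

  assert(Light < Heavy);     // the call returns true
  assert(!(Heavy < Light));  // the call returns false
}
```

Nobody writes Light.operator < (Heavy) in real code, of course; the point is only that the compiler treats Light < Heavy as exactly that call.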

Now that we've seen how the compiler will use our new function, let's look at its implementation, which follows these steps:

  1. Determine the length of the shorter of the two strings.

  2. Compare a character from the first string with the corresponding character from the second string.

  3. If the character from the first string is less than the character from the second string, then we know that the first string precedes the second in the dictionary, so we're done and the result is true.

  4. If the character from the first string is greater than the character from the second string, then we know that the first string follows the second in the dictionary. Therefore, we're done and the result is false.

  5. If the two characters are the same and we haven't come to the end of the shorter string, then move to the next character in each string, and go back to step 2.

  6. When we run out of characters to compare, if the strings are the same length then the answer is that they are identical, so we're done and the result is false.

  7. On the other hand, if the strings are different in length, and if we run out of characters in the shorter string before finding a difference between the two strings, then the longer string follows the shorter one in the dictionary. In this case, the result is true if the second string is longer and false if the first string is longer.

You may be wondering why we need special code to handle the case where the strings differ in length. Wouldn't it be simpler to compare up to the length of the longer string?

Details of the Comparison Algorithm

As it happens, that approach would work properly so long as both of the strings we're comparing have a null byte at their ends and neither of them has a null byte anywhere else. To see the reason for the limitation of that approach, let's look at what the memory layout might look like for two string variables x and y, with the contents "post" and "poster" respectively. In Figure 8.22, the letters in the box labeled "string contents" represent themselves, while the 0s represent the null byte, not the digit 0.

Figure 8.22. strings x and y in memory


If we were to compare the strings up to the longer of the two lengths with this memory layout, the sequence of events would be:

  1. Get character p from location 12345600.

  2. Get character p from location 1234560a.

  3. They are the same, so continue.

  4. Get character o from location 12345601.

  5. Get character o from location 1234560b.

  6. They are the same, so continue.

  7. Get character s from location 12345602.

  8. Get character s from location 1234560c.

  9. They are the same, so continue.

  10. Get character t from location 12345603.

  11. Get character t from location 1234560d.

  12. They are the same, so continue.

  13. Get a null byte from location 12345604.

  14. Get character e from location 1234560e.

  15. The character e from the second string is higher than the null byte from the first string, so we conclude (correctly) that the second string comes after the first one.

This works because the null byte, having an ASCII code of 0, in fact has a lower value than whatever non-null byte is in the corresponding position of the other string.

However, this plan wouldn't work reliably if we had a string with a null byte in the middle. To see why, let's change the memory layout slightly to stick a null byte in the middle of string y. Figure 8.23 shows the modified layout.

Figure 8.23. strings x and y in memory, with an embedded null byte


You may reasonably object that we don't have any way to create a string with a null byte in it. That's true at the moment, but one reason we're storing the actual length of the string rather than relying on the null byte to mark the end of a string, as is done with C strings, is that keeping track of the length separately makes it possible to have a string that has any characters whatever in it, even nulls.

For example, we could add a string constructor that takes an array of bytes and a length and copies the specified number of bytes from the array. Since an array of bytes can contain any characters in it, including nulls, that new constructor would obviously allow us to create a string with a null in the middle of it. If we tried to use the preceding comparison mechanism, it wouldn't work reliably, as shown in the following analysis.

  1. Get character p from location 12345600.

  2. Get character p from location 1234560a.

  3. They are the same, so continue.

  4. Get character o from location 12345601.

  5. Get character o from location 1234560b.

  6. They are the same, so continue.

  7. Get character s from location 12345602.

  8. Get character s from location 1234560c.

  9. They are the same, so continue.

  10. Get character t from location 12345603.

  11. Get character t from location 1234560d.

  12. They are the same, so continue.

  13. Get a null byte from location 12345604.

  14. Get a null byte from location 1234560e.

  15. They are the same, so continue.

  16. Get character t from location 12345605.

  17. Get character r from location 1234560f.

  18. The character t from the first string is greater than the character r from the second string, so we conclude that the first string comes after the second one.

Unfortunately, this conclusion is incorrect; what we have actually done is run off the end of the first string and started retrieving data from the next location in memory. Since we want to be able to handle the situation where one of the strings has one or more embedded nulls, we have to stop the comparison as soon as we get to the end of the shorter string. Whatever happens to be past the end of that string's data is not relevant to our comparison of the two strings.
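To see why an explicit length matters once null bytes can appear inside a string, here is a minimal sketch using two hypothetical byte arrays that differ only after an embedded null. It is not the book's code: PrecedesWithinLength is an invented helper that compares a stated number of bytes, and strcmp is the standard C function for comparing C strings.

```cpp
#include <cassert>
#include <cstring>

// Hypothetical data: two 6-byte arrays that match up to and including an
// embedded null byte and differ only in the byte after it.
const char First[]  = { 'p', 'o', 's', 't', 0, 'a' };
const char Second[] = { 'p', 'o', 's', 't', 0, 'b' };
const short Length = 6;

// Compares exactly Count bytes of each array, null bytes included, and
// decides at the first position where the two bytes differ.
bool PrecedesWithinLength(const char* a, const char* b, short Count)
{
  for (short i = 0; i < Count; i ++)
    {
    if (a[i] != b[i])
      return a[i] < b[i];
    }
  return false;  // no difference within Count bytes
}

void CheckEmbeddedNull()
{
  // A C string routine stops at the first null byte, so it sees "post" twice.
  assert(std::strcmp(First, Second) == 0);

  // Counting bytes by explicit length reaches the difference after the null.
  assert(PrecedesWithinLength(First, Second, Length));
  assert(!PrecedesWithinLength(Second, First, Length));
}
```

The sketch never reads past the sixth byte of either array, which is the other half of the lesson: the stored length tells us not only that a null byte isn't the end, but also where the end really is.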

Let's listen in on the discussion Susan and I had on this topic.

Susan: Why is the return value from operator < a bool?

Steve: Because there are only two possible answers that it can give: either the first string is less than the second string or it isn't. In the first case true is the appropriate answer, and in the second case, of course, false is the appropriate answer. Thus, a bool is appropriate for this use.

Susan: Again I am not seeing where we're using string::operator < (const string& Str); in the sorting program.

Steve: That's because all you have to say is a < b, just as with operator =; the compiler knows that a < b, where a and b are strings, means string::operator < (const string&).

Susan: Why are you bringing up this stuff about what the operator looks like and the way it is defined? Do you mean that's what is really happening even though it looks like built in code?

Steve: Yes.

Susan: Who puts those null bytes into memory?

Steve: The compiler supplies a null byte automatically at the end of every literal string, such as "abc".

Susan: I don't get where you are not using a null byte when storing the length; it looks to me that you are. This is confusing. Ugh.

Steve: I understand why that's confusing, I think. I am including the null byte at the end of a string when we create it from a C string literal, so that we can mix our strings with C strings more readily. However, because we store the length separately, it's possible to construct a string that has null bytes in the middle of it as well as at the end. This is not possible with a C string, because that has no explicit length stored with it; instead, the routines that operate on C strings assume that the first null byte means the C string is finished.

Susan: Why do you jump from a null byte to a t? Didn't it run out of letters? Is this what you mean by retrieving data from the next location in memory? Why was a t there?

Steve: Yes, this is an example of retrieving random information from the next location in memory. We got a t because that just happened to be there. The problem is that since we're using an explicit length rather than a null byte to indicate the end of our strings, we can't count on a null byte stopping the comparison correctly. Thus, we have to worry about handling the case where there is a null byte in the middle of a string.

Now that we've examined why the algorithm for operator < works the way it does, it will probably be easier to understand the code if we follow an example of how it is used. I've written a program called strtst5x.cpp for this purpose; Figure 8.24 has the code for that program.

Figure 8.24. Using operator < for strings (code\strtst5x.cpp)
#include <iostream>
#include "string5.h"
using std::cout;
using std::endl;

int main()
{
  string x;
  string y;

  x = "ape";
  y = "axes";

  if (x < y)
    cout << x << " comes before " << y << endl;
  else
    cout << x << " doesn't come before " << y << endl;

  return 0;
}

You can see that in this program the two strings being compared are "ape" and "axes", which are assigned to strings x and y respectively. As we've already discussed, the compiler translates a comparison between two strings into a call to the function string::operator <(const string& Str); in this case, the line that does that comparison is if (x < y).

Now that we've seen how to use this comparison operator, Figure 8.25 shows one way to implement it.

Figure 8.25. The implementation of operator < for strings (from code\string5a.cpp)
bool string::operator < (const string& Str)
{
  short i;
  bool Result;
  bool ResultFound;
  short CompareLength;

  if (Str.m_Length < m_Length)
      CompareLength = Str.m_Length;
  else
      CompareLength = m_Length;

  ResultFound = false;
  for (i = 0; (i < CompareLength) && (ResultFound == false); i ++)
     {
     if (m_Data[i] < Str.m_Data[i])
         {
         Result = true;
         ResultFound = true;
         }
     else
         {
         if (m_Data[i] > Str.m_Data[i])
             {
             Result = false;
             ResultFound = true;
             }
         }
     }

  if (ResultFound == false)
      {
      if (m_Length < Str.m_Length)
          Result = true;
      else
          Result = false;
      }

  return Result;
}

The variables we'll use in this function are:

  1. i, which is used as a loop index in the for loop that steps through all of the characters to be compared.

  2. Result, which is used to hold the true or false value that we'll return to the caller.

  3. ResultFound, which we'll use to keep track of whether we've found the result yet.

  4. CompareLength, which we'll use to determine the number of characters to compare in the two strings.

After defining variables, the next four lines of the code determine how many characters from each string we actually have to compare; the value of CompareLength is set to the lesser of the lengths of our string and the string referred to by Str. In this case, that value is 4, the length of our string (including the terminating null byte).

Now we're ready to do the comparison. This takes the form of a for loop that steps through all of the characters to be compared in each string. The header of the for loop is for (i = 0; (i < CompareLength) && (ResultFound == false); i ++). The first and last parts of the expression controlling the for loop should be familiar by now; they initialize and increment the loop control variable. But what about the continuation expression (i < CompareLength) && (ResultFound == false)?

The Logical AND Operator

That expression states a two-part condition for continuing the loop. The first part, (i < CompareLength), is the usual condition that allows the program to execute the loop as long as the index variable is within the correct range. The second part, (ResultFound == false), should also be fairly clear; we want to test whether we've already found the result we're looking for and continue only as long as that isn't the case (i.e., ResultFound is still false). The () around each of these expressions are used to tell the compiler that we want to evaluate each of these expressions first, before the && is applied to their results. That leaves the && symbol as the only mystery.

It's really not too mysterious. The && operator is the symbol for the "logical AND" operator, which means that we want to combine the truth or falsity of two expressions each of which has a logical value of true or false. The result of using && to combine the results of these two expressions will also be a logical value. Here is the way the value of that expression is determined:

  1. If both of the expressions connected by the && are true, then the value of the expression containing the && is also true;

  2. Otherwise, the value of the expression containing the && is false.

If you think about it for a minute, this should make sense. We want to continue the loop as long as both of the conditions are true; that is,

  1. i is less than CompareLength; and

  2. ResultFound is false (we haven't found what we're looking for).

That's why the && operator is called logical AND; it checks whether condition 1 and condition 2 are both true. If either is false, we want to stop the loop, and this continuation expression will do just that.[6]

[6] This operator follows a rule analogous to the one for ||: if the expression on the left of the && is false, then the answer must be false and the expression on the right is not executed at all. The reason for this "short-circuit evaluation rule" is that in some cases you may want to write a right-hand expression for && that will only be legal if the left-hand expression is true.
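The short-circuit rule in the footnote can be checked with a small sketch like the following. All of the names in it (RightSide, Combine, StartsWithA, g_RightSideCalls) are hypothetical, and it uses nullptr, which requires a C++11 or later compiler.

```cpp
#include <cassert>

// Counts how many times the right-hand expression is actually evaluated.
int g_RightSideCalls = 0;

bool RightSide()
{
  g_RightSideCalls ++;
  return true;
}

// RightSide is called only when Left is true.
bool Combine(bool Left)
{
  return Left && RightSide();
}

// A right-hand expression that is legal only if the left-hand one is true:
// *p must not be evaluated when p is null.
bool StartsWithA(const char* p)
{
  return (p != nullptr) && (*p == 'a');
}

void CheckShortCircuit()
{
  assert(Combine(false) == false);
  assert(g_RightSideCalls == 0);  // right side was skipped
  assert(Combine(true) == true);
  assert(g_RightSideCalls == 1);  // right side was evaluated once

  assert(StartsWithA("ape"));
  assert(!StartsWithA("bee"));
  assert(!StartsWithA(nullptr));  // safe: *p is never evaluated
}
```

StartsWithA is the kind of case the footnote has in mind: without the short-circuit rule, passing a null pointer would make the program examine memory it has no business looking at.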

Now let's trace the path of execution through the for loop in Figure 8.25. On the first time through the loop, the index i is 0 and ResultFound is false. Therefore, the continuation expression allows us to execute the statements in the loop, where we test whether the current character in our string, namely m_Data[i], is less than the corresponding character from the string Str, namely Str.m_Data[i].

By the way, in case the expression in the if statement, if (m_Data[i] < Str.m_Data[i]), doesn't make sense immediately, perhaps I should remind you that the array notation m_Data[i] means the ith character of the data pointed to by m_Data; an index value of 0 means the first element, as is always the case when using an array or Vec. We've already covered this starting with the section entitled “The Equivalence of Arrays and Pointers” on page 487; you should go back and reread that section if you're not comfortable with the equivalence between pointers and arrays.

The code in Figure 8.26 compares characters from the two strings.

Figure 8.26. Is our character less than the one from the other string? (from code\string5a.cpp)
if (m_Data[i] < Str.m_Data[i])
    {
    Result = true;
    ResultFound = true;
    }

If the current character in our string were less than the corresponding character in Str, we would have our answer; our string would be less than the other string. If that were the case, we would set Result to true and ResultFound to true and would be finished with this execution of the for loop.

As it happens, in our current example both m_Data[0] and Str.m_Data[0] are equal to 'a', so they're equal to each other as well. What happens when the character from our string is the same as the one from the string Str?

In that case, the first if, whose condition is stated as if (m_Data[i] < Str.m_Data[i]), is false. So we continue with the else clause of that if statement, which looks like Figure 8.27.

Figure 8.27. The else clause in the comparison loop (from code\string5a.cpp)
else
 {
 if (m_Data[i] > Str.m_Data[i])
   {
   Result = false;
   ResultFound = true;
   }
 }

This clause contains another if statement that compares the character from our string to the one from Str. Since the two characters are the same, this if also comes out false so the controlled block of the if isn't executed. After this if statement, we've reached the end of the controlled block of the for statement. The next iteration of the for loop starts by incrementing i to 1. Then the continuation expression is evaluated again; i is still less than CompareLength and ResultFound is still false, so we execute the controlled block of the loop again with i equal to 1.

On this pass through the for loop, m_Data[1] (the character from our string) is 'p' and Str.m_Data[1] (the character from the other string) is 'x'. Therefore, the condition in the first if statement (that the character from our string is less than the character from the other string) is true, so we execute the controlled block of the if statement. This sets Result to true, and ResultFound also to true, as you can see in Figure 8.26.

We're now at the end of the for loop, so we return to the for statement to continue execution. First, i is incremented again, to 2. Then the continuation expression (i < CompareLength) && (ResultFound == false) is evaluated. The first part of the condition, i < CompareLength, is true, since i is 2 and CompareLength is 4. However, the second part of the condition, ResultFound == false, is false, because we've just set ResultFound to true. Since the result of the && operator is true only when both subconditions are true, the for loop terminates, passing control to the next statement after the controlled block of the loop (Figure 8.28).

Figure 8.28. Handling the return value (from code\string5a.cpp)
if (ResultFound == false)
  {
  if (m_Length < Str.m_Length)
    Result = true;
  else
    Result = false;
  }

In the current scenario, ResultFound is true because we have found a character from m_Data that differs from the corresponding character from Str.m_Data; therefore, the condition in the first if is false, and we proceed to the next statement after the end of the if statement, return Result;. This shouldn't come as too much of a surprise; we know the answer to the comparison, namely, that our string is less than the other string, so we're ready to tell the caller the information that he requested by calling our routine.

Other Possible Results of the Comparison

The path of execution is almost exactly the same if, the first time we find a mismatch between the two strings, the character from our string is greater than the character from the other string. The only difference is that the if statement that handles this scenario sets Result to false rather than true (Figure 8.27), because our string is not less than the other string; of course, it still sets ResultFound to true, since we know the result that will be returned.

There's only one other possibility: that the two strings are the same up to the length of the shorter one (e.g., "post" and "poster"). In that case, the for loop will expire of natural causes when i gets to be greater than or equal to CompareLength. Then the final if statement shown in Figure 8.28 will evaluate to true, because ResultFound is still false. In this case, if the length of our string is less than the length of the other string, we will set Result to true, because a shorter string will precede a longer one in the dictionary if the two strings are the same up to the length of the shorter one.

Otherwise, we'll set Result to false, because our string is at least as long as the other one; since they're equal up to the length of the shorter one, our string can't precede the other string. In this case, either they're identical, or our string is longer than the other one and therefore should follow it. Either of these two conditions means that the result of operator < is false, so that's what we tell the caller via our return value.
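The possible outcomes can be exercised with a stand-alone sketch like the one below. It is not the member function from Figure 8.25 but a hypothetical free function, Precedes, that follows the same logic on a pointer plus an explicit length. To make the "same up to the length of the shorter one" case visible, the lengths passed here count only the letters and leave out the terminating null byte, which is an assumption of this sketch rather than the convention used by our string class.

```cpp
#include <cassert>

// Hypothetical stand-alone version of the comparison logic: each string is
// given as a pointer to its characters plus an explicit length.
bool Precedes(const char* a, short aLength, const char* b, short bLength)
{
  // Compare only as many characters as the shorter string has.
  short CompareLength = (bLength < aLength) ? bLength : aLength;

  for (short i = 0; i < CompareLength; i ++)
    {
    if (a[i] < b[i])
      return true;   // first mismatch: our character is lower
    if (a[i] > b[i])
      return false;  // first mismatch: our character is higher
    }

  // Same up to the shorter length: the shorter string comes first.
  return aLength < bLength;
}

void CheckAllOutcomes()
{
  assert(Precedes("ape", 3, "axes", 4));      // mismatch, ours is lower
  assert(!Precedes("axes", 4, "ape", 3));     // mismatch, ours is higher
  assert(Precedes("post", 4, "poster", 6));   // no mismatch, ours is shorter
  assert(!Precedes("poster", 6, "post", 4));  // no mismatch, ours is longer
  assert(!Precedes("post", 4, "post", 4));    // identical
}
```

Returning directly from inside the loop does the job of the Result and ResultFound variables in Figure 8.25; the outcomes are the same.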

Using a Standard Library Function to Simplify the Code

This implementation of operator < for strings works. However, there's a much simpler way to do it. Figure 8.29 shows the code.

Figure 8.29. Implementing operator < for strings (from code\string5.cpp)
bool string::operator < (const string& Str)
{
  int Result;
  short CompareLength;

  if (Str.m_Length < m_Length)
      CompareLength = Str.m_Length;
  else
      CompareLength = m_Length;

  Result = memcmp(m_Data,Str.m_Data,CompareLength);

  if (Result < 0)
      return true;

  if (Result > 0)
      return false;

  if (m_Length < Str.m_Length)
      return true;

  return false;
}

This starts out in the same way as our previous version, by figuring out how much of the two strings we actually need to compare character by character. Right after that calculation, though, the code is very different; where's that big for loop?

It's contained in the standard library function memcmp, a carryover from C, which does essentially what that for loop did for us. Although C doesn't have the kind of strings that we're implementing here, it does have primitive facilities for dealing with arrays of characters, including comparing one array with another, character by character. One type of character array supported by C is the C string, which we've already encountered. However, C strings have a serious drawback for our purposes here; they use a null byte to mark the end of a group of characters. This isn't suitable for our strings, whose length is explicitly stored; as noted previously, our strings could theoretically have null bytes in them. There are several C functions that compare C strings, but they rely on the null byte for their proper operation so we can't use them.

However, these limitations of C strings are so evident that the library writers have supplied another set of functions that act almost identically to the ones used for C strings, except that they don't rely on null bytes to determine how much data to process. Instead, whenever you use one of these functions, you have to tell it how many characters to manipulate. In this case, we're calling memcmp, which compares two arrays of characters up to a specified length. The first argument is the first array to be compared (corresponding to our string), the second argument is the second array to be compared (corresponding to the string Str), and the third argument is the length for which the two arrays are to be compared. The return value from memcmp is calculated by the following rules:

  1. It's less than 0 if the first array would precede the second in the dictionary, considering only the length specified;

  2. It's 0 if they are the same up to the length specified;

  3. It's greater than 0 if the first array would follow the second in the dictionary, considering only the length specified.

This is very convenient for us, because if the return value from memcmp is less than 0, we know that our result will be true, while if the return value from memcmp is greater than 0, then our result will be false. The only complication, which isn't very complicated, is that if the return value from memcmp is 0, meaning that the two arrays are the same up to the length of the shorter character array, we have to see which is longer. If the first one is shorter, then it precedes the second one; therefore, our result is true. Otherwise, it's false.
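Here is a minimal sketch of those three kinds of return value in action. The helper SignOfMemcmp is hypothetical; it reduces the result to -1, 0, or +1 because only the sign of the value returned by memcmp is specified, not its size.

```cpp
#include <cassert>
#include <cstring>

// Reduces memcmp's result to -1, 0, or +1; only the sign of the value
// that memcmp returns is specified, not its magnitude.
int SignOfMemcmp(const char* a, const char* b, std::size_t Length)
{
  int Result = std::memcmp(a, b, Length);
  if (Result < 0)
    return -1;
  if (Result > 0)
    return 1;
  return 0;
}

void CheckMemcmpResults()
{
  assert(SignOfMemcmp("ape", "axes", 3) == -1);    // 'p' is less than 'x'
  assert(SignOfMemcmp("axes", "ape", 3) == 1);     // 'x' is greater than 'p'
  assert(SignOfMemcmp("post", "poster", 4) == 0);  // same for 4 characters

  // Unlike the C string functions, memcmp keeps going past a null byte.
  assert(SignOfMemcmp("ab\0c", "ab\0d", 4) == -1);
}
```

The third check is the case where the return value alone can't settle the question, which is why the code in Figure 8.29 goes on to compare the two lengths.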

Susan had some questions about this version of operator <, including why we had to go through the previous exercise if we could just use memcmp.

Susan: What is this? I suppose there was a purpose to all the confusing prior discussion if you have an easier way of defining operator <? UGH! This new stuff just pops up out of the blue! What is going on? Please explain the reason for the earlier torture.

Steve: I thought we should examine the character-by-character version of operator < before taking the shortcut. That should make it easier to follow the explanation of the "string overrun" problem, as each character comparison shows up in the code.

Susan: So, memcmp is another library function, and does it stand for memory compare? Also, are the return values built into memcmp? This is very confusing, because you have return values in the code.

Steve: Yes, memcmp stands for "memory compare". As for return values, yes, it has them, but they aren't exactly the ones that we want. We have to return the value true for "less than" and false for "not less than", which aren't the values that memcmp returns. Also, memcmp doesn't do the whole job when the strings aren't the same length; in that case, we have to handle the trailing part of the longer string manually.

One small point that shouldn't be overlooked is that in this version of the operator < code, we have more than one return statement; in fact, we have four! That's perfectly legal and should be clear to a reader of this function. It's usually not a good idea to scatter return statements around in a large function, because it's easy to overlook them when trying to follow the flow of control through the function. In this case, though, that's not likely to be a problem; any reasonably fluent reader of C++ code will find this organization easy to understand.

Implementing operator ==

Although our current task requires only operator <, another comparison operator, operator ==, will make an interesting contrast in implementation; in addition, a concrete data type that allows comparisons should really implement more than just operator <. Since we've just finished one comparison operator, we might as well knock this one off now (Figure 8.30).

This function is considerably simpler than the previous one. Why is this, since they have almost the same purpose? It's because in this case we don't care which of the two strings is greater than the other, just whether they're the same or different. Therefore, we don't have to worry about comparing the two char arrays if they're of different lengths. Two arrays of different lengths can't be the same, so we can just return false. Once we have determined that the two arrays are the same length, we do the comparison via memcmp. This gives us the answer directly, because if Result is 0, then the two strings are equal; otherwise, they're different.

Figure 8.30. Implementing operator == for strings (from codestring5.cpp)
bool string::operator == (const string& Str)
{
  short Result;

  if (m_Length != Str.m_Length)
      return false;

  Result = memcmp(m_Data,Str.m_Data,m_Length);

  if (Result == 0)
      return true;

  return false;
}

Even though this function is simpler than operator <, it's not simple enough to avoid Susan's probing eye:

Susan: Does == only check to see if the lengths of the arrays are the same? Can it not ever be used for a value?

Steve: It compares the values in the arrays, but only if they are the same length. Since all it cares about is whether they are equal, and arrays of different length can't be equal, it doesn't have to compare the character data unless the arrays are of the same length.

Implementation vs. Declaration Revisited

Before moving on to see how we will display a string on the screen via operator <<, I should bring up a couple of points here because otherwise they might pass you by. First, we didn't have to change our interface header file string5.h (Figure 8.21) just because we changed the implementation of operator < between string5a.cpp and string5.cpp. Since the signature of this function didn't change, neither the header file nor the user program had to change. Second, we didn't even implement operator == in the string5a.cpp version of the string library and yet our test program still compiled without difficulty. How can this be?

In C++, you can declare all of the functions you want to, whether they are member functions or global functions, without actually defining them. As long as no one tries to actually use the functions, everything will work fine. In fact, the compiler doesn't even care whether any functions you do refer to are available; that's up to the linker to worry about. This is very handy when you know that you're going to add functions in a later revision of a class, as was the case here. Of course, you should warn your class users if you have listed functions in the interface header file that aren't available. It's true that they'll find out about the missing functions the first time they try to link a program that uses one of these functions, because the linker will report that it can't find the function; however, if they've spent a lot of time writing a program using one of these functions, they're likely to get mad at you for misleading them. So let them know what's actually implemented and what's "for later".
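To see this behavior in miniature, here is a sketch using a hypothetical Counter class (it is not part of the string library). One member function is declared but never defined, and the code still compiles and links because nothing calls it.

```cpp
// Hypothetical class used only for this illustration.
class Counter
{
public:
  Counter() : m_Count(0) {}
  void Increment() { m_Count ++; }          // declared and defined
  short Value() const { return m_Count; }   // declared and defined
  void Reset();                             // declared only; "for later"
private:
  short m_Count;
};

// This function never refers to Reset, so the missing definition
// causes no trouble for the compiler or the linker.
short CountTwice()
{
  Counter c;
  c.Increment();
  c.Increment();
  return c.Value();
}
```

If CountTwice were changed to call c.Reset(), the compiler would still accept it, but the linker would then complain that it can't find Counter::Reset.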

Now let's continue with our extensions to the string class, by looking at how we send a string out to the screen.

Using cout With User-defined Types

We've been using cout and its operator << for a while, but have taken them for granted. Now we have to look under the hood a bit.

The first question is what type of object cout is. The answer is that it's an ostream (short for "output stream"), which is an object that you can use to send characters to some output device. I'm not sure of the origin of this term, but you can imagine that you are pushing the characters out into a "stream" that leads to the output device.

As you may recall from our uses of cout, you can chain a bunch of << expressions together in one statement, as in Figure 8.31. If you compile and execute that program, it will display:

On test #1, your mark is: A

Notice that it displays the short as a number and the char as a letter, just as we want it to do. This desirable event occurs because there's a separate version of << for each type of data that can be displayed; in other words, operator << uses function overloading, just like the constructors for the StockItem class and the string class. We'll also use function overloading to add support for our string class to the I/O facilities supplied by the iostream library.

Figure 8.31. Chaining several operator << expressions together (codecout1.cpp)
#include <iostream>
using namespace std;

int main()
{
  short x;
  char y;

  x = 1;
  y = 'A';

  cout << "On test #" << x << ", your mark is: " << y << endl;

  return 0;
}
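The way the compiler picks a version of << according to the type of the value can be seen in isolation with a pair of overloaded functions. This is just a sketch: the name Shown is an assumption made for the example, and it writes to a std::ostringstream (an ostream that collects its output in memory) so that the result can be examined as text.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: what does << produce for a short?
std::string Shown(short value)
{
  std::ostringstream os;
  os << value;        // selects the version of << that takes a short
  return os.str();
}

// Same name, different argument type: what does << produce for a char?
std::string Shown(char value)
{
  std::ostringstream os;
  os << value;        // selects the version of << that takes a char
  return os.str();
}
```

Given short x = 65; and char y = 'A';, Shown(x) produces the text 65 and Shown(y) produces the text A, even though both variables hold the same numeric value.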

How cout Works With Pre-existing Types

Before we examine how to accomplish this goal, though, we'll have to go into some detail about how the pre-existing output functions behave. Let's start with a simple case using a version of operator << supplied by the iostream header file. The simplest possible use of ostream's operator <<, of course, uses only one occurrence of the operator. Here's an example where the value is a char:

cout << 'a';

As you may remember, using an operator such as << on an object is always equivalent to a "normal" function call. This particular example is equivalent to the following:

cout.operator << ('a'),

which calls ostream::operator << (char) (i.e., the version of the operator << member function of the ostream class that takes a char as its input) for the predefined destination cout, which writes the char on the screen.

That takes care of the single occurrence of operator <<. However, as we've already seen, it's possible to string together any number of occurrences of operator <<, with the output of each successive occurrence following the output created by the one to its left. We want our string output function to behave just like the ones predefined in iostream, so let's look next at an example that illustrates multiple uses of operator <<, taking a char and a C string:

cout << 'a' << " string";

This is equivalent to

(cout.operator << ('a')).operator << (" string");

What does this mean? Well, since an expression in parentheses is evaluated before anything outside the parentheses, the first thing that happens is that ostream::operator << (char) is called for the predefined destination cout, which writes the 'a' to the screen. Now here's the tricky part: the return value from every version of ostream::operator << is a reference to the ostream that it operates on (cout, in this case). Therefore, after the 'a' has been written on the screen, the rest of the expression reduces to this:

cout.operator << (" string");

That is, the next output operation behaves exactly like the first one. In this case, ostream::operator << (char*) is the function called, because char* is the type of the argument to be written out. It too returns a reference to the ostream for which it was called, so that any further << calls can add their data to that same ostream. It should be fairly obvious how the same process can be extended to handle any number of items to be displayed.
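The equivalence between the chained operator notation and the nested function calls can be checked with a short sketch. Two assumptions are worth stating: the function names are invented for this example, and the values written are ints rather than a char and a C string. The reason for the second choice is that current standard libraries supply the char and C string versions of << as non-member functions, whereas the int version really is a member function of ostream, so it can be called with the explicit member syntax.

```cpp
#include <sstream>
#include <string>

// Operator notation: two << expressions chained in one statement.
std::string WithOperators()
{
  std::ostringstream os;
  os << 1 << 2;
  return os.str();
}

// The same statement written as explicit member function calls.
// The inner call returns a reference to os, and the outer call
// is then applied to that same stream.
std::string WithFunctionCalls()
{
  std::ostringstream os;
  (os.operator << (1)).operator << (2);
  return os.str();
}
```

Both functions produce the same text, 12, because each call hands the stream back for the next one to use.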

Writing Our Own Standard Library-Compatible operator <<

That illustrates how the designers of ostream could create member functions that would behave in this convenient way. However, we can't use the same mechanism that they did; we can't modify the definition of the ostream class in the library, because we didn't write it in the first place and don't have access to its source code.[7] Is there some way to give our strings convenient input and output facilities?

[7] Even if we did have the source code to the ostream class, we wouldn't want to modify it, for a number of reasons. One excellent reason is that every time a new version of the library came out, we'd have to make our changes again. Also, there are other ways to reuse the code from the library for our own purposes using mechanisms that we'll get to later in this book, although we won't use them with the iostream classes.

In fact, there is. To do this, we create a global function called operator << that accepts an ostream& (that is, a reference to an ostream), adds the contents of our string to the ostream, and then returns a reference to the same ostream. This will support multiple occurrences of operator << being chained together in one statement, just like the operator << member functions from the iostream library. The implementation of this function is shown in Figure 8.32.

As usual, we should first examine the function declaration; in this case, a couple of points are worth noting. We've already seen that the first argument is an ostream&, to which we will add the characters from the string that is the second argument. Also notice that the second argument is a const string&, that is, a reference to a constant string. This is the best way to declare this argument because we aren't going to change the argument, and there's no reason to make a copy of it.

Figure 8.32. An operator << function to output a string (from codestring5.cpp)
std::ostream& operator << (std::ostream& os, const string& Str)
{
  short i;

  for (i=0; i < Str.m_Length-1; i ++)
     os << Str.m_Data[i];

  return os;
}

But possibly the most important point about the function declaration is that this operator << is not a member function of the string class, which explains why it isn't called string::operator <<. It's a global function that can be called anywhere in a program that needs to use it, so long as that program has included the header file that defines it. Its operation is pretty simple. Since there is no ostream function to write out a specified number of characters from a char array, we have to call ostream::operator << (char) for each character in the array.

Therefore, we use the statement

os << Str.m_Data[i];

to write out each character from the array called m_Data that we use to store the data for our string on the ostream called os, which is just another name for the ostream that is the first argument to this function.

After all the characters have been written to the ostream, we return it so that the next operator << call in the line can continue producing output.

However, there's a loose end here. How can a global function, which by definition isn't a member function of class string, get at the internal workings of a string? We declared that m_Length and m_Data were private, so that they wouldn't be accessible to just any old function that wandered along to look at them. Is nothing sacred?

The friend Keyword

In fact, private data aren't accessible to just any function. However, operator << (std::ostream&, const string&) isn't just any function. Take a look at string5.h in Figure 8.21 to see why. The line we're interested in here is this one:

friend std::ostream& operator << (std::ostream& os, const string& Str);

The key word here is friend. We're telling the compiler that a function with the signature std::ostream& operator << (std::ostream&, const string&) is permitted to access the information normally reserved for member functions of the string class; i.e., anything that isn't marked public. It's possible to make an entire class a friend to another class; here, we're specifying one function that is a friend to this class.[8]

[8] The signature of the function is important here, as elsewhere in C++; this friend declaration would not permit a function with the same name and a different signature, for example std::ostream& operator << (std::ostream&, int), to access non-public members of string.
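Here is the whole arrangement in one self-contained sketch: a class with private data, a friend declaration, and a global operator << that uses that access. The Label class and the Display function are hypothetical stand-ins invented for this example, not our string class; in particular, this sketch assumes that m_Length counts only the characters actually stored.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical class used only for this illustration.
class Label
{
friend std::ostream& operator << (std::ostream& os, const Label& Item);
public:
  Label(const char* Text)
  {
    // Copy at most 15 characters, counting them as we go.
    m_Length = 0;
    while (m_Length < 15 && Text[m_Length] != 0)
      {
      m_Data[m_Length] = Text[m_Length];
      m_Length ++;
      }
  }
private:
  short m_Length;
  char m_Data[15];
};

// A global function, not a member of Label.
std::ostream& operator << (std::ostream& os, const Label& Item)
{
  short i;

  for (i = 0; i < Item.m_Length; i ++)
     os << Item.m_Data[i];   // permitted only because of the friend declaration

  return os;                 // lets the next << in the statement continue
}

// Chains our operator << with the ones supplied by the library.
std::string Display(const Label& First, const Label& Second)
{
  std::ostringstream os;
  os << First << ", " << Second;
  return os.str();
}
```

Calling Display(Label("Smith"), Label("Jones")) yields the text Smith, Jones. Removing the friend line from Label would make the compiler reject the accesses to m_Length and m_Data inside operator <<.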

You probably won't be surprised to learn that Susan had some questions about this operator. Let's see how the discussion went:

Susan: Let's start with friend. . . what is that?

Steve: A friend is a function or class that is allowed to access internals of this class, as though the friend were a member function. In other words, the private access specifier doesn't have any effect on friends.

Susan: What is an ostream? How is it related to istream?

Steve: An ostream is a stream that is used for output; streams can be either input (istream) or output (ostream).

Susan: Why does it have std:: in front of it?

Steve: Because we are specifying that we mean the ostream that is in the standard library. It's a good idea to avoid using declarations in commonly used header files, as I've explained previously, and this is another way of telling the compiler exactly which ostream we mean.

Susan: This stream character seems to have a lot of relatives.

Steve: You're right; there are lots of classes in the stream family, including istream, ostream, ifstream, and ofstream. And it really is a family, in the C++ sense at least; these classes are related by inheritance, which we'll get to in Chapter 9.

That explains why this global function can access our non-public data. But why did we have to create a global function in the first place, rather than just adding a member function to our string class?

Because a member function of a class has to be called for an object of that class, whose address then becomes the this pointer; in the case of the << operator, the class of the object is ostream, not string. Figure 8.33 is an example.

Figure 8.33. Why operator << has to be implemented via a global function
string x = "this is it";
cout << x;

The line cout << x; is the same as cout.operator << (x);. Notice that the object to which the operator << call is applied is cout, not x. Since cout is an ostream, not a string, we can't use a member function of string to do our output, but a global function is perfectly suitable.
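To make the problem concrete, consider what happens if we write operator << as a member function anyway. In the following sketch (the Tag class and the Backwards function are hypothetical, invented for this example), the object of our class has to appear on the left of the << and the stream on the right, which is the reverse of the normal usage.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical class with operator << written as a member function.
class Tag
{
public:
  Tag(char Letter) : m_Letter(Letter) {}

  // As a member function, this operator is called for a Tag object,
  // so the Tag is the left-hand operand and the ostream is the argument.
  std::ostream& operator << (std::ostream& os) const
  {
    os << m_Letter;
    return os;
  }
private:
  char m_Letter;
};

std::string Backwards()
{
  std::ostringstream os;
  Tag x('A');

  x << os;    // same as x.operator << (os); writing os << x would not compile

  return os.str();
}
```

The sketch works, in the sense that Backwards produces the text A, but nobody would want to write x << cout to display a value. A global function lets us keep the stream on the left, where users expect it.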

Reading a string from an istream

Now that we have an output function that will write our string variables to an ostream, such as cout, it would be very handy to have an input function that could read a string from an istream, such as cin. You might expect that this would be pretty simple now that we've worked through the previous exercise, and you'd be mostly right. As usual, though, there are a few twists in the path.

Let's start by looking at the code in Figure 8.34.

Figure 8.34. An operator >> function to input a string (from codestring5.cpp)
std::istream& operator >> (std::istream& is, string& Str)
{
  const short BUFLEN = 256;

  char Buf[BUFLEN];
  memset(Buf,0,BUFLEN);

  if (is.peek() == '\n')
      is.ignore();
  is.getline(Buf,BUFLEN,'\n');
  Str = Buf;

  return is;
}

The header is pretty similar to the one from the operator << function, which is reasonable, since they're complementary functions. In this case, we're defining a global function with the signature std::istream& operator >> (std::istream& is, string& Str). In other words, this function, called operator >>, has a first argument that is a reference to an istream, which is just like an ostream except that we read data from it rather than writing data to it. One significant difference between this function signature and the one for operator << is that the second argument is a non-const reference, rather than a const reference, to the string into which we want to read the data from the istream. That's because the whole purpose of this function is to modify the string passed in as the second argument; to be exact, we're going to fill it in with the characters taken out of the istream.

Continuing with the analysis of the function declaration, the return value is another istream reference, which is passed to the next operator >> function to the right, if there is one; otherwise it will just be discarded.
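The role of the two reference arguments and the returned istream can be tried out with a sketch that reads from a std::istringstream (an istream whose data comes from text in memory) instead of the keyboard. The Name class and the ReadTwo function are hypothetical stand-ins invented for this example; Name simply keeps its text in a std::string so that the sketch stays short.

```cpp
#include <iostream>
#include <sstream>
#include <string>

// Hypothetical class standing in for our string class.
class Name
{
friend std::istream& operator >> (std::istream& is, Name& Item);
public:
  std::string Text() const { return m_Text; }
private:
  std::string m_Text;
};

std::istream& operator >> (std::istream& is, Name& Item)
{
  const short BUFLEN = 256;
  char Buf[BUFLEN] = {0};

  if (is.peek() == '\n')     // if the next character is a newline, discard it
      is.ignore();
  is.getline(Buf,BUFLEN,'\n');
  Item.m_Text = Buf;         // we change the second argument, so it can't be const

  return is;                 // hands the stream to the next >> on the right
}

// Reads two names, one per line, in a single chained statement.
std::string ReadTwo(const std::string& Input)
{
  std::istringstream is(Input);
  Name First;
  Name Second;

  is >> First >> Second;

  return First.Text() + "/" + Second.Text();
}
```

For example, ReadTwo("Smith\nJones\n") yields Smith/Jones: the first >> fills in First and returns the stream, and the second >> then reads the following line into Second.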

After decoding the header, let's move to the first line in the function body, const short BUFLEN = 256;. While we've encountered const before, specifying that we aren't going to change an argument passed to us, that can't be the meaning here. What does const mean in this context?

It specifies that the item being defined, which in this case is short BUFLEN, isn't a variable, but a constant, or const value. That is, its value can't be changed. Of course, a logical question is how we can use a const, if we can't set its value.[9]

[9] In case you were wondering how I came up with the name BUFLEN, it's short for "buffer length". Also, I should mention the reason that it is all caps rather than mixed case or all lower case: an old C convention (carried over into C++) specifies that named constants should be named in all caps to enable the reader to distinguish them from variables at a glance.
