Case-Insensitive Searching

Now we're ready to discuss the first regular member function in the xstring class: find_nocase. We need this function to determine whether a given xstring contains a particular sequence of characters. For example, if we have an xstring containing the value “red, blue, and green”, describing the colors of a sofa, we want to be able to determine whether the letters “b”, “l”, “u”, and “e” appear consecutively in that string. If they do, it is sometimes also useful to know where that sequence of characters starts in the xstring.

Susan wanted to know why we would need to know where a sequence of characters was found in a string:

Susan: I understand why we need to know if we can find some characters in a string, but why would we care where they were?

Steve: Well, what if we wanted to change all the occurrences of “blue” to “purple” because we were changing our decor? In that case, there are string functions we could use to take apart a string, remove “blue”, replace it with “purple”, and put it back together. But to do that we would need to know where we had to take the string apart; i.e., at what point in the string the characters “blue” appeared.

Because this new function finds a sequence of chars in an xstring, its name should be something like find. Because it is going to be case-insensitive (e.g., RED, Red, and red will all be considered equivalent), we'll call it find_nocase.[6]

[6] I made this function case-insensitive after watching Susan's attempt to use a version of the HomeInventory example program that used a case-sensitive searching function. This illustrates why it is absolutely necessary to watch a (hopefully representative) user of a program actually try it before making the assumption that it is “ready for prime time”.

What exactly do I mean by “case-insensitive”? That when this new find_nocase function looks for a sequence of chars within an xstring, it considers upper- and lower-case letters equivalent. We will employ this function in the HomeInventory class functions that will enable us to, for example, search through the home inventory for all HomeItem objects containing the word “purple” in their description fields. Before we look at how this is implemented, let's see how it can be used (Figure 12.19).

Figure 12.19. Using xstring::find_nocase (codexstrtstb.cpp)
#include <iostream>
#include <string>
#include "xstring.h"
using namespace std;

int main()
{
  xstring x = "purple";
  xstring y = "A Purple couch";
  short where;

  where = y.find_nocase(x);
  cout << "The string " << x <<
    " can be found starting at position " <<
    where << " in the string " << endl;
  cout << y << "." << endl;

  where = x.find_nocase("rp");
  cout << "The string 'rp' can be found starting at position " <<
    where << " in the string " << x << "." << endl;

  where = x.find_nocase("rpx");
  cout << "The string 'rpx' can be found starting at position " <<
    where << " in the string " << x << "." << endl;

  return 0;
}

This program starts out by defining some xstring variables called x and y and initializing them to the values “purple” and “A Purple couch”, respectively. Then it defines a short value called where that will hold the result of each search for an included sequence of chars. The next line, where = y.find_nocase(x);, calls the find_nocase member function of the xstring class to locate an occurrence of the value “purple” in the xstring y, which has the value “A Purple couch”. The next three lines display the results of that search; as you can see, the return value of this function is equal to the position in xstring y where the string to be found, “purple”, was indeed found, even though the word “Purple” was capitalized in the xstring we were looking through.

The other two similar sequences search the same xstring value (in y) for the literal values “rp” and “rpx”, respectively, and display the results of these searches. The first of these is very similar to the previous search for the word “purple”, but serves to point out that we don't have to search for a word — any sequence of characters will do.

The last sequence, however, is somewhat different because we are searching for a literal value (“rpx”) that is not present in the xstring we're examining (“A Purple couch”). The question, of course, is what value the find_nocase function should return when this happens. Perhaps the most obvious possibility is 0, but that is unfortunately not appropriate because it violates the C and C++ convention that the first position of a string-like variable is considered position 0; that is, the return value 0 would signify that the string we were searching for was found at the beginning of the xstring we were searching in. Therefore, find_nocase returns the value –1 to indicate that the desired value has not been found in the xstring being examined.

Susan wanted to know how I picked -1 to indicate that we couldn't find a particular sequence of characters:

Susan: How did you come up with the number -1 to mean “not found”?

Steve: Well, what number should mean that? Zero isn't a good choice, because that would mean we had found the desired sequence of characters at the beginning of the target xstring. Remember, in C++, we start counting at 0. So I had to pick a number that couldn't possibly be a position in an xstring, and the standard convention in C++ is to use -1 to mean “not a valid position”.

Now that we've seen how to use find_nocase, let's take a look at its implementation, which is shown in Figure 12.20.

Figure 12.20. The implementation of xstring::find_nocase (from codexstring.cpp)
short xstring::find_nocase(const xstring& Str)
{
 short i;
 short thislength = size();
 short strlength = Str.size();
 const char* thisdata = c_str();
 const char* strdata = Str.c_str();

 for (i = 0; i < thislength-strlength+1; i ++)
  {
  if (strnicmp(thisdata+i,strdata,strlength) == 0)
    return i;
  }

 return -1;
}

To make the discussion simpler, let's call the xstring that might contain the desired value (i.e., the one pointed to by this) the target string and the argument xstring Str the search string. So this function starts out by defining some variables called thislength and strlength to hold the actual number of chars in the target and the search xstrings, respectively. Then it uses a special function of the standard library string class called c_str() to set the variables thisdata and strdata to the addresses of the char data for the target xstring and the search xstring, respectively.[7]

[7] According to the C++ standard, the address returned by the c_str function may not actually be the address of the char data for the string itself, but the address of a copy of that data. However, this implementation detail doesn't affect us.

Now we get to the heart of the function: the loop that uses strnicmp to compare each possible section of the target xstring with the search xstring we're looking for. We haven't discussed the strnicmp function yet, but it's quite similar to memcmp, with three differences:[8]

[8] We've discussed memcmp under the heading “Using a Standard Library Function to Simplify the Code” on page 524 in Chapter 8.

  1. strnicmp ignores case in its comparison, so that (for example) RED, Red, and red all compare as equal.

  2. strnicmp is a C string function rather than a C memory manipulation function like memcmp, so it stops when it encounters a null byte.

  3. strnicmp isn't part of the C++ standard library. However, it is very commonly supplied by compiler manufacturers, so that shouldn't cause you much trouble.

The first of these characteristics of strnicmp is the reason that we have to use strnicmp rather than memcmp, which is case-sensitive. The second characteristic isn't an advantage when dealing with xstrings, which can theoretically contain null bytes. However, this isn't a problem in the find_nocase function, as that function applies only to ASCII text that doesn't contain null bytes anyway.

Now let's get back to the discussion of find_nocase. On the first time through the loop, the value of i is 0; therefore, the function call strnicmp(thisdata+i,strdata,strlength) compares strlength bytes from the beginning of the target xstring to the same number of bytes in the search xstring. If the two sets of bytes are equal (ignoring case), the result of the comparison is 0, in which case we have found what we were looking for, and so we exit the loop.

On the other hand, if the result of the comparison is not 0, we have to keep looking. The next step is to increment the value of the loop index i. The second time through the loop, the value of i is 1, so strnicmp(thisdata+i,strdata,strlength) compares strlength bytes starting at the second byte of the target xstring with the same number of bytes starting at the beginning of the search xstring. If this comparison is successful, we stop and indicate success; if not, we continue executing the loop until we find a match or run out of data in the target xstring.

Let's look at an example in more detail. Suppose we are searching through the target xstring “A Purple couch”, looking for the search xstring “purple”. The first time through the loop, we compare the first 6 bytes in the target xstring to the 6 bytes in the search xstring. Since the first byte of the target xstring is 'A' and the first byte of the search xstring is 'p', strnicmp returns a non-zero value to let us know that we haven't yet found a match. Therefore, we have to re-execute the loop. The second time through, we start the comparison at the second byte of the target xstring and the first byte of the search xstring; the second byte of the target xstring is a space, which isn't the same as the 'p' from the search xstring, so strnicmp returns a non-zero value to let us know that we still haven't found the search xstring. The third time through the loop, we start the comparison at the third byte of the target xstring and (as always) the first byte of the search xstring. Both of these have the value 'p' (if we ignore case), so strnicmp continues by comparing the fourth byte of the target xstring with the second byte of the search xstring. Those also match, so strnicmp continues to compare the rest of the bytes in the two strings until it gets to the end of the search xstring. This time they all match, so strnicmp returns 0 to let us know that we have found the search xstring.

Of course, the other possibility is that the search xstring isn't present in the target xstring. In that case, strnicmp won't return 0 on any of these passes through the loop, so eventually i will exceed its limit, causing the loop to stop executing. However, there's one thing I haven't explained yet: how we calculate the maximum number of times that we have to execute the loop. If we look at the for loop, we see that the continuation expression is i < thislength – strlength + 1. Is this the right limit for i, and if so, why?

Well, if the target xstring is the same length as the search xstring, then we know that we have to execute the loop only once because there's only one possible place to start comparing two strings of the same length — at the beginning of both strings. If we start i at 0 on the first time through the loop, it will be 1 at the beginning of the second time through the loop, so thislength – strlength + 1 gives the correct limit (of 1) if thislength and strlength have the same value. This demonstrates that the expression thislength – strlength + 1 is correct for the case where the search and target strings are the same length. Now, what about the case where the target xstring is 1 byte longer than the search xstring? In that case, there is one extra position in which the search xstring could be found — namely, starting at the second character of the target xstring. Continuing with this analysis, each additional character in the target xstring beyond the length of the search xstring adds one possible position in which the search xstring might be found in the target xstring, and therefore adds 1 to the number of times we might have to go through the loop. Since adding 1 to the value of thislength will add 1 to the value of the expression thislength – strlength + 1, that expression will produce the correct limit for any value of thislength and strlength.[9]

[9] If this comparison procedure still isn't clear to you, you might want to try drawing a diagram of a search string and target string with appropriate contents, and tracing execution of find_nocase. The reason I haven't provided such diagrams is that they would really need to be animated to be of much use, and I'm afraid that book publishing technology isn't quite up to that yet!

There's one more function in the xstring class that we need to discuss: less_nocase. Its code is shown in Figure 12.21.

Figure 12.21. The less_nocase function (from codexstring.cpp)
bool xstring::less_nocase(const xstring& Str)
{
 short Result;
 short CompareLength;

 short thislength = size();
 short strlength = Str.size();
 const char* thisdata = c_str();
 const char* strdata = Str.c_str();

 if (strlength < thislength)
     CompareLength = strlength;
 else
     CompareLength = thislength;

 Result = strnicmp(thisdata,strdata,CompareLength);

 if (Result < 0)
     return true;

 if (Result > 0)
     return false;

 if (thislength < strlength)
     return true;

 return false;
}

This function is almost exactly the same as the operator < function in the string class that we created earlier in this book, except that it uses the strnicmp function rather than the memcmp function used in that implementation of operator <.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.
Reset