Chapter 8. Regular Expressions

Regular expressions provide an extremely powerful text-processing system that every programmer should at least be familiar with. This chapter is not a full tutorial on regular expression syntax, but if you are new to them, this will get you started with some useful tips.

Search Text

Solution: Using regular expressions gives you the power to find complex patterns in text. .NET’s regular expression classes can be found in the System.Text.RegularExpressions namespace. A text search in its most fundamental form looks like this:

[Code listing not reproduced in this copy.]

Here’s the output:

"we" at position 8
"we" at position 22

A simple search like this isn’t taking advantage of regular expressions, however. Here’s a more interesting one:

//find all words 7 characters or longer
Regex regex = new Regex("[a-zA-Z]{7,}");
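A complete program around this pattern might look like the following. It is a minimal sketch rather than the original listing: the source sentence is an assumption inferred from the match positions shown in this section, and SearchDemo and Find are hypothetical names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class SearchDemo
{
    // Returns one line per match, formatted like the output in this section.
    public static List<string> Find(string source, string pattern)
    {
        var lines = new List<string>();
        foreach (Match match in Regex.Matches(source, pattern))
        {
            lines.Add(string.Format("\"{0}\" at position {1}", match.Value, match.Index));
        }
        return lines;
    }

    public static void Main()
    {
        // Assumed source text (inferred from the match positions, not from the original listing).
        string source = "We few, we happy few, we band of brothers";

        // Find all words 7 characters or longer.
        foreach (string line in Find(source, "[a-zA-Z]{7,}"))
        {
            Console.WriteLine(line);
        }
    }
}
```

Passing the literal pattern "we" to Find instead returns the two matches from the first search, at positions 8 and 22. The search is case-sensitive by default, so the leading "We" is not reported.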

And here’s the output:

"brothers" at position 33

Extract Groups of Text

Solution: Let’s extract all the street names:

[Code listing not reproduced in this copy.]

The following output is produced:

[Sample output not reproduced in this copy.]
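Named groups are one way to pull a specific piece out of each match. The sketch below is illustrative only: the sample addresses, the pattern, and the names GroupDemo and StreetNames are assumptions, not the original listing.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class GroupDemo
{
    // Assumed address shape: a house number, whitespace, then the street name.
    private static readonly Regex Address = new Regex(@"^\d+\s+(?<street>.+)$");

    // Returns the "street" group of every address that matches the pattern.
    public static List<string> StreetNames(IEnumerable<string> addresses)
    {
        var names = new List<string>();
        foreach (string address in addresses)
        {
            Match match = Address.Match(address);
            if (match.Success)
            {
                names.Add(match.Groups["street"].Value);
            }
        }
        return names;
    }

    public static void Main()
    {
        // Hypothetical sample data; the last entry has no house number and is skipped.
        string[] addresses = { "1234 Maple Street", "56 Elm Avenue", "Unnumbered Lane" };
        foreach (string name in StreetNames(addresses))
        {
            Console.WriteLine(name);
        }
    }
}
```

The group is read by name through match.Groups["street"], which tends to stay readable as a pattern grows; numbered groups (match.Groups[1]) would work as well.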

Replace Text

Solution: You can do a straight replacement of text with regular expressions, like this:

[Code listing not reproduced in this copy.]
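A minimal sketch of a straight replacement with Regex.Replace follows. The source sentence, the replacement word, and the names ReplaceDemo and ReplaceWe are assumptions chosen for illustration.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ReplaceDemo
{
    // Replaces every whole-word "we" (in any casing) with the given text.
    public static string ReplaceWe(string source, string replacement)
    {
        Regex regex = new Regex(@"\bwe\b", RegexOptions.IgnoreCase);
        return regex.Replace(source, replacement);
    }

    public static void Main()
    {
        // Assumed source text and a hypothetical replacement word.
        string source = "We few, we happy few, we band of brothers";
        Console.WriteLine(ReplaceWe(source, "they"));
    }
}
```

Note that the replacement string is itself a small pattern language: a "$" followed by a group number or name is treated as a substitution, so a literal dollar sign has to be written as "$$".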

There is a more powerful option, however. By using a MatchEvaluator, you can perform complex replacements that depend on the value of each match. Here is a simple example in which the word "we" is swapped with the word that follows it:

[Code listing not reproduced in this copy.]

This produces the following output:

result: few we, happy we few, band we of brothers
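One way to arrive at this output is sketched below. It is a reconstruction under assumptions, not the original listing: the source sentence, the pattern, the group name "other", and the method names are all hypothetical.

```csharp
using System;
using System.Text.RegularExpressions;

public static class EvaluatorDemo
{
    // Called once per match; the returned string replaces the whole match.
    private static string SwapOrder(Match match)
    {
        return match.Groups["other"].Value + " we";
    }

    public static string SwapWeWithNextWord(string source)
    {
        // "we", a single space, then the following word captured as "other".
        Regex regex = new Regex(@"\bwe (?<other>\w+)", RegexOptions.IgnoreCase);
        return regex.Replace(source, new MatchEvaluator(SwapOrder));
    }

    public static void Main()
    {
        // Assumed source text.
        string source = "We few, we happy few, we band of brothers";
        Console.WriteLine("result: " + SwapWeWithNextWord(source));
    }
}
```

Because the evaluator always emits a lowercase "we", the capitalized "We" at the start of the sentence comes out in lowercase, which is consistent with the output above.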

Match and Validate

Solution: This section contains some common validation expressions. The general usage, from which the sample output is taken, is as follows:

[Code listing not reproduced in this copy.]

To see the full example, look at the MatchAndValidate project in this chapter’s accompanying source code.

Social Security Number

Validating these numbers is fairly simple. They are expressed as nine digits, optionally separated into groups with hyphens.

Regex regex = new Regex(@"^\d{3}-?\d{2}-?\d{4}$");

Sample output:

[Sample output not reproduced in this copy.]
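As a rough illustration of how such a pattern behaves, the sketch below runs a few inputs through Regex.IsMatch, using the digit class \d in the pattern. The sample values, the output format, and the names SsnDemo and IsValid are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class SsnDemo
{
    // Nine digits in groups of 3, 2, and 4; each hyphen is optional.
    private static readonly Regex Ssn = new Regex(@"^\d{3}-?\d{2}-?\d{4}$");

    public static bool IsValid(string input)
    {
        return Ssn.IsMatch(input);
    }

    public static void Main()
    {
        // Hypothetical sample inputs: the first two are valid, the last two are not.
        string[] samples = { "123-45-6789", "123456789", "12-345-6789", "123-45-678" };
        foreach (string sample in samples)
        {
            Console.WriteLine("{0}: {1}", sample, IsValid(sample) ? "valid" : "invalid");
        }
    }
}
```

The ^ and $ anchors matter here: without them, any longer string that merely contains nine suitably grouped digits would also pass.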

Phone Number

Phone numbers are also commonly validated. The next example validates a standard 10-digit US telephone number.

[Code listing not reproduced in this copy.]

Sample output:

[Sample output not reproduced in this copy.]
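The exact expression is not shown here, so the following is only one plausible pattern for a 10-digit US number, not the original. It assumes the area code may be written as "(555)" or "555" and that a space, dot, or hyphen may separate the groups; PhoneDemo and IsValid are hypothetical names.

```csharp
using System;
using System.Text.RegularExpressions;

public static class PhoneDemo
{
    // Assumed pattern: "(555)" or "555", then 3 digits and 4 digits,
    // with an optional space, dot, or hyphen between the groups.
    private static readonly Regex Phone =
        new Regex(@"^(\(\d{3}\)\s?|\d{3}[-. ]?)\d{3}[-. ]?\d{4}$");

    public static bool IsValid(string input)
    {
        return Phone.IsMatch(input);
    }

    public static void Main()
    {
        // Hypothetical sample inputs; the last one has only seven digits.
        string[] samples = { "(555) 123-4567", "555-123-4567", "5551234567", "555-1234" };
        foreach (string sample in samples)
        {
            Console.WriteLine("{0}: {1}", sample, IsValid(sample) ? "valid" : "invalid");
        }
    }
}
```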

Zip Codes

ZIP codes can be either 5 digits or 9 digits. In the 9-digit form, a hyphen before the last four digits is optional.

Regex regex = new Regex(@"^\d{5}(-?\d{4})?$");

Sample output:

[Sample output not reproduced in this copy.]
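A short sketch of the ZIP code pattern in use, with the digit class \d in the expression. The sample values, the output format, and the names ZipDemo and IsValid are assumptions.

```csharp
using System;
using System.Text.RegularExpressions;

public static class ZipDemo
{
    // Five digits, optionally followed by four more with an optional hyphen.
    private static readonly Regex Zip = new Regex(@"^\d{5}(-?\d{4})?$");

    public static bool IsValid(string input)
    {
        return Zip.IsMatch(input);
    }

    public static void Main()
    {
        // Hypothetical sample inputs: the first three are valid, the last two are not.
        string[] samples = { "12345", "12345-6789", "123456789", "1234", "12345-678" };
        foreach (string sample in samples)
        {
            Console.WriteLine("{0}: {1}", sample, IsValid(sample) ? "valid" : "invalid");
        }
    }
}
```

The parentheses followed by "?" make the entire four-digit extension optional as a unit, so a partial extension such as "12345-678" is rejected.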

Dates

This regular expression validates US-format dates in the form MM/DD/YYYY.

[Code listing not reproduced in this copy.]

Sample output:

[Sample output not reproduced in this copy.]
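The original expression is not shown here. One plausible pattern for MM/DD/YYYY is sketched below, under the assumption that two-digit months and days are required. It checks only the shape of the input, so the sketch pairs it with DateTime.TryParseExact to reject impossible dates; DateDemo, HasDateShape, and IsRealDate are hypothetical names.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

public static class DateDemo
{
    // Assumed pattern: month 01-12, day 01-31, four-digit year, slash-separated.
    private static readonly Regex UsDate =
        new Regex(@"^(0[1-9]|1[0-2])/(0[1-9]|[12][0-9]|3[01])/[0-9]{4}$");

    // Shape check only: accepts 02/30/2009 because 30 is a permitted day number.
    public static bool HasDateShape(string input)
    {
        return UsDate.IsMatch(input);
    }

    // Calendar check: true only if the text names a real date.
    public static bool IsRealDate(string input)
    {
        DateTime parsed;
        return DateTime.TryParseExact(input, "MM/dd/yyyy", CultureInfo.InvariantCulture,
                                      DateTimeStyles.None, out parsed);
    }

    public static void Main()
    {
        // Hypothetical sample inputs.
        string[] samples = { "12/25/2009", "13/01/2009", "02/30/2009" };
        foreach (string sample in samples)
        {
            Console.WriteLine("{0}: shape={1}, real date={2}",
                              sample, HasDateShape(sample), IsRealDate(sample));
        }
    }
}
```

Encoding month lengths and leap years in the expression itself is possible but quickly becomes hard to read, which is why the calendar check is delegated to the framework in this sketch.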

Match an Email (or Not)

So far, we’ve seen some simple examples of regular expression validation, but what about something a little more complex, such as an email address? As it happens, this isn’t a little more complex—it’s a lot more complex! You will see that as the syntax gets more complex, regular expressions can become quite large and unwieldy. In fact, it is not possible to do an absolutely correct email validation with regular expressions. That doesn’t stop people from trying, though. If you do a web search, you will come across many regular expressions for email addresses that range from a few lines to a few pages.

Beware of turning to such things as a solution. Chances are, with such an enormous regular expression, you will not be able to understand it once written. There is also the possibility that the data format is just too complex and the regular expression can’t capture all of it. Also, you do not want to be in the position of refusing to accept information from a user because she has an unusual email address. That’s a user who may not come back! (You will notice that most websites don’t validate email addresses anyway—they just ask the user to enter the address twice to make sure it’s correct.)

If you do need to use large, unreadable regular expressions, ensure that you have very thorough unit tests to cover your input cases. With unit tests, you’ll be able to modify the regular expression with confidence that you’re not breaking existing functionality.
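A table of inputs and expected results is one simple way to structure such tests. The sketch below uses no test framework so that it stays self-contained. The pattern is a deliberately permissive, hypothetical one (something@something.something) that makes no attempt at full email validation, and EmailPatternTests and Run are assumed names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class EmailPatternTests
{
    // Hypothetical, permissive pattern: no whitespace, exactly one "@",
    // and at least one dot somewhere after it.
    private static readonly Regex Email = new Regex(@"^[^@\s]+@[^@\s]+\.[^@\s]+$");

    // Returns a description of every case whose result differs from the expectation.
    public static List<string> Run()
    {
        var cases = new Dictionary<string, bool>
        {
            { "user@example.com", true },
            { "first.last+tag@sub.example.org", true },
            { "no-at-sign.example.com", false },
            { "two@@example.com", false },
            { "user@example", false },
            { "user@ example.com", false }
        };

        var failures = new List<string>();
        foreach (KeyValuePair<string, bool> testCase in cases)
        {
            if (Email.IsMatch(testCase.Key) != testCase.Value)
            {
                failures.Add(string.Format("\"{0}\": expected {1}", testCase.Key, testCase.Value));
            }
        }
        return failures;
    }

    public static void Main()
    {
        List<string> failures = Run();
        Console.WriteLine(failures.Count == 0
            ? "all cases pass"
            : string.Join(Environment.NewLine, failures.ToArray()));
    }
}
```

Whenever the expression changes, or a user reports an address that was wrongly rejected, a new row goes into the table and the whole set is run again.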

Help Regular Expressions Perform Better

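Solution: If you tell .NET to compile a regular expression to MSIL instead of interpreting it, matching runs a bit faster, especially for complex expressions. The trade-off is a longer construction time, so this pays off mainly for expressions that are reused many times. Here’s an example: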

Regex regex = new Regex(pattern, RegexOptions.Compiled);
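Because compilation happens when the Regex object is constructed, a compiled expression is usually created once and then reused. A minimal sketch, assuming the long-word pattern from earlier in the chapter and the hypothetical names CompiledDemo and CountLongWords:

```csharp
using System;
using System.Text.RegularExpressions;

public static class CompiledDemo
{
    // Constructed once and reused, so the compilation cost is paid a single time.
    private static readonly Regex LongWord =
        new Regex("[a-zA-Z]{7,}", RegexOptions.Compiled);

    public static int CountLongWords(string text)
    {
        return LongWord.Matches(text).Count;
    }

    public static void Main()
    {
        // Assumed sample text.
        Console.WriteLine(CountLongWords("We few, we happy few, we band of brothers"));
    }
}
```

Creating a new compiled Regex inside a frequently called method would repeat the compilation on every call and can easily cost more than it saves.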
