Regular expressions provide an extremely powerful text-processing system that every programmer should at least be familiar with. This chapter is not a full tutorial on regular expression syntax, but if you are new to them, this will get you started with some useful tips.
Solution: Using regular expressions gives you the power to find complex patterns in text. .NET’s regular expression classes can be found in the System.Text.RegularExpressions namespace. A text search in its most fundamental form looks like this:
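The listing itself is not reproduced here, so the following is only a minimal sketch of such a search. It assumes the sample text "We few, we happy few, we band of brothers", which is consistent with the positions in the output; the `TextSearch` class and `FindAll` helper are illustrative names.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class TextSearch
{
    // Returns one description per match of pattern in source.
    public static List<string> FindAll(string source, string pattern)
    {
        List<string> results = new List<string>();
        Regex regex = new Regex(pattern);
        foreach (Match match in regex.Matches(source))
        {
            // Match.Index is the zero-based position of the match in source.
            results.Add(string.Format("\"{0}\" at position {1}", match.Value, match.Index));
        }
        return results;
    }

    static void Main()
    {
        // Assumed sample text (not confirmed by the surrounding prose).
        string source = "We few, we happy few, we band of brothers";
        foreach (string line in FindAll(source, "we"))
        {
            Console.WriteLine(line);
        }
    }
}
```

The search is case sensitive by default, which is why the leading "We" is not reported.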
Here’s the output:
"we" at position 8
"we" at position 22
A simple search like this isn’t taking advantage of regular expressions, however. Here’s a more interesting one:
//find all words 7 characters or longer
Regex regex = new Regex("[a-zA-Z]{7,}");
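Those two lines can be dropped into a complete program. This is a minimal sketch, again assuming the sample text "We few, we happy few, we band of brothers"; `LongWords` and `Find` are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

class LongWords
{
    // Finds every run of 7 or more ASCII letters.
    public static MatchCollection Find(string source)
    {
        // {7,} means "at least 7"; the quantifier is greedy, so it consumes the whole word.
        Regex regex = new Regex("[a-zA-Z]{7,}");
        return regex.Matches(source);
    }

    static void Main()
    {
        // Assumed sample text.
        string source = "We few, we happy few, we band of brothers";
        foreach (Match match in Find(source))
        {
            Console.WriteLine("\"{0}\" at position {1}", match.Value, match.Index);
        }
    }
}
```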
And here’s the output:
"brothers" at position 33
Solution: Let’s extract all the street names:
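One way to do this is with a named group, which lets you pull a single part out of each match. The sketch below is an illustration only: the addresses are made up, and the pattern (a house number, a name, then one of three suffixes) is an assumption rather than the expression from the original listing.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class StreetNames
{
    // Hypothetical pattern: house number, street name (captured), then a suffix.
    // The lazy +? stops the name at the first suffix that follows it.
    static readonly Regex AddressRegex =
        new Regex(@"\d+\s+(?<street>[\w ]+?)\s+(?:St|Ave|Rd)\b");

    public static List<string> Extract(string addresses)
    {
        List<string> names = new List<string>();
        foreach (Match match in AddressRegex.Matches(addresses))
        {
            // Groups["street"] holds only the text captured by (?<street>...).
            names.Add(match.Groups["street"].Value);
        }
        return names;
    }

    static void Main()
    {
        // Made-up sample data, not the addresses from the original example.
        string addresses = "123 Main St, 4567 Oak Ave, 89 Elm Tree Rd";
        foreach (string name in Extract(addresses))
        {
            Console.WriteLine(name);
        }
    }
}
```

Real address data is far messier than this; the point is only how a group isolates part of a match.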
The following output is produced:
Solution: You can do a straight replacement of text with regular expressions, like this:
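A minimal sketch of a straight replacement, assuming the same sample text as before and an arbitrary replacement word ("you"); `SimpleReplace` and `ReplaceWe` are illustrative names.

```csharp
using System;
using System.Text.RegularExpressions;

class SimpleReplace
{
    // Replaces every standalone lowercase "we" with the given word.
    public static string ReplaceWe(string source, string replacement)
    {
        // \b anchors to word boundaries so "we" inside a longer word is left alone.
        Regex regex = new Regex(@"\bwe\b");
        return regex.Replace(source, replacement);
    }

    static void Main()
    {
        // Assumed sample text.
        string source = "We few, we happy few, we band of brothers";
        Console.WriteLine("result: {0}", ReplaceWe(source, "you"));
    }
}
```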
There is a more powerful option, however. By using a MatchEvaluator, you can have complex text replacements that depend on the value of the match found. Here is a simple example where the word we is swapped with the word that appears after it:
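A sketch of how such a swap might be written. The sample text, the pattern, and the group name `OtherWord` are assumptions, chosen so that the result agrees with the output reported for this example.

```csharp
using System;
using System.Text.RegularExpressions;

class SwapWords
{
    // Called once per match; the return value replaces the matched text.
    static string SwapOrder(Match m)
    {
        // Put whatever the other word was first, then "we".
        return m.Groups["OtherWord"].Value + " we";
    }

    public static string Swap(string source)
    {
        // Matches "we" or "We", one whitespace character, then the next word.
        Regex regex = new Regex(@"[wW]e\s(?<OtherWord>\w+)");
        return regex.Replace(source, new MatchEvaluator(SwapOrder));
    }

    static void Main()
    {
        // Assumed sample text.
        string source = "We few, we happy few, we band of brothers";
        Console.WriteLine("result: {0}", Swap(source));
    }
}
```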
This produces the following output:
result: few we, happy we few, band we of brothers
Solution: This section contains some common validation expressions. The general usage, from which the sample output is taken, is as follows:
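A minimal sketch of that kind of usage: build the expression once, then call `IsMatch` on each candidate string. The `Validator` class, the sample inputs, and the output format are all illustrative; the pattern is a ZIP code expression used here only as a stand-in.

```csharp
using System;
using System.Text.RegularExpressions;

class Validator
{
    // True if the whole input matches the pattern (the pattern must carry ^ and $).
    public static bool IsValid(string pattern, string input)
    {
        return Regex.IsMatch(input, pattern);
    }

    static void Main()
    {
        string pattern = @"^\d{5}(-?\d{4})?$";
        // Made-up inputs for illustration.
        string[] inputs = { "98052", "98052-1234", "9805", "abcde" };
        foreach (string input in inputs)
        {
            Console.WriteLine("{0} valid? {1}", input, IsValid(pattern, input));
        }
    }
}
```

The `^` and `$` anchors matter for validation: without them, `IsMatch` succeeds if the pattern is found anywhere inside the input.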
To see the full example, look at the MatchAndValidate project in this chapter’s accompanying source code.
Validating Social Security numbers is fairly simple. They are expressed as nine digits, optionally separated into groups of three, two, and four with hyphens.
Regex regex = new Regex(@"^\d{3}-?\d{2}-?\d{4}$");
Sample output:
Phone numbers are also commonly validated. The next example validates a standard 10-digit US telephone number.
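The expression for this example is not shown, so the pattern below is an assumption about what "standard" might reasonably cover: an optional parenthesized area code, and hyphens, dots, or spaces as optional separators. Treat it as a sketch, not the original expression.

```csharp
using System;
using System.Text.RegularExpressions;

class PhoneValidation
{
    // Hypothetical pattern: (425) 555-0123, 425-555-0123, 425.555.0123, 4255550123.
    static readonly Regex PhoneRegex =
        new Regex(@"^(\(\d{3}\)\s?|\d{3}[-.\s]?)\d{3}[-.\s]?\d{4}$");

    public static bool IsValidPhone(string input)
    {
        return PhoneRegex.IsMatch(input);
    }

    static void Main()
    {
        // Made-up inputs for illustration.
        string[] inputs = { "(425) 555-0123", "425-555-0123", "4255550123", "555-0123" };
        foreach (string input in inputs)
        {
            Console.WriteLine("{0} valid? {1}", input, IsValidPhone(input));
        }
    }
}
```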
Sample output:
ZIP codes can be either 5 digits or 9 digits, with an optional hyphen before the last four.
Regex regex = new Regex(@"^\d{5}(-?\d{4})?$");
Sample output:
This regular expression validates US-format dates in the form MM/DD/YYYY.
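The expression is not shown here, so the following is a sketch under the assumption that month and day are always two digits. A pattern like this checks only the shape of the input; it happily accepts 02/30/2009. For that reason the sketch also shows `DateTime.TryParseExact`, which rejects dates that do not exist. The class and method names are illustrative.

```csharp
using System;
using System.Globalization;
using System.Text.RegularExpressions;

class DateValidation
{
    // Hypothetical pattern: month 01-12, day 01-31, four-digit year.
    static readonly Regex DateShape =
        new Regex(@"^(0[1-9]|1[0-2])/(0[1-9]|[12]\d|3[01])/\d{4}$");

    // Checks only that the text looks like MM/DD/YYYY.
    public static bool HasUsDateShape(string input)
    {
        return DateShape.IsMatch(input);
    }

    // Checks that the text is an actual calendar date in MM/DD/YYYY form.
    public static bool IsRealDate(string input)
    {
        DateTime parsed;
        return DateTime.TryParseExact(input, "MM/dd/yyyy",
            CultureInfo.InvariantCulture, DateTimeStyles.None, out parsed);
    }

    static void Main()
    {
        // Made-up inputs for illustration.
        string[] inputs = { "12/25/2009", "13/01/2009", "02/30/2009" };
        foreach (string input in inputs)
        {
            Console.WriteLine("{0}: shape ok? {1}, real date? {2}",
                input, HasUsDateShape(input), IsRealDate(input));
        }
    }
}
```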
Sample output:
So far, we’ve seen some simple examples of regular expression validation, but what about something a little more complex, such as an email address? As it happens, this isn’t a little more complex—it’s a lot more complex! You will see that as the syntax gets more complex, regular expressions can become quite large and unwieldy. In fact, it is not possible to do an absolutely correct email validation with regular expressions. That doesn’t stop people from trying, though. If you do a web search, you will come across many regular expressions for email addresses that range from a few lines to a few pages.
Beware of turning to such things as a solution. Chances are, with such an enormous regular expression, you will not be able to understand it once written. There is also the possibility that the data format is just too complex and the regular expression can’t capture all of it. Also, you do not want to be in the position of refusing to accept information from a user because she has an unusual email address. That’s a user who may not come back! (You will notice that most websites don’t validate email addresses anyway—they just ask the user to enter the address twice to make sure it’s correct.)
If you do need to use large, unreadable regular expressions, ensure that you have very thorough unit tests to cover your input cases. With unit tests, you’ll be able to modify the regular expression with confidence that you’re not breaking existing functionality.
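As a rough illustration of the idea, here is a table of inputs and expected results checked against one expression. It deliberately avoids any particular test framework; in a real project the same table would normally live in your unit test tool of choice. `RegexChecks` and `FindFailures` are hypothetical names, and the ZIP code pattern is just a stand-in for a larger expression.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

class RegexChecks
{
    // Returns every input whose actual result differs from the expected one.
    public static List<string> FindFailures(Regex regex, Dictionary<string, bool> cases)
    {
        List<string> failures = new List<string>();
        foreach (KeyValuePair<string, bool> testCase in cases)
        {
            if (regex.IsMatch(testCase.Key) != testCase.Value)
            {
                failures.Add(testCase.Key);
            }
        }
        return failures;
    }

    static void Main()
    {
        Regex zipCode = new Regex(@"^\d{5}(-?\d{4})?$");
        // Input -> whether it should be accepted. Add a row for every bug you fix.
        Dictionary<string, bool> cases = new Dictionary<string, bool>
        {
            { "98052", true },
            { "98052-1234", true },
            { "980521234", true },
            { "9805", false },
            { "98052-123", false },
            { "abcde", false }
        };
        List<string> failures = FindFailures(zipCode, cases);
        Console.WriteLine("{0} of {1} cases failed", failures.Count, cases.Count);
    }
}
```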
Solution: If you tell .NET to compile regular expressions to IL instead of interpreting them, they’ll run a bit faster, especially for complex expressions. The trade-off is that constructing a compiled Regex takes longer. Here’s an example:
Regex regex = new Regex(pattern, RegexOptions.Compiled);
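To see whether the option helps in your case, measure it. The sketch below times the same expression with and without `RegexOptions.Compiled`; the pattern, input, and iteration count are arbitrary choices, and the numbers will vary by machine and runtime, so no particular result is implied.

```csharp
using System;
using System.Diagnostics;
using System.Text.RegularExpressions;

class CompiledDemo
{
    // Runs IsMatch repeatedly and returns the elapsed time in milliseconds.
    public static long TimeMatches(Regex regex, string input, int iterations)
    {
        Stopwatch stopwatch = Stopwatch.StartNew();
        for (int i = 0; i < iterations; i++)
        {
            regex.IsMatch(input);
        }
        stopwatch.Stop();
        return stopwatch.ElapsedMilliseconds;
    }

    static void Main()
    {
        string pattern = @"^\d{5}(-?\d{4})?$";
        // The compilation cost is paid here, in the constructor, so create the
        // Regex once and reuse it rather than rebuilding it on every call.
        Regex interpreted = new Regex(pattern);
        Regex compiled = new Regex(pattern, RegexOptions.Compiled);

        const int iterations = 100000;
        Console.WriteLine("interpreted: {0} ms", TimeMatches(interpreted, "98052-1234", iterations));
        Console.WriteLine("compiled:    {0} ms", TimeMatches(compiled, "98052-1234", iterations));
    }
}
```

For an expression that is used only a handful of times, the construction cost can outweigh the gain.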