Tcl’s quoting characters allow special interpretation of the characters they quote. There are also quoting characters for regular expressions used in the regexp and regsub commands. Most troublesome are the quoting characters that are special to both Tcl and regular expressions.
Regular expression processing with regexp and regsub makes short work of parsing strings. However, regular expressions can be daunting to read and construct. Mastering Regular Expressions, by Jeffrey E.F. Friedl (O’Reilly & Associates), explains regular expressions in detail, including one chapter devoted to Tcl regular expressions.
Care must be taken when constructing regular expressions, because a regular expression that is not enclosed in braces also makes the normal trip through Tcl’s parser before regexp or regsub sees it. Since the backslash (“\”) character quotes both Tcl and regular expression characters, it must be doubled in such an unquoted regular expression. To match a single backslash character, four backslash characters are required: Tcl’s parser reduces them to two, and the regular expression engine reads those two as one literal backslash.
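The two trips can be seen in a minimal sketch. The variable names s, unquoted, and braced are illustrative assumptions, not taken from the text; the sample string is assumed to contain exactly one backslash.

```tcl
# A string containing exactly one backslash (braces suppress substitution).
set s {C:\temp}

# Unquoted: Tcl's parser turns the four backslashes into two, and the
# regular expression engine reads those two as one literal backslash.
set unquoted [regexp \\\\ $s]

# Braced: the two-backslash pattern reaches regexp unchanged.
set braced [regexp {\\} $s]

puts "unquoted=$unquoted braced=$braced"
```

Both calls should report a match, since each one hands regexp the same two-character pattern.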
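The following table lists, for several characters, the regular expression that matches each one and the corresponding Tcl coding of the regexp command, both with an unquoted argument and with an argument quoted in braces. Here $s stands for the string being searched.
Character to Match | Regular Expression | Tcl with Unquoted Argument | Tcl with Quoted Argument |
Single character \ | \\ | regexp \\\\ $s | regexp {\\} $s |
Single character [ | \[ | regexp \\\[ $s | regexp {\[} $s |
Single character $ | \$ | regexp \\\$ $s | regexp {\$} $s |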
Additional quoting gymnastics occur when a Tcl variable is included in the regular expression. It’s often useful to build up regular expressions in Tcl variables, then use the final variable as part of the regexp or regsub command:
    # find phone numbers 888-555-1212, 888.555.1212, (888) 555-1212
    set n {[0-9]}                       ;# re to match a single digit
    set n3 $n$n$n                       ;# a group of three digits
    set n4 $n$n$n$n                     ;# and four digits
    set phone1 "$n3-$n3-$n4"
    set phone2 "$n3\\.$n3\\.$n4"        ;# \\. leaves \. for regexp: a literal dot
    set phone3 "\\($n3\\) ?$n3-$n4"     ;# \\( and \\) leave literal parentheses
    set allPhones "$phone1|$phone2|$phone3"
    regexp $allPhones $teststring
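A pattern assembled in variables can be checked by printing the string that regexp will actually receive. The sketch below, which assumes the illustrative names loose and strict (not from the text), compares a single and a doubled backslash before the dot inside double quotes.

```tcl
set n  {[0-9]}
set n3 $n$n$n
set n4 $n$n$n$n

# Inside double quotes Tcl consumes one level of backslashes:
# a single backslash before the dot is dropped, leaving a bare dot
# (any character); a doubled backslash leaves an escaped, literal dot.
set loose  "$n3\.$n3\.$n4"
set strict "$n3\\.$n3\\.$n4"

# Print the patterns exactly as regexp will see them.
puts $loose
puts $strict

set looseHit  [regexp $loose  888x555x1212]
set strictHit [regexp $strict 888x555x1212]
set strictOk  [regexp $strict 888.555.1212]
puts "loose=$looseHit strict=$strictHit strictOk=$strictOk"
```

The loose pattern accepts 888x555x1212 because its dots match any character, while the strict pattern accepts only the dotted form.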
The key to remember is that the words of each command make one trip through Tcl’s variable, command, and backslash substitution before the command executes. In the case of regexp and regsub, the command then performs another round of command-specific interpretation on the string it receives.
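The same two rounds apply to the substitution argument of regsub, where the back-references \1, \2, and so on are interpreted by regsub itself. The following is a small sketch under that assumption; the variable names s, braced, and quoted are illustrative only.

```tcl
set s "888-555-1212"

# Braced arguments reach regsub untouched, so regsub itself sees \1, \2, \3.
regsub {([0-9]+)-([0-9]+)-([0-9]+)} $s {(\1) \2-\3} braced

# In double quotes Tcl strips one level of backslashes first, so each
# back-reference needs a doubled backslash to survive until regsub runs,
# and each open bracket is escaped to prevent command substitution.
regsub "(\[0-9]+)-(\[0-9]+)-(\[0-9]+)" $s "(\\1) \\2-\\3" quoted

puts $braced
puts $quoted
```

Both forms print (888) 555-1212; the braced form is usually easier to read because only regsub's own quoting rules apply to it.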