Chapter 22. Regular Expressions

Each character matches itself, unless it is one of the special characters +?.*^$()[{|\. The special meaning of these characters can be escaped using a backslash (\).
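A minimal sketch of escaping, assuming an invented sample string: the backslash turns $ and . into ordinary characters, and the built-in quotemeta function adds the backslashes when the literal text comes from a variable. All variable names are illustrative.

```perl
use strict;
use warnings;

my $text = 'Total: $4.50 (approx.)';

# Unescaped, "." matches any character, so the pattern also accepts "4x50".
my $loose = ('4x50' =~ /4.50/) ? 1 : 0;

# Escaped, "$" and "." match only themselves.
my $strict = ($text =~ /\$4\.50/) ? 1 : 0;

# quotemeta() backslashes every nonword character of a literal string.
my $literal = '$4.50';
my $quoted  = quotemeta($literal);
my $found   = ($text =~ /$quoted/) ? 1 : 0;

print "loose=$loose strict=$strict quoted=$quoted found=$found\n";
```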

The multiline and single-line modes are discussed in Chapter 23.

.

Matches any character, but not a newline. In single-line mode, matches newlines as well.
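As an illustration with an invented two-line string, the same pattern captures one line by default and the whole string once the s modifier is added:

```perl
use strict;
use warnings;

my $text = "first\nsecond";

# By default "." stops at a newline, so .* captures only the first line.
my ($default) = $text =~ /(.*)/;

# With the s modifier "." also matches the newline.
my ($single) = $text =~ /(.*)/s;

print length($default), " ", length($single), "\n";
```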

( . . . )

Groups a series of pattern elements to a single element. The text the group matches is captured for later use. It is also assigned immediately to $^N to be used during the match, e.g., in a (?{ ... }).
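A short sketch of capturing, using made-up date and time strings. Groups are numbered by the position of their opening parenthesis.

```perl
use strict;
use warnings;

my $date = '2024-03-15';

# In list context a successful match returns the captured groups.
my ($year, $month, $day) = $date =~ /(\d{4})-(\d\d)-(\d\d)/;

# Nested groups count from the outside in: the first group is the
# whole time, the second group is the hour.
my ($time, $hour) = '09:30' =~ /((\d\d):\d\d)/;

print "$year/$month/$day $time $hour\n";
```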

^

Matches the beginning of the target. In multiline mode, also matches after every newline character.

$

Matches the end of the line, or before a final newline character. In multiline mode, also matches before every newline character.
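The effect of multiline mode on ^ and $ can be sketched with a three-line sample string (the data is invented for the example):

```perl
use strict;
use warnings;

my $text = "alpha\nbeta\ngamma\n";

# Without m, ^ matches only at the very start of the string.
my @first = $text =~ /^(\w+)/g;

# Without m, $ matches only at the end, here before the final newline.
my ($last) = $text =~ /(\w+)$/;

# With m, ^ and $ match at the start and end of every line.
my @lines = $text =~ /^(\w+)$/mg;

print "@first | $last | @lines\n";
```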

[ . . . ]

Denotes a class of characters to match. [^ . . . ] negates the class.

... | ... | ...

Matches the alternatives from left to right, until one succeeds.
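Because the alternatives are tried from left to right, their order matters when one is a prefix of another. A small sketch:

```perl
use strict;
use warnings;

# "foo" is tried first and succeeds, so "foobar" is never attempted.
my ($short) = 'foobar' =~ /(foo|foobar)/;

# Listing the longer alternative first lets it win.
my ($long) = 'foobar' =~ /(foobar|foo)/;

print "$short $long\n";
```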

(?# text )

Comment.

(? [ modifier ] : pattern )

Acts like (pattern) but does not capture the text it matches. modifier can be one or more of i, m, s, or x. Modifiers can be switched off by preceding the letter(s) with a minus sign, e.g., si-xm. See page 37 for the meaning of the modifiers.
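A sketch of a non-capturing group carrying its own modifier, with invented log lines as sample data. The i modifier applies only inside the group, and the group does not take up a capture number.

```perl
use strict;
use warnings;

# Only "disk" is matched case-insensitively; the state must be lowercase.
my ($state)   = 'Error: DISK full' =~ /(?i:disk) (full|empty)/;
my $uppercase = ('Error: DISK FULL' =~ /(?i:disk) (full|empty)/) ? 1 : 0;

# (?:a) groups without capturing, so (b) is the first and only group.
my @groups = 'ab' =~ /(?:a)(b)/;

print "$state $uppercase @groups\n";
```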

(?= pattern )

Zero-width positive look-ahead assertion.

(?! pattern )

Zero-width negative look-ahead assertion.

(?<= pattern )

Zero-width positive look-behind assertion.

(?<! pattern )

Zero-width negative look-behind assertion.
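The four assertions can be sketched as follows; the sample strings are invented. Since the assertions consume no text, the substitution at the end inserts commas between digits without replacing any of them.

```perl
use strict;
use warnings;

my $prices = 'cost: 100USD, 200EUR';

# Look-ahead: digits that are followed by USD.
my ($usd) = $prices =~ /(\d+)(?=USD)/;

# Negative look-ahead: digits followed neither by a further digit nor by USD.
my ($other) = $prices =~ /(\d+)(?!\d|USD)/;

my $bill = 'pay $42 or 17';

# Look-behind: digits preceded by a dollar sign.
my ($dollars) = $bill =~ /(?<=\$)(\d+)/;

# Negative look-behind: digits preceded neither by $ nor by another digit.
my ($plain) = $bill =~ /(?<![\$\d])(\d+)/;

# Both assertions are zero-width, so only the comma is inserted.
(my $grouped = '1234567') =~ s/(?<=\d)(?=(?:\d{3})+\b)/,/g;

print "$usd $other $dollars $plain $grouped\n";
```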

(?{ code })

Executes Perl code while matching. Always succeeds with zero width. Can be used as the condition in a conditional pattern selection. If not, the result of executing code is stored in $^R.
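A cautious sketch of embedded code, assuming a reasonably recent Perl 5; perlre has long described this construct as subject to change. The variable names are illustrative, and package variables are used so the code blocks can see them on older releases too.

```perl
use strict;
use warnings;

our ($count, $seen, $result) = (0, '', undef);

# The block runs once for every "a" that is matched.
'aaa' =~ /(?:a(?{ $count++ }))*/;

# $^N holds the text of the most recently closed group.
'ab' =~ /(a)(?{ $seen = $^N })b/;

# The value of the first block is stored in $^R, where the second block reads it.
'x' =~ /x(?{ 42 })(?{ $result = $^R })/;

print "$count $seen $result\n";
```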

(??{ code })

Executes Perl code while matching. Interprets the result as a pattern.

(?> pattern )

Like (?: pattern ), but prevents backtracking inside.
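The difference from an ordinary group shows up when the rest of the pattern needs a character the group has already consumed. A minimal sketch:

```perl
use strict;
use warnings;

# a+ first takes all three "a"s, then gives one back so that "ab" can match.
my $normal = ('aaab' =~ /^a+ab/) ? 1 : 0;

# (?>a+) never gives anything back, so no "a" is left for "ab".
my $atomic = ('aaab' =~ /^(?>a+)ab/) ? 1 : 0;

print "$normal $atomic\n";
```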

(?( cond ) ptrue [ | pfalse ] )

Selects a pattern depending on the condition. cond should be the number of a parenthesized subpattern, or one of the zero-width look-ahead, look-behind, and evaluate assertions.
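As a sketch of a condition that tests a numbered subpattern, the hypothetical helper balanced() below accepts a number with or without parentheses, but requires the closing parenthesis only if the opening one was matched.

```perl
use strict;
use warnings;

# (?(1)\)) demands ")" only when group 1, the "(", took part in the match.
sub balanced {
    my ($string) = @_;
    return $string =~ /^(\()?\d+(?(1)\))$/ ? 1 : 0;
}

print join(' ', map { balanced($_) } '123', '(123)', '(123', '123)'), "\n";
```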

(? modifier )

Embedded pattern-match modifier. modifier can be one or more of i, m, s, or x. Modifiers can be switched off by preceding the letter(s) with a minus sign, e.g., (?si-xm).

Quantified subpatterns match as many times as possible. When followed with a ? they match the minimum number of times. These are the quantifiers:

+

Matches the preceding pattern element one or more times.

?

Matches zero or one times.

*

Matches zero or more times.

{n,m}

Denotes the minimum n and maximum m match count. {n} means exactly n times; {n,} means at least n times.
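Maximal and minimal matching can be compared side by side; the strings below are invented for the example.

```perl
use strict;
use warnings;

my $html = '<b>one</b> and <b>two</b>';

# Greedy: .+ runs to the last closing tag.
my ($greedy) = $html =~ m{<b>(.+)</b>};

# Minimal: .+? stops at the first closing tag.
my ($minimal) = $html =~ m{<b>(.+?)</b>};

# {2,3} takes as many as it can, {2,3}? as few as it can.
my ($most)  = 'aaaaa' =~ /(a{2,3})/;
my ($least) = 'aaaaa' =~ /(a{2,3}?)/;

# ? makes the "u" optional.
my @spellings = 'color colour' =~ /(colou?r)/g;

print "$greedy | $minimal | $most | $least | @spellings\n";
```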

Patterns are processed as double-quoted strings, so standard string escapes have their usual meaning (see Chapter 6). An exception is \b, which matches word boundaries, except in a character class, where it denotes a backspace again.

A \ escapes any special meaning of nonalphanumeric characters, but it turns most alphanumeric characters into something special:

\1 ... \9

Refer to matched subexpressions, grouped with (). \10 and up can also be used if the pattern has that many subexpressions.

\w

Matches alphanumeric plus _. \W matches non-\w.

\s

Matches whitespace. \S matches nonwhitespace.

\d

Matches numeric. \D matches nonnumeric.

\A

Matches the beginning of the string.

\Z

Matches the end of the string or before a newline at the end of the string.

\z

Matches the physical end of the string.

\b

Matches word boundaries. \B matches nonboundaries.

\G

Matches where the previous search with a g modifier left off.

\pp

Matches a named property. \Pp matches non-p. Use \p{prop} for names longer than one single character.

\X

Matches extended Unicode combining character sequence.

\C

Matches a single 8-bit byte.

\1 and up, \d, \D, \p, \P, \s, \S, \w, and \W may be used inside and outside character classes.
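A few of these escapes in use, as a sketch with invented input strings: a backreference, word boundaries, the two end-of-string anchors, and \G driving a simple scanner.

```perl
use strict;
use warnings;

# \1 refers back to whatever the first group matched.
my ($doubled) = 'it was was late' =~ /\b(\w+) \1\b/;

# \b keeps "cat" from matching inside "concat".
my @cats = 'cat concat cat.' =~ /\bcat\b/g;

# \Z tolerates a trailing newline; \z does not.
my $upper_z = ("line\n" =~ /line\Z/) ? 1 : 0;
my $lower_z = ("line\n" =~ /line\z/) ? 1 : 0;

# \G with the g modifier resumes where the previous match stopped;
# the c modifier keeps that position when a match finally fails.
my $input = 'x=1;y=22;';
my @pairs;
while ($input =~ /\G(\w)=(\d+);/gc) {
    push @pairs, "$1:$2";
}

print "$doubled ", scalar(@cats), " $upper_z $lower_z @pairs\n";
```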

POSIX classes are used inside character classes, like [[:alpha:]]. These are the POSIX classes and their Unicode property names:

[:alpha:] \p{IsAlpha}

Matches one alphabetic character.

[:alnum:] \p{IsAlnum}

Matches one alphanumeric character.

[:ascii:] \p{IsASCII}

Matches one ASCII character.

[:blank:] \p{IsSpace}

Matches one whitespace character, almost like \s.

[:cntrl:] \p{IsCntrl}

Matches one control character.

[:digit:] \p{IsDigit}

Matches one numeric character, like \d.

[:graph:] \p{IsGraph}

Matches one alphanumeric or punctuation character.

[:lower:] \p{IsLower}

Matches one lowercase character.

[:print:] \p{IsPrint}

Matches one alphanumeric or punctuation character or space character.

[:punct:] \p{IsPunct}

Matches one punctuation character.

[:space:] \p{IsSpace}

Matches one whitespace character, almost like \s.

[:upper:] \p{IsUpper}

Matches one uppercase character.

[:word:] \p{IsWord}

Matches one word character, like \w.

[:xdigit:] \p{IsXDigit}

Matches one hexadecimal digit.

In general, the "Is" prefix may be omitted for property names.

The equivalent of \s is \p{IsSpacePerl}.

The POSIX classes can be negated with a ^, e.g., [:^print:]; the named properties can be negated by using \P, e.g., \P{IsPrint}.
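A brief sketch of POSIX classes and named properties on an invented ASCII string. Note the double brackets: the POSIX class sits inside an ordinary character class.

```perl
use strict;
use warnings;

my $sample = 'Ab3 _-';

# POSIX class inside a character class.
my @alpha = $sample =~ /[[:alpha:]]/g;

# Negated POSIX class: everything that is not alphanumeric.
my @not_alnum = $sample =~ /[[:^alnum:]]/g;

# Named property and its negation.
my @upper     = $sample =~ /\p{IsUpper}/g;
my @not_digit = $sample =~ /\P{IsDigit}/g;

# POSIX classes combine with other members of the same character class.
my @hex_or_dash = 'fg-09' =~ /[[:xdigit:]-]/g;

print join('', @alpha), ' ', scalar(@not_alnum), ' ', join('', @upper), "\n";
```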

See also $1 ... $9, $+, $`, $&, $', $^R, and $^N, and @- and @+.

With modifier x, whitespace and comments can be embedded in the patterns.

Regular expression patterns can be compiled and used as values with the qr quoting operator: qr/string/modifiers compiles string as a pattern according to the (optional) modifiers, and returns the compiled pattern as a scalar value.
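As a sketch combining the x modifier with qr, the pattern below is compiled once, stored in a scalar, and then interpolated into other patterns. The name $number and the sample strings are illustrative.

```perl
use strict;
use warnings;

# With x, whitespace is ignored and "#" starts a comment inside the pattern.
my $number = qr/
    [-+]?           # optional sign
    \d+             # integer part
    (?: \. \d+ )?   # optional fraction
/x;

# The compiled pattern is an ordinary scalar value.
my $kind = ref $number;

# It can be interpolated into larger patterns.
my @found  = 'a=-3.5, b=42' =~ /($number)/g;
my $is_num = ('+7' =~ /^$number$/) ? 1 : 0;
my $is_not = ('7up' =~ /^$number$/) ? 1 : 0;

print "$kind @found $is_num $is_not\n";
```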

perlre, perlretut, perlrequick, perlunicode.
