What we’ve put in the rest of this book is the core of Perl, the part that every Perl user should understand. But there are a few other techniques that, while not obligatory, are still valuable tools to have in your toolbox. We’ve gathered the most important of those for this chapter.
Don’t be misled by the title of the chapter, though; the techniques here aren’t especially more difficult to understand than what we have elsewhere. They are “advanced” merely in the sense that they aren’t necessary for beginners. The first time you read this book, you may want to skip (or skim) this chapter so you can get right to using Perl. Come back to it a month or two later, when you’re ready to get even more out of Perl. Consider this entire chapter a huge footnote[1].
Sometimes, your ordinary, everyday code can cause a fatal error in your program. Each of these typical statements could crash a program:
$barney = $fred / $dino;        # divide-by-zero error?
print "match\n" if /^($wilma)/; # illegal regular expression error?
open CAVEMAN, $fred             # user-generated error from die?
  or die "Can't open file '$fred' for input: $!";
You could go to some trouble to catch some of these, but it’s hard to get them all. (How could you check the string $wilma from that example to ensure that it makes a valid regular expression?) Fortunately, Perl provides a simple way to catch fatal errors: wrap the code in an eval block:
eval { $barney = $fred / $dino } ;
Now, even if $dino is zero, that line won’t crash the program. The eval is actually an expression (not a control structure, like while or foreach), so that semicolon is required at the end of the block.
When a normally fatal error happens during the execution of an eval block, the block is done running, but the program doesn’t crash. So, right after an eval finishes, you’ll want to know whether it exited normally or whether it caught a fatal error for you. The answer is in the special $@ variable. If the eval caught a fatal error, $@ will hold what would have been the program’s dying words, perhaps something like: Illegal division by zero at my_program line 12. If there was no error, $@ will be empty. That means $@ is a useful Boolean (true/false) value, true if there was an error, so you’ll sometimes see code like this after an eval block:
print "An error occurred: $@" if $@;
The eval block is a true block, so it makes a new scope for lexical (my) variables. This piece of a program shows an eval block hard at work:
foreach my $person (qw/ fred wilma betty barney dino pebbles /) {
  eval {
    open FILE, "<$person"
      or die "Can't open file '$person': $!";
    my($total, $count);
    while (<FILE>) {
      $total += $_;
      $count++;
    }
    my $average = $total/$count;
    print "Average for file $person was $average\n";
    &do_something($person, $average);
  };
  if ($@) {
    print "An error occurred ($@), continuing\n";
  }
}
How many possible fatal errors are being trapped here? If there is an error in opening the file, that error is trapped. Calculating the average may divide by zero, so that error is trapped. Even the call to the mysteriously named &do_something subroutine is protected against fatal errors, because an eval block traps any otherwise-fatal errors that occur while it’s active. (This feature is handy if you have to call a subroutine written by someone else, and you don’t know whether they’ve coded defensively enough to avoid crashing your program.)
If an error occurs during the processing of one of the files, we’ll get an error message, but the program will go on to the next file without further complaint.
You can nest eval blocks inside other eval blocks. The inner one traps errors while it runs, keeping them from reaching the outer blocks. (Of course, after the inner eval finishes, if it caught an error, you may wish to re-post the error by using die, thereby letting the outer eval catch it.) An eval block traps any errors that occur during its execution, including errors that happen during subroutine calls (as we saw in the earlier example).
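Here’s a small sketch of that re-posting idea (the names and the fallback value are our own invention): the inner eval handles the one error it expects and re-dies on anything else, letting the outer eval decide.

```perl
my $result = eval {
    # Inner eval traps only the error we expect; anything else is re-posted
    my $quotient = eval { 100 / 0 };
    if ($@) {
        die $@ unless $@ =~ /division by zero/;  # re-post unexpected errors
        $quotient = 0;   # known case: fall back to a default
    }
    $quotient + 1;       # last expression: the outer eval's return value
};
print "An error got past the inner eval: $@" if $@;
print "Result is $result\n";   # Result is 1
```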
We mentioned earlier that the eval is an expression, which is why the trailing semicolon is needed after the closing curly brace. But since it’s an expression, it has a return value. If there’s no error, it works like a subroutine: the return value is the last expression evaluated, or whatever is returned early with the optional return keyword. Here’s another way to do the math without having to worry about divide-by-zero:
my $barney = eval { $fred / $dino };
If the eval traps a fatal error, the return value is either undef or an empty list, depending upon the context. So in the previous example, $barney is either the correct result from dividing, or it’s undef; we don’t really need to check $@ (although it’s probably a good idea to check defined($barney) before we use it further).
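To see the list-context behavior, here’s a tiny illustration (our own example): a successful eval passes its list through, while a trapped error yields an empty list.

```perl
# In list context, eval returns the block's list, or an empty list on error
my @fields  = eval { split /:/, "fred:2168:3" };  # no error: three items
my @nothing = eval { die "bad data\n" };          # trapped: empty list
print "fields: @fields\n";         # fields: fred 2168 3
print "got nothing back\n" unless @nothing;
```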
There are four kinds of problems that eval can’t trap. The first kind is the very serious errors that crash Perl itself, such as running out of memory or getting an untrapped signal. Since Perl itself isn’t running, there’s no way it can trap these errors.[2]
The second kind is syntax errors inside the eval block. Of course, those are caught at compile time, so they’re never returned in $@.
The third is the exit operator, which terminates the program at once, even if it’s called from a subroutine inside an eval block. (This correctly implies that when writing a subroutine, you should use die rather than exit to signal that something has gone wrong.)
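A quick sketch of why die is the friendlier choice (the subroutine name and message are ours): a die inside a subroutine can be trapped by the caller’s eval, while an exit in the same spot would have ended the program outright.

```perl
sub bedrock_error { die "lava leak detected\n" }  # die: trappable by eval
# If this subroutine had called exit instead, the eval below couldn't
# have saved us; the whole program would have stopped right there.

eval { bedrock_error() };
print "Still running after the error: $@";
```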
The fourth and final kind of problem that an eval block can’t trap is warnings, either user-generated ones (from warn) or Perl’s internally generated warnings (requested with the -w command-line option or the use warnings pragma). There’s a separate mechanism from eval for trapping warnings; see the discussion of the __WARN__ pseudosignal in the Perl documentation for the details.
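For a taste of that mechanism (a sketch only; see the %SIG entry in the perlvar manpage for the full story), a handler installed in $SIG{__WARN__} intercepts warnings before they reach the terminal:

```perl
my @caught;
{
    # local makes the handler temporary; warnings go to our sub, not STDERR
    local $SIG{__WARN__} = sub { push @caught, shift };
    warn "feels like rain\n";
}
print "trapped warning: $caught[0]";   # trapped warning: feels like rain
```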
We should also mention that there’s another form of eval that can be dangerous if it’s mishandled. In fact, you’ll sometimes run across someone who will say that you shouldn’t use eval in your code for security reasons. They’re (mostly) right that eval should be used only with great care, but they’re talking about the other form of eval, sometimes called “eval of a string”. If the keyword eval is followed directly by a block of code in curly braces, as we’re doing here, there’s no need to worry: that’s the safe kind of eval.
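To make the distinction concrete, here’s a tiny sketch (our own example): the block form is compiled along with the rest of the program, while the string form compiles its argument at run time, which is where the danger comes in if the string ever includes untrusted data.

```perl
my $block_form  = eval { 2 + 2 };   # safe: compiled with the program
my $string_form = eval "2 + 2";     # compiled at run time: handle with care
print "$block_form $string_form\n"; # prints "4 4"
```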
Sometimes you’ll want only certain items from a list. Maybe it’s only the odd numbers selected from a list of numbers, or maybe it’s only the lines mentioning Fred from a file of text. As we’ll see in this section, picking some items from a list can be done simply with the grep operator.
Let’s try that first one and get the odd numbers from a large list of numbers. We don’t need anything new to do that:
my @odd_numbers;
foreach (1..1000) {
  push @odd_numbers, $_ if $_ % 2;
}
That code uses the modulus operator (%), which we saw in Chapter 2. If a number is even, that number “mod two” gives zero, which is false. But an odd number will give one; since that’s true, only the odd numbers will be pushed onto the array.

Now, there’s nothing wrong with that code as it stands, except that it’s a little longer to write and slower to run than it might be, since Perl provides the grep operator:
my @odd_numbers = grep { $_ % 2 } 1..1000;
That line gets a list of 500 odd numbers in one quick line of code.
How does it work? The first argument to grep is a block that uses $_ as a placeholder for each item in the list and returns a Boolean (true/false) value. The remaining arguments are the list of items to search through. The grep operator evaluates the block once for each item in the list, much as our original foreach loop did. Each element for which the last expression of the block returns a true value is included in the list that results from grep.
While the grep is running, $_ is aliased to one element of the list after another. We’ve seen this behavior before, in the foreach loop. It’s generally a bad idea to modify $_ inside the grep expression, because this will damage the original data.
The grep operator shares its name with a classic Unix utility that picks matching lines from a file by using regular expressions. We can do that with Perl’s grep, which is much more powerful. Here we pull only the lines mentioning fred from a file:
my @matching_lines = grep { /fred/i } <FILE>;
There’s a simpler syntax for grep, too. If all you need for the selector is a simple expression (rather than a whole block), you can just use that expression, followed by a comma, in place of the block. Here’s the simpler way to write that latest example:
my @matching_lines = grep /fred/i, <FILE>;
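Incidentally, grep used in a scalar context returns a count of the items for which the expression was true, which is handy when you need only the tally (a small example of our own, using an in-memory list rather than a filehandle):

```perl
my @lines = ("Fred rocks\n", "Barney bowls\n", "fred strikes again\n");
my $count = grep /fred/i, @lines;  # scalar context: how many matched
print "Matched $count lines\n";    # Matched 2 lines
```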
Another common task is transforming items from a list. For example, suppose you have a list of numbers that should be formatted as “money numbers” for output, as with the subroutine &big_money (from Chapter 15). But we don’t want to modify the original data; we need a modified copy of the list just for output. Here’s one way to do that:
my @data = (4.75, 1.5, 2, 1234, 6.9456, 12345678.9, 29.95);
my @formatted_data;
foreach (@data) {
  push @formatted_data, &big_money($_);
}
That looks similar in form to the example code used at the beginning of the section on grep, doesn’t it? So it may not surprise you that the replacement code resembles the first grep example:
my @data = (4.75, 1.5, 2, 1234, 6.9456, 12345678.9, 29.95);
my @formatted_data = map { &big_money($_) } @data;
The map operator looks much like grep because it has the same kind of arguments: a block that uses $_, and a list of items to process. And it operates in a similar way, evaluating the block once for each item in the list, with $_ aliased to a different original list element each time. But the last expression of the block is used differently: instead of giving a Boolean value, the final value actually becomes part of the resulting list.[3]
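Because the block is evaluated in list context (as footnote 3 mentions), each input item may produce zero, one, or many output items. A quick sketch of our own:

```perl
# Each input word here becomes two output items: the word and its length
my @with_lengths = map { ($_, length $_) } qw/ fred dino /;
print "@with_lengths\n";   # fred 4 dino 4
```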
Any grep or map statement could be rewritten as a foreach loop pushing items onto a temporary array. But the shorter way is typically more efficient and more convenient. Since the result of map or grep is a list, it can be passed directly to another function. Here we can print that list of formatted “money numbers” as an indented list under a heading:
print "The money numbers are:\n",
  map { sprintf("%25s\n", $_) } @formatted_data;
Of course, we could have done that processing all at once, without even the temporary array @formatted_data:
my @data = (4.75, 1.5, 2, 1234, 6.9456, 12345678.9, 29.95);
print "The money numbers are:\n",
  map { sprintf("%25s\n", &big_money($_) ) } @data;
As we saw with grep, there’s also a simpler syntax for map. If all you need for the selector is a simple expression (rather than a whole block), you can just use that expression, followed by a comma, in place of the block:
print "Some powers of two are:\n",
  map "\t" . ( 2 ** $_ ) . "\n", 0..15;
Perl offers many shortcuts that can help the programmer. Here’s a handy one: you may omit the quote marks on some hash keys.
Of course, you can’t omit the quote marks on just any key, since a hash key may be any arbitrary string. But keys are often simple. If the hash key is made up of nothing but letters, digits, and underscores, and doesn’t start with a digit, you may be able to omit the quote marks. This kind of simple string without quote marks is called a bareword, since it stands alone without quotes.
One place you are permitted to use this shortcut is the most common place a hash key appears: in the curly braces of a hash element reference. For example, instead of $score{"fred"}, you could write simply $score{fred}. Since many hash keys are simple like this, not using quotes is a real convenience. But beware: if there’s anything inside the curly braces besides a bareword, Perl will interpret it as an expression.
Another place where hash keys appear is when assigning an entire hash using a list of key-value pairs. The big arrow (=>) is especially useful between a key and a value, because (again, only if the key is a bareword) the big arrow quotes it for you:
# Hash containing bowling scores
my %score = (
  barney => 195,
  fred   => 205,
  dino   => 30,
);
This is the one important difference between the big arrow and a comma; a bareword to the left of the big arrow is implicitly quoted. (Whatever is on the right is left alone, though.) This feature of the big arrow doesn’t have to be used only for hashes, although that’s the most frequent use.
After already reading three chapters about regular expressions, you know that they’re a powerful feature in the core of Perl. But there are even more features that the Perl developers have added; we’ll see some of the most important ones in this section. At the same time, you’ll see a little more about the internal operation of the regular expression engine.
The four quantifiers we’ve already seen (in Chapter 8) are all greedy. That means that they match as much as they can, only to reluctantly give some back if that’s necessary to allow the overall pattern to succeed. Here’s an example: suppose you’re using the pattern /fred.+barney/ on the string fred and barney went bowling last night. Of course, we know that the regular expression will match that string, but let’s see how it goes about it.[4]
First, of course, the subpattern fred matches the identical literal string. The next part of the pattern is the .+, which matches any character except newline, at least one time. But the plus quantifier is greedy; it prefers to match as much as possible. So it immediately matches all of the rest of the string, including the word night. (This may surprise you, but the story isn’t over yet.)
Now the subpattern barney would like to match, but it can’t; we’re at the end of the string. But since the .+ could still be successful even if it matched one fewer character, it reluctantly gives back the letter t at the end of the string. (It’s greedy, but it wants the whole pattern to succeed even more than it wants to match everything all by itself.)
The subpattern barney tries again to match, and still can’t. So the .+ gives back the letter h and lets it try again. One character after another, the .+ gives back what it matched until finally it gives up all of the letters of barney. Now, finally, the subpattern barney can match, and the overall match succeeds.
Regular expression engines do a lot of backtracking like that, trying every different way of fitting the pattern to the string until one of them succeeds, or until none of them has.[5] But as you could see from this example, that can involve a lot of backtracking, as the quantifier gobbles up too much of the string and has to be forced to return some of it.
For each of the greedy quantifiers, though, there’s also a non-greedy quantifier available. Instead of the plus (+), we can use the non-greedy quantifier +?, which matches one or more times (just as the plus does), except that it prefers to match as few times as possible, rather than as many as possible. Let’s see how that new quantifier works when the pattern is rewritten as /fred.+?barney/.
Once again, fred matches right at the start. But this time the next part of the pattern is .+?, which would prefer to match no more than one character, so it matches just the space after fred. The next subpattern is barney, but that can’t match here (since the string at the current position begins with and barney...). So the .+? reluctantly matches the a and lets the rest of the pattern try again. Once again, barney can’t match, so the .+? accepts the letter n, and so on. Once the .+? has matched five characters, barney can match, and the pattern is a success.
There was still some backtracking, but since the engine had to go back and try again just a few times, it should be a big improvement in speed. Well, it’s an improvement if you’ll generally find barney near fred. If your data often had fred near the start of the string and barney only at the end, the greedy quantifier might be a faster choice. In the end, the speed of the regular expression depends upon the data.
But the non-greedy quantifiers aren’t just about efficiency. Although they’ll always match (or fail to match) the same strings as their greedy counterparts, they may match different amounts of the strings. For example, suppose you had some HTML-like[6] text, and you want to remove all of the tags <BOLD> and </BOLD>, leaving their contents intact. Here’s the text:
I'm talking about the cartoon with Fred and <BOLD>Wilma</BOLD>!
And here’s a substitution to remove those tags. But what’s wrong with it?
s#<BOLD>(.*)</BOLD>#$1#g;
The problem is that the star is greedy.[7] What if the text had said this instead?
I thought you said Fred and <BOLD>Velma</BOLD>, not <BOLD>Wilma</BOLD>
In that case, the pattern would match from the first <BOLD> to the last </BOLD>, leaving intact the ones in the middle of the line. Oops! Instead, we want a non-greedy quantifier.
The non-greedy form of star is *?, so the substitution now looks like this:
s#<BOLD>(.*?)</BOLD>#$1#g;
And it does the right thing.
Since the non-greedy form of the plus was +? and the non-greedy form of the star was *?, you’ve probably realized that the other two quantifiers look similar. The non-greedy form of any curly-brace quantifier looks the same, but with a question mark after the closing brace, like {5,10}? or {8,}?.[8] And even the question-mark quantifier has a non-greedy form: ??. That matches either once or not at all, but it prefers not to match anything.
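Here’s a quick way to see the difference in how much each form matches (our own example); both patterns succeed on the same string, but they capture different amounts:

```perl
my $string = "aaa";
my($greedy)     = $string =~ /(a+)/;   # prefers the most: all three a's
my($non_greedy) = $string =~ /(a+?)/;  # prefers the fewest: just one
print "greedy=$greedy non_greedy=$non_greedy\n";  # greedy=aaa non_greedy=a
```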
Classic regular expressions were used to match just single lines of text. But since Perl can work with strings of any length, Perl’s patterns can match multiple lines of text as easily as single lines. Of course, you have to include an expression that holds more than one line of text. Here’s a string that’s four lines long:
$_ = "I'm much better\nthan Barney is\nat bowling,\nWilma.\n";
Now, the anchors ^ and $ are normally anchors for the start and end of the whole string (see Section 8.3 in Chapter 8). But the /m regular expression option lets them match at internal newlines as well (think m for multiple lines). This makes them anchors for the start and end of each line, rather than the whole string. So this pattern can match:
print "Found 'wilma' at start of line\n" if /^wilma/im;
Similarly, you could do a substitution on each line in a multiline string. Here, we read an entire file into one variable,[9] then add the file’s name as a prefix at the start of each line:
open FILE, $filename
  or die "Can't open '$filename': $!";
my $lines = join '', <FILE>;
$lines =~ s/^/$filename: /gm;
It often happens that we need to work with only a few elements from a given list. For example, the Bedrock Library keeps information about their patrons in a large file.[10] Each line in the file describes one patron with six colon-separated fields: a person’s name, library card number, home address, home phone number, work phone number, and number of items currently checked out. A little bit of the file looks something like this:
fred flintstone:2168:301 Cobblestone Way:555-1212:555-2121:3
barney rubble:709918:3128 Granite Blvd:555-3333:555-3438:0
One of the library’s applications needs only the card numbers and number of items checked out; it doesn’t use any of the other data. It could use code something like this to get only the fields it needs:
while (<FILE>) {
  chomp;
  my @items = split /:/;
  my($card_num, $count) = ($items[1], $items[5]);
  ...  # now work with those two variables
}
But the array @items isn’t needed for anything else; it seems like a waste.[11] Maybe it would be better to assign the result of split to a list of scalars, like this:
my($name, $card_num, $addr, $home, $work, $count) = split /:/;
Well, that avoids the unneeded array @items, but now we have four scalar variables that we didn’t really need. For this situation, some people used to make up a number of dummy variable names, like $dummy_1, that showed that they really didn’t care about that element from the split. But Larry thought that that was too much trouble, so he added a special use of undef. If an item in a list being assigned to is undef, that means simply to ignore the corresponding element of the source list:
my(undef, $card_num, undef, undef, undef, $count) = split /:/;
Is this any better? Well, it has the advantage that there aren’t any unneeded variables. But it has the disadvantage that you have to count undefs to tell which element is $count. And this becomes quite unwieldy if there are more elements in the list. For example, some people who wanted just the mtime value from stat were writing code like this:
my(undef, undef, undef, undef, undef, undef, undef, undef, undef, $mtime)
  = stat $some_file;
If you use the wrong number of undefs, you’ll get the atime or ctime by mistake, and that’s a tough one to debug. There’s a better way: Perl can index into a list as if it were an array. This is a list slice. Here, since the mtime is item 9 in the list returned by stat,[12] we can get it with a subscript:
my $mtime = (stat $some_file)[9];
Those parentheses are required around the list of items (in this case, the return value from stat). If you wrote it like this, it wouldn’t work:
my $mtime = stat($some_file)[9]; # Syntax error!
A list slice has to have a subscript expression in square brackets after a list in parentheses. The parentheses holding the arguments to a function call don’t count.
Going back to the Bedrock Library, the list we’re working with is the return value from split. We can now use a slice to pull out item 1 and item 5 with subscripts:
my $card_num = (split /:/)[1];
my $count = (split /:/)[5];
Using a scalar-context slice like this (pulling just a single element from the list) isn’t bad, but it would be more efficient and simpler if we didn’t have to do the split twice. So let’s not do it twice; let’s get both values at once by using a list slice in list context:
my($card_num, $count) = (split /:/)[1, 5];
The indices pull out element 1 and element 5 from the list, returning those as a two-element list. When that’s assigned to the two my variables, we get exactly what we wanted. We do the slice just once, and we set the two variables with a simple notation.
A slice is often the simplest way to pull a few items from a list. Here, we can pull just the first and last items from a list, using the fact that index -1 means the last element:[13]
my($first, $last) = (sort @names)[0, -1];
The subscripts of a slice may be in any order and may even repeat values. This example pulls five items from a list of ten:
my @names = qw{ zero one two three four five six seven eight nine };
my @numbers = ( @names )[ 9, 0, 2, 1, 0 ];
print "Bedrock @numbers\n";  # says Bedrock nine zero two one zero
That previous example could be made even simpler. When slicing elements from an array (as opposed to a list), the parentheses aren’t needed. So we could have done the slice like this:
my @numbers = @names[ 9, 0, 2, 1, 0 ];
This isn’t merely a matter of omitting the parentheses; this is actually a different notation for accessing array elements: an array slice. Earlier (in Chapter 3), we said that the at-sign on @names meant “all of the elements.” Actually, in a linguistic sense, it’s more like a plural marker, much like the letter “s” in words like “cats” and “dogs.” In Perl, the dollar sign means there’s just one of something, but the at-sign means there’s a list of items.
A slice is always a list, so the array slice notation uses an at-sign to indicate that. When you see something like @names[ ... ] in a Perl program, you’ll need to do just as Perl does and look at the at-sign at the beginning as well as the square brackets at the end. The square brackets mean that you’re indexing into an array, and the at-sign means that you’re getting a whole list[14] of elements, not just a single one (which is what the dollar sign would mean). See Figure 17-1.
The punctuation mark at the front of the variable reference (either the dollar sign or at-sign) determines the context of the subscript expression. If there’s a dollar sign in front, the subscript expression is evaluated in a scalar context to get an index. But if there’s an at-sign in front, the subscript expression is evaluated in a list context to get a list of indices.
So we see that @names[ 2, 5 ] means the same list as ($names[2], $names[5]) does. If you want that list of values, you can simply use the array slice notation. Any place you might want to write the list, you can instead use the simpler array slice.
But the slice can be used in one place where the list can’t: a slice may be interpolated directly into a string:
my @names = qw{ zero one two three four five six seven eight nine };
print "Bedrock @names[ 9, 0, 2, 1, 0 ]\n";
If we were to interpolate @names, that would give all of the items from the array, separated by spaces. If instead we interpolate @names[ 9, 0, 2, 1, 0 ], that gives just those items from the array, separated by spaces.[15]
Let’s go back to the Bedrock Library for a moment. Maybe now our program is updating Mr. Slate’s address and phone number in the patron file, because he just moved into a large new place in the Hollyrock hills. If we’ve got a list of information about him in @items, we could do something like this to update just those two elements of the array:
my $new_home_phone = "555-6099";
my $new_address = "99380 Red Rock West";
@items[2, 3] = ($new_address, $new_home_phone);
Once again, the array slice makes a more compact notation for a list of elements. In this case, that last line is the same as an assignment to ($items[2], $items[3]), but more compact and efficient.
In a way exactly analogous to an array slice, we can also slice some elements from a hash in a hash slice. Remember when three of our characters went bowling, and we kept their bowling scores in the %score hash? We could pull those scores with a list of hash elements or with a slice. These two techniques are equivalent, although the second is more concise and efficient:
my @three_scores = ($score{"barney"}, $score{"fred"}, $score{"dino"});
my @three_scores = @score{ qw/ barney fred dino / };
A slice is always a list, so the hash slice notation uses an at-sign to indicate that.[16] When you see something like @score{ ... } in a Perl program, you’ll need to do just as Perl does and look at the at-sign at the beginning as well as the curly braces at the end. The curly braces mean that you’re indexing into a hash; the at-sign means that you’re getting a whole list of elements, not just a single one (which is what the dollar sign would mean). See Figure 17-2.
As we saw with the array slice, the punctuation mark at the front of the variable reference (either the dollar sign or at-sign) determines the context of the subscript expression. If there’s a dollar sign in front, the subscript expression is evaluated in a scalar context to get a single key.[17] But if there’s an at-sign in front, the subscript expression is evaluated in a list context to get a list of keys.
It’s normal at this point to wonder why there’s no percent sign (“%”) here, when we’re talking about a hash. That’s the marker that means there’s a whole hash; a hash slice (like any other slice) is always a list, not a hash.[18] In Perl, the dollar sign means there’s just one of something, the at-sign means there’s a list of items, and the percent sign means there’s an entire hash.
As we saw with array slices, a hash slice may be used instead of the corresponding list of elements from the hash, anywhere within Perl. So we can set our friends’ bowling scores in the hash (without disturbing any other elements in the hash) in this simple way:
my @players = qw/ barney fred dino /;
my @bowling_scores = (195, 205, 30);
@score{ @players } = @bowling_scores;
That last line does the same thing as if we had assigned to the three-element list ($score{"barney"}, $score{"fred"}, $score{"dino"}).
A hash slice may be interpolated, too. Here, we print out the scores for our favorite bowlers:
print "Tonight's players were: @players\n";
print "Their scores were: @score{@players}\n";
See Section A.16 for an answer to the following exercise:
[30] Make a program that reads a list of strings from a file, one string per line, and then lets the user interactively enter patterns that may match some of the strings. For each pattern, the program should tell how many strings from the file matched, then which ones those were. Don’t re-read the file for each new pattern; keep the strings in memory. The filename may be hard-coded in the file. If a pattern is invalid (for example, if it has unmatched parentheses), the program should simply report that error and let the user continue trying patterns. When the user enters a blank line instead of a pattern, the program should quit. (If you need a file full of interesting strings to try matching, try the file sample_text in the files you’ve surely downloaded by now from the O’Reilly website; see the Preface.)
[1] We contemplated doing that in one of the drafts, but got firmly rejected by O’Reilly’s editors.
[2] Some of these errors are listed with an (X) code on the perldiag manpage, if you’re curious.
[3] One other important difference is that the expression used by map is evaluated in a list context and may return any number of items, not necessarily one each time.
[4] The regular expression engine makes a few optimizations that make the true story different than we tell it here, and those optimizations change from one release of Perl to the next. You shouldn’t be able to tell from the functionality that it’s not doing as we say, though. If you want to know how it really works, you should read the latest source code. Be sure to submit patches for any bugs you find.
[5] In fact, some regular expression engines try every different way, even continuing on after they find one that fits. But Perl’s regular expression engine is primarily interested in whether the pattern can or cannot match, so finding even one match means that the engine’s work is done. Again, see Jeffrey Friedl’s Mastering Regular Expressions.
[6] Once again, we aren’t using real HTML because you can’t correctly parse HTML with simple regular expressions. If you really need to work with HTML or a similar markup language, use a module that’s made to handle the complexities.
[7] There’s another possible problem: we should have used the /s modifier as well, since the end tag may be on a different line than the start tag. It’s a good thing that this is just an example; if we were writing something like this for real, we would have taken our own advice and used a well-written module.
[8] In theory, there’s also a non-greedy quantifier form that specifies an exact number, like {3}?. But since that says to match exactly three of the preceding item, it has no flexibility to be either greedy or non-greedy.
[9] Hope it’s a small one. The file, that is, not the variable.
[10] It should really be a full-featured database rather than a flat file. They plan to upgrade their system, right after the next Ice Age.
[11] It’s not much of a waste, really. But stay with us. All of these techniques are used by programmers who don’t understand slices, so it’s worthwhile to see all of them here.
[12] It’s the tenth item, but the index number is 9, since the first item is at index 0. This is the same kind of zero-based indexing that we’ve used already with arrays.
[13] Sorting a list merely to find the extreme elements isn’t likely to be the most efficient way. But Perl’s sort is fast enough that this is generally acceptable, as long as the list doesn’t have more than a few hundred elements.
[14] Of course, when we say “a whole list,” that doesn’t necessarily mean more elements than one—the list could be empty, after all.
[15] More accurately, the items of the list are separated by the contents of Perl’s $" variable, whose default is a space. This should not normally be changed. When interpolating a list of values, Perl internally does join $", @list, where @list stands in for the list expression.
[16] If it sounds as if we’re repeating ourselves here, it’s because we want to emphasize that hash slices are analogous to array slices. If it sounds as if we’re not repeating ourselves here, it’s because we want to emphasize that hash slices are analogous to array slices.
[17] There’s an exception you’re not likely to run across, since it isn’t used much in modern Perl code. See the entry for $; in the perlvar manpage.
[18] A hash slice is a slice (not a hash) in the same way that a house fire is a fire (not a house), while a fire house is a house (not a fire). More or less.