This appendix contains the answers to the exercises that appear throughout the book.
Here’s one way to do it:
#!/usr/bin/perl -w
$pi = 3.141592654;
$circ = 2 * $pi * 12.5;
print "The circumference of a circle of radius 12.5 is $circ.\n";
As you see, we started this program with a typical #! line; your path to Perl may vary. We also turned on warnings.
The first real line of code sets the value of $pi to our value of π. There are several reasons a good programmer will prefer to use a constant[1] value like this: it takes time to type 3.141592654 into your program if you ever need it more than once. It may be a mathematical bug if you accidentally used 3.141592654 in one place and 3.14159 in another. There’s only one line to check on to make sure you didn’t accidentally type 3.141952654 and send your space probe to the wrong planet. It’s easier to type $pi than π, especially if you don’t have Unicode. And it will be easy to maintain the program in case the value of π ever changes.[2]
Next we calculate the circumference, storing it into $circ, and we print it out in a nice message. The message ends with a newline character, because every line of a good program’s output should end with a newline. Without it, you might end up with output looking something like this, depending upon your shell’s prompt:
The circumference of a circle of radius 12.5 is 78.53981635.bash-2.01$[]
The box represents the input cursor, blinking at the end of the line, and that’s the shell’s prompt at the end of the message.[3] Since the circumference isn’t really 78.53981635.bash-2.01$, this should probably be construed as a bug. So use \n at the end of each line of output.
Here’s one way to do it:
#!/usr/bin/perl -w
$pi = 3.141592654;
print "What is the radius? ";
chomp($radius = <STDIN>);
$circ = 2 * $pi * $radius;
print "The circumference of a circle of radius $radius is $circ.\n";
This is just like the last one, except that now we ask the user for the radius, and then we use $radius in every place where we previously used the hard-coded value 12.5. If we had written the first program with more foresight, in fact, we would have had a variable named $radius in that one as well. Note that we chomped the line of input. If we hadn’t, the mathematical formula would still have worked, because a string like "12.5\n" is converted to the number 12.5 without any problem. But when we print out the message, it would look like this:
The circumference of a circle of radius 12.5
 is 78.53981635.
Notice that the newline character is still in $radius, even though we’ve used that variable as a number. Since we had a space between $radius and the word "is" in the print statement, there’s a space at the beginning of the second line of output. The moral of the story is: chomp your input unless you have a reason not to do that.
Here’s one way to do it:
#!/usr/bin/perl -w
$pi = 3.141592654;
print "What is the radius? ";
chomp($radius = <STDIN>);
$circ = 2 * $pi * $radius;
if ($radius < 0) {
    $circ = 0;
}
print "The circumference of a circle of radius $radius is $circ.\n";
Here we added the check for a bogus radius. Even if the given radius was impossible, the returned circumference will at least be nonnegative. You could have changed the given radius to be zero, and then calculated the circumference, too; there’s more than one way to do it. In fact, that’s the Perl motto: There Is More Than One Way To Do It. And that’s why each exercise answer starts with “Here’s one way to do it.”
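The alternative mentioned above (changing a bogus radius to zero before calculating) might be sketched like this. The subroutine name circumference is our own invention for illustration, and subroutines themselves are a later-chapter feature; treat this as one possible shape rather than the expected answer.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the exercise): treat a negative
# radius as zero, then compute the circumference from the result.
sub circumference {
    my $radius = shift;
    my $pi = 3.141592654;
    $radius = 0 if $radius < 0;
    return 2 * $pi * $radius;
}

print "Radius 12.5 gives ", circumference(12.5), ".\n";
print "Radius -3 gives ", circumference(-3), ".\n";
```

Either approach reports a circumference of zero for an impossible radius; this one simply adjusts the input instead of the result.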
Here’s one way to do it:
print "Enter first number: ";
chomp($one = <STDIN>);
print "Enter second number: ";
chomp($two = <STDIN>);
$result = $one * $two;
print "The result is $result.\n";
Notice that we’ve left off the #! line for this answer. In fact, from here on, we’ll assume that you know it’s there, so you don’t need to read it each time.
Perhaps those are poor choices for variable names. In a large program, a maintenance programmer might think that $two should have the value of 2. In this short program, it probably doesn’t matter, but in a large one we could have called them something more descriptive, with names like $first_response.
In this program, it wouldn’t make any difference if we forgot to chomp the two variables $one and $two, since we never use them as strings once they’ve been set. But if next week our maintenance programmer edits the program to print a message like: The result of multiplying $one by $two is $result.\n, those pesky newlines will come back to haunt us. Once again, chomp unless you have a reason not to chomp[4], like in the next exercise.
Here’s one way to do it:
print "Enter a string: ";
$str = <STDIN>;
print "Enter a number of times: ";
chomp($num = <STDIN>);
$result = $str x $num;
print "The result is:\n$result";
This program is almost the same as the last one, in a sense. We’re “multiplying” a string by a number of times. So we’ve kept the structure of the previous exercise. In this case, though, we didn’t want to chomp the first input item (the string) because the exercise asked for the strings to appear on separate lines. So, if the user entered fred and a newline for the string, and 3 for the number, we’d get a newline after each fred just as we wanted.
In the print statement at the end, we put the newline before $result because we wanted to have the first fred printed on a line of its own. That is, we didn’t want output like this, with only two of the three freds aligned in a column:
The result is: fred
fred
fred
At the same time, we didn’t need to put another newline at the end of the print output because $result should already end with a newline.
In most cases, Perl won’t mind where you put spaces in your program; you can put in spaces or leave them out. But it’s important not to accidentally spell the wrong thing! If the x runs up against the preceding variable name $str, Perl will see $strx, which won’t work.
Here’s one way to do it:
print "Enter some lines, then press Ctrl-D:\n"; # or maybe Ctrl-Z
@lines = <STDIN>;
@reverse_lines = reverse @lines;
print @reverse_lines;
...or, even more simply:
print "Enter some lines, then press Ctrl-D:\n";
print reverse <STDIN>;
Most Perl programmers would prefer the second one, as long as you don’t need to keep the list of lines around for later use.
Here’s one way to do it:
@names = qw/ fred betty barney dino wilma pebbles bamm-bamm /;
print "Enter some numbers from 1 to 7, one per line, then press Ctrl-D:\n";
chomp(@numbers = <STDIN>);
foreach (@numbers) {
    print "$names[ $_ - 1 ]\n";
}
We have to subtract one from the index number so that the user can count from 1 to 7 even though the array is indexed from 0 to 6. Another way to accomplish this would be to have a dummy item in the @names array, like this:
@names = qw/ dummy_item fred betty barney dino wilma pebbles bamm-bamm /;
Give yourself extra credit if you checked to make sure that the user’s choice of index was in fact in the range 1 to 7.
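For the extra-credit range check, one possible sketch follows. The helper name name_for and its use of a pattern match are our own assumptions; pattern matching is covered in later chapters, and a pair of numeric comparisons would serve as well for input that is known to be numeric.

```perl
use strict;
use warnings;

my @names = qw/ fred betty barney dino wilma pebbles bamm-bamm /;

# Hypothetical helper: return the name for a 1-based choice, or undef
# when the choice is not a single digit from 1 to 7.
sub name_for {
    my $choice = shift;
    return undef unless defined $choice && $choice =~ /^[1-7]$/;
    return $names[ $choice - 1 ];
}

foreach my $choice (1, 7, 0, 8, "two") {
    my $name = name_for($choice);
    if (defined $name) {
        print "$choice: $name\n";
    } else {
        print "$choice: not a number from 1 to 7\n";
    }
}
```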
Here’s one way to do it, if you want the output all on one line:
chomp(@lines = <STDIN>);
@sorted = sort @lines;
print "@sorted\n";
...or, to get the output on separate lines:
print sort <STDIN>;
Here’s one way to do it:
sub total {
    my $sum; # private variable
    foreach (@_) {
        $sum += $_;
    }
    $sum;
}
This subroutine uses $sum to keep a running total. At the start of the subroutine, $sum is undef, since it’s a new variable. Then, the foreach loop steps through the parameter list (from @_), using $_ as the control variable. (Note: once again, there’s no automatic connection between @_, the parameter array, and $_, the default variable for the foreach loop.)
The first time through the foreach loop, the first number (in $_) is added to $sum. Of course, $sum is undef, since nothing has been stored in there. But since we’re using it as a number, which Perl sees because of the numeric operator +=, Perl acts as if it’s already initialized to 0. Perl thus adds the first parameter to 0, and puts the total back into $sum.
Next time through the loop, the next parameter is added to $sum, which is no longer undef. The sum is placed back into $sum, and on through the rest of the parameters. Finally, the last line returns $sum to the caller.
There’s a potential bug in this subroutine, depending upon how you think of things. Suppose that this subroutine was called with an empty parameter list (as we considered with the rewritten subroutine &max in the chapter text). In that case, $sum would be undef, and that would be the return value. But in this subroutine, it would probably be “more correct” to return 0 as the sum of the empty list, rather than undef. (Of course, if you wished to distinguish the sum of an empty list from the sum of, say, (3, -5, 2), returning undef would be the right thing to do.)
If you don’t want a possibly undefined return value, though, it’s easy to remedy: simply initialize $sum to zero rather than using the default of undef:
my $sum = 0;
Now the subroutine will always return a number, even if the parameter list were empty.
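Putting that one-line change into the subroutine gives something like the following sketch. The sample calls at the bottom are ours, added only to show the empty-list case next to ordinary ones.

```perl
use strict;
use warnings;

sub total {
    my $sum = 0;    # start at zero so an empty list still yields a number
    foreach (@_) {
        $sum += $_;
    }
    $sum;
}

print "Empty list: ", total(), "\n";
print "Three values: ", total(3, -5, 2), "\n";
print "1 to 1000: ", total(1..1000), "\n";
```

Notice that the empty list and (3, -5, 2) now both total zero, which is exactly the ambiguity the parenthetical remark above warns about.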
Here’s one way to do it:
# Remember to include &total from previous exercise!
print "The numbers from 1 to 1000 add up to ", &total(1..1000), ".\n";
Note that we can’t call the subroutine from inside the double-quoted string,[5] so the subroutine call is another separate item being passed to print. The total should be 500500, a nice round number. And it shouldn’t take any noticeable time at all to run this program; passing a parameter list of 1000 values is an everyday task for Perl.
Here’s one way to do it:
my %last_name = qw{
    fred flintstone
    barney rubble
    wilma flintstone
};
print "Please enter a first name: ";
chomp(my $name = <STDIN>);
print "That's $name $last_name{$name}.\n";
In this one, we used a qw// list (with curly braces as the delimiter) to initialize the hash. That’s fine for this simple data set, and it’s easy to maintain because each data item is a simple given name and simple family name, with nothing tricky. But if your data might contain spaces (for example, if robert de niro or mary kay place were to visit Bedrock), this simple method wouldn’t work so well.
You might have chosen to assign each key/value pair separately, something like this:
my %last_name;
$last_name{"fred"} = "flintstone";
$last_name{"barney"} = "rubble";
$last_name{"wilma"} = "flintstone";
Note that (if you chose to declare the hash with my, perhaps because use strict was in effect), you must declare the hash before assigning any elements. You can’t use my on only part of a variable, like this:
my $last_name{"fred"} = "flintstone"; # Oops!
The my operator works only with entire variables, never with just one element of an array or hash. Speaking of lexical variables, you may have noticed that the lexical variable $name is being declared inside of the chomp function call; it is fairly common to declare each my variable as it is needed, like this.
This is another case where chomp is vital. If someone enters the five-character string "fred\n" and we fail to chomp it, we’ll be looking for "fred\n" as an element of the hash, and it’s not there. Of course, chomp alone won’t make this bulletproof; if someone enters "fred " (with a trailing space), we don’t have a way with what we’ve seen so far to tell that they meant fred.
If you added a check whether the given key exists in the hash, so that you’ll give the user an explanatory message when they misspell a name, give yourself extra points for that.
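Such a check might look something like this sketch. The helper name reply_for and the wording of the apology are our own; the point is only that exists lets us tell a missing key apart from a key we have.

```perl
use strict;
use warnings;

my %last_name = qw{
    fred flintstone
    barney rubble
    wilma flintstone
};

# Hypothetical helper: build the reply for a first name, with an
# explanatory message when that key is not in the hash.
sub reply_for {
    my $name = shift;
    if (exists $last_name{$name}) {
        return "That's $name $last_name{$name}.\n";
    }
    return "Sorry, I don't know anyone named '$name'.\n";
}

print reply_for("fred");
print reply_for("fread");    # a misspelled name
```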
Here’s one way to do it:
my(@words, %count, $word); # (optionally) declare our variables
chomp(@words = <STDIN>);
foreach $word (@words) {
    $count{$word} += 1; # or $count{$word} = $count{$word} + 1;
}
foreach $word (keys %count) { # or sort keys %count
    print "$word was seen $count{$word} times.\n";
}
In this one, we declared all of the variables at the top. People who come to Perl from a background in languages like Pascal (where variables are always declared “at the top”) may find that way more familiar than declaring variables as they are needed. Of course, we’re declaring these because we’re pretending that use strict may be in effect; by default, Perl won’t require such declarations.
Next, we use the line-input operator, <STDIN>, in a list context to read all of the input lines into @words, and then we chomp those all at once. So @words is our list of words from the input (if the words were all on separate lines, as they should have been, of course).
Now, the first foreach loop goes through all of the words. That loop contains the most important statement of the entire program, the statement that says to add one to $count{$word}, and put the result back into $count{$word}. Although you could write it either the short way (with the += operator) or the long way, the short way is just a little bit more efficient, since Perl has to look up $word in the hash just once.[6]
For each word in the first foreach loop, we add one to $count{$word}. So, if the first word is fred, we add one to $count{"fred"}. Of course, since this is the first time we’ve seen $count{"fred"}, it’s undef. But since we’re treating it as a number (with the numeric += operator, or with +, if you wrote it the long way), Perl converts undef to 0 for us, automatically. The total is 1, which is then stored back into $count{"fred"}.
The next time through that foreach loop, let’s say the word is barney. So, we add one to $count{"barney"}, bumping it up from undef to 1, as well.
Now let’s say the next word is fred again. When we add one to $count{"fred"}, which is already 1, we get 2. This goes back into $count{"fred"}, meaning that we’ve now seen fred twice.
When we finish the first foreach loop, then, we’ve counted how many times each word has appeared. The hash has a key for each (unique) word from the input, and the corresponding value is the number of times that word appeared.
So now, the second foreach loop goes through the keys of the hash, which are the unique words from the input. In this loop, we’ll see each different word once. For each one, it says something like "fred was seen 3 times."
If you want the extra credit on this problem, you could put sort before keys to print out the keys in order. If there will be more than a dozen items in an output list, it’s generally a good idea for them to be sorted, so that a human being who is trying to debug the program will fairly quickly be able to find the item he or she wants.
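To try the counting logic without typing input, the same two loops can be run against a fixed list, as in this sketch. The helper name count_words and the sample words are ours, and this version uses sort keys for the extra credit.

```perl
use strict;
use warnings;

# Hypothetical helper: count how many times each word appears in a list.
sub count_words {
    my %count;
    foreach my $word (@_) {
        $count{$word} += 1;    # undef is treated as 0 the first time
    }
    return %count;
}

my %count = count_words(qw/ fred barney fred dino wilma fred /);
foreach my $word (sort keys %count) {
    print "$word was seen $count{$word} times.\n";
}
```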
Here’s one way to do it:
print reverse <>;
Well, that’s pretty simple! But it works because print is looking for a list of strings to print, which it gets by calling reverse in a list context. And reverse is looking for a list of strings to reverse, which it gets by using the diamond operator in list context. So, the diamond returns a list of all of the lines from all of the files of the user’s choice. That list of lines is just what cat would print out. Now reverse reverses the list of lines, and print prints them out.
Here’s one way to do it:
print "Enter some lines, then press Ctrl-D:\n"; # or Ctrl-Z
chomp(my @lines = <STDIN>);
print "1234567890" x 7, "12345\n"; # ruler line to column 75
foreach (@lines) {
    printf "%20s\n", $_;
}
Here, we start by reading in and chomping all of the lines of text. Then we print the ruler line. Since that’s a debugging aid, we’d generally comment-out that line when the program is done. We could have typed "1234567890" again and again, or even used copy-and-paste to make a ruler line as long as we needed, but we chose to do it this way because it’s kind of cool.
Now, the foreach loop iterates over the list of lines, printing each one with the %20s conversion. If you chose to do so, you could have created a format to print the list all at once, without the loop:
my $format = "%20s\n" x @lines;
printf $format, @lines;
It’s a common mistake to get 19-character columns. That happens when you say to yourself,[7] “Hey, why do we chomp the input if we’re only going to add the newlines back on later?” So you leave out the chomp and use a format of "%20s" (without a newline).[8] And now, mysteriously, the output is off by one space. So, what went wrong?
The problem happens when Perl tries to count the spaces needed to make the right number of columns. If the user enters hello and a newline, Perl sees six characters, not five, since newline is a character. So it prints fourteen spaces and a six-character string, sure that it gives the twenty characters you asked for in "%20s". Oops.
Of course, Perl isn’t looking at the contents of the string to determine the width; it merely checks the raw number of characters. A newline (or another special character, such as a tab or a null character) will throw things off.[9]
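The off-by-one effect is easy to see with sprintf, which formats the same way printf does but returns the string instead of printing it. This is a small sketch of our own; the variable names are arbitrary.

```perl
use strict;
use warnings;

my $chomped   = sprintf "%20s", "hello";
my $unchomped = sprintf "%20s", "hello\n";

# tr/ // counts the spaces in each result without changing it.
my $chomped_spaces   = ($chomped   =~ tr/ //);
my $unchomped_spaces = ($unchomped =~ tr/ //);

# Both results are 20 characters long, but the second one spends
# one of those characters on the newline.
print length($chomped), " characters, $chomped_spaces spaces\n";
print length($unchomped), " characters, $unchomped_spaces spaces\n";
```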
Here’s one way to do it:
print "What column width would you like? ";
chomp(my $width = <STDIN>);
print "Enter some lines, then press Ctrl-D:\n"; # or Ctrl-Z
chomp(my @lines = <STDIN>);
print "1234567890" x (($width+9)/10), "\n"; # ruler line as needed
foreach (@lines) {
    printf "%${width}s\n", $_;
}
This is much like the previous one, but we ask for a column width first. We ask for that first because we can’t ask for more input after the end-of-file indicator, at least on some systems. Of course, in the real world, you’ll generally have a better end-of-input indicator when getting input from the user, as we’ll see in later chapters.
Another change from the previous exercise’s answer is the ruler line. We used some math to cook up a ruler line that’s at least as long as we need, as suggested as an “extra credit” part of the exercise. Proving that our math is correct is an additional challenge. (Hint: Consider possible widths of 50 and 51, and remember that the right side operand to x is truncated, not rounded.)
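If you would rather check the hint by experiment than by proof, a sketch like this one works. The helper name ruler is ours; the expression inside it is the one from the answer.

```perl
use strict;
use warnings;

# Hypothetical helper: build the ruler the same way the answer does.
sub ruler {
    my $width = shift;
    return "1234567890" x (($width + 9) / 10);
}

foreach my $width (50, 51) {
    print "width $width gives a ruler of ", length(ruler($width)), " characters\n";
}
```

For a width of 50, the right operand is 5.9, which is truncated to 5 repetitions; for 51, it is exactly 6.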
To generate the format this time, we used the expression "%${width}s\n", which interpolates $width. The curly braces are required to “insulate” the name from the following s; without the curly braces, we’d be interpolating $widths, the wrong variable. If you forgot how to use curly braces to do this, though, you could have written an expression like '%' . $width . "s\n" to get the same format string.
The value of $width brings up another case where chomp is vital. If the width isn’t chomped, the resulting format string would resemble "%30\ns\n". That’s not useful.
People who have seen printf before may have thought of another solution. Because printf comes to us from C, which doesn’t have string interpolation, we can use the same trick that C programmers use. If an asterisk ("*") appears in place of a numeric field width in a conversion, a value from the list of parameters will be used:
printf "%*s\n", $width, $_;
Here’s one way to do it:
/fred/
Of course, you have to put that into the test program! This is pretty simple. The more important part of this exercise is trying it out on the sample strings. It doesn’t match Fred, showing that regular expressions are case-sensitive. (We’ll see how to change that later.) It does match frederick and Alfred, since both of those strings contain the four-letter string fred. (Matching whole words only, so that frederick and Alfred wouldn’t match, is another feature we’ll see later.)
If the test program is working correctly,[10] it should show those two matches as something like |<fred>erick| and |Al<fred>|, using the angle brackets to show where fred was found inside each string.
Here’s one way to do it:
/a+b*/
That matches the letter a one or more times (that’s the plus), followed by b zero or more times (that’s the star). Well, that’s what the exercise asked for, but you may have come up with something different. After all, if you’re looking for any number of b’s, you know you’ll always find what you’re looking for. So you could have written /a+/ instead, and matched the same strings.[11]
For that matter, when you want to match one or more a’s, you know that the match will succeed when you find even the first one. So, /a/ will match the same set of strings as the first two. The description “any string containing at least one a followed by any number of b’s” means the exact same thing as “any string containing a.” Of the sample strings, this matches all of them except fred.
There are even more ways to make this pattern than we show here. Often, in trying to write a pattern, you will need to decide which one of many possible patterns best suits your needs.
Here’s one way to do it:
/\\*\**/
That’s what the text asked for: a backslash (typed twice, since we mean a real backslash[12]) zero or more times (that’s the first star), followed by an asterisk (backslashed, since star is a metacharacter) zero or more times (that’s the last star). Whew!
And what about the sample strings? Did it match any of them? You bet: it matches all of them! It’s because the backslashes and asterisks aren’t required in the pattern; that is, this pattern can match the empty string. Here’s a rule you can rely upon: when a pattern may freely match the empty string, it’ll always match, since the empty string can be found in any string. In fact, it’ll always match in the first place that you look.
So, this pattern matches all four characters in \\**, as you’d expect. It matches the empty string at the beginning of fred, which you may not have expected. In the string barney \\\***, it matches the empty string at the beginning. You might wish it would hunt down the backslashes and stars at the end of that string, but it doesn’t bother. It looks at the beginning, sees zero backslashes followed by zero asterisks, declares the match a success, and goes home to watch television. And in *wilma, it matches just the star at the beginning; as you see, this pattern never gets away from the beginning of the string, since it always matches at the first opportunity.
Now, if someone asked you for a pattern to match any number of backslashes followed by any number of asterisks, you’d be technically correct to give them this one. But chances are, that’s not what they really wanted. Spoken languages like English may be ambiguous and not say exactly what they mean, but regular expressions always mean exactly what they say they mean.
In this case, maybe the person who asked for the pattern forgot to say that he or she always wants to match at least one character, when the pattern matches at all. We can do that. If there’s at least one backslash, /\\+\**/ will match. (That’s just like what we had before, but there’s a plus in place of the first star, meaning one or more backslashes.) If there’s not at least one backslash, then in order to match at least one character, we’ll need at least one asterisk, so we want /\*+/. When you put those two possibilities together, you get:
/\\+\**|\*+/
Ugly, isn’t it? Regular expressions are powerful but not beautiful. And they’ve contributed to Perl being maligned as a “write-only language.” To be sure that no one criticizes your code in that way, though, it’s kind to put an explanatory comment near any pattern that’s not obvious. On the other hand, when you’ve been using these for a year, you will have a different definition of “obvious” than you have today.
How does this new pattern work with the sample strings? With \\**, it matches all four characters, just like the last one. It won’t match fred, which is probably the right behavior given the problem description. For barney \\\***, it matches the six characters at the end, as you hoped. And for *wilma, it matches the asterisk at the beginning.
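You can confirm those results with a short sketch like this one. The helper name matched_part is ours, and we build the sample strings from a single-backslash variable so that the number of backslashes in each is easy to count; $& is Perl’s variable holding the text matched by the last successful pattern.

```perl
use strict;
use warnings;

my $bs = "\\";    # one literal backslash
my @samples = (
    ($bs x 2) . "**",                 # two backslashes, two stars
    "fred",
    "barney " . ($bs x 3) . "***",    # three backslashes, three stars
    "*wilma",
);

# Hypothetical helper: return the text matched by the revised pattern,
# or undef when the pattern does not match at all.
sub matched_part {
    my $string = shift;
    return $string =~ /\\+\**|\*+/ ? $& : undef;
}

foreach my $string (@samples) {
    my $part = matched_part($string);
    if (defined $part) {
        print "$string: matched '$part' (", length($part), " characters)\n";
    } else {
        print "$string: no match\n";
    }
}
```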
Here’s one way to do it:
while (<>) {
    if (/wilma/) {
        print;
    }
}
This is a grep-like program. For each line of text (contained in $_), we check to see whether the pattern matches. If it matches, we print it. This program uses print’s default: if you don’t tell it to print something else, it prints $_. So we have written a program that uses $_ all the way through, but never mentions it anywhere. Perl folks love to use the defaults and save time typing, so you’ll see a lot of programs that do this.
And if, for extra credit, you wanted to match a capitalized Wilma as well, /wilma|Wilma/ would do the job. Or, more simply, you could have written /(w|W)ilma/. People who have used other regular expression implementations and already know about character classes, which we’ll discuss in the next chapter, could make that last one even shorter (and more efficient).[13]
Here’s one way to do it:
while (<>) {
    if (/wilma/) {
        if (/fred/) {
            print;
        }
    }
}
This tests /fred/ only after we find /wilma/ matches, but fred could appear before or after wilma in the line; each test is independent of the other.
If you wanted to avoid the extra nested if test, you might have written something like this:[14]
while (<>) {
    if (/wilma.*fred|fred.*wilma/) {
        print;
    }
}
This works because we’ll either have wilma before fred, or fred before wilma. If we had written just /wilma.*fred/, that wouldn’t have matched a line like fred and wilma flintstone, even though that line mentions both of them.
We made this an extra-credit exercise because many folks have a mental block here. We showed you an “or” operation (with the vertical bar, "|"), but we never showed you an “and” operation. That’s because there isn’t one in regular expressions.[15] If you want to know whether one pattern and another are both successful, just test both of them.
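Testing both patterns can also be written with Perl’s logical-and operator between two matches, as in this sketch. The helper name mentions_both and the sample lines are ours.

```perl
use strict;
use warnings;

# Hypothetical helper: true when a line mentions both names, in either order.
sub mentions_both {
    my $line = shift;
    return ($line =~ /fred/ && $line =~ /wilma/) ? 1 : 0;
}

foreach my $line ("fred and wilma flintstone", "wilma met fred", "only fred here") {
    print "$line\n" if mentions_both($line);
}
```

The “and” here lives in Perl, outside the patterns; each pattern is still tested on its own.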
Here’s one way to do it:
/\b(fred|wilma)\s+flintstone\b/
If you forgot to use the word-boundary anchors, take off half a point; without those, this would mistakenly match strings like alfred flintstones. The exercise description said to match words.
The point of this exercise may not be obvious, but in the real world, you’ll often have to do something similar. Someday, you’ll be unlucky enough to have a confusing program to maintain, and you’ll wonder what the author was trying to accomplish.[16]
/"([^"]*)"/
matches a simple string in double quotes. By a “simple” string, we don’t mean one like Perl’s double-quoted strings, which could contain a backslashed double-quote mark or other backslash magic. This matches just a double-quote mark, the contents of the string (which can’t contain a double quote), and a closing quote mark. The contents may be empty. The parentheses aren’t needed for grouping, so they seem to be memory parentheses; as we’ll see in the next chapter, this regular expression memory, which holds the quoted substring, is probably being saved for some later use. Perhaps this pattern would be used in reading a configuration file with quoted strings, although in that case it should probably use anchors.
/^0?[0-3]?[0-7]{1,2}$/
matches if the string has nothing but an octal number (perhaps with a leading zero) in the range from 0 through 0377. Note that this one is anchored at both ends, so it doesn’t allow anything else in the string before or after the number. (The previous pattern wasn’t anchored; it could match anywhere in the string.)
/^\b[\w.]{1,12}\b$/
matches strings made up of nothing but letters, digits, underscores, and dots, but never starting or ending with a dot. Also, the strings are limited to a maximum of 12 characters.
You may have noticed that the dot inside the character class is not special, so it doesn’t need to be backslashed. That makes the character class match ordinary letters, digits, and underscores, and also dots.
The way we can be sure that this one won’t allow a string to start or end with a dot is that it has both a word-boundary anchor and a start-of-string or end-of-string anchor at each end of the string. The word-boundary anchor can match only if there’s a “word” starting or ending there, and a dot can’t be part of a word.
So, this would match strings like perl.tar.gz, but not some_excessively_long_filename or perl.tar. or .profile or ...[17] This pattern could be useful for validating user-chosen filenames.
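Here is a sketch that tries that pattern, with its word-boundary anchors written out as /^\b[\w.]{1,12}\b$/, against the names just mentioned. The helper name looks_like_filename is our own.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the pattern discussed above.
sub looks_like_filename {
    my $name = shift;
    return $name =~ /^\b[\w.]{1,12}\b$/ ? 1 : 0;
}

foreach my $name ("perl.tar.gz", "some_excessively_long_filename",
                  "perl.tar.", ".profile", "..") {
    printf "%-32s %s\n", $name,
        looks_like_filename($name) ? "matches" : "does not match";
}
```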
Here’s one way to do it:
/^\$[A-Za-z_]\w*$/
The dollar sign at the start has to be backslashed to mean a real dollar sign. What follows must be a letter or underscore, then zero or more letters, digits, or underscores.
This pattern is surprisingly tricky to get right. Here’s how we construct it, step by step.
We start out by needing to match a word, so that’s /\w+/. Of course, we want to remember that word for later, so we add parentheses: /(\w+)/. And we want to match it when it occurs two or more times, so that’s /(\w+)\1+/. (The plus sign at the end means one or more times, but that’s in addition to the one time that the word occurred originally.)
But we’re not done yet. Now we need to allow for the whitespace which may come between the words. We don’t want to memorize the whitespace (since it may vary), so we’ll put it outside the parentheses: /(\w+)\s\1+/. Oh, but there could be any number of whitespace characters, so long as there’s at least one, so we’ll add a plus sign. So now we have /(\w+)\s+\1+/.
But that’s not right; the final plus sign is modifying the backreference alone. We need it to apply to both the backreference (that is, our repeated word) and the whitespace in front of it: /(\w+)(\s+\1)+/. So, now we can match a triple word. First, the part in the first parenthesis pair matches the first occurrence, then the part in the second parenthesis pair can twice match some whitespace followed by that same word. When we try it out, it matches all of our sentences with doubled words, so we happily put it into our program and move on to the next project.
Then, the next week, we get a bug report! The pattern reports a match on the sentence This is a test, even though there’s clearly no doubled word there. In moments, we’ve fired up the pattern test program[18] to see what part of the string is matching: |Th<is is> a test|. There it is, a doubled word is, hidden in an ordinary string.
Clearly, this is a job for a word boundary anchor; we can’t have our word start in the middle of another word. So we fix the program to use /\b(\w+)(\s+\1)+/, and sit back, confident that we’ve got it right this time.
And then, just when you got started on another project, another bug report comes in. This time, we’ve matched the doubled word the in the phrase the theory. Yes, we need a word boundary at the end of the pattern to keep from matching a partial word there: /\b(\w+)(\s+\1)+\b/. Now we’ve finally gotten it right.
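The two bug reports make good test cases. This sketch runs the final pattern, /\b(\w+)(\s+\1)+\b/, against them and against a couple of sentences that really do contain repeated words; the helper name doubled_word and the extra sentences are ours.

```perl
use strict;
use warnings;

# Hypothetical helper: return the repeated-word text that the final
# pattern finds, or undef when there is no doubled word.
sub doubled_word {
    my $text = shift;
    return $text =~ /\b(\w+)(\s+\1)+\b/ ? $& : undef;
}

foreach my $text ("Paris in the the spring", "This is a test",
                  "the theory", "it was was   was fine") {
    my $found = doubled_word($text);
    print "$text: ", (defined $found ? "found '$found'" : "no doubled word"), "\n";
}
```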
What you’ve just read is a true story. The regular expression has been changed, but the bug reports are real. It does happen, more often than we’d like to admit, that even after you’ve been writing these patterns for years, you can make a pattern which has a bug, you can test it with a number of test cases, you can put it into a long-running program, the Perl documentation, or even a best-selling Perl book, and not realize that the bug is there until much later.
The moral of the story is that regular expressions can be challenging. If you’re serious about learning about regular expressions, though (and all Perl programmers should be), we highly recommend the book Mastering Regular Expressions, by Jeffrey Friedl (O’Reilly & Associates, Inc.).
Here’s one way to do it:
/($what){3}/
Once $what has been interpolated, this gives a pattern resembling /(fred|barney){3}/. Without the parentheses, the pattern would be something like /fred|barney{3}/, which is the same as /fred|barneyyy/. So, the parentheses are required.
Here’s one way to do it:
@ARGV = '/path/to/perlfunc.pod'; # or mentioned on the command line
while (<>) {
    if (/^=item\s+([a-z_]\w*)/i) {
        print "$1\n"; # print out that identifier name
    }
}
With what we’ve shown you so far, the only way to open an arbitrary file for input is to use the diamond operator (or to use input redirection, perhaps). So we put the path to perlfunc.pod into @ARGV.
The heart of this program is the pattern, which looks for an identifier name on an =item line. The exercise description was ambiguous, in that it didn’t say whether =item had to be in lower case; the author of this pattern seems to have decided that it should be a case-insensitive pattern. If you interpreted it otherwise, you could have used the pattern /^=item\s+([a-zA-Z_]\w*)/.
Here’s one way to do it:
    @ARGV = '/path/to/perlfunc.pod';  # or mentioned on the command line
    my %seen;  # (optionally) declaring the hash
    while (<>) {
      if (/^=item\s+([a-z_]\w*)/i) {
        $seen{$1} += 1;  # a tally for each item
      }
    }
    foreach (sort keys %seen) {
      if ($seen{$_} > 2) {  # more than twice
        print "$_ was seen $seen{$_} times.\n";
      }
    }
This one starts out much like the previous one, but declares the hash %seen (in case use strict might be in effect). This is called %seen because it tells us which identifier names we’ve seen so far in the program, and how many times. This is a common use of a hash. The first loop now counts each identifier name, as an entry in %seen, instead of printing it out.
The second loop goes through the keys of %seen,
which are the different identifier names we’ve seen. It sorts
the list, which (although not specified in the exercise description)
is a courtesy to the user, who might otherwise have to search for the
desired item in a long list.
Although it may not be obvious, this program is pretty close to a real-world problem that most of us are likely to see. Imagine that your webserver’s 400-megabyte logfile has some information you need. There’s no way you’re going to read that file on your own; you’ll want a program to match the information you need (with a pattern) and then print it out in some nice report format. Perl is good for putting together quick programs to do that sort of thing.
Here’s one way to do it:
    my $secret = int(1 + rand 100);
    # This next line may be un-commented during debugging
    # print "Don't tell anyone, but the secret number is $secret.\n";
    while (1) {
      print "Please enter a guess from 1 to 100: ";
      chomp(my $guess = <STDIN>);
      if ($guess =~ /quit|exit|^\s*$/i) {
        print "Sorry you gave up. The number was $secret.\n";
        last;
      } elsif ($guess < $secret) {
        print "Too small. Try again!\n";
      } elsif ($guess == $secret) {
        print "That was it!\n";
        last;
      } else {
        print "Too large. Try again!\n";
      }
    }
The first line picks out our secret number from 1 to 100. Here’s how it works. First, rand is Perl’s random number function, so rand 100 gives us a random number in the range from zero up to (but not including) 100. That is, the largest possible value of that expression is something like 99.999.[19] Adding one gives a number from 1 to 100.999, then the int function truncates that, giving a result from 1 to 100, as we needed.
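If you’d like to convince yourself of that range, a rough sketch such as the following draws many secret numbers and reports the extremes. The sub name secret_number and the count of 10,000 draws are our own choices.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the expression from the answer above.
sub secret_number { return int(1 + rand 100) }

my ($min, $max) = (101, 0);
for (1 .. 10_000) {
    my $n = secret_number();
    $min = $n if $n < $min;
    $max = $n if $n > $max;
}
print "Smallest seen: $min, largest seen: $max\n";
```

The output varies from run to run, but no value should ever fall outside 1 to 100.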
The commented-out line can be helpful during development and
debugging, or if you like to cheat. The main body of this program is the infinite while loop. That will keep asking for guesses until we execute last.
It’s important that we test the possible strings before the
numbers. If we didn’t, do you see what would happen when the user types quit? That would be interpreted as a
number (probably giving a warning message, if warnings were turned
on), and since the value as a number would be zero, the poor user
would get the message that their guess was too small. We might never
get to the string tests, in that case.
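The ordering can be sketched without any keyboard input. In this illustration the sub respond and its reply strings are hypothetical stand-ins for the if/elsif chain above, and warnings are silenced only around the line that deliberately treats a word as a number.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the if/elsif chain: string tests come first.
sub respond {
    my ($guess, $secret) = @_;
    return 'gave up'   if $guess =~ /quit|exit|^\s*$/i;
    return 'too small' if $guess <  $secret;
    return 'correct'   if $guess == $secret;
    return 'too large';
}

{
    no warnings 'numeric';  # hide the "isn't numeric" warning for this demo
    my $as_number = 'quit' + 0;
    print "'quit' as a number is $as_number\n";
}

print respond($_, 42), "\n" for 'quit', 7, 42, 99;
```

Because quit is zero as a number, moving the numeric tests ahead of the string test would report it as too small.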
Another way to make the infinite loop here would be to use a naked block with redo. It’s no more or less efficient; merely another way to write it. Generally, if you expect to mostly loop, it’s good to write while, since that loops by default. If looping will be the exception, a
naked block may be a better choice.
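Here is one possible shape for the naked-block version, as a sketch only. To keep it runnable unattended, a hypothetical list of canned guesses takes the place of <STDIN>, and the replies are collected rather than printed one by one.

```perl
use strict;
use warnings;

# Hypothetical sub: plays the game against a fixed list of guesses.
sub play {
    my ($secret, @guesses) = @_;
    my @replies;
    {
        my $guess = shift @guesses;
        last unless defined $guess;    # out of input: leave the block
        if ($guess < $secret) { push @replies, 'too small'; redo }
        if ($guess > $secret) { push @replies, 'too large'; redo }
        push @replies, 'that was it';  # falling off the end stops the "loop"
    }
    return @replies;
}

print join(', ', play(62, 50, 75, 62)), "\n";
```

Here the block repeats only when redo is executed, so stopping is the default and looping is the exception.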
Here’s one way to do it:
    sub get_line {
      # prompts for, reads, chomps, and returns a line of input
      print $_[0];
      chomp(my $line = <STDIN>);
      $line;
    }

    my $source = &get_line("Which source file? ");
    open IN, $source
      or die "Can't open '$source' for input: $!";

    my $dest = &get_line("What destination file? ");
    die "Won't overwrite existing file"
      if -e $dest;  # optional safety test
    open OUT, ">$dest"
      or die "Can't open '$dest' for output: $!";

    my $pattern = &get_line("What search pattern: ");
    my $replace = &get_line("What replacement string: ");

    while (<IN>) {
      s/$pattern/$replace/g;
      print OUT $_;
    }
This one needs to ask the user for several things, so we decided to make a subroutine to take care of some of the work. The subroutine prints out the prompt, which is the first (and only) parameter to the subroutine. Then it reads a line of input, chomps it, and returns it. That makes it easy to ask for each parameter, one after the other.
Once we know what the user wants for the source file, we try opening
it. An earlier version of this program asked for all of the
parameters first, but if the source file name is incorrect,
there’s no point in having the user type more parameters. This
way can save the user some time and trouble. Note that the die
message reports the file name inside quote marks; this can be helpful
in diagnosing a problem when a string has whitespace characters. If
you opened "<$source" instead of just plain $source, that’s fine, too. (There’s no
reason to worry that the user of this program will do something
nefarious, since anything they can do with this program, they could
accomplish just as well without it. If this program were made to run
over the web, to give one example, we’d need to be
much more cautious about opening the
user’s choice of file.)
As we hope you discovered when you tried it, it’s easy to
overwrite an existing file simply by opening it for output. So we put
in a safety test using the -e
file test. The
corresponding die message doesn’t include $!
because we’re not reporting a failed request of the system. By
the way, this test for overwrite is fine here, but it would be
insufficient in an environment where many copies of the same program
(or different programs all working with the same files) might be
running at once. This typically happens with programs on the web: Two
processes check the same filename for existence at approximately the
same time, and both see that it doesn’t exist. So one of them
creates the file, and an instant later the other one overwrites it
with a file of its own. This kind of concurrency problem can’t
be solved with the -e
file test; some kind of file
locking (which is beyond the scope of this book) is needed.
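File locking itself stays out of scope here. For the narrower problem of two processes racing to create the same file, one technique you may run across is to let the operating system do the check and the creation in a single step with sysopen and the O_EXCL flag. This is not file locking and not the method used in the answer above; it is a sketch under the assumption of a local filesystem, and the helper name create_new_file is our own.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);
use File::Temp qw(tempdir);

# Hypothetical helper: returns a write handle only if $path didn't exist yet.
sub create_new_file {
    my ($path) = @_;
    sysopen(my $fh, $path, O_WRONLY | O_CREAT | O_EXCL, 0644) or return;
    return $fh;
}

my $dir    = tempdir(CLEANUP => 1);   # scratch directory for the demo
my $path   = "$dir/output.txt";
my $first  = create_new_file($path);
my $second = create_new_file($path);
print "first attempt:  ", ($first  ? 'created' : 'refused'), "\n";
print "second attempt: ", ($second ? 'created' : 'refused'), "\n";
```

The second attempt is refused because the file already exists, and no separate -e test is involved.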
With that safety test, the user won’t accidentally overwrite an existing file. Is that test a good idea? Well, if the user comes to see you next week and says, “Golly, I’m glad you put in that safety test. It kept me from accidentally overwriting my file!”, then you know that it was the right thing to do. But if the user says, “Dagnabit, your program is hard to use! I told it what filename I wanted to use for output, and it wouldn’t let me use it until I first deleted that file!”, then it was the wrong thing to do. Making decisions like this is often the toughest part of a programmer’s job. Perhaps we should make the program ask something like, “Are you sure you want to overwrite the existing file `barney'?” by default, but have a command-line option for the power user that says to overwrite without asking. Next version.
Once we’ve asked for everything and opened the files, the rest
of the program is pretty simple. The heart of the program is the loop
at the end, which reads lines, updates them, and prints them out.
Note that the substitution uses the /g
option—if you left that out, your program is broken, since the
exercise asked that the program replace every
occurrence of the search pattern, not just the first one on each
line.
Were you able to use regular expression metacharacters in the search
pattern? Sure; the substitution interpolates $pattern to make the search pattern. Were you able to use memory variables and backslash escapes in the replacement string? Nope; $replace is interpolated to make the replacement string, but it’s not re-interpolated to interpret any magical characters. If $replace holds $1, that’s a dollar sign and a numeral one
in the replacement string. If Perl always kept re-interpolating, you
could never put a dollar sign or backslash into the replacement
string, since they’d always make something magical happen.
(Actually, though, if you need one additional level of interpolation,
it is possible. See the perlfaq
manpages for some
suggestions on how to do this.)
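A short sketch makes the single level of interpolation visible. The sub name substitute and the sample strings are assumptions made for this illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: apply one user-supplied substitution to a string.
sub substitute {
    my ($text, $pattern, $replace) = @_;
    $text =~ s/$pattern/$replace/g;
    return $text;
}

# The parentheses and plus sign in the pattern are honored as metacharacters,
# but the $1 in the replacement stays a literal dollar sign and digit.
print substitute('abbbc', '(b+)', '<$1>'), "\n";
```

The result contains the literal text <$1> rather than the matched letters.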
Here’s one way to do it:
    foreach my $file (@ARGV) {
      my $attribs = &attributes($file);
      print "'$file' $attribs.\n";
    }

    sub attributes {
      # report the attributes of a given file
      my $file = shift @_;
      return "does not exist" unless -e $file;

      my @attrib;
      push @attrib, "readable" if -r $file;
      push @attrib, "writable" if -w $file;
      push @attrib, "executable" if -x $file;
      return "exists" unless @attrib;
      'is ' . join " and ", @attrib;  # return value
    }
In this one, once again it’s convenient to use a subroutine.
The main loop prints one line of attributes for each file, perhaps telling us that 'cereal-killer' is executable or that 'sasquatch' does not exist.
The subroutine tells us the attributes of the given filename. Of course, if the file doesn’t even exist, there’s no need for the other tests, so we test for that first. If there’s no file, we’ll return early.
If the file does exist, we’ll build a list of attributes. (Give yourself extra credit points if you used the special _ filehandle instead of $file on these tests, to keep from calling the system separately for each new attribute.) It would be easy to add additional tests like the three we show here. But what happens if none of the attributes is true? Well, if we can’t say anything else, at least we can say that the file exists, so we do. The unless clause uses the fact that @attrib will be true (in a Boolean context, which is a special case of a scalar context) if it’s got any elements.
But if we’ve got some attributes, we’ll join them with " and " and put "is " in front, to make a description like is readable and writable. This isn’t perfect, however; if there are three attributes, it says that the file is readable and writable and executable, which has too many ands, but we can get away with it. If you wanted to add more attributes to the ones this program checks for, you should probably fix it to say something like is readable, writable, executable, and nonempty. If that matters to you.
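Both refinements can be sketched together: the special _ filehandle for the repeated tests, and a friendlier way to join the list. The helper join_list and the extra -s test are our own additions for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: "a", "a and b", or "a, b, and c".
sub join_list {
    my @items = @_;
    return ''                        if @items == 0;
    return $items[0]                 if @items == 1;
    return "$items[0] and $items[1]" if @items == 2;
    my $last = pop @items;
    return join(', ', @items) . ", and $last";
}

sub attributes {
    my ($file) = @_;
    return 'does not exist' unless -e $file;  # this test fills in _
    my @attrib;
    push @attrib, 'readable'   if -r _;       # reuse the saved information
    push @attrib, 'writable'   if -w _;
    push @attrib, 'executable' if -x _;
    push @attrib, 'nonempty'   if -s _;
    return 'exists' unless @attrib;
    return 'is ' . join_list(@attrib);
}

print "'$_' ", attributes($_), ".\n" for @ARGV;
```

Run it with a few filenames on the command line, just like the original program.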
Note that if you somehow didn’t put any filenames on the command line, this produces no output. This makes sense; if you ask for information on zero files, you should get zero lines of output. But let’s compare that to what the next program does in a similar case, in the discussion below.
Here’s one way to do it:
    die "No file names supplied!\n" unless @ARGV;
    my $oldest_name = shift @ARGV;
    my $oldest_age = -M $oldest_name;

    foreach (@ARGV) {
      my $age = -M;
      ($oldest_name, $oldest_age) = ($_, $age)
        if $age > $oldest_age;
    }

    printf "The oldest file was %s, and it was %.1f days old.\n",
      $oldest_name, $oldest_age;
This one starts right out by complaining if it didn’t get any filenames on the command line. That’s because it’s supposed to tell us the oldest filename—and there ain’t one if there aren’t any files to check.
Once again, we’re using the “high-water-mark”
algorithm. The first file is certainly the oldest one seen so far. We
have to keep track of its age as well, so that’s in $oldest_age.
For each of the remaining files, we’ll determine the age with the -M file test, just as we did for the first one (except that here, we’ll use the default argument of $_ for the file test). The last-modified time is generally what people mean by the “age” of a file, although you could make a case for using a different one. If the age is more than $oldest_age, we’ll use a list assignment to update both the name and age. We didn’t have to use a list assignment, but it’s a convenient way to update several variables at once.
We stored the age from -M into the temporary variable $age. What would have happened if we had simply used -M each time, rather than using a variable? Well, first, unless we used the special _ filehandle, we would have been asking the operating system for the age of the file each time, a potentially slow operation (not that you’d notice unless you have hundreds or thousands of files, and maybe not even then). More importantly, though, we should consider what would happen if someone updated a file while we’re checking it. That is, first we see the age of some file, and it’s the oldest one seen so far. But before we can get back to use -M a second time, someone modifies the file and resets the timestamp to the current time. Now the age that we save into $oldest_age is actually the youngest age possible. The result would be that we’d get the oldest file among the files tested from that point on, rather than the oldest overall; this would be a tough problem to debug!
Finally, at the end of the program, we use printf to print out the name and age, with the
age rounded off to the nearest tenth of a day. Give yourself extra
credit if you went to the trouble to convert the age to a number of
days, hours, and minutes.
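For that extra credit, one possible conversion is sketched below. The sub name age_parts is hypothetical, and the sketch assumes a nonnegative age such as -M normally returns for a file that was last modified before the program started.

```perl
use strict;
use warnings;

# Hypothetical helper: split a fractional number of days into whole days,
# hours, and minutes, rounding to the nearest minute.
sub age_parts {
    my ($age_in_days) = @_;
    my $minutes = int($age_in_days * 24 * 60 + 0.5);
    my $days    = int($minutes / (24 * 60));
    my $hours   = int(($minutes % (24 * 60)) / 60);
    return ($days, $hours, $minutes % 60);
}

printf "%d days, %d hours, %d minutes\n", age_parts(2.25);
```

Working in whole minutes first keeps rounding from producing an odd result such as 60 minutes.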
Here’s one way to do it, with a glob:
    print "Which directory? (Default is your home directory) ";
    chomp(my $dir = <STDIN>);
    if ($dir =~ /^\s*$/) {  # A blank line
      chdir or die "Can't chdir to your home directory: $!";
    } else {
      chdir $dir or die "Can't chdir to '$dir': $!";
    }

    my @files = <*>;
    foreach (@files) {
      print "$_\n";
    }
First, we show a simple prompt, and read the desired directory,
chomping it as needed. (Without a chomp, we’d be trying to head for a directory that ends in a newline, which is legal in Unix and therefore cannot be presumed to simply be extraneous by the chdir function.)
Then, if the directory name is nonempty, we’ll change to that directory, aborting on a failure. If empty, the home directory is selected instead.
Finally, a glob on “star” pulls up all the names in the (new) working directory, automatically sorted to alphabetical order, and they’re printed one at a time.
Here’s one way to do it:
    print "Which directory? (Default is your home directory) ";
    chomp(my $dir = <STDIN>);
    if ($dir =~ /^\s*$/) {  # A blank line
      chdir or die "Can't chdir to your home directory: $!";
    } else {
      chdir $dir or die "Can't chdir to '$dir': $!";
    }

    my @files = <.* *>;  ## now includes .*
    foreach (sort @files) {  ## now sorts
      print "$_\n";
    }
There are two differences from the previous one. First, the glob now includes “dot star”, which matches all the names that do begin with a dot. Second, we now must sort the resulting list, because some of the names that begin with a dot must be interleaved appropriately either before or after the list of things without a beginning dot.
Here’s one way to do it:
    print "Which directory? (Default is your home directory) ";
    chomp(my $dir = <STDIN>);
    if ($dir =~ /^\s*$/) {  # A blank line
      chdir or die "Can't chdir to your home directory: $!";
    } else {
      chdir $dir or die "Can't chdir to '$dir': $!";
    }

    opendir DOT, "." or die "Can't opendir dot: $!";
    foreach (sort readdir DOT) {
      # next if /^\./;  ## if we were skipping dot files
      print "$_\n";
    }
Again, same structure as the previous two programs, but now
we’ve chosen to open a directory handle. Once we’ve
changed the working directory, we want to open the current directory,
and we’ve shown that as the DOT directory handle.
Why DOT? Well, if the user asks for an absolute directory name, like /etc, there’s no problem opening it. But if the name is relative, like fred, let’s see what would happen. First, we chdir to fred, and then we want to use opendir to open it. But that would open fred in the new directory, not fred in the original directory. The only name we can be sure will mean “the current directory” is ".", which always has that meaning (on Unix and similar systems, at least).
The readdir function pulls up all the names of the directory, which are then sorted, and displayed. If we had done the first exercise this way, we would have skipped over the dot files, and that’s handled by uncommenting the commented-out line in the foreach loop.
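The same listing can be wrapped in a small sub that takes the directory name directly, with no chdir at all. This is only a sketch: the sub name list_names, its second parameter, and the lexical directory handle are our own choices, not part of the answer above.

```perl
use strict;
use warnings;

# Hypothetical helper: sorted names in $dir, optionally skipping dot files.
sub list_names {
    my ($dir, $skip_dot_files) = @_;
    opendir my $dh, $dir or die "Can't opendir '$dir': $!";
    my @names = sort { $a cmp $b } readdir $dh;
    closedir $dh;
    @names = grep { !/^\./ } @names if $skip_dot_files;
    return @names;
}

print "$_\n" for list_names('.', 1);  # current directory, no dot files
```

Passing a false second argument keeps the dot files in the list.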
You may find yourself asking, “Why did we chdir first? You can use readdir and friends on any directory, not merely on the current directory.” Primarily, we wanted to give the
user the convenience of being able to get to their home directory
with a single keystroke. But this could be the start of a general
file-management utility program; maybe the next step would be to ask
the user which of the files in this directory should be moved to
offline tape storage, say.
Here’s one way to do it:
unlink @ARGV;
...or, if you want to warn the user of any problems:
    foreach (@ARGV) {
      unlink $_ or warn "Can't unlink '$_': $!, continuing...\n";
    }
Here, each item from the command-invocation line is placed individually into $_, which is then used as the argument to unlink. If something goes wrong, the warning gives the clue about why.
Here’s one way to do it:
    use File::Basename;
    use File::Spec;

    my($source, $dest) = @ARGV;

    if (-d $dest) {
      my $basename = basename $source;
      $dest = File::Spec->catfile($dest, $basename);
    }

    rename $source, $dest
      or die "Can't rename '$source' to '$dest': $!\n";
The workhorse in this program is the last statement, but the
remainder of the program is necessary when we are renaming into a
directory. First, after declaring the modules we’re using, we
name the command-line arguments sensibly. If $dest is a directory, we need to extract the basename from the $source name and append it to the directory ($dest). Finally, once $dest is patched up if needed, the rename does the deed.
Here’s one way to do it:
    use File::Basename;
    use File::Spec;

    my($source, $dest) = @ARGV;

    if (-d $dest) {
      my $basename = basename $source;
      $dest = File::Spec->catfile($dest, $basename);
    }

    link $source, $dest
      or die "Can't link '$source' to '$dest': $!\n";
As the hint in the exercise description said, this program is much
like the previous one. The difference is that we’ll link rather than rename. If your system doesn’t support hard links, you might have written this as the last statement:
    print "Would link '$source' to '$dest'.\n";
Here’s one way to do it:
    use File::Basename;
    use File::Spec;

    my $symlink = $ARGV[0] eq '-s';
    shift @ARGV if $symlink;

    my($source, $dest) = @ARGV;
    if (-d $dest) {
      my $basename = basename $source;
      $dest = File::Spec->catfile($dest, $basename);
    }

    if ($symlink) {
      symlink $source, $dest
        or die "Can't make soft link from '$source' to '$dest': $!\n";
    } else {
      link $source, $dest
        or die "Can't make hard link from '$source' to '$dest': $!\n";
    }
The first few lines of code (after the two use declarations) look at the first command-line argument, and if it’s "-s", we’re making a symbolic link, so we note that as a true value for $symlink. If we saw that "-s", we then need to get rid of it (in the next line). The next few lines are cut-and-pasted from the previous exercise answers. Finally, based on the truth of $symlink, we’ll choose either to create a symbolic link or a hard link. We also updated the dying words to make it clear which kind of link we were attempting.
Here’s one way to do it:
    foreach (<.* *>) {
      my $dest = readlink $_;
      print "$_ -> $dest\n" if defined $dest;
    }
Each item resulting from the glob ends up in $_ one by one. If the item is a symbolic link, then readlink returns a defined value, and the location is displayed. If not, then the condition fails, and we skip over it.
Here’s one way to do it:
    chdir "/" or die "Can't chdir to root directory: $!";
    exec "ls", "-l" or die "Can't exec ls: $!";
The first line changes the current working directory to the root
directory, as our particular hard-coded directory. The second line uses the multiple-argument exec function to send the result to standard output. We could have used the single-argument form just as well, but it doesn’t hurt to do it this way.
Here’s one way to do it:
    open STDOUT, ">ls.out" or die "Can't write to ls.out: $!";
    open STDERR, ">ls.err" or die "Can't write to ls.err: $!";
    chdir "/" or die "Can't chdir to root directory: $!";
    exec "ls", "-l" or die "Can't exec ls: $!";
The first and second lines reopen STDOUT and STDERR to files in the current directory (before we change directories). Then, after the directory change, the directory listing command executes, sending the data back to the files opened in the original directory.
Where would the message from the last die go? Why, it would go into ls.err, of course, since that’s where STDERR is going at that point. The die from chdir would go there, too. But where would the message go if we can’t re-open STDERR on the second line? It goes to the old STDERR. For the three standard filehandles, STDIN, STDOUT, and STDERR, if re-opening them fails, the old filehandle is still open.
Here’s one way to do it:
    if (`date` =~ /^S/) {
      print "go play!\n";
    } else {
      print "get to work!\n";
    }
Well, since both Saturday and Sunday start with an S, and the day of
the week is the first part of the output of the
date command, this is pretty simple. Just check
the output of the date command to see if it starts with S. There are many harder ways to do this
program, and we’ve seen most of them in our classes.
If we had to use this in a real-world program, though, we’d
probably use the pattern /^(Sat|Sun)/. It’s
a tiny bit less efficient, but that hardly matters; besides,
it’s so much easier for the maintenance programmer to
understand.
Here’s one way to do it:
    my @numbers;
    push @numbers, split while <>;
    foreach (sort { $a <=> $b } @numbers) {
      printf "%20g\n", $_;
    }
That second line of code is too confusing, isn’t it? Well, we did that on purpose. Although we recommend that you write clear code, some people like writing code that’s as hard to understand as possible,[20] so we want you to be prepared for the worst. Someday, you’ll need to maintain confusing code like this.
Since that line uses the while modifier, it’s the same as if it were written in a loop like this:
    while (<>) {
      push @numbers, split;
    }
That’s better, but maybe it’s still a little unclear. (Nevertheless, we don’t have a quibble about writing it this way. This one is on the correct side of the “too hard to understand at a glance” line.) The while loop is reading the input a line at a time (from the user’s choice of input sources, as shown by the diamond operator), and split is, by default, splitting that on whitespace to make a list of words, or, in this case, a list of numbers. The input is just a stream of numbers separated by whitespace, after all. Either way that you write it, then, that while loop will put all of the numbers from the input into @numbers.
The foreach loop takes the sorted list and prints each one on its own line, using the %20g numeric format to put them in a right-justified column. You could have used %20s instead. What difference would that make? Well, that’s a string format, so it would have left the strings untouched in the output. Did you notice that our sample data included both 1.50 and 1.5, and both 04 and 4? If you printed those as strings, the extra zero characters would still be in the output; but %20g is a numeric format, so equal numbers will appear identically in the output. Either format could potentially be correct, depending upon what you’re trying to do.
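A side-by-side sketch shows the difference between the two formats. The four items are the ones mentioned above, and the square brackets are added only to make the column width visible.

```perl
use strict;
use warnings;

for my $item ('1.50', '1.5', '04', '4') {
    printf "as a number: [%20g]  as a string: [%20s]\n", $item, $item;
}
```

In the numeric column the first two lines look alike, as do the last two; in the string column every line keeps its original spelling.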
Here’s one way to do it:
    # don't forget to incorporate the hash %last_name,
    # either from the exercise text or the downloaded file

    my @keys = sort {
      "\L$last_name{$a}" cmp "\L$last_name{$b}"  # by last name
        or
      "\L$a" cmp "\L$b"                          # by first name
    } keys %last_name;

    foreach (@keys) {
      print "$last_name{$_}, $_\n";  # Rubble, Bamm-Bamm
    }
There’s not much to say about this one; we put the keys in order as needed, then print them out. We chose to print them in last-name-comma-first-name order just for fun; the exercise description left that up to you.
Here’s one way to do it:
    print "Please enter a string: ";
    chomp(my $string = <STDIN>);
    print "Please enter a substring: ";
    chomp(my $sub = <STDIN>);

    my @places;

    for (my $pos = -1; ; ) {  # tricky use of three-part for loop
      $pos = index($string, $sub, $pos + 1);  # find next position
      last if $pos == -1;
      push @places, $pos;
    }

    print "Locations of '$sub' in '$string' were: @places\n";
This one starts out simply enough, asking the user for the strings
and declaring an array to hold the list of substring positions. But
once again, as we see in the for
loop, the code
seems to have been “optimized for cleverness”, which
should be done only for fun, never in production code. But this
actually shows a valid technique which could be useful in some cases,
so let’s see how it works.
The my variable $pos is declared private to the scope of the for loop, and it starts with a value of -1. So as not to keep you in suspense about this variable, we’ll tell you right now that it’s going to hold a position of the substring in the larger string. The test and increment sections of the for loop are empty, so this is an infinite loop. (Of course, we’ll eventually break out of it, in this case with last.)
The first statement of the loop body looks for the first occurrence of the substring at or after position $pos + 1. That means that on the first iteration, when $pos is still -1, the search will start at position 0, the start of the string. The location of the substring is stored back into $pos. Now, if that was -1, we’re done with the for loop, so last breaks out of the loop in that case. If it wasn’t -1, then we save the position into @places and go around the loop again. This time, $pos + 1 means that we’ll start looking for the substring just after the previous place where we found it. And so we get the answers we wanted and the world is once again a happy place.
If you didn’t want that tricky use of the for loop, you could accomplish the same result as shown here:
    {
      my $pos = -1;
      while (1) {
        ...  # Same loop body as the for loop used above
      }
    }
The naked block on the outside restricts the scope of $pos. You don’t have to do that, but it’s often a good idea to declare each variable in the smallest possible scope. This means we have fewer variables “alive” at any given point in the program, making it less likely that we’ll accidentally reuse the name $pos for some new purpose. For the same reason, if you don’t declare a variable in a small scope, you should generally give it a longer name that’s thereby less likely to be reused by accident. Maybe something like $substring_position would be appropriate in this case.
On the other hand, if you were trying to obfuscate your code (shame on you!), you could create a monster like this (shame on us!):
    for (my $pos = -1; -1 !=
      ($pos = index
        +$string,
        +$sub,
        +$pos
        +1
      );
    push @places, ((((+$pos))))) {
        'for ($pos != 1; # ;$pos++) {
          print "position $pos\n";#;';#' } pop @places;
    }
That even trickier code works in place of the original tricky for loop. By now, you should know enough to be
able to decipher that one on your own, or to obfuscate code in order
to amaze your friends and confound your enemies. Be sure to use these
powers only for good, never for evil.
Oh, and what did you get when you searched for “t” in “This is a test.”? It’s at positions 10 and 13. It’s not at position 0; since the capitalization doesn’t match, the substring doesn’t match.
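To check those positions without typing at a prompt, the search loop can be wrapped in a sub. This is a sketch; the sub name positions and the guard against an empty substring are our own additions.

```perl
use strict;
use warnings;

# Hypothetical helper: every position at which $sub occurs in $string.
sub positions {
    my ($string, $sub) = @_;
    return if $sub eq '';  # an empty substring would never stop matching
    my @places;
    my $pos = -1;
    while (($pos = index($string, $sub, $pos + 1)) != -1) {
        push @places, $pos;
    }
    return @places;
}

my @places = positions('This is a test.', 't');
print "Locations: @places\n";  # prints "Locations: 10 13"
```

Because each search resumes one character after the previous hit, overlapping occurrences are found too.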
Here’s one way to do it:
    open PF, '/path/to/perlfunc.pod' or die "Can't open perlfunc.pod: $!";
    dbmopen my %DB, "pf_data", 0644 or die "Can't create dbm file: $!";
    %DB = ( );  # wipe existing data, if any
    while (<PF>) {
      if (/^=item\s+([a-z_]\w*)/i) {
        $DB{$1} = $DB{$1} || $.;
      }
    }
    print "Done!\n";
This one is similar to the previous ones with
perlfunc.pod. Here, though, we open a DBM file
called pf_data as the DBM hash %DB. In case that file had any leftover data, we
set the hash to an empty list. That’s normally a rare thing to
do, but we want to wipe out the entire database, in case a previous
run of this program left incorrect or out-of-date data in the file.
(After all, there’s a new perlfunc.pod
with each new release of Perl.)
When we find an identifier, we need to store its line number (from $.) into the database. The statement that does that uses the high-precedence short-circuit || operator. If the database entry already has a value, that value is true, so the old value is used. If the database entry is empty, that’s false, so the value on the right ($.) is used instead. We could have written that line in a shorter way, like this:
$DB{$1} ||= $.;
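The “first value wins” behavior is easy to try with an ordinary hash standing in for the DBM hash. In this sketch the sub name first_lines and the name-and-line pairs are made up for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: remember the first line number seen for each name.
sub first_lines {
    my @pairs = @_;  # each element is [name, line number]
    my %first_line;
    for my $pair (@pairs) {
        my ($name, $line) = @$pair;
        $first_line{$name} ||= $line;  # a later line never replaces an earlier one
    }
    return %first_line;
}

my %seen_at = first_lines(['print', 12], ['open', 40], ['print', 97]);
print "$_: $seen_at{$_}\n" for sort keys %seen_at;
```

This works because a line number is never zero; a stored value of 0 would count as false and be overwritten.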
When the program is done, it says so. That’s not required by the exercise description, but it lets us know that the program did something; without that line, there would be no output at all. But how can we tell that it worked correctly? That’s the next exercise.
Here’s one way to do it:
    dbmopen my %DB, "pf_data", undef or die "Can't open dbm file: $!";
    my $line = $DB{$ARGV[0]} || "not found";
    print "$ARGV[0]: $line\n";
Once we have the database, it’s simple to look something up in it. Note that in this program, the third argument to dbmopen is undef, since that file must already exist for this program to work.
If the entry for $ARGV[0] (the first command-line parameter) isn’t found in the database, we’ll say it’s not found, using the high-precedence short-circuit ||.
Here’s one way to do it:
    dbmopen my %DB, "pf_data", undef or die "Can't open dbm file: $!";
    if (my $line = $DB{$ARGV[0]}) {
      exec 'less', "+$line", '/path/to/perlfunc.pod'
        or die "Can't exec pager: $!";
    } else {
      die "Entry unknown: '$ARGV[0]'.\n";
    }
This starts out like the previous one, but uses exec to start up a pager program if it can, and dies if it can’t.
Here’s one way to do it:
    my $filename = 'path/to/sample_text';
    open FILE, $filename
      or die "Can't open '$filename': $!";
    chomp(my @strings = <FILE>);
    while (1) {
      print "Please enter a pattern: ";
      chomp(my $pattern = <STDIN>);
      last if $pattern =~ /^\s*$/;
      my @matches = eval {
        grep /$pattern/, @strings;
      };
      if ($@) {
        print "Error: $@";
      } else {
        my $count = @matches;
        print "There were $count matching strings:\n",
          map "$_\n", @matches;
      }
      print "\n";
    }
This one uses an eval block to trap any failure that might occur when using the regular expression. Inside that block, a grep pulls the matching strings from the list of strings.
Once the eval is finished, we can report either the error message or the matching strings. Note that we “unchomped” the strings for output by using map to add a newline to each string.
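The trapping step can be tried on its own with a deliberately broken pattern. In this sketch the sub name safe_grep, its return convention, and the sample strings are assumptions made for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns (error, matches...); error is '' on success.
sub safe_grep {
    my ($pattern, @strings) = @_;
    my @matches = eval { grep /$pattern/, @strings };
    return ($@, @matches);
}

my @strings = ('fred flintstone', 'barney rubble', 'betty rubble');
for my $pattern ('rubble$', 'fred(') {
    my ($error, @matches) = safe_grep($pattern, @strings);
    if ($error) {
        print "Error: $error";
    } else {
        print "'$pattern' matched: ", join(', ', @matches), "\n";
    }
}
```

The second pattern has an unmatched parenthesis, so the eval block fails and the program reports the error instead of dying.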
[1] If you’d prefer a more formal sort of constant, the constant pragma may be what you’re looking for.
[2] It nearly did change by a legislative act in the state of Indiana. http://www.urbanlegends.com/legal/pi_indiana.html
[3] We asked O’Reilly to spend the extra money to print the input cursor with blinking ink, but they wouldn’t do it for us.
[4] Chomping is like chewing—not always needed, but most of the time it doesn’t hurt.
[5] We can’t do this without advanced trickiness, that is. It’s rare to find anything that you absolutely can’t do in Perl.
[6] Also, at least in some versions of Perl, the shorter way will avoid a warning about using an undefined value that may crop up with the longer one. The warning may also be avoided by using the ++ operator to increment the variable, although we haven’t shown you that operator yet.
[7] Or to Larry, if he’s standing nearby.
[8] Unless Larry told you not to do that.
[9] As Larry should have explained to you by now.
[10] If the test program didn’t work correctly, you probably didn’t download it as we suggested. And you probably didn’t test what you typed, as we also suggested. But in that case, you probably didn’t do the exercises either; you’re just reading these answers in the back of the book, and so the test program (which you didn’t actually run) performed flawlessly. In that case, this footnote is pointless.
[11] To be sure, you’ll match different parts of the strings. But any string that matches /a+b*/ will also match /a+/, and vice versa.
[12] Whenever you mean a real backslash in Perl, type two of them. A lone backslash may try to do something magical, but two of them will always mean a real backslash.
[13] If you made the whole pattern case-insensitive, shame on you. We haven’t learned that yet. Besides, that would match WILMA, which shouldn’t match, according to the exercise description.
[14] Folks who know about the logical-and operator, which we’ll see in Chapter 10, could do both tests /fred/ and /wilma/ in the same if conditional. That’s more efficient, more scalable, and an all-around better way than the ones given here. But we haven’t seen logical-and yet.
[15] But there are some tricky and advanced ways of doing what some folks would call an “and” operation. These are generally less efficient than using Perl’s logical-and, though, depending upon what optimizations Perl and its regular expression engine can make.
[16] If you’re especially unlucky, this happens when you look at your own code ten minutes after writing it.
[17] You may know that file and directory names beginning with a dot are not displayed by default on Unix systems, and that the special directory name .. always means the directory one level higher in the hierarchy.
[18] We told you that it would come in handy, and we weren’t kidding.
[19] The actual largest possible value depends upon your system; see http://www.cpan.org/doc/FMTEYEWTK/random if you really need to know.
[20] Well, we don’t recommend it for normal coding purposes, but it can be a fun game to write confusing code, and it can be educational to take someone else’s obfuscated code examples and spend a weekend or two figuring out just what they do. If you want to see some fun snippets of such code and maybe get a little help with decoding them, ask around at the next Perl Mongers’ meeting. Or search for JAPHs on the Web, or see how well you can decipher the example obfuscated code block near the end of this chapter’s answers.