Appendix A. Exercise Answers

This appendix contains the answers to the exercises that appear throughout the book.

Answers to Chapter 2 Exercises

  1. Here’s one way to do it:

    #!/usr/bin/perl -w
    $pi = 3.141592654;
    $circ = 2 * $pi * 12.5;
    print "The circumference of a circle of radius 12.5 is $circ.\n";

    As you see, we started this program with a typical #! line; your path to Perl may vary. We also turned on warnings.

    The first real line of code sets the value of $pi to our value of π. There are several reasons a good programmer will prefer to use a constant[1] value like this: it takes time to type 3.141592654 into your program if you ever need it more than once. It may be a mathematical bug if you accidentally used 3.141592654 in one place and 3.14159 in another. There’s only one line to check on to make sure you didn’t accidentally type 3.141952654 and send your space probe to the wrong planet. It’s easier to type $pi than π, especially if you don’t have Unicode. And it will be easy to maintain the program in case the value of π ever changes.[2]

    Next we calculate the circumference, storing it into $circ, and we print it out in a nice message. The message ends with a newline character, because every line of a good program’s output should end with a newline. Without it, you might end up with output looking something like this, depending upon your shell’s prompt:

    The circumference of a circle of radius 12.5 is 78.53981635.bash-2.01$[]

    The box represents the input cursor, blinking at the end of the line, and that’s the shell’s prompt at the end of the message.[3] Since the circumference isn’t really 78.53981635.bash-2.01$, this should probably be construed as a bug. So use \n at the end of each line of output.

  2. Here’s one way to do it:

    #!/usr/bin/perl -w
    $pi = 3.141592654;
    print "What is the radius? ";
    chomp($radius = <STDIN>);
    $circ = 2 * $pi * $radius;
    print "The circumference of a circle of radius $radius is $circ.\n";

    This is just like the last one, except that now we ask the user for the radius, and then we use $radius in every place where we previously used the hard-coded value 12.5. If we had written the first program with more foresight, in fact, we would have had a variable named $radius in that one as well. Note that we chomped the line of input. If we hadn’t, the mathematical formula would still have worked, because a string like "12.5\n" is converted to the number 12.5 without any problem. But when we print out the message, it would look like this:

    The circumference of a circle of radius 12.5
     is 78.53981635.

    Notice that the newline character is still in $radius, even though we’ve used that variable as a number. Since we had a space between $radius and the word "is" in the print statement, there’s a space at the beginning of the second line of output. The moral of the story is: chomp your input unless you have a reason not to do that.
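
    A small sketch of that effect, using a hard-coded string in place of a line read from <STDIN>. The variable names $raw and $doubled are only illustrative and do not come from the exercise:

```perl
use strict;
use warnings;

# Stand-in for a line read from <STDIN>; the trailing newline is still attached.
my $raw = "12.5\n";

# Numeric use ignores the trailing newline, so the arithmetic still works.
my $doubled = $raw * 2;

# chomp removes the trailing newline from the copy in $radius.
chomp(my $radius = $raw);

print "Unchomped: [$raw]\n";     # the closing bracket lands on the next line
print "Chomped:   [$radius]\n";  # the brackets hug the value
print "Doubled:   $doubled\n";
```

    The brackets make the stray newline visible: only the chomped copy prints on a single line.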

  3. Here’s one way to do it:

    #!/usr/bin/perl -w
    $pi = 3.141592654;
    print "What is the radius? ";
    chomp($radius = <STDIN>);
    $circ = 2 * $pi * $radius;
    if ($radius < 0) {
      $circ = 0;
    }
    print "The circumference of a circle of radius $radius is $circ.\n";

    Here we added the check for a bogus radius. Even if the given radius was impossible, the returned circumference will at least be nonnegative. You could have changed the given radius to be zero, and then calculated the circumference, too; there’s more than one way to do it. In fact, that’s the Perl motto: There Is More Than One Way To Do It. And that’s why each exercise answer starts with “Here’s one way to do it.”
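
    The alternative mentioned above, zeroing the radius before the calculation, might look like the following sketch. The radius is hard-coded here as a stand-in for the user’s input so that the example runs without a prompt:

```perl
use strict;
use warnings;

my $pi = 3.141592654;
my $radius = -3;              # stand-in for the chomped line from <STDIN>

# Treat an impossible radius as zero before doing any arithmetic.
if ($radius < 0) {
    $radius = 0;
}

my $circ = 2 * $pi * $radius;
print "The circumference of a circle of radius $radius is $circ.\n";
```

    One visible difference from the answer above: the message now reports the radius as 0 rather than echoing the negative number the user typed.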

  4. Here’s one way to do it:

    print "Enter first number: ";
    chomp($one = <STDIN>);
    print "Enter second number: ";
    chomp($two = <STDIN>);
    $result = $one * $two;
    print "The result is $result.\n";

    Notice that we’ve left off the #! line for this answer. In fact, from here on, we’ll assume that you know it’s there, so you don’t need to read it each time.

    Perhaps those are poor choices for variable names. In a large program, a maintenance programmer might think that $two should have the value of 2. In this short program, it probably doesn’t matter, but in a large one we could have called them something more descriptive, with names like $first_response.

    In this program, it wouldn’t make any difference if we forgot to chomp the two variables $one and $two, since we never use them as strings once they’ve been set. But if next week our maintenance programmer edits the program to print a message like "The result of multiplying $one by $two is $result.\n", those pesky newlines will come back to haunt us. Once again, chomp unless you have a reason not to chomp,[4] like in the next exercise.

  5. Here’s one way to do it:

    print "Enter a string: ";
    $str = <STDIN>;
    print "Enter a number of times: ";
    chomp($num = <STDIN>);
    $result = $str x $num;
    print "The result is:\n$result";

    This program is almost the same as the last one, in a sense. We’re “multiplying” a string by a number of times. So we’ve kept the structure of the previous exercise. In this case, though, we didn’t want to chomp the first input item—the string—because the exercise asked for the strings to appear on separate lines. So, if the user entered fred and a newline for the string, and 3 for the number, we’d get a newline after each fred just as we wanted.

    In the print statement at the end, we put the newline before $result because we wanted to have the first fred printed on a line of its own. That is, we didn’t want output like this, with only two of the three freds aligned in a column:

    The result is: fred
    fred
    fred

    At the same time, we didn’t need to put another newline at the end of the print output because $result should already end with a newline.

    In most cases, Perl won’t mind where you put spaces in your program; you can put in spaces or leave them out. But it’s important not to accidentally spell the wrong thing! If the x runs up against the preceding variable name $str, Perl will see $strx, which won’t work.

Answers to Chapter 3 Exercises

  1. Here’s one way to do it:

    print "Enter some lines, then press Ctrl-D:\n"; # or maybe Ctrl-Z
    @lines = <STDIN>;
    @reverse_lines = reverse @lines;
    print @reverse_lines;

    ...or, even more simply:

    print "Enter some lines, then press Ctrl-D:\n";
    print reverse <STDIN>;

    Most Perl programmers would prefer the second one, as long as you don’t need to keep the list of lines around for later use.

  2. Here’s one way to do it:

    @names = qw/ fred betty barney dino wilma pebbles bamm-bamm /;
    print "Enter some numbers from 1 to 7, one per line, then press Ctrl-D:\n";
    chomp(@numbers = <STDIN>);
    foreach (@numbers) {
      print "$names[ $_ - 1 ]\n";
    }

    We have to subtract one from the index number so that the user can count from 1 to 7 even though the array is indexed from 0 to 6. Another way to accomplish this would be to have a dummy item in the @names array, like this:

    @names = qw/ dummy_item fred betty barney dino wilma pebbles bamm-bamm /;

    Give yourself extra credit if you checked to make sure that the user’s choice of index was in fact in the range 1 to 7.
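
    One possible shape for that extra-credit range check is sketched below. The list in @numbers is a hard-coded stand-in for the chomped input lines, and @chosen is a hypothetical array added only so the result can be inspected afterward:

```perl
use strict;
use warnings;

my @names = qw/ fred betty barney dino wilma pebbles bamm-bamm /;
my @numbers = (1, 7, 0, 8, 3);   # stand-in for chomp(@numbers = <STDIN>)
my @chosen;                      # names that were successfully looked up

foreach (@numbers) {
    if ($_ >= 1 && $_ <= 7) {
        push @chosen, $names[ $_ - 1 ];
        print "$names[ $_ - 1 ]\n";
    } else {
        print "$_ is not in the range 1 to 7.\n";
    }
}
```

    This sketch assumes every input line is a number; a non-numeric line would still need separate handling.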

  3. Here’s one way to do it, if you want the output all on one line:

    chomp(@lines = <STDIN>);
    @sorted = sort @lines;
    print "@sorted\n";

    ...or, to get the output on separate lines:

    print sort <STDIN>;

Answers to Chapter 4 Exercises

  1. Here’s one way to do it:

    sub total {
      my $sum;  # private variable
      foreach (@_) {
        $sum += $_;
      }
      $sum;
    }

    This subroutine uses $sum to keep a running total. At the start of the subroutine, $sum is undef, since it’s a new variable. Then, the foreach loop steps through the parameter list (from @_), using $_ as the control variable. (Note: once again, there’s no automatic connection between @_, the parameter array, and $_, the default variable for the foreach loop.)

    The first time through the foreach loop, the first number (in $_) is added to $sum. Of course, $sum is undef, since nothing has been stored in there. But since we’re using it as a number, which Perl sees because of the numeric operator +=, Perl acts as if it’s already initialized to 0. Perl thus adds the first parameter to 0, and puts the total back into $sum.

    Next time through the loop, the next parameter is added to $sum, which is no longer undef. The sum is placed back into $sum, and on through the rest of the parameters. Finally, the last line returns $sum to the caller.

    There’s a potential bug in this subroutine, depending upon how you think of things. Suppose that this subroutine was called with an empty parameter list (as we considered with the rewritten subroutine &max in the chapter text). In that case, $sum would be undef, and that would be the return value. But in this subroutine, it would probably be “more correct” to return 0 as the sum of the empty list, rather than undef. (Of course, if you wished to distinguish the sum of an empty list from the sum of, say, (3, -5, 2), returning undef would be the right thing to do.)

    If you don’t want a possibly undefined return value, though, it’s easy to remedy: simply initialize $sum to zero rather than using the default of undef:

    my $sum = 0;

    Now the subroutine will always return a number, even if the parameter list were empty.
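
    Put together, the revised subroutine might read as follows. The two calls at the end are illustrative additions, not part of the exercise:

```perl
use strict;
use warnings;

sub total {
    my $sum = 0;        # start from zero so an empty list sums to 0
    foreach (@_) {
        $sum += $_;
    }
    $sum;               # the last expression evaluated is the return value
}

print "Empty list: ", total(), "\n";
print "Three items: ", total(3, -5, 2), "\n";
```

    Both calls print 0, which is exactly the ambiguity the parenthetical remark above points out: with this version the caller can no longer tell an empty list from a list that happens to sum to zero.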

  2. Here’s one way to do it:

    # Remember to include &total from previous exercise!
    print "The numbers from 1 to 1000 add up to ", &total(1..1000), ".\n";

    Note that we can’t call the subroutine from inside the double-quoted string,[5] so the subroutine call is another separate item being passed to print. The total should be 500500, a nice round number. And it shouldn’t take any noticeable time at all to run this program; passing a parameter list of 1000 values is an everyday task for Perl.

Answers to Chapter 5 Exercises

  1. Here’s one way to do it:

    my %last_name = qw{
      fred flintstone
      barney rubble
      wilma flintstone
    };
    print "Please enter a first name: ";
    chomp(my $name = <STDIN>);
    print "That's $name $last_name{$name}.\n";

    In this one, we used a qw// list (with curly braces as the delimiter) to initialize the hash. That’s fine for this simple data set, and it’s easy to maintain because each data item is a simple given name and simple family name, with nothing tricky. But if your data might contain spaces—for example, if robert de niro or mary kay place were to visit Bedrock—this simple method wouldn’t work so well.

    You might have chosen to assign each key/value pair separately, something like this:

    my %last_name;
    $last_name{"fred"} = "flintstone";
    $last_name{"barney"} = "rubble";
    $last_name{"wilma"} = "flintstone";

    Note that if you chose to declare the hash with my (perhaps because use strict was in effect), you must declare the hash before assigning any elements. You can’t use my on only part of a variable, like this:

    my $last_name{"fred"} = "flintstone";  # Oops!

    The my operator works only with entire variables, never with just one element of an array or hash. Speaking of lexical variables, you may have noticed that the lexical variable $name is being declared inside of the chomp function call; it is fairly common to declare each my variable as it is needed, like this.

    This is another case where chomp is vital. If someone enters the five-character string "fred\n" and we fail to chomp it, we’ll be looking for "fred\n" as an element of the hash, and it’s not there. Of course, chomp alone won’t make this bulletproof; if someone enters "fred \n" (with a trailing space), we don’t have a way with what we’ve seen so far to tell that they meant fred.

    If you added a check whether the given key exists in the hash, so that you’ll give the user an explanatory message when they misspell a name, give yourself extra points for that.
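
    A minimal sketch of such a check, using exists to test for the key. The subroutine name describe and the wording of the fallback message are assumptions made for this example:

```perl
use strict;
use warnings;

my %last_name = qw{
  fred flintstone
  barney rubble
  wilma flintstone
};

# Hypothetical helper: build the reply for an already-chomped first name.
sub describe {
    my ($name) = @_;
    if (exists $last_name{$name}) {
        return "That's $name $last_name{$name}.";
    }
    return "Sorry, I don't know a last name for $name.";
}

print describe("fred"), "\n";
print describe("frde"), "\n";   # a misspelled name gets the explanatory message
```

    Wrapping the lookup in a subroutine is only a convenience for trying several names; the same if test could sit directly after the chomp in the original answer.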

  2. Here’s one way to do it:

    my(@words, %count, $word);     # (optionally) declare our variables
    chomp(@words = <STDIN>);
    
    foreach $word (@words) {
      $count{$word} += 1;          # or $count{$word} = $count{$word} + 1;
    }
    
    foreach $word (keys %count) {  # or sort keys %count
      print "$word was seen $count{$word} times.\n";
    }

    In this one, we declared all of the variables at the top. People who come to Perl from a background in languages like Pascal (where variables are always declared “at the top”) may find that way more familiar than declaring variables as they are needed. Of course, we’re declaring these because we’re pretending that use strict may be in effect; by default, Perl won’t require such declarations.

    Next, we use the line-input operator, <STDIN>, in a list context to read all of the input lines into @words, and then we chomp those all at once. So @words is our list of words from the input (if the words were all on separate lines, as they should have been, of course).

    Now, the first foreach loop goes through all of the words. That loop contains the most important statement of the entire program, the statement that says to add one to $count{$word}, and put the result back into $count{$word}. Although you could write it either the short way (with the += operator) or the long way, the short way is just a little bit more efficient, since Perl has to look up $word in the hash just once.[6]

    For each word in the first foreach loop, we add one to $count{$word}. So, if the first word is fred, we add one to $count{"fred"}. Of course, since this is the first time we’ve seen $count{"fred"}, it’s undef. But since we’re treating it as a number (with the numeric += operator, or with +, if you wrote it the long way), Perl converts undef to 0 for us, automatically. The total is 1, which is then stored back into $count{"fred"}.

    The next time through that foreach loop, let’s say the word is barney. So, we add one to $count{"barney"}, bumping it up from undef to 1, as well.

    Now let’s say the next word is fred again. When we add one to $count{"fred"}, which is already 1, we get 2. This goes back into $count{"fred"}, meaning that we’ve now seen fred twice.

    When we finish the first foreach loop, then, we’ve counted how many times each word has appeared. The hash has a key for each (unique) word from the input, and the corresponding value is the number of times that word appeared.

    So now, the second foreach loop goes through the keys of the hash, which are the unique words from the input. In this loop, we’ll see each different word once. For each one, it says something like "fred was seen 3 times."

    If you want the extra credit on this problem, you could put sort before keys to print out the keys in order. If there will be more than a dozen items in an output list, it’s generally a good idea for them to be sorted, so that a human being who is trying to debug the program will fairly quickly be able to find the item he or she wants.
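
    The sorted variant could be sketched like this. The word list is hard-coded in place of the input lines, and the @report array is a hypothetical addition that simply collects the output lines before printing them:

```perl
use strict;
use warnings;

my @words = qw/ fred barney fred dino wilma fred /;  # stand-in for the input
my %count;

foreach my $word (@words) {
    $count{$word} += 1;           # undef is treated as 0 the first time
}

my @report;
foreach my $word (sort keys %count) {   # sort puts the keys in string order
    push @report, "$word was seen $count{$word} times.";
}
print "$_\n" foreach @report;
```

    With sort in place the report starts with barney and ends with wilma regardless of the order in which the words arrived.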

Answers to Chapter 6 Exercises

  1. Here’s one way to do it:

    print reverse <>;

    Well, that’s pretty simple! But it works because print is looking for a list of strings to print, which it gets by calling reverse in a list context. And reverse is looking for a list of strings to reverse, which it gets by using the diamond operator in list context. So, the diamond returns a list of all of the lines from all of the files of the user’s choice. That list of lines is just what cat would print out. Now reverse reverses the list of lines, and print prints them out.

  2. Here’s one way to do it:

    print "Enter some lines, then press Ctrl-D:\n";  # or Ctrl-Z
    chomp(my @lines = <STDIN>);
    
    print "1234567890" x 7, "12345\n";  # ruler line to column 75
    
    foreach (@lines) {
      printf "%20s\n", $_;
    }

    Here, we start by reading in and chomping all of the lines of text. Then we print the ruler line. Since that’s a debugging aid, we’d generally comment-out that line when the program is done. We could have typed "1234567890" again and again, or even used copy-and-paste to make a ruler line as long as we needed, but we chose to do it this way because it’s kind of cool.

    Now, the foreach loop iterates over the list of lines, printing each one with the %20s conversion. If you chose to do so, you could have created a format to print the list all at once, without the loop:

    my $format = "%20s\n" x @lines;
    printf $format, @lines;

    It’s a common mistake to get 19-character columns. That happens when you say to yourself,[7] “Hey, why do we chomp the input if we’re only going to add the newlines back on later?” So you leave out the chomp and use a format of "%20s" (without a newline).[8] And now, mysteriously, the output is off by one space. So, what went wrong?

    The problem happens when Perl tries to count the spaces needed to make the right number of columns. If the user enters hello and a newline, Perl sees six characters, not five, since newline is a character. So it prints fourteen spaces and a six-character string, sure that it gives the twenty characters you asked for in "%20s". Oops.

    Of course, Perl isn’t looking at the contents of the string to determine the width; it merely checks the raw number of characters. A newline (or another special character, such as a tab or a null character) will throw things off.[9]
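
    The off-by-one effect can be seen without any input at all. This sketch uses sprintf, which accepts the same format strings as printf, so that the padded results can be measured; the variable names are illustrative:

```perl
use strict;
use warnings;

my $chomped   = sprintf "%20s", "hello";     # 15 spaces plus 5 characters
my $unchomped = sprintf "%20s", "hello\n";   # 14 spaces plus 6 characters

print "|$chomped|\n";
print "|$unchomped|\n";   # the newline pushes the closing bar to the next line
print length($chomped), " ", length($unchomped), "\n";
```

    Both strings are 20 characters long as far as Perl is concerned; the visible text in the second one simply ends at column 19.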

  3. Here’s one way to do it:

    print "What column width would you like? ";
    chomp(my $width = <STDIN>);
    
    print "Enter some lines, then press Ctrl-D:\n";  # or Ctrl-Z
    chomp(my @lines = <STDIN>);
    
    print "1234567890" x (($width+9)/10), "\n";  # ruler line as needed
    
    foreach (@lines) {
      printf "%${width}s\n", $_;
    }

    This is much like the previous one, but we ask for a column width first. We ask for that first because we can’t ask for more input after the end-of-file indicator, at least on some systems. Of course, in the real world, you’ll generally have a better end-of-input indicator when getting input from the user, as we’ll see in later chapters.

    Another change from the previous exercise’s answer is the ruler line. We used some math to cook up a ruler line that’s at least as long as we need, as suggested as an “extra credit” part of the exercise. Proving that our math is correct is an additional challenge. (Hint: Consider possible widths of 50 and 51, and remember that the right side operand to x is truncated, not rounded.)
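
    A quick way to experiment with that math is to wrap the ruler expression in a subroutine and try a few widths. The name ruler and the choice of test widths are assumptions made for this sketch:

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the ruler expression from the answer above.
sub ruler {
    my ($width) = @_;
    # The right operand of x is truncated to an integer, so 5.9 repeats 5 times.
    return "1234567890" x (($width + 9) / 10);
}

foreach my $width (1, 10, 50, 51, 59, 60) {
    printf "width %2d -> ruler of %2d characters\n", $width, length(ruler($width));
}
```

    A width of 50 gives (50+9)/10 = 5.9, truncated to 5 repetitions and a 50-character ruler; a width of 51 gives exactly 6 repetitions and a 60-character ruler. In every case the ruler is the smallest multiple of ten that is at least as long as the width.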

    To generate the format this time, we used the expression "%${width}s\n", which interpolates $width. The curly braces are required to “insulate” the name from the following s; without the curly braces, we’d be interpolating $widths, the wrong variable. If you forgot how to use curly braces to do this, though, you could have written an expression like '%' . $width . "s\n" to get the same format string.

    The value of $width brings up another case where chomp is vital. If the width isn’t chomped, the resulting format string would resemble "%30\ns\n". That’s not useful.

    People who have seen printf before may have thought of another solution. Because printf comes to us from C, which doesn’t have string interpolation, we can use the same trick that C programmers use. If an asterisk ("*") appears in place of a numeric field width in a conversion, a value from the list of parameters will be used:

    printf "%*s\n", $width, $_;

Answers to Chapter 7 Exercises

  1. Here’s one way to do it:

    /fred/

    Of course, you have to put that into the test program! This is pretty simple. The more important part of this exercise is trying it out on the sample strings. It doesn’t match Fred, showing that regular expressions are case-sensitive. (We’ll see how to change that later.) It does match frederick and Alfred, since both of those strings contain the four-letter string fred. (Matching whole words only, so that frederick and Alfred wouldn’t match, is another feature we’ll see later.)

    If the test program is working correctly,[10] it should show those two matches as something like |<fred>erick| and |Al<fred>|, using the angle brackets to show where fred was found inside each string.

  2. Here’s one way to do it:

    /a+b*/

    That matches the letter a one or more times (that’s the plus), followed by b zero or more times (that’s the star). Well, that’s what the exercise asked for, but you may have come up with something different. After all, if you’re looking for any number of b’s, you know you’ll always find what you’re looking for. So you could have written /a+/ instead, and matched the same strings.[11]

    For that matter, when you want to match one or more a’s, you know that the match will succeed when you find even the first one. So, /a/ will match the same set of strings as the first two. The description “any string containing at least one a followed by any number of b’s” means the exact same thing as “any string containing a.” Of the sample strings, this matches all of them except fred.
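
    The claim that all three patterns accept the same strings can be checked mechanically. This sketch uses the binding operator =~ and a handful of made-up strings; they are not the sample strings from the exercise:

```perl
use strict;
use warnings;

# Illustrative strings only, chosen to include some with and without an a.
my @samples = ("fred", "barney", "aaab", "wilma", "bbb");

my @disagreements;   # strings on which the three patterns give different answers
foreach my $string (@samples) {
    my $first  = ($string =~ /a+b*/) ? 1 : 0;
    my $second = ($string =~ /a+/)   ? 1 : 0;
    my $third  = ($string =~ /a/)    ? 1 : 0;
    print "$string: $first $second $third\n";
    push @disagreements, $string unless $first == $second && $second == $third;
}

print "Patterns disagreed on: @disagreements\n" if @disagreements;
```

    The three patterns agree on whether a string matches; what can differ is how much of the string each one matches, which only matters once you look at the matched text.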

    There are even more ways to make this pattern than we show here. Often, in trying to write a pattern, you will need to decide which one of many possible patterns best suits your needs.

  3. Here’s one way to do it:

    /\\*\**/

    That’s what the text asked for: a backslash (typed twice, since we mean a real backslash[12]) zero or more times (that’s the first star), followed by an asterisk (backslashed, since star is a metacharacter) zero or more times (that’s the last star). Whew!

    And what about the sample strings? Did it match any of them? You bet: it matches all of them! It’s because the backslashes and asterisks aren’t required in the pattern; that is, this pattern can match the empty string. Here’s a rule you can rely upon: when a pattern may freely match the empty string, it’ll always match, since the empty string can be found in any string. In fact, it’ll always match in the first place that you look.

    So, this pattern matches all four characters in \\**, as you’d expect. It matches the empty string at the beginning of fred, which you may not have expected. In the string barney \\\***, it matches the empty string at the beginning. You might wish it would hunt down the backslashes and stars at the end of that string, but it doesn’t bother. It looks at the beginning, sees zero backslashes followed by zero asterisks, declares the match a success, and goes home to watch television. And in *wilma, it matches just the star at the beginning; as you see, this pattern never gets away from the beginning of the string, since it always matches at the first opportunity.

    Now, if someone asked you for a pattern to match any number of backslashes followed by any number of asterisks, you’d be technically correct to give them this one. But chances are, that’s not what they really wanted. Spoken languages like English may be ambiguous and not say exactly what they mean, but regular expressions always mean exactly what they say they mean.

    In this case, maybe the person who asked for the pattern forgot to say that he or she always wants to match at least one character, when the pattern matches at all. We can do that. If there’s at least one backslash, /\\+\**/ will match. (That’s just like what we had before, but there’s a plus in place of the first star, meaning one or more backslashes.) If there’s not at least one backslash, then in order to match at least one character, we’ll need at least one asterisk, so we want /\*+/. When you put those two possibilities together, you get:

    /\\+\**|\*+/

    Ugly, isn’t it? Regular expressions are powerful but not beautiful. And they’ve contributed to Perl being maligned as a “write-only language.” To be sure that no one criticizes your code in that way, though, it’s kind to put an explanatory comment near any pattern that’s not obvious. On the other hand, when you’ve been using these for a year, you will have a different definition of “obvious” than you have today.

    How does this new pattern work with the sample strings? With \\**, it matches all four characters, just like the last one. It won’t match fred, which is probably the right behavior given the problem description. For barney \\\***, it matches the six characters at the end, as you hoped. And for *wilma, it matches the asterisk at the beginning.
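
    Those four results can be reproduced with a short script in place of the pattern test program. The strings are assembled from a one-backslash variable to keep the quoting readable, and the %matched hash is a hypothetical addition that records what each match covered; the script also relies on the match variable $& and the binding operator =~:

```perl
use strict;
use warnings;

my $bs = "\\";    # one real backslash
my @samples = (
    ($bs x 2) . "**",                # two backslashes, two asterisks
    "fred",
    "barney " . ($bs x 3) . "***",   # three backslashes, three asterisks
    "*wilma",
);

my %matched;      # string => the part of it that the pattern matched
foreach my $string (@samples) {
    if ($string =~ /\\+\**|\*+/) {
        $matched{$string} = $&;
        print "$string: matched <$&>\n";
    } else {
        print "$string: no match\n";
    }
}
```

    The first alternative needs at least one backslash and the second needs at least one asterisk, so the empty-string match that plagued the earlier pattern is no longer possible.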

  4. Here’s one way to do it:

    while (<>) {
      if (/wilma/) {
        print;
      }
    }

    This is a grep-like program. For each line of text (contained in $_), we check to see whether the pattern matches. If it matches, we print it. This program uses print’s default: if you don’t tell it to print something else, it prints $_. So we have written a program that uses $_ all the way through, but never mentions it anywhere. Perl folks love to use the defaults and save time typing, so you’ll see a lot of programs that do this.

    And if, for extra credit, you wanted to match a capitalized Wilma as well, /wilma|Wilma/ would do the job. Or, more simply, you could have written /(w|W)ilma/. People who have used other regular expression implementations and already know about character classes, which we’ll discuss in the next chapter, could make that last one even shorter (and more efficient).[13]

  5. Here’s one way to do it:

    while (<>) {
      if (/wilma/) {
        if (/fred/) {
          print;
        }
      }
    }

    This tests /fred/ only after we find /wilma/ matches, but fred could appear before or after wilma in the line; each test is independent of the other.

    If you wanted to avoid the extra nested if test, you might have written something like this:[14]

    while (<>) {
      if (/wilma.*fred|fred.*wilma/) {
        print;
      }
    }

    This works because we’ll either have wilma before fred, or fred before wilma. If we had written just /wilma.*fred/, that wouldn’t have matched a line like fred and wilma flintstone, even though that line mentions both of them.

    We made this an extra-credit exercise because many folks have a mental block here. We showed you an “or” operation (with the vertical bar, "|"), but we never showed you an “and” operation. That’s because there isn’t one in regular expressions.[15] If you want to know whether one pattern and another are both successful, just test both of them.
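
    Testing both patterns does not have to mean nesting two if blocks. As a sketch, the two matches can be joined with Perl’s logical-and operator &&; the input lines here are made up, and @both is a hypothetical array that collects the lines that pass:

```perl
use strict;
use warnings;

my @lines = (
    "fred and wilma flintstone\n",
    "wilma loves fred\n",
    "barney and betty rubble\n",
    "wilma alone\n",
);

my @both;
foreach (@lines) {
    if (/wilma/ && /fred/) {    # each pattern is tested against $_ separately
        push @both, $_;
        print;
    }
}
```

    The and happens in Perl, outside the regular expressions, so the order of the two names within a line does not matter.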

Answers to Chapter 8 Exercises

  1. Here’s one way to do it:

    /\b(fred|wilma)\s+flintstone\b/

    If you forgot to use the \b word-boundary anchors, take off half a point; without those, this would mistakenly match strings like alfred flintstones. The exercise description said to match words.

  2. The point of this exercise may not be obvious, but in the real world, you’ll often have to do something similar. Someday, you’ll be unlucky enough to have a confusing program to maintain, and you’ll wonder what the author was trying to accomplish.[16]

    /"([^"]*)"/ matches a simple string in double quotes. By a “simple” string, we don’t mean one like Perl’s double-quoted strings, which could contain a backslashed double-quote mark or other backslash magic. This matches just a double-quote mark, the contents of the string (which can’t contain a double quote), and a closing quote mark. The contents may be empty. The parentheses aren’t needed for grouping, so they seem to be memory parentheses; as we’ll see in the next chapter, this regular expression memory, which holds the quoted substring, is probably being saved for some later use. Perhaps this pattern would be used in reading a configuration file with quoted strings, although in that case it should probably use anchors.

    /^0?[0-3]?[0-7]{1,2}$/ matches if the string has nothing but an octal number (perhaps with a leading zero) in the range from 0 through 0377. Note that this one is anchored at both ends, so it doesn’t allow anything else in the string before or after the number. (The previous pattern wasn’t anchored; it could match anywhere in the string.)

    /^\b[\w.]{1,12}\b$/ matches strings made up of nothing but letters, digits, underscores, and dots, but never starting or ending with a dot. Also, the strings are limited to a maximum of 12 characters.

    You may have noticed that the dot inside the character class is not special, so it doesn’t need to be backslashed. That makes the character class match ordinary letters, digits, and underscores, and also dots.

    The way we can be sure that this one won’t allow a string to start or end with a dot is that it has both a word-boundary anchor and a start-of-string or end-of-string anchor at each end of the string. The word-boundary anchor can match only if there’s a “word” starting or ending there, and a dot can’t be part of a word.

    So, this would match strings like perl.tar.gz, but not some_excessively_long_filename or perl.tar. or .profile or ...[17] This pattern could be useful for validating user-chosen filenames.
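
    Those examples can be tried directly. In this sketch the pattern is written with its word-boundary anchors (\b) and word-character class (\w) spelled out, and the %ok hash is a hypothetical addition that records the outcome for each candidate name:

```perl
use strict;
use warnings;

my @candidates = qw( perl.tar.gz some_excessively_long_filename perl.tar. .profile .. );

my %ok;   # candidate name => 1 if the pattern accepts it, 0 otherwise
foreach my $name (@candidates) {
    $ok{$name} = ($name =~ /^\b[\w.]{1,12}\b$/) ? 1 : 0;
    print "$name: ", ($ok{$name} ? "matches" : "no match"), "\n";
}
```

    Only perl.tar.gz is accepted: the long name exceeds 12 characters, and the other three begin or end with a dot, where no word boundary can be found.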

  3. Here’s one way to do it:

    /^\$[A-Za-z_]\w*$/

    The dollar sign at the start has to be backslashed to mean a real dollar sign. What follows must be a letter or underscore, then zero or more letters, digits, or underscores.

  4. This pattern is surprisingly tricky to get right. Here’s how we construct it, step by step.

    We start out by needing to match a word, so that’s /\w+/. Of course, we want to remember that word for later, so we add parentheses: /(\w+)/. And we want to match it when it occurs two or more times, so that’s /(\w+)\1+/. (The plus sign at the end means one or more times, but that’s in addition to the one time that the word occurred originally.)

    But we’re not done yet. Now we need to allow for the whitespace which may come between the words. We don’t want to memorize the whitespace (since it may vary), so we’ll put it outside the parentheses: /(\w+)\s\1+/. Oh, but there could be any number of whitespace characters, so long as there’s at least one, so we’ll add a plus sign. So now we have /(\w+)\s+\1+/.

    But that’s not right; the final plus sign is modifying the backreference alone. We need it to apply to both the backreference (that is, our repeated word) and the whitespace in front of it: /(\w+)(\s+\1)+/. So, now we can match a triple word. First, the part in the first parenthesis pair matches the first occurrence, then the part in the second parenthesis pair can twice match some whitespace followed by that same word. When we try it out, it matches all of our sentences with doubled words, so we happily put it into our program and move on to the next project.

    Then, the next week, we get a bug report! The pattern reports a match on the sentence This is a test, even though there’s clearly no doubled word there. In moments, we’ve fired up the pattern test program[18] to see what part of the string is matching: |Th<is is> a test|. There it is, a doubled word is, hidden in an ordinary string.

    Clearly, this is a job for a word boundary anchor; we can’t have our word start in the middle of another word. So we fix the program to use /\b(\w+)(\s+\1)+/, and sit back, confident that we’ve got it right this time.

    And then, just when you got started on another project, another bug report comes in. This time, we’ve matched the doubled word the in the phrase the theory. Yes, we need a word boundary at the end of the pattern to keep from matching a partial word there: /\b(\w+)(\s+\1)+\b/. Now we’ve finally gotten it right.
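
    The three stages of that story can be replayed side by side. In this sketch the sentences other than This is a test and the theory are made up, and the hashes %loose, %left_anchored, and %final are hypothetical names that hold whatever each version of the pattern matched (an empty string means no match):

```perl
use strict;
use warnings;

my @sentences = (
    "I saw the the cat",
    "This is a test",
    "the theory",
    "that that  that is",
);

my (%loose, %left_anchored, %final);
foreach my $text (@sentences) {
    # $& holds the matched text; it is read only when the match succeeded.
    $loose{$text}         = ($text =~ /(\w+)(\s+\1)+/)     ? $& : "";
    $left_anchored{$text} = ($text =~ /\b(\w+)(\s+\1)+/)   ? $& : "";
    $final{$text}         = ($text =~ /\b(\w+)(\s+\1)+\b/) ? $& : "";
    print "$text: [$loose{$text}] [$left_anchored{$text}] [$final{$text}]\n";
}
```

    The unanchored pattern finds is is inside This is a test, the leading anchor fixes that but still finds the the in the theory, and only the version anchored at both ends rejects both false alarms while still catching the genuine repeats.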

    What you’ve just read is a true story. The regular expression has been changed, but the bug reports are real. It does happen, more often than we’d like to admit, that even after you’ve been writing these patterns for years, you can make a pattern which has a bug, you can test it with a number of test cases, you can put it into a long-running program, the Perl documentation, or even a best-selling Perl book, and not realize that the bug is there until much later.

    The moral of the story is that regular expressions can be challenging. If you’re serious about learning about regular expressions, though (and all Perl programmers should be), we highly recommend the book Mastering Regular Expressions, by Jeffrey Friedl (O’Reilly & Associates, Inc.).

Answers to Chapter 9 Exercises

  1. Here’s one way to do it:

    /($what){3}/

    Once $what has been interpolated, this gives a pattern resembling /(fred|barney){3}/. Without the parentheses, the pattern would be something like /fred|barney{3}/, which is the same as /fred|barneyyy/. So, the parentheses are required.
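
    To see the difference the parentheses make, here is a small sketch; the sample strings and variable names are ours. The unparenthesized pattern is built by joining two strings, because writing $what{3} directly in a pattern would make Perl look for an element of a hash named %what.

    ```perl
    use strict;
    use warnings;

    my $what = 'fred|barney';

    my $with_parens = qr/($what){3}/;      # like /(fred|barney){3}/

    # Built by hand so that the pattern text is exactly fred|barney{3}
    my $no_parens_text = $what . '{3}';
    my $no_parens = qr/$no_parens_text/;   # like /fred|barney{3}/

    foreach my $text ('fredbarneyfred', 'barneyyy', 'fred') {
      my $with    = $text =~ $with_parens ? 'yes' : 'no';
      my $without = $text =~ $no_parens   ? 'yes' : 'no';
      print "$text: with parentheses $with, without $without\n";
    }
    ```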

  2. Here’s one way to do it:

    @ARGV = '/path/to/perlfunc.pod';  # or mentioned on the command line
    
    while (<>) {
      if (/^=item\s+([a-z_]\w*)/i) {
        print "$1\n";                 # print out that identifier name
      }
    }

    With what we’ve shown you so far, the only way to open an arbitrary file for input is to use the diamond operator (or to use input redirection, perhaps). So we put the path to perlfunc.pod into @ARGV.

    The heart of this program is the pattern, which looks for an identifier name on an =item line. The exercise description was ambiguous, in that it didn’t say whether =item had to be in lower case; the author of this pattern seems to have decided that it should be a case-insensitive pattern. If you interpreted it otherwise, you could have used the pattern /^=item\s+([a-zA-Z_]\w*)/.

  3. Here’s one way to do it:

    @ARGV = '/path/to/perlfunc.pod';  # or mentioned on the command line
    
    my %seen;                         # (optionally) declaring the hash
    
    while (<>) {
      if (/^=item\s+([a-z_]\w*)/i) {
        $seen{$1} += 1;               # a tally for each item
      }
    }
    
    foreach (sort keys %seen) {
      if ($seen{$_} > 2) {            # more than twice
        print "$_ was seen $seen{$_} times.\n";
      }
    }

    This one starts out much like the previous one, but declares the hash %seen (in case use strict might be in effect). This is called %seen because it tells us which identifier names we’ve seen so far in the program, and how many times. This is a common use of a hash. The first loop now counts each identifier name, as an entry in %seen, instead of printing it out.

    The second loop goes through the keys of %seen, which are the different identifier names we’ve seen. It sorts the list, which (although not specified in the exercise description) is a courtesy to the user, who might otherwise have to search for the desired item in a long list.

    Although it may not be obvious, this program is pretty close to a real-world problem that most of us are likely to see. Imagine that your webserver’s 400-megabyte logfile has some information you need. There’s no way you’re going to read that file on your own; you’ll want a program to match the information you need (with a pattern) and then print it out in some nice report format. Perl is good for putting together quick programs to do that sort of thing.

Answer to Chapter 10 Exercise

  1. Here’s one way to do it:

    my $secret = int(1 + rand 100);
    # This next line may be un-commented during debugging
    # print "Don't tell anyone, but the secret number is $secret.\n";
    
    while (1) {
      print "Please enter a guess from 1 to 100: ";
      chomp(my $guess = <STDIN>);
      if ($guess =~ /quit|exit|^\s*$/i) {
        print "Sorry you gave up. The number was $secret.\n";
        last;
      } elsif ($guess < $secret) {
        print "Too small. Try again!\n";
      } elsif ($guess == $secret) {
        print "That was it!\n";
        last;
      } else {
        print "Too large. Try again!\n";
      }
    }

    The first line picks out our secret number from 1 to 100. Here’s how it works. First, rand is Perl’s random number function, so rand 100 gives us a random number in the range from zero up to (but not including) 100. That is, the largest possible value of that expression is something like 99.999.[19] Adding one gives a number from 1 to 100.999, then the int function truncates that, giving a result from 1 to 100, as we needed.
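
    If you’d like some reassurance about that range, a quick experiment such as the following sketch can help. The trial count of 10,000 is an arbitrary choice of ours, and a run like this can only fail to find a counterexample; it proves nothing.

    ```perl
    use strict;
    use warnings;

    my($lowest, $highest) = (100, 1);
    foreach (1..10_000) {
      my $secret = int(1 + rand 100);
      die "out of range: $secret" if $secret < 1 or $secret > 100;
      $lowest  = $secret if $secret < $lowest;
      $highest = $secret if $secret > $highest;
    }
    print "Smallest seen: $lowest, largest seen: $highest\n";
    ```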

    The commented-out line can be helpful during development and debugging, or if you like to cheat. The main body of this program is the infinite while loop. That will keep asking for guesses until we execute last.

    It’s important that we test the possible strings before the numbers. If we didn’t, do you see what would happen when the user types quit? That would be interpreted as a number (probably giving a warning message, if warnings were turned on), and since the value as a number would be zero, the poor user would get the message that their guess was too small. We might never get to the string tests, in that case.
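
    Here’s a small sketch of what goes wrong when the numeric test comes first. The secret number 42 is an arbitrary stand-in, and we’ve silenced the numeric warning inside a small block only so that the output stays tidy.

    ```perl
    use strict;
    use warnings;

    my $secret = 42;        # an arbitrary stand-in for the secret number
    my $guess  = 'quit';

    my $as_number;
    {
      no warnings 'numeric';   # otherwise Perl warns that 'quit' isn't numeric
      $as_number = $guess + 0;
    }
    my $verdict = $as_number < $secret ? 'Too small' : 'not too small';
    print "'$guess' as a number is $as_number: $verdict\n";
    ```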

    Another way to make the infinite loop here would be to use a naked block with redo. It’s no more or less efficient; merely another way to write it. Generally, if you expect to mostly loop, it’s good to write while, since that loops by default. If looping will be the exception, a naked block may be a better choice.
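
    As a rough sketch of that alternative, here is the same guessing logic in a naked block with redo. So that it runs the same way every time, this version uses a fixed secret number and a canned list of guesses in place of <STDIN>; both are our own inventions for illustration.

    ```perl
    use strict;
    use warnings;

    my $secret  = 62;              # fixed here so the run is repeatable
    my @guesses = (50, 75, 62);    # canned guesses standing in for <STDIN>
    my @replies;

    {
      my $guess = shift @guesses;
      last unless defined $guess;  # ran out of guesses (like end-of-input)
      if ($guess < $secret) {
        push @replies, "$guess: Too small";
        redo;                      # go around again
      } elsif ($guess > $secret) {
        push @replies, "$guess: Too large";
        redo;
      }
      push @replies, "$guess: That was it!";
    }                              # falling out the bottom ends the block

    print "$_\n" foreach @replies;
    ```

    Notice that here we have to ask for each repetition with redo, while the while loop has to ask to stop with last.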

Answers to Chapter 11 Exercises

  1. Here’s one way to do it:

    sub get_line { 
      # prompts for, reads, chomps, and returns a line of input
      print $_[0];
      chomp(my $line = <STDIN>);
      $line;
    }
    
    my $source = &get_line("Which source file? ");
    open IN, $source
      or die "Can't open '$source' for input: $!";
    
    my $dest = &get_line("What destination file? ");
    die "Won't overwrite existing file"  
      if -e $dest;  # optional safety test
    open OUT, ">$dest"
      or die "Can't open '$dest' for output: $!";
    
    my $pattern = &get_line("What search pattern: ");
    my $replace = &get_line("What replacement string: ");
    
    while (<IN>) {
      s/$pattern/$replace/g;
      print OUT $_;
    }

    This one needs to ask the user for several things, so we decided to make a subroutine to take care of some of the work. The subroutine prints out the prompt, which is the first (and only) parameter to the subroutine. Then it reads a line of input, chomps it, and returns it. That makes it easy to ask for each parameter, one after the other.

    Once we know what the user wants for the source file, we try opening it. An earlier version of this program asked for all of the parameters first, but if the source file name is incorrect, there’s no point in having the user type more parameters. This way can save the user some time and trouble. Note that the die message reports the file name inside quote marks; this can be helpful in diagnosing a problem when a string has whitespace characters. If you opened "<$source" instead of just plain $source, that’s fine, too. (There’s no reason to worry that the user of this program will do something nefarious, since anything they can do with this program, they could accomplish just as well without it. If this program were made to run over the web, to give one example, we’d need to be much more cautious about opening the user’s choice of file.)

    As we hope you discovered when you tried it, it’s easy to overwrite an existing file simply by opening it for output. So we put in a safety test using the -e file test. The corresponding die message doesn’t include $! because we’re not reporting a failed request of the system. By the way, this test for overwrite is fine here, but it would be insufficient in an environment where many copies of the same program (or different programs all working with the same files) might be running at once. This typically happens with programs on the web: Two processes check the same filename for existence at approximately the same time, and both see that it doesn’t exist. So one of them creates the file, and an instant later the other one overwrites it with a file of its own. This kind of concurrency problem can’t be solved with the -e file test; some kind of file locking (which is beyond the scope of this book) is needed.

    With that safety test, the user won’t accidentally overwrite an existing file. Is that test a good idea? Well, if the user comes to see you next week and says, “Golly, I’m glad you put in that safety test. It kept me from accidentally overwriting my file!”, then you know that it was the right thing to do. But if the user says, “Dagnabit, your program is hard to use! I told it what filename I wanted to use for output, and it wouldn’t let me use it until I first deleted that file!”, then it was the wrong thing to do. Making decisions like this is often the toughest part of a programmer’s job. Perhaps we should make the program ask something like, “Are you sure you want to overwrite the existing file `barney'?” by default, but have a command-line option for the power user that says to overwrite without asking. Next version.

    Once we’ve asked for everything and opened the files, the rest of the program is pretty simple. The heart of the program is the loop at the end, which reads lines, updates them, and prints them out. Note that the substitution uses the /g option—if you left that out, your program is broken, since the exercise asked that the program replace every occurrence of the search pattern, not just the first one on each line.

    Were you able to use regular expression metacharacters in the search pattern? Sure; the substitution interpolates $pattern to make the search pattern. Were you able to use memory variables and backslash escapes in the replacement string? Nope; $replace is interpolated to make the replacement string, but it’s not re-interpolated to interpret any magical characters. If $replace holds $1, that’s a dollar sign and a numeral one in the replacement string. If Perl always kept re-interpolating, you could never put a dollar sign or backslash into the replacement string, since they’d always make something magical happen. (Actually, though, if you need one additional level of interpolation, it is possible. See the perlfaq manpages for some suggestions on how to do this.)
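
    A tiny experiment makes the point. In this sketch (the sample strings are ours), the search pattern’s metacharacters do their job, but the replacement arrives as plain text:

    ```perl
    use strict;
    use warnings;

    my $pattern = '(\w+) (\w+)';   # metacharacters work here
    my $replace = '$2 $1';         # but this is just five characters of text
    my $line    = 'fred flintstone';

    (my $changed = $line) =~ s/$pattern/$replace/g;
    print "$changed\n";
    ```

    The result isn’t flintstone fred; it’s the literal text $2 $1.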

  2. Here’s one way to do it:

    foreach my $file (@ARGV) {
      my $attribs = &attributes($file);
      print "'$file' $attribs.\n";
    }
    
    sub attributes {
      # report the attributes of a given file
      my $file = shift @_;
      return "does not exist" unless -e $file;
    
      my @attrib;
      push @attrib, "readable" if -r $file;
      push @attrib, "writable" if -w $file;
      push @attrib, "executable" if -x $file;
      return "exists" unless @attrib;
      'is ' . join " and ", @attrib;  # return value
    }

    In this one, once again it’s convenient to use a subroutine. The main loop prints one line of attributes for each file, perhaps telling us that 'cereal-killer' is executable or that 'sasquatch' does not exist.

    The subroutine tells us the attributes of the given filename. Of course, if the file doesn’t even exist, there’s no need for the other tests, so we test for that first. If there’s no file, we’ll return early.

    If the file does exist, we’ll build a list of attributes. (Give yourself extra credit points if you used the special _ filehandle instead of $file on these tests, to keep from calling the system separately for each new attribute.) It would be easy to add additional tests like the three we show here. But what happens if none of the attributes is true? Well, if we can’t say anything else, at least we can say that the file exists, so we do. The unless clause uses the fact that @attrib will be true (in a Boolean context, which is a special case of a scalar context) if it’s got any elements.
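
    For the curious, here’s a sketch of the extra-credit version of the subroutine. The -e test asks the system about the file, and the later tests on the special _ filehandle reuse that answer. The path /no/such/file is a made-up name that we assume doesn’t exist on your system.

    ```perl
    use strict;
    use warnings;

    sub attributes {
      # report the attributes of a given file, asking the system only once
      my $file = shift @_;
      return "does not exist" unless -e $file;   # this test fills in _

      my @attrib;
      push @attrib, "readable"   if -r _;
      push @attrib, "writable"   if -w _;
      push @attrib, "executable" if -x _;
      return "exists" unless @attrib;
      'is ' . join " and ", @attrib;  # return value
    }

    foreach my $file ($0, '/no/such/file') {
      print "'$file' ", attributes($file), ".\n";
    }
    ```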

    But if we’ve got some attributes, we’ll join them with " and " and put "is " in front, to make a description like is readable and writable. This isn’t perfect however; if there are three attributes, it says that the file is readable and writable and executable, which has too many ands, but we can get away with it. If you wanted to add more attributes to the ones this program checks for, you should probably fix it to say something like is readable, writable, executable, and nonempty. If that matters to you.
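
    If it does matter to you, one possible fix is sketched below. The subroutine name describe is our own invention; it takes the list of attributes and handles the short lists as special cases.

    ```perl
    use strict;
    use warnings;

    sub describe {
      # turn a list of attributes into a phrase with sensible punctuation
      my @attrib = @_;
      return "exists"                       if @attrib == 0;
      return "is $attrib[0]"                if @attrib == 1;
      return "is $attrib[0] and $attrib[1]" if @attrib == 2;
      my $last = pop @attrib;
      return 'is ' . join(', ', @attrib) . ", and $last";
    }

    print describe(qw(readable writable executable nonempty)), "\n";
    ```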

    Note that if you somehow didn’t put any filenames on the command line, this produces no output. This makes sense; if you ask for information on zero files, you should get zero lines of output. But let’s compare that to what the next program does in a similar case, in the discussion below.

  3. Here’s one way to do it:

    die "No file names supplied!\n" unless @ARGV;
    my $oldest_name = shift @ARGV;
    my $oldest_age = -M $oldest_name;
    
    foreach (@ARGV) {
      my $age = -M;
      ($oldest_name, $oldest_age) = ($_, $age)    
        if $age > $oldest_age;
    }
    
    printf "The oldest file was %s, and it was %.1f days old.\n",
      $oldest_name, $oldest_age;

    This one starts right out by complaining if it didn’t get any filenames on the command line. That’s because it’s supposed to tell us the oldest filename—and there ain’t one if there aren’t any files to check.

    Once again, we’re using the “high-water-mark” algorithm. The first file is certainly the oldest one seen so far. We have to keep track of its age as well, so that’s in $oldest_age.

    For each of the remaining files, we’ll determine the age with the -M file test, just as we did for the first one (except that here, we’ll use the default argument of $_ for the file test). The last-modified time is generally what people mean by the “age” of a file, although you could make a case for using a different one. If the age is more than $oldest_age, we’ll use a list assignment to update both the name and age. We didn’t have to use a list assignment, but it’s a convenient way to update several variables at once.

    We stored the age from -M into the temporary variable $age. What would have happened if we had simply used -M each time, rather than using a variable? Well, first, unless we used the special _ filehandle, we would have been asking the operating system for the age of the file each time, a potentially slow operation (not that you’d notice unless you have hundreds or thousands of files, and maybe not even then). More importantly, though, we should consider what would happen if someone updated a file while we’re checking it. That is, first we see the age of some file, and it’s the oldest one seen so far. But before we can get back to use -M a second time, someone modifies the file and resets the timestamp to the current time. Now the age that we save into $oldest_age is actually the youngest age possible. The result would be that we’d get the oldest file among the files tested from that point on, rather than the oldest overall; this would be a tough problem to debug!

    Finally, at the end of the program, we use printf to print out the name and age, with the age rounded off to the nearest tenth of a day. Give yourself extra credit if you went to the trouble to convert the age to a number of days, hours, and minutes.
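
    One way to earn that extra credit is sketched here. The subroutine name age_parts is ours, and it assumes the age isn’t negative (which -M can report for a file modified after the program started). It converts to whole minutes first, then splits those into the three parts.

    ```perl
    use strict;
    use warnings;

    sub age_parts {
      my $age = shift;                          # fractional days, as from -M
      my $total_minutes = int($age * 24 * 60);
      my $days    = int($total_minutes / (24 * 60));
      my $hours   = int(($total_minutes % (24 * 60)) / 60);
      my $minutes = $total_minutes % 60;
      return ($days, $hours, $minutes);
    }

    my($d, $h, $m) = age_parts(2.75);           # a sample age of 2.75 days
    printf "%d days, %d hours, and %d minutes old.\n", $d, $h, $m;
    ```

    For the sample age of 2.75 days, that works out to 2 days, 18 hours, and 0 minutes.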

Answers to Chapter 12 Exercises

  1. Here’s one way to do it, with a glob:

    print "Which directory? (Default is your home directory) ";
    chomp(my $dir = <STDIN>);
    if ($dir =~ /^\s*$/) {         # A blank line
      chdir or die "Can't chdir to your home directory: $!"; 
    } else {
      chdir $dir or die "Can't chdir to '$dir': $!";
    }
    
    my @files = <*>;
    foreach (@files) {
      print "$_\n";
    }

    First, we show a simple prompt, and read the desired directory, chomping it as needed. (Without a chomp, we’d be trying to head for a directory that ends in a newline—legal in Unix, and therefore cannot be presumed to simply be extraneous by the chdir function.)

    Then, if the directory name is nonempty, we’ll change to that directory, aborting on a failure. If empty, the home directory is selected instead.

    Finally, a glob on “star” pulls up all the names in the (new) working directory, automatically sorted to alphabetical order, and they’re printed one at a time.

  2. Here’s one way to do it:

    print "Which directory? (Default is your home directory) ";
    chomp(my $dir = <STDIN>);
    if ($dir =~ /^\s*$/) {         # A blank line
      chdir or die "Can't chdir to your home directory: $!";
    } else {
      chdir $dir or die "Can't chdir to '$dir': $!";
    }
    
    my @files = <.* *>;       ## now includes .*
    foreach (sort @files) {   ## now sorts
      print "$_\n";
    }

    There are two differences from the previous one. First, the glob now includes “dot star”, which matches all the names that do begin with a dot. Second, we now must sort the resulting list, because some of the names that begin with a dot must be interleaved appropriately either before or after the list of things without a beginning dot.

  3. Here’s one way to do it:

    print "Which directory? (Default is your home directory) ";
    chomp(my $dir = <STDIN>);
    if ($dir =~ /^\s*$/) {         # A blank line
      chdir or die "Can't chdir to your home directory: $!";
    } else {
      chdir $dir or die "Can't chdir to '$dir': $!";
    }
    
    opendir DOT, "." or die "Can't opendir dot: $!";
    foreach (sort readdir DOT) {
      # next if /^\./; ##   if we were skipping dot files
      print "$_\n";
    }

    Again, same structure as the previous two programs, but now we’ve chosen to open a directory handle. Once we’ve changed the working directory, we want to open the current directory, and we’ve shown that as the DOT directory handle.

    Why DOT? Well, if the user asks for an absolute directory name, like /etc, there’s no problem opening it. But if the name is relative, like fred, let’s see what would happen. First, we chdir to fred, and then we want to use opendir to open it. But that would open fred in the new directory, not fred in the original directory. The only name we can be sure will mean “the current directory” is ".", which always has that meaning (on Unix and similar systems, at least).

    The readdir function pulls up all the names of the directory, which are then sorted and displayed. To do the first exercise this way, we would also need to skip over the dot files (readdir returns them, but the glob on “star” does not); that’s handled by uncommenting the commented-out line in the foreach loop.
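
    To watch readdir hand back the dot files, you can try something like this sketch. It builds a scratch directory with the File::Temp module (not otherwise used in these answers) and fills it with three empty files whose names we made up.

    ```perl
    use strict;
    use warnings;
    use File::Temp 'tempdir';

    my $dir = tempdir(CLEANUP => 1);     # a scratch directory for the demo
    foreach my $name ('.hidden', 'barney', 'fred') {
      open OUT, ">$dir/$name" or die "Can't create '$dir/$name': $!";
      close OUT;
    }

    opendir DIR, $dir or die "Can't opendir '$dir': $!";
    my @all = sort readdir DIR;
    closedir DIR;

    my @visible = grep !/^\./, @all;     # same test as the commented-out line
    print "readdir saw: @all\n";
    print "without dot files: @visible\n";
    ```

    On a Unix-like system, the first list includes . and .. as well as .hidden, and the second holds just barney and fred.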

    You may find yourself asking, “Why did we chdir first? You can use readdir and friends on any directory, not merely on the current directory.” Primarily, we wanted to give the user the convenience of being able to get to their home directory with a single keystroke. But this could be the start of a general file-management utility program; maybe the next step would be to ask the user which of the files in this directory should be moved to offline tape storage, say.

Answers to Chapter 13 Exercises

  1. Here’s one way to do it:

    unlink @ARGV;

    ...or, if you want to warn the user of any problems:

    foreach (@ARGV) {
      unlink $_ or warn "Can't unlink '$_': $!, continuing...\n";
    }

    Here, each item from the command-invocation line is placed individually into $_, which is then used as the argument to unlink. If something goes wrong, the warning gives the clue about why.

  2. Here’s one way to do it:

    use File::Basename;
    use File::Spec;
    
    my($source, $dest) = @ARGV;
    
    if (-d $dest) {
      my $basename = basename $source;
      $dest = File::Spec->catfile($dest, $basename);
    }
    
    rename $source, $dest
      or die "Can't rename '$source' to '$dest': $!\n";

    The workhorse in this program is the last statement, but the remainder of the program is necessary when we are renaming into a directory. First, after declaring the modules we’re using, we name the command-line arguments sensibly. If $dest is a directory, we need to extract the basename from the $source name and append it to the directory ($dest). Finally, once $dest is patched up if needed, the rename does the deed.
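
    Here’s a sketch of just the name-patching step, without any renaming. The two sample names are made up, the -d test is left out so that nothing on disk matters, and the results described afterward assume a Unix-style system.

    ```perl
    use strict;
    use warnings;
    use File::Basename;
    use File::Spec;

    my($source, $dest) = ('/home/fred/report.txt', '/tmp');   # sample names

    my $basename = basename $source;
    my $new_dest = File::Spec->catfile($dest, $basename);

    print "basename: $basename\n";
    print "new destination: $new_dest\n";
    ```

    On Unix, the basename is report.txt and the new destination is /tmp/report.txt; on other systems, File::Spec should choose the appropriate separator.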

  3. Here’s one way to do it:

    use File::Basename;
    use File::Spec;
    
    my($source, $dest) = @ARGV;
    
    if (-d $dest) {
      my $basename = basename $source;
      $dest = File::Spec->catfile($dest, $basename);
    }
    
    link $source, $dest
      or die "Can't link '$source' to '$dest': $!\n";

    As the hint in the exercise description said, this program is much like the previous one. The difference is that we’ll link rather than rename. If your system doesn’t support hard links, you might have written this as the last statement:

    print "Would link '$source' to '$dest'.\n";

  4. Here’s one way to do it:

    use File::Basename;
    use File::Spec;
    
    my $symlink = $ARGV[0] eq '-s';
    shift @ARGV if $symlink;
    
    my($source, $dest) = @ARGV;
    if (-d $dest) {
      my $basename = basename $source;
      $dest = File::Spec->catfile($dest, $basename);
    }
    
    if ($symlink) {
      symlink $source, $dest
        or die "Can't make soft link from '$source' to '$dest': $!\n";
    } else {
      link $source, $dest
        or die "Can't make hard link from '$source' to '$dest': $!\n";
    }

    The first few lines of code (after the two use declarations) look at the first command-line argument, and if it’s "-s", we’re making a symbolic link, so we note that as a true value for $symlink. If we saw that "-s", we then need to get rid of it (in the next line). The next few lines are cut-and-pasted from the previous exercise answers. Finally, based on the truth of $symlink, we’ll choose either to create a symbolic link or a hard link. We also updated the dying words to make it clear which kind of link we were attempting.

  5. Here’s one way to do it:

    foreach (<.* *>) {
      my $dest = readlink $_;
      print "$_ -> $dest\n" if defined $dest;
    }

    Each item resulting from the glob ends up in $_ one by one. If the item is a symbolic link, then readlink returns a defined value, and the location is displayed. If not, then the condition fails, and we skip over it.

Answers to Chapter 14 Exercises

  1. Here’s one way to do it:

    chdir "/" or die "Can't chdir to root directory: $!";
    exec "ls", "-l" or die "Can't exec ls: $!";

    The first line changes the current working directory to the root directory, as our particular hard-coded directory. The second line uses the multiple-argument exec function to send the result to standard output. We could have used the single-argument form just as well, but it doesn’t hurt to do it this way.

  2. Here’s one way to do it:

    open STDOUT, ">ls.out" or die "Can't write to ls.out: $!";
    open STDERR, ">ls.err" or die "Can't write to ls.err: $!";
    chdir "/" or die "Can't chdir to root directory: $!";
    exec "ls", "-l" or die "Can't exec ls: $!";

    The first and second lines reopen STDOUT and STDERR to a file in the current directory (before we change directories). Then, after the directory change, the directory listing command executes, sending the data back to the files opened in the original directory.

    Where would the message from the last die go? Why, it would go into ls.err, of course, since that’s where STDERR is going at that point. The die from chdir would go there, too. But where would the message go if we can’t re-open STDERR on the second line? It goes to the old STDERR. For the three standard filehandles, STDIN, STDOUT, and STDERR, if re-opening them fails, the old filehandle is still open.

  3. Here’s one way to do it:

    if (`date` =~ /^S/) {
      print "go play!\n";
    } else {
      print "get to work!\n";
    }

    Well, since both Saturday and Sunday start with an S, and the day of the week is the first part of the output of the date command, this is pretty simple. Just check the output of the date command to see if it starts with S. There are many harder ways to do this program, and we’ve seen most of them in our classes.

    If we had to use this in a real-world program, though, we’d probably use the pattern /^(Sat|Sun)/. It’s a tiny bit less efficient, but that hardly matters; besides, it’s so much easier for the maintenance programmer to understand.
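
    The two patterns can be compared without waiting for the weekend. This sketch feeds both of them some sample strings that we typed in by hand in the style of the date command’s default English output; your own date command may format things differently, especially in another locale.

    ```perl
    use strict;
    use warnings;

    my @samples = (
      'Sat Mar  2 10:15:00 PST 2002',
      'Sun Mar  3 10:15:00 PST 2002',
      'Mon Mar  4 10:15:00 PST 2002',
    );

    my %verdict;   # what the more explicit pattern says for each sample
    foreach my $date (@samples) {
      my $simple   = $date =~ /^S/         ? 'go play!' : 'get to work!';
      my $explicit = $date =~ /^(Sat|Sun)/ ? 'go play!' : 'get to work!';
      $verdict{$date} = $explicit;
      print "$date: $simple / $explicit\n";
    }
    ```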

Answers to Chapter 15 Exercises

  1. Here’s one way to do it:

    my @numbers;
    push @numbers, split while <>;
    foreach (sort { $a <=> $b } @numbers) {
      printf "%20g\n", $_;
    }

    That second line of code is too confusing, isn’t it? Well, we did that on purpose. Although we recommend that you write clear code, some people like writing code that’s as hard to understand as possible,[20] so we want you to be prepared for the worst. Someday, you’ll need to maintain confusing code like this.

    Since that line uses the while modifier, it’s the same as if it were written in a loop like this:

    while (<>) {
      push @numbers, split;
    }

    That’s better, but maybe it’s still a little unclear. (Nevertheless, we don’t have a quibble about writing it this way. This one is on the correct side of the “too hard to understand at a glance” line.) The while loop is reading the input a line at a time (from the user’s choice of input sources, as shown by the diamond operator), and split is, by default, splitting that on whitespace to make a list of words—or, in this case, a list of numbers. The input is just a stream of numbers separated by whitespace, after all. Either way that you write it, then, that while loop will put all of the numbers from the input into @numbers.

    The foreach loop takes the sorted list and prints each one on its own line, using the %20g numeric format to put them in a right-justified column. You could have used %20s instead. What difference would that make? Well, that’s a string format, so it would have left the strings untouched in the output. Did you notice that our sample data included both 1.50 and 1.5, and both 04 and 4? If you printed those as strings, the extra zero characters will still be in the output; but %20g is a numeric format, so equal numbers will appear identically in the output. Either format could potentially be correct, depending upon what you’re trying to do.
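
    To see the two formats side by side, try a sketch like this one. It leaves out the field width and adds square brackets (our own choices) so that the difference is easier to spot.

    ```perl
    use strict;
    use warnings;

    foreach my $item (qw(1.50 1.5 04 4)) {
      printf "as string: [%s]  as number: [%g]\n", $item, $item;
    }
    ```

    The string column keeps 1.50 and 04 as typed, while the number column shows them as 1.5 and 4.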

  2. Here’s one way to do it:

    # don't forget to incorporate the hash %last_name,
    # either from the exercise text or the downloaded file
    
    my @keys = sort {
      "\L$last_name{$a}" cmp "\L$last_name{$b}"  # by last name
       or
      "\L$a" cmp "\L$b"                          # by first name
    } keys %last_name;
    
    foreach (@keys) {
      print "$last_name{$_}, $_\n";              # Rubble,Bamm-Bamm
    }

    There’s not much to say about this one; we put the keys in order as needed, then print them out. We chose to print them in last-name-comma-first-name order just for fun; the exercise description left that up to you.
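
    In case you don’t have the exercise’s hash handy, here’s a self-contained sketch of the same sort. The names and their odd capitalization are our own stand-in data, not necessarily what the exercise supplied.

    ```perl
    use strict;
    use warnings;

    # stand-in data: first name => last name, in mixed case on purpose
    my %last_name = qw{
      fred      flintstone
      Wilma     Flintstone
      Barney    Rubble
      betty     rubble
      Bamm-Bamm Rubble
      PEBBLES   FLINTSTONE
    };

    my @keys = sort {
      "\L$last_name{$a}" cmp "\L$last_name{$b}"  # by last name
       or
      "\L$a" cmp "\L$b"                          # by first name
    } keys %last_name;

    foreach (@keys) {
      print "$last_name{$_}, $_\n";
    }
    ```

    Because both comparisons are made on lowercased copies, the three Flintstones come first (fred, PEBBLES, Wilma), followed by the three Rubbles (Bamm-Bamm, Barney, betty), no matter how each name was capitalized.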

  3. Here’s one way to do it:

    print "Please enter a string: ";
    chomp(my $string = <STDIN>);
    print "Please enter a substring: ";
    chomp(my $sub = <STDIN>);
    
    my @places;
    
    for (my $pos = -1; ; ) {                  # tricky use of three-part for loop
      $pos = index($string, $sub, $pos + 1);  # find next position
      last if $pos == -1;
      push @places, $pos;
    }
    
    print "Locations of '$sub' in '$string' were: @places\n";

    This one starts out simply enough, asking the user for the strings and declaring an array to hold the list of substring positions. But once again, as we see in the for loop, the code seems to have been “optimized for cleverness”, which should be done only for fun, never in production code. But this actually shows a valid technique which could be useful in some cases, so let’s see how it works.

    The my variable $pos is declared private to the scope of the for loop, and it starts with a value of -1. So as not to keep you in suspense about this variable, we’ll tell you right now that it’s going to hold a position of the substring in the larger string. The test and increment sections of the for loop are empty, so this is an infinite loop. (Of course, we’ll eventually break out of it, in this case with last).

    The first statement of the loop body looks for the first occurrence of the substring at or after position $pos + 1. That means that on the first iteration, when $pos is still -1, the search will start at position 0, the start of the string. The location of the substring is stored back into $pos. Now, if that was -1, we’re done with the for loop, so last breaks out of the loop in that case. If it wasn’t -1, then we save the position into @places and go around the loop again. This time, $pos + 1 means that we’ll start looking for the substring just after the previous place where we found it. And so we get the answers we wanted and the world is once again a happy place.

    If you didn’t want that tricky use of the for loop, you could accomplish the same result as shown here:

    {
      my $pos = -1;
      while (1) {
        ... # Same loop body as the for loop used above
      }
    }

    The naked block on the outside restricts the scope of $pos. You don’t have to do that, but it’s often a good idea to declare each variable in the smallest possible scope. This means we have fewer variables “alive” at any given point in the program, making it less likely that we’ll accidentally reuse the name $pos for some new purpose. For the same reason, if you don’t declare a variable in a small scope, you should generally give it a longer name that’s thereby less likely to be reused by accident. Maybe something like $substring_position would be appropriate in this case.

    On the other hand, if you were trying to obfuscate your code (shame on you!), you could create a monster like this (shame on us!):

    for (my $pos = -1; -1 != 
      ($pos = index 
        +$string,
        +$sub,
        +$pos
        +1
      );
    push @places, ((((+$pos))))) {
        'for ($pos != 1; # ;$pos++) {
          print "position $pos\n";#;';#' } pop @places;
    }

    That even trickier code works in place of the original tricky for loop. By now, you should know enough to be able to decipher that one on your own, or to obfuscate code in order to amaze your friends and confound your enemies. Be sure to use these powers only for good, never for evil.

    Oh, and what did you get when you searched for t in This is a test.? It’s at positions 10 and 13. It’s not at position 0; since the capitalization doesn’t match, the substring doesn’t match.
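
    If you’d like to check those positions without typing at prompts, the loop can be wrapped in a subroutine, as in this sketch. The name find_all is ours, and so is the guard against an empty substring, which would otherwise keep the loop going forever.

    ```perl
    use strict;
    use warnings;

    sub find_all {
      # returns every position of $sub within $string
      my($string, $sub) = @_;
      return if $sub eq '';           # our addition: avoid an endless loop
      my @places;
      for (my $pos = -1; ; ) {
        $pos = index($string, $sub, $pos + 1);  # find next position
        last if $pos == -1;
        push @places, $pos;
      }
      @places;
    }

    my @t_places  = find_all('This is a test.', 't');
    my @is_places = find_all('This is a test.', 'is');
    print "Locations of 't' were: @t_places\n";
    print "Locations of 'is' were: @is_places\n";
    ```

    The first list should be 10 and 13, as described above; the second is 2 and 5, since is appears both inside This and as a word of its own.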

Answers to Chapter 16 Exercises

  1. Here’s one way to do it:

    open PF, '/path/to/perlfunc.pod' or die "Can't open perlfunc.pod: $!";
    dbmopen my %DB, "pf_data", 0644 or die "Can't create dbm file: $!";
    
    %DB = ( );  # wipe existing data, if any
    
    while (<PF>) {
      if (/^=item\s+([a-z_]\w*)/i) {
        $DB{$1} = $DB{$1} || $.;
      }
    }
    
    print "Done!\n";

    This one is similar to the previous ones with perlfunc.pod. Here, though, we open a DBM file called pf_data as the DBM hash %DB. In case that file had any leftover data, we set the hash to an empty list. That’s normally a rare thing to do, but we want to wipe out the entire database, in case a previous run of this program left incorrect or out-of-date data in the file. (After all, there’s a new perlfunc.pod with each new release of Perl.)

    When we find an identifier, we need to store its line number (from $.) into the database. The statement that does that uses the high-precedence short-circuit || operator. If the database entry already has a value, that value is true, so the old value is used. If the database entry is empty, that’s false, so the value on the right ($.) is used instead. We could have written that line in a shorter way, like this:

    $DB{$1} ||= $.;
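
    Here’s a sketch of that idiom on its own, with an ordinary hash standing in for the DBM hash. The function names and line numbers are made-up sample data.

    ```perl
    use strict;
    use warnings;

    my %first_line;   # an ordinary hash standing in for %DB

    # each item is a name and the line number where it was "found"
    foreach my $found ('chomp 120', 'abs 95', 'chomp 340', 'abs 400') {
      my($name, $line) = split ' ', $found;
      $first_line{$name} ||= $line;   # keeps the first value stored
    }

    foreach (sort keys %first_line) {
      print "$_: $first_line{$_}\n";
    }
    ```

    Only the first line number for each name survives, which is what we want here. This works because line numbers start at 1; a stored value of zero would count as false and be overwritten.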

    When the program is done, it says so. That’s not required by the exercise description, but it lets us know that the program did something; without that line, there would be no output at all. But how can we tell that it worked correctly? That’s the next exercise.

  2. Here’s one way to do it:

    dbmopen my %DB, "pf_data", undef or die "Can't open dbm file: $!";
    my $line = $DB{$ARGV[0]} || "not found";
    
    print "$ARGV[0]: $line\n";

    Once we have the database, it’s simple to look something up in it. Note that in this program, the third argument to dbmopen is undef, since that file must already exist for this program to work.

    If the entry for $ARGV[0] (the first command-line parameter) isn’t found in the database, we’ll say it’s not found, using the high-precedence short-circuit ||.

  3. Here’s one way to do it:

    dbmopen my %DB, "pf_data", undef or die "Can't open dbm file: $!";
    
    if (my $line = $DB{$ARGV[0]}) {
      exec 'less', "+$line", '/path/to/perlfunc.pod'
        or die "Can't exec pager: $!";
    } else {
      die "Entry unknown: '$ARGV[0]'.\n";
    }

    This starts out like the previous one, but uses exec to start up a pager program if it can, and dies if it can’t.
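
    Why does "or die" work after exec? A successful exec never returns, because the new program replaces the running Perl program. It returns, with a false value, only when the program couldn't be started. The sketch below relies on that. It assumes that /no/such/pager doesn't exist on your system, so that the exec fails on purpose, and the @command and $failure names are ours.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical command, chosen so that exec is expected to fail.
my @command = ('/no/such/pager', '+42', '/path/to/perlfunc.pod');

# If exec succeeds, nothing below this line ever runs.
# If it fails, it returns false and sets $!, so the right side of "or" runs.
my $failure;
exec @command or $failure = "Can't exec pager: $!";

print "Still running, so exec must have failed.\n";
print "$failure\n";
```

    With warnings turned on, Perl may also send its own "Can't exec" warning to standard error. In the exercise answer, die takes the place of the assignment, so the program stops with that message instead of carrying on.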

Answer to Chapter 17 Exercises

  1. Here’s one way to do it:

    my $filename = 'path/to/sample_text';
    open FILE, $filename
      or die "Can't open '$filename': $!";
    chomp(my @strings = <FILE>);
    while (1) {
      print "Please enter a pattern: ";
      chomp(my $pattern = <STDIN>);
      last if $pattern =~ /^\s*$/;
      my @matches = eval {
        grep /$pattern/, @strings;
      };
      if ($@) {
        print "Error: $@";
      } else {
        my $count = @matches;
        print "There were $count matching strings:\n",
          map "$_\n", @matches;
      }
      print "\n";
    }

    This one uses an eval block to trap any failure that might occur when using the regular expression. Inside that block, a grep pulls the matching strings from the list of strings.

    Once the eval is finished, we can report either the error message or the matching strings. Note that we “unchomped” the strings for output by using map to add a newline to each string.
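
    To try the eval trap without typing patterns at a prompt, here is a minimal sketch that wraps the same grep in a subroutine. The safe_grep name and the sample strings are our own inventions for illustration. The pattern ( is deliberately invalid, so it should set $@ rather than kill the program.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of the exercise answer).
# Returns the error message (empty on success), then the matching strings.
sub safe_grep {
  my ($pattern, @strings) = @_;
  my @matches = eval { grep /$pattern/, @strings };
  return ($@, @matches);
}

my @strings = ("fred flintstone", "barney rubble", "wilma flintstone");

for my $pattern ('flint', '(') {
  my ($error, @matches) = safe_grep($pattern, @strings);
  if ($error) {
    print "Error: $error";    # Perl's message already ends with a newline
  } else {
    my $count = @matches;
    print "There were $count matching strings:\n",
      map "$_\n", @matches;   # "unchomp" each string for output
  }
}
```

    The first pattern reports two matching strings; the second reports Perl's complaint about the unmatched parenthesis, and the loop carries on.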



[1] If you’d prefer a more formal sort of constant, the constant pragma may be what you’re looking for.

[2] It nearly did change by a legislative act in the state of Indiana. http://www.urbanlegends.com/legal/pi_indiana.html

[3] We asked O’Reilly to spend the extra money to print the input cursor with blinking ink, but they wouldn’t do it for us.

[4] Chomping is like chewing—not always needed, but most of the time it doesn’t hurt.

[5] We can’t do this without advanced trickiness, that is. It’s rare to find anything that you absolutely can’t do in Perl.

[6] Also, at least in some versions of Perl, the shorter way will avoid a warning about using an undefined value that may crop up with the longer one. The warning may also be avoided by using the ++ operator to increment the variable, although we haven’t shown you that operator yet.

[7] Or to Larry, if he’s standing nearby.

[8] Unless Larry told you not to do that.

[9] As Larry should have explained to you by now.

[10] If the test program didn’t work correctly, you probably didn’t download it as we suggested. And you probably didn’t test what you typed, as we also suggested. But in that case, you probably didn’t do the exercises either; you’re just reading these answers in the back of the book, and so the test program (which you didn’t actually run) performed flawlessly. In that case, this footnote is pointless.

[11] To be sure, you’ll match different parts of the strings. But any string that matches /a+b*/ will also match /a+/, and vice versa.

[12] Whenever you mean a real backslash in Perl, type two of them. A lone backslash may try to do something magical, but two of them will always mean a real backslash.

[13] If you made the whole pattern case-insensitive, shame on you. We haven’t learned that yet. Besides, that would match WILMA, which shouldn’t match, according to the exercise description.

[14] Folks who know about the logical-and operator, which we’ll see in Chapter 10, could do both tests /fred/ and /wilma/ in the same if conditional. That’s more efficient, more scalable, and an all-around better way than the ones given here. But we haven’t seen logical-and yet.

[15] But there are some tricky and advanced ways of doing what some folks would call an “and” operation. These are generally less efficient than using Perl’s logical-and, though, depending upon what optimizations Perl and its regular expression engine can make.

[16] If you’re especially unlucky, this happens when you look at your own code ten minutes after writing it.

[17] You may know that file and directory names beginning with a dot are not displayed by default on Unix systems, and that the special directory name .. always means the directory one level higher in the hierarchy.

[18] We told you that it would come in handy, and we weren’t kidding.

[19] The actual largest possible value depends upon your system; see http://www.cpan.org/doc/FMTEYEWTK/random if you really need to know.

[20] Well, we don’t recommend it for normal coding purposes, but it can be a fun game to write confusing code, and it can be educational to take someone else’s obfuscated code examples and spend a weekend or two figuring out just what they do. If you want to see some fun snippets of such code and maybe get a little help with decoding them, ask around at the next Perl Mongers’ meeting. Or search for JAPHs on the Web, or see how well you can decipher the example obfuscated code block near the end of this chapter’s answers.
