We’ve already seen how to do some input/output (I/O), in order to make some of the earlier exercises possible. But now we’ll learn a little more about those operations. As the title of this chapter implies, there will be more about Perl’s I/O operations in Chapter 11.
Reading from the standard input stream is easy.[1] We’ve been doing it already with the <STDIN> operator.[2] Evaluating this operator in a scalar context gives you the next line of input:
$line = <STDIN>;          # read the next line
chomp($line);             # and chomp it
chomp($line = <STDIN>);   # same thing, more idiomatically
Since the line-input operator will return undef when you reach end-of-file, this is handy for dropping out of loops:
while (defined($line = <STDIN>)) {
    print "I saw $line";
}
There’s a lot going on in that first line: we’re reading the input into a variable, checking that it’s defined, and if it is (meaning that we haven’t reached the end of the input) we’re running the body of the while loop. So, inside the body of the loop, we’ll see each line, one after another, in $line.[3] This is something you’ll want to do fairly often, so naturally Perl has a shortcut for it. The shortcut looks like this:
while (<STDIN>) {
    print "I saw $_";
}
Now, to make this shortcut, Larry chose some useless syntax. That is, this is literally saying, “Read a line of input, and see if it’s true. (Normally it is.) And if it is true, enter the while loop, but throw away that line of input!” Larry knew that it was a useless thing to do; nobody should ever need to do that in a real Perl program. So, Larry took this useless syntax and made it useful. What this is actually saying is that Perl should do the same thing as we saw in our earlier loop: it tells Perl to read the input into a variable, and (as long as the result was defined, so we haven’t reached end-of-file) then enter the while loop. However, instead of storing the input into $line, Perl will use its favorite default variable, $_, just as if you had written this:
while (defined($_ = <STDIN>)) {
    print "I saw $_";
}
Now, before we go any further, we must be very clear about something: this shortcut works only if you write it just as we did. If you put a line-input operator anywhere else (in particular, as a statement all on its own) it won’t read a line into $_ by default. It works only if there’s nothing but the line-input operator in the conditional of a while loop.[4] If you put anything else into the conditional expression, this shortcut won’t apply.
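The rule can be checked with a minimal sketch like the one below. It assumes an in-memory filehandle (the hypothetical $in, opened on a string) as a stand-in for STDIN so that it runs without keyboard input; filehandles themselves are a Chapter 11 topic, and all of the variable names are illustrative only.

```perl
use strict;
use warnings;

# Assumption: an in-memory filehandle stands in for STDIN so that the
# sketch runs without anyone typing input.
my $text = "fred\nbarney\nbetty\n";
open(my $in, '<', \$text) or die "cannot open in-memory handle: $!";

$_ = 'untouched';
<$in>;                       # a statement on its own: reads "fred\n" and discards it
my $after_bare_read = $_;    # $_ was not set by that read

my @seen;
while (<$in>) {              # the shortcut: same as while (defined($_ = <$in>))
    chomp;
    push @seen, $_;
}

print "after bare read: $after_bare_read\n";
print "seen in loop: @seen\n";
```

The bare read consumes the first line without touching $_, so only the remaining two lines show up inside the loop.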
There’s no connection between the line-input operator (<STDIN>) and Perl’s favorite default variable ($_). In this case, though, it just happens that the input is being stored in that variable.
On the other hand, evaluating the line-input operator in a list context gives you all of the (remaining) lines of input as a list—each element of the list is one line:
foreach (<STDIN>) {
    print "I saw $_";
}
Once again, there’s no connection between the line-input operator and Perl’s favorite default variable. In this case, though, the default control variable for foreach is $_. So in this loop, we’ll see each line of input in $_, one after the other.
That may sound familiar, and for good reason: that’s the same behavior the while loop would have. Isn’t it?
The difference is under the hood. In the while loop, Perl reads a line of input, puts it into a variable, and runs the body of the loop. Then, it goes back to find another line of input. But in the foreach loop, the line-input operator is being used in a list context (since foreach needs a list to iterate through). So it has to read all of the input before the loop can start running. That difference will become apparent when the input is coming from your 400MB web server log file! It’s generally best to use code like the while loop’s shortcut, which will process input a line at a time, whenever possible.
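One way to observe the difference is to stop each loop after its first pass and count what is left unread. This is only a sketch: it assumes an in-memory filehandle (the hypothetical $in) in place of STDIN so that it can run unattended.

```perl
use strict;
use warnings;

my $text = "line 1\nline 2\nline 3\n";

# Scalar context: while fetches one line per pass.
open(my $in, '<', \$text) or die "cannot open in-memory handle: $!";
while (<$in>) { last }             # read a single line, then stop
my @left_after_while = <$in>;      # the other two lines are still unread

# List context: foreach makes the operator read everything up front.
open($in, '<', \$text) or die "cannot reopen in-memory handle: $!";
foreach (<$in>) { last }           # all three lines were read before the first pass
my @left_after_foreach = <$in>;    # nothing remains

printf "while left %d line(s); foreach left %d line(s)\n",
    scalar(@left_after_while), scalar(@left_after_foreach);
```

With a three-line input the cost is invisible, but the same pattern is what makes foreach hold an entire log file in memory.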
Another way to read input is with the diamond[5] operator: <>. This is useful for making programs that work like standard Unix[6] utilities, with respect to the invocation arguments (which we’ll see in a moment). If you want to make a Perl program that can be used like the utilities cat, sed, awk, sort, grep, lpr, and many others, the diamond operator will be your friend. If you want to make anything else, the diamond operator probably won’t help.
The invocation arguments to a program are normally a number of “words” on the command line after the name of the program.[7] In this case, they give the names of a number of files to be processed in sequence:
$ ./my_program fred barney betty
That command means to run the command my_program (which will be found in the current directory), and that it should process file fred, followed by file barney, followed by file betty.
If you give no invocation arguments, the program should process the standard input stream. Or, as a special case, if you give just a hyphen as one of the arguments, that means standard input as well.[8] So, if the invocation arguments had been fred - betty, that would have meant that the program should process file fred, followed by the standard input stream, followed by file betty.
The benefit of making your programs work like this is that you may choose where the program gets its input at run time; for example, you won’t have to rewrite the program to use it in a pipeline (which we’ll discuss more later). Larry put this feature into Perl because he wanted to make it easy for you to write your own programs that work like standard Unix utilities, even on non-Unix machines. Actually, he did it so he could make his own programs work like standard Unix utilities; since some vendors’ utilities don’t work just like others’, Larry could make his own utilities, deploy them on a number of machines, and know that they’d all have the same behavior. Of course, this meant porting Perl to every machine he could find.
The diamond operator is actually a special kind of line-input operator. But instead of getting the input from the keyboard, it comes from the user’s choice of input:[9]
while (defined($line = <>)) {
    chomp($line);
    print "It was $line that I saw!\n";
}
So, if we run this program with the invocation arguments fred, barney, and betty, it will say something like: “It was [a line from file fred] that I saw!”, “It was [another line from file fred] that I saw!”, on and on until it reaches the end of file fred. Then, it will automatically go on to file barney, printing out one line after another, and then on to file betty. Note that there’s no break when we go from one file to another; when you use the diamond, it’s as if the input files have been merged into one big file.[10] The diamond will return undef (and we’ll drop out of the while loop) only at the end of all of the input.
In fact, since this is just a special kind of line-input operator, we may use the same shortcut we saw earlier, to read the input into $_ by default:
while (<>) {
    chomp;
    print "It was $_ that I saw!\n";
}
This works like the loop above, but with less typing. And you may have noticed that we’re using the default for chomp; without an argument, chomp will work on $_. Every little bit of saved typing helps!
Since the diamond operator is generally being used to process all of the input, it’s typically a mistake to use it in more than one place in your program. If you find yourself putting two diamonds into the same program, especially using the second diamond inside the while loop that is reading from the first one, it’s almost certainly not going to do what you would like.[11] In our experience, when beginners put a second diamond into a program, they meant to use $_ instead. Remember, the diamond operator reads the input, but the input itself is (generally, by default) found in $_.
If the diamond operator can’t open one of the files and read from it, it’ll print an allegedly helpful diagnostic message, such as:
can't open wimla: No such file or directory
The diamond operator will then go on to the next file automatically, much like what you’d expect from cat or another standard utility.
Technically, the diamond operator isn’t looking literally at the invocation arguments; it works from the @ARGV array. This array is a special array that is preset by the Perl interpreter to be a list of the invocation arguments. In other words, this is just like any other array (except for its funny, all-caps name), but when your program starts, @ARGV is already stuffed full of the list of invocation arguments.[12]
You can use @ARGV just like any other array; you could shift items off of it, perhaps, or use foreach to iterate over it. You could even check to see if any arguments start with a hyphen, so that you could process them as invocation options (like Perl does with its own -w option).[13]
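As a rough illustration of treating leading hyphenated words as options, here is a sketch that works on a hypothetical list (@args, standing in for @ARGV) with made-up option names. It uses a simple pattern match to spot the options; for anything beyond a quick script, a standard option-parsing module is the safer choice.

```perl
use strict;
use warnings;

# Hypothetical argument list standing in for @ARGV.
my @args = qw( -v -n fred - barney );

my %option;
# Peel off leading words that look like options: a hyphen plus a name.
# A lone "-" does not match, so it stays in the list as "standard input".
while (@args and $args[0] =~ /^-(\w+)$/) {
    $option{$1} = 1;
    shift @args;
}

print "options: ", join(" ", sort keys %option), "\n";
print "files:   @args\n";
```

Whatever remains in the list after the loop is what a later diamond would treat as filenames, if the same steps were applied to @ARGV itself.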
This is how the diamond operator knows what filenames it should use: it looks in @ARGV. If it finds an empty list, it uses the standard input stream; otherwise it uses the list of files that it finds. This means that after your program starts and before you start using the diamond, you’ve got a chance to tinker with @ARGV. For example, here we can process three specific files, regardless of what the user chose on the command line:
@ARGV = qw# larry moe curly #;  # force these three files to be read
while (<>) {
    chomp;
    print "It was $_ that I saw in some stooge-like file!\n";
}
In Chapter 11, we’ll see how to open and close specific filenames at specific times. But this technique will suffice for the next few chapters.
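To try the @ARGV technique without depending on files that happen to exist, a sketch along these lines creates two throwaway files first. The file names and contents are made up for the demonstration, the file-writing lines use features not covered until Chapter 11, and File::Temp and File::Basename are standard modules used here only for setup and tidy output.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Basename qw(basename);

# Create two small throwaway files (hypothetical names) to read.
my $dir = tempdir(CLEANUP => 1);
my @files;
foreach my $name (qw( fred barney )) {
    my $path = "$dir/$name";
    open(my $out, '>', $path) or die "cannot create $path: $!";
    print $out "first line of $name\n", "second line of $name\n";
    close($out) or die "cannot close $path: $!";
    push @files, $path;
}

# Point the diamond at those files, whatever the command line said.
@ARGV = @files;

my @seen;
while (<>) {
    chomp;
    # $ARGV holds the name of the file currently being read.
    push @seen, basename($ARGV) . ": $_";
}
print "$_\n" foreach @seen;
```

The lines come out as one continuous stream; only the $ARGV prefix reveals where one file ends and the next begins.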
The print operator takes a list of values and sends each item (as a string, of course) to standard output in turn, one after another. It doesn’t add any extra characters before, after, or in between the items;[14] if you want spaces between items and a newline at the end, you have to say so:
$name = "Larry Wall";
print "Hello there, $name, did you know that 3+4 is ", 3+4, "?\n";
Of course, that means that there’s a difference between printing an array and interpolating an array:
print @array;     # print a list of items
print "@array";   # print a string (containing an interpolated array)
That first print statement will print a list of items, one after another, with no spaces in between. The second one will print exactly one item, which is the string you get by interpolating @array into the empty string; that is, it prints the contents of @array, separated by spaces.[15] So, if @array holds qw/ fred barney betty /,[16] the first one prints fredbarneybetty, while the second prints fred barney betty separated by spaces.
But before you decide to always use the second form, imagine that @array is a list of unchomped lines of input. That is, imagine that each of its strings has a trailing newline character. Now, the first print statement prints fred, barney, and betty on three separate lines. But the second one prints this:
fred
 barney
 betty
Do you see where the spaces come from? Perl is interpolating an array, so it puts spaces between the elements. So, we get the first element of the array (fred and a newline character), then a space, then the next element of the array (barney and a newline character), then a space, then the last element of the array (betty and a newline character). The result is that the lines seem to have become indented, except for the first one. Every week or two, a message appears on the newsgroup comp.lang.perl.misc with a subject line something like:
Perl indents everything after the first line
Without even reading the message, we can immediately see that the program used double quotes around an array containing unchomped strings. “Did you perhaps put an array of unchomped strings inside double quotes?” we ask, and the answer is always yes.
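The effect is easy to reproduce. The sketch below builds both results as strings so they can be compared; it assumes Perl’s default output settings, under which printing a list emits the same text as joining its items with nothing in between. The variable names are illustrative.

```perl
use strict;
use warnings;

my @array = ("fred\n", "barney\n", "betty\n");   # unchomped lines

# With default settings, print @array sends the items back to back,
# which is the same text that join('', ...) builds.
my $as_list = join '', @array;

# Interpolation puts a space between the elements.
my $interpolated = "@array";

print $as_list;        # three lines, flush left
print $interpolated;   # second and third lines start with a space

# Chomped data is the case where the quotes plus an explicit \n fit.
chomp(my @names = @array);
print "@names\n";      # fred barney betty
```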
Generally, if your strings contain newlines, you simply want to print them, after all:
print @array;
But if they don’t contain newlines, you’ll generally want to add one at the end:
print "@array\n";
So, if you’re using the quote marks, you’ll be (generally) adding the \n at the end of the string anyway; this should help you to remember which is which.
It’s normal for your program’s output to be buffered. That is, instead of sending out every little bit of output at once, it’ll be saved until there’s enough to bother with. That’s because if (for example) the output were going to be saved on disk, it would be (relatively) slow and inefficient to spin the disk every time that one or two characters need to be added to the file. Generally, then, the output will go into a buffer that is flushed (that is, actually written to disk, or wherever) only when the buffer gets full, or when the output is otherwise finished (such as at the end of runtime). Usually, that’s what you want.
But if you (or a program) may be waiting impatiently for the output, you may wish to take that performance hit and flush the output buffer each time you print. See the Perl manpages for more information on controlling buffering in that case.
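One commonly used control, documented in the perlvar manpage, is the special variable $|. The following is only a sketch of how it might be used for a progress display; the sleep calls are a stand-in for real work.

```perl
use strict;
use warnings;

# Setting the special variable $| to a true value asks Perl to flush the
# currently selected output handle (STDOUT by default) after every print.
$| = 1;

print "Working";
foreach my $step (1 .. 2) {
    sleep 1;        # stand-in for slow work
    print ".";      # sent right away instead of waiting in the buffer
}
print " done\n";
```

Without the assignment to $|, the dots would typically sit in the buffer until the final newline (or later, if the output is going to a file or a pipe).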
Since print is looking for a list of strings to print, its arguments are evaluated in list context. Since the diamond operator (as a special kind of line-input operator) will return a list of lines in a list context, these can work well together:
print <>;        # source code for 'cat'
print sort <>;   # source code for 'sort'
Well, to be fair, the standard Unix commands cat and sort do have some additional functionality that these replacements lack. But you can’t beat them for the price! You can now re-implement all of your standard Unix utilities in Perl, and painlessly port them to any machine that has Perl, whether that machine is running Unix or not. And you can be sure that the programs on every different type of machine will nevertheless have the same behavior.[17]
What might not be obvious is that print has optional parentheses, which can sometimes cause confusion. Remember the rule that parentheses in Perl may always be omitted, except when doing so would change the meaning of a statement. So, here are two ways to print the same thing:
print("Hello, world!\n");
print "Hello, world!\n";
So far, so good. But another rule in Perl is that if the invocation of print looks like a function call, then it is a function call. It’s a simple rule, but what does it mean for something to look like a function call?
In a function call, there’s a function name immediately[18] followed by parentheses around the function’s arguments, like this:
print (2+3);
That looks like a function call, so it is a function call. It prints 5, but then it returns a value like any other function. The return value of print is a true or false value, indicating the success of the print. It nearly always succeeds, unless you get some I/O error, so the $result in the following statement will normally be 1:
$result = print("hello world!\n");
But what if you used the result in some other way? Let’s suppose you decide to multiply the return value times four:
print (2+3)*4; # Oops!
When Perl sees this line of code, it prints 5, just as you asked. Then it takes the return value from print, which is 1, and multiplies that times 4. It then throws away the product, wondering why you didn’t tell it to do something else with it. And at this point, someone looking over your shoulder says, “Hey, Perl can’t do math! That should have printed 20, rather than 5!”
This is the problem with allowing the parentheses to be optional; sometimes we humans forget where the parentheses really belong. When there are no parentheses, print is a list operator, printing all of the items in the following list; that’s generally what you’d expect. But when the first thing after print is a left parenthesis, print is a function call, and it will print only what’s found inside the parentheses. Since that line had parentheses, it’s the same to Perl as if you’d said this:
( print(2+3) ) * 4; # Oops!
Fortunately, Perl itself can almost always help you with this, if you ask for warnings, so use -w, at least during program development and debugging.
Actually, this rule (“If it looks like a function call, it is a function call”) applies to all list functions[19] in Perl, not just to print. It’s just that you’re most likely to notice it with print. If print (or another function name) is followed by an open parenthesis, make sure that the corresponding close parenthesis comes after all of the arguments to that function.
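A short sketch of the pitfall and two ways around it. The variable $product is introduced only to make the discarded value visible.

```perl
use strict;
use warnings;

# The whole print call is parenthesized, then multiplied: this prints 5
# and multiplies print's return value (normally 1) by 4.
my $product = ( print(2+3, "\n") ) * 4;

# Wrapping everything to be printed in print's own parentheses
# makes the arithmetic happen first.
print( (2+3)*4, "\n" );     # prints 20

# With no parenthesis right after print, it is a list operator again.
print 4*(2+3), "\n";        # prints 20

print "product of the first statement: $product\n";
```

In both corrected forms the closing parenthesis (or the end of the list) comes after every argument that print is meant to receive.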
You may wish to have a little more control with your output than print provides. In fact, you may be accustomed to the formatted output of C’s printf function. Fear not: Perl provides a comparable operation with the same name.
The printf operator takes a format string followed by a list of things to print. The format[20] string is a fill-in-the-blanks template showing the desired form of the output:
printf "Hello, %s; your password expires in %d days!\n",
    $user, $days_to_die;
The format string holds a number of so-called conversions; each conversion begins with a percent sign (%) and ends with a letter. (As we’ll see in a moment, there may be significant extra characters between these two symbols.) There should be the same number of items in the following list as there are conversions; if these don’t match up, it won’t work correctly. In the example above, there are two items and two conversions, so the output might look something like this:
Hello, merlyn; your password expires in 3 days!
There are many possible printf conversions, so we’ll take time here to describe just the most common ones. Of course, the full details are available in the perlfunc manpage.
To print a number in what’s generally a good way, use %g,[21] which automatically chooses floating-point, integer, or even exponential notation as needed:
printf "%g %g %g\n", 5/2, 51/17, 51 ** 17;  # 2.5 3 1.0683e+29
The %d format means a decimal[22] integer, truncated as needed:
printf "in %d days!\n", 17.85;  # in 17 days!
Note that this is truncated, not rounded; we’ll see how to round off a number in a moment.
In Perl, printf is most often used for columnar data, since most formats accept a field width. If the data won’t fit, the field will generally be expanded as needed:
printf "%6d\n", 42;          # output like ````42 (the ` symbol stands for a space)
printf "%2d\n", 2e3 + 1.95;  # 2001
The %s conversion means a string, so it effectively interpolates the given value as a string, but with a given field width:
printf "%10s\n", "wilma";  # looks like `````wilma
A negative field width is left-justified (in any of these conversions):
printf "%-15s\n", "flintstone";  # looks like flintstone`````
The %f conversion (floating-point) rounds off its output as needed, and even lets you request a certain number of digits after the decimal point:
printf "%12f\n", 6 * 7 + 2/3;    # looks like ```42.666667
printf "%12.3f\n", 6 * 7 + 2/3;  # looks like ``````42.667
printf "%12.0f\n", 6 * 7 + 2/3;  # looks like ``````````43
To print a real percent sign, use %%, which is special in that it uses no element from the list:[23]
printf "Monthly interest rate: %.2f%%\n", 5.25/12;  # the value looks like "0.44%"
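To see several of these conversions working together, the sketch below formats two rows with one shared format. It uses sprintf, which accepts the same format strings as printf but returns the result as a string instead of printing it; that makes the padding easy to inspect. The vertical bars are there only to make the column edges visible, and the variable names are illustrative.

```perl
use strict;
use warnings;

# One shared format: left-justified name, integer, two decimal places.
my $format = "%-10s|%6d|%8.2f|";

my @lines = (
    sprintf($format, "fred",       42,      6 * 7 + 2/3),
    sprintf($format, "flintstone", 2001.95, 5.25 / 12),
);
print "$_\n" foreach @lines;

# %d truncates; %.0f rounds.
my $truncated = sprintf("%d",   17.85);   # "17"
my $rounded   = sprintf("%.0f", 17.85);   # "18"
print "truncated: $truncated, rounded: $rounded\n";
```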
Generally, you won’t use an array as an argument to printf. That’s because an array may hold any number of items, and a given format string will work with only a certain fixed number of items: if there are three conversions in the format, there must be exactly three items.
But there’s no reason you can’t whip up a format string on the fly, since it may be any expression. This can be tricky to get right, though, so it may be handy (especially when debugging) to store the format into a variable:
my @items = qw( wilma dino pebbles );
my $format = "The items are:\n" . ("%10s\n" x @items);
## print "the format is <<$format>>\n";  # for debugging
printf $format, @items;
This uses the x operator (which we learned about in Chapter 2) to replicate the given string a number of times given by @items (which is being used in a scalar context). In this case, that’s 3, since there are three items, so the resulting format string is the same as if we had written it as "The items are:\n%10s\n%10s\n%10s\n". And the output prints each item on its own line, right-justified in a ten-character column, under a heading line. Pretty cool, huh? But not cool enough, because you can even combine these:
printf "The items are:\n".("%10s\n" x @items), @items;
Note that here we have @items being used once in a scalar context, to get its length, and once in a list context, to get its contents. Context is important.
See Section A.5 for answers to the following exercises:
[7] Write a program that acts like cat, but reverses the order of the output lines. (Some systems have a utility like this named tac.) If you run yours as ./tac fred barney betty, the output should be all of file betty from last line to first, then barney and then fred, also from last line to first. (Be sure to use the ./ in your program’s invocation if you call it tac, so that you don’t get the system’s utility instead!)
[8] Write a program that asks the user to enter a list of strings on separate lines, printing each string in a right-justified 20-character column. To be certain that the output is in the proper columns, print a “ruler line” of digits as well. (This is simply a debugging aid.) Make sure that you’re not using a 19-character column by mistake! For example, entering hello, good-bye should give output something like this:
123456789012345678901234567890123456789012345678901234567890
               hello
            good-bye
[8] Modify the previous program to let the user choose the column width, so that entering 30, hello, good-bye (on separate lines) would put the strings at the 30th column. (Hint: see Section 2.6.1 in Chapter 2 about controlling variable interpolation.) For extra credit, make the ruler line longer when the selected width is larger.
[1] If you’re already familiar with the workings of standard input, output, and error streams, you’re ahead of the game. If not, we’ll get you caught up when we get to Chapter 14. For now, just think of “standard input” as being “the keyboard,” and “standard output” as being “the display screen.”
[2] What we’re calling the line-input operator here, <STDIN>, is actually a line-input operator (represented by the angle brackets) around a filehandle. We’ll learn about filehandles in Chapter 11.
[3] You probably noticed that we never chomped that input. In this kind of a loop, you can’t really put chomp into the conditional expression, so it’s often the first item in the loop body, when it’s needed. We’ll see examples of that in the next section.
[4] Well, okay, the conditional of a for loop is just a while conditional in disguise, so it works there, too.
[5] The diamond operator was named by Larry’s daughter, Heidi, when Randal went over to Larry’s house one day to show off the new training materials he’d been writing, and complained that there was no spoken name for “that thing”. Larry didn’t have a name for it, either. Heidi (eight years old at the time) quickly chimed in, “That’s a diamond, Daddy.” So the name stuck. Thanks, Heidi!
[6] But not just on Unix systems. Many other systems have adopted this way of using invocation arguments.
[7] Whenever a program is started, it has a list of zero or more invocation arguments, supplied by whatever program is starting it. Often this is the shell, which makes up the list depending upon what you type on the command line. But we’ll see in a later chapter that you can invoke a program with pretty much any strings as the invocation arguments. Because they often come from the shell’s command line, they are sometimes called “command-line arguments” as well.
[8] Here’s a possibly unfamiliar Unix fact: most of those standard utilities, like cat and sed, use this same convention, where a hyphen stands for the standard input stream.
[9] Which may or may not include getting input from the keyboard.
[10] If it matters to you, or even if it doesn’t, the current file’s name is kept in Perl’s special variable $ARGV. This name may be "-" instead of a real filename if the input is coming from the standard input stream, though.
[11] If you re-initialize @ARGV before using the second diamond, then you’re on solid ground. We’ll see @ARGV in the next section.
[12] C programmers may be wondering about argc (there isn’t one in Perl), and what happened to the program’s own name (that’s found in Perl’s special variable $0, not @ARGV). Also, depending upon how you’ve invoked your program, there may be a little more happening than we say here. See the perlrun manpage for the full details.
[13] If you need more than just one or two such options, you should almost certainly use a module to process them in a standard way. See the documentation for the Getopt::Long and Getopt::Std modules, which are part of the standard distribution.
[14] Well, it doesn’t add anything extra by default, but this default (like so many others in Perl) may be changed. Changing these defaults will likely confuse your maintenance programmer, though, so avoid doing so except in small, quick-and-dirty programs, or (rarely) in a small section of a normal program. See the perlvar manpage to learn about changing the defaults.
[15] Yes, the spaces are another default; see the perlvar manpage again.
[16] You know that we mean a three-element list here, right? This is just Perl notation.
[17] In fact, there was even an endeavor started, called the PPT (Perl Power Tools) project, whose goal is to implement all of the classic Unix utilities in Perl. They actually completed nearly all the utilities (and most of the games!), but got bogged down when they got to reimplementing the shell. The PPT project has been helpful because it has made these standard utilities available on many non-Unix machines.
[18] We say “immediately” here because Perl won’t permit a newline character between the function name and the opening parenthesis in this kind of function call. If there is a newline there, Perl sees your code as making a list operator, rather than a function call. This is the kind of piddling technical detail that we mention only for completeness. If you’re terminally curious, see the full story in the manpages.
[19] Functions that take zero or one arguments don’t suffer from this problem.
[20] Here, we’re using “format” in the generic sense. Perl has a report-generating feature called “formats” that we won’t even be mentioning (except in this footnote) until Appendix B, and then only to say that we really aren’t going to talk about them. So, you’re on your own there. Just wanted to keep you from getting lost.
[21] “General” numeric conversion. Or maybe a “Good conversion for this number” or “Guess what I want the output to look like.”
[22] There’s also %x for hexadecimal and %o for octal, if you need those. But we really say “decimal” here as a memory aid: %d for Decimal integer.
[23] Maybe you thought you could simply put a backslash in front of the percent sign. Nice try, but no. The reason that won’t work is that the format is an expression, and the expression "\%" means the one-character string '%'. Even if we got a backslash into the format string, printf wouldn’t know what to do with it. Besides, C programmers are used to printf working like this.