6. I/O Basics

Chapter 6. I/O Basics

We’ve already seen how to do some input/output (I/O), in order to make some of the earlier exercises possible. But now we’ll learn a little more about those operations. As the title of this chapter implies, there will be more about Perl’s I/O operations in Chapter 11.

Input from Standard Input

Reading from the standard input stream is easy.^[1] We’ve been doing it already with the <STDIN> operator.^[2] Evaluating this operator in a scalar context gives you the next line of input:

$line = <STDIN>;                # read the next line
chomp($line);                   # and chomp it

chomp($line = <STDIN>);         # same thing, more idiomatically

Since the line-input operator will return undef when you reach end-of-file, this is handy for dropping out of loops:

while (defined($line = <STDIN>)) {
  print "I saw $line";
}

There’s a lot going on in that first line: we’re reading the input into a variable, checking that it’s defined, and if it is (meaning that we haven’t reached the end of the input) we’re running the body of the while loop. So, inside the body of the loop, we’ll see each line, one after another, in $line.^[3] This is something you’ll want to do fairly often, so naturally Perl has a shortcut for it. The shortcut looks like this:

while (<STDIN>) {
  print "I saw $_";
}

Now, to make this shortcut, Larry chose some useless syntax. That is, this is literally saying, “Read a line of input, and see if it’s true. (Normally it is.) And if it is true, enter the while loop, but throw away that line of input!” Larry knew that it was a useless thing to do; nobody should ever need to do that in a real Perl program. So, Larry took this useless syntax and made it useful.

What this is actually saying is that Perl should do the same thing as we saw in our earlier loop: it tells Perl to read the input into a variable, and (as long as the result was defined, so we haven’t reached end-of-file) then enter the while loop. However, instead of storing the input into $line, Perl will use its favorite default variable, $_, just as if you had written this:

while (defined($_ = <STDIN>)) {
  print "I saw $_";
}

Now, before we go any further, we must be very clear about something: this shortcut works only if you write it just as we did. If you put a line-input operator anywhere else (in particular, as a statement all on its own) it won’t read a line into $_ by default. It works only if there’s nothing but the line-input operator in the conditional of a while loop.^[4] If you put anything else into the conditional expression, this shortcut won’t apply.
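To see the restriction in action, here is a minimal sketch. It assumes an in-memory filehandle ($in, opened on a string) as a stand-in for STDIN so that it can run without keyboard input; filehandles themselves are covered in Chapter 11, and the variable names are only illustrative.

```perl
use strict;
use warnings;

# Assumption: an in-memory filehandle stands in for STDIN here.
my $text = "first\nsecond\nthird\n";
open(my $in, '<', \$text) or die "Cannot open in-memory input: $!";

$_ = "untouched";
<$in>;                      # a statement on its own: reads a line, then discards it
my $after_bare_read = $_;   # $_ was not modified by the bare read

my @seen;
while (<$in>) {             # the shortcut: each remaining line lands in $_
    chomp;
    push @seen, $_;
}
close($in);

print "After the bare read, \$_ held: $after_bare_read\n";
print "The loop saw: @seen\n";
```

The bare read silently consumes the first line, which is why the loop only sees the other two.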

There’s no connection between the line-input operator (<STDIN>) and Perl’s favorite default variable ($_). In this case, though, it just happens that the input is being stored in that variable.

On the other hand, evaluating the line-input operator in a list context gives you all of the (remaining) lines of input as a list—each element of the list is one line:

foreach (<STDIN>) {
  print "I saw $_";
}

Once again, there’s no connection between the line-input operator and Perl’s favorite default variable. In this case, though, the default control variable for foreach is $_. So in this loop, we’ll see each line of input in $_, one after the other.

That may sound familiar, and for good reason: that’s the same behavior we saw from the while loop. Isn’t it?

The difference is under the hood. In the while loop, Perl reads a line of input, puts it into a variable, and runs the body of the loop. Then, it goes back to find another line of input. But in the foreach loop, the line-input operator is being used in a list context (since foreach needs a list to iterate through). So it has to read all of the input before the loop can start running. That difference will become apparent when the input is coming from your 400MB web server log file! It’s generally best to use code like the while loop’s shortcut, which will process input a line at a time, whenever possible.
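One way to observe the difference is to ask, during the first pass through each loop, whether any input is still unread. The sketch below assumes an in-memory filehandle in place of a real log file and uses the built-in eof function; the variable names are hypothetical.

```perl
use strict;
use warnings;

# Assumption: a small string stands in for the large log file.
my $log = "line 1\nline 2\nline 3\nline 4\nline 5\n";

my ($eof_in_while, $eof_in_foreach);

open(my $fh, '<', \$log) or die "Cannot open in-memory input: $!";
while (<$fh>) {
    $eof_in_while = eof($fh) ? 1 : 0;     # only one line has been read so far
    last;
}
close($fh);

open($fh, '<', \$log) or die "Cannot open in-memory input: $!";
foreach (<$fh>) {
    $eof_in_foreach = eof($fh) ? 1 : 0;   # every line was read before the loop began
    last;
}
close($fh);

print "while:   at end of input during first pass? $eof_in_while\n";
print "foreach: at end of input during first pass? $eof_in_foreach\n";
```

With the while loop the rest of the input is still waiting; with foreach it has all been pulled into memory already.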

Input from the Diamond Operator

Another way to read input is with the diamond^[5] operator: <>. This is useful for making programs that work like standard Unix^[6] utilities, with respect to the invocation arguments (which we’ll see in a moment). If you want to make a Perl program that can be used like the utilities cat, sed, awk, sort, grep, lpr, and many others, the diamond operator will be your friend. If you want to make anything else, the diamond operator probably won’t help.

The invocation arguments to a program are normally a number of “words” on the command line after the name of the program.^[7] In this case, they give the names of a number of files to be processed in sequence:

$ ./my_program fred barney betty

That command means to run the command my_program (which will be found in the current directory), and that it should process file fred, followed by file barney, followed by file betty.

If you give no invocation arguments, the program should process the standard input stream. Or, as a special case, if you give just a hyphen as one of the arguments, that means standard input as well.^[8] So, if the invocation arguments had been fred - betty, that would have meant that the program should process file fred, followed by the standard input stream, followed by file betty.

The benefit of making your programs work like this is that you may choose where the program gets its input at run time; for example, you won’t have to rewrite the program to use it in a pipeline (which we’ll discuss more later). Larry put this feature into Perl because he wanted to make it easy for you to write your own programs that work like standard Unix utilities—even on non-Unix machines. Actually, he did it so he could make his own programs work like standard Unix utilities; since some vendors’ utilities don’t work just like others', Larry could make his own utilities, deploy them on a number of machines, and know that they’d all have the same behavior. Of course, this meant porting Perl to every machine he could find.

The diamond operator is actually a special kind of line-input operator. But instead of getting the input from the keyboard, it comes from the user’s choice of input:^[9]

while (defined($line = <>)) {
  chomp($line);
  print "It was $line that I saw!\n";
}

So, if we run this program with the invocation arguments fred, barney, and betty, it will say something like: “It was [a line from file fred] that I saw!”, “It was [another line from file fred] that I saw!”, on and on until it reaches the end of file fred. Then, it will automatically go on to file barney, printing out one line after another, and then on to file betty. Note that there’s no break when we go from one file to another; when you use the diamond, it’s as if the input files have been merged into one big file.^[10] The diamond will return undef (and we’ll drop out of the while loop) only at the end of all of the input.

In fact, since this is just a special kind of line-input operator, we may use the same shortcut we saw earlier, to read the input into $_ by default:

while (<>) {
  chomp;
  print "It was $_ that I saw!\n";
}

This works like the loop above, but with less typing. And you may have noticed that we’re using the default for chomp; without an argument, chomp will work on $_. Every little bit of saved typing helps!

Since the diamond operator is generally being used to process all of the input, it’s typically a mistake to use it in more than one place in your program. If you find yourself putting two diamonds into the same program, especially using the second diamond inside the while loop that is reading from the first one, it’s almost certainly not going to do what you would like.^[11] In our experience, when beginners put a second diamond into a program, they meant to use $_ instead. Remember, the diamond operator reads the input, but the input itself is (generally, by default) found in $_.

If the diamond operator can’t open one of the files and read from it, it’ll print an allegedly helpful diagnostic message, such as:

can't open wimla: No such file or directory

The diamond operator will then go on to the next file automatically, much like what you’d expect from cat or another standard utility.

The Invocation Arguments

Technically, the diamond operator isn’t looking literally at the invocation arguments; it works from the @ARGV array. This array is a special array that is preset by the Perl interpreter to be a list of the invocation arguments. In other words, this is just like any other array (except for its funny, all-caps name), but when your program starts, @ARGV is already stuffed full of the list of invocation arguments.^[12]

You can use @ARGV just like any other array; you could shift items off of it, perhaps, or use foreach to iterate over it. You could even check to see if any arguments start with a hyphen, so that you could process them as invocation options (like Perl does with its own -w option).^[13]
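As a rough sketch of that idea, the following code peels a single option off the front of @ARGV. The -v flag and the variable names are hypothetical, and @ARGV is assigned by hand here only to simulate a command line such as ./my_program -v fred barney.

```perl
use strict;
use warnings;

# Assumption: simulate the invocation  ./my_program -v fred barney
@ARGV = qw( -v fred barney );

my $verbose = 0;
if (@ARGV and $ARGV[0] eq '-v') {   # hypothetical option, not a Perl built-in
    shift @ARGV;                    # remove the option so only filenames remain
    $verbose = 1;
}

my @files = @ARGV;
print "verbose is $verbose; files are @files\n";
```

Comparing against the exact string -v leaves a lone hyphen (meaning standard input) untouched.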

This is how the diamond operator knows what filenames it should use: it looks in @ARGV. If it finds an empty list, it uses the standard input stream; otherwise it uses the list of files that it finds. This means that after your program starts and before you start using the diamond, you’ve got a chance to tinker with @ARGV. For example, here we can process three specific files, regardless of what the user chose on the command line:

@ARGV = qw# larry moe curly #;  # force these three files to be read
while (<>) {
  chomp;
  print "It was $_ that I saw in some stooge-like file!\n";
}

In Chapter 11, we’ll see how to open and close specific filenames at specific times. But this technique will suffice for the next few chapters.
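Here is a self-contained sketch of the same technique that also reports which file each line came from, using the $ARGV variable. To be runnable anywhere, it first creates two small files in a temporary directory; the File::Temp and File::Basename modules, the use of open for writing, and the file contents are all assumptions made for the demonstration.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);
use File::Basename qw(basename);

# Assumption: create two throwaway files so the example has something to read.
my $dir = tempdir(CLEANUP => 1);
my %contents = (fred => "f1\nf2\n", barney => "b1\n");
for my $name (keys %contents) {
    open(my $out, '>', "$dir/$name") or die "Cannot create $dir/$name: $!";
    print {$out} $contents{$name};
    close($out) or die "Cannot close $dir/$name: $!";
}

@ARGV = ("$dir/fred", "$dir/barney");   # force these two files to be read

my @report;
while (<>) {
    chomp;
    push @report, basename($ARGV) . ": $_";   # $ARGV names the current file
}
print "$_\n" for @report;
```

The lines arrive as one continuous stream; only $ARGV reveals where one file ends and the next begins.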

Output to Standard Output

The print operator takes a list of values and sends each item (as a string, of course) to standard output in turn, one after another. It doesn’t add any extra characters before, after, or in between the items;^[14] if you want spaces between items and a newline at the end, you have to say so:

$name = "Larry Wall";
print "Hello there, $name, did you know that 3+4 is ", 3+4, "?\n";

Of course, that means that there’s a difference between printing an array and interpolating an array:

print @array;     # print a list of items
print "@array";   # print a string (containing an interpolated array)

That first print statement will print a list of items, one after another, with no spaces in between. The second one will print exactly one item, which is the string you get by interpolating @array into the empty string—that is, it prints the contents of @array, separated by spaces.^[15] So, if @array holds qw/ fred barney betty /,^[16] the first one prints fredbarneybetty, while the second prints fred barney betty separated by spaces.

But before you decide to always use the second form, imagine that @array is a list of unchomped lines of input. That is, imagine that each of its strings has a trailing newline character. Now, the first print statement prints fred, barney, and betty on three separate lines. But the second one prints this:

fred
 barney
 betty

Do you see where the spaces come from? Perl is interpolating an array, so it puts spaces between the elements. So, we get the first element of the array (fred and a newline character), then a space, then the next element of the array (barney and a newline character), then a space, then the last element of the array (betty and a newline character). The result is that the lines seem to have become indented, except for the first one. Every week or two, a message appears on the newsgroup comp.lang.perl.misc with a subject line something like:

Perl indents everything after the first line

Without even reading the message, we can immediately see that the program used double quotes around an array containing unchomped strings. “Did you perhaps put an array of unchomped strings inside double quotes?” we ask, and the answer is always yes.
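The effect is easy to reproduce. This sketch builds the exact strings that the two print statements would send, assuming Perl's default separators are in effect, and then shows the result of chomping the lines first; the variable names are illustrative.

```perl
use strict;
use warnings;

my @array = ("fred\n", "barney\n", "betty\n");   # unchomped lines

my $as_list      = join '', @array;   # what  print @array    sends by default
my $interpolated = "@array";          # what  print "@array"  sends

chomp(my @clean = @array);            # a chomped copy of the lines
my $one_line = "@clean\n";            # spaces between items, one newline at the end

print $as_list;
print $interpolated;
print $one_line;
```

The second print shows the apparent indentation, because each space lands right after a newline.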

Generally, if your strings contain newlines, you simply want to print them, after all:

print @array;

But if they don’t contain newlines, you’ll generally want to add one at the end:

print "@array\n";

So, if you’re using the quote marks, you’ll be (generally) adding the \n at the end of the string anyway; this should help you to remember which is which.

It’s normal for your program’s output to be buffered. That is, instead of sending out every little bit of output at once, it’ll be saved until there’s enough to bother with. That’s because if (for example) the output were going to be saved on disk, it would be (relatively) slow and inefficient to spin the disk every time that one or two characters need to be added to the file. Generally, then, the output will go into a buffer that is flushed (that is, actually written to disk, or wherever) only when the buffer gets full, or when the output is otherwise finished (such as at the end of runtime). Usually, that’s what you want.

But if you (or a program) may be waiting impatiently for the output, you may wish to take that performance hit and flush the output buffer each time you print. See the Perl manpages for more information on controlling buffering in that case.
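One common way to do this, described in the perlvar manpage, is to set the special variable $| to a true value. The following is only a brief sketch; the progress-dots loop is a made-up example of output you might want to appear immediately.

```perl
use strict;
use warnings;

$| = 1;   # flush the currently selected output handle (normally STDOUT) after every print

print "Working";
for my $step (1 .. 3) {
    # imagine some slow work happening here
    print ".";            # each dot is sent out right away rather than held in the buffer
}
print " done\n";
```

Setting $| back to 0 restores the usual buffering for that handle.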

Since print is looking for a list of strings to print, its arguments are evaluated in list context. Since the diamond operator (as a special kind of line-input operator) will return a list of lines in a list context, these can work well together:

print <>;          # source code for 'cat'

print sort <>;  # source code for 'sort'

Well, to be fair, the standard Unix commands cat and sort do have some additional functionality that these replacements lack. But you can’t beat them for the price! You can now re-implement all of your standard Unix utilities in Perl, and painlessly port them to any machine that has Perl, whether that machine is running Unix or not. And you can be sure that the programs on every different type of machine will nevertheless have the same behavior.^[17]

What might not be obvious is that print has optional parentheses, which can sometimes cause confusion. Remember the rule that parentheses in Perl may always be omitted, except when doing so would change the meaning of a statement. So, here are two ways to print the same thing:

print("Hello, world!\n");
print "Hello, world!\n";

So far, so good. But another rule in Perl is that if the invocation of print looks like a function call, then it is a function call. It’s a simple rule, but what does it mean for something to look like a function call?

In a function call, there’s a function name immediately^[18] followed by parentheses around the function’s arguments, like this:

print (2+3);

That looks like a function call, so it is a function call. It prints 5, but then it returns a value like any other function. The return value of print is a true or false value, indicating the success of the print. It nearly always succeeds, unless you get some I/O error, so the $result in the following statement will normally be 1:

$result = print("hello world!\n");

But what if you used the result in some other way? Let’s suppose you decide to multiply the return value times four:

print (2+3)*4;  # Oops!

When Perl sees this line of code, it prints 5, just as you asked. Then it takes the return value from print, which is 1, and multiplies that times 4. It then throws away the product, wondering why you didn’t tell it to do something else with it. And at this point, someone looking over your shoulder says, “Hey, Perl can’t do math! That should have printed 20, rather than 5!”

This is the problem with allowing the parentheses to be optional; sometimes we humans forget where the parentheses really belong. When there are no parentheses, print is a list operator, printing all of the items in the following list; that’s generally what you’d expect. But when the first thing after print is a left parenthesis, print is a function call, and it will print only what’s found inside the parentheses. Since that line had parentheses, it’s the same to Perl as if you’d said this:

( print(2+3) ) * 4;  # Oops!

Fortunately, Perl itself can almost always help you with this, if you ask for warnings—so use -w, at least during program development and debugging.

Actually, this rule—“If it looks like a function call, it is a function call”—applies to all list functions^[19] in Perl, not just to print. It’s just that you’re most likely to notice it with print. If print (or another function name) is followed by an open parenthesis, make sure that the corresponding close parenthesis comes after all of the arguments to that function.
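Two ways of writing the statement so that it prints 20 are sketched below: wrap the whole argument list in parentheses, or put a unary plus in front of the first parenthesis so that it no longer looks like a function call. To make the result checkable, the sketch assumes an in-memory filehandle and uses select to send print's output into a string; those details are not part of the fix itself.

```perl
use strict;
use warnings;

# Assumption: capture the output in a string instead of sending it to the screen.
my $captured = '';
open(my $out, '>', \$captured) or die "Cannot open in-memory output: $!";
my $previous = select($out);    # print now writes to $out by default

print((2+3)*4);     # outer parentheses enclose the entire argument list
print "\n";
print +(2+3)*4;     # the + keeps the parenthesis from looking like a function call
print "\n";

select($previous);              # restore the original default output handle
close($out) or die "Cannot close in-memory output: $!";

print $captured;
```

Either form keeps the multiplication inside the list of things to print.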

Formatted Output with printf

You may wish to have a little more control with your output than print provides. In fact, you may be accustomed to the formatted output of C’s printf function. Fear not—Perl provides a comparable operation with the same name.

The printf operator takes a format string followed by a list of things to print. The format^[20] string is a fill-in-the-blanks template showing the desired form of the output:

printf "Hello, %s; your password expires in %d days!\n",
  $user, $days_to_die;

The format string holds a number of so-called conversions; each conversion begins with a percent sign (%) and ends with a letter. (As we’ll see in a moment, there may be significant extra characters between these two symbols.) There should be the same number of items in the following list as there are conversions; if these don’t match up, it won’t work correctly. In the example above, there are two items and two conversions, so the output might look something like this:

Hello, merlyn; your password expires in 3 days!

There are many possible printf conversions, so we’ll take time here to describe just the most common ones. Of course, the full details are available in the perlfunc manpage.

To print a number in what’s generally a good way, use %g,^[21] which automatically chooses floating-point, integer, or even exponential notation as needed:

printf "%g %g %g\n", 5/2, 51/17, 51 ** 17;  # 2.5 3 1.0683e+29

The %d format means a decimal^[22] integer, truncated as needed:

printf "in %d days!\n", 17.85;  # in 17 days!

Note that this is truncated, not rounded; we’ll see how to round off a number in a moment.

In Perl, printf is most often used for columnar data, since most formats accept a field width. If the data won’t fit, the field will generally be expanded as needed:

printf "%6d\n", 42;  # output like ````42 (the ` symbol stands for a space)
printf "%2d\n", 2e3 + 1.95;  # 2001

The %s conversion means a string, so it effectively interpolates the given value as a string, but with a given field width:

printf "%10s\n", "wilma";  # looks like `````wilma

A negative field width is left-justified (in any of these conversions):

printf "%-15s\n", "flintstone";  # looks like flintstone`````

The %f conversion (floating-point) rounds off its output as needed, and even lets you request a certain number of digits after the decimal point:

printf "%12f\n", 6 * 7 + 2/3;    # looks like ```42.666667
printf "%12.3f\n", 6 * 7 + 2/3;  # looks like ``````42.667
printf "%12.0f\n", 6 * 7 + 2/3;  # looks like ``````````43

To print a real percent sign, use %%, which is special in that it uses no element from the list:^[23]

printf "Monthly interest rate: %.2f%%\n",
  5.25/12;  # the value looks like "0.44%"
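The conversions above can be checked side by side with sprintf, which is assumed here because it accepts the same format strings as printf but returns the result as a string rather than printing it. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $truncated = sprintf "%d",      17.85;          # %d drops the fractional part
my $rounded   = sprintf "%.0f",    17.85;          # %f with zero decimal places rounds
my $right     = sprintf "%6d",     42;             # right-justified in six columns
my $left      = sprintf "%-15s",   "flintstone";   # left-justified in fifteen columns
my $percent   = sprintf "%.2f%%",  5.25/12;        # %% produces a literal percent sign

# Brackets make the padding visible in the output.
print "[$truncated] [$rounded] [$right] [$left] [$percent]\n";
```

Comparing the first two results shows the difference between truncating with %d and rounding with %.0f.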

Arrays and printf

Generally, you won’t use an array as an argument to printf. That’s because an array may hold any number of items, and a given format string will work with only a certain fixed number of items: if there are three conversions in the format, there must be exactly three items.

But there’s no reason you can’t whip up a format string on the fly, since it may be any expression. This can be tricky to get right, though, so it may be handy (especially when debugging) to store the format into a variable:

my @items = qw( wilma dino pebbles );
my $format = "The items are:\n" . ("%10s\n" x @items);
## print "the format is <<$format>>\n"; # for debugging
printf $format, @items;

This uses the x operator (which we learned about in Chapter 2) to replicate the given string a number of times given by @items (which is being used in a scalar context). In this case, that’s 3, since there are three items, so the resulting format string is the same as if we had written it as "The items are:\n%10s\n%10s\n%10s\n". And the output prints each item on its own line, right-justified in a ten-character column, under a heading line. Pretty cool, huh? But not cool enough, because you can even combine these:

printf "The items are:\n".("%10s\n" x @items), @items;

Note that here we have @items being used once in a scalar context, to get its length, and once in a list context, to get its contents. Context is important.
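To confirm what the generated format produces, the sketch below builds the format the same way and hands it to sprintf, which is assumed here only so that the result can be inspected as a string. The $format and $report names are illustrative.

```perl
use strict;
use warnings;

my @items = qw( wilma dino pebbles );

# @items in scalar context (right side of x) gives the count: 3
my $format = "The items are:\n" . ("%10s\n" x @items);

# @items in list context supplies the three values for the three conversions
my $report = sprintf $format, @items;

print $report;
```

Because the format is derived from the array's length, adding a fourth item would extend it automatically.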

Exercises

See Section A.5 for answers to the following exercises:

[7] Write a program that acts like cat, but reverses the order of the output lines. (Some systems have a utility like this named tac.) If you run yours as ./tac fred barney betty, the output should be all of file betty from last line to first, then barney and then fred, also from last line to first. (Be sure to use the ./ in your program’s invocation if you call it tac, so that you don’t get the system’s utility instead!)
[8] Write a program that asks the user to enter a list of strings on separate lines, printing each string in a right-justified 20-character column. To be certain that the output is in the proper columns, print a “ruler line” of digits as well. (This is simply a debugging aid.) Make sure that you’re not using a 19-character column by mistake! For example, entering hello, good-bye should give output something like this:
```
123456789012345678901234567890123456789012345678901234567890
               hello
            good-bye
```
[8] Modify the previous program to let the user choose the column width, so that entering 30, hello, good-bye (on separate lines) would put the strings at the 30th column. (Hint: see Section 2.6.1 in Chapter 2 about controlling variable interpolation.) For extra credit, make the ruler line longer when the selected width is larger.

^[1]If you’re already familiar with the workings of standard input, output, and error streams, you’re ahead of the game. If not, we’ll get you caught up when we get to Chapter 14. For now, just think of “standard input” as being “the keyboard,” and “standard output” as being “the display screen.”

^[2]What we’re calling the line-input operator here, <STDIN>, is actually a line-input operator (represented by the angle brackets) around a filehandle. We’ll learn about filehandles in Chapter 11.

^[3]You probably noticed that we never chomped that input. In this kind of a loop, you can’t really put chomp into the conditional expression, so it’s often the first item in the loop body, when it’s needed. We’ll see examples of that in the next section.

^[4]Well, okay, the conditional of a for loop is just a while conditional in disguise, so it works there, too.

^[5]The diamond operator was named by Larry’s daughter, Heidi, when Randal went over to Larry’s house one day to show off the new training materials he’d been writing, and complained that there was no spoken name for “that thing”. Larry didn’t have a name for it, either. Heidi (eight years old at the time) quickly chimed in, “That’s a diamond, Daddy.” So the name stuck. Thanks, Heidi!

^[6]But not just on Unix systems. Many other systems have adopted this way of using invocation arguments.

^[7]Whenever a program is started, it has a list of zero or more invocation arguments, supplied by whatever program is starting it. Often this is the shell, which makes up the list depending upon what you type on the command line. But we’ll see in a later chapter that you can invoke a program with pretty much any strings as the invocation arguments. Because they often come from the shell’s command line, they are sometimes called “command-line arguments” as well.

^[8]Here’s a possibly unfamiliar Unix fact: most of those standard utilities, like cat and sed, use this same convention, where a hyphen stands for the standard input stream.

^[9]Which may or may not include getting input from the keyboard.

^[10]If it matters to you, or even if it doesn’t, the current file’s name is kept in Perl’s special variable $ARGV. This name may be "-" instead of a real filename if the input is coming from the standard input stream, though.

^[11]If you re-initialize @ARGV before using the second diamond, then you’re on solid ground. We’ll see @ARGV in the next section.

^[12]C programmers may be wondering about argc (there isn’t one in Perl), and what happened to the program’s own name (that’s found in Perl’s special variable $0, not @ARGV). Also, depending upon how you’ve invoked your program, there may be a little more happening than we say here. See the perlrun manpage for the full details.

^[13]If you need more than just one or two such options, you should almost certainly use a module to process them in a standard way. See the documentation for the Getopt::Long and Getopt::Std modules, which are part of the standard distribution.

^[14]Well, it doesn’t add anything extra by default, but this default (like so many others in Perl) may be changed. Changing these defaults will likely confuse your maintenance programmer, though, so avoid doing so except in small, quick-and-dirty programs, or (rarely) in a small section of a normal program. See the perlvar manpage to learn about changing the defaults.

^[15]Yes, the spaces are another default; see the perlvar manpage again.

^[16]You know that we mean a three-element list here, right? This is just Perl notation.

^[17]In fact, there was even an endeavor started, called the PPT (Perl Power Tools) project, whose goal is to implement all of the classic Unix utilities in Perl. They actually completed nearly all the utilities (and most of the games!), but got bogged down when they got to reimplementing the shell. The PPT project has been helpful because it has made these standard utilities available on many non-Unix machines.

^[18]We say “immediately” here because Perl won’t permit a newline character between the function name and the opening parenthesis in this kind of function call. If there is a newline there, Perl sees your code as making a list operator, rather than a function call. This is the kind of piddling technical detail that we mention only for completeness. If you’re terminally curious, see the full story in the manpages.

^[19]Functions that take zero or one arguments don’t suffer from this problem.

^[20]Here, we’re using “format” in the generic sense. Perl has a report-generating feature called “formats” that we won’t even be mentioning (except in this footnote) until Appendix B, and then only to say that we really aren’t going to talk about them. So, you’re on your own there. Just wanted to keep you from getting lost.

^[21]“General” numeric conversion. Or maybe a “Good conversion for this number” or “Guess what I want the output to look like.”

^[22]There’s also %x for hexadecimal and %o for octal, if you need those. But we really say “decimal” here as a memory aid: %d for Decimal integer.

^[23]Maybe you thought you could simply put a backslash in front of the percent sign. Nice try, but no. The reason that won’t work is that the format is an expression, and the expression "\%" means the one-character string '%'. Even if we got a backslash into the format string, printf wouldn’t know what to do with it. Besides, C programmers are used to printf working like this.
