A filehandle is the name in a Perl program for an I/O connection between your Perl process and the outside world. That is, it’s the name of a connection, not necessarily the name of a file.
Filehandles are named like other Perl identifiers (letters, digits, and underscores, but they can’t start with a digit), but since they don’t have any prefix character, they might be confused with present or future reserved words, as we saw with labels. Once again, as with labels, the recommendation from Larry is that you use all uppercase letters in the name of your filehandle—not only will it stand out better, but it will also guarantee that your program won’t fail when a future (lowercase) reserved word is introduced.
But there are also six special filehandle names that Perl already uses for its own purposes: STDIN, STDOUT, STDERR, DATA, ARGV, and ARGVOUT.[1] Although you may choose any filehandle name you’d like, you shouldn’t choose one of those six unless you intend to use that one’s special properties.[2]
Maybe you recognized some of those names already. When your program starts, STDIN is the filehandle naming the connection between the Perl process and wherever the program should get its input, known as the standard input stream. This is generally the user’s keyboard unless the user asked for something else to be the source of input, such as reading the input from a file or reading the output of another program through a pipe.[3]
There’s also the standard output stream, which is STDOUT. By default, this one goes to the user’s display screen, but the user may send the output to a file or to another program, as we’ll see shortly. These standard streams come to us from the Unix “standard I/O” library, but they work in much the same way on most modern operating systems.[4] The general idea is that your program should blindly read from STDIN and blindly write to STDOUT, trusting in the user (or generally whichever program is starting your program) to have set those up. In that way, the user can type a command like this one at the shell prompt:
$ ./your_program <dino >wilma
That command tells the shell that the program’s input should be read from the file dino, and the output should go to the file wilma. As long as the program blindly reads its input from STDIN, processes it (in whatever way we need), and blindly writes its output to STDOUT, this will work just fine.
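To make that concrete, here’s a minimal sketch of such a filter program (the line-numbering behavior is just an invented example). Because it reads blindly from STDIN and writes blindly to STDOUT, redirection and pipes work without any extra effort:

```perl
#!/usr/bin/perl
# A tiny filter: copy STDIN to STDOUT, numbering each line.
# The user might run it as: ./your_program <dino >wilma
my $count = 0;
while (<STDIN>) {        # read one line at a time, blindly
    $count++;
    print "$count: $_";  # $_ still has its newline on the end
}
```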
And at no extra charge, the program will work in a pipeline. This is another concept from Unix, which lets us write command lines like this one:
$ cat fred barney | sort | ./your_program | grep something | lpr
Now, if you’re not familiar with these Unix commands, that’s okay. This line says that the cat command should print out all of the lines of file fred followed by all of the lines of file barney. Then that output should be the input of the sort command, which sorts those lines and passes them on to your_program. After it has done its processing, your_program will send the data on to grep, which discards certain lines in the data, sending the others on to the lpr command, which should print everything that it gets on a printer. Whew!
But pipelines like that are common in Unix and many other systems today because they let you put together a powerful, complex command out of simple, standard building blocks.
There’s one more standard I/O stream. If (in the previous example) your_program had to emit any warnings or other diagnostic messages, those shouldn’t go down the pipeline. The grep command is set to discard anything that it hasn’t specifically been told to look for, and so it will most likely discard the warnings. Even if it did keep the warnings, we probably don’t want those to be passed downstream to the other programs in the pipeline. So that’s why there’s also the standard error stream: STDERR. Even if the standard output is going to another program or file, the errors will go to wherever the user desires. By default, the errors will generally go to the user’s display screen,[5] but the user may send the errors to a file with a shell command like this one:
$ netstat | ./your_program 2>/tmp/my_errors
So we see that Perl provides three filehandles—STDIN, STDOUT, and STDERR—which are automatically open to files or devices established by the program’s parent process (probably the shell). When you need other filehandles, use the open operator to tell Perl to ask the operating system to open the connection between your program and the outside world. Here are some examples:
open CONFIG, "dino";
open CONFIG, "<dino";
open BEDROCK, ">fred";
open LOG, ">>logfile";
The first one opens a filehandle called CONFIG to a file called dino. That is, the (existing) file dino will be opened and whatever it holds will come into our program through the filehandle named CONFIG. This is similar to the way that data from a file could come in through STDIN if the command line had a shell redirection like <dino. In fact, the second example uses exactly that sequence. The second does the same as the first, but the less-than sign explicitly says “this filename is to be used for input,” even though that’s the default.[6]
Although you don’t have to use the less-than sign to open a file for input, we include it because, as you can see in the third example, a greater-than sign means to create a new file for output. This opens the filehandle BEDROCK for output to the new file fred. Just as when the greater-than sign is used in shell redirection, we’re sending the output to a new file called fred. If there’s already a file of that name, we’re asking to wipe it out and replace it with this new one.
The fourth example shows how two greater-than signs may be used (again, as the shell does) to open a file for appending. That is, if the file already exists, we will add new data at the end. If it doesn’t exist, it will be created in much the same way as if we had used just one greater-than sign. This is handy for log files; your program could write a few lines to the end of a log file each time it’s run. So that’s why the fourth example names the filehandle LOG and the file logfile.
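For instance, a program could add one line to the end of a log file on each run. This is only a sketch, and the filename logfile is just an example:

```perl
# Append a line to the log each time the program runs;
# the existing contents of logfile are preserved.
open LOG, ">>logfile";
print LOG "Program was run at ", scalar localtime, "\n";
close LOG;
```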
You can use any scalar expression in place of the filename specifier, although typically you’ll want to be explicit about the direction specification:
my $selected_output = "my_output";
open LOG, "> $selected_output";
Note the space after the greater-than sign. Perl ignores this,[7] but it keeps unexpected things from happening if $selected_output were ">passwd", for example (which would make an append instead of a write).
We’ll see how to use these filehandles later in this chapter.
When you are finished with a filehandle, you may close it with the close operator, like this:
close BEDROCK;
Closing a filehandle tells Perl to inform the operating system that we’re all done with the given data stream, so any last output data should be written to disk in case someone is waiting for it.[8]
Perl will automatically close a filehandle if you reopen it (that is, if you reuse the filehandle name in a new open) or if you exit the program.[9] Because of this, many simple Perl programs don’t bother with close. But it’s there if you want to be tidy, with one close for every open. In general, it’s best to close each filehandle soon after you’re done with it, though the end of the program often arrives soon enough.[10]
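A tidy open-use-close sequence, then, might look like this sketch (the filehandle and filename are just examples):

```perl
open BEDROCK, ">fred";     # create (or clobber) the file fred
print BEDROCK "slate\n";   # write through the filehandle
close BEDROCK;             # done with it, so close it right away
```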
Perl can’t actually open a file all by itself. Like any other programming language, Perl can merely ask the operating system to let us open a file. Of course, the operating system may refuse, because of permission settings, an incorrect filename, or other reasons.
If you try to read from a bad filehandle (that is, a filehandle that isn’t properly open), you’ll see an immediate end-of-file. (With the I/O methods we’ll see in this chapter, end-of-file will be indicated by undef in a scalar context or an empty list in a list context.) If you try to write to a bad filehandle, the data is silently discarded.
Fortunately, these dire consequences are easy to avoid. First of all, if we ask for warnings with -w, Perl will generally be able to tell us with a warning when it sees that we’re using a bad filehandle. But even before that, open always tells us if it succeeded or failed, by returning true for success or false for failure. So you could write code like this:
my $success = open LOG, ">>logfile";  # capture the return value
unless ($success) {
  # The open failed
  ...
}
Well, you could do it like that, but there’s another way that we’ll see in the next section.
Let’s step aside for a moment. We need some stuff that isn’t directly related to (or limited to) filehandles, but is more about getting out of a program earlier than normal.
When a fatal error happens inside Perl (for example, if you divide by zero, use an invalid regular expression, or call a subroutine that hasn’t been declared), your program stops with an error message telling why.[11] But this functionality is available to us with the die function, so we can make our own fatal errors.
The die function prints out the message you give it (to the standard error stream, where such messages should go) and makes sure that your program exits with a nonzero exit status.
You may not have known it, but every program that runs on Unix (and many other modern operating systems) has an exit status, telling whether it was successful or not. Programs that run other programs (like the make utility program) look at that exit status to see that everything is running correctly. The exit status is just a single byte, so it can’t say much; traditionally, it is zero for success and a nonzero value for failure. Perhaps one means a syntax error in the command arguments, while two means that something went wrong during processing and three means the configuration file couldn’t be found; the details differ from one command to the next. But zero always means that everything worked. When the exit status shows failure, a program like make knows not to go on to the next step.
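In Perl you can choose your program’s own exit status with the exit function. Here’s a sketch; the particular status numbers are just invented conventions for this example:

```perl
# Invented convention for this sketch: 0 = success, 1 = bad arguments.
unless (@ARGV) {
    print STDERR "No arguments given\n";
    exit 1;    # nonzero status tells the caller that we failed
}
# ... do the real work here ...
exit 0;        # explicit success (a normal exit would also give zero)
```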
So we could rewrite the previous example, perhaps something like this:
unless (open LOG, ">>logfile") {
  die "Cannot create logfile: $!";
}
If the open fails, die will terminate the program and tell us that it cannot create the logfile. But what’s that $! in the message? That’s the human-readable complaint from the system. In general, when the system refuses to do something we’ve requested (like opening a file), it will give us a reason (perhaps “permission denied” or “file not found,” in this case). This is the string that you may have obtained with perror in C or a similar language. This human-readable complaint message will be available in Perl’s special variable $!.[12] It’s a good idea to include $! in the message when it could help the user to figure out what he or she did wrong. But if you use die to indicate an error that is not the failure of a system request, don’t include $!, since it will generally hold an unrelated message left over from something Perl did internally. It will hold a useful value only immediately after a failed system request. A successful request won’t leave anything useful there.
There’s one more thing that die will do for you: it will automatically append the Perl program name and line number[13] to the end of the message, so you can easily identify which die in your program is responsible for the untimely exit. The error message from the previous code might look like this, if $! contained the message permission denied:
Cannot create logfile: permission denied at your_program line 1234.
That’s pretty helpful—in fact, we always seem to want more information in our error messages than we put in the first time around. If you don’t want the line number and file revealed, make sure that the dying words have a newline on the end. That is, another way you could use die is in a line like this, with a trailing newline:
die "Not enough arguments " if @ARGV < 2;
If there aren’t at least two command-line arguments, that program will say so and quit. It won’t include the program name and line number, since the line number is of no use to the user; this is the user’s error, after all. As a rule of thumb, put the newline on messages that indicate a usage error, and leave it off when the error might be something you want to track down during debugging.[14]
When opening a file fails, though, there’s an easier and more common way than the unless block:
open LOG, ">>logfile" or die "Cannot create logfile: $!";
This uses the low-precedence short-circuit or operator that we saw in Chapter 10. If the open succeeds, it returns true, and the or is done. If the open fails, it returns false, and the short-circuit or goes on to the right side and dies with the message. You can read this as if it were English: “Open this file, or die!” It may not be the battle cry that will win a war, but it’s a good way to write code.
You should always check the return value of open, since the rest of the program is relying upon its success. That’s why we say that this is really the only way to write open—with or die after it.[15] Until you’re ready to be extra tricky, you should simply think of this as the syntax for open. Typing or die and a message takes only a moment when you’re writing the program, but it can save hours, or possibly days, of debugging time when something goes wrong.
Just as die can indicate a fatal error that acts like one of Perl’s builtin errors (like dividing by zero), you can use the warn function to cause a warning that acts like one of Perl’s builtin warnings (like using an undef value as if it were defined, when warnings are enabled).
The warn function works just like die does, except for that last step—it doesn’t actually quit the program. But it adds the program name and line number if needed, and it prints the message to standard error, just as die would.[16]
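So a nonfatal complaint might look like this sketch ($temperature and its limit are made up for the example):

```perl
my $temperature = 125;   # a made-up value for this sketch
if ($temperature > 120) {
    warn "Dubious temperature: $temperature\n";  # goes to standard error
}
print "Still running!\n";  # unlike die, warn doesn't stop the program
```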
And having talked about death and dire warnings, we now return you to your regularly scheduled filehandle instructional material. Read on.
Once a filehandle is open for reading, you can read lines from it just as you can read from standard input with STDIN. So, for example, to read lines from the Unix password file:
open PASSWD, "/etc/passwd"
  or die "How did you get logged in? ($!)";
while (<PASSWD>) {
  chomp;
  if (/^root:/) {
    # found root entry...
    ...;
  }
}
In this example, the die message uses parentheses around $!. Those are merely parentheses around the message in the output. (Sometimes a punctuation mark is just a punctuation mark.) As you can see, what we’ve been calling the “line-input operator” is really made of two components; the angle brackets (the real line-input operator) are around an input filehandle. Each line of input is then tested to see if it begins with root followed by a colon, triggering unseen actions.
A filehandle open for writing or appending may be used with print or printf, appearing immediately after the keyword but before the list of arguments:
print LOG "Captain's log, stardate 3.14159 "; # output goes to LOG printf STDERR "%d percent complete. ", $done/$total * 100;
Did you notice that there’s no comma between the filehandle and the items to be printed?[17] This looks especially weird if you use parentheses. Either of these forms is correct:
printf (STDERR "%d percent complete.\n", $done/$total * 100);
printf STDERR ("%d percent complete.\n", $done/$total * 100);
By default, if you don’t give a filehandle to print (or to printf, as everything we say here about one applies equally well to the other), the output will go to STDOUT. But that default may be changed with the select operator. Here we’ll send some output lines to BEDROCK:
select BEDROCK;
print "I hope Mr. Slate doesn't find out about this.\n";
print "Wilma!\n";
Once you’ve selected a filehandle as the default for output, it will stay that way. But it’s generally a bad idea to confuse the rest of the program, so you should generally set it back to STDOUT when you’re done.[18]
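Conveniently, select returns the name of the previously selected filehandle, so you can save it and put things back afterward. A sketch (the filehandle and filename are just examples):

```perl
open BEDROCK, ">fred";             # just an example output file
my $old_handle = select BEDROCK;   # select returns the old default
print "This line goes to BEDROCK.\n";
select $old_handle;                # restore the previous default
print "And this one goes to STDOUT again.\n";
close BEDROCK;
```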
Also by default, the output to each filehandle is buffered. Setting the special $| variable to 1 will set the currently selected filehandle (that is, the one selected at the time that the variable is modified) to always flush the buffer after each output operation. So if you wanted to be sure that the logfile gets its entries at once, in case you might be reading the log to monitor progress of your long-running program, you could use something like this:
select LOG;
$| = 1;  # don't keep LOG entries sitting in the buffer
select STDOUT;
# ... time passes, babies learn to walk, tectonic plates shift, and then...
print LOG "This gets written to the LOG at once!\n";
We mentioned earlier that if you were to reopen a filehandle (that is, if you were to open a filehandle FRED when you’ve already got an open filehandle named FRED, say), the old one would be closed for you automatically. And we said that you shouldn’t reuse one of the six standard filehandle names unless you intended to get that one’s special features. And we also said that the messages from die and warn, along with Perl’s internally generated complaints, go automatically to STDERR. If you put those three pieces of information together, you now have an idea about how you could send error messages to a file, rather than to your program’s standard error stream:[19]
# Send errors to my private error log
open STDERR, ">>/home/barney/.error_log"
  or die "Can't open error log for append: $!";
After reopening STDERR, any error messages from Perl will go into the new file. But what happens if the or die part is executed—where will that message go, if the new file couldn’t be opened to accept the messages?
The answer is that if one of the three system filehandles—STDIN, STDOUT, or STDERR—fails to be reopened, Perl kindly restores the original one.[20] That is, Perl closes the original one (of those three) only when it sees that opening the new connection is successful. Thus, this technique could be used to redirect any (or all) of those three system filehandles from inside your program,[21] almost as if the program had been run with that I/O redirection from the shell in the first place.
Now you know how to open a filehandle for output. Normally, that will create a new file, wiping out any existing file with the same name. Perhaps you want to check that there isn’t a file by that name. Perhaps you need to know how old a given file is. Or perhaps you want to go through a list of files to find which ones are larger than a certain number of bytes and not accessed for a certain amount of time. Perl has a complete set of tests you can use to find out information about files.
Let’s try that first example, where we need to check that a given file doesn’t exist, so that we don’t accidentally overwrite a vital spreadsheet data file or that important birthday calendar. For this, we need the -e file test, testing for existence:
die "Oops! A file called '$filename' already exists. " if -e $filename;
Notice that we don’t include $! in this die message, since we’re not reporting that the system refused a request in this case. Here’s an example of checking whether a file is being kept up-to-date. Let’s say that our program’s configuration file should be updated every week or two. (Maybe it’s checking for computer viruses, say.) If the file hasn’t been modified in the past 28 days, then something is wrong:
warn "Config file is looking pretty old! " if -M CONFIG > 28;
The third example is more complex. Here, let’s say that disk space is filling up and rather than buy more disks, we’ve decided to move any large, useless files to the backup tapes. So let’s go through our list of files[22] to see which of them are larger than 100 K. But even if a file is large, we shouldn’t move it to the backup tapes unless it hasn’t been accessed in the last 90 days (so we know that it’s not used too often):[23]
my @original_files = qw/ fred barney betty wilma pebbles dino bamm-bamm /;
my @big_old_files;  # The ones we want to put on backup tapes
foreach my $filename (@original_files) {
  push @big_old_files, $filename
    if -s $filename > 100_000 and -A $filename > 90;
}
This is the first time that we’ve seen it, so maybe you noticed that the control variable of the foreach loop is a my variable. That declares it to have the scope of the loop itself, so this example should work under use strict. Without the my keyword, this would be using the global $filename.
The file tests all look like a hyphen and a letter, which is the name of the test, followed by either a filename or a filehandle to be tested. Many of them return a true/false value, but several give something more interesting. See Table 11-1 for the complete list, and then read the following discussion to learn more about the special cases.
The tests -r, -w, -x, and -o tell whether the given attribute is true for the effective user or group ID,[24] which essentially refers to the person who is “in charge of” running the program.[25] These tests look at the “permission bits” on the file to see what is permitted. If your system uses Access Control Lists (ACLs), the tests will use those as well. These tests generally tell whether the system would try to permit something, but it doesn’t mean that it really would be possible. For example, -w may be true for a file on a CD-ROM, even though you can’t write to it, or -x may be true on an empty file, which can’t truly be executed.
The -s test does return true if the file is nonempty, but it’s a special kind of true. It’s the length of the file, measured in bytes, which evaluates as true for a nonzero number.
On a Unix filesystem,[26] there are just seven types of items, represented by the seven file tests -f, -d, -l, -S, -p, -b, and -c. Any item should be one of those. But if you have a symbolic link pointing to a file, that item will report true for both -f and -l. So if you want to know whether something is a symbolic link, you should generally test that first. (We’ll learn more about symbolic links in Chapter 13.)
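A sketch of classifying an item along those lines, testing for the symbolic link before the plain-file test:

```perl
my $name = "/etc/passwd";   # any pathname; this one is just an example
if (-l $name)    { print "$name is a symbolic link\n" }
elsif (-d $name) { print "$name is a directory\n" }
elsif (-f $name) { print "$name is a plain file\n" }
else             { print "$name is something else (or doesn't exist)\n" }
```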
The age tests, -M, -A, and -C (yes, they’re uppercase), return the number of days since the file was last modified, accessed, or had its inode changed.[27] (The inode contains all of the information about the file except for its contents—see the stat system call manpage or a good book on Unix internals for details.) This age value is a full floating-point number, so you might get a value of 2.00001 if a file were modified two days and one second ago. (These “days” aren’t necessarily the same as a human would count; for example, if it’s one thirty in the morning when you check a file modified at about an hour before midnight, the value of -M for this file would be around 0.1, even though it was modified “yesterday.”)
When checking the age of a file, you might even get a negative value like -1.2, which means that the file’s last-access timestamp is set at about thirty hours in the future! The zero point on this timescale is the moment your program started running,[28] so that value might mean that a long-running program was looking at a file that had just been accessed. Or a timestamp could be set (accidentally or intentionally) to a time in the future.
The tests -T and -B take a try at telling whether a file is text or binary. But people who know a lot about filesystems know that there’s no bit (at least in Unix-like operating systems) to indicate that a file is a binary or text file—so how can Perl tell? The answer is that Perl cheats: it opens the file, looks at the first few thousand bytes, and makes an educated guess. If it sees a lot of null bytes, unusual control characters, and bytes with the high bit set, then that looks like a binary file. If there’s not much weird stuff, then it looks like text. As you might guess, it sometimes guesses wrong. If a text file has a lot of Swedish or French words (which may have characters represented with the high bit set, as some ISO-8859-something variant, or perhaps even a Unicode version), it may fool Perl into declaring it binary. So it’s not perfect, but if you just need to separate your source code from compiled files, or HTML files from PNGs, these tests should do the trick.
You’d think that -T and -B would always disagree, since a text file isn’t a binary and vice versa, but there are two special cases where they’re in complete agreement. If the file doesn’t exist, both are false, since it’s neither a text file nor a binary. Alternatively, if the file is empty, it’s an empty text file and an empty binary file at the same time, so they’re both true.
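A sketch of sorting a list of names into the two piles (which, given the special cases above, would let an empty file land in the text pile):

```perl
foreach my $file (qw/ fred.pl fred.png /) {   # made-up filenames
    if    (-T $file) { print "$file looks like text\n" }
    elsif (-B $file) { print "$file looks like binary\n" }
    else             { print "$file doesn't seem to exist\n" }
}
```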
The -t file test returns true if the given filehandle is a TTY—in short, if it’s able to be interactive because it’s not a simple file or pipe. When -t STDIN returns true, it generally means that you can interactively ask the user questions. If it’s false, your program is probably getting input from a file or pipe, rather than a keyboard.
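So a program might prompt only when someone is actually there, as in this sketch:

```perl
if (-t STDIN) {
    print "Hi! What's your name? ";   # safe to chat with a real person
}
my $name = <STDIN>;   # with redirected input, this just reads silently
```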
Don’t worry if you don’t know what some of the other file tests mean—if you’ve never heard of them, you won’t be needing them. But if you’re curious, get a good book about programming for Unix. (On non-Unix systems, these tests all try to give results analogous to what they do on Unix. Usually you’ll be able to guess correctly what they’ll do.)
If you omit the filename or filehandle parameter to a file test (that is, if you have just -r or just -s, say), the default operand is the file named in $_.[29] So, to test a list of filenames to see which ones are readable, you simply type:
foreach (@lots_of_filenames) {
  print "$_ is readable\n" if -r;  # same as -r $_
}
But if you omit the parameter, be careful that whatever follows the file test doesn’t look like it could be a parameter. For example, if you wanted to find out the size of a file in K rather than in bytes, you might be tempted to divide the result of -s by 1000 (or 1024), like this:
# The filename is in $_
my $size_in_K = -s / 1000;  # Oops!
When the Perl parser sees the slash, it doesn’t think about division; since it’s looking for the optional operand for -s, it sees what looks like the start of a regular expression in forward slashes. One simple way to prevent this kind of confusion is to put parentheses around the file test:
my $size_in_K = (-s) / 1024;  # Uses $_ by default
Of course, it’s always safe to explicitly give a file test a parameter.
While these file tests are fine for testing various attributes regarding a particular file or filehandle, they don’t tell the whole story. For example, there’s no file test that returns the number of links to a file or the owner’s user ID (uid). To get at the remaining information about a file, merely call the stat function, which returns pretty much everything that the stat Unix system call returns (hopefully more than you want to know).[30]
The operand to stat is a filehandle, or an expression that evaluates to a filename. The return value is either the empty list, indicating that the stat failed (usually because the file doesn’t exist), or a 13-element list of numbers, most easily described using the following list of scalar variables:
my($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
   $atime, $mtime, $ctime, $blksize, $blocks) = stat($filename);
The names here refer to the parts of the stat structure, described in detail in the stat(2) manpage. You should probably look there for the detailed descriptions. But in short, here’s a quick summary of the important ones:
$mode
The set of permission bits for the file, and some other bits. If you’ve ever used the Unix command ls -l to get a detailed (long) file listing, you’ll see that each line of output starts with something like -rwxr-xr-x. The nine letters and hyphens of file permissions[31] correspond to the nine least-significant bits of $mode, which would in this case give the octal number 0755. The other bits, beyond the lowest nine, indicate other details about the file. So if you need to work with the mode, you’ll generally want to use the bitwise operators covered later in this chapter.
$nlink
The number of (hard) links to the file or directory. This is the number of true names that the item has. This number is always 2 or more for directories and (usually) 1 for files. We’ll see more about this when we talk about creating links to files in Chapter 13. In the listing from ls -l, this is the number just after the permission-bits string.
$atime, $mtime, and $ctime
The three timestamps, but here they’re represented in the system’s timestamp format: a 32-bit number telling how many seconds have passed since the Epoch, an arbitrary starting point for measuring system time. On Unix systems and some others, the Epoch is the beginning of 1970 at midnight Universal Time, but the Epoch is different on some machines. There’s more information later in this chapter on turning that timestamp number into something useful.
Invoking stat on the name of a symbolic link returns information on what the symbolic link points at, not information about the symbolic link itself (unless the link just happens to be pointing at nothing currently accessible). If you need the (mostly useless) information about the symbolic link itself, use lstat rather than stat (which returns the same information in the same order). If the operand isn’t a symbolic link, lstat returns the same things that stat would.
Like the file tests, the operand of stat or lstat defaults to $_, meaning that the underlying stat system call will be performed on the file named by the scalar variable $_.
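So, for example, you could gather file sizes in a loop like this sketch (index 7 is the $size element of the list shown earlier):

```perl
my @some_files = qw/ fred barney /;   # made-up filenames
foreach (@some_files) {
    my $size = (stat)[7];   # stat defaults to the file named in $_
    print "$_ is $size bytes\n" if defined $size;
}
```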
When you have a timestamp number (such as the ones from stat), it will typically look something like 1080630098. That’s not very useful for most humans, unless you need to compare two timestamps by subtracting. You may need to convert it to something human-readable, such as a string like "Tue Mar 30 07:01:38 2004". Perl can do that with the localtime function in a scalar context:
my $timestamp = 1080630098;
my $date = localtime $timestamp;
In a list context, localtime returns a list of numbers, several of which may not be quite what you’d expect:
my($sec, $min, $hour, $day, $mon, $year, $wday, $yday, $isdst) = localtime $timestamp;
The $mon is a month number, ranging from 0 to 11, which is handy as an index into an array of month names. The $year is the number of years since 1900, oddly enough, so add 1900 to get the real year number. The $wday ranges from 0 (for Sunday) through 6 (for Saturday), and the $yday is the day-of-the-year (ranging from 0 for January 1, through 364 or 365 for December 31).
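Putting those adjustments together, here’s one way to format a date (the month-name array is just one obvious approach):

```perl
my @month_names = qw/ Jan Feb Mar Apr May Jun Jul Aug Sep Oct Nov Dec /;
my($sec, $min, $hour, $day, $mon, $year) = localtime 1080630098;
printf "%s %d, %d\n", $month_names[$mon], $day, $year + 1900;
# prints something like "Mar 30, 2004", depending on your time zone
```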
There are two related functions that you’ll also find useful. The gmtime function is just the same as localtime, except that it returns the time in Universal Time (what we once called Greenwich Mean Time). If you need the current timestamp number from the system clock, just use the time function. Both localtime and gmtime default to using the current time value if you don’t supply a parameter:
my $now = gmtime; # Get the current universal timestamp as a string
For more information on manipulating date and time information, see the information about some useful modules in Appendix B.
When you need to work with numbers bit-by-bit, as when working with the mode bits returned by stat, you’ll need to use the bitwise operators. The bitwise-and operator (&) reports which bits are set in the left argument and in the right argument. For example, the expression 10 & 12 has the value 8. The bitwise-and needs to have a one-bit in both operands to produce a one-bit in the result. That means that the bitwise-and operation on ten (which is 1010 in binary) and twelve (which is 1100) gives eight (which is 1000, with a one-bit only where the left operand has a one-bit and the right operand also has a one-bit). See Figure 11-1.
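That calculation in code:

```perl
my $result = 10 & 12;   # 1010 & 1100 in binary
printf "%d is binary %b\n", $result, $result;   # prints "8 is binary 1000"
```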
The different bitwise operators and their meanings are shown in this table:

& (bitwise-and): which bits are true in both operands
| (bitwise-or): which bits are true in either operand
^ (bitwise-xor): which bits are true in exactly one operand
<< (left-shift): shift the bits of the left operand leftward by the number of bits given by the right operand
>> (right-shift): shift the bits of the left operand rightward
~ (bitwise-not, or negation): which bits are false in the operand
So, here’s an example of some things you could do with the $mode returned by stat. The results of these bit manipulations could be useful with chmod, which we’ll see in Chapter 13:
# $mode is the mode value returned from a stat of CONFIG
warn "Hey, the configuration file is world-writable!\n"
  if $mode & 0002;                    # configuration security problem
my $classical_mode = 0777 & $mode;    # mask off extra high-bits
my $u_plus_x = $classical_mode | 0100;         # turn one bit on
my $go_minus_r = $classical_mode & (~ 0044);   # turn two bits off
All of the bitwise operators can work with bitstrings, as well as with integers. If the operands are integers, the result will be an integer. (The integer will be at least a 32-bit integer, but may be larger if your machine supports that. That is, if you have a 64-bit machine, ~10 may give the 64-bit result 0xFFFFFFFFFFFFFFF5, rather than the 32-bit result 0xFFFFFFF5.)
But if any operand of a bitwise operator is a string, Perl will perform the operation on that bitstring. That is, "\xAA" | "\x55" will give the string "\xFF". Note that these values are single-byte strings; the result is a byte with all eight bits set. Bitstrings may be arbitrarily long.
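This sketch shows the same bitwise-or done both ways, on numbers and on single-byte strings:

```perl
my $num = 0xAA | 0x55;        # numeric bitwise-or: 170 | 85 is 255
my $str = "\xAA" | "\x55";    # string bitwise-or: a one-byte bitstring

printf "number: %d, string byte: %d\n", $num, ord $str;  # both 255
```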
This is one of the very few places where Perl distinguishes between strings and numbers. See the perlop manpage for more information on using bitwise operators on strings.
Every time you use stat, lstat, or a file test in a program, Perl has to go out to the system to ask for a stat buffer on the file (that is, the return buffer from the stat system call). That means if you want to know whether a file is both readable and writable, you’ve essentially asked the system twice for the same information (which isn’t likely to change in a fairly nonhostile environment).
This looks like a waste of time,[32] and in fact, it can be avoided. Doing a file test, stat, or lstat on the special _ filehandle (that is, the operand is nothing but a single underscore) tells Perl to use whatever happened to be lounging around in memory from the previous file test, stat, or lstat function, rather than going out to the operating system again. Sometimes this is dangerous: a subroutine call can invoke stat without your knowledge, blowing your buffer away. But if you’re careful, you can save yourself a few unneeded system calls, thereby making your program considerably faster. Here’s that example of finding files to put on the backup tapes again, using the new tricks we’ve learned:
my @original_files = qw/ fred barney betty wilma pebbles dino bamm-bamm /;
my @big_old_files;  # The ones we want to put on backup tapes
foreach (@original_files) {
  push @big_old_files, $_
    if (-s) > 100_000 and -A _ > 90;  # More efficient than before
}
Note that we used the default of $_ for the first test—this is no more efficient (except perhaps for the programmer), but it gets the data from the operating system. The second test uses the magic _ filehandle; for this test, the data left around after getting the file’s size is used, which is exactly what we want.
Note that testing the _ filehandle is not the same as allowing the operand of a file test, stat, or lstat to default to testing $_; using $_ would be a fresh test each time on the current file named by the contents of $_, but using _ saves the trouble of calling the system again. Here is another case where similar names were chosen for radically different functions. By now, you are probably used to it.
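As a tiny sketch of the difference, here we test the program’s own source file (via $0, just for illustration), then reuse the stat buffer with _:

```perl
my $file = $0;         # this program's own file; it surely exists

my $size = -s $file;   # this asks the system for a stat buffer
my $age  = -M _;       # this reuses that buffer: no new system call

# By contrast, "-M $_" would have been a brand-new stat on
# whatever file the contents of $_ happened to name.
print "$file is $size bytes, modified $age days ago\n";
```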
See Section A.10 for answers to the following exercises:
[20] Make a program which asks the user for a source file name, a destination file name, a search pattern, and a replacement string. (Be sure to ask the user interactively for these; don’t get them from the command-line arguments.) Your program should read the source file and write it out as the destination file, replacing the search pattern with the replacement string wherever it appears. That is, the destination file will be a modified duplicate of the source file. Can you overwrite an existing file (not the same as the input file)? Can you use regular expression metacharacters in the search pattern? (That is, can you enter (fred|wilma) flintstone to search for either name?) Can you use the memory variables and backslash escapes in the replacement string? (That is, can you use \u\L$1\E Flintstone as the replacement string to properly capitalize the names of Fred and Wilma?) Don’t worry if you can’t accomplish each of these things; it’s more important simply to see what happens when you try.
[15] Make a program which takes a list of files named on the command line and reports for each one whether it’s readable, writable, executable, or doesn’t exist. (Hint: It may be helpful to have a function which will do all of the file tests for one file at a time.) What does it report about a file which has been chmod’ed to 0? (That is, if you’re on a Unix system, use the command chmod 0 some_file to mark that file as neither being readable, writable, nor executable.) In most shells, use a star as the argument to mean all of the normal files in the current directory. That is, you could type something like ./ex11-2 * to ask the program for the attributes of many files at once.
[10] Make a program to identify the oldest file named on the command line and report its age in days. What does it do if the list is empty? (That is, if no files are mentioned on the command line.)
[1] Some people hate typing in all-caps, even for a moment, and will try spelling these in lowercase, like stdin. Perl may even let you get away with that from time to time, but not always. The details of when these work and when they fail are beyond the scope of this book. But the important thing is that programs that rely upon this kindness will one day break, so it is best to avoid lowercase here.
[2] In some cases, you could (re-)use these names without a problem. But your maintenance programmer may think that you’re using the name for its builtin features, and thus may be confused.
[3] The defaults we speak of in this chapter for the three main I/O streams are what the Unix shells do by default. But it’s not just shells that launch programs, of course. We’ll see in Chapter 14 what happens when you launch another program from Perl.
[4] If you’re not already familiar with how your non-Unix system provides standard input and output, see the perlport manpage and the documentation for that system’s equivalent to the Unix shell (the program that runs programs based upon your keyboard input).
[5] Also, generally, errors aren’t buffered. That means that if the standard error and standard output streams are both going to the same place (such as the monitor), the errors may appear earlier than the normal output. For example, if your program prints a line of ordinary text, then tries to divide by zero, the output may show the message about dividing by zero first, and the ordinary text second.
[6] This may be important for security reasons. As we’ll see in a moment (and in further detail in Chapter 14), there are a number of magical characters that may be used in filenames. If $name holds a user-chosen filename, simply opening $name will allow any of these magical characters to come into play. This could be a convenience to the user, or it could be a security hole. But opening "<$name" is much safer, since it explicitly says to open the given name for input. Still, this doesn’t prevent all possible mischief. For more information on different ways of opening files, especially when security may be a concern, see the perlopentut manpage.
[7] Yes, this means that if your filename were to have leading whitespace, that would also be ignored by Perl. See perlfunc and perlopentut if you’re worried about this.
[8] If you know much about I/O systems, you’ll know there’s more to the story. Generally, though, when a filehandle is closed, here’s what happens. If there’s input remaining in a file, it’s ignored. If there’s input remaining in a pipeline, the writing program may get a signal that the pipeline is closed. If there’s output going to a file or pipeline, the buffer is flushed (that is, pending output is sent on its way). If the filehandle had a lock, the lock is released. See your system’s I/O documentation for further details.
[9] Any exit from the program will close all filehandles, but if Perl itself breaks, pending output buffers won’t get flushed. That is to say, if you accidentally crash your program by dividing by zero, for example, Perl itself is still running. Perl will ensure that data you’ve written actually gets output in that case. But if Perl itself can’t run (because you ran out of memory or caught an unexpected signal), the last few pieces of output may not be written to disk. Usually, this isn’t a big issue.
[10] Closing a filehandle will flush any output buffers and release any locks on the file. Since someone else may be waiting for those things, a long-running program should generally close each filehandle as soon as possible. But many of our programs will take only one or two seconds to run to completion, so this may not matter. Closing a filehandle also releases possibly limited resources, so it’s more than just being tidy.
[11] Well, it does this by default, but errors may be trapped with an eval block, as we’ll see in Chapter 17.
[12] On some non-Unix operating systems, $! may say something like error number 7, leaving it up to the user to look that one up in the documentation. On Windows and VMS, the variable $^E may have additional diagnostic information.
[13] If the error happened while reading from a file, the error message will include the “chunk number” (usually the line number) from the file and the name of the filehandle as well, since those are often useful in tracking down a bug.
[14] The program’s name is in Perl’s special variable $0, so you may wish to include that in the string: "$0: Not enough arguments\n". This is useful if the program may be used in a pipeline or shell script, for example, where it’s not obvious which command is complaining. $0 can be changed during the execution of the program, however. You might also want to look into the special __FILE__ and __LINE__ tokens (or the caller function) to get the information that is being left out by adding the newline, so you can print it in your own choice of format.
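A sketch of that, using both tokens in a message format of our own:

```perl
# __FILE__ and __LINE__ are replaced at compile time with the
# current source file's name and line number:
my $where = sprintf "%s line %d", __FILE__, __LINE__;
warn "$0: Not enough arguments at $where\n";
```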
[15] Older code may use the higher-precedence || operator instead. The only difference is the precedence, but it’s a big one! If the open is written without parentheses, the higher-precedence operator will bind to the filename argument, not to the return value—so the return value of open isn’t being checked after all. If you use the ||, be sure to use the parentheses. Better yet, just use the low-precedence or as we’ve shown here whenever you’re writing or die.
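Here is the trap from that footnote spelled out; we make a little file first just so the opens can succeed (the filename demo.txt is only for this sketch):

```perl
open my $out, "> demo.txt" or die "Cannot create demo.txt: $!";
print $out "hello\n";
close $out;

# WRONG: || binds tighter than the comma, so this really means
#   open LOG, ("demo.txt" || die ...);
# the die can never run, and open's return value goes unchecked.
open LOG, "demo.txt" || die "Cannot open: $!";
close LOG;

# Right: the parentheses make || apply to open's return value.
open(LOG, "demo.txt") || die "Cannot open: $!";
close LOG;

# Better yet: low-precedence or, as shown in the chapter.
open LOG, "demo.txt" or die "Cannot open: $!";
close LOG;
unlink "demo.txt";
```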
[16] Warnings can’t be trapped with an eval block, like fatal errors can. But see the documentation for the __WARN__ pseudo-signal (in the perlvar manpage) if you need to trap a warning.
[17] If you got straight A’s in freshman English or Linguistics, when we say that this is called “indirect object syntax,” you may say “Ah, of course! I see why there’s no comma after the filehandle name—it’s an indirect object!” We didn’t get straight A’s; we don’t understand why there’s no comma; we merely omit it because Larry told us that we should omit the comma.
[18] In the unlikely case that STDOUT might not be the selected filehandle, you could save and restore the filehandle, using the technique shown in the documentation for select in the perlfunc manpage. And as long as we’re sending you to that manpage, we may as well tell you that there are actually two builtin functions in Perl named select, and both are covered in the perlfunc manpage. The other select always has four arguments, so it’s sometimes called “four-argument select”.
[19] Don’t do this without a reason. It’s nearly always better to let the user set up redirection when launching your program, rather than have redirection hardcoded. But this is handy in cases where your program is being run automatically by another program (say, by a web server or a scheduling utility like cron or at). Another reason might be that your program is going to start another process (probably with system or exec, which we’ll see in Chapter 14), and you need that process to have different I/O connections.
[20] At least, this is true if you haven’t changed Perl’s special $^F variable, which tells Perl that only those three are special like this. But you’d never change that.
[21] But don’t open STDIN for output or the others for input. Just thinking about that makes our heads hurt.
[22] It’s more likely that, instead of having the list of files in an array, as our example shows, you’ll read it directly from the filesystem using a glob or directory handle, as shown in Chapter 12. Since we haven’t seen that yet, we’ll just start with the list and go from there.
[23] There’s a way to make this example more efficient, as we’ll see by the end of the chapter.
[24] The -o and -O tests relate only to the user ID and not to the group ID.
[25] Note for advanced students: the corresponding -R, -W, -X, and -O tests use the real user or group ID, which becomes important if your program may be running set-ID; in that case, it’s generally the ID of the person who requested running it. See any good book about advanced Unix programming for a discussion of set-ID programs.
[26] This is the case on many non-Unix filesystems, but not all of the file tests are meaningful everywhere. For example, you aren’t likely to have block special files on your non-Unix system.
[27] This information will be somewhat different on non-Unix systems, since not all keep track of the same times that Unix does. For example, on some systems, the ctime field (which the -C test looks at) is the file creation time (which Unix doesn’t keep track of), rather than the inode change time; see the perlport manpage.
[28] As recorded in the $^T variable, which you could update (with a statement like $^T = time;) if you needed to get the ages relative to a different starting time.
[29] The -t file test is an exception, since that test isn’t useful with filenames (they’re never TTYs). By default it tests STDIN.
[30] On a non-Unix system, both stat and lstat, as well as the file tests, should return “the closest thing available.” For example, a system that doesn’t have user IDs (that is, a system that has just one “user,” in the Unix sense) might return zero for the user and group IDs, as if the one and only user is the system administrator. If stat or lstat fails, it will return an empty list. If the system call underlying a file test fails (or isn’t available on the given system), that test will generally return undef. See the perlport manpage for the latest about what to expect on different systems.
[31] The first character in that string isn’t a permission bit; it indicates the type of entry: a hyphen for an ordinary file, d for directory, or l for symbolic link, among others. The ls command determines this from the other bits past the least-significant nine.
[32] Because it is. Asking the system for information is relatively slow.