The files we created in the previous chapter were generally in the same place as our program. But modern operating systems let us organize files into directories, allowing us to keep our Beatles MP3s away from our important Llama book chapter sources so that we don’t accidentally send an MP3 file to the publisher. Perl lets you manipulate these directories directly, in ways that are even fairly portable from one operating system to another.
Your program runs with a “working directory,” which is the starting point for relative pathnames. That is, if you refer to the file fred, that means “fred in the current working directory.”
The chdir operator changes the working directory. It’s just like the Unix shell’s cd command:
chdir "/etc" or die "cannot chdir to /etc: $!";
Because this is a system request, the value of $! will be set if an error occurs. You should normally check $! when a false value is returned from chdir, since that indicates that something has not gone as requested.
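As a minimal sketch of both outcomes, the following program changes into a scratch directory and then tries a directory that does not exist. The scratch directory comes from File::Temp and the name no-such-directory is made up for illustration; neither is part of the original example.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start   = getcwd();                # remember where we began
my $scratch = tempdir(CLEANUP => 1);   # hypothetical scratch directory

# A successful chdir returns a true value.
chdir $scratch or die "cannot chdir to $scratch: $!";
print "now in ", getcwd(), "\n";

# A failed chdir returns a false value and sets $!.
my $missing = "$scratch/no-such-directory";   # made-up name
my $moved   = chdir $missing;
print "cannot chdir to $missing: $!\n" unless $moved;

# Go back to where we started.
chdir $start or die "cannot chdir to $start: $!";
```

The failed call leaves the working directory unchanged, so the program is still in the scratch directory when it reports the error.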
The working directory is inherited by all processes that Perl starts (we’ll talk more about that in Chapter 14). However, the change in working directory cannot affect the process that invoked Perl, such as the shell.[1] So you can’t make a Perl program to replace your shell’s cd command.
If you omit the parameter, Perl determines your home directory as best as possible and attempts to set the working directory to your home directory, similar to using the cd command at the shell without a parameter. This is one of the few places where omitting the parameter doesn’t use $_.
Some shells permit you to use a tilde-prefixed path with cd to use another user’s home directory as a starting point (like cd ~merlyn). This is a function of the shell, not the operating system, and Perl is calling the operating system directly. Thus, a tilde-prefix will not work with chdir.
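If you do need the tilde behavior, one possible workaround is to let Perl's glob operator expand the tilde first and then hand the result to chdir. This is only a sketch: it assumes a reasonably modern Perl whose built-in glob understands tilde prefixes, and expand_tilde is a hypothetical helper name, not a Perl built-in.

```perl
use strict;
use warnings;

# Hypothetical helper: ask glob to expand "~" or "~user". Returns the
# directory name, or undef if the result is not an existing directory.
sub expand_tilde {
    my ($spec) = @_;
    my ($dir) = glob $spec;
    return (defined $dir and -d $dir) ? $dir : undef;
}

my $home = expand_tilde("~");
if (defined $home) {
    chdir $home or die "cannot chdir to $home: $!";
    print "changed to $home\n";
} else {
    print "could not work out a home directory\n";
}
```

When the named user does not exist, glob hands the pattern back unchanged, which is why the helper checks that the result really is a directory before returning it.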
Normally, the shell expands any filename patterns on each command line into the matching filenames. This is called globbing. For example, if you give a filename pattern of *.pm to the echo command, the shell expands this pattern to a list of names that match:
$ echo *.pm
barney.pm dino.pm fred.pm wilma.pm
$
The echo command doesn’t have to know anything about expanding *.pm, because the shell has already expanded it. This works even for your Perl programs:
$ cat >show-args
foreach $arg (@ARGV) {
  print "one arg is $arg\n";
}
^D
$ perl show-args *.pm
one arg is barney.pm
one arg is dino.pm
one arg is fred.pm
one arg is wilma.pm
$
Note that show-args didn’t need to know anything about globbing; the names were already expanded in @ARGV.
But sometimes we end up with a pattern like *.pm inside our Perl program. Can we expand this pattern into the matching filenames without working very hard? Sure: just use the glob operator:
my @all_files = glob "*";
my @pm_files = glob "*.pm";
Here, @all_files gets all the files in the current directory, alphabetically sorted, and not including the files beginning with a period, just like the shell. And @pm_files gets the same list as we got before by using *.pm on the command line.
In fact, anything you can say on the command line, you can also put as the (single) argument to glob, including multiple patterns separated by spaces:
my @all_files_including_dot = glob ".* *";
Here, we’ve included an additional “dot star” parameter to get the filenames that begin with a dot as well as the ones that don’t. Please note that the space between these two items inside the quoted string is significant, as it separates two different items to be globbed.[2]
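To see the three patterns side by side, here is a small sketch that builds a scratch directory and globs it. The filenames are made up for illustration, and the scratch directory from File::Temp is an assumption so that the example does not depend on what happens to be in your current directory.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

my $start = getcwd();
my $dir   = tempdir(CLEANUP => 1);   # hypothetical scratch directory
chdir $dir or die "cannot chdir to $dir: $!";

# Create a few empty files with made-up names.
foreach my $name (qw(fred.pm barney.pm notes.txt .hidden)) {
    open my $fh, '>', $name or die "cannot create $name: $!";
    close $fh;
}

my @pm_files  = glob "*.pm";   # only the names ending in .pm
my @all_files = glob "*";      # everything except the dot files
my @with_dots = glob ".* *";   # two patterns in one string

print "pm files:  @pm_files\n";
print "all files: @all_files\n";
print "with dots: @with_dots\n";

chdir $start or die "cannot chdir to $start: $!";
```

The last list is the only one that includes .hidden, because only the ".*" pattern matches names that begin with a dot.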
The reason this works exactly as the shell works is that prior to Perl Version 5.6, the glob operator simply called /bin/csh[3] behind the scenes to perform the expansion. Because of this, globs were time-consuming and could break in large directories, or in some other cases. Conscientious Perl hackers avoided globbing in favor of directory handles, which will be discussed in Section 12.4 later in this chapter. However, if you’re using a modern version of Perl, you should no longer be concerned about such things.
Although we use the term globbing freely, and we talk about the glob operator, you might not see the word glob in very many of the programs that use globbing. Why not? Well, most legacy code was written before the glob operator was given a name. Instead, it was called up by the angle-bracket syntax, similar to reading from a filehandle:
my @all_files = <*>; ## exactly the same as my @all_files = glob "*";
The value between the angle brackets is interpolated similarly to a double-quoted string, which means that Perl variables are expanded to their current Perl values before being globbed:
my $dir = "/etc";
my @dir_files = <$dir/* $dir/.*>;
Here, we’ve fetched all the non-dot and dot files from the designated directory, because $dir has been expanded to its current value.
So, if using angle brackets means both filehandle reading and globbing, how does Perl decide which of the two operators to use? Well, a filehandle has to be a Perl identifier. So if the item between the angle brackets is strictly a Perl identifier, it’s a filehandle read; otherwise, it’s a globbing operation. For example:
my @files = <FRED/*>;    ## a glob
my @lines = <FRED>;      ## a filehandle read
my $name = "FRED";
my @files = <$name/*>;   ## a glob
The one exception is when the contents are a simple scalar variable (not an element of a hash or array). In that case it’s an indirect filehandle read,[4] where the variable contents give the name of the filehandle to be read:
my $name = "FRED";
my @lines = <$name>;   ## an indirect filehandle read of FRED handle
The decision between a glob and a filehandle read is made at compile time, and thus it is independent of the content of the variables.
If you want, you can get the operation of an indirect filehandle read using the readline operator,[5] which also makes it clearer:
my $name = "FRED";
my @lines = readline FRED;    ## read from FRED
my @lines = readline $name;   ## read from FRED
But the readline operator is rarely used, as indirect filehandle reads are uncommon and are generally performed against a simple scalar variable anyway.
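The same rule covers a filehandle kept in a simple scalar variable, which is one kind of indirect handle that works even under use strict. The following sketch assumes a temporary file created with File::Temp, and the two lines of text in it are invented for the example.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write two made-up lines to a temporary file.
my ($out, $filename) = tempfile(UNLINK => 1);
print $out "first line\nsecond line\n";
close $out or die "cannot close $filename: $!";

open my $fh, '<', $filename or die "cannot open $filename: $!";

my $first = <$fh>;          # simple scalar in angle brackets: a filehandle read
my @rest  = readline $fh;   # the same kind of read, spelled out

close $fh;

print "first: $first";
print "rest:  @rest";
```

Because $fh holds a reference to the handle rather than a text string naming it, the symbolic reference restriction of use strict does not come into play.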
Another way to get a list of names from a given directory is with a directory handle. A directory handle looks and acts like a filehandle. You open it (with opendir instead of open), you read from it (with readdir instead of readline), and you close it (with closedir instead of close). But instead of reading the contents of a file, you’re reading the names of files (and other things) in a directory. For example:
my $dir_to_process = "/etc";
opendir DH, $dir_to_process or die "Cannot open $dir_to_process: $!";
foreach $file (readdir DH) {
  print "one file in $dir_to_process is $file\n";
}
closedir DH;
Like filehandles, directory handles are automatically closed at the end of the program or if the directory handle is reopened onto another directory.
Unlike globbing, which in older versions of Perl fired off a separate process, a directory handle never fires off another process. That makes directory handles more efficient for applications that demand every ounce of power from the machine. However, it’s also a lower-level operation, meaning that we have to do more of the work ourselves.
For example, the names are returned in no particular order.[6] And the list includes all files, not just those matching a particular pattern (like *.pm from our globbing examples). And the list includes all files, especially the dot files, and particularly the dot and dot-dot entries.[7]
So, if we wanted only the pm-ending files, we could use a skip-over function inside the loop:
while ($name = readdir DIR) {
  next unless $name =~ /\.pm$/;
  ... more processing ...
}
Note here that the syntax is that of a regular expression, not a glob. And if we wanted all the non-dot files, we could say that:
next if $name =~ /^\./;
Or if we wanted everything but the common dot (current directory) and dot-dot (parent directory) entries, we could explicitly say that:
next if $name eq "." or $name eq "..";
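Putting these filters together, here is a sketch that reads a directory once and then builds three differently filtered, sorted lists with grep. The scratch directory and its filenames are made up for illustration, and the lexical directory handle $dh is a stylistic choice of this sketch rather than something the examples above require.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);   # hypothetical scratch directory

# Create a few empty files with made-up names.
foreach my $name (qw(wilma.pm fred.pm notes.txt .hidden)) {
    open my $fh, '>', "$dir/$name" or die "cannot create $dir/$name: $!";
    close $fh;
}

opendir my $dh, $dir or die "Cannot open $dir: $!";
my @names = readdir $dh;           # unsorted, dot entries included
closedir $dh;

my @pm_files = sort grep { /\.pm$/ } @names;                      # like *.pm
my @visible  = sort grep { !/^\./ } @names;                       # like *
my @no_dots  = sort grep { $_ ne "." and $_ ne ".." } @names;     # all but . and ..

print "pm files: @pm_files\n";
print "visible:  @visible\n";
print "no dots:  @no_dots\n";
```

The explicit sort is needed because readdir makes no promise about order, unlike glob.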
Now we’ll look at the part that gets most people mixed up, so pay close attention. The filenames returned by the readdir operator have no pathname component. It’s just the name within the directory. So, we’re not looking at /etc/passwd, we’re just looking at passwd. (And because this is another difference from the globbing operation, it’s easy to see how people get confused.)
So you’ll need to patch up the name to get the full name:
opendir SOMEDIR, $dirname or die "Cannot open $dirname: $!";
while (my $name = readdir SOMEDIR) {
  next if $name =~ /^\./;             # skip over dot files
  $name = "$dirname/$name";           # patch up the path
  next unless -f $name and -r $name;  # only readable files
  ...
}
Without the patch, the file tests would have been checking files in the current directory, rather than in the directory named in $dirname. This is the single most common mistake when using directory handles.
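The following sketch shows the patched-up name next to the bare one. It uses File::Spec->catfile to join the pieces, which is a substitute for the string interpolation shown above rather than a required technique, and the scratch directory and the file llama-sample.txt are made up for the example.

```perl
use strict;
use warnings;
use File::Spec;
use File::Temp qw(tempdir);

my $dirname = tempdir(CLEANUP => 1);   # hypothetical scratch directory

# One made-up file to find.
my $sample = File::Spec->catfile($dirname, "llama-sample.txt");
open my $fh, '>', $sample or die "cannot create $sample: $!";
print $fh "some text\n";
close $fh;

my (@bare_names, @readable);
opendir my $dh, $dirname or die "Cannot open $dirname: $!";
while (defined(my $name = readdir $dh)) {
    next if $name =~ /^\./;                            # skip over dot files
    push @bare_names, $name;                           # what readdir gave us
    my $path = File::Spec->catfile($dirname, $name);   # patch up the path
    next unless -f $path and -r $path;                 # only readable files
    push @readable, $path;
}
closedir $dh;

print "bare name: $_\n" for @bare_names;
print "full path: $_\n" for @readable;
```

The file tests run against $path, never against the bare name, so they check the file in $dirname no matter what the current working directory is.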
You probably won’t need recursive directory access for the first few dozen hours of your Perl programming career. So rather than distract you with the possibility of replacing all those ugly find scripts with Perl right now, we’ll simply entice you by saying that Perl comes with a nice library called File::Find, which you can use for nifty recursive directory processing. We’re also saying this to keep you from writing your own routines, which everyone seems to want to do after those first few dozen hours of programming, and then getting puzzled about things like “local directory handles” and “how do I change my directory back?” So, when you’re ready, the knowledge will come, but stay with us to learn about Manipulating Files and Directories (in the next chapter) instead, right after you finish these exercises.
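For the curious, here is a brief taste of what File::Find looks like, offered as a sketch and not as a full treatment. The directory tree and its filenames are invented for the example, and File::Temp and File::Path are used only to build that tree.

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Build a small made-up tree to search.
my $top = tempdir(CLEANUP => 1);
make_path("$top/lib/Flintstone");
foreach my $path ("$top/fred.pm", "$top/lib/notes.txt",
                  "$top/lib/Flintstone/wilma.pm") {
    open my $fh, '>', $path or die "cannot create $path: $!";
    close $fh;
}

# find calls our subroutine once for every entry below $top. Inside it,
# $_ is the bare name and $File::Find::name is the name with its path.
my @found;
find(sub {
    push @found, $File::Find::name if -f $_ and /\.pm$/;
}, $top);

@found = sort @found;
print "$_\n" for @found;
```

By default, find changes into each directory as it visits it and returns to the original working directory when it is done, which is why the file test on the bare name in $_ works here.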
See Section A.11 for answers to the following exercises.
[12] Write a program to ask the user for a directory name, then change to that directory. If the user enters a line with nothing but whitespace, change to his or her home directory as a default. After changing, list the ordinary directory contents (not the items whose names begin with a dot) in alphabetical order. (Hint: Will that be easier to do with a directory handle or with a glob?) If the directory change doesn’t succeed, just alert the user, but don’t try to show the contents.
[4] Modify the program to include all files, not just the ones that don’t begin with a dot.
[5] If you used a directory handle for the previous exercise, rewrite it to use a glob. Or if you used a glob, try it now with a directory handle.
[1] This isn’t a limitation on Perl’s part; it’s actually a feature of Unix, Windows, and other systems. If you really need to change the shell’s working directory, see the documentation of your shell.
[2] Windows users may be accustomed to using a glob of *.* to mean “all files”. But that actually means “all files with a dot in their names,” even in Perl on Windows.
[3] Or it will call a valid substitute if a C-shell wasn’t available.
[4] If the indirect handle is a text string, then it’s subject to the “symbolic reference” test that is forbidden under use strict. However, the indirect handle might also be a typeglob or reference to an IO object, and then it would work even under use strict.
[5] If you’re using Perl 5.005 or later.
[6] It’s actually the unsorted order of the directory entries, similar to the order you get from ls -f or find.
[7] Do not make the mistake of many old Unix programs and presume that dot and dot-dot are always returned as the first two entries (sorted or not). If that hadn’t even occurred to you, pretend we never said it, because it’s a false presumption. In fact, we’re now sorry for even bringing it up.