Chapter 12. Directory Operations

The files we created in the previous chapter were generally in the same place as our program. But modern operating systems let us organize files into directories, allowing us to keep our Beatles MP3s away from our important Llama book chapter sources so that we don’t accidentally send an MP3 file to the publisher. Perl lets you manipulate these directories directly, in ways that are even fairly portable from one operating system to another.

Moving Around the Directory Tree

Your program runs with a “working directory,” which is the starting point for relative pathnames. That is, if you refer to the file fred, that means “fred in the current working directory.”

The chdir operator changes the working directory. It’s just like the Unix shell’s cd command:

chdir "/etc" or die "cannot chdir to /etc: $!";

Because this is a system request, the value of $! will be set if an error occurs. You should normally check $! when a false value is returned from chdir, since that indicates that something has not gone as requested.
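As a minimal sketch of that check, the following wraps chdir in a small helper that reports the failure rather than dying. The helper name try_chdir is made up for this illustration, and the Cwd and File::Temp modules (both shipped with Perl, though not introduced in this chapter) are used only to show the effect and to get a scratch directory.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);
use File::Temp qw(tempdir);

# Hypothetical helper: try to change directory and report any failure.
# Returns the new working directory, or undef if chdir failed.
sub try_chdir {
    my ($dir) = @_;
    unless (chdir $dir) {
        warn "cannot chdir to $dir: $!\n";   # $! says why the request failed
        return;
    }
    return getcwd();
}

my $start   = getcwd();
my $scratch = tempdir(CLEANUP => 1);   # throwaway directory for the demo

my $where = try_chdir($scratch);
print "now in $where\n" if defined $where;

# A directory that does not exist: chdir returns false and sets $!.
try_chdir("$scratch/no-such-directory");

# Go back to where we started.
chdir $start or die "cannot chdir back to $start: $!";
```

Whether to die or merely warn on failure depends on the program; the point is only that the return value of chdir gets tested.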

The working directory is inherited by all processes that Perl starts (we’ll talk more about that in Chapter 14). However, the change in working directory cannot affect the process that invoked Perl, such as the shell.[1] So you can’t make a Perl program to replace your shell’s cd command.

If you omit the parameter, Perl determines your home directory as best as possible and attempts to set the working directory to your home directory, similar to using the cd command at the shell without a parameter. This is one of the few places where omitting the parameter doesn’t use $_.

Some shells permit you to use a tilde-prefixed path with cd to use another user’s home directory as a starting point (like cd ~merlyn). This is a function of the shell, not the operating system, and Perl is calling the operating system directly. Thus, a tilde-prefix will not work with chdir.
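If you do need the effect of cd ~merlyn, one possible approach on Unix-like systems is to look up the home directory yourself with the built-in getpwnam function and pass the result to chdir. This is only a sketch: the helper names home_of and chdir_home_of are invented here, and getpwnam is not available on every platform (notably not on Windows).

```perl
use strict;
use warnings;

# Hypothetical helper: look up a user's home directory in the
# password database. Returns undef when the user is unknown.
sub home_of {
    my ($user) = @_;
    my @entry = getpwnam $user;          # empty list if there is no such user
    return @entry ? $entry[7] : undef;   # element 7 is the home directory
}

# Hypothetical helper: change to that user's home directory.
# Returns true on success, false otherwise.
sub chdir_home_of {
    my ($user) = @_;
    my $home = home_of($user);
    return 0 unless defined $home;
    return chdir $home;
}

# The user name here is just the one from the text.
chdir_home_of("merlyn") or warn "could not change to merlyn's home directory\n";
```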

Globbing

Normally, the shell expands any filename patterns on each command line into the matching filenames. This is called globbing. For example, if you give a filename pattern of *.pm to the echo command, the shell expands this list to a list of names that match:

$ echo *.pm
barney.pm dino.pm fred.pm wilma.pm
$

The echo command doesn’t have to know anything about expanding *.pm, because the shell has already expanded it. This works even for your Perl programs:

$ cat >show-args
foreach $arg (@ARGV) {
  print "one arg is $arg\n";
}
^D
$ perl show-args *.pm
one arg is barney.pm
one arg is dino.pm
one arg is fred.pm
one arg is wilma.pm
$

Note that show-args didn’t need to know anything about globbing—the names were already expanded in @ARGV.

But sometimes we end up with a pattern like *.pm inside our Perl program. Can we expand this pattern into the matching filenames without working very hard? Sure—just use the glob operator:

my @all_files = glob "*";
my @pm_files = glob "*.pm";

Here, @all_files gets all the files in the current directory, alphabetically sorted, and not including the files beginning with a period, just like the shell. And @pm_files gets the same list as we got before by using *.pm on the command line.

In fact, anything you can say on the command line, you can also put as the (single) argument to glob, including multiple patterns separated by spaces:

my @all_files_including_dot = glob ".* *";

Here, we’ve included an additional “dot star” parameter to get the filenames that begin with a dot as well as the ones that don’t. Please note that the space between these two items inside the quoted string is significant, as it separates two different items to be globbed.[2]
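A glob of ".* *" typically picks up the dot and dot-dot entries as well, so programs usually filter those two names out. Here is a small sketch of that idea; the helper name all_names_in is made up for the example, and Cwd (a module that ships with Perl) is used only to return to the starting directory afterward.

```perl
use strict;
use warnings;
use Cwd qw(getcwd);

# Hypothetical helper: list every name in $dir, dot files included,
# but without the . and .. entries.
sub all_names_in {
    my ($dir) = @_;
    my $start = getcwd();
    chdir $dir or die "cannot chdir to $dir: $!";

    # Two patterns in one string, separated by a space.
    my @names = grep { $_ ne "." and $_ ne ".." } glob ".* *";

    chdir $start or die "cannot chdir back to $start: $!";
    return @names;
}

print "$_\n" for all_names_in(".");
```

The helper changes directory and globs relative names because a directory name containing a space would otherwise be split into two patterns.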

The reason this works exactly as the shell works is that prior to Perl Version 5.6, the glob operator simply called /bin/csh[3] behind the scenes to perform the expansion. Because of this, globs were time-consuming and could break in large directories, or in some other cases. Conscientious Perl hackers avoided globbing in favor of directory handles, which will be discussed in Section 12.4 later in this chapter. However, if you’re using a modern version of Perl, you should no longer be concerned about such things.

An Alternate Syntax for Globbing

Although we use the term globbing freely, and we talk about the glob operator, you might not see the word glob in very many of the programs that use globbing. Why not? Well, most legacy code was written before the glob operator was given a name. Instead, it was called up by the angle-bracket syntax, similar to reading from a filehandle:

my @all_files = <*>; ## exactly the same as my @all_files = glob "*";

The value between the angle brackets is interpolated similarly to a double-quoted string, which means that Perl variables are expanded to their current Perl values before being globbed:

my $dir = "/etc";
my @dir_files = <$dir/* $dir/.*>;

Here, we’ve fetched all the non-dot and dot files from the designated directory, because $dir has been expanded to its current value.

So, if using angle brackets means both filehandle reading and globbing, how does Perl decide which of the two operators to use? Well, a filehandle has to be a Perl identifier. So if the item between the angle brackets is strictly a Perl identifier, it’s a filehandle read; otherwise, it’s a globbing operation. For example:

my @files = <FRED/*>;   ## a glob
my @lines = <FRED>;    ## a filehandle read
my $name = "FRED";
my @files = <$name/*>; ## a glob

The one exception is when the contents are a simple scalar variable (not an element of a hash or array). In that case, it’s an indirect filehandle read,[4] where the variable contents give the name of the filehandle to be read:

my $name = "FRED";
my @lines = <$name>; ## an indirect filehandle read of FRED handle

Perl decides whether it’s a glob or a filehandle read at compile time, so the decision is independent of the content of the variables.

If you want, you can get the operation of an indirect filehandle read using the readline operator,[5] which also makes it clearer:

my $name = "FRED";
my @lines = readline FRED;  ## read from FRED
my @lines = readline $name; ## read from FRED

But the readline operator is rarely used, as indirect filehandle reads are uncommon and are generally performed against a simple scalar variable anyway.
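As footnote 4 hints, the scalar may also hold a reference to a filehandle rather than its name, which works under use strict. The sketch below assumes a Perl new enough (5.6 or later) to create such a handle with open my $in; the two helper names are invented for the illustration, and both return the same lines.

```perl
use strict;
use warnings;

# Hypothetical helper: read all lines using the angle-bracket form.
sub lines_via_brackets {
    my ($path) = @_;
    open my $in, '<', $path or die "cannot open $path: $!";
    my @lines = <$in>;            # simple scalar, so this is a filehandle read
    close $in;
    return @lines;
}

# Hypothetical helper: the same read, spelled out with readline.
sub lines_via_readline {
    my ($path) = @_;
    open my $in, '<', $path or die "cannot open $path: $!";
    my @lines = readline($in);
    close $in;
    return @lines;
}
```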

Directory Handles

Another way to get a list of names from a given directory is with a directory handle. A directory handle looks and acts like a filehandle. You open it (with opendir instead of open), you read from it (with readdir instead of readline), and you close it (with closedir instead of close). But instead of reading the contents of a file, you’re reading the names of files (and other things) in a directory. For example:

my $dir_to_process = "/etc";
opendir DH, $dir_to_process or die "Cannot open $dir_to_process: $!";
foreach my $file (readdir DH) {
  print "one file in $dir_to_process is $file\n";
}
closedir DH;

Like filehandles, directory handles are automatically closed at the end of the program or if the directory handle is reopened onto another directory.

Unlike globbing, which in older versions of Perl fired off a separate process, a directory handle never fires off another process. That makes directory handles more efficient for applications that demand every ounce of power from the machine. However, it’s also a lower-level operation, meaning that we have to do more of the work ourselves.

For example, the names are returned in no particular order.[6] The list also includes every entry in the directory, not just those matching a particular pattern (like *.pm from our globbing examples). That means the dot files are there too, including the dot and dot-dot entries.[7]

So, if we wanted only the pm-ending files, we could use a skip-over function inside the loop:

while ($name = readdir DIR) {
  next unless $name =~ /\.pm$/;
  ... more processing ...
}

Note here that the syntax is that of a regular expression, not a glob. And if we wanted all the non-dot files, we could say that:

next if $name =~ /^\./;

Or if we wanted everything but the common dot (current directory) and dot-dot (parent directory) entries, we could explicitly say that:

next if $name eq "." or $name eq "..";
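Putting those filters together, here is a sketch of a routine that returns the names ending in .pm, skipping dot files and sorting the result ourselves, since readdir promises no order. The name pm_files_in is made up for the example, and the handle is kept in a my variable (which needs Perl 5.6 or later) instead of a bareword like DIR.

```perl
use strict;
use warnings;

# Hypothetical helper: sorted names in $dir that end in .pm,
# not counting names that begin with a dot.
sub pm_files_in {
    my ($dir) = @_;
    opendir my $dh, $dir or die "cannot open $dir: $!";
    my @names = readdir $dh;     # list context: every entry at once
    closedir $dh;

    # These are regular expressions, not globs, so the dot is escaped.
    my @wanted = grep { !/^\./ and /\.pm$/ } @names;
    my @sorted = sort @wanted;
    return @sorted;
}

print "$_\n" for pm_files_in(".");
```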

Now we’ll look at the part that gets most people mixed up, so pay close attention. The filenames returned by the readdir operator have no pathname component. Each one is just the name within the directory. So, we’re not looking at /etc/passwd, we’re just looking at passwd. (And because this is another difference from the globbing operation, it’s easy to see how people get confused.)

So you’ll need to patch up the name to get the full name:

opendir SOMEDIR, $dirname or die "Cannot open $dirname: $!";
while (my $name = readdir SOMEDIR) {
  next if $name =~ /^\./; # skip over dot files
  $name = "$dirname/$name"; # patch up the path
  next unless -f $name and -r $name; # only readable files
  ...
}

Without the patch, the file tests would have been checking files in the current directory, rather than in the directory named in $dirname. This is the single most-common mistake when using directory handles.
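The same loop can be sketched as a complete routine. This version assumes the File::Spec module (shipped with Perl, but not covered in this chapter) to join the directory and the name, which is one way to avoid hardcoding the slash; the name readable_files_in is invented for the illustration.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: full paths of the readable plain files in
# $dirname, skipping any name that begins with a dot.
sub readable_files_in {
    my ($dirname) = @_;
    opendir my $dh, $dirname or die "Cannot open $dirname: $!";
    my @found;
    while (defined(my $name = readdir $dh)) {
        next if $name =~ /^\./;                           # skip over dot files
        my $path = File::Spec->catfile($dirname, $name);  # patch up the path
        next unless -f $path and -r $path;                # only readable files
        push @found, $path;
    }
    closedir $dh;
    my @sorted = sort @found;
    return @sorted;
}
```

Note that the file tests are applied to $path, not to $name, for exactly the reason described above.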

Recursive Directory Listing

You probably won’t need recursive directory access for the first few dozen hours of your Perl programming career. So rather than distract you with the possibility of replacing all those ugly find scripts with Perl right now, we’ll simply entice you by saying that Perl comes with a nice library called File::Find, which you can use for nifty recursive directory processing. We’re also saying this to keep you from writing your own routines, which everyone seems to want to do after those first few dozen hours of programming, and then getting puzzled about things like “local directory handles” and “how do I change my directory back?” So, when you’re ready, the knowledge will come, but stay with us to learn about Manipulating Files and Directories (in the next chapter) instead, right after you finish these exercises.
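For the curious, a very small taste of File::Find might look like the sketch below. It uses only the module's find function and its $File::Find::name variable; the wrapper name all_files_under is made up here, and this is a bare-bones illustration with no error handling or options.

```perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: every plain file at or below $top.
sub all_files_under {
    my ($top) = @_;
    my @files;
    find(
        sub {
            # Inside the callback, $_ is the name within the current
            # directory and $File::Find::name is the path from $top.
            push @files, $File::Find::name if -f $_;
        },
        $top
    );
    return @files;
}

print "$_\n" for all_files_under(".");
```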

Exercises

See Section A.11 for answers to the following exercises.

  1. [12] Write a program to ask the user for a directory name, then change to that directory. If the user enters a line with nothing but whitespace, change to his or her home directory as a default. After changing, list the ordinary directory contents (not the items whose names begin with a dot) in alphabetical order. (Hint: Will that be easier to do with a directory handle or with a glob?) If the directory change doesn’t succeed, just alert the user, but don’t try to show the contents.

  2. [4] Modify the program to include all files, not just the ones that don’t begin with a dot.

  3. [5] If you used a directory handle for the previous exercise, rewrite it to use a glob. Or if you used a glob, try it now with a directory handle.



[1] This isn’t a limitation on Perl’s part; it’s actually a feature of Unix, Windows, and other systems. If you really need to change the shell’s working directory, see the documentation of your shell.

[2] Windows users may be accustomed to using a glob of *.* to mean “all files”. But that actually means “all files with a dot in their names,” even in Perl on Windows.

[3] Or it will call a valid substitute if a C-shell wasn’t available.

[4] If the indirect handle is a text string, then it’s subject to the “symbolic reference” test that is forbidden under use strict. However, the indirect handle might also be a typeglob or reference to an IO object, and then it would work even under use strict.

[5] If you’re using Perl 5.005 or later.

[6] It’s actually the unsorted order of the directory entries, similar to the order you get from ls -f or find.

[7] Do not make the mistake of many old Unix programs and presume that dot and dot-dot are always returned as the first two entries (sorted or not). If that hadn’t even occurred to you, pretend we never said it, because it’s a false presumption. In fact, we’re now sorry for even bringing it up.
