13. Manipulating Files and Directories

Chapter 13. Manipulating Files and Directories

Perl is commonly used to wrangle files and directories. Because Perl grew up in a Unix environment and still spends most of its time there, most of the description in this chapter may seem Unix-centric. But the nice thing is that to whatever degree possible, Perl works exactly the same way on non-Unix systems.

And now a word of warning—some cultures consider the number “13” to be very unlucky. We deliberately placed this material as Chapter 13 of this book, since we’re about to do some pretty dangerous things if bugs creep into the code (like remove files without a chance of recovery), so be very careful when you’re playing with the exercises.

Removing Files

Most of the time, we make files so that the data can stay around for a while. But when the data has outlived its life, it’s time to make the file go away. At the Unix shell level, we’d type an rm command to remove a file or files:

$ rm slate bedrock lava

In Perl, we use the unlink operator:

unlink "slate", "bedrock", "lava";

This sends the three named files away to bit heaven, never to be seen again.

Now, since unlink takes a list, and the glob function (described in Chapter 12) returns a list, we can combine the two to delete many files at once:

unlink glob "*.o";

This is similar to rm *.o at the shell, except that we didn’t have to fire off a separate rm process. So we can make those important files go away that much faster!

The return value from unlink tells us how many files have been successfully deleted. So, back to the first example, we can check its success:

my $successful = unlink "slate", "bedrock", "lava";
print "I deleted $successful file(s) just now\n";

Sure, if this number is 3, we know it removed all of the files, and if it’s 0, then we removed none of them. But what if it’s 1 or 2? Well, there’s no clue which ones were removed. If you need to know, do them one at a time in a loop:

foreach my $file (qw(slate bedrock lava)) {
  unlink $file or warn "failed on $file: $!\n";
}

Here, each file being deleted one at a time means the return value will be 0 (failed) or 1 (succeeded), which happens to look like a nice Boolean value, controlling the execution of warn. Using or warn is similar to or die, except that it’s not fatal, of course (as we said back in Chapter 11). In this case, we put the newline on the end of the message to warn, because it’s not a bug in our program that causes the message.

When a particular unlink fails, the $! variable is set to something related to the operating system error, which we’ve included in the message. This makes sense to use only when doing one filename at a time, because the next operating system failed request resets the variable. You can’t remove a directory with unlink (just like you can’t remove a directory with the simple rm invocation either). Look for the rmdir function coming up shortly for that.

Now, here’s a little-known Unix fact. It turns out that you can have a file that you can’t read, you can’t write, you can’t execute, maybe you don’t even own the file—that is, it’s somebody else’s file altogether—but you can still delete the file. That’s because the permission to unlink a file doesn’t depend upon the permission bits on the file itself; it’s the permission bits on the directory that contains the file that matter.

We mention this because it’s normal for a beginning Perl programmer, in the course of trying out unlink, to make a file, to chmod it to 0 (so that it’s not readable or writable), and then to see whether this makes unlink fail. But instead it vanishes without so much as a whimper.^[1] If you really want to see a failed unlink, though, just try to remove /etc/passwd or a similar system file. Since that’s a file controlled by the system administrator, you won’t be able to remove it.^[2]
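As a rough illustration of that experiment, here is a minimal sketch that assumes a Unix-like system. The file name locked and the use of File::Temp for a scratch directory are choices made for this example only:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Scratch directory (removed automatically when the program exits).
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/locked";          # hypothetical file name

# Create an empty file, then strip every permission bit from it.
open my $fh, '>', $file or die "cannot create $file: $!";
close $fh;
chmod 0, $file or die "cannot chmod $file: $!";

# We can still write to the directory, so the unlink succeeds anyway.
my $removed = unlink $file;
print "unlink returned $removed\n";
print "the file is gone\n" unless -e $file;
```

The file's own mode of 0 never enters into it; only the directory's permissions are consulted.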

Renaming Files

Giving an existing file a new name is simple with the rename function:

rename "old", "new";

This is similar to the Unix mv command, taking a file named old and giving it the name new in the same directory. You can even move things around:

rename "over_there/some/place/some_file", "some_file";

This moves a file called some_file from another directory into the current directory, provided the user running the program has the appropriate permissions.^[3]

Like most functions that request something of the operating system, rename returns false if it fails, and sets $! with the operating system error, so you can (and often should) use or die (or or warn) to report this to the user.

One frequent^[4] question in the Unix shell-usage newsgroups is how to rename everything that ends with ".old" to the same name with ".new". Here’s how to do it nicely in Perl:

foreach my $file (glob "*.old") {
  my $newfile = $file;
  $newfile =~ s/\.old$/.new/;
  if (-e $newfile) {
    warn "can't rename $file to $newfile: $newfile exists\n";
  } elsif (rename $file, $newfile) {
    ## success, do nothing
  } else {
    warn "rename $file to $newfile failed: $!\n";
  }
}

The check for the existence of $newfile is needed because rename will happily rename a file right over the top of an existing file, presuming the user has permission to remove the destination filename. We put the check in so that it’s less likely that we’ll lose information this way. Of course, if you wanted to replace existing files like wilma.new, you wouldn’t bother testing with -e first.

Those first two lines inside the loop can be combined (and often are) to simply be:

(my $newfile = $file) =~ s/\.old$/.new/;

This works to declare $newfile, copy its initial value from $file, then select $newfile to be modified by the substitution. You can read this as “transform $file to $newfile using this replacement on the right.” And yes, because of precedence, those parentheses are required.

Also, some programmers seeing this substitution for the first time wonder why the backslash is needed on the left, but not on the right. The two sides aren’t symmetrical: the left part of a substitution is a regular expression, and the right part is a double-quoted string. So we use the pattern /\.old$/ to mean “.old anchored at the end of the string” (anchored at the end, because we don’t want to rename the first occurrence of .old in a file called betty.old.old), but on the right we can simply write .new to make the replacement.
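To see why the backslash matters, the following sketch compares the pattern with and without it. The two helper subroutines and the sample name threshold are invented for this illustration:

```perl
use strict;
use warnings;

# Hypothetical helpers: the same substitution, with and without the backslash.
sub with_plain_dot   { my ($name) = @_; $name =~ s/.old$/.new/;  return $name }
sub with_escaped_dot { my ($name) = @_; $name =~ s/\.old$/.new/; return $name }

foreach my $file (qw(betty.old betty.old.old threshold)) {
    printf "%-14s plain dot: %-14s escaped dot: %s\n",
        $file, with_plain_dot($file), with_escaped_dot($file);
}
```

With the unescaped dot, a name such as threshold is mangled, because the dot matches the "h" before "old". With the backslash, only names that really end in .old are changed.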

Links and Files

To understand more about what’s going on with files and directories, it helps to understand the Unix model of files and directories, even if your non-Unix system doesn’t work in exactly this way. As usual, there’s more to the story than we’re able to explain here, so check any good book on Unix internal details if you need the full story.

A mounted volume is a hard disk drive (or something else that works more or less like that, such as a disk partition, a floppy disk, a CD-ROM, or a DVD-ROM). It may contain any number of files and directories. Each file is stored in a numbered inode, which we can think of as a particular piece of disk real estate. One file might be stored in inode 613, while another is in inode 7033.

To locate a particular file, though, we’ll have to look it up in a directory. A directory is a special kind of file, maintained by the system. Essentially, it is a table of filenames and their inode numbers.^[5] Along with the other things in the directory, there are always two special directory entries. One is . (called “dot”), which is the name of that very directory; and the other is .. (“dot-dot”), which is the directory one step higher in the hierarchy (i.e., the directory’s parent directory).^[6]

Figure 13-1 provides an illustration of two inodes. One is for a file called chicken, and the other is Barney’s directory of poems, /home/barney/poems, which contains that file. The file is stored in inode 613, while the directory is stored in inode 919. (The directory’s own name, poems, doesn’t appear in the illustration, because that’s stored in another directory.) The directory contains entries for three files (including chicken) and two directories (one of which is the reference back to the directory itself, in inode 919), along with each item’s inode number.

Figure 13-1. The chicken before the egg

When it’s time to make a new file in a given directory, the system adds an entry with the file’s name and the number of a new inode. How can the system tell that a particular inode is available, though? Each inode holds a number called its link count. The link count is always zero if the inode isn’t listed in any directory, so any inode with a link count of zero is available for new file storage. When the inode is added to a directory, the link count is incremented; when the listing is removed, the link count is decremented. For the file chicken as illustrated above, the link count of 1 is shown in the box above the inode’s data.

But some inodes have more than one listing. For example, we’ve already seen that each directory includes the entry ., which points back to that directory’s own inode. So the link count for a directory should always be at least two: its listing in its parent directory and its listing in itself. In addition, if it has subdirectories, each of those will add a link, since each will contain its own .. entry.^[7] In Figure 13-1, the directory’s link count of 2 is shown in the box above its data. A link count is the number of true names for the inode.^[8]

Could an ordinary file inode have more than one listing in the directory? It certainly could. Suppose that, working in the directory shown above, Barney uses the Perl link function to create a new link:

link "chicken", "egg"
  or warn "can't link chicken to egg: $!";

This is similar to typing "ln chicken egg" at the Unix shell prompt. If link succeeds, it returns true. If it fails, it returns false and sets $!, which Barney is checking in the error message. After this runs, the name egg is another name for the file chicken, and vice versa; neither name is “more real” than the other, and (as you may have guessed) it would take some detective work to find out which came first. Figure 13-2 shows a picture of the new situation, where there are two links to inode 613.

Figure 13-2. The egg is linked to the chicken

These two filenames are thus talking about the same place on the disk. If the file chicken holds 200 bytes of data, egg holds the same 200 bytes, for a total of 200 bytes (since it’s really just one file with two names). If Barney appends a new line of text to file egg, that line will also appear at the end of chicken.^[9]

Now, if Barney were to accidentally (or intentionally) delete chicken, that data will not be lost—it’s still available under the name egg. And vice versa: if he were to delete egg, he’d still have chicken. Of course, if he deletes both of them, the data will be lost.^[10]
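The link count can be observed from Perl, since it is element 3 of the list that stat returns (and the inode number is element 1). The sketch below assumes a Unix-like filesystem that supports hard links, and it works in a scratch directory from File::Temp rather than in Barney’s real poems directory:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);
my ($chicken, $egg) = ("$dir/chicken", "$dir/egg");

open my $fh, '>', $chicken or die "cannot create $chicken: $!";
print $fh "Which came first?\n";
close $fh;

my $before = (stat $chicken)[3];          # link count with one name

link $chicken, $egg or die "cannot link chicken to egg: $!";
my $linked     = (stat $chicken)[3];      # link count with two names
my $same_inode = (stat $chicken)[1] == (stat $egg)[1];

unlink $chicken or die "cannot unlink $chicken: $!";
my $after = (stat $egg)[3];               # the data survives under the other name

print "link counts: $before, then $linked, then $after\n";
print "both names shared one inode\n" if $same_inode;
```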

There’s another rule about the links in directory listings: the inode numbers in a given directory listing all refer to inodes on that same mounted volume.^[11] This rule ensures that if the physical medium (the diskette, perhaps) is moved to another machine, all of the directories stick together with their files. That’s why you can use rename to move a file from one directory to another, but only if both directories are on the same filesystem (mounted volume). If they were on different disks, the inode’s data would have to be relocated, which is too complex an operation for a simple system call.

And yet another restriction on links is that they can’t make new names for directories. That’s because the directories are arranged in a hierarchy. If you were able to change that, utility programs like find and pwd could easily become lost trying to find their way around the filesystem.

So, links can’t be added to directories, and they can’t cross from one mounted volume to another. Fortunately, there’s a way to get around these restrictions on links, by using a new and different kind of link: a symbolic link.^[12] A symbolic link (also called a soft link to distinguish it from the true or hard links that we’ve been talking about up to now) is a special entry in a directory that tells the system to look elsewhere. Let’s say that Barney (working in the same directory of poems as before) creates a symbolic link with Perl’s symlink function, like this:

symlink "dodgson", "carroll"
  or warn "can't symlink dodgson to carroll: $!";

This is similar to what would happen if Barney used the command "ln -s dodgson carroll" from the shell. Figure 13-3 shows a picture of the result, including the poem in inode 7033.

Figure 13-3. A symlink to inode 7033

Now if Barney chooses to read /home/barney/poems/carroll, he gets the same data as if he had opened /home/barney/poems/dodgson, because the system follows the symbolic link automatically. But that new name isn’t the “real” name of the file, because (as you can see in the diagram) the link count on inode 7033 is still just one. That’s because the symbolic link simply tells the system, “If you got here looking for carroll, now you want to go off to find something called dodgson instead.”

A symbolic link can freely cross mounted filesystems or provide a new name for a directory, unlike a hard link. In fact, a symbolic link could point to any filename, one in this directory or in another one—or even to a file that doesn’t exist! But that also means that a soft link can’t keep data from being lost as a hard link can, since the symlink doesn’t contribute to the link count. If Barney were to delete dodgson, the system would no longer be able to follow the soft link.^[13] Even though there would still be an entry called carroll, trying to read from it would give an error like file not found. The file test -l 'carroll' would report true, but -e 'carroll' would be false: it’s a symlink, but it doesn’t exist.
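The dangling-symlink situation is easy to reproduce. This is only a sketch, assuming a system that supports symbolic links; it uses a scratch directory from File::Temp and the standard readlink function to show where the link points:

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

my $dir = tempdir(CLEANUP => 1);

open my $fh, '>', "$dir/dodgson" or die "cannot create dodgson: $!";
close $fh;

# The target is stored as given, relative to the directory holding the link.
symlink "dodgson", "$dir/carroll"
  or die "can't symlink dodgson to carroll: $!";

my $target        = readlink "$dir/carroll";
my $exists_before = -e "$dir/carroll";     # follows the link to dodgson

unlink "$dir/dodgson" or die "cannot unlink dodgson: $!";

my $is_link      = -l "$dir/carroll";      # the link itself is still there
my $exists_after = -e "$dir/carroll";      # but it now leads nowhere

print "carroll pointed to $target\n";
print "carroll is now a dangling symlink\n" if $is_link and not $exists_after;
```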

Since a soft link could point to a file that doesn’t yet exist, it could be used when creating a file as well. Barney has most of his files in his home directory, /home/barney, but he also needs frequent access to a directory with a long name that is difficult to type: /usr/local/opt/system/httpd/root-dev/users/staging/barney/cgi-bin. So he sets up a symlink named /home/barney/my_stuff, which points to that long name, and now it’s easy for him to get to it. If he creates a file (from his home directory) called my_stuff/bowling, that file’s real name is /usr/local/opt/system/httpd/root-dev/users/staging/barney/cgi-bin/bowling. Next week, when the system administrator moves these files of Barney’s to /usr/local/opt/internal/httpd/www-dev/users/staging/barney/cgi-bin, Barney just repoints the one symlink, and now he and all of his programs can still find his files with ease.

It’s normal for either /usr/bin/perl or /usr/local/bin/perl (or both) to be symbolic links to the true Perl binary on your system. This makes it easy to switch to a new version of Perl. Say you’re the system administrator, and you’ve built the new Perl. Of course, your older version is still running, and you don’t want to disrupt anything. When you’re ready for the switch, you simply move a symlink or two, and now every program that begins with #!/usr/bin/perl will automatically use the new version. In the unlikely case that there’s some problem, it’s a simple thing to replace the old symlinks and have the older Perl running the show again. (But, like any good admin, you notified your users to test their code with the new /usr/bin/perl-7.2 well in advance of the switch, and you told them that they can keep using the older one during the next month’s grace period by changing their programs’ first lines to #!/usr/bin/perl-6.1, if they need to.)

Perhaps surprisingly, both hard and soft links are very useful. Many non-Unix operating systems have neither, and the lack is sorely felt. On some non-Unix systems, symbolic links may be implemented as a “shortcut” or an “alias”; check the perlport manpage for the latest details.

To find out where a symbolic link is pointing, use the readlink function. This will tell you where the symlink leads, or it will return undef if its argument wasn’t a symlink:

my $where = readlink "carroll";             # Gives "dodgson"

my $perl = readlink "/usr/local/bin/perl";  # Maybe tells where perl is

You can remove either kind of link with unlink—and now you see where that operation gets its name. unlink simply removes the directory entry associated with the given filename, decrementing the link count and thus possibly freeing the inode.

Making and Removing Directories

Making a directory inside an existing directory is easy. Just invoke the mkdir function:

mkdir "fred", 0755 or warn "Cannot make fred directory: $!";

Again, true means success, and $! is set on failure.

But what’s that second parameter, 0755? That’s the initial permission setting^[14] on the newly created directory (you can always change it later). The value here is specified as an octal value because the value will be interpreted as a Unix permission value, which has a meaning based on groups of three bits each, and octal values represent that nicely. Yes, even on Windows or MacPerl, you still need to know a little about Unix permissions values to use the mkdir function. Mode 0755 is a good one to use, because it gives you full permission, but lets everyone else have read access but no permission to change anything.

The mkdir function doesn’t require you to specify this value in octal; it’s just looking for a numeric value (either a literal or a calculation). But unless you can quickly figure out in your head that 0755 octal is 493 decimal, it’s probably easier to let Perl calculate that. And if you accidentally leave off the leading zero, you get 755 decimal, which is 1363 octal, a strange permission combination indeed.

As we saw earlier (in Chapter 2), a string value being used as a number is never interpreted as octal, even if it starts with a leading 0. So this doesn’t work:

my $name = "fred";
my $permissions = "0755";  # danger... this isn't working
mkdir $name, $permissions;

Oops, we just created a directory with those bizarre 01363 permissions, because 0755 was treated as decimal. To fix that, use the oct function, which forces octal interpretation of a string whether or not there’s a leading zero:

mkdir $name, oct($permissions);

Of course, if you are specifying the permission value directly within the program, just use a number instead of a string. The need for the extra oct function shows up most often when the value comes from user input. For example, suppose we take the arguments from the command line:

my ($name, $perm) = @ARGV;  # first two args are name, permissions
mkdir $name, oct($perm) or die "cannot create $name: $!";

The value here for $perm is interpreted as a string initially, and thus the oct function interprets the common octal representation properly.
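The arithmetic behind this can be checked without creating any directories. The following sketch only prints the numbers involved; the variable names are chosen for this illustration:

```perl
use strict;
use warnings;

my $literal = 0755;      # an octal literal in the program text
my $string  = "0755";    # a string, as it would arrive from @ARGV

my $as_number = $string + 0;     # numeric use of a string is always decimal
my $as_octal  = oct($string);    # oct forces octal interpretation

printf "literal 0755  is %d decimal\n", $literal;
printf "\"0755\" + 0    is %d decimal, or %o octal\n", $as_number, $as_number;
printf "oct(\"0755\")   is %d decimal, or %o octal\n", $as_octal, $as_octal;
printf "oct(\"755\")    is %d decimal as well\n", oct("755");
```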

To remove empty directories, use the rmdir function in a manner similar to the unlink function, except that it removes only one directory per call:

foreach my $dir (qw(fred barney betty)) {
  rmdir $dir or warn "cannot rmdir $dir: $!\n";
}

Unlike unlink, rmdir takes a single directory name rather than a list, so removing several directories means looping over them (for example, over the result of glob "fred/*" to remove all empty directories below fred/). It returns true on success and false on failure, setting $! in a reasonable manner when it fails.

The rmdir operator fails for non-empty directories. As a first pass, you can attempt to delete the contents of the directory with unlink, then try to remove what should now be an empty directory. For example, suppose we need a place to write many temporary files during the execution of a program:

my $temp_dir = "/tmp/scratch_$$";       # based on process ID; see the text
mkdir $temp_dir, 0700 or die "cannot create $temp_dir: $!";
...
# use $temp_dir as location of all temporary files
...
unlink glob "$temp_dir/* $temp_dir/.*"; # delete contents of $temp_dir
rmdir $temp_dir;                        # delete now-empty directory

The initial temporary directory name includes the current process ID, which is unique for every running process and is accessed with the $$ variable (similar to the shell). We do this to avoid colliding with any other processes, as long as they also include their process ID as part of their pathname. (In fact, it’s common to use the program’s name as well as the process ID, so if the program is called quarry, the directory would probably be something like /tmp/quarry_$$.)

At the end of the program, that last unlink should remove all the files in this temporary directory, and then the rmdir function can delete the then-empty directory. However, if we’ve created subdirectories under that directory, the unlink operator fails on those, and the rmdir also fails. For a more robust solution, check out the rmtree function provided by the File::Path module of the standard distribution.
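As a sketch of that more robust approach, the following builds a scratch directory containing a subdirectory, shows the plain rmdir failing, and then calls rmtree. The directory and file names are made up for the example, and it assumes /tmp exists and is writable:

```perl
use strict;
use warnings;
use File::Path qw(rmtree);

my $temp_dir = "/tmp/scratch_$$";
mkdir $temp_dir, 0700 or die "cannot create $temp_dir: $!";
mkdir "$temp_dir/subdir", 0700 or die "cannot create $temp_dir/subdir: $!";

open my $fh, '>', "$temp_dir/subdir/notes"
  or die "cannot create a file in $temp_dir/subdir: $!";
close $fh;

# rmdir refuses, because the directory still has contents.
my $rmdir_ok = rmdir $temp_dir;
print "plain rmdir failed: $!\n" unless $rmdir_ok;

# rmtree removes the directory and everything beneath it.
rmtree($temp_dir);
print "$temp_dir is gone\n" unless -e $temp_dir;
```

Recent versions of File::Path also offer remove_tree as a newer name for the same job; check the module’s documentation for the version installed on your system.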

Modifying Permissions

The Unix chmod command changes the permissions on a file or directory. Similarly, Perl has the chmod function to perform this task:

chmod 0755, "fred", "barney";

As with many of the operating system interface functions, chmod returns the number of items successfully altered, and when used with a single argument, sets $! in a sensible way for error messages when it fails. The first parameter is the Unix permission value (even for non-Unix versions of Perl). For the same reasons we presented earlier in describing mkdir, this value is usually specified in octal.

Symbolic permissions (like +x or go=u-w) accepted by the Unix chmod command are not valid for the chmod function.^[15]
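If you need the effect of a symbolic permission such as u+x, one possible approach is to read the current mode with stat (element 2 of its return list) and adjust the bits yourself. This is a sketch for a Unix-like system, using a temporary file from File::Temp:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

chmod 0640, $file or die "cannot chmod $file: $!";
my $mode = (stat $file)[2] & 07777;       # keep only the permission bits

# Rough equivalent of the shell's "chmod u+x": turn on the owner-execute bit.
chmod $mode | 0100, $file or die "cannot chmod $file: $!";
my $new_mode = (stat $file)[2] & 07777;

printf "mode went from %04o to %04o\n", $mode, $new_mode;
```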

Changing Ownership

If the operating system permits it, you may change the ownership and group membership of a list of files with the chown function. The user and group are both changed at once, and both have to be the numeric user-ID and group-ID values. For example:

my $user = 1004;
my $group = 100;
chown $user, $group, glob "*.o";

What if you have a username like merlyn instead of the number? Simple. Just call the getpwnam function to translate the name into a number, and the corresponding getgrnam ^[16] to translate the group name into its number:

defined(my $user = getpwnam "merlyn") or die "bad user";
defined(my $group = getgrnam "users") or die "bad group";
chown $user, $group, glob "/home/merlyn/*";

The defined function verifies that the return value is not undef, which will be returned if the requested user or group is not valid.

The chown function returns the number of files affected, and it sets $! on error.

Changing Timestamps

In those rare cases when you want to lie to other programs about when a file was most recently modified or accessed, you can use the utime function to fudge the books a bit. The first two arguments give the new access time and modification time, while the remaining arguments are the list of filenames to alter to those timestamps. The times are specified in internal timestamp format (the same type of values returned from the stat function that we mentioned in Chapter 12).

One convenient value to use for the timestamps is “right now”, returned in the proper format by the time function. So to update all the files in the current directory to look like they were modified a day ago, but accessed just now, we could simply do this:

my $now = time;
my $ago = $now - 24 * 60 * 60;  # seconds per day
utime $now, $ago, glob "*";     # set access to now, mod to a day ago

Of course, nothing stops you from creating a file that is arbitrarily stamped far in the future or past (within the limits of the Unix timestamp values of 1970 to 2038, or whatever your non-Unix system uses, until we get 64-bit timestamps). Maybe you could use this to create a directory where you keep your notes for that time-travel novel you’re writing.

The third timestamp (the ctime value) is always set to “now” whenever anything alters a file, so there’s no way to set it (it would have to be reset to “now” after you set it) with the utime function. That’s because its primary purpose is for incremental backups: if the file’s ctime is newer than the date on the backup tape, it’s time to back it up again.
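To confirm what utime did, you can read the three timestamps back with stat, where they are elements 8, 9, and 10 (access, modification, and ctime). The sketch below applies this to a single temporary file rather than to everything in the current directory:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($fh, $file) = tempfile(UNLINK => 1);
close $fh;

my $now = time;
my $ago = $now - 24 * 60 * 60;            # one day earlier
utime $now, $ago, $file or die "cannot utime $file: $!";

my ($atime, $mtime, $ctime) = (stat $file)[8, 9, 10];
print "accessed: ", scalar localtime($atime), "\n";
print "modified: ", scalar localtime($mtime), "\n";
print "ctime:    ", scalar localtime($ctime), "\n";
```

The ctime line should show roughly the current time, since the call to utime itself altered the inode.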

Using Simple Modules

Suppose that you’ve got a long filename like /usr/local/bin/perl in your program, and you need to find out the basename. That’s easy enough, since the basename is everything after the last slash (it’s just "perl" in this case):

my $name = "/usr/local/bin/perl";
(my $basename = $name) =~ s#.*/##;  # Oops!

As we saw earlier, first Perl will do the assignment inside the parentheses, then it will do the substitution. The substitution is supposed to replace any string ending with a slash (that is, the directory name portion) with an empty string, leaving just the basename.

And if you try this, it seems to work. Well, it seems to, but actually, there are three problems.

First, a Unix file or directory name could contain a newline character. (It’s not something that’s likely to happen by accident, but it’s permitted.) So, since the regular expression dot (“.”) can’t match a newline, a filename like the string "/home/fred/flintstone\n/brontosaurus" won’t work right: that code would think the basename is "flintstone\n/brontosaurus". You could fix that with the /s option to the pattern (if you remembered about this subtle and infrequent case), making the substitution look like this: s#.*/##s
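The difference is easy to demonstrate on a name with an embedded newline. In this sketch the variable names are chosen for the illustration, and the newline is shown as \n when printing so the output stays on one line:

```perl
use strict;
use warnings;

my $name = "/home/fred/flintstone\n/brontosaurus";

(my $without_s = $name) =~ s#.*/##;    # dot stops at the newline
(my $with_s    = $name) =~ s#.*/##s;   # /s lets dot match the newline too

foreach my $result ($without_s, $with_s) {
    (my $visible = $result) =~ s/\n/\\n/g;   # make the newline visible
    print "basename: $visible\n";
}
```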

The second problem is that this is Unix-specific. It assumes that the forward slash will always be the directory separator, as it is on Unix, and not the backslash or colon that some systems use.

And the third (and biggest) problem with this is that we’re trying to solve a problem that someone else has already solved. Perl comes with a number of modules, which are smart extensions to Perl that add to its functionality. And if those aren’t enough, there are many other useful modules available on CPAN, with new ones being added every week. You (or, better yet, your system administrator) can install them if you need their functionality.

In the rest of this section, we’ll show you how to use some of the features of a couple of simple modules that come with Perl. (There’s more that these modules can do; this is just an overview to illustrate the general principles of how to use a simple module.)

Alas, we can’t show you everything you’d need to know about using modules in general, since you’d have to understand advanced topics like references and objects in order to use some modules.^[17] But this section should prepare you for using many simple modules. Further information on some interesting and useful modules is included in Appendix B.

The File::Basename Module

In the previous example, we found the basename of a filename in a way that’s not portable. We showed that something that seemed straightforward was susceptible to subtle mistaken assumptions (here, the assumption was that newlines would never appear in file or directory names). And we were re-inventing the wheel, solving a problem that others have solved (and debugged) many times before us.

Here’s a better way to extract the basename of a filename. Perl comes with a module called File::Basename. With the command perldoc File::Basename, or with your system’s documentation system, you can read about what it does. That’s the first step when using a new module. (It’s often the third and fifth step, as well.)

Soon you’re ready to use it, so you declare it with a use directive near the top of your program:^[18]

use File::Basename;

During compilation, Perl sees that line and loads up the module. Now, it’s as if Perl has some new functions that you may use in the remainder of your program.^[19] The one we wanted in the earlier example is the basename function itself:

my $name = "/usr/local/bin/perl";
my $basename = basename $name;  # gives 'perl'

Well, that worked for Unix. What if our program were running on MacPerl or Windows or VMS, to name a few? There’s no problem—this module can tell which kind of machine you’re using, and it uses that machine’s filename rules by default. (Of course, you’d have that machine’s kind of filename string in $name, in that case.)

There are some related functions also provided by this module. One is the dirname function, which pulls the directory name from a full filename. The module also lets you separate a filename from its extension, or change the default set of filename rules.^[20]
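Here is a brief sketch of those related functions, assuming Unix filename rules. The fileparse function and the suffix pattern passed to it come from the File::Basename documentation; the sample filename is just an example:

```perl
use strict;
use warnings;
use File::Basename qw(basename dirname fileparse);

my $name = "/home/rootbeer/ice-2.1.txt";

my $base = basename $name;     # the part after the last directory separator
my $dir  = dirname $name;      # the directory portion

# fileparse splits a name into base, directory path, and suffix.
# The pattern describes what counts as a suffix.
my ($stem, $path, $suffix) = fileparse($name, qr/\.[^.]*/);

print "basename:  $base\n";
print "dirname:   $dir\n";
print "fileparse: stem '$stem', path '$path', suffix '$suffix'\n";
```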

Using Only Some Functions from a Module

Suppose you discovered that when you went to add the File::Basename module to your existing program, you already have a subroutine called &dirname—that is, you already have a subroutine with the same name as one of the module’s functions.^[21] Now there’s trouble, because the new dirname is also implemented as a Perl subroutine (inside the module). What do you do?

Simply give File::Basename, in your use declaration, an import list showing exactly which function names it should give you, and it’ll supply those and no others. Here, we’ll get nothing but basename:

use File::Basename qw/ basename /;

And here, we’ll ask for no new functions at all:

use File::Basename qw/ /;

Why would you want to do that? Well, this directive tells Perl to load up File::Basename, just as before, but not to import any function names. Importing lets us use the short, simple function names like basename and dirname. But even if we don’t import those names, we can still use the functions. When they’re not imported, though, we have to call them by their full names:

use File::Basename qw/ /;                     # import no function names

my $betty = &dirname($wilma);                 # uses our own subroutine &dirname (not shown)

my $name = "/usr/local/bin/perl";
my $dirname = File::Basename::dirname $name;  # dirname from the module

As you see, the full name of the dirname function from the module is File::Basename::dirname. We can always use the function’s full name (once we’ve loaded the module) whether we’ve imported the short name dirname or not.

Most of the time, you’ll want to use a module’s default import list. But you can always override that with a list of your own, if you want to leave out some of the default items. Another reason to supply your own list would be if you wanted to import some function not on the default list, since most modules include some (infrequently needed) functions that are not on the default import list.

As you’d guess, some modules will, by default, import more symbols than others. Each module’s documentation should make it clear which symbols it imports, if any, but you are always free to override the default import list by specifying one of your own, just as we did with File::Basename. Supplying an empty list imports no symbols.

The File::Spec Module

Now you can find out a file’s basename. That’s useful, but you’ll often want to put that together with a directory name to get a full filename. For example, here we want to take a filename like /home/rootbeer/ice-2.1.txt and add a prefix to the basename:

use File::Basename;

print "Please enter a filename: ";
chomp(my $old_name = <STDIN>);

my $dirname = dirname $old_name;
my $basename = basename $old_name;

$basename =~ s/^/not/;  # Add a prefix to the basename
my $new_name = "$dirname/$basename";

rename($old_name, $new_name)
  or warn "Can't rename '$old_name' to '$new_name': $!";

Do you see the problem here? Once again, we’re making the assumption that filenames will follow the Unix conventions and use a forward slash between the directory name and the basename. Fortunately, Perl comes with a module to help with this problem, too.

The File::Spec module is used for manipulating file specifications, which are the names of files, directories, and the other things that are stored on filesystems. Like File::Basename, it understands what kind of system it’s running on, and it chooses the right set of rules every time. But unlike File::Basename, File::Spec is an object-oriented (often abbreviated “OO”) module.

If you’ve never caught the fever of OO, don’t let that bother you. If you understand objects, that’s great; you can use this OO module. If you don’t understand objects, that’s okay, too. You just type the symbols as we show you, and it works just as if you knew what you were doing.

In this case, we learn from reading the documentation for File::Spec that we want to use a method called catfile. What’s a method? It’s just a different kind of function, as far as we’re concerned here. The difference is that you’ll always call the methods from File::Spec with their full names, like this:

use File::Spec;

.
.  # Get the values for $dirname and $basename as above
.

my $new_name = File::Spec->catfile($dirname, $basename);

rename($old_name, $new_name)
  or warn "Can't rename '$old_name' to '$new_name': $!";

As you can see, the full name of a method is the name of the module (called a class, here), a small arrow, and the short name of the method. It is important to use the small arrow, rather than the double-colon that we used with File::Basename.
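Putting the pieces together, here is a minimal sketch of the prefix-adding example as one complete program. To keep it safe to run, it assumes a hardcoded sample path instead of reading one from STDIN, and it only prints the new name rather than calling rename. Both of those choices are ours and not part of the original program.

```perl
use strict;
use warnings;
use File::Basename;
use File::Spec;

my $old_name = '/home/rootbeer/ice-2.1.txt';  # sample path (an assumption)

my $dirname  = dirname $old_name;
my $basename = basename $old_name;

$basename =~ s/^/not/;  # Add a prefix to the basename

# Class name, small arrow, short method name
my $new_name = File::Spec->catfile($dirname, $basename);

# Report only; nothing on disk is touched
print "Would rename '$old_name' to '$new_name'\n";
```

On a Unix system the new name comes out as /home/rootbeer/notice-2.1.txt. On another system, catfile joins the pieces with whatever separator that system uses, which is the whole point of letting the module do the work.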

Since we’re calling the method by its full name, though, what symbols does the module import? None of them. That’s normal for OO modules. So you don’t have to worry about having a subroutine with the same name as one of the many methods of File::Spec.
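To see that nothing collides, here is a small sketch in which we define our own subroutine named catfile alongside the module's method. The subroutine and its output string are hypothetical, made up only for this demonstration.

```perl
use strict;
use warnings;
use File::Spec;   # imports no symbols into our program

# Our own (hypothetical) subroutine that happens to share the method's name
sub catfile {
    return "my own catfile got: @_";
}

my $ours   = catfile('slate', 'bedrock');              # calls our subroutine
my $theirs = File::Spec->catfile('slate', 'bedrock');  # calls the method

print "$ours\n";
print "$theirs\n";
```

The plain call reaches our subroutine, and the call with the class name and small arrow reaches the File::Spec method. On a Unix system the second line printed is slate/bedrock.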

Should you bother using modules like these? It’s up to you, as always. If you’re sure your program will never be run anywhere but on a Unix machine, say, and you’re sure you completely understand the rules for filenames on Unix,^[22] then you may prefer to hardcode your assumptions into your programs. But these modules give you an easy way to make your programs more robust in less time—and more portable at no extra charge.

Exercises

The programs here are potentially dangerous! Be careful to test them in a mostly empty directory to make it difficult to accidentally delete something useful.

See Section A.12 for answers to the following exercises.

[6] Write a program that works like rm, deleting any files named on the command line. (You don’t need to handle any of the options of rm.)
[10] Write a program that works like mv, renaming the first command-line argument to the second command-line argument. (You don’t need to handle any of the options of mv or additional arguments.) Remember to allow for the destination to be a directory; if it is, use the same original basename in the new directory.
[7] If your operating system supports it, write a program that works like ln, making a hard link from the first command-line argument to the second. (You don’t need to handle options of ln or more arguments.) If your system doesn’t have hard links, just print out a message telling what operation you would perform if it were available. Hint: This program has something in common with the previous one—recognizing that could save you time in coding.
[7] If your operating system supports it, fix up the program from the previous exercise to allow an optional -s switch before the other arguments to indicate that you want to make a soft link instead of a hard link. (Even if you don’t have hard links, see whether you can at least make soft links with this program.)
[7] If your operating system supports it, write a program to find any symbolic links in the current directory and print out their values (like ls -l would: name -> value).

^[1]Some of these folks know that rm would generally ask before deleting such a file. But rm is a command, and unlink is a system call. System calls never ask permission, and they never say they’re sorry.

^[2]Of course, if you’re silly enough to try this kind of thing when you are logged in as the system administrator, you deserve what you get.

^[3]And the files must reside on the same filesystem. We’ll see why this rule exists a little later in this chapter.

^[4]This isn’t just any old frequent question; the question of renaming a batch of files at once is the most frequent question asked in these newsgroups. And that’s why it’s the first question answered in the FAQs for those newsgroups. And yet, it stays in first place. Hmmm.

^[5]On Unix systems (others don’t generally have inodes, hard links, and such), you can use the ls command’s -i option to see files’ inode numbers. Try a command like ls -ail. When two or more inode numbers are the same for multiple items on a given filesystem, there’s really just one file involved, one piece of the disk.

^[6]The Unix system root directory has no parent. In that directory, .. is the same directory as ., which is the system root directory itself.

^[7]This implies that the link count of a directory is always equal to two plus the number of directories it contains. On some systems that’s true, in fact, but some other systems work differently.

^[8]In the traditional output of ls -l, the number of hard links to the item appears just to the right of the permission flags (like "-rwxr-xr-x"). Now you know why this number is more than one for directories and nearly always 1 for ordinary files.

^[9]If you experiment with making links and changing text files, be aware that most text editors don’t edit the file “in place” but instead save a modified copy. If Barney were to edit egg with a text editor, he’d most likely end up with a new file called egg and the old file called chicken—two separate files, rather than two links to the same file.

^[10]Although the system won’t necessarily overwrite this inode right away, there’s no easy way in general to get the data back once the link count has gone to zero. Have you made a backup recently?

^[11]The one exception is the special .. entry in the volume’s root directory, which refers to the directory in which that volume is mounted.

^[12]Some very old Unix systems don’t support symlinks, but those are pretty rare nowadays.

^[13]Deleting carroll would merely remove the symlink, of course.

^[14]The permission value is modified by the umask value in the usual way. See umask(2) for further information.

^[15]Unless you’ve installed and invoked the File::chmod module from CPAN, which can apparently upgrade the chmod operator to understand symbolic mode values.

^[16]These two are among the ugliest function names known to mankind. But don’t blame Larry for them; he’s just giving them the same names that the folks at Berkeley did.

^[17]As we’ll see in the next few pages, though, you may be able to use a module that uses objects and references without having to understand those advanced topics.

^[18]It’s traditional to declare modules near the top of the file, since that makes it easy for the maintenance programmer to see which modules you’ll be using. That greatly simplifies matters when it’s time to install your program on a new machine, for example.

^[19]You guessed it: there’s more to the story, having to do with packages and fully qualified names. When your programs are growing beyond a few hundred lines in the main program (not counting code in modules), which is quite large in Perl, you should probably read up about these advanced features. Start with the perlmod manpage.

^[20]You might need to change the filename rules if you were trying to work with a Unix machine’s filenames from a Windows machine—perhaps while sending commands over an FTP connection, for example.

^[21]Well, it’s not likely that you would already have a &dirname subroutine that you use for another purpose, but this is just an example. Some modules offer hundreds (really!) of new functions, making a name collision that much more frequent.

^[22]If you didn’t know that filenames and directory names could contain newline characters, as we mentioned earlier in this section, then you don’t know all the rules, do you?
