There are plenty of applications involving files that do not involve opening a filehandle. Examples include copying, moving, or renaming files, and interrogating files for their size, permissions, or ownership. In this chapter, we cover the manipulation of files and directories in ways other than opening filehandles to read or write them using Perl's built-in functions. We also look at the various modules provided as standard with Perl to make file handling both simpler and portable across different platforms, finding files with wildcards through file name globbing, and creating temporary files.
While the bulk of this chapter is concerned with files, we also spend some time looking at directories, which, while fundamentally different entities from files, turn out to be manipulated with broadly similar techniques.
Files are located in a file system, which stores attributes for every file it references. The most obvious attribute a file has is a file name, but we can also test filing system entries for properties such as their type (file, directory, link) and access permissions with Perl's file test operators. To retrieve detailed information about a file's attributes, we also have the stat and lstat functions at our disposal. Perl also provides built-in functions to manipulate file permissions and ownership, and to create or destroy file names within the file system. However, the built-in functions are limited in how versatile they can be on different platforms, because the concepts they embody originate with Unix in mind and are not always portable in themselves.
In addition to the built-in functions, Perl provides a toolkit of modules for handling files portably, regardless of the underlying platform. Some wrap built-in functions, like File::stat, or aggregate many file test operations into a single function call, like File::CheckTree. Others provide useful features such as finding or comparing files, like File::Find and File::Compare. Most of these modules are built on top of File::Spec, which provides basic support for cross-platform file name handling. It is almost always a good idea to use these modules in place of a built-in function whenever portability is a concern.
Beyond basic file handling, Perl also provides the glob function for retrieving file names through wildcard specifications. The built-in glob function is actually implemented in terms of a family of standard modules, each handling a different platform, that we can also use directly for greater control.
A final but important aspect of file handling is the creation and use of temporary files. Apparently simple on the surface, there are several ways to create a temporary file, each with its own advantages, drawbacks, and portability issues.
Perl provides built-in support for handling user and group information on Unix platforms through the getpwent and getgrent families of functions. This support is principally derived from the underlying C library functions of the same names, which are in turn dependent on the details of the implementation provided by the operating system. All Unix platforms provide broadly the same features for user and group management, but they vary slightly in what additional information they store. While Perl makes a reasonable attempt to unify all the variations, the system documentation is the best source of information on what values these functions return.
Unix platforms define user and group information in the /etc/passwd and /etc/group files, but this oversimplifies the actual process of looking up user and group information for two reasons. First, if a shadow password file is in use, then the user information in /etc/passwd will not contain an encrypted password in the password field. Second, if alternative sources of user and group information are configured (such as NIS or NIS+), then requesting user or group information may initiate a network lookup to retrieve information from a remote server. The order in which local and remote information sources are consulted is typically defined by the file /etc/nsswitch.conf.
Support for other security models and platforms is not provided through built-in functions, but through extension modules. Windows programmers, for example, can make use of the Win32::AdminMisc module to gain access to the Win32 Security API. Windows and other non-Unix platforms do not support getpwent or getgrent, though the Cygwin environment does provide a veneer of Unix security that allows these functions to work on Windows platforms with limited functionality, enough for Perl programs that use them to function. Access Control Lists (ACLs) and other advanced security features are beyond the reach of the built-in functions even on Unix platforms, but they can be handled via various modules available from CPAN.
User Information
Unix platforms store local user information in the /etc/passwd file (though as noted previously, they may also retrieve information remotely). The format varies slightly but typically has a structure like this:
fred:RGdmsaynFgP56:301:200:Fred A:/home/fred:/bin/bash
jim:Edkl1y7NMtO/M:302:200:Jim B:/home/jim:/bin/ksh
mysql:!!:120:120:MySQL server:/var/lib/mysql:/bin/csh
Each line contains the following fields: name, password, user ID, primary group ID, comment/GECOS, home directory, and login shell. In this case, we are not using a shadow password file, so the password field contains an encrypted password. The first two lines are for regular users, while the third defines an identity for a MySQL database server to run as. It does not need a password since it is not intended as a login user, so the password is disabled with !! (* is often also used for this purpose).
The getpwent function (pwent is short for "password entry") retrieves one entry from the user information file at a time, starting from the first. In list context, it returns no fewer than ten fields:
($name, $passwd, $uid, $gid, $quota, $comment, $gcos, $dir, $shell, $expire)
= getpwent;
Since the format and source of user information varies, not all these fields are always defined, and some of them have alternate meanings. A summary of each field and its possible meanings is given in Table 13-1; consult the manual page for the passwd file (typically via man 5 passwd) for exact details of what fields are provided on a given platform.
In scalar context, getpwent returns just the name of the user, that is, the first field. To illustrate, we can generate a list of user names with a program like the following:
#!/usr/bin/perl
# listusers.pl
use warnings;
use strict;
my @users;
while (my $name = getpwent) {
push @users, $name;
}
print "Users: @users\n";
Supporting getpwent are the setpwent and endpwent functions. The setpwent function resets the pointer for the next record returned by getpwent to the start of the password file. It is analogous to the rewinddir function in the same way that getpwent is analogous to both opendir and readdir combined. Since there is only one password file, it takes no arguments:
setpwent;
The endpwent function is analogous to closedir: it closes the internal file pointer created whenever we use getpwent (or getpwnam/getpwuid, detailed in the upcoming text). We cannot get access to this internal filehandle, but freeing it allows the resources it consumes to be reclaimed. Additionally, if a network query was made, then this will close the connection:
endpwent;
The getpwnam and getpwuid functions look up user names and user IDs from each other. getpwnam takes a user name as an argument and returns the user ID in scalar context, or the full list of ten fields in list context:
$uid = getpwnam($username);
@fields = getpwnam($username);
Similarly, getpwuid takes a numeric user ID and returns either the name or a list of fields, depending on context:
$username = getpwuid($uid);
@fields = getpwuid($uid);
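As a quick sketch of these two functions working together (assuming a Unix system with a conventional root account), we can look up an ID from a name and then map it straight back again:

```perl
#!/usr/bin/perl
# uidlookup.pl - map a user name to a user ID and back again
use warnings;
use strict;

my $uid  = getpwnam('root');    # name to numeric user ID
my $name = getpwuid($uid);      # numeric user ID back to name
print "'root' has UID $uid, which maps back to '$name'\n";

# in list context we get the full ten-field record instead
my @fields = getpwnam('root');
print "root's home directory is $fields[7]\n";
```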
Both functions also have the same effect as setpwent in that they reset the position of the pointer used by getpwent, so they cannot be combined with it in loops.
Since ten fields is rather a lot to manage, Perl supplies the User::pwent module to provide an object-oriented interface to the pw functions. It is one of several modules that all behave similarly; others are User::grent (for group information), Net::hostent, Net::servent, Net::netent, and Net::protoent (for network information), and File::stat (for the stat and lstat functions).
User::pwent works by overriding the built-in getpwent, getpwnam, and getpwuid functions with object-oriented versions returning a pw object, complete with methods to extract the relevant fields. It also has the advantage of knowing which fields actually apply, which we can determine using the pw_has class method. Here is an object-oriented user information listing program, which uses getpwent to illustrate how the User::pwent module is used:
#!/usr/bin/perl
# listobjpw.pl
use warnings;
use strict;
use User::pwent qw(:DEFAULT pw_has);
print "Supported fields: ", scalar(pw_has), "\n";
while (my $user = getpwent) {
    print 'Name    : ', $user->name, "\n";
    print 'Password: ', $user->passwd, "\n";
    print 'User ID : ', $user->uid, "\n";
    print 'Group ID: ', $user->gid, "\n";
    # one of quota, change or age
    print 'Quota   : ', $user->quota, "\n" if pw_has('quota');
    print 'Change  : ', $user->change, "\n" if pw_has('change');
    print 'Age     : ', $user->age, "\n" if pw_has('age');
    # one of comment or class (also possibly gecos is comment)
    print 'Comment : ', $user->comment, "\n" if pw_has('comment');
    print 'Class   : ', $user->class, "\n" if pw_has('class');
    print 'Home Dir: ', $user->dir, "\n";
    print 'Shell   : ', $user->shell, "\n";
    # maybe gecos, maybe not
    print 'GECOS   : ', $user->gecos, "\n" if pw_has('gecos');
    # maybe expires, maybe not
    print 'Expire  : ', $user->expire, "\n" if pw_has('expire');
    # separate records
    print "\n";
}
If called with no arguments, the pw_has class method returns a list of supported fields in list context, or a space-separated string suitable for printing in scalar context. Because we generally want to use it without prefixing User::pwent::, we specify it in the import list. However, to retain the default imports that override getpwent and the like, we also need to specify the special :DEFAULT tag.
We can also import scalar variables for each field and avoid the method calls by adding the :FIELDS tag (which also implies :DEFAULT) to the import list. This generates a set of scalar variables with the same names as their method equivalents but prefixed with pw_. The equivalent of the preceding object-oriented script written using field variables is:
#!/usr/bin/perl
# listfldpw.pl
use warnings;
use strict;
use User::pwent qw(:FIELDS pw_has);
print "Supported fields: ", scalar(pw_has), "\n";
while (my $user = getpwent) {
    print 'Name    : ', $pw_name, "\n";
    print 'Password: ', $pw_passwd, "\n";
    print 'User ID : ', $pw_uid, "\n";
    print 'Group ID: ', $pw_gid, "\n";
    # one of quota, change or age
    print 'Quota   : ', $pw_quota, "\n" if pw_has('quota');
    print 'Change  : ', $pw_change, "\n" if pw_has('change');
    print 'Age     : ', $pw_age, "\n" if pw_has('age');
    # one of comment or class (also possibly gecos is comment)
    print 'Comment : ', $pw_comment, "\n" if pw_has('comment');
    print 'Class   : ', $pw_class, "\n" if pw_has('class');
    print 'Home Dir: ', $pw_dir, "\n";
    print 'Shell   : ', $pw_shell, "\n";
    # maybe gecos, maybe not
    print 'GECOS   : ', $pw_gecos, "\n" if pw_has('gecos');
    # maybe expires, maybe not
    print 'Expire  : ', $pw_expire, "\n" if pw_has('expire');
    # separate records
    print "\n";
}
We may selectively import variables if we want to use a subset, but since this overrides the default import, we must also explicitly import the functions we want to override:
use User::pwent qw($pw_name $pw_uid $pw_gid getpwnam);
To call the original getpwent, getpwnam, and getpwuid functions, we can use the CORE:: prefix. Alternatively, we could suppress the overrides by passing an empty import list or a list containing neither :DEFAULT nor :FIELDS. As an example, here is another version of the preceding script that invents a new object method, has, for the User::pwent package, then uses that and class method calls only, avoiding all imports:
#!/usr/bin/perl
# listcorpw.pl
use warnings;
use strict;
use User::pwent();
sub User::pwent::has {
my $self = shift;
return User::pwent::pw_has(@_);
}
print "Supported fields: ", scalar(User::pwent::has), "\n";
while (my $user = User::pwent::getpwent) {
    print 'Name    : ', $user->name, "\n";
    print 'Password: ', $user->passwd, "\n";
    print 'User ID : ', $user->uid, "\n";
    print 'Group ID: ', $user->gid, "\n";
    # one of quota, change or age
    print 'Quota   : ', $user->quota, "\n" if $user->has('quota');
    print 'Change  : ', $user->change, "\n" if $user->has('change');
    print 'Age     : ', $user->age, "\n" if $user->has('age');
    # one of comment or class (also possibly gecos is comment)
    print 'Comment : ', $user->comment, "\n" if $user->has('comment');
    print 'Class   : ', $user->class, "\n" if $user->has('class');
    print 'Home Dir: ', $user->dir, "\n";
    print 'Shell   : ', $user->shell, "\n";
    # maybe gecos, maybe not
    print 'GECOS   : ', $user->gecos, "\n" if $user->has('gecos');
    # maybe expires, maybe not
    print 'Expire  : ', $user->expire, "\n" if $user->has('expire');
    # separate records
    print "\n";
}
As a convenience, the User::pwent module also provides the getpw subroutine, which takes either a user name or a user ID, returning a user object either way:
$user = getpw($user_name_or_id);
If the passed argument looks numeric, then getpwuid is called underneath to do the work; otherwise, getpwnam is called.
Unix groups are a second tier of privileges between the user's own privileges and those of all users on the system. All users belong to one primary group, and files they create are assigned to this group. This information is recorded locally in the /etc/passwd file and can be found locally or remotely with the getpwent, getpwnam, and getpwuid functions as described previously. In addition, users may belong to any number of secondary groups. This information, along with the group IDs (or gids) and group names, is stored locally in the /etc/group file and can be extracted locally or remotely with the getgrent, getgrnam, and getgrgid functions.
The getgrent function reads one entry from the groups file each time it is called, starting with the first and returning the next entry in turn on each subsequent call. It returns four fields: the group name, a password (which is usually not defined), the group ID, and the users who belong to that group:
#!/usr/bin/perl
# listgr.pl
use warnings;
use strict;
while (my ($name, $passwd, $gid, $members) = getgrent) {
print "$gid: $name [$passwd] $members\n";
}
Alternatively, calling getgrent in a scalar context returns just the group name:
#!/usr/bin/perl
# listgroups.pl
use warnings;
use strict;
my @groups;
while (my $name = getgrent) {
push @groups, $name;
}
print "Groups: @groups\n";
As with getpwent, using getgrent causes Perl (or more accurately, the underlying C library) to open a filehandle (or a connection to an NIS or NIS+ server) internally. Mirroring the supporting functions of getpwent, setgrent resets the pointer of the group filehandle to the start, and endgrent closes the file (and/or network connection) and frees the associated resources.
Perl provides the User::grent module as an object-oriented interface to the getgrent, getgrnam, and getgrgid functions. It works very similarly to User::pwent, but it provides fewer methods as it has fewer fields to manage. It also does not have to contend with the variations of field meanings that User::pwent does, and it is consequently simpler to use. Here is an object-oriented group lister using User::grent:
#!/usr/bin/perl
# listbigr
use warnings;
use strict;
use User::grent;
while (my $group = getgrent) {
    print 'Name    : ', $group->name, "\n";
    print 'Password: ', $group->passwd, "\n";
    print 'Group ID: ', $group->gid, "\n";
    print 'Members : ', join(', ', @{$group->members}), "\n";
}
Like User::pwent (and indeed all similar modules like Net::hostent, etc.), we can import the :FIELDS tag to get variables that automatically update whenever any of getgrent, getgrnam, or getgrgid is called. Here is the previous example reworked to use variables:
#!/usr/bin/perl
# listfldgr.pl
use warnings;
use strict;
use User::grent qw(:FIELDS);
while (my $group = getgrent) {
    print 'Name    : ', $gr_name, "\n";
    print 'Password: ', $gr_passwd, "\n";
    print 'Group ID: ', $gr_gid, "\n";
    print 'Members : ', join(', ', @{$group->members}), "\n";
}
We can also selectively import variables if we only want to use some of them:
use User::grent qw($gr_name $gr_gid);
In this case, the overriding of getgrent and the like will not take place, so we would need to call User::grent::getgrent rather than just getgrent, or pass getgrent as a term in the import list. To avoid importing anything at all, just pass an empty import list.
Perl provides a full complement of file test operators. They test file names for various properties, for example, determining whether they are a file, directory, link, or other kind of entity, who owns them, and what their access privileges are. All of these file tests consist of a single minus sign followed by a letter, which determines the nature of the test, and take either a filehandle or a string containing the file name as their argument. Here are a few examples:
-r $filename # return true if file is readable by us
-w $filename # return true if file is writable by us
-d DIRECTORY # return true if DIRECTORY is opened to a directory
-t STDIN # return true if STDIN is interactive
Collectively these functions are known as the -X or file test operators.
The slightly odd-looking syntax comes from the Unix file test utility test and the built-in equivalents in most Unix shells. Despite their strange appearance, the file test operators are really functions that behave just like any other built-in unary (single-argument) Perl operator, including support for parentheses:
print "It's a file!" if -f($filename);
If no file name or handle is supplied, then the value of $_ is used as a default, which makes for some very terse, if somewhat algebraic, expressions:
foreach (@files) {
    print "$_ is a readable text file\n" if -r && -T;   # -T for 'text' file
}
Only single letters following a minus sign are interpreted as file tests, so there is never any confusion between file test operators and negated expressions:
-o($name) # test if $name is owned by us
-oct($name) # return negated value of $name interpreted as octal
The full list of file tests follows, loosely categorized into functional groups. Note that not all of these tests may work, depending on the underlying platform. For instance, operating systems that do not understand ownership in the Unix model will not make a distinction between -r and -R, since this requires the concept of real and effective user IDs. (The Win32 API does support "impersonation," but this is not the same thing and is supported by Windows-specific modules instead.) They will also not return anything useful for -o. Similarly, the -b and -c tests are specific to Unix device files and have no relevance on other platforms.
This tests for the existence of a file:
-e |
Return true if file exists. Equivalent to the return value of the stat function. |
These test for read, write, and execute for effective and real users. On non-Unix platforms, which don't have the concepts of real and effective users, the uppercase and lowercase versions are equivalent:
-r |
Return true if file is readable by effective user ID. |
-R |
Return true if file is readable by real user ID. |
-w |
Return true if file is writable by effective user ID. |
-W |
Return true if file is writable by real user ID. |
-x |
Return true if file is executable by effective user ID. |
-X |
Return true if file is executable by real user ID. |
The following test for ownership and permissions (-o returns 1, the others '' on non-Unix platforms). Note that these are Unix concepts: on Windows, files are owned by "groups" as opposed to "users":
-o |
Return true if file is owned by our real user ID. |
-u |
Return true if file is setuid (chmod u+s, executables only). |
-g |
Return true if file is setgid (chmod g+s, executables only). This does not exist on Windows. |
-k |
Return true if file is sticky (chmod +t, executables only). This does not exist on Windows. |
These tests for size work on Windows as on Unix:
-z |
Return true if file has zero length (that is, it is empty). |
-s |
Return true if file has non-zero length (opposite of -z ). |
The following are file type tests. While -f, -d, and -t are generic, the others are platform dependent:
-f |
Return true if file is a plain file. |
-d |
Return true if file is a directory. |
-l |
Return true if file is a symbolic link. |
-p |
Return true if file is a named pipe (FIFO). |
-S |
Return true if file is a socket. |
-b |
Return true if file is a block device. |
-c |
Return true if file is a character device. |
-t |
Return true if filehandle is opened to a terminal. |
The -T and -B tests determine whether a file is text or binary (for details, see "Testing Binary and Text Files" coming up shortly):
-T |
Return true if file is a text file. |
-B |
Return true if file is not a text file. |
The following tests return timestamps, and also work on Windows:
-M |
Return the age of the file as a fractional number of days, counting from the time at which the application started (which avoids a system call to find the current time). A smaller age means a more recent modification, so to pick the more recent of two files, we can write $file = (-M $file1 < -M $file2) ? $file1 : $file2; |
-A |
Return last access time. |
-C |
On Unix, return the last inode change time. (Not the creation time, as is commonly misconceived: it only coincides with the creation time so long as the inode has not changed since the file was created.) On other platforms, it returns the creation time. |
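Since these operators measure days relative to $^T (the time the program started), the absolute timestamps can be recovered with a little arithmetic. A minimal sketch, examining the running script itself:

```perl
#!/usr/bin/perl
# filetimes.pl - report a file's age and recover its modification time
use warnings;
use strict;

my $file = $0;    # examine this script

printf "%s was modified %.3f days ago\n", $file, -M $file;
printf "and last accessed %.3f days ago\n", -A $file;

# -M is relative to $^T, so convert the fractional age back to epoch seconds
my $mtime = $^T - (-M $file) * 24 * 60 * 60;
print "modified at: ", scalar localtime($mtime), "\n";
```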
Link Transparency and Testing for Links
This section is only relevant if our chosen platform supports the concept of symbolic links, which is to say all Unix variants but few other platforms. In particular, Windows "shortcuts" are an artifact of the desktop and unfortunately have nothing to do with the actual filing system.
The stat function, which is the basis of all the file test operators (except -l), automatically follows symbolic links and returns information based on the real file, directory, pipe, etc., that it finds at the end of the link. Consequently, file tests like -f and -d return true if the file at the end of the link is a plain file or directory. Therefore we do not have to worry about links when we just want to know if a file is readable:
my @lines;
if (-e $filename) {
    if (-r $filename) {
        open FILE, $filename;   # open file for reading
        @lines = <FILE>;
    } else {
        die "Cannot open $filename for reading\n";
    }
} else {
    die "Cannot open $filename - file does not exist\n";
}
If we want to find out if a file is actually a link, we have to use the -l test. This gathers information about the link itself and not the file it points to, returning true if the file is in fact a link. A practical upshot of this is that we can test for broken links by testing -l and -e:
if (-l $file and !-e $file) {
    print "'$file' is a broken link!\n";
}
This is also useful for testing that a file is not a link when we do not expect it to be. A utility designed to be run as root should check that files it writes to have not been replaced with links to /etc/passwd, for example.
Testing Binary and Text Files
The -T and -B operators test files to see if they are text or binary. They do this by examining the start of the file and counting the number of nontext characters present. If this number exceeds one third, the file is determined to be binary; otherwise, it is determined to be text. If a null (ASCII 0) character is seen anywhere in the examined data, then the file is assumed to be binary.
Since -T and -B only make sense in the context of a plain file, they are commonly combined with -f:
if (-f $file && -T $file) {
...
}
-T and -B differ from the other file test operators in that they perform a read of the file in question. When used on a filehandle, both tests read from the current position of the file pointer. An empty file, or a filehandle positioned at the end of the file, will return true for both -T and -B, since in these cases there is no data to determine which is the correct interpretation.
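As a small sketch, a classifier that labels each plain file given on the command line:

```perl
#!/usr/bin/perl
# classify.pl - label each argument as a text or binary file
use warnings;
use strict;

foreach my $file (@ARGV) {
    next unless -f $file;   # -T and -B only make sense for plain files
    if (-T $file) {
        print "$file: text\n";
    } elsif (-B $file) {
        print "$file: binary\n";
    }
}
```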
Reusing the Results of a Prior Test
The underlying mechanism behind the file test operators is a call to either stat or, in the case of -l, lstat. In order to test the file, each operator makes a call to stat to interrogate the file for information. If we want to make several tests, this is inefficient, because a disc access needs to be made in each case.
However, if we have already called stat or lstat for the file we want to test, then we can avoid these extra calls by using the special filehandle _, which substitutes the results of the last call to stat (or lstat) in place of accessing the file. Here is a short example that tests a file name in six different ways based on one call to lstat:
#!/usr/bin/perl
# statonce.pl
use warnings;
use strict;
print "Enter filename to test: ";
my $filename = <>;
chomp $filename;
if (lstat $filename) {
    print "$filename is a file\n" if -f _;
    print "$filename is a directory\n" if -d _;
    print "$filename is a link\n" if -l _;
    print "$filename is readable\n" if -r _;
    print "$filename is writable\n" if -w _;
    print "$filename is executable\n" if -x _;
} else {
    print "$filename does not exist\n";
}
Note that in this example we have used lstat, so the link test -l _ will work correctly. -l requires an lstat and not a stat, and it will generate an error if we try to use it with the results of a previous stat:
The stat preceding -l _ wasn't an lstat...
Caching of the results of stat and lstat works for prior file tests too, so we could also write something like this:
if (-e $filename) {
    print "$filename exists\n";
    print "$filename is a file\n" if -f _;
}
Or:
if (-f $filename && -T _) {
    print "$filename exists and is a text file\n";
}
The only drawback to this is that only -l calls lstat, so we cannot test for a link this way unless the first test is -l.
Access Control Lists, the Superuser, and the filetest Pragma
The file tests -r, -w, and -x and their uppercase counterparts determine their return value from the results of the stat function. Unfortunately, this does not always produce an accurate result. Reasons these file tests may produce incorrect or misleading results include running as the superuser, who bypasses the normal permission checks; access control lists that grant or deny access beyond the basic permission bits; and file systems that are mounted read-only.
All these cases tend to produce "false positive" results, implying that the file is accessible when in fact it is not. For example, the file may be writable, but the file system is not.
In the case of the superuser, -r, -R, -w, and -W will always return true, even if the file is set as unreadable and unwritable, because the superuser can simply disregard the actual file permissions. Similarly, -x and -X will return true if any of the execute permissions (user, group, other) are set. To check whether the file is really writable, we must use stat and check the file permissions directly:
$mode = ((stat $filename)[2]);
$writable = $mode & 0200; # test for owner write permission
Tip Again, this is a Unix-specific example. Other platforms do not support permissions or support them in a different way.
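Rather than hard-coding octal masks like 0200, the Fcntl module exports symbolic constants for the individual mode bits, which makes the intent clearer. A Unix-oriented sketch:

```perl
#!/usr/bin/perl
# permbits.pl - decode a file's permission bits symbolically
use warnings;
use strict;
use Fcntl ':mode';   # imports S_IRUSR, S_IWUSR, S_IWGRP, S_IWOTH, ...

my $file = shift @ARGV || $0;
my $mode = (stat $file)[2];

print "owner can read\n"   if $mode & S_IRUSR;
print "owner can write\n"  if $mode & S_IWUSR;
print "group can write\n"  if $mode & S_IWGRP;
print "others can write\n" if $mode & S_IWOTH;
printf "permission bits: %04o\n", $mode & 07777;
```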
For the other cases, we can try to use the filetest pragma, which alters the operation of the file tests for access by overriding them with more rigorous tests that interrogate the operating system instead. Currently there is only one mode of operation, access, which causes the file test operators to use the underlying access system call, if available:
use filetest 'access';
This modifies the behavior of the file test operators to use the operating system's access call to check the true permission of a file, as modified by access control lists or file systems that are mounted read-only. It also provides an access subroutine, which allows us to make our own direct tests of file names (note that it does not work on filehandles). It takes a file name and a numeric flag containing the permissions we want to check for. These are defined as constants in the POSIX module and are listed in Table 13-2.
Table 13.2. POSIX File Permission Constants
Constant | Description |
R_OK |
Test file has read permission. |
W_OK |
Test file has write permission. |
X_OK |
Test file has execute permission. |
F_OK |
Test that file exists. Implied by R_OK, W_OK, or X_OK. |
Note that F_OK is implied by the other three, so it need never be specified directly (to test for existence, we can as easily use the -e test, or -f if we require a plain file).
While access provides no extra functionality over the standard file tests, it does allow us to make more than one test simultaneously. As an example, to test that a file is both readable and writable, we would use:
use filetest 'access';
use POSIX;
...
$can_readwrite = access($filename, R_OK|W_OK);
The return value from access is undef on failure and "0 but true" on success (a string that evaluates to zero in a numeric context and true in a Boolean context, for instance, an if or while condition). On failure, $! is set to indicate the reason.
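As a minimal sketch of the pragma in effect, here the ordinary -r and -w operators consult the operating system's access call rather than the raw stat permission bits:

```perl
#!/usr/bin/perl
# accesstest.pl - file tests routed through the access system call
use warnings;
use strict;
use filetest 'access';   # -r, -w, -x, etc., now ask the OS directly

my $file = shift @ARGV || $0;
print "$file is readable\n" if -r $file;
print "$file is writable\n" if -w $file;
print "$file is readable and writable\n" if -r $file and -w $file;
```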
Automating Multiple File Tests
We often want to perform a series of different file tests across a range of different files. Installation scripts, for example, often do this to verify that all the installed files are in the correct place and with the correct permissions.
While it is possible to work through a list of files manually, we can make life a little simpler by using the File::CheckTree module instead. This module provides a single subroutine, validate, that takes a series of file names and -X style file tests and applies each of them in turn, generating warnings as it does so.
Unusually for a library subroutine, validate accepts its input in lines, in order to allow the list of files and tests to be written in the style of a manifest. In the following example, validate is being used to check for the existence of three directories and an executable file installed by a fictional application:
$warnings = validate(q{
/home/install/myapp/scripts -d
/home/install/myapp/docs -d
/home/install/myapp/bin -d
/home/install/myapp/bin/myapp -fx
});
validate returns the number of warnings generated during the test, so we can use it as part of a larger installation script. If we want to disable or redirect the warnings, we can do so by defining a signal handler:
$SIG{__WARN__} = sub { };                     # do nothing
$SIG{__WARN__} = sub { print LOGFILE @_ };    # redirect to install log
The same file may be listed any number of times, with different tests applied each time. Alternatively, multiple tests may be bunched together into one file test, so that instead of specifying two tests one after the other, they can be done together. Hence, instead of writing two lines:
/home/install/myapp/bin/myapp -f
/home/install/myapp/bin/myapp -x
we can write both tests as one line:
/home/install/myapp/bin/myapp -fx
The second test is dependent on the first, so only one warning can be generated from a bunched test. If we want to test for both conditions independently (we want to know if it is not a plain file, and we also want to know if it is not executable), we need to put the tests on separate lines.
Tests may also be negated by prefixing them with a !, in which case all the individual tests must fail for the line to succeed. For example, to test whether a file is neither setuid nor setgid:
validate(q{
/home/install/myapp/scripts/myscript.pl !-ug
})
Normal and negated tests cannot be bunched, so if we want to test that a file name corresponds to a plain file that is not executable, we must use separate tests:
validate(q{
/home/install/myapp/scripts/myscript.pl -f
/home/install/myapp/scripts/myscript.pl !-xug
})
Rather than a file test operator, we can also supply the command cd. This causes the directory named at the start of the line to become the current working directory. Any relative paths given after this are taken relative to that directory until the next cd, which may itself be relative:
validate(q{
/home/install/myapp cd || die
scripts -rd
cgi cd
guestbook.cgi -xg
guestbook.cgi !-u
.. cd
about_us.html -rf
text.bin -f || warn "Not a plain file"
});
Tip validate is insensitive to extra whitespace, so we can use additional spacing to clarify which file is being tested where. In the preceding example, we have indented the files to make it clear which directory they are being tested in.
We can supply our own warnings and make tests fatal by suffixing the file test with ||
and either warn
or die
. These work in exactly the same way as their Perl function counterparts. If our own error messages are specified, we can use the variable $file
, supplied by the module, to insert the name of the file whose test failed:
validate(q{
/etc -d || warn "What, no $file directory?\n"
/var/spool -d || die
})
This trick relies on the error messages being interpolated at run time, so using single quotes or the q
quoting operator is essential in this case.
One of the advantages of File::CheckTree
is that the file list can be built dynamically, possibly generated from an existing file tree created by File::Find
(detailed in "Finding Files" later in this chapter). For example, using File::Find
, we can determine the type and permissions of each file and directory in a tree, then generate a test list suitable for File::CheckTree
to validate new installations of that tree. See the section "Finding Files" and the other modules in this section for pointers.
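As a sketch of this approach (`checklist_for` is a name invented for this example), the following walks a tree with File::Find and emits one test per entry, recording each entry's current type and executability in a form suitable for feeding to File::CheckTree's validate:

```perl
#!/usr/bin/perl
# maketests.pl - sketch: generate a File::CheckTree-style test list
# from an existing file tree
use warnings;
use strict;
use File::Find;

sub checklist_for {
    my $root = shift;
    my @checks;
    find({ no_chdir => 1, wanted => sub {
        my $path = $File::Find::name;   # full path, thanks to no_chdir
        if    (-d $path) { push @checks, "$path -d"  }
        elsif (-x _)     { push @checks, "$path -fx" } # plain and executable
        elsif (-f _)     { push @checks, "$path -f"  }
    }}, $root);
    return @checks;
}

# the result could then be handed to File::CheckTree:
# use File::CheckTree;
# validate(join "\n", checklist_for('/home/install/myapp'));
```

The `_` filehandle reuses the stat performed by the preceding `-d` test, so each entry is only stat'ed once.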
If we want to know more than one attribute of a file, we can skip multiple file test operators and instead make use of the stat
or lstat
functions directly. Both functions return details of the file name or filehandle supplied as their argument. lstat
is identical to stat
except in the case of a symbolic link, where stat
will return details of the file pointed to by the link and lstat
will return details of the link itself. In either case, a 13-element list is returned:
# stat filehandle into a list
@stat_info = stat FILEHANDLE;
# lstat file name into separate scalars
($dev, $ino, $mode, $nlink, $uid, $gid, $rdev, $size,
$time, $mtime, $ctime, $blksize, $blocks) = lstat $filename;
The stat
function will also work on a filehandle, though the information returned is greatly influenced by the type of filehandle under interrogation:
my @stdin_info=stat STDIN; # stat standard input
opendir CWD, ".";
my @cwd_info=stat CWD; # stat a dir handle
Note The lstat
function will not work on a filehandle and will generate a warning if we try. lstat
only makes sense for actual files since it is concerned with symbolic links, a filing system concept that does not translate to filehandles.
The thirteen values are always returned, but they may not be defined or have meaning in every case, either because they do not apply to the file or filehandle being tested or because they have no meaning on the underlying platform. Thirteen values is a lot, so the File::stat
module provides an object-oriented interface that lets us refer to these values by name instead. The full list of values, including the meanings and index number, is provided in Table 13-3. The name in the first column is the conventional variable name used previously and also the name of the method provided by the File::stat
module.
Several of the values returned by stat
relate to the "inode" of the file. Under Unix, the inode of a file is a numeric ID allocated by the file system, and it is the file's "true" identity, with the file name being just an alias. On platforms that support it, more than one file name may point to the same file, and the number of hard links is returned in the nlink
value and may be greater than one, but not less (since that would mean the inode had no file names). The ctime
value indicates the last time the file's inode changed, which often, but not reliably, corresponds to the creation time. Conversely, the access and modification times refer to access of the file's actual contents.
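For instance, we can pull these inode-related values straight out of the stat list by index (a minimal sketch; indices 1, 3, and 10 are the inode number, link count, and inode change time respectively):

```perl
#!/usr/bin/perl
# inodeinfo.pl - report a file's inode number, hard link count, and
# inode change time from stat's 13-element list
use warnings;
use strict;

my ($ino, $nlink, $ctime) = (stat $0)[1, 3, 10];
printf "%s: inode %s, %d link(s), inode last changed %s\n",
    $0, $ino, $nlink, scalar localtime $ctime;
```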
On other platforms, some of these values are either undefined or meaningless. Under Windows, the device number is related to the drive letter, there is no inode, and the value of nlink
is always 1. The uid
and gid
values are always zero, and no value is returned for either blocksize
or blocks
, either. There is a mode, though only the file type is useful; the permissions are always 777
. While Windows NT/2000/XP supports a fairly complex permissions system, it is not accessible this way; the Win32::FileSecurity
and Win32::FilePermissions
modules must be used instead. Accessing the values returned by stat
can be a little inconvenient, not to mention inelegant. For example, this is how we find the size of a file:
$size = (stat $filename)[7];
Or, printing it out:
print ((stat $filename)[7]); # need to use extra parentheses with print
Unless we happen to know that the eighth element is the size or we are taking care to write particularly legible code, this leads to unfriendly code. Fortunately, we can use the File::stat
module instead.
Using stat Objects
The File::stat
module simplifies the use of stat
and lstat
by overriding them with subroutines that return stat
objects instead of a list. These objects can then be queried using one of File::stat
's methods, which have the same names as the values that they return.
As an example, this short program uses the size, blksize
, and blocks
methods to return the size of the file supplied on the command line:
#!/usr/bin/perl
# filesize.pl
use warnings;
use strict;
use File::stat;
print "Enter filename: ";
my $filename = <>;
chomp $filename;
if (my $stat = stat $filename) {
    print "'$filename' is ", $stat->size,
        " bytes and occupies ", $stat->blksize * $stat->blocks,
        " bytes of disc space\n";
} else {
    print "Cannot stat $filename: $!\n";
}
As an alternative to using object methods, we can import 13 scalar variables containing the results of the last stat
or lstat
into our program by adding an import list of :FIELDS
. Each variable takes the same name as the corresponding method prefixed with the string st_
. For example:
#!/usr/bin/perl
# filesizefld.pl
use warnings;
use strict;
use File::stat qw(:FIELDS);
print "Enter filename: ";
my $filename = <>;
chomp($filename);
if (stat $filename) {
    print "'$filename' is ", $st_size,
        " bytes and occupies ", $st_blksize * $st_blocks,
        " bytes of disc space\n";
} else {
    print "Cannot stat $filename: $!\n";
}
The original versions of stat
and lstat
can be used by prefixing them with the CORE::
package name:
use File::stat;
...
my @new_stat = stat $filename; # use new 'stat'
my @old_stat = CORE::stat $filename; # use original 'stat'
Alternatively, we can prevent the override from happening by supplying an empty import list:
use File::stat qw(); # or '', etc.
We can now use the File::stat stat
and lstat
methods by qualifying them with the full package name:
my $stat = File::stat::stat $filename;
print "File is ", $stat->size(), " bytes\n";
There are three basic kinds of file attribute that we can read and attempt to modify: ownership, access permissions, and the access and modification timestamps. Unix and other platforms that support the concept of file permissions and ownership can make use of the chmod
and chown
functions to modify the permissions of a file from Perl. chmod
modifies the file permissions of a file for the three categories: user, group
, and other
. The chown
function modifies which user corresponds to the user
permissions and which group corresponds to the group
permissions. Every other user and group falls under the other
category. Ownership and permissions are therefore inextricably linked and are combined into the mode
value returned by stat
.
File Ownership
File ownership is a highly platform-dependent concept. Perl grew up on Unix systems, and so it attempts to handle ownership in a Unix-like way. Under Unix and other platforms that borrowed their semantics from Unix, files have an owner, represented by the file's user ID, and a group owner, represented by the file's group ID. Each relates to a different set of file permissions, so the user may have the ability to read and write a file, whereas other users in the same group may only get to read it. Others may not have even that, depending on the setting of the file permissions.
File ownership is handled by the chown
function, which maps to both the chown
and chgrp
system calls. It takes at least three parameters, a user ID, a group ID, and one or more files to change:
my @successes = chown $uid, $gid, @files;
The number of files successfully changed is returned. If only one file is given to chown
, this allows a simple Boolean test to be used to determine success:
unless (chown $uid, $gid, $filename) {
    die "chown failed: $!\n";
}
To change only the user or group, supply -1
as the value for the other parameter. For instance, a chgrp
function can be simulated with
sub chgrp {
    return chown(shift, -1, @_);
}
Note that on most systems (that is, most systems that comprehend file ownership in the first place), usually only the superuser can change the user who owns the file, though the group can be changed to another group that the same user belongs to. It is possible to determine if a change of ownership is permitted by calling the sysconf
function:
use POSIX qw(sysconf _PC_CHOWN_RESTRICTED);
my $chown_restricted = sysconf(_PC_CHOWN_RESTRICTED);
If this returns a true value, then a chown
will not be permitted.
chown
needs a user or group ID to function; it will not accept a user or group name. To deduce a user ID from the name, at least on a Unix-like system, we can use the getpwnam
function. Likewise, to deduce a group ID from the name, we can use the getgrnam
function. We can use getpwent
and getgrent
instead to retrieve one user or group respectively, as we saw in the section "Getting User and Group Information" earlier in the chapter. As a quick example, the following script builds tables of user and group IDs, which can be subsequently used in chown
:
#!/usr/bin/perl
use warnings;
use strict;
# get user names and primary groups
my (%users, %usergroup);
while (my ($name, $passwd, $uid, $gid) = getpwent) {
$users{$name} = $uid;
$usergroup{$name} = $gid;
}
# get group names and gids
my (%groups, @groups);
while (my ($name, $passwd, $gid) = getgrent) {
$groups{$name} = $gid;
$groups[$gid] = $name;
}
# print out basic user and group information
foreach my $user (sort {$users{$a} <=> $users{$b}} keys %users) {
    print "$users{$user}: $user, group ", $usergroup{$user},
        " (", $groups[$usergroup{$user}], ")\n";
}
File Permissions
Perl provides two functions that are specifically related to file permissions, chmod
and umask
. As noted earlier, these will work for any Unix-like platform, including MacOS X, but not Windows, where the Win32::FileSecurity
and Win32::FilePermissions
modules must be used. The chmod
function allows us to set the permissions of a file. Permissions are grouped into three categories: user
, which applies to the file's owner, group
, which applies to the file's group owner, and other
, which applies to anyone who is not the file's owner or a member of the file's group owner. Within each category each file may be given read, write, and execute permission.
chmod
represents each of the nine values (3 categories × 3 permissions) by a different numeric flag, which are traditionally put together to form a three-digit octal number, each digit corresponding to the respective category. The flag values within each digit are 4
for read permission, 2
for write permission, and 1
for execute permission, as demonstrated by the following examples (prefixed by a leading 0
to remind us that these are octal values):
0200 |
Owner write permission |
0040 |
Group read permission |
0001 |
Other execute permission |
The total of the read, write, and execute permissions for a category is 7
, which is why octal is so convenient to represent the combined permissions flag. Read, write, and execute permission for the owner only would be represented as 0700
. Similarly, read, write, and execute permission for the owner, read and execute permission for the group, and execute-only permission for everyone else would be 0751
, which is 0400
+ 0200
+ 0100
+ 0040
+ 0010
+ 0001
.
Having explained the permissions flag, the chmod
function itself is comparatively simple, taking a permissions flag, as calculated previously, as its first argument and applying it to one or more files given as the second and subsequent arguments. For example:
chmod 0751, @files;
As with chown
, the number of successfully chmodded
files is returned, or zero if no files were changed successfully. If only one file is supplied, the return value of chmod
can be tested as a Boolean result in an if
or unless
statement:
unless (chmod 0751, $file) {
    die "Unable to chmod: $!\n";
}
The umask
function allows us to change the default permissions mask used whenever Perl creates a new file. The bits in the umask
have the opposite meaning to the permissions passed to chmod
. Any bit set in the umask
is removed from the permissions requested by open
or sysopen
when the file is created. Thus the permission bits of the umask
mask the permissions that open
and sysopen
try to set. Table 13-4 shows the permission bits that can be used with umask
and their meanings.
Table 13-4. umask File Permissions
umask Number | File Permission |
0 |
Read and write |
1 |
Read and write |
2 |
Read only |
3 |
Read only |
4 |
Write only |
5 |
Write only |
6 |
No read and no write |
7 |
No read and no write |
umask
only defines the access permissions. Called without an argument, it returns the current value of the umask
, which is inherited from the shell and is typically set to a value of 002
(mask other write permission) or 022
(mask group and other write permissions):
$umask = umask;
Alternatively, umask
may be called with a single numeric parameter, traditionally expressed in octal or alternatively as a combination of mode flags as described previously. For example:
umask 022;
Overriding the umask
explicitly is not usually a good idea, since the user might have it set to a more restrictive value. A better idea is to combine the permissions we want to restrict with the existing umask
, using a bitwise OR
. For example:
umask (022 | umask);
The open
function always uses permissions of 0666
(read and write for all categories), whereas sysopen
allows the permissions to be specified in the call. Since umask
controls the permissions of new files by removing unwanted permissions, we do not need to (and generally should not) specify more restrictive permissions to sysopen
.
File Access Times
The built-in utime
function provides the ability to change the last access and last modification time of one or more files. It takes at least three arguments: the new access time, in seconds since 1970/1/1 00:00:00, the new modification time, also in seconds, and then the file or files whose times are to be changed. For example:
my $onedayago=time - 24*60*60;
utime $onedayago, time(), "myfile.txt", "my2ndfile.txt";
This will set the specified files to have a last access time of exactly a day ago and a last modification time of right now. From Perl 5.8, we can also specify undef
to mean "right now," so to emulate the Unix touch
command on all C or C++ files in the current directory, we could use
utime undef, undef, <*.c>, <*.cpp>;
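Wrapped up as a reusable subroutine (touch here is our own name for it), the same idea can also create any files that do not yet exist, as the real touch command does; it relies on the Perl 5.8 undef-means-now behavior:

```perl
#!/usr/bin/perl
# touch.pl - sketch: emulate the Unix touch command
use warnings;
use strict;

sub touch {
    my @files = @_;
    foreach my $file (@files) {
        next if -e $file;
        # create missing files empty, as the real touch does
        open my $fh, '>>', $file or warn "cannot create $file: $!\n";
        close $fh if $fh;
    }
    # undef for both times means "now" (Perl 5.8 and later)
    return utime undef, undef, @files;
}
```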
The Fcntl
module provides symbolic constants for all of the flags contained in both the permissions and the file type parts of the mode value. It also provides two functions for extracting each part, as an alternative to computing the values by hand:
use Fcntl qw(:mode); # import file mode constants
my $type = S_IFMT($mode); # extract file type
my $perm = S_IMODE($mode); # extract file permissions
printf "File permissions are: %o\n", $perm;
The file type part of the mode defines the type of the file and is the basis of the file test operators like -d, -f
, and -l
that test for the type of a file. The Fcntl
module defines symbolic constants for these, and they are summarized in Table 13-5.
Table 13-5. Fcntl Module File Test Symbols
Name | Description | Operator |
S_IFREG |
Regular file | -f |
S_IFDIR |
Directory | -d |
S_IFLNK |
Link | -l |
S_IFBLK |
Block special file | -b |
S_IFCHR |
Character special file | -c |
S_IFIFO |
Pipe or named fifo | -p |
S_IFSOCK |
Socket | -S |
S_IFWHT |
Whiteout entry (BSD only) | (none) |
Note that Fcntl
also defines a number of subroutines that test the mode for the desired property. These have very similar names, for example, S_IFDIR
(a flag) and S_ISDIR (a subroutine)
, and it is easy to get the subroutines and flags confused. Since we have the file test operators, we do not usually need to use these subroutines, so we mention them only to eliminate possible confusion.
These flags can also be used with sysopen, IO::File
's new
method, and the stat
function described previously, where they can be compared against the mode value. As an example of how these flags can be used, here is the equivalent of the -d
file test operator written using stat
and the Fcntl
module:
my $mode = (stat $filename)[2];
my $is_directory = (S_IFMT($mode) == S_IFDIR);
Or, to test that a file is neither a block nor a character special file:
my $type = S_IFMT($mode);
my $is_not_special = ($type != S_IFBLK && $type != S_IFCHR);
The Fcntl
module also defines functions that do this for us. Each function takes the same name as the flag but with S_IF
replaced with S_IS
. For instance, to test for a directory, we can instead use
my $is_directory = S_ISDIR($mode);
Of course, the -d
file test operator is somewhat simpler in this case.
The permissions part of the mode defines the read, write, and execute privileges that the file grants to the file's owner, the file's group, and others. It is the basis of the file test operators like -r, -w, -u
, and -g
that test for the accessibility of a file. The Fcntl
module also defines symbolic constants for these, summarized in Table 13-6.
Table 13-6. Fcntl Module File Permission Symbols
Name | Description |
S_IRWXU |
User read, write, and execute |
S_IRUSR |
User read |
S_IWUSR |
User write |
S_IXUSR |
User execute |
S_IRWXG |
Group read, write, and execute |
S_IRGRP |
Group read |
S_IWGRP |
Group write |
S_IXGRP |
Group execute |
S_IRWXO |
Other read, write, and execute |
S_IROTH |
Other read |
S_IWOTH |
Other write |
S_IXOTH |
Other execute |
S_ISUID |
Setuid |
S_ISGID |
Setgid |
S_ISVTX |
Sticky (save text) |
For example, to test a file for user read and write permission, plus group read permission, we could use
my $perms_ok = $mode & (S_IRUSR | S_IWUSR | S_IRGRP);
To test that a file has exactly these permissions and no others, we would instead write
my $exact_perms = (S_IMODE($mode) == (S_IRUSR | S_IWUSR | S_IRGRP));
The file permission flags are useful not only for making sense of the mode value returned by stat
, but also as inputs for the chmod
function. Consult the manual page for the chmod
system call (on Unix platforms) for details of the more esoteric bits such as sticky
and swap
.
The presence of file names can be manipulated directly with the link
and unlink
built-in functions. These provide the ability to edit the entries for files in the file system, creating new ones or removing existing ones. They are not the same as creating and deleting files, however. On platforms that support the concept, link
creates a new link (entry in the filing system) to an existing file, it does not create a copy (except on Windows, where it does exactly this). Likewise, unlink
removes a file name from the filing system, but if the file has more than one link, and therefore more than one file name, the file will persist. This is an important point to grasp, because it often leads to confusion.
Linking Files
The link
function creates a new link (sometimes called a hard link, to differentiate it from a soft or symbolic link) for the named file. It only works on platforms that support multiple hard links for the same file:
if (link $currentname, $newname) {
    print "Linked $currentname to $newname ok\n";
} else {
    warn "Failed to link: $!\n";
}
link
will not create links for directories, though it will create links for all other types of files. For directories, we can create symbolic links only. Additionally, we cannot create hard links across different file systems, and some file systems (for example, AFS) do not support hard links at all. On Unix, link
works by giving two names in the file system the same underlying inode. On Windows and other file systems that do not support this concept, an attempt to link will create a copy of the original file.
On success, link
returns true, and a new file name will exist for the file. The old one continues to exist and can either be used to read or alter the contents of the file. Both links are therefore exactly equivalent. Immediately after creation, the new link will carry the same permissions and ownership as the original, but this can subsequently be changed with the chmod
and chown
built-in functions to, for example, create a read-only and a read-write entry point to the same data.
Deleting and Unlinking Files
The opposite of linking is unlinking. Files can be unlinked with the built-in unlink
function, which takes one or more file names as a parameter. If no file name is supplied, unlink
uses $_
:
unlink $currentname; # single file
foreach (<*.*>) {
unlink if /\.bak$/; # unlink $_ if it ends '.bak'
}
unlink <*.bak>; # the same, via a file glob
On platforms where unlinking does not apply (because multiple hard links are not permissible), unlink
simply deletes the file. Otherwise, unlink
is not necessarily the same as deleting a file, for two reasons. First, if the file has more than one link, then it will still be available by other names in the file system. Although we cannot (easily) find out the names of the other links, we can find out how many links a file has through stat
. We can establish in advance if unlink
will really delete the file or just remove one of its links by calling stat
:
my $links = (stat $filename)[3];
Or more legibly with the File::stat
module:
use File::stat;
my $stat = stat $filename;
my $links = $stat->nlink;
Second, on platforms that support it (generally Unix-like ones), if any process has an open filehandle for the file, then it will persist for as long as the filehandle persists. This means that even after an unlink
has completely removed all links to a file, it will still exist and can be read, written, and have its contents copied to a new file. Indeed, the new_tmpfile
method of IO::File
does exactly this if it is possible and true anonymous temporary files are not available—"Temporary Files" covers this in detail later in this chapter. On other platforms (for example, Windows), Perl will generally reject the attempt to unlink the file so long as a process holds an open filehandle on it. Do not rely on the underlying platform allowing a file to be deleted while it is still open; close it first to be sure.
The unlink
function will not unlink directories unless three criteria are met: we are on Unix, Perl was given the -U
flag, and we have superuser privilege. Even so, it is an inadvisable thing to do, since it will also remove the directory contents, including any subdirectories and their contents from the filing system hierarchy, but it will not recycle the same space that they occupy on the disc. Instead they will appear in the lost+found
directory the next time an fsck
filing system check is performed, which is unlikely to be what we intended. The rmdir
built-in command covered later in the chapter is the preferred approach, or see the rmtree
function from File::Path
for more advanced applications involving multiple directories.
Given the preceding, renaming a file is just a case of linking it to a new name, then unlinking it from the old, at least under Unix. The following subroutine demonstrates a generic way of doing this:
sub rename {
my ($current, $new) = @_;
unlink $current if link($current, $new);
}
The built-in rename
function is essentially equivalent to the preceding subroutine:
# using the built-in function:
rename($current, $new);
This is effective for simple cases, but it will fail in a number of situations, most notably if the new file name is on a different file system from the old (a floppy disk to a hard drive, for instance). rename
uses the rename
system call, if available. It may also fail on (non-Unix) platforms that do not allow an open file to be renamed.
For a properly portable solution that works across all platforms, consider using the move
routine from the File::Copy
module. For the simpler cases it will just use rename, but it will also handle special cases and platform limitations.
Symbolic Links
On platforms that support it, we can also create a soft
or symbolic link with the built-in symlink
function. This is syntactically identical to link
but creates a pointer to the file rather than a direct hard link:
unless (symlink $currentname, $newname) {
    die "Failed to link: $!\n";
}
The return value from symlink
is 1
on success or 0
on failure. On platforms that do not support symbolic links (a shortcut
is an invention of the Windows desktop, not the file system), symlink
produces a fatal error. If we are writing code to be portable, then we can protect against this by using eval
:
my $linked = eval {symlink($currentname, $newname);};
if (not defined $linked) {
    warn "Symlink not supported\n";
} elsif (not $linked) {
    warn "Link failed: $!\n";
}
To test whether symlink
is available without actually creating a symbolic link, supply an empty file name for both arguments:
my $symlinking = eval {symlink('',''), 1};
If the symlink
fails, eval
will return undef
when it tries to execute the symlink
. If it succeeds, the 1
will be returned. This is a generically useful trick for all kinds of situations, of course.
Symbolic links are the links that the -l
and lstat
functions check for; hard links are indistinguishable from ordinary file names because they are ordinary file names. Most operations performed on symbolic links (with the notable exceptions of -l
and lstat
of course) are transferred to the linked file, if it exists. In particular, symbolic links have the generic file permissions 777
, meaning everyone is permitted to do everything. However, this only means that the permissions of the file that the link points towards take priority. An attempt to open the link for writing will be translated into an attempt to open the linked file and check its permissions rather than those of the symbolic link. Even chmod
will affect the permissions of the real file, not the link.
Symbolic links may legally point to other symbolic links, in which case the end
of the link is the file that the last symbolic link points to. If the file has subsequently been moved or deleted, the symbolic link is said to be "broken." We can check for broken links with
if (-l $linkname and !-e $linkname) {
    print "$linkname is a broken link!\n";
}
See "Interrogating Files with stat and lstat" earlier in the chapter for more on this (and in particular why the special file name _
cannot be used after -e
in this particular case) and some variations on the same theme.
One way to copy a file to a new name is to open a filehandle for both the old and the new names and copy data between them, as this rather simplistic utility attempts to do:
#!/usr/bin/perl
# dumbcopy
use warnings;
use strict;
print "Filename: ";
my $infile = <>;
chomp $infile;
print "New name: ";
my $outfile = <>;
chomp $outfile;
open IN, $infile;
open OUT, "> $outfile";
print OUT <IN>;
close IN;
close OUT;
The problem with this approach is that it does not take into account the existing file permissions and ownerships. If we run this on a Unix platform and the file we are copying happens to be executable, the copy will lose the executable permissions. If we run this on a system that cares about the difference between binary and text files, the file can become corrupted unless we also add a call to binmode
. Fortunately, the File::Copy
module handles these issues for us.
The File::Copy
module provides subroutines for moving and copying files without having to directly manipulate them via filehandles. It also correctly preserves the file permissions. To make use of it, we just need to use it:
use File::Copy;
File::Copy
contains two primary subroutines, copy
and move
. copy
takes the names of two files or filehandles as its arguments and copies the contents of the first to the second, creating it if necessary. If the first argument is a filehandle, it is read from; if the second is a filehandle, it is written to. For example:
copy "myfile", "myfile2"; # copy one file to another
copy "myfile", *STDOUT; # copy file to standard output
copy LOG, "logfile"; # copy from a filehandle to a file
If neither argument is a filehandle, copy
does a system copy in order to preserve file attributes and permissions. This copy is directly available as the syscopy
subroutine and is portable across platforms, as we will see in a moment.
copy
also takes a third, optional argument, which if specified determines the buffer size to use. For instance, to copy the file in chunks of 16K, we might use
copy "myfile", "myfile2", 16 * 1024;
Without a buffer size, copy
will default to the size of the file, or 2MB, whichever is smaller. Setting a smaller buffer will cause the copy to take longer, but it will use less memory while doing it.
move
takes the names of two files (not filehandles) as its arguments and attempts to move the file named by the first argument to have the name given as the second. For example:
move "myfile", "myfile2"; # move file to another name
If possible, move
will rename the file using the link
and unlink
functions. If not, it will copy the file using copy
and then delete the original. Note, however, that in this case we cannot set a buffer size as an optional third parameter.
If an error occurs with either copy
or move
(for example, because the file system runs out of space), the destination file may be left incomplete. In the case of a move
that had to copy the file, this can lose information, since the original is removed afterwards. Where that matters, it is safer to copy the file ourselves and unlink the original only once the copy has been verified.
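A minimal sketch of that safer approach (safe_move is a name invented for this example; it does not fall back on a cheap rename the way File::Copy's move does):

```perl
#!/usr/bin/perl
# safemove.pl - sketch: copy first, unlink only after the copy succeeds
use warnings;
use strict;
use File::Copy qw(copy);

sub safe_move {
    my ($from, $to) = @_;
    copy($from, $to) or return 0;   # original left intact on failure
    return unlink $from;            # remove the original only now
}
```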
On platforms that care about binary and text files (for example, Windows), to make a copy explicitly binary, use binmode
or make use of the open
pragmatic module described earlier in the chapter.
Here is a rewritten version of the file copy utility we started with. Note that it is not only better, but also it is considerably smaller:
#!/usr/bin/perl
# smartcopy.pl
use warnings;
use strict;
use File::Copy;
print "Filename: ";
my $infile = <>;
chomp $infile;
print "New name: ";
my $outfile = <>;
chomp $outfile;
unless (copy $infile, $outfile) {
    print "Failed to copy '$infile' to '$outfile': $!\n";
}
As a special case, if the first argument to copy
or move
is a file name and the second is a directory, then the destination file is placed inside the directory with the same name as the source file.
Unix aficionados will be happy to know that the aliases cp
and mv
are available for copy
and move
and can be imported by specifying one or both of them in the import list:
use File::Copy qw(cp mv);
System Level Copies and Platform Portability
As well as the standard copy
, which works with either file names or filehandles, File::Copy
defines the syscopy
subroutine, which provides direct access to the copy
function of the underlying operating system. The copy
subroutine calls syscopy
if both arguments are file names and the second is not a directory (as seen in the previous section); otherwise, it opens whichever argument is not a filehandle and performs a read-write copy through the filehandles.
The syscopy
calls the underlying copy
system call supplied by the operating system and is thus portable across different platforms. Under Unix, it calls the copy
subroutine, as there is no system copy
call. Under Windows, it calls the Win32::CopyFile
module. Under OS/2 and VMS, it calls syscopy
and rmscopy
, respectively. This makes the File::Copy
module an effective way to copy files without worrying about platform dependencies.
The File::Compare
module is a standard member of the Perl standard library that provides portable file comparison features for our applications. It provides two main subroutines, compare
and compare_text
, both of which are available when using the module:
use File::Compare;
The compare
subroutine simply compares two files or filehandles byte for byte, returning 0
if they are equal, 1
if they are not, and -1
if an error was encountered:
SWITCH: foreach (compare $file1, $file2) {
    /^0/ and print("Files are equal"), last;
    /^1/ and print("Files are not equal"), last;
    print "Error comparing files: $!\n";
}
compare
also accepts a third optional argument, which if specified defines the size of the buffer used to read from the two files or filehandles. This works in an identical manner to the buffer size of File::Copy
's copy
subroutine, defaulting to the size of the file or 2MB, whichever is smaller, if no buffer size is specified. Note that compare
automatically puts both files into a binary mode for comparison.
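For example, this sketch (the file names are hypothetical) compares two files using an explicit 16KB buffer:

```perl
#!/usr/bin/perl
# comparesketch.pl - sketch of 'compare' with an explicit buffer size;
# the file names are hypothetical
use warnings;
use strict;
use File::Compare;

# create two identical files to compare
for my $name ('a.dat', 'b.dat') {
    open my $fh, '>', $name or die "Cannot create $name: $!\n";
    print $fh "identical content\n";
    close $fh;
}

# the third argument is the read buffer size in bytes
my $result = compare('a.dat', 'b.dat', 16 * 1024);
print $result == 0 ? "equal\n" : "not equal\n";
```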
The compare_text
function operates identically to compare
but takes as its third argument an optional code reference to an anonymous comparison subroutine. Unlike compare, compare_text
compares files in text mode (assuming that the operating system draws a distinction), so without the third parameter, compare_text
simply compares the two files in text mode.
The comparison subroutine, if supplied, should return a Boolean result: 0
(false) if the lines should be considered equal and 1
(true) otherwise. The default that operates when no explicit comparison is provided is equivalent to
sub {$_[0] ne $_[1]}
We can supply our own comparison subroutines to produce different results. For example, this comparison checks files for case-insensitive equivalence:
my $result = compare_text ($file1, $file2, sub {lc($_[0]) ne lc($_[1])});
Similarly, this comparison uses a named subroutine that strips extra whitespace from the start and end of lines before comparing them:
sub stripcmp {
    my ($a, $b) = @_;
    $a =~ s/^\s*(.*?)\s*$/$1/;
    $b =~ s/^\s*(.*?)\s*$/$1/;
    return $a ne $b;
}
my $result = compare_text($file1, $file2, \&stripcmp);
For those who prefer more Unix-like nomenclature, cmp
may be used as an alias for compare
by importing it specifically:
use File::Compare qw(cmp);
The File::Find
module provides a multipurpose file-finding subroutine that we can configure to operate in a number of different ways. It supplies one subroutine, find
, which takes a first parameter of either a code or hash reference that configures the details of the search and one or more subsequent parameters defining the starting directory or directories to begin from. A second, finddepth
, finds the same files as find
but traverses them in order of depth. This can be handy in cases when we want to modify the file system as we go, as we will see later.
If the first parameter to either find
or finddepth
is a code reference, then it is treated as a wanted
subroutine that tests for particular properties in the files found. Otherwise, it is a reference to a hash containing at least a wanted
key and code reference value and optionally more of the key-value pairs displayed in Table 13-7.
Table 13-7. File::Find Configuration Fields
The following call to find
searches for and prints out all files under /home
, following symbolic links, untainting as it goes, and skipping over any directory that fails the taint check. At the same time, it pushes the files it finds onto an array to store the results of the search:
my @files;
find({
wanted => sub {
print $File::Find::fullname;
push @files, $File::Find::fullname;
},
follow => 1, untaint => 1, untaint_skip => 1
}, '/home');
The power of find
lies in the wanted
subroutine. find
does not actually return any value, so without this subroutine the search will be performed but will not actually produce any useful result. In particular, no list of files is built automatically. We must take steps to store the names of files we wish to record within the subroutine if we want to be able to refer to them afterwards. While this is simple enough to do, the File::Find::Wanted
module from CPAN augments File::Find
and fixes this detail by providing a find_wanted
subroutine. Used in place of find
, it modifies the interface behavior of the wanted
subroutine to return a Boolean value, which it uses to build a list of values when the return value is true. The list is then returned from find_wanted
. To specify a wanted
subroutine, we can specify a code reference to an anonymous subroutine (possibly derived from a named subroutine) either directly or as the value of the wanted
key in the configuration hash. Each file that is located is passed to this subroutine, which may perform any actions it likes, including removing or renaming the file. For example, here is a simple utility script that renames all files in the target directory or directories using lowercase format:
#!/usr/bin/perl
# lcall.pl
use warnings;
use strict;
use File::Find;
use File::Copy;
die "Usage: $0 <dir> [<dir>...]\n" unless @ARGV;
foreach (@ARGV) {
die "'$_' does not exist\n" unless -e $_;
}
sub lcfile {
print "$File::Find::dir - $_\n";
move ($_, lc $_);
}
finddepth(\&lcfile, @ARGV);
In order to handle subdirectories correctly, we use finddepth
so files are renamed first and the directories that contain them second. We also use the move
subroutine from File::Copy
, since this deals with both files and directories without any special effort on our part.
Within the subroutine, the variable $_
contains the current file name, and the variable $File::Find::dir
contains the directory in which the file was found. If follow
or follow_fast
is in effect, then $File::Find::fullname
contains the complete absolute path to the file with all symbolic links resolved to their true paths. If no_chdir
has been specified, then $_
is the absolute pathname of the file, same as $File::Find::fullname
; otherwise, it is just the leafname of the file.
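To see the difference, here is a minimal sketch using no_chdir, in which $_ carries the full pathname rather than the leafname:

```perl
#!/usr/bin/perl
# nochdirsketch.pl - sketch: with 'no_chdir' set, find stays in the
# original directory and $_ holds the full pathname of each file
use warnings;
use strict;
use File::Find;

find({
    wanted => sub {
        # $_ is the same as $File::Find::name in this mode
        print "$_\n" if -f;
    },
    no_chdir => 1,
}, '.');
```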
If follow
or follow_fast
is set, then the wanted
subroutine can make use of the results of the lstat
that both these modes use. File tests can then use the special file name _
without any initial file test or explicit lstat
. Otherwise, no stat
or lstat
has been done, and we need to use an explicit file test on $_
. As a final example, here is a utility script that searches for broken links:
#!/usr/bin/perl
# checklink.pl
use warnings;
use strict;
use File::Find;
my $count = 0;
sub check_link {
if (-l && !-e) {
$count++;
print " $File::Find::name is broken\n";
}
}
print "Scanning for broken links in ", join(', ', @ARGV), ":\n";
find(\&check_link, @ARGV);
print "$count broken links found\n";
Note that it has to do both an explicit -l
and -e
to work, since one requires an lstat
and the other a stat
, and we do not get a free lstat
because in this case we do not want to follow
symbolic links. (In follow
mode, broken links are discarded before the wanted
subroutine is called, which would rather defeat the point.)
Another way to create utilities like this is through the find2perl
script, which comes as standard with Perl. This emulates the syntax of the traditional Unix find
command, but instead of performing a search, it generates a Perl script using File::Find
that emulates the action of the original command in Perl. Typically, the script is faster than find
, and it is also an excellent way to create the starting point for utilities like the examples in this section. For example, here is find2perl
being used to generate a script, called myfind.pl
, that searches for and prints all files ending in .bak
that are a week or more old, starting from the current directory:
> find2perl . -name '*.bak' -type f -mtime +7 -print > myfind.pl
We don't need to specify the -print
option in Perl 5.8 since it is now on by default, but it doesn't do any harm either. find2perl
takes a lot of different options and arguments, including ones not understood by find
, to generate scripts that have different outcomes and purposes such as archiving. This command is, however, a fairly typical example of its use. This is the myfind.pl
script that it produces:
#! /usr/bin/perl -w
eval 'exec /usr/bin/perl -S $0 ${1+"$@"}'
if 0; #$running_under_some_shell
use strict;
use File::Find ();
# Set the variable $File::Find::dont_use_nlink if you're using AFS,
# since AFS cheats.
# for the convenience of &wanted calls, including -eval statements:
use vars qw/*name *dir *prune/;
*name = *File::Find::name;
*dir = *File::Find::dir;
*prune = *File::Find::prune;
# Traverse desired file systems
File::Find::find({wanted => \&wanted}, '.');
exit;
sub wanted {
my ($dev, $ino, $mode, $nlink, $uid, $gid);
/^.*\.bak\z/s
&& (($dev, $ino, $mode, $nlink, $uid, $gid) = lstat($_))
&& -f _
&& (int(-M _) > 7)
&& print("$name\n");
}
Often we want to make a record of the files that are of interest. Since the wanted
subroutine has no way to pass back values to us, the caller, this means adding files to a global array or hash of some kind. Since globals are undesirable, this is an excellent opportunity to make use of a closure: a subroutine and a my
-declared variable nested within a bare block. Here is an example:
#!/usr/bin/perl
# filefindclosure.pl
use strict;
use warnings;
use File::Find;
die "Give me a directory\n" unless @ARGV;
{ # closure for processing File::Find results
my @results;
sub wanted { push @results, $File::Find::name }
sub findfiles {
@results=();
find \&wanted, $_[0];
return @results;
}
}
foreach my $dir (@ARGV) {
print("Error: $dir is not a directory\n"), next unless -d $dir;
my @files = findfiles($dir);
print "$dir contains @files\n";
}
For more recent versions of Perl, File::Find
implements its own warnings category to issue diagnostics about any problems it encounters traversing the filing system, such as broken symbolic links or a failure to change to or open a directory. We might not find these warnings that helpful, so we can disable them (but leave all other warnings enabled) with
use warnings;
no warnings 'File::Find';
The File::Basename
module provides subroutines to portably dissect file names. It contains one principal subroutine, fileparse
, which attempts to divide a file name into a leading directory path, a basename, and a suffix:
use File::Basename;
# 'glob' all files with a three character suffix and parse pathname
foreach (</home/*/*.???>) {
my ($leaf, $path, $suffix) = fileparse($_, '\.\w{3}');
...
}
The path and basename are determined according to the file naming conventions of the underlying file system, as determined by the operating system or configured with fileparse_set_fstype
. The suffix list, if supplied, provides one or more regular expressions, which are anchored at the end of the file name and tested. The first one that matches is used to separate the suffix from the basename. For example, to find any dot plus three-letter suffix, we can use '\.\w\w\w'
or, as in the preceding example, '\.\w{3}'
.
To search for a selection of specific suffixes, we can either supply a list or combine all combinations into a single expression. Which we choose depends only on which is more likely to execute faster:
fileparse($filename, '\.txt', '\.doc');             # list of suffixes
fileparse($filename, '\.(txt|doc)');                # combined regular expression
fileparse($filename, '\.htm', '\.html', '\.shtml'); # list of suffixes
fileparse($filename, '\.s?html?');                  # combined regular expression
Remember when supplying suffixes that they are regular expressions. Dots in particular must be escaped if they are intended to mean a real dot (however, see the basename
subroutine detailed next for an alternative approach).
In addition to fileparse, File::Basename
supplies two specialized subroutines, basename
and dirname
, which return the leading path and the basename only:
my $path = dirname($filename);
my $leaf = basename($filename, @suffixes);
basename
returns the same result as the first item returned by fileparse
except that metacharacters in the supplied suffixes (if any) are escaped with \Q...\E
before being passed to fileparse
. As a result, suffixes are detected and removed from the basename
only if they match literally:
# scan for .txt and .doc with 'fileparse'
my ($leaf, $path, $suffix) = fileparse($filename, '\.(txt|doc)');
Or:
# scan for .txt and .doc with 'basename'
my $leaf = basename($filename, '.txt', '.doc');
dirname
returns the same result as the second item returned by fileparse
(the leading directory) on most platforms. For Unix and MSDOS, however, it will return .
if there is no leading directory or a directory is supplied as the argument. This differs from the behavior produced by fileparse
:
# scan for leading directory with 'fileparse'
print((fileparse('directory/file'))[1]); # produces 'directory/'
print((fileparse('file'))[1]);           # produces './'
print((fileparse('directory/'))[1]);     # produces 'directory/'
Or:
# scan for leading directory with 'dirname'
print dirname('directory/file'); # produces 'directory'
print dirname('file');           # produces '.'
print dirname('directory/');     # produces '.'
The file system convention for the pathname can be set to one of several different operating systems with the fileparse_set_fstype
configuration subroutine. This can take one of the following case-insensitive values shown in Table 13-8, each corresponding to the appropriate platform.
Table 13-8. File::Basename File System Conventions
Value | Platform |
AmigaOS |
Amiga syntax |
MacOS |
Macintosh (OS9 and earlier) syntax |
MSWin32 |
Microsoft Windows long file names syntax |
MSDOS |
Microsoft DOS short file names (8.3) syntax |
OS2 |
OS/2 syntax |
RISCOS |
Acorn RiscOS syntax |
VMS |
VMS syntax |
If the syntax is not explicitly set with fileparse_set_fstype
, then a default value is deduced from the special variable $^O
(or $OSNAME
with use English
). If $^O
is none of the preceding file system types, Unix-style syntax is assumed. Note that if the pathname contains /
characters, then the format is presumed to be Unix style whatever the file system type specified.
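For example, this sketch (the VMS file specification is invented for illustration) parses a VMS-style pathname while running on any platform:

```perl
#!/usr/bin/perl
# vmsparse.pl - sketch: parse a VMS-style file specification on any
# platform; the pathname here is invented for illustration
use warnings;
use strict;
use File::Basename;

fileparse_set_fstype('VMS');
my ($leaf, $path) = fileparse('DISK:[DIR.SUBDIR]FILE.TXT');
print "leaf=$leaf path=$path\n";

# restore the convention for the platform we are actually running on
fileparse_set_fstype($^O);
```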
For a more comprehensive approach to portable file name handling, the low-level File::Spec
module provides an interface to several different filing system and platform types. It is extensively used by other modules, including File::Basename
and the File::Glob
modules (and in fact most of the File::
family of modules). We do not usually need to use it directly because these other modules wrap its functionality in more purposeful and friendly ways, but it is useful to know it is there nonetheless. Specific filing system support is provided by submodules like File::Spec::Unix, File::Spec::Win32
, and File::Spec::Mac
. The correct module is used automatically to suit the platform of execution, but if we want to manage Macintosh file names on a Windows system, accessing the platform-specific module will give us the ability to do so.
Several functions of File::Spec
are worth mentioning here, because they relate to the handling of pathnames. The module is an object-oriented one to allow it to be easily used in other file system modules, and so the functions are actually provided as methods, not subroutines—a functional but otherwise identical interface to the available subroutines is offered by the File::Spec::Functions
module. None of the methods shown in Table 13-9 actually touch the filing system directly. Instead, they provide answers to questions like "Is this filing system case insensitive?" and "Is this an absolute or relative file name?"
Table 13-9. File::Spec Methods
Method | Description |
File::Spec->curdir() |
Return the native name for the current working directory—that is, . on most platforms. For the actual path, we need Cwd . |
File::Spec->rootdir() |
Return the native name for the root directory. On Unix, that's / . On Windows and Mac, it depends on the currently active volume. |
File::Spec->devnull() |
The name of the null device, for reading nothing or dumping output to nowhere. /dev/null on Unix, nul on Windows. |
File::Spec->canonpath($path) |
Clean up the passed path into a canonical form, removing cruft like redundant . or trailing / elements appropriately. It does not remove .. elements—see File::Spec->no_upwards(@files) . |
File::Spec->updir() |
Return the native name for the parent directory—that is, .. on most platforms. For the actual path, Cwd and File::Basename are needed. |
File::Spec->no_upwards(@files) |
Examine the list of files and remove upwards directory elements (typically ..—see File::Spec->updir() ) along with the preceding directory element. |
File::Spec->case_tolerant() |
Return true if the platform does not differentiate upper- and lowercase in file names (that is, it is case-insensitive), false otherwise. |
File::Spec->file_name_is_absolute() |
Return true if the file name is absolute on the current platform. |
File::Spec->path() |
Return the current path, as understood by the underlying shell. This is the PATH environment variable for Unix and Windows, but it varies for other platforms. |
File::Spec->rel2abs($path,$to) |
Return the absolute path given a relative path and, optionally, a base path to attach the relative path to. If not specified, the current working directory is used. |
File::Spec->abs2rel($path,$from) |
The inverse of rel2abs , this takes an absolute path and derives the relative path from the optional base path supplied as the second argument. Again, the current working directory is used if only one argument is supplied. |
In addition to these routines, we also have access to catfile, catdir, catpath, join, splitdir, splitpath
, and tmpdir
. With the exception of tmpdir
, these are all involved in the construction or deconstruction of pathnames to and from their constituent parts. The File::Basename
and File::Path
modules provide a more convenient interface to most of this functionality, so we generally would not need to access the File::Spec
methods directly. The tmpdir
method returns the location of the system-supplied temporary directory, /tmp
on most Unix platforms. It is used by modules that create temporary files, and we discuss it in more detail later on.
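A brief sketch of the construction and deconstruction methods in action:

```perl
#!/usr/bin/perl
# specpaths.pl - sketch of building and splitting pathnames with the
# object-oriented File::Spec interface
use warnings;
use strict;
use File::Spec;

# assemble a file path from components using the native separator
my $path = File::Spec->catfile('home', 'perl', 'script.pl');
print "path: $path\n";

# split a directory path back into its components
my @parts = File::Spec->splitdir(File::Spec->catdir('home', 'perl'));
print "parts: @parts\n";

# the system temporary directory, e.g. '/tmp' on most Unix platforms
print "tmpdir: ", File::Spec->tmpdir, "\n";
```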
To call any of these methods, for example path
, we can use either the object-oriented approach:
use File::Spec; # object-oriented
print File::Spec->path();
or use the equivalent functional interface:
use File::Spec::Functions; # functional
print path();
By default, File::Spec::Functions
automatically exports canonpath, catdir, catfile, curdir, rootdir, updir, no_upwards, file_name_is_absolute
, and path
. We can choose to import all functions with :ALL
or select individual functions in the usual way.
The majority of operating system shells support a wildcard syntax for specifying multiple files. For instance, *.doc
means all files ending with .doc
. Perl provides this same functionality through the file glob operator glob
, which returns a list of all files that match the specified wildcard glob pattern:
my @files = glob '*.pod'; # return all POD documents in current directory
The glob pattern, not to be confused with a regular expression search pattern, accepts any pattern that would normally be accepted by a shell, including directories, wildcard metacharacters such as asterisks (matching any number of characters, including none), question marks (matching exactly one character), and character classes. The following examples demonstrate the different kinds of glob operation that we can perform:
# match html files in document roots of all virtual hosts
my @html_files = glob '/home/sites/site*/web/*.html';
# match all files in current directory with a three-letter extension
my @three_letter_extensions = glob '*.???';
# match all files beginning with a to z
my @lcfirst = glob '[a-z]*';
# match 'file00' to 'file49'
my @numbered_files = glob 'file[0-4][0-9]';
# match any file with a name of three or more characters
my @three_or_more_letter_files = glob '???*';
The order in which files are returned is by default sorted alphabetically and case sensitively (so uppercase trumps lowercase). We can alter this behavior by passing flags to the File::Glob
module, which underlies glob
, as well as allow more extended syntaxes than those in the preceding examples.
Before embarking on a closer examination of the glob
function, keep in mind that while the underlying platform-specific glob
modules do a good job of presenting the same interface and features, the opendir, readdir
, and closedir
functions are more reliable in cross-platform use, if more painstaking to use. This is particularly important with older versions of Perl (especially prior to version 5.6) where glob
is less portable.
The glob
operator can be used with two different syntaxes. One, the glob
built-in function, we have already seen:
my @files = glob '*.pl'; # explicit glob
The other is to use angle brackets in the style of the readline operator:
my @files = <*.pl>; # angle-bracket glob
How does Perl tell whether this is a readline or a glob? When Perl encounters an angle bracket construction, it examines the contents to determine whether it is a syntactically valid filehandle name or not. If it is, the operator is interpreted as a readline. Otherwise, it is handled as a file glob. Which syntax we use is entirely arbitrary. The angle bracket version looks better in loops, but it resembles the readline <>
operator, which can create ambiguity for readers of the code:
foreach (<*.txt>) {
print "$_ is not a textfile!" if !-T;
}
One instance we might want to use glob
is when we want to perform a file glob on a pattern contained in a variable. A variable between angle brackets is ambiguous, so at compile time Perl guesses it is a readline operation. We can insert braces to force Perl to interpret the expression as a file glob, but in these cases it is often simpler to use glob
instead:
@files = <$filespec>; # ERROR: attempts to read lines
@files = <${filespec}>; # ok, but algebraic
@files = glob $filespec; # better
The return value from the globbing operation is a list containing the names of the files that matched. Files are matched according to the current working directory if a relative pattern is supplied; otherwise, they are matched relative to the root of the file system. The returned file names reflect this too, incorporating the leading directory path if one was supplied:
@files = glob '*.html'; # relative path
@files = glob '/home/httpd/web/*.html'; # absolute path
glob
combines well with file test operators and array processing functions like map
and grep
. For example, to locate all text files in the current directory, we can write
my @textfiles = grep {-f && -T _} glob('*');
The glob
function does not recurse, however. To do the same thing over a directory hierarchy, we can use the File::Find
module with a wanted
subroutine containing something similar:
sub wanted {
push @textfiles, $File::Find::name if -f && -T _;
}
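Fleshed out into a complete, if minimal, script, that approach might look like this:

```perl
#!/usr/bin/perl
# findtext.pl - sketch: list all text files beneath the given
# directories (the current directory by default)
use warnings;
use strict;
use File::Find;

my @textfiles;
sub wanted {
    # '-f' stats the file; '-T _' reuses that stat to test for text
    push @textfiles, $File::Find::name if -f && -T _;
}

find(\&wanted, @ARGV ? @ARGV : '.');
print "$_\n" for @textfiles;
```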
The glob
operator was originally a built-in Perl function, but since version 5.6 it is implemented in terms of the File::Glob
module, which implements Unix-style file globbing and overrides the built-in core glob
. An alternative module, File::DosGlob
, implements Windows/DOS-style globbing, with some extensions.
Unix-Style File Globbing
The standard glob
does file globbing in the style of Unix, but it will still work on other platforms. The forward slash is used as a universal directory separator in patterns and will match files on the file system irrespective of the native directory separator. On Windows/DOS systems, the backslash is also accepted as a directory separator.
We automatically trigger use of the File::Glob
module whenever we make use of the glob operator in either of its guises, but we can modify and configure the operator more finely by using the module directly. File::Glob
defines four import tags that can be imported to provide different features, listed in Table 13-10.
Table 13-10. File::Glob Import Tags
Label | Function |
:glob |
Import symbols for the flags of glob 's optional flag argument. See Table 13-11 for a list and description of each flag. |
:case |
Treat the file glob pattern as case sensitive. For example, *.doc will match file.doc but not file.DOC . |
:nocase |
Treat the file glob pattern as case insensitive. For example, *.doc will match both file.doc and file.DOC . |
:globally |
Override the core glob operator in all namespaces, not just the current package. |
For example, to import the optional flag symbols and switch the file globbing operator to a case-insensitive mode, we would write
use File::Glob qw(:glob :nocase);
If not explicitly defined, the case sensitivity of glob
is determined by the underlying platform (as expressed by the special variable $^O
). The :case
and :nocase
labels allow us to override this default. For individual uses, temporary case sensitivity can be controlled by passing a flag to the glob
operator instead, as we will see next.
Extended File Globbing
The glob
operator accepts a number of optional flags that modify its behavior. These flags are given as a second parameter to glob
and may be bitwise OR
ed together to produce multiple effects. To import a set of constants to name the flags, use File::Glob
, explicitly specifying the :glob
label:
use File::Glob qw(:glob);
The core glob
function takes only one argument, a restriction imposed by its prototype that is still enforced even though it is now implemented by a two-argument subroutine. To supply flags, we call the glob
subroutine in the File::Glob
package directly, where the prototype does not apply. For example, to enable brace expansions and match case insensitively, we would use
my @files = File::Glob::glob $filespec, GLOB_BRACE|GLOB_NOCASE;
The full list of flags is displayed in Table 13-11.
Table 13-11. File::Glob Operator Flags
Handling Globbing Errors
If glob
encounters an error, it puts an error message in $!
and sets the package variable File::Glob::GLOB_ERROR
to a non-zero value with a symbolic name defined by the module:
GLOB_NOSPACE |
Perl ran out of memory. |
GLOB_ABEND |
Perl aborted due to an error. |
If the error occurs midway through the scan, and some files have already been found, then the incomplete glob is returned as the result. This means that getting a result from glob
does not necessarily mean that the file glob completed successfully. In cases where this matters, check $File::Glob::GLOB_ERROR
:
@files = glob $filespec;
if ($File::Glob::GLOB_ERROR) {
die "Error globbing '$filespec': $!\n";
}
DOS-style file globbing is provided by the File::DosGlob
module, an alternative to File::Glob
that implements file globs in the style of Windows/DOS, with extensions. In order to get DOS-style globbing, we must use this module explicitly, to override the Unix-style globbing that Perl performs automatically (for instance, if we are running on a Windows system, we may receive wildcard input from the user that conforms to DOS rather than Unix style):
use File::DosGlob; # provide File::DosGlob::glob
use File::DosGlob qw(glob); # override core/File::Glob's 'glob'
Unlike File::Glob, File::DosGlob
does not allow us to configure aspects of its operation by specifying labels to the import list, and it does not even override the core glob
unless explicitly asked, as shown in the second example earlier. Even if we do not override glob
, we can call the File::DosGlob
version by naming it in full:
@dosfiles = File::DosGlob::glob ($dosfilespec);
Even with glob
specified in the import list, File::DosGlob
will only override glob
in the current package. To override it everywhere, we can use GLOBAL_glob
:
use File::DosGlob qw(GLOBAL_glob);
This should be used with extreme caution, however, since it might upset code in other modules that expects glob
to work in the Unix style.
Unlike the DOS shell, File::DosGlob
works with wildcarded directory names, so a file spec of C:/*/dir*/file*
will work correctly (although it might take some time to complete).
my @dosfiles = glob('mydosfilepath*.txt'); # single quoted
The module also understands DOS-style backslashes as directory separators, although these may need to be protected:
my @dosfiles = <my\\dos\\filepath\\*.txt>; # escaped
Any mixture of forward and backslashes is acceptable to File::DosGlob
's glob
(and indeed Perl's built-in one, on Windows); translation into the correct pattern is done transparently and automatically:
my @dosfiles = <my/dos/filepath\*.txt>; # a mixture
To search in file names or directories that include spaces, we can escape them using a backslash (which means that we must interpolate the string and therefore protect literal backslashes):
my @programfiles = <C:\\Program\ Files\\*.*>;
If we spell the glob pattern out literally, we can also quote the spaces with double quotes, so long as the string itself is enclosed in single quotes (or the q
quoting operator):
my @programfiles = glob 'C:/"Program Files"/*.*';
This functionality is actually implemented via the Text::ParseWords
module, covered in Chapter 19.
Finally, multiple glob patterns may be specified in the same pattern if they are separated by spaces. For example, to search for all .exe
and .bat
files, we could use
my @executables = glob('*.exe *.bat');
There have always been two basic approaches for creating temporary files in Perl, depending on whether we just want a scratchpad that we can read and write or want to create a temporary file with a file name that we can pass around. To do the first, we can create a filehandle with IO::File
that points to a temporary file that exists only so long as the filehandle is open. To do the second, we can deduce the name of a unique temporary file and then open and close it like an ordinary file, using the POSIX tmpnam
function.
From Perl 5.6.1, we have a third approach that involves using File::Temp
, which returns both a file name and a filehandle. From Perl 5.8, we have a fourth, an anonymous temporary file that we can create by passing a file name of undef
to the built-in open function. This is essentially the same as the first approach, but using a new native syntax. We covered anonymous temporary files in the last chapter, so here we will examine the other three approaches.
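As a taste of the File::Temp interface before we look at the older approaches, here is a minimal sketch using its tempfile function, which returns both a filehandle and a file name in one call:

```perl
#!/usr/bin/perl
# tempsketch.pl - sketch of File::Temp's 'tempfile', which returns
# both a filehandle and a file name (requires Perl 5.6.1 or later)
use warnings;
use strict;
use File::Temp qw(tempfile);

# UNLINK => 1 removes the file automatically when the process exits
my ($fh, $filename) = tempfile(UNLINK => 1);
print $fh "This is only temporary\n";
close $fh;

print "temporary file was: $filename\n";
```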
Creating a Temporary Filehandle
Temporary filehandles can be created with the new_tmpfile
method of the IO::File
module. new_tmpfile
takes no arguments and opens a new temporary file in read-update (and binary, for systems that care) mode, returning the generated filehandle. In the event of an error, undef
is returned and $!
is set to indicate the reason. For example:
my $tmphandle = IO::File->new_tmpfile();
unless ($tmphandle) {
print "Could not create temporary filehandle: $!
";
}
Wherever possible, the new_tmpfile
method accesses the operating system tmpfile
library call (on systems that provide it). This makes the file truly anonymous and is the same interface provided by open
in sufficiently modern versions of Perl. On these generally Unix-like systems, a file exists as long as something is using it, even if it no longer has a file name entered in the file system. new_tmpfile
makes use of this fact to remove the file system entry for the file as soon as the file-handle is created, making the temporary file truly anonymous. When the filehandle is closed, the file ceases to exist, since there will no longer be any references to it. This behavior is not supported on platforms that do not support anonymous temporary files, but IO::File
will still create a temporary file for us. See Chapter 12 for more information on filehandles and temporary anonymous files.
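Since the handle is opened in read-update mode, we can write to it, rewind, and read the data back, as in this short sketch:

```perl
#!/usr/bin/perl
# tmpfhsketch.pl - sketch: write to and read back from an anonymous
# temporary filehandle
use warnings;
use strict;
use IO::File;

my $tmp = IO::File->new_tmpfile() or die "Cannot create: $!\n";
print $tmp "scratch data\n";

$tmp->seek(0, 0);   # rewind to the start
my $line = <$tmp>;
print "read back: $line";

$tmp->close;        # the file ceases to exist with the handle
```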
Temporary File Names via the POSIX Module
While IO::File
's new_tmpfile
is very convenient for a wide range of temporary file applications, it does not return us a file name that we can use or pass to other programs. To do that, we need to use the POSIX
module and the tmpnam
routine. Since POSIX
is a large module, we can import just tmpnam
with
use POSIX qw(tmpnam);
The tmpnam
routine takes no arguments and returns a temporary file name guaranteed to be unique at the moment of inquiry. For example:
my $tmpname = tmpnam();
print $tmpname; # produces something like '/tmp/fileV9vJXperl'
File names are created with a fixed and unchangeable default path, defined by the P_tmpdir
value given in the C standard library's stdio.h
header file. This path can be changed subsequently, but changing it does not guarantee that the file does not exist in the new directory. To do that, we might resort to a loop like this:
my $tmpname;
do {
    $tmpname = tmpnam();
    $tmpname = $1 if $tmpname =~ m|/([^/]+)$|; # strip '/tmp'
    $tmpname = $newpath.$tmpname;              # add new path
} while (-e $tmpname);
This rather defeats the point of tmpnam
, however, which is to create a temporary file name quickly and easily in a place that is suitable for temporary files (/tmp
on any vaguely Unix-like system). It also does not handle the possibility that other processes might be trying to create temporary files in the same place. This is a significant possibility and a potential source of race conditions. Two processes may call tmpnam
at the same time, get the same file name in return, then both open it. To avoid this, we open the temporary file using sysopen
and specify the O_EXCL
flag, which requires that the file does not yet exist. Here is a short loop that demonstrates a safe way to open the file:
# get an open (and unique) temporary file
my $tmpname;
do {
$tmpname = tmpnam();
sysopen TMPFILE, $tmpname, O_RDWR|O_CREAT|O_EXCL;
} until (defined fileno(TMPFILE));
If another process creates the same file in between our call to tmpnam
and the sysopen
, the O_EXCL
will cause it to fail; TMPFILE
will not be open, and so the loop repeats (see the next section for a better approach). Note that if we only intend to write to the file, O_WRONLY
would do just as well, but remember to import the symbols from the POSIX
or Fcntl
modules. Once we have the file open, we can use it:
# place data into the file
print TMPFILE "This is only temporary\n";
close TMPFILE;
# use the file - read it, write it some more, pass the file name to another
# process, etc.
# remember to tidy up afterwards!
unlink $tmpname;
Since we have an actual tangible file name, we can pass it to other processes. This is a common approach when reading the output of another command created with a piped open. For example, here is an anonymous FTP command-line client, which we can use to execute commands on a remote FTP server:
#!/usr/bin/perl -w
# ftpclient.pl
use warnings;
use strict;
use POSIX qw(O_RDWR O_CREAT O_EXCL tmpnam);
use Sys::Hostname; # for 'hostname'
die "Simple anonymous FTP command line client\n".
"Usage: $0 <server> <command>\n" unless scalar(@ARGV)>=2;
my ($ftp_server,@ftp_command)=@ARGV;
# get an open and unique temporary file
my $ftp_resultfile;
do {
# generate a new temporary file name
$ftp_resultfile = tmpnam();
# O_EXCL ensures no other process successfully opens the same file
sysopen FTP_RESULT, $ftp_resultfile, O_RDWR|O_CREAT|O_EXCL;
# failure means something else opened this file name first, try again
} until (defined fileno(FTP_RESULT));
# run ftp client with autologin disabled (using -n)
if (open (FTP, "|ftp -n > $ftp_resultfile 2>&1")) {
print "Client running, sending command\n";
# command: open connection to server
print FTP "open $ftp_server\n";
# command: specify anonymous user and email as password
my $email=getlogin.'@'.hostname;
print FTP "user anonymous $email\n";
# command: send command (interpolate list to space arguments)
print FTP "@ftp_command\n";
close FTP;
} else {
die "Failed to run client: $!\n";
}
print "Command sent, waiting for response\n";
my @ftp_results = <FTP_RESULT>;
check_result(@ftp_results);
close FTP_RESULT;
unlink $ftp_resultfile;
print "Done\n";
sub check_result {
return unless @_;
print "Response:\n";
# just print out the response for this example
print " $_" foreach @_;
}
We can use this (admittedly simplistic) client like this:
$ ftpclient.pl ftp.alphacomplex.com get briefing.doc
Using File::Temp
As of Perl 5.6.1, we have a better approach to creating temporary files, using the File::Temp
module. This module returns the name and filehandle of a temporary file together. This eliminates the possibility of a race condition. Instead of using sysopen
with the O_EXCL
flag, as we showed in the previous section, File::Temp
provides us with the following much simpler syntax using its tempfile
function:
my ($FILEHANDLE, $filename) = tempfile();
However, tempfile
can take arguments that we can use to gain more control over the created temporary file, as shown in the following:
my ($FILEHANDLE, $filename) = tempfile($template, DIR => $dir, SUFFIX => $suffix);
The template should contain at least four trailing X
s, which would then be replaced with random letters, so $template
could be something like filenameXXXXX
. By specifying an explicit directory with DIR
, we can specify the directory where we want the temporary file to be created. Otherwise, the file will be created in the directory specified for temporary files by the function tmpdir
in File::Spec
.
Finally, at times we might need our temporary file to have a particular suffix, possibly for subsequent processing by other applications. The following will create a temporary file called fileXXXX.tmp
(where the four X
s are replaced with four random letters) in the directory /test/files
:
my ($FILEHANDLE, $filename) = tempfile("fileXXXX", DIR => "/test/files",
SUFFIX => ".tmp");
However, the recommended interface is to call tempfile
in scalar instead of list context, returning only the filehandle:
my $FILEHANDLE = tempfile("fileXXXX", DIR => "/test/files", SUFFIX => ".tmp");
The file itself will be automatically deleted when closed. Since the file name is never exposed, there is no way to tamper with it and therefore no possibility of a race condition.
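Putting this together, a minimal sketch of the scalar-context interface might look like the following (the scratch data is illustrative only):

```perl
use File::Temp qw(tempfile);

# create an anonymous temporary file; no name is ever exposed
my $FILEHANDLE = tempfile();

# write some scratch data, then rewind and read it back
print $FILEHANDLE "scratch data\n";
seek $FILEHANDLE, 0, 0;
my $line = <$FILEHANDLE>;

# closing the handle removes the file automatically
close $FILEHANDLE;
```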
To create temporary directories, File::Temp
provides us with the tempdir
function. Using the function without argument creates a temporary directory in the directory set by tmpdir
in File::Spec
:
my $tempdir = tempdir();
As with tempfile
, we can specify a template and explicit directory as arguments to tempdir. Here also the template should have at least four trailing X
s that will be translated into four random letters. The DIR
option overrides the value of File::Spec
's tmpdir
:
my $tempdir = tempdir("dirXXXX", DIR => "/test/directory");
This will create a temporary directory called something like /test/directory/dirdnar
, where dnar
are four random letters that replaced the four X
s. If the template included parent directory specifications, then they are removed before the directory is prepended to the template. In the absence of a template, the directory name is generated from an internal template.
Removing the temporary directory and all its files, whether created by File::Temp
or not, can be achieved using the option CLEANUP => 1
.
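As a brief sketch, a temporary working directory that is removed automatically when the program exits might be requested like this (the template and file name are illustrative, not prescribed):

```perl
use File::Temp qw(tempdir);

# create a scratch directory under the system temporary directory;
# CLEANUP => 1 removes it, and everything in it, at program exit
my $workdir = tempdir("workXXXX", CLEANUP => 1);

# use it as a staging area
open my $fh, '>', "$workdir/stage.txt" or die "open: $!\n";
print $fh "intermediate results\n";
close $fh;
```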
In addition to the functions tempfile and tempdir, File::Temp
provides Perl implementations of the mktemp
family of temp file generation system calls. These are shown in Table 13-12.
Table 13-12. File::Temp Functions
Function | Description |
mkstemp |
Using the provided template, this function returns the name of the temporary file and a filehandle to it: my ($HANDLE, $name) = mkstemp($template);
If we are interested only in the filehandle, then we can call mkstemp in scalar context. |
mkstemps |
This is similar to mkstemp, but takes an additional suffix that is appended to the generated name: my ($HANDLE, $name) = mkstemps($template, $suffix); |
mktemp |
This function returns a temporary file name but does not ensure that the file will not be opened by a different process: my $unopened = mktemp($template); |
mkdtemp |
This function uses the given template to create a temporary directory. The name of the directory is returned upon success and undefined otherwise: my $dir = mkdtemp($template); |
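For illustration, the mkstemp and mkdtemp functions from the table might be used as follows (the templates are assumptions, chosen for the example):

```perl
use File::Temp qw(mkstemp mkdtemp);

# mkstemp opens the file for us, avoiding the race condition of mktemp;
# the trailing Xs are replaced with random letters
my ($HANDLE, $name) = mkstemp("/tmp/scratchXXXX");
print $HANDLE "safe temporary data\n";
close $HANDLE;
unlink $name; # remember to tidy up afterwards

# mkdtemp creates a temporary directory from the same style of template
my $dir = mkdtemp("/tmp/scratchdirXXXX");
```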
Finally, the File::Temp
module provides implementations of the POSIX tmpnam
and tmpfile
functions. As mentioned earlier, POSIX uses the value of P_tmpdir
in the C standard library's stdio.h
header file as the directory for the temporary file. File::Temp
, on the other hand, uses the setting of tmpdir
. With a call to mkstemp
using an appropriate template, tmpnam
returns a filehandle to the open file and a file name:
my ($HANDLE, $name) = tmpnam();
In scalar context, tmpnam
uses mktemp
and returns the full name of the temporary file:
my $name = tmpnam();
While this ensures that the file does not already exist, it does not guarantee that this will remain the case. In order to avoid a possible race condition, we should use tmpnam
in list context.
The File::Temp
implementation of the POSIX's tmpfile
returns the filehandle of a temporary file. There is no access to the file name, and the file is removed when the filehandle is closed or when the program exits:
my $HANDLE = tmpfile();
For further information on File::Temp
, consult the documentation.
Directories are similar to files in many ways; they have names, permissions, and (on platforms that support it) owners. They are significantly different in other ways, however. At their most basic, files can generally be considered to be content, that is, data. Directories, on the other hand, are indices of metadata: each entry in a directory is a record describing a file, directory, link, or special file that the directory contains. It only makes sense to read a directory in terms of records, and no sense at all to write to the directory index directly; the operating system handles that when we manipulate the contents.
Accordingly, operating systems support a selection of functions specifically oriented to handling directories in a record-oriented context, which Perl wraps and makes available to us as a collection of built-in functions with (reasonably) platform-independent semantics. They provide a more portable but lower-level alternative to the glob
function discussed earlier in the chapter.
Directories can also be created and destroyed. Perl supports these operations through the functions mkdir
and rmdir
, which should be synonymously familiar to those with either a Windows or a Unix background. For more advanced applications, the File::Path
module provides enhanced directory-spanning analogues for these functions.
A discussion of directories is not complete without the concept of the current working directory. All of Perl's built-in functions that take a file name as an argument, from open
to the unary file test operators, base their arguments relative to the current working directory whenever the given file name is not absolute. We can both detect and change the current working directory either using Perl's built-in functions or with the more flexible Cwd
module.
Although directories cannot be opened and read like ordinary files, the equivalent is possible using directory handles. For each of the file-based functions open, close, read, seek, tell
, and rewind
, there is an equivalent that performs the same function for directories. For example, opendir
opens a directory and returns a directory handle:
opendir DIRHANDLE, $dirname;
Although similar to filehandles in many respects, directory handles are an entirely separate subspecies; they only work with their own set of built-in functions and even occupy their own internal namespace within a typeglob, so we can quite legally have a filehandle and a directory handle with the same name. Having said that, creating a filehandle and a directory handle with the same name is more than a little confusing.
If opendir
fails for any reason (the obvious ones being that the directory does not exist or is in fact a file), it returns undef
and sets $!
to indicate the reason. Otherwise, we can read the items in the directory using readdir
:
if (opendir DIRHANDLE, $dirname) {
print "$dirname contains: $_\n" foreach readdir DIRHANDLE;
}
readdir
is similar in spirit to the readline operator, although we cannot use an equivalent of the <>
syntax to read from a directory filehandle. If we do, Perl thinks we are trying to read from a filehandle with the same name. However, like the readline operator, readdir
can be called in either a scalar context, where it returns the next item in the directory, or in a list context, where it returns all remaining entries:
my $diritem = readdir DIRHANDLE; # read next item
my @diritems = readdir DIRHANDLE; # read all (remaining) items
(Another example of list context is the foreach
in the previous example.)
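Since readdir returns every entry in the directory, including the special entries . and .., a common idiom is to filter these out when listing directory contents; for example:

```perl
# list the current directory, skipping the '.' and '..' entries
opendir DIRHANDLE, '.' or die "opendir: $!\n";
my @entries = grep { $_ ne '.' && $_ ne '..' } readdir DIRHANDLE;
closedir DIRHANDLE;
print "$_\n" foreach sort @entries;
```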
Rather than return a line from a file, readdir
returns a file name from the directory. We can then go on to test the file name with file test operators or stat/lstat
to find out more about them. However, if we do this, we should take care to append the directory name first or use chdir
; otherwise, the file test will not take place where we found the file but in the current working directory:
opendir DIRHANDLE, '..'; # open parent directory
foreach (readdir DIRHANDLE) {
print "$_ is a directory\n" if -d "../$_";
}
closedir DIRHANDLE;
Or, using chdir
:
opendir DIRHANDLE, '..'; # open parent directory
chdir '..'; # change to parent directory
foreach (readdir DIRHANDLE) {
print "$_ is a directory\n" if -d; # use $_
}
closedir DIRHANDLE;
Note that when finished with a directory handle, it should be closed, again using a specialized version of close, closedir
. In the event closedir
fails, it also returns undef
and sets $!
to indicate the error. Otherwise, it returns true.
Directory Positions
Directory filehandles also have positions, which can be manipulated with the functions seekdir, telldir
, and rewinddir
, direct directory analogues for the file position functions seek, tell
, and rewind
. Keep in mind that the former set of functions only work on directories (the plain file counterparts also work on directories, but not very usefully), and a directory position set with seekdir
must be deduced from telldir
, in order to know what positions correspond to the start of directory entries:
# find current position of directory handle
my $dpos = telldir DIRHANDLE;
# read an item, moving the position forward
my $item = readdir DIRHANDLE;
# reset position back to position read earlier
seekdir DIRHANDLE, $dpos;
# reset position back to start of directory
rewinddir DIRHANDLE;
Although they are analogous, these functions are not as similar to their file-based counterparts as their names might imply. In particular, seekdir
is not nearly as smart as seek
, because it does not accept an arbitrary position. Instead, seekdir
is only good for setting the position to 0, or a position previously found with telldir
.
Directory Handle Objects
As an alternative to the standard directory handling functions, we can instead use the IO::Dir
module. IO::Dir
inherits basic functionality from IO::File
, then overloads and replaces the file-specific features with equivalent methods for directories.
my $dirh = new IO::Dir($directory);
Each of the standard directory handling functions is supported by a similarly named method in IO::Dir
, minus the trailing dir
. Instead of using opendir
, we can create a new, unassociated IO::Dir
object and then use open
:
my $dirh = new IO::Dir;
$dirh->open($directory);
Likewise, we can use read
to read from a directory filehandle, seek, tell
, and rewind
to move around inside the directory, and close
to close it again:
my $entry = $dirh->read; # read an entry
my $dpos = $dirh->tell; # find current position
$dirh->seek($dpos); # set position
$dirh->rewind; # rewind to start
my @entries = $dirh->read; # read all entries
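As a sketch, these methods combine into a complete listing loop much like the opendir/readdir/closedir sequence shown earlier:

```perl
use IO::Dir;

# object-oriented equivalent of opendir/readdir/closedir
my $dirh = new IO::Dir('.') or die "Cannot open directory: $!\n";
while (defined(my $entry = $dirh->read)) {
    print "$entry\n";
}
$dirh->close;
```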
Directories As Tied Hashes
As an alternative to the object-oriented interface, IO::Dir
also supports a tied hash interface, where the directory is represented by a hash and the items in it as the keys of the hash. The values of the hash are lstat
objects created via the File::stat
package, called on the key in question. These are created at the moment that we ask for it so as not to burden the system with unnecessary lstat
calls. If the main purpose of interrogating the directory is to perform stat
-type operations (including file tests), we can save time by using this interface:
# list permissions of all files in current directory
my %directory;
tie %directory, 'IO::Dir', '.';
foreach (sort keys %directory) {
printf ("$_ has permissions %o\n", $directory{$_}->mode & 0777);
}
untie %directory;
IO::Dir
makes use of the tied hash interface to extend its functionality in other ways too. Assigning an integer as the value of an existing key in the hash will cause the access and modification time to be changed to that value. Assigning a reference to an array of two integers will cause the access and modification times to be altered to the first and second, respectively. If, on the other hand, the entry does not exist, then an empty file of the same name is created in the directory, again with the appropriate timestamps:
# set all timestamps to the current time:
my $now = time;
foreach (keys %directory) {
$directory{$_} = $now;
}
# create a new file, modified one day ago, accessed now:
$directory{'newfile'} = [$now, $now-24 * 60 * 60];
Deleting a key-value pair will also delete a file, but only if the option DIR_UNLINK
is passed to the tie as a fourth parameter:
# delete backup files ending in .bak or ~
tie %directory, 'IO::Dir', $dirname, DIR_UNLINK;
foreach (keys %directory) {
delete $directory{$_} if /(\.bak|~)$/;
}
}
untie %directory;
With DIR_UNLINK
specified, deleting an entry from the hash will either call unlink
or rmdir
on the items in question, depending on whether it is a file or a directory. In the event of failure, the return value is undef
and $!
is set to indicate the error, as usual.
Finding the Name of a Directory or File from Its Handle
As a practical example of using the directory functions, the following example is a solution to the problem of finding out the name of a directory or file starting from a handle, assuming we know the name of the parent directory:
sub find_name {
my ($handle, $parentdir) = @_;
# find device and inode of directory
my ($dev, $ino) = lstat $handle;
opendir PARENT, $parentdir or return;
foreach (readdir PARENT) {
# find device and inode of parent directory entry
my ($pdev, $pino) = lstat "$parentdir/$_";
# if it is a match, we have our man
closedir PARENT, return $_ if ($pdev == $dev && $pino == $ino);
}
closedir PARENT;
return; # didn't find it...strange!
}
my $name = find_name (*HANDLE, "/parent/directory");
close HANDLE;
First, we use lstat
on the handle to determine the device and inode of the file or directory whose name we want (or possibly the symbolic link that points to it, which is why we use lstat
and not stat
). We then open the parent directory and scan each entry in turn using lstat
to retrieve its device and inode. If we find a match, we must be talking about the same entry, so the name of this entry must be the name of the file or directory (or a name, on Unix-like platforms where multiple names can exist for the same file).
We can adapt this general technique to cover whole file systems using the File::Find
module, though if we plan to do this a lot, caching the results of previous lstat
commands will greatly improve the run time of subsequent searches.
The simplest way to create and destroy directories is to use the mkdir
and rmdir
functions. These both create or destroy a single directory, starting at the current working directory if the supplied name is relative. For more advanced applications, we can use the File::Path
module, which allows us to create and destroy multiple nested directories.
Creating Single Directories
To create a new directory, we use the built-in mkdir
function. This takes a directory name as an argument and attempts to create a directory with that name. The pathname given to mkdir
may contain parent directories, in which case they must exist for the directory named as the last part of the pathname to be created. If the name is absolute, it is created relative to the root of the filing system. If it is relative, it is created relative to the current working directory:
# relative - create directory 'scripts' in current working directory
mkdir 'scripts';
# absolute - create 'web' in /home/httpd/sites/$site, which must already exist
mkdir "/home/httpd/sites/$site/web";
# relative - create directory 'scripts' in subdirectory 'lib' in current
# working directory. POSSIBLE ERROR: 'lib' must already exist to succeed.
mkdir 'lib/scripts';
mkdir
may be given an optional second parameter consisting of a numeric permissions mask, as described earlier in the chapter. This is generally given as an octal number specifying the read, write, and execute permissions for each of the user, group, and other categories. For example, to create a directory with 755
permissions, we would use
mkdir $dirname, 0755;
We can also use the mode symbols from the Fcntl
module if we import them first. Here is an example of creating a directory with 0775
permissions, using the appropriate Fcntl
symbols:
use Fcntl qw(:mode);
# $dirname with 0775 permissions
mkdir $dirname, S_IRWXU | S_IRWXG | S_IROTH | S_IXOTH;
The second parameter to mkdir
is a permissions mask, not a generic file mode. It applies to permissions only, not the other specialized mode bits such as the sticky, setuid
, or setgid
bits. To set these (on platforms that support them), we must use chmod
after creating the directory.
The setting of umask
may also remove bits; it is merged with the permissions mask parameter to define the value supplied to mkdir
. The default permissions mask is 0777
modified by the umask
setting. (A umask
setting of octal 022
would modify the stated permissions of a created directory from 0777
to 0755
, for example.) This is generally better than specifying a more restricted permissions mask in the program as it allows permissions policy to be controlled by the user.
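To make that interplay concrete, here is a short sketch for Unix-like platforms (the directory name is illustrative) showing how a umask of 022 modifies the mask passed to mkdir:

```perl
# with a umask of 022, a requested mask of 0777 yields 0755 permissions
my $old_umask = umask 022;
mkdir 'scratchdir', 0777 or die "mkdir: $!\n";

# the effective permissions are the mask with the umask bits removed
my $mode = (stat 'scratchdir')[2] & 0777;
printf "created with permissions %o\n", $mode; # 0777 & ~022 == 0755
umask $old_umask; # restore the previous umask
```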
Creating Multiple Directories
The mkdir
function will only create one directory at a time. To create multiple nested directories, we can use the File::Path
module instead.
File::Path
provides two routines, mkpath
and rmtree
. mkpath
takes a path specification containing one or more directory names separated by a forward slash, a Boolean flag to enable or disable a report of created directories, and a permissions mask in the style of mkdir
. It is essentially an improved mkdir
, with none of the drawbacks of the simpler function. For example, to create a given directory path:
use File::Path;
# create path, reporting all created directories
my $verbose = 1;
my $mask = 0755;
mkpath ('/home/httpd/sites/mysite/web/data/', $verbose, $mask);
One major advantage mkpath
has over mkdir
is that it handles preexisting directories in stride, using them if present and creating new directories otherwise. It also handles directory naming conventions of VMS and OS/2 automatically. In other respects, it is like mkdir
, using the same permission mask and creating directories from the current working directory if given a relative pathname:
# silently create scripts in lib, creating lib first if it does not exist.
mkpath "lib/scripts";
If mkpath
is only given one parameter, as in the preceding example, the verbose flag defaults to 0
, resulting in a silent mkpath
. And like mkdir
, the permissions mask defaults to 0777
.
mkpath
can also create multiple chains of directories if its first argument is a list reference rather than a simple scalar. For instance, to create a whole installation tree for a fictional application, we could use something like this:
mkpath ([
'/usr/local/apps/myapp/bin',
'/usr/local/apps/myapp/doc',
'/usr/local/apps/myapp/lib',
], 1, 0755);
In the event of an error, mkpath
will croak and return with $!
set to the reason of the failed mkdir
. To trap a possible croak
, put the mkpath
into an eval
:
unless (defined eval {mkpath(\@paths, 0, 0755)}) {
print "Error from mkpath: $@ ($!)\n";
}
Otherwise, mkpath
returns the list of all directories created. If a directory already exists, then it is not added to this list. As any return from mkpath
indicates that the call was successful overall, an empty list means simply that all the directories requested already exist. Since we often do not care if directories were created or not, just so long as they exist, we usually do not actually check the return value, only trap the error as in the preceding example.
Destroying Single Directories
To delete a directory, we use the rmdir
function, which returns 1
on success and 0
otherwise, setting $!
to indicate the reason for the error. rmdir
takes a single directory name as an argument or uses the value of $_
if no file name is given:
rmdir $dirname; # remove dirname
rmdir; # delete directory named by $_
rmdir
typically fails if the given name is not a valid pathname or does not point to a directory (it might be a file or a symbolic link to a directory). It will also fail if the directory is not empty.
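Because rmdir refuses to remove a nonempty directory, it is worth checking the error when it fails; for example:

```perl
# attempt to remove a directory, reporting why it failed
unless (rmdir 'scratchdir') {
    warn "Could not remove scratchdir: $!\n"; # e.g., 'Directory not empty'
}
```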
Deleting nested directories and directories with contents is more problematic. If we happen to be on a Unix system, logged in as superuser, and if we specified the -U
option to Perl when we started our application, then we can use unlink
to remove the directory regardless of its contents. In general, however, the only recourse we have is to traverse the directory using opendir
, removing files and traversing into subdirectories as we go. Fortunately, we do not have to code this ourselves, as there are a couple of modules that will greatly simplify the process.
Destroying Multiple or Nonempty Directories
As well as mkpath
, the File::Path
module provides a second routine, rmtree
, that performs (loosely speaking) the opposite function.
rmtree
takes three parameters: the first, like mkpath
, is a single scalar directory path. It comprises one or more directories separated by forward slashes, or alternatively a reference to an anonymous array of scalar directory paths. Paths may be either absolute or relative to the current working directory.
The second is, just like mkpath
, a Boolean verbosity flag, set to false by default. If enabled, rmtree
reports on each file or directory it encounters, indicating whether it used unlink
or rmdir
to remove it, or whether it skipped over it. Symbolic links are deleted but not followed.
The third parameter is a safety
flag, also Boolean and false by default. If true, rmtree
will skip over any file for which the program does not have write permission (or more technically, the program's effective user ID does not have write permission), except for VMS, which has the concept of "delete permission." Otherwise, it will attempt to delete it anyway, which depends not on the file's permissions or owner but on the permissions of the parent directory, like rmdir
.
Consider the following simple script, which simply wraps rmtree
:
#!/usr/bin/perl
# rmtree.pl
use strict;
use warnings;
use File::Path;
my $path=$ARGV[0];
my $verbose = 0;
my $safe = 1;
rmtree $path, $verbose, $safe;
With an array reference instead of a scalar pathname, all the paths in the array are deleted. We can remove $path
from the preceding script and replace all of the script below it with
# remove all paths supplied, silently and safely.
rmtree(\@ARGV, 0, 1);
On success, rmtree
returns the number of files deleted. On a fatal error, it will croak like mkpath
and can be trapped in the same way. Other, nonfatal, errors are carped (via the Carp
module) and must be trapped by a warning signal handler:
$SIG{__WARN__} = \&handle_warnings;
If the safety
flag is not set, rmtree
attempts to force the permissions of file directories to make them deletable. In the event of it failing to delete them afterwards, it may also be unable to restore the original permissions, leading to potentially insecure permissions. In all such cases, the problem will be reported via carp
and trapped by the warning signal handler if present.
All of Perl's directory handling functions from opendir
to rmdir
understand both absolute and relative pathnames. Relative is in relation to the current working directory, which initially is the directory that the shell was in when it started our application. Desktop icons, for example, provided by Windows shortcuts, supply the ability to specify the working directory before running the program the shortcut points to. Perl programs started from other processes inherit the current working directory, or CWD for short, of the parent. In a command shell, we commonly find the cd
command changes the current working directory.
We can change the current working directory in Perl with the chdir
function. chdir
takes a directory path as its argument and attempts to change the current working directory accordingly. If the path is absolute, it is taken relative to the root directory; otherwise, it is taken relative to the current working directory. It returns true on success and false on failure. For example:
unless (chdir $newdir) {
die "Failed to change to $newdir: $!\n";
}
Without an argument, chdir
changes to the home directory, equivalent to entering "cd" on its own on the command line. An argument of undef
also behaves this way, but this is now deprecated behavior since it is too easy to accidentally feed chdir
an undefined value through an unset variable that was meant to hold a file name.
On Windows things are a bit more complicated, since Windows preserves a current directory for each drive available to the system. The current directory as understood by Perl is therefore a combination of the currently selected drive and the current working directory on that drive. If we pass a directory to chdir
without a drive letter, we remain on the current drive.
There is no direct way in Perl to determine what the current working directory is, since the concept means different things to different platforms. Shells often maintain the current working directory in an environment variable that we can simply check, such as $ENV{PWD}
(the name is derived from the Unix pwd
command, which stands for "print working directory"). More formally, we can use either the POSIX
module or the more specialized Cwd
module to find out.
Using the POSIX
module, we can find the current working directory by calling the getcwd
routine, which maps onto the underlying getcwd
or getwd
(regional variations may apply) routine provided by the standard C library. It takes no parameters and returns the current working directory as a string:
use POSIX qw(getcwd);
my $cwd = getcwd;
This will work for most, but not all, platforms—a credible getcwd
or getwd
-like function must be available for the POSIX
module to use it. Alternatively, we can use the Cwd
module. This is a specialized module dedicated to all issues surrounding the current working directory in as portable a way as possible. It supplies three different ways to determine the current directory:
getcwd
and fastcwd
are pure Perl implementations that are therefore maximally portable. cwd
attempts to use the most natural and safe method to retrieve the current working directory supported by the underlying platform, which might be getcwd
or some other operating system interface, depending on whether it be Unix, Windows, VMS, OS/2, and so on.
getcwd
is an implementation of the real getcwd
as provided by POSIX written purely in Perl. It works by opening the parent directory with opendir
, then scanning each file in turn through readdir
and lstat
, looking for a match with the current directory using the first two values returned (the dev
and lno
fields). From this it deduces the name of the current directory, and so on all the way to the top of the filing system. This makes getcwd
slow, but it will work in the absence of additional cooperation from the operating system. getcwd
avoids using chdir
, because having chdir
ed out of the current directory, permissions may not allow it to chdir
back in again. Instead it assembles an increasingly long string of /../../../
to access each directory in turn. This makes it safe but slow.
fastgetcwd
is also a pure Perl implementation. It works just like getcwd
but assumes chdir
is always safe. Instead of accessing each parent directory through an extending string of /
.., it uses chdir
to jump up to the parent directory and analyze it directly. This makes it a lot faster than getcwd
, but it may mean that the current working directory changes if fastgetcwd
fails to restore the current working directory due to its permissions.
cwd
attempts to use the best safe and "natural" underlying mechanism available for determining the current working directory, essentially executing the native command to return the current working directory—on a Unix platform this is the pwd
command, on Windows it is command /c cd
, and so on. It does not use the POSIX
module. If all else fails, the Perl-only getcwd
covered previously is used. This makes it the best solution for most applications, since it takes advantage of OS support if any is available, but it can survive happily (albeit slowly) without. However, it is slower than the POSIX
module because it usually executes an external program.
All three methods will return the true path to the file, resolving and removing any symbolic links (should we be on a platform that supports them) in the pathname. All four functions (including the alias fastgetcwd
) are automatically imported when we use the module and are called in the same way, taking no parameters and returning the current working directory:
use Cwd; # import 'getcwd', 'fastcwd', 'fastgetcwd', and 'cwd'
$cwd = getcwd; # slow, safe Perl
$cwd = fastcwd; # faster but potentially unsafe Perl
$cwd = fastgetcwd; # alias for 'fastcwd'
$cwd = cwd; # use native platform support
If we only want to use one of these functions, say cwd, we can tell the module to export just that one function with
use Cwd qw(cwd);
Sometimes we want to find the path to a directory other than the one we are currently in. One way to do that is to chdir to the directory in question, determine the current working directory, and then chdir back. Since this is a chore, the Cwd module encapsulates the process in the abs_path (alias realpath) and fast_abs_path functions, each of which can be imported into our application by explicitly naming them. Both take a path to a file or directory and return the true absolute path to it, resolving any symbolic links and instances of . or .. as they go:
use Cwd qw(abs_path realpath fast_abs_path);
# find the real path of 'filename'
$absdir = abs_path('symboliclink/filename');
# 'realpath' is an alias for 'abs_path'
$absdir = realpath('symboliclink/filename');
# find the real path of our great-grandparent directory
$absdir = fast_abs_path('../../..');
The cwd function is actually just a wrapper around abs_path with an argument of '.'. By contrast, fast_abs_path is a wrapper for getcwd that uses chdir to change to the requested directory beforehand and chdir again to restore the current working directory afterward.
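We can verify the first of these relationships with a short sketch; on any platform where Cwd is available, cwd and abs_path called with an argument of '.' should agree:

```perl
# Demonstrate that 'cwd' and 'abs_path' with an argument of '.' agree:
# both return the resolved absolute path of the current working directory.
use strict;
use warnings;
use Cwd qw(cwd abs_path);

my $via_cwd = cwd;
my $via_abs = abs_path('.');
print "cwd      : $via_cwd\n";
print "abs_path : $via_abs\n";
print "Same result\n" if $via_cwd eq $via_abs;
```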
In addition to the various cwd functions and the abs_path routines, Cwd supplies one more routine, chdir, that improves the standard built-in chdir by automatically tracking changes in the environment variable $ENV{PWD} in the same manner as some shells do. We can have this chdir override the standard chdir by importing it specifically:
# override system 'chdir'
use Cwd qw(chdir);
After this, chdir will automatically update $ENV{PWD} each time we use it. The original chdir is still available as CORE::chdir, of course.
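As a brief demonstration (assuming a Unix-like system with a /tmp directory), the overriding chdir keeps $ENV{PWD} in step with each change of directory:

```perl
# Demonstrate Cwd's overriding 'chdir' tracking $ENV{PWD}. The built-in
# remains available as CORE::chdir, which leaves $ENV{PWD} untouched.
use strict;
use warnings;
use Cwd qw(chdir);

chdir '/tmp' or die "chdir failed: $!";
print "PWD is now: $ENV{PWD}\n";    # Cwd's chdir updated $ENV{PWD}
```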
In this chapter, we covered Perl's interaction with the filing system, including the naming of files and directories, testing for the existence of files, using the built-in stat and lstat functions, and deleting, renaming, copying, moving, comparing, and finding files.
Doing all of this portably can be a challenge, but fortunately Perl helps us out, first by natively understanding Unix-style file-naming conventions on almost any platform, and second by providing the File:: family of modules for portable file system operations. File::Spec and File::Spec::Functions are the underlying foundation for these modules, while modules like File::Basename and File::Copy provide higher-level functionality we can use for portable file system manipulation.
We also looked at Perl's glob operator and the underlying File::Glob modules that modern Perls invoke when we use it. We went on to look at the creation and use of temporary files, a special case of filing system interaction that can be very important to get right. Finally, we took a special look at the particular properties and problems of managing directories, which are like files in some ways but quite unlike them in many others.