We’ve covered a lot in this book, but there’s even more. In this appendix, we’ll tell you a little more about what Perl can do, and give some references on where to learn the details. Some of what we mention here is on the bleeding edge and may have changed by the time that you’re reading this book, which is one reason why we frequently send you to the documentation for the full story. We don’t expect many readers to read every word of this appendix, but we hope you’ll at least skim the headings so that you’ll be prepared to fight back when someone tells you “You just can’t use Perl for project X, because Perl can’t do Y.”
The documentation that comes with Perl may seem overwhelming at first. Fortunately, you can use your computer to search for keywords in the documentation. When searching for a particular topic, it’s often good to start with the perltoc (table of contents) and perlfaq (frequently asked questions) sections. On most systems, the perldoc command should be able to track down the documentation for Perl, installed modules, and related programs (including perldoc itself).
Yes, there’s even more about regular expressions than we mentioned. Mastering Regular Expressions by Jeffrey Friedl is one of the best technical books we’ve ever read.[1] It’s half about regular expressions in general, and half about Perl’s regular expressions. It goes into good detail about how the regular expression engine works internally, and why one way of writing a pattern may be much more efficient than another. Anyone who is serious about Perl should read this book. Also see the perlre manpage (and its companion perlretut and perlrequick manpages in newer versions of Perl).
Packages[2] allow you to compartmentalize the namespaces. Imagine that you have ten programmers all working on one big project. If you use the global names $fred, @barney, %betty, and &wilma in your part of the project, what happens when I accidentally use one of those same names in my part? Packages let us keep these separate; I can access your $fred, and you can access mine, but not by accident. Packages are needed to make Perl scalable, so that we can manage large programs.
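As a minimal sketch of that idea, here are two packages that each keep their own global $fred. The package names Yours and Mine are made up for this illustration; each variable is reached through its full, package-qualified name.

```perl
use strict;
use warnings;

# Two hypothetical packages, each with its own global $fred.
{
    package Yours;
    our $fred = "your fred";
}
{
    package Mine;
    our $fred = "my fred";
}

# Neither variable gets in the other's way; to reach one from outside
# its package, you have to ask for it by its full name.
print "Yours: $Yours::fred\n";
print "Mine:  $Mine::fred\n";
```

Using the other programmer's variable takes a deliberate step (writing the package name), so it is hard to do by accident.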
One of the most common pieces of good advice heard in the Perl discussion forums is that you shouldn’t reinvent the wheel. Other folks have written code that you can put to use. The most frequent way to add to what Perl can do is by using a library or module. Many of these come with Perl, while others are available from CPAN. Of course, you can even write your own libraries and modules.
Many programming languages offer support for libraries much as Perl does. Libraries are collections of (mostly) subroutines for a given purpose. In modern Perl, though, it’s more common to use modules than libraries.
A module is a “smart library”. A module will typically offer a collection of subroutines that act as if they were built-in functions, for the most part. Modules are smart in that they keep their details in a separate package, only importing what you request. This keeps a module from stomping on your code’s symbols.
Although many useful modules are written in pure Perl, others are written using a language like C. For example, the MD5 algorithm is sort of like a high-powered checksum.[3] It uses a lot of low-level bit-twiddling that could be done in Perl, but hundreds of times more slowly;[4] it’s an algorithm that was designed to be efficiently implemented in C. So, the Digest::MD5 module is made to use the compiled C code. When you use that module, it’s as if your Perl had a built-in function to calculate MD5 digests.
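Here is a short sketch of what that looks like in practice, assuming Digest::MD5 is installed on your system. The md5_hex function is one of the functions the module can export on request; the input string is just sample data.

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# Ask the module for the digest of a sample string, as a hex string.
my $digest = md5_hex("hello world");
print "MD5 digest: $digest\n";
```

From the program's point of view, md5_hex is called like any other function; the compiled C code stays out of sight.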
Maybe your system already has the module you need. But how can you find out which modules are installed? You can use the program inside, which should be available for download from CPAN in the directory http://www.cpan.org/authors/id/P/PH/PHOENIX/.
If none of the modules already available on your system suits your needs, you can search for Perl modules on CPAN at http://search.cpan.org/. To install a module on your system, see the perlmodinstall manpage.
When using a module, you’ll generally put the required use directives at the top of your program. That makes it easy for someone who is installing your program on a new system to see at a glance which modules it needs.
We describe some of the most important features[5] of the most important modules[6] in this section. These modules that we discuss here should generally be found on every machine that has Perl, except where mentioned. You can always get the latest ones from CPAN.
Many people use Perl to write programs that a web server will run, generally called CGI programs. The CGI module comes with Perl, while the CGI_Lite module is available separately from CPAN. See Section B.16 later in this appendix.
Sometimes you need to know what the current working directory’s name is. (Well, you could often use “.”, but maybe you need to save the name so that you can change back to this directory later.) The Cwd module, which comes with Perl, provides the cwd function, which you can use to determine the current working directory.
use Cwd;
my $directory = cwd;
If you get tired of writing “or die” after every invocation of open or chdir, then maybe the Fatal module is for you. Just tell it which functions to work with, and those will be automatically checked for failure, as if you’d written “or die” and a suitable message after each one. This won’t affect such calls in someone else’s package (that is, code contained within a module you’re using, for example), so don’t use this to fix up poorly written code. It’s just a timesaver, mostly for simple programs in which you don’t need direct control over the error message itself. For example:
use Fatal qw/ open chdir /;
chdir '/home/merlyn';  # "or die" is now supplied automatically
We covered the File::Basename module in Chapter 13. Its primary uses are to portably pull the basename or directory name from a full filename:
use File::Basename;
for (@ARGV) {
    my $basename = basename $_;
    my $dirname = dirname $_;
    print "That's file $basename in directory $dirname.\n";
}
When you need to copy or move files, the File::Copy module is for you. (It’s often tempting to simply call a system program to do these things, but that’s not portable.) This module provides the functions move and copy, which may be used much as the corresponding system programs would be used:
use File::Copy;
copy("source", "destination")
    or die "Can't copy 'source' to 'destination': $!";
When you need to manipulate a filename (more formally called a “file specification”), it’s generally more portable and reliable to use the File::Spec module than to do the work yourself from Perl. For example, you can use the catfile function to put together a directory name and a filename to produce a long filename (as we saw in Chapter 13), but you don’t have to know whether the system your program is running on uses a forward slash or some other character to separate those. Or you could use the curdir function to get the name of the current directory (“.”, on Unix systems).
The File::Spec module is object-oriented, but you don’t need to understand objects to use it. Just call each function (“method”, really) by using File::Spec and a small arrow before the function’s name, like this:
use File::Spec;
my $current_directory = File::Spec->curdir;
opendir DOT, $current_directory
    or die "Can't open current directory '$current_directory': $!";
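The catfile function mentioned above is called the same way. In this small sketch, the directory and file names are invented for the example; the separator that appears in the result depends on the system the program runs on.

```perl
use strict;
use warnings;
use File::Spec;

# Build a filename from its pieces without hard-coding a separator.
# The names "reports", "2004", and "summary.txt" are made up.
my $path = File::Spec->catfile("reports", "2004", "summary.txt");
print "The long filename is $path\n";
```

On a Unix system the pieces end up joined with forward slashes; elsewhere, File::Spec picks whatever that system expects.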
When you have an image file, you’ll often want to know what its height and width are. (This is handy for making programs that write HTML, if you wish for an IMG tag to indicate the image’s dimensions.) The Image::Size module, which is available from CPAN, understands the common GIF, JFIF (JPEG), and PNG image types, and some others. For example:
use Image::Size;
# Get the size of fred.png; imgsize returns the width first, then the height
my($fred_width, $fred_height) = imgsize("fred.png");
die "Couldn't get the size of the image"
    unless defined $fred_width;
If you want your program to be able to send email through an SMTP server (which is the way most of us send email these days, whether you knew that or not), you may use the Net::SMTP module to do the work.[7] This module, which is available from CPAN, is object-oriented, but you may simply follow the syntax to use it. You will need to change the name of your SMTP host and the other items to make this work on your system. Your system administrator or local expert can tell you what to use. For example:
use Net::SMTP;
my $from = 'YOUR_ADDRESS_GOES_HERE';        # maybe [email protected]
my $site = 'YOUR_SITE_NAME_GOES_HERE';      # maybe bedrock.edu
my $smtp_host = 'YOUR_SMTP_HOST_GOES_HERE'; # maybe mail or mailhost
my $to = '[email protected]';
my $smtp = Net::SMTP->new($smtp_host, Hello => $site);
$smtp->mail($from);
$smtp->to($to);
$smtp->data();
$smtp->datasend("To: $to\n");
$smtp->datasend("Subject: A message from my Perl program.\n");
$smtp->datasend("\n");
$smtp->datasend("This is just to let you know,\n");
$smtp->datasend("I don't care what those other people say about you,\n");
$smtp->datasend("I still think you're doing a great job.\n");
$smtp->datasend("\n");
$smtp->datasend("Have you considered enacting a law naming Perl\n");
$smtp->datasend("the national programming language?\n");
$smtp->dataend();  # Not datasend!
$smtp->quit;
If you need access to the POSIX (IEEE Std 1003.1) functions, the POSIX module is for you. It provides many functions that C programmers may be used to, such as trigonometric functions (asin, cosh), general mathematical functions (floor, frexp), character-identification functions (isupper, isalpha), low-level IO functions (creat, open), and some others (asctime, clock). You’ll probably want to call each of these with its “full” name; that is, with POSIX and a pair of colons as a prefix to the function’s name:
use POSIX;
print "Please enter a number: ";
chomp(my $str = <STDIN>);
$! = 0;  # Clear out the error indicator
my($num, $leftover) = POSIX::strtod($str);
if ($str eq '') {
    print "That string was empty!\n";
} elsif ($leftover) {
    my $remainder = substr $str, -$leftover;
    print "The string '$remainder' was left after the number $num.\n";
} elsif ($!) {
    print "The conversion function complained: $!\n";
} else {
    print "The seemingly-valid number was $num.\n";
}
The Sys::Hostname module provides the hostname function, which returns the network name of your machine, if that can be determined. (If it can’t be determined, perhaps because your machine is not on the Internet or not properly configured, the function will die automatically; there’s no point in using or die here.) For example:
use Sys::Hostname;
my $host = hostname;
print "This machine is known as '$host'.\n";
The Text::Wrap module supplies the wrap function, which lets you implement simple word-wrapping. The first two parameters specify the indentation of the first line and the others, respectively; the remaining parameters make up the paragraph’s text:
use Text::Wrap;
my $message = "This is some sample text which may be longer " .
    "than the width of your output device, so it needs to " .
    "be wrapped to fit properly as a paragraph. ";
$message x= 5;
print wrap("\t", "", "$message\n");
If you have a time (for example, from the time function) that needs to be converted to a list of year, month, day, hour, minute, and second values, you can do that with Perl’s built-in localtime function in a list context.[8] (In a scalar context, that gives a nicely formatted string representing the time, which is more often what you’d want.) But if you need to go in the other direction, you may use the timelocal function from the Time::Local module instead. It’s important to note that the values of $mon and $year for January 2004 are not 1 and 2004 as you might expect, so be sure to read the documentation before you use this module. For example:
use Time::Local;
my $time = timelocal($sec, $min, $hr, $day, $mon, $year);
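To make the warning about $mon and $year concrete, here is a small round trip, written as a sketch. It assumes the same conventions that localtime uses (months counted from 0, years counted from 1900); the particular date is arbitrary, and you should still check the Time::Local documentation for your release.

```perl
use strict;
use warnings;
use Time::Local;

# Noon on 15 January 2004, using localtime's conventions:
# January is month 0, and 2004 is year 104 (years since 1900).
my ($sec, $min, $hr, $day, $mon, $year) = (0, 0, 12, 15, 0, 104);
my $time = timelocal($sec, $min, $hr, $day, $mon, $year);

# Going back the other way with localtime in a list context.
my @parts = localtime $time;
printf "%04d-%02d-%02d %02d:%02d:%02d\n",
    $parts[5] + 1900, $parts[4] + 1, @parts[3, 2, 1, 0];
```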
Pragmas are special modules that come with each release of Perl and tell Perl’s internal compiler something about your code. You’ve already used the strict pragma. The pragmas available for your release of Perl should be listed in the perlmodlib manpage.
You use pragmas much like you’d use ordinary modules, with a use directive. Some pragmas are lexically scoped, like lexical (“my”) variables are, and they therefore apply to the smallest enclosing block or file. Others may apply to the entire program or to the current package. (If you don’t use any packages, the pragmas apply to your entire program.) Pragmas should generally appear near the top of your source code. The documentation for each pragma should tell you how it’s scoped.
If you’ve used other languages, you’ve probably seen the ability to declare constants in one way or another. Constants are handy for making a setting just once, near the beginning of a program, but that can be easily updated if the need arises. Perl can do this with the package-scoped constant pragma, which tells the compiler that a given identifier has a constant value, which may thus be optimized wherever it appears. For example:
use constant DEBUGGING => 0;
use constant ONE_YEAR => 365.2425 * 24 * 60 * 60;
if (DEBUGGING) {
    # This code will be optimized away unless DEBUGGING is turned on
    ...
}
Perl’s diagnostic messages often seem somewhat cryptic, at least the first time you see them. But you can always look them up in the perldiag manpage to find out what they mean, and often a little about what’s likely to be the problem and how to fix it. But you can save yourself the trouble of searching that manpage if you use the diagnostics pragma, which tells Perl to track down and print out the related information for any message. Unlike most pragmas, though, this one is not intended for everyday use, as it makes your program read the entire perldiag manpage just to get started. (This is potentially a significant amount of overhead, both in terms of time and memory.) Use this pragma only when you’re debugging and expecting to get error messages you don’t yet understand. It affects your entire program. The syntax is:
use diagnostics;
It’s nearly always best to install modules in the standard directories, so that they’re available for everyone, but only the system administrator can do that. If you install your own modules, you’ll have to store them in your own directories, so how will Perl know where to find them? That’s what the lib pragma is all about. It tells Perl that the given directory is the first place to look for modules. (That means that it’s also useful for trying out a new release of a given module.) It affects all modules loaded from this point on. The syntax is:
use lib '/home/rootbeer/experimental';
Be sure to use a nonrelative pathname as the argument, since there’s no telling what will be the current working directory when your program is run. This is especially important for CGI programs (that is, programs run by a web server).
You’ve been using use strict for a while already without having to understand that it’s a pragma. It’s lexically scoped, and it enforces some good programming rules. See its documentation to learn what restrictions are available in your release of Perl.
In the rare case that you truly need a global variable while use strict is in effect, you may declare it with the vars pragma.[9] This package-scoped pragma tells Perl that you are intentionally using one or more global variables:
use strict;
use vars qw/ $fred $barney /;
$fred = "This is a global variable, but that's all right.\n";
Starting in Perl version 5.6, you may choose to have lexically scoped warnings with the warnings pragma.[10] That is, rather than using the -w option crudely to turn warnings on or off for the entire program at once, you may specify that you want no warnings about undefined values in just one section of code, while other warnings should be available. This also serves as a signal to the maintenance programmer that says, “I know that this code would produce warnings, but I know what I’m doing anyway.” See the documentation for this pragma to learn about the categories of warnings available in your release of Perl.
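As a sketch of that kind of scoping, the block below turns off just the warnings about undefined values, and only inside one pair of curly braces. The hash %count and its contents are invented for the example; the category name uninitialized is the one we believe covers undefined values, but check the pragma's documentation for your release.

```perl
use strict;
use warnings;

my %count;            # hypothetical data; 'barney' has no entry
$count{fred} = 3;

my $report;
{
    # Inside this block only, undefined values may be used silently.
    no warnings 'uninitialized';
    $report = "fred=$count{fred} barney=$count{barney}";
}
# Out here, all warnings are back in force.
print "$report\n";
```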
If you’ve got a database, Perl can work with it. This section describes some of the common types of databases.
Perl can directly access some system databases, sometimes with the help of a module. These are databases like the Windows Registry (which holds machine-level settings), or the Unix password database (which lists which username corresponds to which number, and related information), as well as the domain-name database (which lets you translate an IP number into a machine name, and vice versa).
If you’d like to access your own flat-file databases from Perl, there are modules to help you with doing that (seemingly a new one every month or two, so any list here would be out of date). You can even do quite a bit without a module, with what we give in Chapter 16.
Relational databases include Sybase, Oracle, Informix, mysql, and others. These are complex enough that you generally do need to know about modules to use them. But if you use the DBI module, whose name stands for “database-independent,” you can minimize your dependence upon any one type of database—then, if you have to move from mysql to Oracle, say, you might not even need to change anything at all in your program.
Yes, there are more operators and functions than we can fit here, from the scalar .. operator to the scalar , operator, from wantarray to goto (!), from caller to chr. See the perlop and perlfunc manpages.
Perl can do just about any kind of mathematics you can dream up. All of the basic mathematical functions (square root, cosine, logarithm, absolute value, and many others) are available as built-in functions; see the perlfunc manpage for details. Some others (like tangent or base-10 logarithm) are omitted, but those may be easily created from the basic ones, or loaded from a simple module that does so. (See the POSIX module for many common math functions.)
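For instance, the two omitted functions mentioned above can be built from the basic ones in a line each. This is only a sketch; the subroutine names tangent and log_base10 are our own inventions, not anything that comes with Perl.

```perl
use strict;
use warnings;

# Tangent is sine over cosine; a base-10 logarithm is the natural
# logarithm divided by the natural logarithm of 10.
sub tangent    { sin($_[0]) / cos($_[0]) }
sub log_base10 { log($_[0]) / log(10) }

my $quarter_pi = atan2(1, 1);   # pi/4, from the built-in atan2
printf "tangent(pi/4)    = %.4f\n", tangent($quarter_pi);
printf "log_base10(1000) = %.4f\n", log_base10(1000);
```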
Although the core of Perl doesn’t directly support them, there are modules available for working with complex numbers. These overload the normal operators and functions, so that you can still multiply with * and get a square root with sqrt, even when using complex numbers. See the Math::Complex module.
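Here is a brief sketch of Math::Complex at work, assuming the module is installed. The functions cplx, Re, and Im are among those the module exports by default; the numbers are arbitrary.

```perl
use strict;
use warnings;
use Math::Complex;

my $z       = cplx(3, 4);   # the complex number 3 + 4i
my $product = $z * 2;       # the ordinary * operator still works
my $root    = sqrt(-4);     # the module's sqrt handles negative numbers

print "z = $z, twice z = $product, sqrt(-4) = $root\n";
print "The real part of z is ", Re($z), " and the imaginary part is ", Im($z), "\n";
```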
Perl has a number of features that make it easy to manipulate an entire list or array.
We mentioned (in Chapter 17) the map and grep list-processing operators. They can do more than we could include here; see the perlfunc manpage for more information and examples.
With the splice operator, you can add items to the middle of an array, or remove them, letting the array grow or shrink as needed. (Roughly, this is like what substr lets you do with strings.) This effectively eliminates the need for linked lists in Perl. See the perlfunc manpage.
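A quick sketch of both directions follows: inserting into the middle of an array, then removing from it. The array contents are sample data.

```perl
use strict;
use warnings;

my @stones = qw(fred barney dino);

# Insert two items at index 1, removing nothing (a length of 0).
splice @stones, 1, 0, qw(wilma betty);
# @stones is now: fred wilma betty barney dino

# Remove one item at index 3; splice returns what it removed.
my @removed = splice @stones, 3, 1;
# @stones is now: fred wilma betty dino

print "Kept: @stones\n";
print "Removed: @removed\n";
```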
You can work with an array of bits (a bitstring) with the vec operator, setting bit number 123, clearing bit number 456, and checking to see the state of bit 789. Bitstrings may be of arbitrary size. The vec operator can also work with chunks of other sizes, as long as the size is a small power of two, so it’s useful if you need to view a string as a compact array of nybbles, say. See the perlfunc manpage.
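The operations just described might look like this in a short sketch. The variable names are ours; the third argument to vec is the size of each element in bits.

```perl
use strict;
use warnings;

my $bits = '';
vec($bits, 123, 1) = 1;    # set bit number 123
vec($bits, 456, 1) = 1;    # set bit number 456...
vec($bits, 456, 1) = 0;    # ...and clear it again

# Check the state of some bits; one past the end reads as 0.
my $bit_123 = vec($bits, 123, 1);
my $bit_789 = vec($bits, 789, 1);
print "Bit 123 is $bit_123, bit 789 is $bit_789\n";

# The same operator with 4-bit chunks: a compact array of nybbles.
my $nybbles = '';
vec($nybbles, 0, 4) = 10;
vec($nybbles, 1, 4) = 5;
print "Nybble 1 is ", vec($nybbles, 1, 4), "\n";
```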
Perl’s formats are an easy way to make fixed-format template-driven reports with automatic page headers. In fact, they are one of the main reasons Larry developed Perl in the first place, as a Practical Extraction and Report Language. But, alas, they’re limited. The heartbreak of formats happens when someone discovers that he or she needs a little more than what formats provide. This usually means ripping out the program’s entire output section and replacing it with code that doesn’t use formats. Still, if you’re sure that formats do what you need, all that you’ll need, and all that you’ll ever need, they are pretty cool. See the perlform manpage.
If there’s a way that programs on your machine can talk with others, Perl can probably do it. This section shows some common ways.
The standard functions for System V IPC (interprocess communication) are all supported by Perl, so you can use message queues, semaphores, and shared memory. Of course, an array in Perl isn’t stored in a chunk of memory in the same way[11] that an array is stored in C, so shared memory can’t share Perl data as-is. But there are modules that will translate data, so that you can pretend that your Perl data is in shared memory. See the perlfunc and perlipc manpages.
Perl has full support for TCP/IP sockets, which means that you could write a web server in Perl, or a web browser, Usenet news server or client, finger daemon or client, FTP daemon or client, SMTP or POP or SOAP server or client, or either end of pretty much any other kind of protocol in use on the Internet. Of course, there’s no need to get into the low-level details yourself; there are modules available for all of the common protocols. For example, you can make a web server or client with the LWP module and one or two lines of additional code.[12] The LWP module (actually, a tightly integrated set of modules, which together implement nearly everything that happens on the Web) is also a great example of high-quality Perl code, if you’d like to copy from the best. For other protocols, search for a module with the protocol’s name.
Perl has a number of strong security-related features that can make a program written in Perl more secure than the corresponding program written in C. Probably the most important of these is data-flow analysis, better known as taint checking. When this is enabled, Perl keeps track of which pieces of data seem to have come from the user or environment (and are therefore untrustworthy). Generally, if any such piece of so-called “tainted” data is used to affect another process, file, or directory, Perl will prohibit the operation and abort the program. It’s not perfect, but it’s a powerful way to prevent some security-related mistakes. There’s more to the story; see the perlsec manpage.
There’s a very good debugger that comes with Perl and supports breakpoints, watchpoints, single-stepping, and generally everything you’d want in a command-line Perl debugger. It’s actually written in Perl (so, if there are bugs in the debugger, we’re not sure how they get those out). But that means that, in addition to all of the usual debugger commands, you can actually run Perl code from the debugger—calling your subroutines, changing variables, even redefining subroutines—while your program is running. See the perldebug manpage for the latest details.
Another debugging tactic is to use the B::Lint module, which is still preliminary as of this writing.
One of the most popular uses for Perl on the Web is in writing CGI programs. These run on a web server to process the results of a form, perform a search, produce dynamic web content, or count the number of accesses to a web page.
The CGI module, which comes with Perl, provides an easy way to access the form parameters and to generate some HTML in responses. (If you don’t want the overhead of the full CGI module, the CGI_Lite module provides access to the form parameters without all the rest.) It may be tempting to skip the module and simply copy-and-paste one of the snippets of code that purport to give access to the form parameters, but nearly all of these are buggy.[13]
When writing CGI programs, though, there are several big issues to keep in mind. These make the topic too broad to cover fully in this book:[14]
It’s easy to have several processes that are concurrently trying to access a single file or resource.
No matter how hard you try, you probably won’t be able to test your program thoroughly with more than about 1 or 2% of the web browsers and servers that are in use today.[15] That’s because there are literally thousands of different programs available, with new ones popping up every week. The solution is to follow the standards, so your program will work with all of them.[16]
Since the CGI program runs in a different environment than you’re likely to be able to access directly, you’ll have to learn new techniques for troubleshooting and debugging.
There, we’ve said it again. Don’t forget security—it’s the first and last thing to think about when your program is going to be available to everyone in the world who wants to try breaking it.
And that list didn’t even mention URI-encoding, HTML entities, HTTP and response codes, Secure Sockets Layer (SSL), Server-side Includes (SSI), here documents, creating graphics on the fly, programmatically generating HTML tables, forms, and widgets, hidden form elements, getting and setting cookies, path info, error trapping, redirection, taint checking, internationalization and localization, embedding Perl into HTML (or the other way around), working with Apache and mod_perl, and using the LWP module.[17] Most or all of those topics should be covered in any good book on using Perl with the Web.
CGI Programming with Perl by Scott Guelich, et al. (O’Reilly & Associates, Inc.) is mighty nice here, as is Lincoln Stein’s Network Programming with Perl (Addison-Wesley).
There are many different command-line options available in Perl; many let you write useful programs directly from the command line. See the perlrun manpage.
Perl has dozens of built-in variables (like @ARGV and $0), which provide useful information or control the operation of Perl itself. See the perlvar manpage.
There are more tricks you could do with Perl syntax, including the continue block and the BEGIN block. See the perlsyn and perlmod manpages.
Perl’s references are similar to C’s pointers, but in operation, they’re more like what you have in Pascal or Ada. A reference “points” to a memory location, but because there’s no pointer arithmetic or direct memory allocation and deallocation, you can be sure that any reference you have is a valid one. References allow object-oriented programming and complex data structures, among other nifty tricks. See the perlreftut and perlref manpages.
References allow us to make complex data structures in Perl. For example, suppose you want a two-dimensional array? You can do that,[18] or you can do something much more interesting, like have an array of hashes, a hash of hashes, or a hash of arrays of hashes.[19] See the perldsc (data-structures cookbook) and perllol (lists of lists) manpages.
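As a small taste, here is a sketch of a two-dimensional array and a hash of arrays. The data is invented for the example; the manpages mentioned above cover many more combinations.

```perl
use strict;
use warnings;

# A two-dimensional array: really an array of references to arrays.
my @grid = (
    [ 1, 2, 3 ],
    [ 4, 5, 6 ],
);
my $corner = $grid[1][2];          # second row, third column

# A hash of arrays: each family name maps to a list of members.
my %family = (
    flintstone => [ qw(fred wilma pebbles) ],
    rubble     => [ qw(barney betty bamm-bamm) ],
);
push @{ $family{flintstone} }, 'dino';
my $second_rubble = $family{rubble}[1];

for my $name (sort keys %family) {
    print "$name: @{ $family{$name} }\n";
}
```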
Yes, Perl has objects; it’s buzzword-compatible with all of those other languages. Object-oriented (OO) programming lets you create your own user-defined datatypes with associated abilities, using inheritance, overriding, and dynamic method lookup.[20]
Unlike some object-oriented languages, though, Perl doesn’t force you to use objects. (Even many object-oriented modules can be used without understanding objects.) But if your program is going to be larger than N lines of code, it may be more efficient for the programmer (if a tiny bit slower at runtime) to make it object-oriented. No one knows the precise value of N, but we estimate it’s around a few thousand or so. See the perlobj and perlboot manpages for a start, and Damian Conway’s excellent Object-Oriented Perl (Manning Press) for more advanced information.
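To give the flavor of it, here is a minimal sketch of a user-defined datatype with inheritance and overriding. The class names Counter and StepCounter, and everything inside them, are made up for this illustration; real classes usually live in their own module files.

```perl
use strict;
use warnings;

{
    package Counter;            # a hypothetical base class
    sub new {
        my ($class, %args) = @_;
        my $self = { count => $args{start} || 0 };
        return bless $self, $class;
    }
    sub increment { my $self = shift; return ++$self->{count} }
    sub count     { my $self = shift; return $self->{count} }
}
{
    package StepCounter;        # a hypothetical derived class
    our @ISA = ('Counter');     # inherit new and count from Counter
    sub increment {             # override: count by twos instead
        my $self = shift;
        return $self->{count} += 2;
    }
}

my $counter = StepCounter->new(start => 10);
$counter->increment;            # found in StepCounter at runtime
print "The count is now ", $counter->count, "\n";
```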
Odd as it may sound at first, it can be useful to have a subroutine without a name. Such subroutines can be passed as parameters to other subroutines, or they can be accessed via arrays or hashes to make jump tables.
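A jump table of that sort can be sketched as a hash whose values are anonymous subroutines. The names %operation_for and calculate are hypothetical, chosen just for this example.

```perl
use strict;
use warnings;

# Each hash value is a subroutine with no name of its own.
my %operation_for = (
    add      => sub { $_[0] + $_[1] },
    subtract => sub { $_[0] - $_[1] },
    multiply => sub { $_[0] * $_[1] },
);

sub calculate {
    my ($name, $x, $y) = @_;
    my $code = $operation_for{$name}
        or die "Unknown operation '$name'";
    return $code->($x, $y);     # call the subroutine via its reference
}

print "2 add 3 is ", calculate('add', 2, 3), "\n";
print "4 multiply 5 is ", calculate('multiply', 4, 5), "\n";
```

Adding a new operation means adding one line to the hash; the calculate subroutine doesn't need to change.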
Closures are a powerful concept that comes to Perl from the world of Lisp. A closure is (roughly speaking) an anonymous subroutine with its own private data.
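The classic illustration is a counter generator, sketched below. The subroutine name make_counter is our own; each anonymous subroutine it returns keeps its own private copy of $count.

```perl
use strict;
use warnings;

sub make_counter {
    my $count = shift || 0;         # private to each returned subroutine
    return sub { return $count++ };
}

my $from_five = make_counter(5);
my $from_zero = make_counter();

$from_five->();                     # 5
$from_five->();                     # 6
my $next  = $from_five->();         # 7
my $other = $from_zero->();         # 0; this counter is independent

print "Next from five: $next; first from zero: $other\n";
```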
Do you remember how the DBM hash (in Chapter 16) is “magically” connected to a file, so that accesses to the hash are really working with the corresponding DBM file? You can actually make any variable magical in that way. A tied variable may be accessed like any other, but using your own code behind the scenes. So you could make a scalar that is really stored on a remote machine, or an array that always stays sorted. See the perltie manpage.
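Here is a minimal sketch of a tied scalar, one that quietly uppercases anything stored in it. The class name UpperCase is hypothetical; the method names TIESCALAR, FETCH, and STORE are the ones Perl looks for, as described in the perltie manpage.

```perl
use strict;
use warnings;

{
    package UpperCase;      # a hypothetical class for this example
    sub TIESCALAR {
        my ($class, $initial) = @_;
        my $value = uc(defined $initial ? $initial : '');
        return bless \$value, $class;
    }
    sub FETCH { my $self = shift; return $$self }
    sub STORE { my ($self, $new) = @_; $$self = uc $new }
}

tie my $shout, 'UpperCase', 'hello';
$shout = 'yabba dabba doo';     # really a call to STORE
print "$shout\n";               # really a call to FETCH
```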
You can redefine operators like addition, concatenation, comparison, or even the implicit string-to-number conversion with the overload module. This is how a module implementing complex numbers (for example) can let you multiply a complex number by 8 to get a complex number as a result.
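As a sketch, here is a tiny two-dimensional vector class that overloads addition, multiplication by a number, and conversion to a string. The class Vec2 and its method names are invented for this example.

```perl
use strict;
use warnings;

{
    package Vec2;       # a hypothetical 2-D vector class
    use overload
        '+'  => \&add,
        '*'  => \&scale,
        '""' => \&as_string;

    sub new {
        my ($class, $x, $y) = @_;
        return bless { x => $x, y => $y }, $class;
    }
    sub add {
        my ($left, $right) = @_;
        return Vec2->new($left->{x} + $right->{x}, $left->{y} + $right->{y});
    }
    sub scale {
        my ($vector, $factor) = @_;
        return Vec2->new($vector->{x} * $factor, $vector->{y} * $factor);
    }
    sub as_string {
        my $vector = shift;
        return "($vector->{x}, $vector->{y})";
    }
}

my $sum    = Vec2->new(1, 2) + Vec2->new(3, 4);
my $scaled = $sum * 8;          # a vector times 8 is still a vector
print "sum is $sum, scaled is $scaled\n";
```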
The basic idea of dynamic loading is that your program decides at runtime that it needs more functionality than what’s currently available, so it loads it up and keeps running. You can always dynamically load Perl code, but it’s even more interesting to dynamically load a binary extension.[21] This is how non-Perl modules are made.
The reverse of dynamic loading (in a sense) is embedding.
Suppose you want to make a really cool word processor, and you start writing it in (say) C++.[22] Now, you decide you want the users to be able to use Perl’s regular expressions for an extra-powerful search-and-replace feature, so you embed Perl into your program. Then you realize that you could open up some of the power of Perl to your users. A power user could write a subroutine in Perl that could become a menu item in your program. Users can customize the operation of your word processor by writing a little Perl. Now you open up a little space on your website where users can share and exchange these Perl snippets, and you’ve got thousands of new programmers extending what your program can do at no extra cost to your company. And how much do you have to pay Larry for all this? Nothing—see the licenses that come with Perl. Larry is a really nice guy. You should at least send him a thank-you note.
Although we don’t know of such a word processor, some folks have already used this technique to make other powerful programs. One such example is Apache’s mod_perl, which embeds Perl into an already-powerful web server. If you’re thinking about embedding Perl, you should check out mod_perl; since it’s all open source, you can see just how it works.
If you’ve got old sed and awk programs that you wish were written in Perl, you’re in luck. Not only can Perl do everything that those can do, there’s also a conversion program available, and it’s probably already installed on your system. Check the documentation for s2p (for converting from sed) or a2p (for converting from awk).[23] Since programs don’t write programs as well as people do, the results won’t necessarily be the best Perl—but it’s a start, and it’s easy to tweak. The translated program may be faster or slower than the original, too. But after you’ve fixed up any gross inefficiencies in the machine-written Perl code, it should be comparable.
Do you have C algorithms you want to use from Perl? Well, you’ve still got some luck on your side; it’s not too hard to put C code into a compiled module that can be used from Perl. In fact, any language that compiles to make object code can generally be used to make a module. See the perlxs manpage, and the Inline module, as well as the SWIG system.
Do you have a shell script that you want to convert to Perl? Your luck just ran out. There’s no automatic way to convert shell to Perl. That’s because the shell hardly does anything by itself; it spends all of its time running other programs. Sure, we could make a program that would mostly just call system for each line of the shell, but that would be much slower than just letting the shell do things in the first place. It really takes a human level of intelligence to see how the shell’s use of cut, rm, sed, awk, and grep can be turned into efficient Perl code. It’s better to rewrite the shell script from scratch.
A common task for a system administrator is to recursively search the directory tree for certain items. On Unix, this is typically done with the find command. We can do that directly from Perl, too.
The find2perl command, which comes with Perl, takes the same arguments that find does. Instead of finding the requested items, however, the output of find2perl is a Perl program that finds them. Since it’s a program, you can edit it for your own needs. (The program is written in a somewhat odd style.)
One useful argument that’s available in find2perl but not in the standard find is the -eval option. This says that what follows it is actual Perl code that should be run each time that a file is found. When it’s run, the current directory will be the directory in which some item is found, and $_ will contain the item’s name.
Here’s an example of how you might use find2perl. Suppose that you’re a system administrator on a Unix machine, and you want to find and remove all of the old files in the /tmp directory.[24] Here’s the command that writes the program to do that:
$ find2perl /tmp -atime +14 -eval unlink >Perl-program
That command says to search in /tmp (and recursively in subdirectories) for items whose atime (last access time) is more than 14 days ago. For each item, the program should run the Perl code unlink, which will use $_ by default as the name of a file to remove. The output (redirected to go into the file Perl-program) is the program that does all of this. Now you merely need to arrange for it to be run as needed.
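If you’d rather write that kind of program by hand, the standard File::Find module does the directory walking. What follows is only a sketch of the idea, not the actual output of find2perl; the subroutine name remove_stale_files is made up, and the example call is commented out because it really deletes files.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Hypothetical helper: remove plain files under $top whose last
# access was more than $days whole days ago; returns the names removed.
sub remove_stale_files {
    my ($top, $days) = @_;
    my @removed;
    find(sub {
        # Here the current directory is the one holding the item,
        # and $_ is the item's name (without the directory part).
        return unless -f $_;               # skip directories and such
        return unless int(-A _) > $days;   # -A is age of last access, in days
        push @removed, $File::Find::name if unlink $_;
    }, $top);
    return @removed;
}

# Example call (commented out, since it really deletes files):
# remove_stale_files('/tmp', 14);
```

Note that -A measures age relative to the time the program started, so a long-running program may want to compute the age from stat and time instead.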
If you’d like to make programs that take command-line options (like Perl’s own -w for warnings, for example), there are modules that let you do this in a standard way. See the documentation for the Getopt::Long and Getopt::Std modules.
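As a rough sketch of what that looks like with Getopt::Long, the program below accepts three options. The option names and the parse_args subroutine are invented for this example, and it assumes a version of Getopt::Long recent enough to provide GetOptionsFromArray; the module’s documentation has the full story.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Hypothetical helper: parse a list of arguments, returning a hash
# of option values plus whatever non-option arguments remain.
sub parse_args {
    my @args = @_;
    my %opt = (verbose => 0, count => 1);   # defaults
    GetOptionsFromArray(\@args,
        'verbose!' => \$opt{verbose},   # --verbose or --noverbose
        'count=i'  => \$opt{count},     # --count=3 (integer required)
        'name=s'   => \$opt{name},      # --name=fred (string required)
    ) or die "usage: $0 [--verbose] [--count=N] [--name=STR] files...\n";
    return (\%opt, @args);
}

my ($opt, @files) = parse_args(@ARGV);
print "verbose is on\n" if $opt->{verbose};
print "count is $opt->{count}\n";
print "files: @files\n" if @files;
```

Parsing a copy of the argument list, rather than @ARGV directly, makes the subroutine easy to try out with different arguments.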
Perl’s own documentation is written in pod (plain-old documentation) format. You can embed this documentation in your own programs, and it can then be translated to text, HTML, or many other formats as needed. See the perlpod manpage.
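Here’s a small sketch of what embedded pod can look like. The program name greet and its greeting subroutine are made up for the example; the pod directives themselves (=head1 and =cut) are described in perlpod.

```perl
#!/usr/bin/perl
use strict;
use warnings;

=head1 NAME

greet - say hello to someone (a made-up example program)

=head1 SYNOPSIS

    greet fred

=head1 DESCRIPTION

Prints a friendly greeting for each name given on the command line.

=cut

# Perl skips everything from a pod directive through =cut, so the
# documentation above doesn't affect how the program runs.
sub greeting {
    my $name = shift;
    return "Hello, $name!";
}

print greeting($_), "\n" for @ARGV;
```

With the program saved in a file, tools such as perldoc, pod2text, and pod2html should be able to pull out the documentation and format it.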
There are other modes to use in opening a filehandle; see the perlopentut manpage.
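As a quick taste, this sketch exercises four common modes on a scratch file. It uses the three-argument form of open with lexical filehandles, which is just one of the styles that perlopentut covers, and it assumes the File::Temp module is available to supply the scratch file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# A scratch file to experiment on, removed when the program exits
my ($tmp_fh, $file) = tempfile(UNLINK => 1);
close $tmp_fh;

# '>' creates the file, or wipes out anything already in it
open my $out, '>', $file or die "Can't write $file: $!";
print $out "fred\n";
close $out;

# '>>' appends to whatever is already there
open my $log, '>>', $file or die "Can't append to $file: $!";
print $log "barney\n";
close $log;

# '+<' opens for both reading and writing, without wiping the file
open my $both, '+<', $file or die "Can't update $file: $!";
my $first = <$both>;          # read "fred\n"
seek $both, 0, 0;             # back to the start
print $both "FRED\n";         # overwrite it in place
close $both;

# '<' opens for reading only
open my $in, '<', $file or die "Can't read $file: $!";
my @lines = <$in>;
close $in;

print @lines;                 # FRED, then barney
```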
It’s a small world, after all. In order to work properly in places where even the alphabet is different, Perl has support for locales and Unicode.
Locales tell Perl how things are done locally. For example, does the character æ sort at the end of the alphabet, or between ä and å? And what’s the local name for the third month? See the perllocale manpage (not to be confused with the perllocal manpage).
See the perlunicode manpage for the latest on how your version of Perl deals with Unicode. As of this writing, each new release of Perl has many new Unicode-related changes, but we hope things will settle down soon.
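To see the difference between bytes and characters that much of this work is about, here’s a minimal sketch. It assumes a Perl recent enough to include the Encode module, and the sample word is our own; details may differ in your version, so check perlunicode.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Encode qw(decode encode);

# The bytes of "æble" (Danish for "apple") in UTF-8: the letter æ
# takes two bytes, the other three letters one each.
my $bytes = "\xC3\xA6ble";

# Decoding turns the five bytes into a string of four characters
my $chars = decode('UTF-8', $bytes);

print "bytes: ", length($bytes), "\n";    # 5
print "chars: ", length($chars), "\n";    # 4

# Character-aware operations now treat æ as one letter
my $upper = uc $chars;                    # "ÆBLE" as characters

# Encode again before writing to a byte-oriented filehandle
print encode('UTF-8', $upper), "\n";
```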
Perl now has support for threads. Although this is experimental (as of this writing), it can be a useful tool for some applications. Using fork (where it’s available) is better supported; see the perlfork and perlthrtut manpages.
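Here’s a minimal sketch of the fork approach, assuming a system where fork is available. The subroutine name run_in_child is made up; it runs some code in a child process and hands the child’s exit status back to the parent.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;   # flush output right away, so buffered text isn't duplicated by fork

# Hypothetical helper: run a code reference in a child process and
# return the child's exit status once it has finished.
sub run_in_child {
    my $code = shift;
    defined(my $pid = fork) or die "Can't fork: $!";
    if ($pid == 0) {
        # Child process: do the work, then exit with its result
        exit $code->();
    }
    # Parent process: wait for that particular child
    waitpid $pid, 0;
    return $? >> 8;      # the high byte holds the exit status
}

my $status = run_in_child(sub { print "child $$ working\n"; return 3 });
print "parent $$ saw exit status $status\n";
```

Since the child is a separate process with its own copy of the data, an exit status (or a pipe or file) is how it reports back; changes it makes to variables aren’t seen by the parent.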
A large and powerful module set is Tk, which lets you make on-screen interfaces that work on more than one platform. See Learning Perl/Tk by Nancy Walsh or the upcoming Mastering Perl/Tk by Nancy Walsh and Steve Lidie (O’Reilly & Associates, Inc.).
If you check out the module list on CPAN, you’ll find modules for even more purposes, from generating graphs and other images to downloading email, from figuring the amortization of a loan to figuring the time of sunset. New modules are added all the time, so Perl is even more powerful today than it was when we wrote this book. We can’t keep up with it all, so we’ll stop here.
Larry himself says he no longer keeps up with all of the development of Perl, because the Perl universe is big and keeps expanding. And he can’t get bored with Perl, because he can always find another corner of this ever-expanding universe. And we suspect, neither will we. Thank you, Larry!
[1] And we’re not just saying that because it’s also published by O’Reilly & Associates, Inc. It’s really a great book.
[2] The name “package” is perhaps an unfortunate choice, in that it makes many people think of a packaged-up chunk of code (in Perl, that’s a module or a library). All that a package does is define a namespace (a collection of global symbol names, like $fred or &wilma). A namespace is not a chunk of code.
[3] It’s not really a checksum, but that’s good enough for this explanation.
[4] The module Digest::Perl::MD5 is a pure Perl implementation of the MD5 algorithm. Although your mileage may vary, we found it to be about 280 times slower than the Digest::MD5 module on one sample dataset. Remember that many of the bit-twiddling operations in the C algorithm compile down to a single machine instruction; thus, entire lines of code can take a mere handful of clock cycles to run. Perl is fast, but let’s not be unrealistic.
[5] We’re including here merely the most important features of each module; see the module’s own documentation to learn more.
[6] To be sure, there are other important modules whose use is too complex for most readers of this book, typically because using the module requires understanding Perl’s references or objects.
[7] Yes, this means that you are now able to use Perl to send spam. Please don’t.
[8] The actual return value of localtime in a list context is a little different than you might expect; see the documentation.
[9] If your program will never be used with a version of Perl prior to 5.6, you should use the our keyword instead of the vars pragma.
[10] If your program may be used with a version of Perl prior to 5.6, you should not use the warnings pragma.
[11] In fact, it would generally be a lie to say that a Perl array is stored in “a chunk of memory” at all, as it’s almost certainly spread among many separate chunks.
[12] Although LWP makes it easy to make a simple “web browser” that pulls down a page or image, actually rendering that to the user is another problem. You can drive an X11 display with Tk or Gtk widgets though, or use curses to draw on a character terminal. It’s all a matter of downloading and installing the right modules from CPAN.
[13] There are some details of the interface that these snippets don’t support. Trust us; it’s better to use a module.
[14] Several of the reviewers who looked over a draft of this book for us wished we could cover more about CGI programming. We agree, but it wouldn’t be fair to the reader to give just enough knowledge to be dangerous. A proper discussion of the problems inherent in CGI programming would probably add at least 50% to the size (and cost) of this book.
[15] Remember that every new release of each brand of browser on each different platform counts as a new one that you’re probably not going to be able to test. We really chuckle when we hear someone tested a web site with “both browsers” or when they say “I don’t know if it works with the other one.”
[16] At the very least, following the standards lets you put the blame squarely on the other programmer, who didn’t.
[17] Do you see why we didn’t try to fit all of that into this book?
[18] Well, not really, but you can fake it so well that you’ll hardly remember that there’s a difference.
[19] Actually, you can’t make any of these things; these are just verbal shorthands for what’s really happening. What we call “an array of arrays” in Perl is really an array of references to arrays.
[20] OO has its own set of jargon words. In fact, the terms used in any one OO language aren’t even the same ones that are typically used in another.
[21] Dynamic loading of binary extensions is generally available if your system supports that. If it doesn’t, you can compile the extensions statically—that is, you can make a Perl binary with the extension built in, ready for use.
[22] That’s probably the language we’d use for writing a word processor. Hey, we love Perl, but we didn’t swear an oath in blood to use no other language. When language X is the best choice, use language X. But often, X equals Perl.
[23] If you’re using gawk or nawk or some other variant, a2p may not be able to convert it. Both of these conversion programs were written long ago and have had few updates except when needed to keep working with new releases of Perl.
[24] This is a task typically done by a cron job at some early-morning hour each day.