Appendix B. Beyond the Llama

We’ve covered a lot in this book, but there’s even more. In this appendix, we’ll tell you a little more about what Perl can do, and point you to references where you can learn the details. Some of what we mention here is on the bleeding edge and may have changed by the time that you’re reading this book, which is one reason why we frequently send you to the documentation for the full story. We don’t expect many readers to read every word of this appendix, but we hope you’ll at least skim the headings so that you’ll be prepared to fight back when someone tells you “You just can’t use Perl for project X, because Perl can’t do Y.”

Further Documentation

The documentation that comes with Perl may seem overwhelming at first. Fortunately, you can use your computer to search for keywords in the documentation. When searching for a particular topic, it’s often good to start with the perltoc (table of contents) and perlfaq (frequently asked questions) sections. On most systems, the perldoc command should be able to track down the documentation for Perl, installed modules, and related programs (including perldoc itself).

Regular expressions

Yes, there’s even more about regular expressions than we mentioned. Mastering Regular Expressions by Jeffrey Friedl is one of the best technical books we’ve ever read.[1] It’s half about regular expressions in general, and half about Perl’s regular expressions. It goes into good detail about how the regular expression engine works internally, and why one way of writing a pattern may be much more efficient than another. Anyone who is serious about Perl should read this book. Also see the perlre manpage (and its companion perlretut and perlrequick manpages in newer versions of Perl).

Packages

Packages[2] allow you to compartmentalize the namespaces. Imagine that you have ten programmers all working on one big project. If you use the global names $fred, @barney, %betty, and &wilma in your part of the project, what happens when I accidentally use one of those same names in my part? Packages let us keep these separate; I can access your $fred, and you can access mine, but not by accident. Packages are needed to make Perl scalable, so that we can manage large programs.
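As a minimal sketch of the idea (the package names Yours and Mine are made up for illustration), two packages can each keep a global called $fred without colliding, and either one can still reach the other’s variable by spelling out the full name:

```perl
use strict;
use warnings;

# Two hypothetical packages, each with its own global $fred.
{
    package Yours;
    our $fred = "your fred";
}
{
    package Mine;
    our $fred = "my fred";

    # Reaching into the other package takes an explicit, fully qualified name.
    our $borrowed = $Yours::fred;
}

print "Yours: $Yours::fred\n";
print "Mine:  $Mine::fred\n";
print "Mine borrowed: $Mine::borrowed\n";
```

Nothing here happens by accident: the only way to touch the other package’s $fred is to name that package.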

Extending Perl’s Functionality

One of the most common pieces of good advice heard in the Perl discussion forums is that you shouldn’t reinvent the wheel. Other folks have written code that you can put to use. The most frequent way to add to what Perl can do is by using a library or module. Many of these come with Perl, while others are available from CPAN. Of course, you can even write your own libraries and modules.

Libraries

Many programming languages offer support for libraries much as Perl does. Libraries are collections of (mostly) subroutines for a given purpose. In modern Perl, though, it’s more common to use modules than libraries.

Modules

A module is a “smart library”. A module will typically offer a collection of subroutines that act as if they were built-in functions, for the most part. Modules are smart in that they keep their details in a separate package, only importing what you request. This keeps a module from stomping on your code’s symbols.

Although many useful modules are written in pure Perl, others are written using a language like C. For example, the MD5 algorithm is sort of like a high-powered checksum.[3] It uses a lot of low-level bit-twiddling that could be done in Perl, but hundreds of times more slowly;[4] it’s an algorithm that was designed to be efficiently implemented in C. So, the Digest::MD5 module is made to use the compiled C code. When you use that module, it’s as if your Perl had a built-in function to calculate MD5 digests.
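Here is a short sketch of what that looks like in practice, assuming Digest::MD5 is installed (it ships with recent releases of Perl). The input string is arbitrary:

```perl
use strict;
use warnings;
use Digest::MD5 qw(md5_hex);

# md5_hex returns the digest as a 32-character hexadecimal string.
my $digest = md5_hex("hello world");
print "MD5 of 'hello world' is $digest\n";
```

Only md5_hex is imported, because that is all we asked for; the rest of the module’s names stay in its own package.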

Finding and Installing Modules

Maybe your system already has the module you need. But how can you find out which modules are installed? You can use the program inside, which should be available for download from CPAN in the directory http://www.cpan.org/authors/id/P/PH/PHOENIX/.

If none of the modules already available on your system suits your needs, you can search for Perl modules on CPAN at http://search.cpan.org/. To install a module on your system, see the perlmodinstall manpage.

When using a module, you’ll generally put the required use directives at the top of your program. That makes it easy for someone who is installing your program on a new system to see at a glance which modules it needs.

Writing Your Own Modules

In the rare case that there’s no module to do what you need, an advanced programmer can write a new one, either in Perl or in another language (often C). See the perlmod and perlmodlib manpages for more information.

Some Important Modules

We describe some of the most important features[5] of the most important modules[6] in this section. The modules that we discuss here should generally be found on every machine that has Perl, except where mentioned. You can always get the latest ones from CPAN.

The CGI and CGI_Lite Modules

Many people use Perl to write programs that a web server will run, generally called CGI programs. The CGI module comes with Perl, while the CGI_Lite module is available separately from CPAN. See the section “The Common Gateway Interface (CGI)” later in this appendix.

The Cwd Module

Sometimes you need to know what the current working directory’s name is. (Well, you could often use ".", but maybe you need to save the name so that you can change back to this directory later.) The Cwd module, which comes with Perl, provides the cwd function, which you can use to determine the current working directory.

use Cwd;

my $directory = cwd;

The Fatal Module

If you get tired of writing "or die" after every invocation of open or chdir, then maybe the Fatal module is for you. Just tell it which functions to work with, and those will be automatically checked for failure, as if you’d written "or die" and a suitable message after each one. This won’t affect such calls in someone else’s package (that is, code contained within a module you’re using, for example), so don’t use this to fix up poorly written code. It’s just a timesaver, mostly for simple programs in which you don’t need direct control over the error message itself. For example:

use Fatal qw/ open chdir /;

chdir '/home/merlyn';  # "or die" is now supplied automatically

The File::Basename Module

We covered this module in Chapter 13. Its primary uses are to portably pull the basename or directory name from a full filename:

use File::Basename;

for (@ARGV) {
  my $basename = basename $_;
  my $dirname = dirname $_;
  print "That's file $basename in directory $dirname.\n";
}

The File::Copy Module

When you need to copy or move files, the File::Copy module is for you. (It’s often tempting to simply call a system program to do these things, but that’s not portable.) This module provides the functions move and copy, which may be used much as the corresponding system programs would be used:

use File::Copy;

copy("source", "destination")
  or die "Can't copy 'source' to 'destination': $!";

The File::Spec Module

When you need to manipulate a filename (more formally called a “file specification”), it’s generally more portable and reliable to use the File::Spec module than to do the work yourself from Perl. For example, you can use the catfile function to put together a directory name and a filename to produce a long filename (as we saw in Chapter 13), but you don’t have to know whether the system your program is running on uses a forward slash or some other character to separate those. Or you could use the curdir function to get the name of the current directory (".", on Unix systems).

The File::Spec module is object-oriented, but you don’t need to understand objects to use it. Just call each function (“method”, really) by using File::Spec and a small arrow before the function’s name, like this:

use File::Spec;

my $current_directory = File::Spec->curdir;
opendir DOT, $current_directory
  or die "Can't open current directory '$current_directory': $!";
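The catfile method mentioned above is called the same way. In this small sketch, the directory and file names are hypothetical, and the separator in the result depends on the system where the program runs:

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical directory and file names, for illustration only.
my $path = File::Spec->catfile('reports', '2004', 'summary.txt');
print "Full name: $path\n";
```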

The Image::Size Module

When you have an image file, you’ll often want to know what its height and width are. (This is handy for making programs that write HTML, if you wish for an IMG tag to indicate the image’s dimensions.) The Image::Size module, which is available from CPAN, understands the common GIF, JFIF (JPEG), and PNG image types, and some others. For example:

use Image::Size;

# Get the size of fred.png
my($fred_width, $fred_height) = imgsize("fred.png");  # width comes first
die "Couldn't get the size of the image"
  unless defined $fred_width;

The Net::SMTP Module

If you want your program to be able to send email through an SMTP server (which is the way most of us send email these days, whether you knew that or not), you may use the Net::SMTP module to do the work.[7] This module, which is available from CPAN, is object-oriented, but you may simply follow the syntax to use it. You will need to change the name of your SMTP host and the other items to make this work on your system. Your system administrator or local expert can tell you what to use. For example:

use Net::SMTP;

my $from = 'YOUR_ADDRESS_GOES_HERE';         # your full email address
my $site = 'YOUR_SITE_NAME_GOES_HERE';       # maybe bedrock.edu
my $smtp_host = 'YOUR_SMTP_HOST_GOES_HERE';  # maybe mail or mailhost
my $to = 'RECIPIENT_ADDRESS_GOES_HERE';      # the address you are writing to

my $smtp = Net::SMTP->new($smtp_host, Hello => $site);

$smtp->mail($from);
$smtp->to($to);
$smtp->data( );

$smtp->datasend("To: $to\n");
$smtp->datasend("Subject: A message from my Perl program.\n");
$smtp->datasend("\n");
$smtp->datasend("This is just to let you know,\n");
$smtp->datasend("I don't care what those other people say about you,\n");
$smtp->datasend("I still think you're doing a great job.\n");
$smtp->datasend("\n");
$smtp->datasend("Have you considered enacting a law naming Perl \n");
$smtp->datasend("the national programming language?\n");

$smtp->dataend( );                             # Not datasend!
$smtp->quit;

The POSIX Module

If you need access to the POSIX (IEEE Std 1003.1) functions, the POSIX module is for you. It provides many functions that C programmers may be used to, such as trigonometric functions (asin, cosh), general mathematical functions (floor, frexp), character-identification functions (isupper, isalpha), low-level IO functions (creat, open), and some others (asctime, clock). You’ll probably want to call each of these with its “full” name; that is, with POSIX and a pair of colons as a prefix to the function’s name:

use POSIX;

print "Please enter a number: ";
chomp(my $str = <STDIN>);

$! = 0;  # Clear out the error indicator
my($num, $leftover) = POSIX::strtod($str);

if ($str eq '') {
  print "That string was empty!\n";
} elsif ($leftover) {
  my $remainder = substr $str, -$leftover;
  print "The string '$remainder' was left after the number $num.\n";
} elsif ($!) {
  print "The conversion function complained: $!\n";
} else {
  print "The seemingly-valid number was $num.\n";
}

The Sys::Hostname Module

The Sys::Hostname module provides the hostname function, which returns the network name of your machine, if that can be determined. (If it can’t be determined, perhaps because your machine is not on the Internet or not properly configured, the function will die automatically; there’s no point in using or die here.) For example:

use Sys::Hostname;
my $host = hostname;
print "This machine is known as '$host'.\n";

The Text::Wrap Module

The Text::Wrap module supplies the wrap function, which lets you implement simple word-wrapping. The first two parameters specify the indentation of the first line and the others, respectively; the remaining parameters make up the paragraph’s text:

use Text::Wrap;

my $message = "This is some sample text which may be longer " .
  "than the width of your output device, so it needs to " .
  "be wrapped to fit properly as a paragraph. ";
$message x= 5;

print wrap("\t", "", "$message\n");

The Time::Local Module

If you have a time (for example, from the time function) that needs to be converted to a list of year, month, day, hour, minute, and second values, you can do that with Perl’s built-in localtime function in a list context.[8] (In a scalar context, that gives a nicely formatted string representing the time, which is more often what you’d want.) But if you need to go in the other direction, you may use the timelocal function from the Time::Local module instead. It’s important to note that the values of $mon and $year for January 2004 are not 1 and 2004 as you might expect (months, for one, are numbered from zero), so be sure to read the documentation before you use this module. For example:

use Time::Local;

my $time = timelocal($sec, $min, $hr, $day, $mon, $year);
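One way to see how the two functions fit together is a round trip. This sketch takes the first six values that localtime returns and hands them straight back to timelocal, which accepts them in the same order and with the same conventions:

```perl
use strict;
use warnings;
use Time::Local;

my $now = time;

# localtime in list context; we keep only the first six fields.
my($sec, $min, $hr, $day, $mon, $year) = localtime $now;

# timelocal takes the same six values, in the same order.
my $again = timelocal($sec, $min, $hr, $day, $mon, $year);

print "Started with $now and got back $again\n";
```

The two numbers normally match. They may differ by an hour if the program runs during the repeated hour at the end of daylight saving time, since that local time is ambiguous.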

Pragmas

Pragmas are special modules that come with each release of Perl and tell Perl’s internal compiler something about your code. You’ve already used the strict pragma. The pragmas available for your release of Perl should be listed in the perlmodlib manpage.

You use pragmas much like you’d use ordinary modules, with a use directive. Some pragmas are lexically scoped, like lexical ("my") variables are, and they therefore apply to the smallest enclosing block or file. Others may apply to the entire program or to the current package. (If you don’t use any packages, the pragmas apply to your entire program.) Pragmas should generally appear near the top of your source code. The documentation for each pragma should tell you how it’s scoped.

The constant Pragma

If you’ve used other languages, you’ve probably seen the ability to declare constants in one way or another. Constants are handy for making a setting just once, near the beginning of a program, where it can be easily updated if the need arises. Perl can do this with the package-scoped constant pragma, which tells the compiler that a given identifier has a constant value, which may thus be optimized wherever it appears. For example:

use constant DEBUGGING => 0;
use constant ONE_YEAR => 365.2425 * 24 * 60 * 60;

if (DEBUGGING) {
  # This code will be optimized away unless DEBUGGING is turned on
  ...
}

The diagnostics Pragma

Perl’s diagnostic messages often seem somewhat cryptic, at least the first time you see them. But you can always look them up in the perldiag manpage to find out what they mean, and often a little about what’s likely to be the problem and how to fix it. But you can save yourself the trouble of searching that manpage if you use the diagnostics pragma, which tells Perl to track down and print out the related information for any message. Unlike most pragmas, though, this one is not intended for everyday use, as it makes your program read the entire perldiag manpage just to get started. (This is potentially a significant amount of overhead, both in terms of time and memory.) Use this pragma only when you’re debugging and expecting to get error messages you don’t yet understand. It affects your entire program. The syntax is:

use diagnostics;

The lib Pragma

It’s nearly always best to install modules in the standard directories, so that they’re available for everyone, but only the system administrator can do that. If you install your own modules, you’ll have to store them in your own directories—so, how will Perl know where to find them? That’s what the lib pragma is all about. It tells Perl that the given directory is the first place to look for modules. (That means that it’s also useful for trying out a new release of a given module.) It affects all modules loaded from this point on. The syntax is:

use lib '/home/rootbeer/experimental';

Be sure to use a nonrelative pathname as the argument, since there’s no telling what will be the current working directory when your program is run. This is especially important for CGI programs (that is, programs run by a web server).

The strict Pragma

You’ve been using use strict for a while already without having to understand that it’s a pragma. It’s lexically scoped, and it enforces some good programming rules. See its documentation to learn what restrictions are available in your release of Perl.

The vars Pragma

In the rare case that you truly need a global variable while use strict is in effect, you may declare it with the vars pragma.[9] This package-scoped pragma tells Perl that you are intentionally using one or more global variables:

use strict;
use vars qw/ $fred $barney /;

$fred = "This is a global variable, but that's all right.\n";

The warnings Pragma

Starting in Perl version 5.6, you may choose to have lexically scoped warnings with the warnings pragma.[10] That is, rather than using the -w option crudely to turn warnings on or off for the entire program at once, you may specify that you want no warnings about undefined values in just one section of code, while other warnings should be available. This also serves as a signal to the maintenance programmer that says, “I know that this code would produce warnings, but I know what I’m doing anyway.” See the documentation for this pragma to learn about the categories of warnings available in your release of Perl.
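A minimal sketch of that idea follows. The hash and its contents are made up for illustration; the point is that the warning about undefined values is silenced only inside the inner block:

```perl
use strict;
use warnings;

my %count = (fred => 3);   # hypothetical data; there is no entry for barney
my $total;

{
    # Inside this block only, stay quiet about undefined values.
    no warnings 'uninitialized';
    $total = $count{fred} + $count{barney};   # undef is treated as 0
}

print "Total is $total\n";
```

Outside the block, the same addition would produce a warning again.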

Databases

If you’ve got a database, Perl can work with it. This section describes some of the common types of databases.

Direct System Database Access

Perl can directly access some system databases, sometimes with the help of a module. These are databases like the Windows Registry (which holds machine-level settings), or the Unix password database (which lists which username corresponds to which number, and related information), as well as the domain-name database (which lets you translate an IP number into a machine name, and vice versa).

Flat-file Database Access

If you’d like to access your own flat-file databases from Perl, there are modules to help you with doing that (seemingly a new one every month or two, so any list here would be out of date). You can even do quite a bit without a module, with what we give in Chapter 16.

Relational Database Access

Relational databases include Sybase, Oracle, Informix, MySQL, and others. These are complex enough that you generally do need to know about modules to use them. But if you use the DBI module, whose name stands for “database-independent,” you can minimize your dependence upon any one type of database; then, if you have to move from MySQL to Oracle, say, you might not even need to change anything at all in your program.

Other Operators and Functions

Yes, there are more operators and functions than we can fit here, from the scalar .. operator to the scalar , operator, from wantarray to goto(!), from caller to chr. See the perlop and perlfunc manpages.

Transliteration with tr///

The tr/// operator looks like a regular expression, but it’s really for transliterating one group of characters into another. It can also efficiently count selected characters. See the perlop manpage.
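Both uses fit in a few lines. This sketch uses an arbitrary sample string:

```perl
use strict;
use warnings;

my $text = "Hello, World";

# With an empty replacement list, tr counts matches without changing anything.
my $vowels = ($text =~ tr/aeiouAEIOU//);

# With a replacement list, it maps each character to its counterpart.
(my $upper = $text) =~ tr/a-z/A-Z/;

print "$vowels vowels; uppercased: $upper\n";
```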

Here documents

Here documents are a useful form of multiline string quoting; see the perldata manpage.
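As a quick sketch (the name and the message text are invented for illustration), a here document runs from the line after the tag to a line containing only the tag:

```perl
use strict;
use warnings;

my $name = "Fred";   # hypothetical value for illustration

# Everything up to the line containing only END_MESSAGE is the string.
# Double quotes around the tag mean that variables are interpolated.
my $message = <<"END_MESSAGE";
Dear $name,
  Your bowling score has been recorded.
END_MESSAGE

print $message;
```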

Mathematics

Perl can do just about any kind of mathematics you can dream up.

Advanced Math Functions

All of the basic mathematical functions (square root, cosine, logarithm, absolute value, and many others) are available as built-in functions; see the perlfunc manpage for details. Some others (like tangent or base-10 logarithm) are omitted, but those may be easily created from the basic ones, or loaded from a simple module that does so. (See the POSIX module for many common math functions.)
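Building the missing functions takes only a line or two each. In this sketch, the subroutine names tangent and log_base10 are our own choices, not anything built in:

```perl
use strict;
use warnings;

# Tangent is sine over cosine; the argument is in radians.
sub tangent { my $x = shift; return sin($x) / cos($x) }

# Base-10 logarithm, from the natural logarithm that log provides.
sub log_base10 { my $n = shift; return log($n) / log(10) }

my $pi = 4 * atan2(1, 1);
printf "tangent(pi/4)    = %.4f\n", tangent($pi / 4);
printf "log_base10(1000) = %.4f\n", log_base10(1000);
```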

Imaginary and Complex Numbers

Although the core of Perl doesn’t directly support them, there are modules available for working with complex numbers. These overload the normal operators and functions, so that you can still multiply with * and get a square root with sqrt, even when using complex numbers. See the Math::Complex module.
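A brief sketch, assuming Math::Complex is available (it comes with recent releases of Perl). The cplx function builds a complex number from its real and imaginary parts, and Re and Im pull them back out:

```perl
use strict;
use warnings;
use Math::Complex;

my $z      = cplx(3, 4);             # the complex number 3+4i
my $scaled = $z * 8;                 # the ordinary * operator still works
my $root   = sqrt(cplx(-4, 0));      # square root of a negative number

print "z * 8    = $scaled\n";
print "sqrt(-4) = $root\n";
print "Real part of z * 8 is ", Re($scaled), "\n";
```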

Large and High-Precision Numbers

You can do math with arbitrarily large numbers with an arbitrary number of digits of accuracy. For example, you could calculate the factorial of two thousand, or determine π to ten-thousand digits. See the Math::BigInt and Math::BigFloat modules.
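For a smaller taste of the same thing, this sketch computes the factorial of 25, which is already too large for an ordinary 64-bit integer. It assumes Math::BigInt is installed, as it is with recent releases of Perl:

```perl
use strict;
use warnings;
use Math::BigInt;

# bfac replaces the number with its factorial and returns the object.
my $factorial = Math::BigInt->new(25)->bfac;

print "25! = $factorial\n";
print "That has ", length("$factorial"), " digits.\n";
```

Changing 25 to 2000 works the same way; it just prints a much longer number.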

Lists and Arrays

Perl has a number of features that make it easy to manipulate an entire list or array.

map and grep

We mentioned (in Chapter 17) the map and grep list-processing operators. They can do more than we could include here; see the perlfunc manpage for more information and examples.
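As a reminder of the basic shape, here is a short sketch that filters a list with grep and then transforms the survivors with map:

```perl
use strict;
use warnings;

my @numbers = (1 .. 10);

# grep keeps the items for which the block is true.
my @odd = grep { $_ % 2 } @numbers;

# map turns each item into zero or more new items.
my @squares = map { $_ * $_ } @odd;

print "Odd:     @odd\n";
print "Squares: @squares\n";
```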

The splice Operator

With the splice operator, you can add items to the middle of an array, or remove them, letting the array grow or shrink as needed. (Roughly, this is like what substr lets you do with strings.) This effectively eliminates the need for linked lists in Perl. See the perlfunc manpage.
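A small sketch of both directions, using a made-up list of names:

```perl
use strict;
use warnings;

my @flintstones = qw(fred barney dino);

# Insert two items before index 1, removing nothing (length 0).
splice @flintstones, 1, 0, qw(wilma pebbles);

# Remove one item starting at index 3; splice returns what it removed.
my @removed = splice @flintstones, 3, 1;

print "Now have: @flintstones\n";
print "Removed:  @removed\n";
```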

Bits and Pieces

You can work with an array of bits (a bitstring) with the vec operator, setting bit number 123, clearing bit number 456, and checking to see the state of bit 789. Bitstrings may be of arbitrary size. The vec operator can also work with chunks of other sizes, as long as the size is a small power of two, so it’s useful if you need to view a string as a compact array of nybbles, say. See the perlfunc manpage.
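The operations just described might be sketched like this. The third argument to vec is the width of each element in bits, so 1 means single bits:

```perl
use strict;
use warnings;

my $bits = '';             # an empty bitstring; it grows as needed

vec($bits, 123, 1) = 1;    # set bit number 123
vec($bits, 456, 1) = 1;    # set bit number 456...
vec($bits, 456, 1) = 0;    # ...and clear it again

# Reading a bit beyond the end of the string simply gives 0.
for my $n (123, 456, 789) {
    print "Bit $n is ", (vec($bits, $n, 1) ? "set" : "clear"), "\n";
}
```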

Formats

Perl’s formats are an easy way to make fixed-format template-driven reports with automatic page headers. In fact, they are one of the main reasons Larry developed Perl in the first place, as a Practical Extraction and Report Language. But, alas, they’re limited. The heartbreak of formats happens when someone discovers that he or she needs a little more than what formats provide. This usually means ripping out the program’s entire output section and replacing it with code that doesn’t use formats. Still, if you’re sure that formats do what you need, all that you’ll need, and all that you’ll ever need, they are pretty cool. See the perlform manpage.

Networking and IPC

If there’s a way that programs on your machine can talk with others, Perl can probably do it. This section shows some common ways.

System V IPC

The standard functions for System V IPC (interprocess communication) are all supported by Perl, so you can use message queues, semaphores, and shared memory. Of course, an array in Perl isn’t stored in a chunk of memory in the same way[11] that an array is stored in C, so shared memory can’t share Perl data as-is. But there are modules that will translate data, so that you can pretend that your Perl data is in shared memory. See the perlfunc and perlipc manpages.

Sockets

Perl has full support for TCP/IP sockets, which means that you could write a web server in Perl, or a web browser, Usenet news server or client, finger daemon or client, FTP daemon or client, SMTP or POP or SOAP server or client, or either end of pretty much any other kind of protocol in use on the Internet. Of course, there’s no need to get into the low-level details yourself; there are modules available for all of the common protocols. For example, you can make a web server or client with the LWP module and one or two lines of additional code.[12] The LWP module (actually, a tightly integrated set of modules, which together implement nearly everything that happens on the Web) is also a great example of high-quality Perl code, if you’d like to copy from the best. For other protocols, search for a module with the protocol’s name.

Security

Perl has a number of strong security-related features that can make a program written in Perl more secure than the corresponding program written in C. Probably the most important of these is data-flow analysis, better known as taint checking. When this is enabled, Perl keeps track of which pieces of data seem to have come from the user or environment (and are therefore untrustworthy). Generally, if any such piece of so-called “tainted” data is used to affect another process, file, or directory, Perl will prohibit the operation and abort the program. It’s not perfect, but it’s a powerful way to prevent some security-related mistakes. There’s more to the story; see the perlsec manpage.

Debugging

There’s a very good debugger that comes with Perl and supports breakpoints, watchpoints, single-stepping, and generally everything you’d want in a command-line Perl debugger. It’s actually written in Perl (so, if there are bugs in the debugger, we’re not sure how they get those out). But that means that, in addition to all of the usual debugger commands, you can actually run Perl code from the debugger—calling your subroutines, changing variables, even redefining subroutines—while your program is running. See the perldebug manpage for the latest details.

Another debugging tactic is to use the B::Lint module, which is still preliminary as of this writing.

The Common Gateway Interface (CGI)

One of the most popular uses for Perl on the Web is in writing CGI programs. These run on a web server to process the results of a form, perform a search, produce dynamic web content, or count the number of accesses to a web page.

The CGI module, which comes with Perl, provides an easy way to access the form parameters and to generate some HTML in responses. (If you don’t want the overhead of the full CGI module, the CGI_Lite module provides access to the form parameters without all the rest.) It may be tempting to skip the module and simply copy-and-paste one of the snippets of code that purport to give access to the form parameters, but nearly all of these are buggy.[13]

When writing CGI programs, though, there are several big issues to keep in mind. These make the topic too broad to cover fully in this book:[14]

Security, security, security

We can’t overemphasize security. Somewhere around half of the successful attacks on computers around the world involve a security-related bug in a CGI program.

Concurrency issues

It’s easy to have several processes that are concurrently trying to access a single file or resource.

Standards compliance

No matter how hard you try, you probably won’t be able to test your program thoroughly with more than about 1 or 2% of the web browsers and servers that are in use today.[15] That’s because there are literally thousands of different programs available, with new ones popping up every week. The solution is to follow the standards, so your program will work with all of them.[16]

Troubleshooting and debugging

Since the CGI program runs in a different environment than you’re likely to be able to access directly, you’ll have to learn new techniques for troubleshooting and debugging.

Security, security, security!

There, we’ve said it again. Don’t forget security—it’s the first and last thing to think about when your program is going to be available to everyone in the world who wants to try breaking it.

And that list didn’t even mention URI-encoding, HTML entities, HTTP and response codes, Secure Sockets Layer (SSL), Server-side Includes (SSI), here documents, creating graphics on the fly, programmatically generating HTML tables, forms, and widgets, hidden form elements, getting and setting cookies, path info, error trapping, redirection, taint checking, internationalization and localization, embedding Perl into HTML (or the other way around), working with Apache and mod_perl, and using the LWP module.[17] Most or all of those topics should be covered in any good book on using Perl with the Web. CGI Programming with Perl by Scott Guelich, et al. (O’Reilly & Associates, Inc.) is mighty nice here, as is Lincoln Stein’s Network Programming with Perl (Addison-Wesley).

Command-Line Options

There are many different command-line options available in Perl; many let you write useful programs directly from the command line. See the perlrun manpage.

Built-in Variables

Perl has dozens of built-in variables (like @ARGV and $0), which provide useful information or control the operation of Perl itself. See the perlvar manpage.

Syntax Extensions

There are more tricks you could do with Perl syntax, including the continue block and the BEGIN block. See the perlsyn and perlmod manpages.

References

Perl’s references are similar to C’s pointers, but in operation, they’re more like what you have in Pascal or Ada. A reference “points” to a memory location, but because there’s no pointer arithmetic or direct memory allocation and deallocation, you can be sure that any reference you have is a valid one. References allow object-oriented programming and complex data structures, among other nifty tricks. See the perlreftut and perlref manpages.

Complex Data Structures

References allow us to make complex data structures in Perl. For example, suppose you want a two-dimensional array? You can do that,[18] or you can do something much more interesting, like have an array of hashes, a hash of hashes, or a hash of arrays of hashes.[19] See the perldsc (data-structures cookbook) and perllol (lists of lists) manpages.
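To give a flavor of the syntax, this sketch builds a two-dimensional array and a hash of arrays. The data is invented for illustration:

```perl
use strict;
use warnings;

# A two-dimensional array is really an array of references to arrays.
my @grid = (
    [1, 2, 3],
    [4, 5, 6],
);
my $cell = $grid[1][2];    # second row, third column

# A hash of arrays: each value is a reference to an array.
my %family = (
    flintstone => [qw(fred wilma pebbles)],
    rubble     => [qw(barney betty)],
);
push @{ $family{rubble} }, 'bamm-bamm';

my $first = $family{flintstone}[0];
my $count = @{ $family{rubble} };

print "Cell is $cell; first Flintstone is $first; $count Rubbles\n";
```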

Object-Oriented Programming

Yes, Perl has objects; it’s buzzword-compatible with all of those other languages. Object-oriented (OO) programming lets you create your own user-defined datatypes with associated abilities, using inheritance, overriding, and dynamic method lookup.[20]

Unlike some object-oriented languages, though, Perl doesn’t force you to use objects. (Even many object-oriented modules can be used without understanding objects.) But if your program is going to be larger than N lines of code, it may be more efficient for the programmer (if a tiny bit slower at runtime) to make it object-oriented. No one knows the precise value of N, but we estimate it’s around a few thousand or so. See the perlobj and perlboot manpages for a start, and Damian Conway’s excellent Object-Oriented Perl (Manning Press) for more advanced information.
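A very small sketch of inheritance, overriding, and dynamic method lookup follows. The class names Animal and Dinosaur are made up for illustration, and a real program would normally put each class in its own module file:

```perl
use strict;
use warnings;

{
    package Animal;
    sub new   { my($class, $name) = @_; return bless { name => $name }, $class }
    sub sound { return "some noise" }
    # speak looks up sound at runtime, so a subclass can override it.
    sub speak { my($self) = @_; return "$self->{name} makes " . $self->sound }
}
{
    package Dinosaur;
    our @ISA = ('Animal');       # inherit everything from Animal
    sub sound { return "a roar" }   # override just this one method
}

my $dino = Dinosaur->new("Dino");
print $dino->speak, "\n";
```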

Anonymous Subroutines and Closures

Odd as it may sound at first, it can be useful to have a subroutine without a name. Such subroutines can be passed as parameters to other subroutines, or they can be accessed via arrays or hashes to make jump tables.

Closures are a powerful concept that comes to Perl from the world of Lisp. A closure is (roughly speaking) an anonymous subroutine with its own private data.
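Both ideas can be sketched briefly. The hash %dispatch and the subroutine make_counter are names we made up for this example:

```perl
use strict;
use warnings;

# A jump table: a hash whose values are anonymous subroutines.
my %dispatch = (
    double => sub { return 2 * $_[0] },
    negate => sub { return -$_[0] },
);
my $doubled = $dispatch{double}->(21);

# A closure: each anonymous subroutine keeps its own private $count.
sub make_counter {
    my $count = shift;
    return sub { return $count++ };
}
my $from_one     = make_counter(1);
my $from_hundred = make_counter(100);

my @seen = ($from_one->(), $from_one->(), $from_hundred->(), $from_one->());
print "Doubled: $doubled; counters gave: @seen\n";
```

The two counters never interfere with each other, because each call to make_counter creates a fresh $count.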

Tied Variables

Do you remember how the DBM hash (in Chapter 16) is “magically” connected to a file, so that accesses to the hash are really working with the corresponding DBM file? You can actually make any variable magical in that way. A tied variable may be accessed like any other, but using your own code behind the scenes. So you could make a scalar that is really stored on a remote machine, or an array that always stays sorted. See the perltie manpage.
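As a much simpler illustration than a remote scalar or a sorted array, this sketch ties a scalar to a hypothetical class called UpperCase, so that anything stored in it comes back in capital letters:

```perl
use strict;
use warnings;

{
    package UpperCase;   # hypothetical class name, for illustration
    # Perl calls these three methods behind the scenes.
    sub TIESCALAR { my($class) = @_; my $value = ''; return bless \$value, $class }
    sub FETCH     { my($self) = @_; return $$self }
    sub STORE     { my($self, $new) = @_; $$self = uc $new }
}

tie my $shout, 'UpperCase';
$shout = "yabba dabba doo";    # really calls STORE
print "$shout\n";              # really calls FETCH
```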

Operator Overloading

You can redefine operators like addition, concatenation, comparison, or even the implicit string-to-number conversion with the overload module. This is how a module implementing complex numbers (for example) can let you multiply a complex number by 8 to get a complex number as a result.
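Here is a minimal sketch of overloading in a class of our own. Vec2 is a made-up name for a two-dimensional vector; the example overloads addition and the conversion to a string:

```perl
use strict;
use warnings;

{
    package Vec2;   # hypothetical two-dimensional vector class
    use overload
        '+'  => \&add,
        '""' => \&to_string;

    sub new { my($class, $x, $y) = @_; return bless { x => $x, y => $y }, $class }

    sub add {
        my($left, $right) = @_;
        return Vec2->new($left->{x} + $right->{x}, $left->{y} + $right->{y});
    }

    sub to_string { my($self) = @_; return "($self->{x}, $self->{y})" }
}

my $sum = Vec2->new(1, 2) + Vec2->new(3, 4);
print "The sum is $sum\n";
```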

Dynamic Loading

The basic idea of dynamic loading is that your program decides at runtime that it needs more functionality than what’s currently available, so it loads it up and keeps running. You can always dynamically load Perl code, but it’s even more interesting to dynamically load a binary extension.[21] This is how non-Perl modules are made.

Embedding

The reverse of dynamic loading (in a sense) is embedding.

Suppose you want to make a really cool word processor, and you start writing it in (say) C++.[22] Now, you decide you want the users to be able to use Perl’s regular expressions for an extra-powerful search-and-replace feature, so you embed Perl into your program. Then you realize that you could open up some of the power of Perl to your users. A power user could write a subroutine in Perl that could become a menu item in your program. Users can customize the operation of your word processor by writing a little Perl. Now you open up a little space on your website where users can share and exchange these Perl snippets, and you’ve got thousands of new programmers extending what your program can do at no extra cost to your company. And how much do you have to pay Larry for all this? Nothing—see the licenses that come with Perl. Larry is a really nice guy. You should at least send him a thank-you note.

Although we don’t know of such a word processor, some folks have already used this technique to make other powerful programs. One such example is Apache’s mod_perl, which embeds Perl into an already-powerful web server. If you’re thinking about embedding Perl, you should check out mod_perl; since it’s all open source, you can see just how it works.

Converting Other Languages to Perl

If you’ve got old sed and awk programs that you wish were written in Perl, you’re in luck. Not only can Perl do everything that those can do, there’s also a conversion program available, and it’s probably already installed on your system. Check the documentation for s2p (for converting from sed) or a2p (for converting from awk).[23] Since programs don’t write programs as well as people do, the results won’t necessarily be the best Perl—but it’s a start, and it’s easy to tweak. The translated program may be faster or slower than the original, too. But after you’ve fixed up any gross inefficiencies in the machine-written Perl code, it should be comparable.

Do you have C algorithms you want to use from Perl? Well, you’ve still got some luck on your side; it’s not too hard to put C code into a compiled module that can be used from Perl. In fact, any language that compiles to make object code can generally be used to make a module. See the perlxs manpage, and the Inline module, as well as the SWIG system.

Do you have a shell script that you want to convert to Perl? Your luck just ran out. There’s no automatic way to convert shell to Perl. That’s because the shell hardly does anything by itself; it spends all of its time running other programs. Sure, we could make a program that would mostly just call system for each line of the shell, but that would be much slower than just letting the shell do things in the first place. It really takes a human level of intelligence to see how the shell’s use of cut, rm, sed, awk, and grep can be turned into efficient Perl code. It’s better to rewrite the shell script from scratch.

Converting find Command Lines to Perl

A common task for a system administrator is to recursively search the directory tree for certain items. On Unix, this is typically done with the find command. We can do that directly from Perl, too.

The find2perl command, which comes with Perl, takes the same arguments that find does. Instead of finding the requested items, however, the output of find2perl is a Perl program that finds them. Since it’s a program, you can edit it for your own needs. (The program is written in a somewhat odd style.)

One useful argument that’s available in find2perl but not in the standard find is the -eval option. This says that what follows it is actual Perl code that should be run each time that a file is found. When it’s run, the current directory will be the directory in which some item is found, and $_ will contain the item’s name.

Here’s an example of how you might use find2perl. Suppose that you’re a system administrator on a Unix machine, and you want to find and remove all of the old files in the /tmp directory.[24] Here’s the command that writes the program to do that:

$ find2perl /tmp -atime +14 -eval unlink >Perl-program

That command says to search in /tmp (and recursively in subdirectories) for items whose atime (last access time) is more than 14 days ago. For each item, the program should run the Perl code unlink, which will use $_ by default as the name of a file to remove. The output (redirected to go into the file Perl-program) is the program that does all of this. Now you merely need to arrange for it to be run as needed.
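
If you’d rather write such a program by hand, the File::Find module that comes with Perl does the directory walking. What follows is our own hand-written sketch, not the output of find2perl; the subroutine name remove_old_files is made up, and unlike the command above, this version skips anything that isn’t a plain file.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Find;

# Remove plain files under $top that were last accessed more than
# $days days ago. Returns the number of files removed.
sub remove_old_files {
    my ($top, $days) = @_;
    my $removed = 0;
    find(sub {
        # File::Find has already changed to the item's directory,
        # and $_ holds the item's name
        return unless lstat $_;             # skip items that vanished
        return unless -f _;                 # our own restriction: plain files only
        return unless int(-A _) > $days;    # -A is the age of the last access, in days
        $removed += unlink $_;
    }, $top);
    return $removed;
}
```

A call such as remove_old_files('/tmp', 14) would then do the cleanup. Since this really deletes files, try it on a scratch directory first.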

Command-line Options in Your Programs

If you’d like to make programs that take command-line options (like Perl’s own -w for warnings, for example), there are modules that let you do this in a standard way. See the documentation for the Getopt::Long and Getopt::Std modules.
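
Here’s a minimal sketch using Getopt::Long. The option names --verbose and --name, and the subroutine name parse_options, are made up for this illustration; see the module’s documentation for the full list of option types.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Getopt::Long qw(GetOptionsFromArray);

# Parse a list of arguments. Returns a reference to a hash of options,
# followed by whatever arguments were left over.
sub parse_options {
    my @args = @_;
    my %opt = (verbose => 0, name => 'world');    # defaults
    GetOptionsFromArray(\@args,
        'verbose!' => \$opt{verbose},    # a flag: --verbose or --noverbose
        'name=s'   => \$opt{name},       # takes a string: --name=Fred
    ) or die "Usage: $0 [--verbose] [--name=NAME] [file ...]\n";
    return (\%opt, @args);
}

my ($opt, @files) = parse_options(@ARGV);
print "Hello, $opt->{name}!\n";
print "Remaining arguments: @files\n" if $opt->{verbose};
```

If the only options you need are single letters like -v, Getopt::Std is the simpler choice.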

Embedded Documentation

Perl’s own documentation is written in pod (plain-old documentation) format. You can embed this documentation in your own programs, and it can then be translated to text, HTML, or many other formats as needed. See the perlpod manpage.
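
As a small sketch of what that looks like, here’s a program with its documentation embedded. The program name greet and the subroutine greeting are made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

=head1 NAME

greet - print a friendly greeting

=head1 SYNOPSIS

    greet [name]

=head1 DESCRIPTION

Prints a greeting for the name given on the command line,
or for the world if no name is given.

=cut

# When running the program, perl skips everything from the first
# =head1 line down to the =cut line
sub greeting {
    my ($name) = @_;
    $name = 'world' unless defined $name;
    return "Hello, $name!";
}

print greeting($ARGV[0]), "\n";
```

If you save this in a file named greet, the command pod2text greet should show the documentation as plain text, and pod2html greet should turn it into HTML.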

More Ways to Open Filehandles

There are other modes to use in opening a filehandle; see the perlopentut manpage.
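
Here’s a short sketch of a few of those modes, using the three-argument form of open. The scratch file name is made up, and the in-memory filehandle at the end assumes Perl 5.8 or later.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Work in a scratch directory that is removed when the program ends
my $dir  = tempdir(CLEANUP => 1);
my $file = "$dir/demo.txt";

# '>' creates the file (or wipes out an existing one) for writing
open my $out, '>', $file or die "Cannot create $file: $!";
print $out "fred\n";
close $out;

# '>>' appends to the end of the file
open my $log, '>>', $file or die "Cannot append to $file: $!";
print $log "barney\n";
close $log;

# '<' opens the file for reading
open my $in, '<', $file or die "Cannot read $file: $!";
chomp(my @lines = <$in>);
close $in;
print "The file holds: @lines\n";    # prints The file holds: fred barney

# A reference to a scalar in place of a filename gives an in-memory file
my $text = "betty\nwilma\n";
open my $mem, '<', \$text or die "Cannot open in-memory file: $!";
chomp(my @names = <$mem>);
close $mem;
print "The string holds: @names\n";  # prints The string holds: betty wilma
```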

Locales and Unicode

It’s a small world, after all. In order to work properly in places where even the alphabet is different, Perl has support for locales and Unicode.

Locales tell Perl how things are done locally. For example, does the character æ sort at the end of the alphabet, or between ä and å? And what’s the local name for the third month? See the perllocale manpage (not to be confused with the perllocal manpage).

See the perlunicode manpage for the latest on how your version of Perl deals with Unicode. As of this writing, each new release of Perl has many new Unicode-related changes, but we hope things will settle down soon.

Threads and Forking

Perl now has support for threads. Although this is experimental (as of this writing), it can be a useful tool for some applications. Using fork (where it’s available) is better supported; see the perlfork and perlthrtut manpages.
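
Here’s a minimal sketch of the fork approach, assuming a system where fork is available. The subroutine name run_in_child is made up for this illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

$| = 1;    # unbuffer STDOUT, so the child doesn't inherit pending output

# Run a code reference in a child process and return its exit status
sub run_in_child {
    my ($code) = @_;
    my $pid = fork;
    die "Cannot fork: $!" unless defined $pid;
    if ($pid == 0) {
        # We're the child: do the work, then exit with its return value
        exit $code->();
    }
    # We're the parent: wait for that child to finish
    waitpid $pid, 0;
    return $? >> 8;    # the high byte of $? is the exit status
}

my $status = run_in_child(sub { print "Hello from the child\n"; return 3 });
print "Child exited with status $status\n";    # prints Child exited with status 3
```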

Graphical User Interfaces (GUIs)

A large and powerful module set is Tk, which lets you make on-screen interfaces that work on more than one platform. See Learning Perl/Tk by Nancy Walsh or the upcoming Mastering Perl/Tk by Nancy Walsh and Steve Lidie (O’Reilly & Associates, Inc.).

And More...

If you check out the module list on CPAN, you’ll find modules for even more purposes, from generating graphs and other images to downloading email, from figuring the amortization of a loan to figuring the time of sunset. New modules are added all the time, so Perl is even more powerful today than it was when we wrote this book. We can’t keep up with it all, so we’ll stop here.

Larry himself says he no longer keeps up with all of the development of Perl, because the Perl universe is big and keeps expanding. And he can’t get bored with Perl, because he can always find another corner of this ever-expanding universe. And, we suspect, neither will we. Thank you, Larry!



[1] And we’re not just saying that because it’s also published by O’Reilly & Associates, Inc. It’s really a great book.

[2] The name “package” is perhaps an unfortunate choice, in that it makes many people think of a packaged-up chunk of code (in Perl, that’s a module or a library). All that a package does is define a namespace (a collection of global symbol names, like $fred or &wilma). A namespace is not a chunk of code.

[3] It’s not really a checksum, but that’s good enough for this explanation.

[4] The module Digest::Perl::MD5 is a pure Perl implementation of the MD5 algorithm. Although your mileage may vary, we found it to be about 280 times slower than the Digest::MD5 module on one sample dataset. Remember that many of the bit-twiddling operations in the C algorithm compile down to a single machine instruction; thus, entire lines of code can take a mere handful of clock cycles to run. Perl is fast, but let’s not be unrealistic.

[5] We’re including here merely the most important features of each module; see the module’s own documentation to learn more.

[6] To be sure, there are other important modules whose use is too complex for most readers of this book, typically because using the module requires understanding Perl’s references or objects.

[7] Yes, this means that you are now able to use Perl to send spam. Please don’t.

[8] The actual return value of localtime in a list context is a little different than you might expect; see the documentation.

[9] If your program will never be used with a version of Perl prior to 5.6, you should use the our keyword instead of the vars pragma.

[10] If your program may be used with a version of Perl prior to 5.6, you should not use the warnings pragma.

[11] In fact, it would generally be a lie to say that a Perl array is stored in “a chunk of memory” at all, as it’s almost certainly spread among many separate chunks.

[12] Although LWP makes it easy to make a simple “web browser” that pulls down a page or image, actually rendering that to the user is another problem. You can drive an X11 display with Tk or Gtk widgets though, or use curses to draw on a character terminal. It’s all a matter of downloading and installing the right modules from CPAN.

[13] There are some details of the interface that these snippets don’t support. Trust us; it’s better to use a module.

[14] Several of the reviewers who looked over a draft of this book for us wished we could cover more about CGI programming. We agree, but it wouldn’t be fair to the reader to give just enough knowledge to be dangerous. A proper discussion of the problems inherent in CGI programming would probably add at least 50% to the size (and cost) of this book.

[15] Remember that every new release of each brand of browser on each different platform counts as a new one that you’re probably not going to be able to test. We really chuckle when we hear someone tested a web site with “both browsers” or when they say “I don’t know if it works with the other one.”

[16] At the very least, following the standards lets you put the blame squarely on the other programmer, who didn’t.

[17] Do you see why we didn’t try to fit all of that into this book?

[18] Well, not really, but you can fake it so well that you’ll hardly remember that there’s a difference.

[19] Actually, you can’t make any of these things; these are just verbal shorthands for what’s really happening. What we call “an array of arrays” in Perl is really an array of references to arrays.

[20] OO has its own set of jargon words. In fact, the terms used in any one OO language aren’t even the same ones that are typically used in another.

[21] Dynamic loading of binary extensions is generally available if your system supports that. If it doesn’t, you can compile the extensions statically—that is, you can make a Perl binary with the extension built in, ready for use.

[22] That’s probably the language we’d use for writing a word processor. Hey, we love Perl, but we didn’t swear an oath in blood to use no other language. When language X is the best choice, use language X. But often, X equals Perl.

[23] If you’re using gawk or nawk or some other variant, a2p may not be able to convert it. Both of these conversion programs were written long ago and have had few updates except when needed to keep working with new releases of Perl.

[24] This is a task typically done by a cron job at some early-morning hour each day.
