CHAPTER 1

Introducing Perl

In this introductory chapter, we cover the background of Perl and what makes it a popular programming language. We also look at installing a standard Perl distribution, as well as building it from source. For those who already have Perl installed, these sections can be skipped, though if we want to use optional features like threading, it may still be advantageous to build Perl from a source distribution, a process which is less daunting than it might at first seem.

Once Perl is installed, we look at how to run Perl scripts, and what needs to be done to get the operating system to recognize Perl scripts. We spend some time with the Perl command line and special environment variables, and see how Perl can be used as a versatile command-line tool to execute snippets of Perl without the involvement of any scripts. Even though Perl is ubiquitous, we also take a look at one way to create stand-alone Perl applications that can run without the benefit of a Perl installation.

There is far more to Perl than the standard distribution. In the last part of the chapter, we cover downloading and installing new Perl modules from the Comprehensive Perl Archive Network (CPAN), the first (and frequently only) port of call for all Perl extensions and add-ons. As many Perl modules are a mixture of Perl plus C, we also cover some of the options for installing packages containing C, including ActivePerl's PPM package tool for Windows.

Introduction

Perl, the Practical Extraction and Report Language, is a tremendously versatile language that combines a flexible syntax with a powerful range of features. Inspired by Unix text-processing tools like sed and awk, and by shell scripting languages, Perl combines their power with a C-like syntax and the flexibility of Lisp to provide a development environment that is vastly more powerful, while also being easier to learn and faster to develop in.

Perl's versatility is one of its greatest strengths, but it can also be a liability if used unwisely. Unlike languages that have a strong opinion on what is the right way or the wrong way to do things, Perl is adaptable enough to adopt almost any approach—"There's More Than One Way to Do It," as the Perl motto goes. This lets Perl adapt to the programming style of the programmer, and not the other way around. In the eyes of Perl programmers, this is a good thing; in the eyes of advocates of some other languages, it isn't. Perl's anti-motto really ought to be "Just Because You Can Do It, Doesn't Mean You Should"; Perl does not impose good programming practices, so it is also easy to write badly constructed and hard-to-read code through sloppy programming.

Perl is a practically minded language, and takes a no-frills approach to almost everything, including features like object-oriented programming, which is pivotal to the entire ethos of other programming languages. Again, this is both a boon and a potential pitfall for the unwary. Perl is also the language of a thousand handy shortcuts, many of which are intuitive, and others of which are indispensable once they are known. We have tried to cover as many as we can during the course of this book.

Key Features

Perl has many features that contribute to its popularity. Some of them are obvious to anyone familiar with Perl: an easy learning curve, powerful text manipulation features, and cross-platform availability.

Ironically, experienced programmers sometimes have a harder time than newcomers; Perl makes some things so easy it is necessary to do a double-take and start from scratch rather than attempting to use familiar coding styles from other languages. This is especially true of the regular expression engine.

Other features are hidden strengths: its open source credentials and licensing, independence from commercial interference, and active online communities. Here are a few items for those who are less familiar with Perl to ponder:

  • Perl has a relatively simple and intuitive syntax that makes it extremely quick to learn and lends itself to very rapid prototyping, especially compared to languages that involve an explicit compilation step. This is one reason it is popular as a scripting language—it can be faster to code a Perl tool than find and learn how to use an existing one.
  • Perl is a cross-platform language, supported on almost any operating system of sufficient complexity. This means that, with a few caveats, a Perl program written on one platform will usually run on another with little or no modification. Perl's standard library contains considerable support for handling different platforms in such a way that Perl applications frequently need little or no additional effort to handle multiple platforms.
  • Perl's versatility allows programmers to learn the language and adapt it to their own particular styles. Conversely, of course, Perl won't magically straighten out poor style.
  • Perl contains a powerful suite of features to manipulate text. The regular expression engine, when properly used, is capable of almost any kind of textual transformation conceivable, and text manipulation is one reason for Perl's popularity both as a command-line tool and a web programming language.
  • Perl's standard library comes with comprehensive support for many common programming problems. In addition, CPAN and ActiveState distribute modules for Linux and Windows respectively, which provide many powerful extensions to the standard library including very comprehensive XML support, graphical user interfaces, and several flavors of embedded Perl scripting. Perl can also be integrated into several different web servers.
  • Perl supports references but doesn't support directly addressable pointers, one of the biggest sources of pain in C programming. References allow the easy construction of complex data structures, but without the dangers inherent in pointer arithmetic. As an adjunct to this, like most reference-based languages (Java is another), Perl has a garbage collector that counts references and removes data when it is no longer required.
  • Perl has a flexible object-oriented programming syntax, which is both powerful and at the same time extremely simple. Its simplicity comes at the price of not implementing some of the more advanced object-oriented concepts available in other languages, but it is capable of a lot more than it is often given credit for.
  • Perl is not commercially oriented. This is an important point and should not be overlooked. Perl has no commercial agenda, and does not have features added to boost anyone's market position at the expense of the language itself. This gives it a major advantage over commercial scripting languages (certain examples may spring to mind) that are not developed by the communities that use them.
  • Perl is an open source project, developed by the community of programmers who use it. Its license allows it to be used and copied freely, under the terms of either the Artistic License or the GNU General Public License, whichever suits better. This means that it cannot be commercially coerced, and it also allows it to benefit from the input of thousands of programmers around the world. Even so, commercial support is available from several companies both on Unix and Windows for those who require it.
  • Finally, Perl is not just a programming language but also a thriving online community. One of the most obvious faces of the community is CPAN (headquartered at http://cpan.org, but mirrored around the world) and its comprehensive repository of Perl programming libraries and modules. Others worthy of mention include the Perl Mongers, http://www.pm.org, a network of regional Perl clubs and societies; and the Perl Foundation, http://www.perlfoundation.org, which helps to coordinate YAPC (Yet Another Perl Conference) gatherings around the world.
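To make the points about references and objects in the list above more concrete, here is a minimal sketch. The data and the Counter class are hypothetical and chosen purely for illustration; the sketch shows a nested data structure built from references, and an object, which in Perl is simply a reference blessed into a package.

```perl
use strict;
use warnings;

# A hash of array references: a nested structure built without pointers.
my %ports = (
    unix    => [ 'Linux', 'Solaris' ],
    desktop => [ 'BeOS', 'OS/2' ],
);

# Take a reference to the hash and follow it with the arrow operator.
my $ref = \%ports;
push @{ $ref->{unix} }, 'AIX';
my $unix_count = scalar @{ $ref->{unix} };
print "Unix ports listed: $unix_count\n";

# A minimal (hypothetical) class: an object is a reference blessed into a package.
package Counter;

sub new {
    my ($class, %args) = @_;
    my $self = { count => $args{start} || 0 };
    return bless $self, $class;
}

sub increment {
    my $self = shift;
    return ++$self->{count};
}

package main;

my $counter = Counter->new(start => 5);
$counter->increment;
my $value = $counter->increment;
print "Counter is now $value\n";
```

When the last reference to the anonymous arrays or to the Counter object goes away, Perl's reference-counting garbage collector reclaims the memory without any explicit cleanup on our part.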

Supported Platforms

Perl is supported on many platforms, and ports exist to more or less every operating system that still has life in it (and a few that don't). The most commonly used of these are:

  • Unix: More or less every Unix or Unix-like operating system ever created, notably AIX, IRIX, HP-UX, BSD, Linux, Solaris, and Tru64
  • MS Windows: DOS, Windows 3.1, 95, 98, NT, and 2000, and the Cygwin and MinGW Unix-on-Windows compatibility environments
  • Other desktop OSs: Apple Macintosh (68k and PPC, both pre– and post–MacOS X), Acorn Risc OS, Amiga, BeOS, OS/2, and many others
  • Mainframes: AS/400, OS390, VMS and OpenVMS, Stratus (VOS), and Tandem
  • PDAs: EPOC (Psion/Symbian), Windows CE and PocketPC, but not PalmOS at time of writing

Binary versions of Perl for all of these operating systems and many others are available from the Ports page on CPAN at http://cpan.org/ports/index.html. Where possible, however, building from source is preferred and recommended.

Perl also builds from source on many of these platforms. This is generally preferable because binary distributions tend to lag behind the source code in version number (the delay depends a lot on the platform—Unix is near instantaneous). For Unix-like platforms, building from source should not be a problem, and may even be educational. Other platforms that should be able to compile Perl directly from source include DOS, OS/2, BeOS, EPOC, VMS, and Windows; see later in the chapter for details.

When built from source, Perl is often able to take advantage of additional facilities on the platform, like 64-bit integers (even on 32-bit architectures). It also allows the possibility of including support for threads and other features that are often not enabled by default in binary distributions.
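One way to find out which of these facilities a particular Perl was built with is to ask the standard Config module. The following is a small sketch rather than a complete report; the helper name build_has is our own, and the keys shown are only a sample of those Config provides.

```perl
use strict;
use warnings;
use Config;    # core module describing how this perl was built

# Hypothetical helper: true if the named build option is set to 'define'.
sub build_has {
    my $key = shift;
    return (defined $Config{$key} && $Config{$key} eq 'define') ? 1 : 0;
}

for my $feature (qw(use64bitint uselongdouble useithreads)) {
    my $state = build_has($feature) ? 'enabled' : 'not enabled';
    print "$feature: $state\n";
}

# ivsize is the size in bytes of Perl's native integer type.
print "integer size: $Config{ivsize} bytes\n";
```

The same information is available from the command line with perl -V, which prints a summary of the whole build configuration.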

Perl History and Versions

Perl is an evolving language, continuously updated with support for new features. Even so, it remains an easy language to learn and has kept its basic simplicity while evolving from a simple scripting tool into a full-fledged object-oriented application development language.

Perl evolved hand in glove with the Internet, and gained rapid popularity in its early days as a language for writing quick utility scripts. This was thanks in part to its powerful text handling features and familiarity to programmers used to the sed and awk tools by which it was partly inspired. Perl has an obvious relationship to C, but it also has characteristics derived from Lisp. It gained popularity as a language for writing server-side CGI scripts for web servers, again because of its text handling abilities and also because of its suitability for rapid prototyping. This culminated in version 4 of Perl.

Release 5 of Perl took Perl to a new level by introducing object-oriented programming features. Like the language itself, Perl's support for objects applies itself to getting the job done rather than worrying overly about ideology, but nonetheless turns out to be very capable. The ability to support objects derived principally from the introduction of hard references to the language. Up until this point, Perl only had symbolic references, which are now deprecated (and indeed disallowed with the strict module). It was in version 5 that Perl became more than just a language for writing short utility scripts and became a language in which serious applications could be developed.
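The difference between the two kinds of reference can be sketched in a few lines. This is an illustrative example only; the variable names are our own. It shows a hard reference working under strict, the same lookup by name being refused, and the symbolic form working once the restriction is lifted.

```perl
use strict;
use warnings;

our $name = 'Perl';    # a package variable, so it can be found by name

# Hard reference: created with the backslash operator, safe under strict.
my $hard = \$name;
print "hard reference gives: $$hard\n";

# Symbolic reference: a string holding a variable's name.
# Under 'use strict' dereferencing it is a fatal error, trapped here with eval.
my $symbolic = 'name';
my $result   = eval { ${$symbolic} };
my $error    = $@;
print "symbolic reference refused under strict\n" if $error =~ /strict refs/;

# With the restriction lifted in a small scope, the name is looked up at run time.
my $via_symbol;
{
    no strict 'refs';
    $via_symbol = ${$symbolic};    # resolves to $main::name
}
print "without strict refs: $via_symbol\n";
```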

Version 5.005 introduced initial support for threaded programming, albeit only inside the interpreter itself. This gave Windows, and other platforms that did not support child processes, an emulation of the Unix fork system call, thus greatly improving Perl's support on those platforms.

In version 5.6, Perl revised its version numbering system to be more in line with version numbers elsewhere. In particular, it adopted the Linux system of numbering stable and development releases with even and odd release numbers—in this scheme the previous stable release is now retrospectively known as Perl 5.5, the odd number notwithstanding. Version 5.6 introduced a number of important improvements, the main ones being much better support of Unix idioms under Windows and initial support for Unicode character sets. From version 5.6, experimental support for threads in user-level programming was first introduced, but only if built from source and requested at that time.

Perl 5.8 brings an improved implementation of threads known as interpreter threads. It also brings full support for Unicode, support for PerlIO and filehandle layers, restricted hashes as a replacement for pseudohashes, improved signal handling, further improved support for Windows, and a much more extensive regression test suite. The development version is Perl 5.9, and the next stable release will be (or may now be) Perl 5.10.
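As a brief illustration of one of these features, restricted hashes are provided by the standard Hash::Util module. The following is a minimal sketch with a hypothetical configuration hash; it shows a misspelled key being rejected once the set of keys has been locked.

```perl
use strict;
use warnings;
use Hash::Util qw(lock_keys unlock_keys);    # standard module as of Perl 5.8

my %config = (host => 'localhost', port => 8080);

# Restrict the hash to its current set of keys.
lock_keys(%config);

# Existing keys can still be read and updated...
$config{port} = 9090;

# ...but using a key outside the permitted set is a fatal error.
my $ok    = eval { $config{prot} = 1; 1 };
my $error = $ok ? '' : $@;
print "misspelled key rejected\n" if $error =~ /disallowed key/;

# Lifting the restriction makes the hash ordinary again.
unlock_keys(%config);
$config{timeout} = 30;
```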

In the future is Perl 6, a complete and fully object-oriented reimplementation and expansion of Perl from the ground up, and the Parrot virtual machine on which it will run. While radically different under the hood, Perl 6 will be sufficiently like Perl 5 that it will be possible to run most if not all Perl 5 programs under it. A port of Perl 5 to the Parrot virtual machine called Ponie, which will be identically compatible with Perl 5.10, is also in the works—see http://www.poniecode.org for information. For more information on Parrot, see http://www.parrotcode.org. For Perl 6, see http://dev.perl.org/Perl6.

Finding Help

Perl comes with a lot of documentation as standard, including a complete set of manual pages that can be accessed with the perldoc utility. For an index of all available pages, use the command line

> perldoc perl

This should return a list of the many standard Perl manual pages available as part of every Perl installation. Where documentation refers to a Perl manual page, we can use perldoc to read it. For example, to read about Perl's command-line options or its C-based interfaces:

> perldoc perlrun
> perldoc perlapi

The perldoc script is quite versatile and is used for accessing all kinds of Perl documentation from a variety of sources, through various command-line options. Using perldoc -h will generate a brief summary of all these options, but as perldoc is a valuable tool to know how to use, we will cover a few of the ways it can be used here.

The most immediately useful variants of the perldoc command worth knowing are those that enable search and query modes.

> perldoc -f funcname    # look up a Perl function
> perldoc -q pattern     # search the questions in the Perl FAQ
> perldoc -r pattern     # search documentation recursively

The -f and -q options provide focused access to the perlfunc and perlfaq manual pages. The options let us extract only the text for a specific query, avoiding the need to search through the entire rather long document. For example:

> perldoc -f split
> perldoc -q '(mail|address)'
> perldoc -q -r '(un)pack'

This last command searches both the questions and the answers of the Perl FAQ (of which there are several numbered sections). Without the -r, it searches only the questions.

The other major use of perldoc is to display the documentation for modules, Perl's reusable libraries. For example:

> perldoc IO::File

Commands and scripts can also have documentation extracted from them, assuming they are written in Perl, of course. If the command is on the search path of the shell, we do not even need to tell perldoc where it is. As a self-referential example, we can read up on perldoc itself with

> perldoc perldoc

All of these commands will extract and display any embedded Perl documentation (technically, Plain Old Documentation or POD) present in the module or script. Even Perl's manual pages are written in this embedded documentation syntax. Just occasionally, we want to look at the actual source code of a Perl library though, which we can do with the -m option. This will give us the actual file contents, including POD documentation and Perl code.

> perldoc -m IO::File

If we just want to know where the file is, rather than display it, we can use -l instead of -m.

> perldoc -l IO::File

Finally, if we are not sure of the exact mixture of cases in the name of the module or script we are looking for, we can use the -i option to make a case-insensitive query. The -i option will also work with -q or -r to perform case-insensitive searches.

> perldoc -i extutils::makemaker
> perldoc -i -q CGI
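For readers who have not seen it before, the embedded documentation that perldoc extracts looks something like the following sketch. The script name greet and its contents are hypothetical; the point is only to show POD directives sitting alongside ordinary Perl code in the same file.

```perl
#!/usr/bin/perl
use strict;
use warnings;

=head1 NAME

greet - print a greeting (hypothetical example script)

=head1 SYNOPSIS

    greet [name]

=head1 DESCRIPTION

Prints a greeting for the given name, or for "world" if none is given.

=cut

# Perl skips everything between a POD directive and =cut when it runs.
sub greeting {
    my $name = shift;
    $name = 'world' unless defined $name;
    return "Hello, $name!";
}

print greeting($ARGV[0]), "\n";
```

If this were saved as a file called greet, running perldoc on that file should display the NAME, SYNOPSIS, and DESCRIPTION sections, while running it with perl would execute only the code.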

A complete set of manual pages for Perl in HTML, PDF, PostScript, and plain text formats is available from CPAN's documentation page at http://cpan.org/doc/index.html, or from http://www.perldoc.com. Users of ActivePerl can also use the Online Documentation entry in the ActivePerl group of the Windows Start menu.

Talking of HTML, also available from http://cpan.org/doc/index.html is the current version of the Perl FAQ, along with many other useful essays that go beyond the basic Perl documentation.

Building and Installing Perl

Perl is released in two different versions: the stable version, also known as the maintenance version, and a development version. Both versions update in small ways from time to time, and occasionally take a big leap. The last such big leap was from Perl version 5.6.1 to Perl 5.8.0; the new development release, as mentioned earlier, became Perl 5.9.0. This does not mean Perl 5.9 is better than Perl 5.8; it just means it is more experimental. Ultimately it will become Perl 5.10, the next stable release.

Some development releases are more stable than others, but only officially stable releases are recommended for production environments. The new numbering scheme means that the latest stable version will always have an even second digit—incremental updates for Perl 5.8 are in the 5.8.X series, while the next major release will be 5.10. While we usually want to stick to the current stable release, looking at the perldelta document for the current development release (available on CPAN) can be useful to find out what the next stable release is likely to contain.
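The even/odd convention is easy to check programmatically. The sketch below uses two helper functions of our own invention, version_parts and is_stable_series, and assumes the version is supplied as a string in the numeric form that Perl's $] variable uses (for example, '5.008005' for Perl 5.8.5).

```perl
use strict;
use warnings;

# Split a numeric version string such as '5.008005' into (5, 8, 5).
sub version_parts {
    my $numeric = shift;
    my ($major, $minor, $patch) = $numeric =~ /^(\d+)\.(\d{3})(\d{3})?/
        or return;
    return ($major + 0, $minor + 0, ($patch || 0) + 0);
}

# Under the even/odd scheme, an even second number marks a stable series.
sub is_stable_series {
    my (undef, $minor) = version_parts(shift);
    return $minor % 2 == 0 ? 1 : 0;
}

# $] holds the running interpreter's version in this numeric form.
my ($major, $minor, $patch) = version_parts($]);
printf "running perl %d.%d.%d (%s series)\n",
    $major, $minor, $patch,
    is_stable_series($]) ? 'stable' : 'development';
```

Versions should be passed as strings rather than numeric literals, since a literal such as 5.010000 would lose its trailing zeros before the pattern match sees it.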

Getting the most current maintenance release is almost always a good idea. New maintenance versions of both stable and development releases increment the last digit; depending on what is in them we may or may not need to care about upgrading Perl immediately.

Before fetching and installing a Perl distribution, it is worth taking time to consider whether a binary distribution is suitable or whether it would be worth building from source. Source distributions include the following advantages:

  • They can be built to take advantage of the underlying hardware; for example, Pentium+ class processor instructions. Binary distributions are frequently "dumbed down" in terms of processor capabilities in order to be more widely installable.
  • Enhanced and experimental features such as extended precision integers and floating point numbers, and user-level threads can be included in the Perl interpreter on Unix platforms (Perl on Windows is always threaded).
  • Support for additional packages like the GNU DBM (GDBM) and Berkeley DB (DB) libraries can be built as part of the installation process if they are present on the system at the time Perl's build is configured.

The disadvantages of source distributions are that they take longer to install and require a compiler and supporting tools for the build process. They are also not always immediately portable to all the platforms on which Perl can run, while binary distributions have already solved the installation issues for their target platform. Having said this, the source is quite capable of being built on the bulk of platforms that we are likely to be using.

Installing Prebuilt Perl Distributions

Perl is available from many websites and FTP servers. Two places to look for it are http://www.perl.org and http://cpan.org, both of which carry the latest releases and links to all distributions, free and commercial, for all platforms on which Perl is known to work. Note that it is generally a good idea to pick a local mirror before downloading, for reasons of both speed and good neighborliness. The main CPAN site automatically tries to redirect browsers to a local site.

Binary distributions are available from a number of places, most notably CPAN's binary ports page at http://cpan.org/ports/index.html, which contains links to a large number of binary ports and incidental notes on availability. Some platforms lag behind the current source code release by a few version points—if this is important, consider building from source instead.

Many platforms can take advantage of prebuilt packages that will install onto a system using standard installation tools. Linux, for example, has both .deb and RPM packages, both of which are commonly available from the websites of the respective distributions (Debian and distributions based on it use .deb packages, while Red Hat and SuSE use RPM). Additionally, RPMs can be found at ftp.rpmfind.net and can be searched for with the rpmfind utility. Solaris packages can be installed with the pkgadd facility, and so on.

In general, packages keep up to date with the current Perl release, but check first and make a note of any changes that may mean getting a more up-to-date release.

Installing Perl on Unix

Installing a binary distribution of Perl is trivial—we just need to unpack it into the place we want it to live; to install it into a system-wide place like /usr/local or /usr/lib we will need root privileges, of course, which is usually denoted by the prompt # on the command line.

Most Unix vendors supply a binary Perl package for their particular platform, and many of them have Perl installed by default. If Perl is not already installed, it makes most sense to pick up the appropriate package from the vendor's web or FTP site.

Additionally, the latest versions of the standard package in both RPM and Debian formats can be tracked down at http://www.activestate.com. After retrieving the correct package, the following command lines for .rpm and .deb files will place them in the relevant location and, therefore, install Perl for us:

# rpm -i ActivePerl-5.8.4.810-i686-linux-thread-multi.rpm
# dpkg -i ActivePerl-5.8.4.810-i686-linux-thread-multi.deb

A .tar.gz archive is also available, if neither of these formats suits our needs.

If we want to run Perl (and attendant scripts like perldoc) from the command line, we need to either place the executables into a directory that is on the system path or adjust our own path so that we can run Perl from an installation directory of our own choosing. This latter approach is sometimes appropriate even if we could install to a system location, in order to have a second Perl installation independent of a preexisting vendor-supplied Perl. We can use the new copy for development without the risk of upsetting any operating system scripts that might rely on the Perl environment set up by the vendor.
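When more than one Perl is installed, it is useful to be able to confirm which one is actually being run. The following short sketch prints the interpreter's path, its version, and the prefix it was configured with, using only built-in variables and the standard Config module.

```perl
use strict;
use warnings;
use Config;

# $^X is the path of the interpreter running this script.
print "interpreter: $^X\n";

# $^V is the version; the %vd format prints it as dotted numbers.
printf "version:     %vd\n", $^V;

# installprefix records the prefix this perl was configured to install under.
print "prefix:      $Config{installprefix}\n";
```

Running this script with each installation in turn (for example, by giving the full path to each perl executable) shows at a glance which copy the shell's search path is picking up.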

Installing Perl on Windows

There are three main ports of Perl to Windows: the native port, the ActiveState port, and the Cygwin port. The native port can be built straight from the source, which is the preferred alternative; a straight binary port is available from the ports page for those who really can't build from source and can't (or won't) use one of the available alternative binary ports.

The ActiveState port, ActivePerl, is very popular and is freely available from http://www.activestate.com. If we are using Windows 9x, we will need the Microsoft Windows Installer (Windows 2000/XP and ME onwards have this as standard). It is available from the Windows Download page at ActiveState's website: http://www.activestate.com/Products/ActivePerl/Download.html. Select the appropriate Installer and install it first if necessary. We can then download the binary distribution from the Windows Download page or retrieve by FTP from ftp://ftp.activestate.com/ActivePerl/Windows/5.8.

The Cygwin port for the Cygwin Unix compatibility environment is available from http://sourceware.cygnus.com/cygwin. The Cygwin port has the advantage of supporting (and indeed coming supplied with) many of the GNU utilities, including gcc, gzip, make, and tar, which the CPAN module CPAN.pm prefers to use.

Installing Perl on Macintosh

MacOS X is well provided for in terms of Perl as it is a Unix-based platform. Indeed, Perl comes preinstalled, and we do not need to do anything more to make use of it. In addition, since MacOS X is Unix-based underneath, most of the details for managing Unix installations apply to it also, including shebang lines for script identification (#!/usr/local/bin/perl -Tw). However, since Perl is used by some standard system utilities, it can sometimes be advantageous to install a second copy of Perl for development use so as not to accidentally disturb the vendor-supplied version.

The OS 9 port of Perl, also known as MacPerl, comes in several different forms, divided into two stand-alone versions and either a binary or source distribution based around the Macintosh Programmer's Workbench (MPW). Either version works on any version of MacOS from 7 through 9. All of the files needed can be found at a mirror of http://cpan.org/ports/mac. Additional information is available at the MacPerl homepage at http://www.macperl.org/.

Building Perl from Source

Building Perl from source is not required for most platforms, so this section can be skipped by those who would prefer not to experiment with compilers or are simply impatient to get on to actually writing Perl code. It is an introduction for those who are curious about the build process, or who want to do more than the standard binary installations allow, such as enabling support for threads under Unix.

Source distributions are available from many places, most notably CPAN's Perl source code page at http://cpan.org/src/index.html. The current production release is always available as stable.tar.gz and stable.zip. Development releases are available as devel.tar.gz and devel.zip. In both cases, the actual version number is not reflected in the filename but is in the name of the directory archived inside, which unpacks as perl-<version> (for example, perl-5.8.5).

System administrators who are concerned with the accounting of the files they install can use a source package installation. These provide the advantage of having Perl installed as an accountable package, while at the same time taking advantage of the benefits of compiling it from source. Rather than install a binary RPM, for example, we can acquire the source RPM or SRPM (use --sources as an argument to rpmfind to locate one) and use rpm --rebuild perl.srpm to build a new binary package we can then install.

Building Perl is mostly just a matter of unpacking the source archive (stable or development version) and choosing how to configure the build process. Before moving on to configure the installation, we need to extract the archive. For example, assuming we have the gzip source archive on a Unix system, we use the following command lines:

> gunzip -c stable.tar.gz | tar -xvf -
> cd perl-5.8.5

If we have the GNU version of tar, we can also do the unpacking in one step with

> tar -xzvf stable.tar.gz

Perl builds easily on most Unix platforms; the majority of these can use the supplied Configure script to set up and compile Perl with the features and installation locations they want. Other platforms supported by the source bundle are documented in one of the README files available in the top-level directory of the source tree. For instance, instructions and additional details for building Perl on Windows are contained in the README.win32 document. If the platform we want to build on has an associated README file, we can build and install Perl on it, but there may be more involved than simply running Configure.

Configuring the Source

The Configure script sets up the source distribution for compilation on the target platform. It has two modes of operation. The first is an interactive step-by-step question-and-answer session that asks us how we want to build the interpreter, where we want to install it, and the libraries that come with it. At any stage, where a reasonable default can be assumed, the script does so. We usually change just the options we care about and accept the defaults for the rest. At its simplest we can use Configure this way by entering

> ./Configure

The second mode of operation skips the question-and-answer session and assumes the defaults for all questions. Without qualification this will build a standard Perl that installs into the default place (/usr/lib/perl5 on most systems) when we subsequently carry out the installation. To run Configure in this way, we use the -d option, for "default."

> ./Configure -d

In either mode, if we want to have Configure automatically start the compilation process after finishing the initial setup, we can specify the -e option. To stop Configure being concerned about the less important issues regarding how and why it is making its decisions, we can also specify -s, for "silent." These are both good options to specify with -d to configure and build Perl with minimal fuss.

> ./Configure -des

Both modes also allow us to specify configuration options on the command line. In the interactive mode, this changes the default option presented to us when Configure asks us the appropriate question; we can change our minds again then if we want to. In the noninteractive mode this is the only chance we get to determine how the resulting Perl is built.

There are many options that can be specified; the question-and-answer session displays most of the ones that it sets, so it is worth running through an interactive configuration once just to see what options a question sets. The most common option to specify is the prefix option, which determines where Perl is installed after it is built and presets the values of many other options. Configuration options are set with the -D flag, so to tell Perl to install itself in /usr/local/perl58 rather than the default location we would use

> ./Configure -des -Dprefix=/usr/local/perl58

This example uses the noninteractive mode of Configure to install Perl under the /usr/local directory. We might do this, for example, if we already have a Perl installation and we do not want to disturb it. Similarly, we can use -DDEBUGGING to build a debug version of Perl, which also enables the interpreter's own -D command-line option for debugging output.

> ./Configure -des -Dprefix=/usr/local/perl58dbg -DDEBUGGING

Another option to combine with prefix that ensures we don't upset an existing installation is the installusrbinperl option. By default this is now disabled, although prior to Perl 5.8 it was enabled. It causes the installation stage to copy Perl and a few supporting scripts into the default place for system executables. Unless we change it, this default is /usr/bin. To disable an option we use -U, so to set up Perl 5.6 or earlier to install elsewhere and leave the existing installation untouched, we would use

> ./Configure -des -Dprefix=/usr/local/perl56 -Uinstallusrbinperl

Or to have Perl 5.8 do the extra installation work:

> ./Configure -des -Dprefix=/usr/local/perl58 -Dinstallusrbinperl

The PerlIO support in Perl 5.8 and later can similarly be disabled (if we have good reason to believe that the native support is better) with -Uuseperlio.

A complete list of Configure's options can be viewed by using the -h or --help option on the command line.

> ./Configure --help

If we have been through the configuration process at least once before, there will be a config.sh file present. In the interactive mode, Configure will ask if we want to use it, and then preset itself with defaults from the previous run. This file contains all the names of the options that we can supply on the command line too, so it can be handy for finding out how to set options. Note that if we are rebuilding Perl with different options, we should additionally execute a make distclean before we configure and build Perl a second time.

> make distclean
> ./Configure

Building the Source

Once configuration is complete, we build Perl by typing

> make

Perl builds itself in several steps. The first step builds a stripped-down version of Perl called miniperl. This is a feature-complete version of the interpreter, but at this point none of the extension modules involving C code (for example, Socket) have been built yet.

The miniperl interpreter generates the Config module, which provides the information needed to build the extensions. The one critical extension that is built is DynaLoader, which provides support for dynamically loading extensions. We can configure which extensions are built, and whether they are built dynamically or statically, by configuring the build appropriately with options like -Dnoextensions=DB_File or -Donlyextensions=DynaLoader,Socket or answering the interactive questions appropriately.

The penultimate build step is to generate the scripts that come with Perl such as cpan, perldoc, and perlcc. Finally, the manual pages are generated and Perl is ready to be installed. To install to a privileged location, we will usually need root or administrator privileges. Assuming we do, we can just enter

# make install

This will build and install Perl according to the choices we made at the configuration stage. If we are overwriting an existing installation and prudently want to test the new build before we wreck anything, we can build Perl without installing it, test it, and then install it only if we are happy.

> make
> make test
> su
Password:
# make install

To verify that the installation is complete (and to reverify it later, if need be), use the perlivp "installation verification procedure" tool.

> perlivp -p

This will print out a series of "OK" messages if all is well, or generate appropriate messages if any problems are found. The -p is optional; it prints out the meaning of each test before it is run. We can also use -v to generate verbose output of what is being checked, if necessary.

Building a Binary Distribution

If we want to create a built Perl distribution for installation elsewhere, we can use the DESTDIR makefile macro to send the installed files to a different directory than the default.

> make install DESTDIR=/tmp/perl58dist
> cd /tmp/perl58dist
> tar cvf perl58dist.tar *

At the end of the build process a shell script called myconfig is generated with complete details of the configuration used to build Perl. This is handy for checking what actually was used to build Perl; a similar output can be generated from Perl itself with the -V option.

> perl -V

Individual options can be checked by qualifying the option. For example:

> perl -V:usethreads

The usethreads value is a Boolean one, so this command will come back with a line containing the word define if the option was set, or undef if it was not. Other options like cc or osname return string values.

The information provided by the -V option can also be accessed from the Config.pm module, which is generated from the contents of config.sh when Perl was built. Supporting scripts like h2xs and xsubpp use it to configure the build environment for building extension modules.
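The same values can be read from inside a program through the %Config hash that the Config module exports. The following is a minimal sketch rather than a definitive recipe: config_summary is a hypothetical helper name chosen for illustration, and the keys queried are the ones mentioned above.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use Config;    # exports the read-only %Config hash

# Hypothetical helper: return "name='value'" for each requested option,
# roughly in the style printed by perl -V:name
sub config_summary {
    my @names = @_;
    return map {
        my $value = defined $Config{$_} ? $Config{$_} : 'undef';
        "$_='$value'";
    } @names;
}

print "$_\n" for config_summary(qw(usethreads cc osname));
```

Options that were not set when Perl was built come back from %Config as undefined values, which is why the helper substitutes the word undef for them.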

Building a Threaded Perl Interpreter

One specific reason for building Perl from the source distribution rather than installing a precompiled binary is to enable support for threads. Threads are a powerful way to write multitasking applications without the overheads of multiple processes, and are fundamental to languages such as Java. In Perl, threads are technically still considered experimental (although from Perl 5.8 they are in fact fully mature), so they are not built by default.

The Configure script poses two questions about threads: firstly, Build a threading Perl?, and secondly, Use interpreter-based threads? You'll almost certainly want to answer y and y respectively. (For Perl 5.6 and before, the answer to the second question should be n to use 5.005 threads, but unless legacy code needs it, this older thread implementation is deprecated and should be avoided if at all possible.)

Unfortunately, this will not on its own result in a working threaded Perl interpreter due to technical issues surrounding the configuration process. We must also select thread support on the command line with the usethreads option. Here is how we would do that before entering an interactive question and answer session (note the absence of -des, which would suppress the questions):

> ./Configure -Dusethreads

(If 5.005 threads are really necessary, also add -Duse5005threads to the end of this command.)

To build a threaded version of Perl noninteractively and in a different location as described earlier:

> ./Configure -des -Dusethreads -Dprefix=/usr/local/perl58thr

Either way, we should eventually create a threaded Perl interpreter. If all has gone well, we should have a threads module in the standard library (for 5.005 threads, the module is instead called Thread; it should not be possible to have both modules installed simultaneously). We can check that it is present by running the perldoc installed alongside the new interpreter, here using the prefix from the previous example:

> /usr/local/perl58thr/bin/perldoc threads

See the latter part of Chapter 21 for a description of threads and why we might want to use them, and also the threaded network server example in Chapter 22.
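A script can also check at run time whether the interpreter running it was built with thread support before trying to use it. The sketch below assumes the useithreads entry in %Config is the relevant indicator for interpreter-based threads; have_ithreads is a hypothetical helper name.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: true if this perl was built with interpreter threads
sub have_ithreads {
    return $Config{useithreads} ? 1 : 0;
}

if (have_ithreads()) {
    require threads;    # only loadable on a threaded perl
    my $thr = threads->create(sub { return 6 * 7 });
    print "thread returned ", $thr->join, "\n";
}
else {
    print "this perl was built without thread support\n";
}
```

Loading the threads module with require rather than use keeps the script compilable on an unthreaded Perl, where the module cannot be loaded.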

Supporting 64-Bit Integers and Long Doubles

If Perl's build configuration script finds itself running on a natively 64-bit platform, Perl's own integers will automatically be 64 bit too. Otherwise, 32-bit integers are assumed.

Some 32-bit platforms can provide simulated 64-bit integers, for example, if the underlying C compiler (with which Perl is built) understands the long long type. To enable basic but minimal support for 64-bit integers, enable the use64bitint option.

> ./Configure -des -Duse64bitint ...

This will give Perl 64-bit-wide scalar integers, although the underlying support for them may not be all that efficient. To get fully 64-bit integer support, use use64bitall instead.

> ./Configure -des -Duse64bitall ...

This hopefully will result in true 64-bit support not just for integers, but for internal pointers too, but depending on the actual support provided by the operating system we may find that every dependent library also needs to be rebuilt to support 64-bit integers, or that the resulting Perl executable does not work at all: caveat machinator.

Long doubles may be similarly enabled, even if the platform does not appear to natively handle them, with the uselongdouble option.

> ./Configure -des -Duselongdouble ...

Finally, to enable both long doubles and minimal 64-bit integer support we can also say

> ./Configure -des -Dusemorebits ...
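Once Perl is built, we can confirm from within a program what numeric support we actually ended up with. This is a minimal sketch that assumes the ivsize and nvsize entries of %Config (the sizes in bytes of Perl's integer and floating-point values); numeric_support is a hypothetical helper name.

```perl
use strict;
use warnings;
use Config;

# Hypothetical helper: describe the numeric types of the running perl.
# ivsize and nvsize are the sizes in bytes of Perl's integer and
# floating-point values as recorded in Config.
sub numeric_support {
    return {
        integer_bits => 8 * $Config{ivsize},
        float_bytes  => $Config{nvsize},
        use64bitint  => $Config{use64bitint}   ? 1 : 0,
        longdouble   => $Config{uselongdouble} ? 1 : 0,
    };
}

my $info = numeric_support();
printf "integers: %d bits, floats: %d bytes\n",
    $info->{integer_bits}, $info->{float_bytes};
printf "use64bitint: %s, uselongdouble: %s\n",
    ($info->{use64bitint} ? 'yes' : 'no'),
    ($info->{longdouble}  ? 'yes' : 'no');
```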

Specifying Extra Compiler and Linker Flags

For the most part there will be few reasons why we might want to change the flags that the configuration script calculates for compiling and linking Perl. However, should the need arise, we can add our own flags with the -A option. While -D defines an option and -U undefines it, -A allows us to change it in almost any way we like. While it can change any configuration option, it is usually used to manipulate the compiler and linker flags that Configure does not give us direct control over. For example:

> ./Configure -Accflags=-DPERL_DEBUGGING_MSTATS -Dusemymalloc=y

This enables an additional symbol during compilation that enables support for the mstat function in the Devel::Peek module, which in turn allows us to study Perl's use of memory. It only works with Perl's internal memory allocator so we tell Configure to turn it on too. (Normally Configure will choose which malloc to use based on what it thinks of the platform's native implementation.)

We can override many low-level options this way, including ldflags, libs, and so on. In normal use -A appends a value to the existing definition separated by a space, since this is normally what we need (-D and -U are more convenient if we just want to override a value). However, we can prefix the option name with append: or prepend: to directly attach the new text to the end or start of the existing value too, if need be. For information on these and other usage modes, see Configure -h.

One specialized option that affects the compiler flags is optimize. We can use it to control the optimization level the compiler applies to Perl. To build Perl with debugging information (which also switches on the DEBUGGING option as a side effect), we can say

> ./Configure -Doptimize='-g'

If we happen to know of additional optimization options beyond that which the configuration script knows about, we can try them here too.

Installing Extra Modules

If we specify the extras option (or answer y to the question in an interactive session), we can have Configure automatically download and install additional modules, typically extensions with a C-based component, into the standard Perl library. For example:

> ./Configure -des -Dextras="DBI Bundle::DBD::mysql" ...

The extras option uses the CPAN module (covered in more detail later in the chapter) to find the requested modules and so will only work if we have access to CPAN via the Internet, or a local module repository.

Extending the Default Library Search Path

Perl by default searches a standard list of directories whenever we ask it to load in a module. This list is held in the special variable @INC and can be seen by running perl -V. While there are many ways to augment @INC from the command line, environment, or within Perl code, it is sometimes handy to be able to add additional locations to the default list. We might want to do this, for example, if we have a standard location for specific shared Perl modules that are maintained outside the standard installation locations.

To add additional paths, we need to tell Configure about them. Probably the easiest way is the otherlibdirs option. For instance:

> ./Configure -des -Dotherlibdirs=/home/share/lib/perl:/home/share/lib/site-perl

These directories are added to the end of the list, so they provide additional search locations but will not override a system-installed library (since the system locations are searched first). To put additional locations at the front of the list, we need to append to the CCFLAGS makefile macro, which we can do like this (both lines that follow should be typed as a single command):

> ./Configure -des -Accflags=
'-DAPPLLIB_EXP="/home/share/lib/perl:/home/share/lib/site-perl"'

We do not need to worry about specifying architecture-specific subdirectories with either option; they will automatically be extrapolated from the base paths we provide.
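Whichever approach we take, the resulting search order can be inspected from Perl itself. The sketch below prints @INC with positions; it also uses the lib pragma, one of the ways of augmenting @INC from within Perl code, purely to show where a directory added at run time ends up. The path is the example one from the text and need not exist.

```perl
use strict;
use warnings;
use lib '/home/share/lib/perl';   # example path from the text; need not exist

# @INC is searched in order, so print it with positions to show precedence
my $position = 0;
printf "%2d  %s\n", $position++, $_ for @INC;
```

Directories added with the lib pragma are placed at the front of the list, so they are searched before the locations compiled into Perl.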

Reusing and Overriding a Previous Configuration

The final result of running Configure is, among other files, a script named config.sh. This contains all the answers to the questions we answered in interactive mode, or specified or left as the default on the command line. If we run Configure a second time, it will see this file and automatically make use of it.

If we have specific requirements that are difficult to manage as command-line options to the configuration script, we can instead create a second script in the manner of config.sh called config.over. This script, if present, is run at the end of the configuration phase (whether we chose to answer questions interactively or not) and just before the config.sh file is written. We can therefore use it to automatically override any number of configuration options. For example, to force support for 64-bit integers:

use64bitint='define';

Alternatively, we can use the -O option to override config.sh for a specified option. With this in place, we can use -U and -D to specify new values for options and override the previously saved answers recorded in config.sh. Without it, config.sh will override the command-line options unless we tell Configure to disregard it.

To use an alternate configuration file, for example, a config.sh copied to a backup name for later reuse, use the -f option.

> ./Configure -des -f perl58thr64.sh

Note, however, that config.over cannot be given another name.

Differences for Windows and Macintosh

Building Perl from source on Windows is broadly similar to building it on Unix with the primary difference being the choice of compiler and make utility used. Running perldoc perlwin32 provides some reasonably detailed information on the options available. For those without a Perl to use to run perldoc, the pages can be found in the unpacked distribution and read as unprocessed .pod files (which are reasonably legible even so) and are also available in HTML at http://cpan.org.

Building Perl for MacOS X follows the same process as for Unix. For MacOS 7 to 9, it first requires installing the base packages and then compiling the source code with MPW. Refer to the earlier section titled "Installing Perl on Macintosh" for details on which files to get and install.

Other Platforms

Other than generic Unix support and Windows, several other platforms are supported by the source distribution. Each of these has a README file located in the top of the tree that provides information to get Perl built and installed on that platform. For example, README.cygwin provides notes on building Perl for the Cygwin Unix compatibility environment on Windows.

Windows has more than one option, depending on whether we want to use the native Win32 port, the Cygwin support libraries, MinGW, or the DOS version (which requires the djgpp compiler available from http://www.midpec.com/djgpp/). Less common platforms are more or less functional, but we should not necessarily expect them to work completely, especially if they are very old, very new, or very obscure.

Running Perl

Having installed Perl, it is time to run some programs. The core of Perl is the Perl interpreter, the engine that actually interprets, compiles, and runs Perl scripts. Perl is not a traditional interpreted language like shell scripts; instead the interpreter compiles it from the original human-readable version into a condensed internal format generically known as bytecode. This bytecode is then executed by the interpreter. All Perl programs therefore go through two phases: a compile phase where the syntax is checked and the source code—including any modules used—is converted into bytecode; and a runtime phase, where the bytecode is executed.

Perl tries hard to be platform independent, but every platform has its own way of doing things, including its own way of running Perl. The first item on the agenda is usually to teach the operating system how to recognize a Perl script as a Perl script. Unix-like platforms have it easiest since Perl grew up there. Windows, Macintosh, and other platforms can be set up to handle Perl too, in their own particular ways.

The Perl interpreter supports several command-line options that affect its behavior in both the compile and runtime phases. These include essential features like enabling warnings or extracting configuration information, and less obvious but still useful features like implicit loops and automated input parsing that allow us to use Perl as a generic command-line tool. Since we do not always want to specify command-line options directly, Perl allows us to set them, as well as some of Perl's other default values, through environment variables. On Windows, we can set some values via the registry too.

Starting Perl Applications

In order for a Perl script to run, it must be given to the Perl interpreter to execute. Depending on the platform, there are various ways this can happen. The simplest way, which works on all platforms that have a shell or equivalent command-line mechanism, is to run Perl directly, and supply the script name as an argument.

> perl myscript.pl

This presumes, of course, that the Perl interpreter perl, perl.exe, etc., is somewhere on the path defined for the shell. We can also supply command-line switches to Perl when we run it directly. For example, to enable warnings:

> perl -w myscript.pl

However, we usually want to run a Perl application directly, that is, to type

> myscript.pl

For this to work, the shell has to know that myscript.pl is a Perl script, find and execute the Perl interpreter, and feed the program to it. Because Perl tries to be helpful, it will actually tell us the best way to start up an application this way if we ask it using the -V option.

> perl -V:startperl

If the platform supports a way for a script to indicate that it is a Perl script and not some other form of executable, this will produce the information required, adjusted for the location that Perl is installed in. For example, on a Unix system we get something like

startperl='#!/usr/bin/perl';

This indicates that #!/usr/bin/perl can be added to the start of any script to enable it to start up using Perl automatically.

Perl on Unix

Unix systems have it easiest when it comes to setting up scripts to run with Perl automatically, largely due to Perl's close and long-term relationship with shells.

When the operating system comes to execute a Perl script, it doesn't know it is a Perl script, just a text file. However, when it sees that the first line starts with #!, it executes the remainder of the line and passes the name of the script to it as an argument, for example:

#!/usr/bin/perl -w
... rest of script ...

In this case the command happens to run the Perl interpreter. So, to start a script up directly on Unix systems we need to add this line (possibly embellished with additional command-line options) to the start of a Perl script and make the script file executable.

This technique is standard on Unix platforms. Shells on other platforms don't understand the significance of a first line beginning #! and so we have to adopt other approaches.
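To make the mechanism concrete, here is a minimal sketch that imitates in Perl what the operating system does with the first line of a script: it separates the interpreter path from any trailing options. It is an illustration only, not how any particular kernel implements the feature, and parse_shebang is a hypothetical helper name.

```perl
use strict;
use warnings;

# Hypothetical helper: split a shebang line into the interpreter path and
# the remaining option text, or return an empty list if the line is not one.
sub parse_shebang {
    my ($line) = @_;
    return unless $line =~ /^#!\s*(\S+)(?:\s+(.*?))?\s*$/;
    return ($1, defined $2 ? $2 : '');
}

my ($interpreter, $options) = parse_shebang("#!/usr/bin/perl -w\n");
print "interpreter: $interpreter, options: $options\n";
```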

Perl on Windows

MS Windows 9x/ME and Windows NT/2000/XP can make use of file extension associations stored in the registry to associate the .pl extension with the Perl interpreter. Handily, the ActiveState distribution offers to do this automatically at installation, and also makes sure that command-line arguments are passed correctly. While the shebang line is not used to locate the interpreter, trailing options are detected and used.

Otherwise, we need to set up the association by hand: select Settings, then Folder Options, from the Start menu and click the File Types tab. From here click the New Type button and a new dialog box will appear. Click the New Icon button and type in the location. Now fill in the relevant boxes appropriately, using the .pl extension. When we click the New button to create a new action, we can give the action a name and specify the program that performs it, in this case the path of the Perl interpreter.

Alternatively, we can simply convert a Perl script into a batch file. We can do this by using the supplied executable pl2bat. This makes the script stand-alone and leaves the original file untouched. It also changes the script name, so $0 (the Perl variable that holds the name of the running program) will be different inside scripts. Alternatively, if the script has no extension, we can copy the supplied batch file, runperl.bat, to another batch file with the same name as the Perl script. This achieves exactly the same result, as it uses the $0 variable to call the Perl script via the Perl interpreter.

One problem with most Windows shells is that they do not do command-line expansion of wildcards like Unix shells do, instead opting to leave this task to the application. We do not have to put up with this, though. To fix the problem we can make use of the File::DosGlob module in a small utility module that we can preload by setting the variable PERL5OPT in the system's default environment. See perldoc perlwin32 for details of how to do this.

There is also a commercial package called perl2exe, which will convert the Perl script into a stand-alone executable, which may be appropriate for some needs.

Perl on Macintosh

MacOS X is based on Unix, so we can run any Perl script using Unix conventions like shebang lines.

Running Perl programs on Macintoshes with MacOS versions 7 to 9 is simple: each script just needs to have the appropriate creator and type attributes set and the operating system will automatically recognize it as a Perl script and invoke the interpreter. Running Perl from the command line is also simple, or rather, impossible since there is no command line. However, MacPerl provides the ability to save scripts as "applets," which can be executed by the usual double click or even just packaged up into a fully self-contained executable file. For more information on all issues concerning MacPerl, consult the website http://www.macperl.com.

The Command Line

Whether or not we invoke Perl directly or implicitly, we can supply it with a number of command-line options. We do not always have to supply command-line options explicitly. For systems that support shebang lines, we can add them to the end of the command contained in the first line.

#!/usr/bin/perl -Tw
...

Alternatively, and on systems that don't support shebang lines, we can set an environment variable. Perl examines several of these when it starts up and configures itself according to their contents. For command-line options, the special environment variable is PERL5OPT, which may contain command-line options, written as if they were appearing on a real command line. Only a subset of options is accepted, though. In particular, taint mode (-T) is honored only if it is the first option in the variable, in which case the rest of the value is ignored, so it is better specified on the command line or shebang line. Here is how we can use PERL5OPT to cause Perl to enable warnings (-w) and load the strict module for strict syntax checks:

rem Windows
set PERL5OPT=-wMstrict

# Unix - csh-like
setenv PERL5OPT "-Mstrict -w"

# Unix - Bourne shell-like
export PERL5OPT="-Mstrict -w"

We can set PERL5OPT in a shell startup script or batch file (.profile or similar for Unix shells, AUTOEXEC.BAT for Windows 9x, or alternatively the registry for Windows in general).
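The effect of PERL5OPT is easy to observe by starting a second interpreter with the variable set and asking it whether warnings are enabled. The following is a sketch under a few assumptions: run_with_perl5opt is a hypothetical helper name, $^X is taken to be a usable path to the running perl, and the list form of a piped open is assumed to be available on the platform.

```perl
use strict;
use warnings;

# Hypothetical helper: run a child perl with PERL5OPT set to $options and
# return whatever the one-line program $code prints.
sub run_with_perl5opt {
    my ($options, $code) = @_;
    local $ENV{PERL5OPT} = $options;
    # $^X is the path of the perl running this script; the list form of
    # open avoids any shell quoting issues
    open my $child, '-|', $^X, '-e', $code
        or die "cannot run $^X: $!";
    my $output = do { local $/; <$child> };
    close $child;
    return defined $output ? $output : '';
}

# $^W is true when the -w switch is in effect
print "warnings flag with PERL5OPT=-w: ",
    run_with_perl5opt('-w', 'print $^W ? 1 : 0'), "\n";
print "warnings flag with PERL5OPT empty: ",
    run_with_perl5opt('', 'print $^W ? 1 : 0'), "\n";
```

Because the variable is set with local, the change to the environment lasts only for the duration of the helper and does not leak into the rest of the program.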

Command-Line Options

Perl's command-line options are detailed on the perlrun manual page, which we can read with perldoc perlrun. We can also obtain a short list of options with

> perl -h

For completeness, here is the output of running this command:


  -0[octal]       specify record separator (\0, if no argument)
  -a              autosplit mode with -n or -p (splits $_ into @F)
  -C[number/list] enables the listed Unicode features
  -c              check syntax only (runs BEGIN and CHECK blocks)
  -d[:debugger]   run program under debugger
  -e program      one line of program (several -e's allowed, omit programfile)
  -F/pattern/     split() pattern for -a switch (//'s are optional)
  -i[extension]   edit <> files in place (makes backup if extension supplied)
  -Idirectory     specify @INC/#include directory (several -I's allowed)
  -l[octal]       enable line ending processing, specifies line terminator
  -[mM][-]module  execute 'use/no module...' before executing program
  -n              assume 'while (<>) { ... }' loop around program
  -p              assume loop like -n but print line also, like sed
  -P              run program through C preprocessor before compilation
  -s              enable rudimentary parsing for switches after programfile
  -S              look for programfile using PATH environment variable
  -t              enable tainting warnings
  -T              enable tainting checks
  -u              dump core after parsing program
  -U              allow unsafe operations
  -v              print version, subversion (includes VERY IMPORTANT perl info)
  -V[:variable]   print configuration summary (or a single Config.pm variable)
  -w              enable many useful warnings (RECOMMENDED)
  -W              enable all warnings
  -x[directory]   strip off text before #!perl line and perhaps cd to directory
  -X              disable all warnings

If we build a debugging Perl, the -D option will also be available, as mentioned earlier. This is generally of interest only to programmers who are interested in working on Perl itself, or writing C-based extensions. The -C option is new to Perl 5.8. We won't cover every option here (they are all covered somewhere in the book, however), though we look at several in the following text. Detailed information on each option is available from the perlrun manual page that comes with Perl.

Command-Line Syntax

The simplest use of Perl is to run a script with no arguments, like this:

> perl myscript.pl

Assuming myscript.pl is the following simple example:

#!/usr/bin/perl
print "Hello Perl World!\n";

This will produce the traditional greeting

Hello Perl World!

Perl expects to see a script name at some point on the command line; otherwise it reads the program from standard input and executes that. We can also therefore say

> perl < myscript.pl

Command-line options, if we choose to specify them, go before the script. This command enables warnings, taint mode, and includes the strict module:

> perl -w -T -Mstrict myscript.pl

The non-fatal version of taint mode, available from Perl 5.8 onwards, is enabled instead with a lowercase -t.

> perl -w -t -Mstrict myscript.pl

Some command-line options take an argument. For example, -e takes a string of code to evaluate, enclosed in quotes. Some options (-e included) allow a space after them. Others, like -m and -M do not—if we try to do so Perl will return an error.

No space is allowed after -M

Command-line options that take no argument may be grouped together, so -w and -T can be grouped into either -wT or -Tw. We can also add one option that takes an argument, such as -M, if it is the last in the group. For example:

> perl -TwMstrict myscript.pl

Supplying Arguments to Scripts

If we run a script directly, we can pass arguments to it that can be read inside the program (by examining the special array variable @ARGV).

> myscript.pl -testing -1 -2 -3

If we run a script via the Perl interpreter explicitly, Perl stops processing its own options when it reaches the script name, so arguments that follow the script are passed to it even if they are prefixed with a minus. The boundary is less obvious when there is no script name to mark it, for example when the code is supplied with -e, because Perl will then absorb minus-prefixed arguments into its own command line. To make the boundary explicit we can add the special sequence -- after Perl's own options; Perl will stop processing options at that point, and the script name and any further arguments are left alone.

> perl -TwMstrict -- myscript.pl -testing -1 -2 -3

Without the -- sequence the command is shorter and works just as well when a script name is present:

> perl -TwMstrict myscript.pl testing 1 2 3

Even though this works, it is often a good idea to include the -- sequence anyway, just in case we come back to edit the command later on, for instance to turn it into a -e one-liner that takes minus-prefixed arguments. We will discuss and expand on the special variable @ARGV and sequence -- later on in Chapter 6.
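A quick way to see exactly which arguments reach a program is a small script that reports the contents of @ARGV. The following is a minimal sketch; describe_args is a hypothetical helper name, and the script could be saved under any name and run with the command lines shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: label each argument as option-like or plain
sub describe_args {
    my @args = @_;
    return map { /^-/ ? "option-like: $_" : "plain: $_" } @args;
}

# @ARGV holds whatever arguments Perl passed on to the program
print "$_\n" for describe_args(@ARGV);
```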

Using Perl As a Generic Command-Line Utility

With a little imagination, we can make Perl do a lot with just a few of its command-line options.

In particular, the -e, -n, -p, and -l options allow us to perform fairly powerful functions with a single command. The -e option allows us to execute code directly, rather than specifying a program file, so we don't need to actually write a program, save it to a file, and then execute it. The -n and -p options place an implicit while loop around the code we specify that reads from standard input, turning it into the body of the loop. The -l option strips off the terminator from each line read, then configures print to put it back afterwards.

By combining these options creatively, we can produce any manner of quick one-line text processing commands using Perl. Say we wanted to add line numbers to the start of every line of a program. We could write a program to do that, but we could also just type

> perl -ne 'print "$.: $_"' < in.pl > out.txt

The same command in MS Windows, where the inner double quotes must be escaped, would be

> perl -ne "print \"$.: $_\"" <in.pl >out.txt

The special variable $. holds the line number, and $_ the contents of the line just read from the input. The option -n puts the whole thing in a loop, and the rest is redirection. If for some reason we wanted the line numbers on the end, we enter this command for the Bourne shell in Unix:

> perl -nle 'print "$_ [$.]"' < in.pl > out.txt

Here -l strips the original line ending, but redefines Perl's output formatting so that the print puts it back. Of course, these are just simple examples, but throw in a regular expression or a module loaded by the -m or -M flag and we can do some surprising things. For example, this command pulls a web page from a remote server and prints it out using the LWP::Simple module (available from CPAN):

> perl -MLWP::Simple -e 'getprint "http://www.myserver.com/img.gif"'

Other options we can use in these kinds of super-simple scripts are -a and -F, which enable and configure autosplit mode. This, among other things, allows us to tell Perl to automatically break down lines into individual words. Another useful option is -i, which edits a file in-place, so that transformations we perform on what we read from it are actually enacted on the file itself.
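It can help to see the loop that these options create written out by hand. The sketch below spells out, as ordinary Perl, roughly what a one-liner using -l, -a, and -n does implicitly; first_words is a hypothetical helper name, and in-memory filehandles stand in for real files so that the example is self-contained.

```perl
use strict;
use warnings;

# Hypothetical helper that spells out what  perl -lane 'print "$F[0] [$.]"'
# does implicitly, reading from the filehandle $in and printing to $out.
sub first_words {
    my ($in, $out) = @_;
    local $\ = "\n";                 # -l: print appends the line ending
    while (my $line = <$in>) {       # -n: implicit loop over the input
        chomp $line;                 # -l: strip the line ending on input
        my @F = split ' ', $line;    # -a: autosplit into @F on whitespace
        next unless @F;              # skip blank lines (our addition)
        print {$out} "$F[0] [$.]";   # $. is the current input line number
    }
    return;
}

# Demonstrate with in-memory filehandles instead of real files
my $input  = "alpha beta\ngamma delta\n";
my $output = '';
open my $in,  '<', \$input  or die "cannot read string: $!";
open my $out, '>', \$output or die "cannot write string: $!";
first_words($in, $out);
close $out;
print $output;
```

With -p instead of -n, the loop would additionally print the (possibly modified) line at the end of each iteration.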

Finally, we can use the Perl debugger as a generic Perl shell by passing -d and -e with a trivial but valid piece of Perl code such as 1, and possibly -w to enable warnings.

> perl -dwe1

See Chapter 17 for more on the debugger and the features available from this "shell."

The Perl Environment

All environment variables present in the shell that executes Perl are made available inside Perl scripts in the special hash variable %ENV. In addition, Perl pays special attention to several environment variables and adjusts its configuration according to their contents.
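As a minimal sketch of reading %ENV, the following lists whichever environment variables beginning with PERL happen to be set for the current process. The helper name perl_variables is hypothetical, and it takes a hash reference so that it can be tried on data other than the real environment.

```perl
use strict;
use warnings;

# Hypothetical helper: return the names of environment variables that
# start with PERL, sorted, from the given hash (normally %ENV)
sub perl_variables {
    my ($env) = @_;
    my @names = grep { /^PERL/ } keys %$env;
    return sort @names;
}

for my $name (perl_variables(\%ENV)) {
    print "$name=$ENV{$name}\n";
}
```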

Windows machines may (but are not required to) specify environment information in the registry, under either

HKEY_CURRENT_USER\Software\Perl

or

HKEY_LOCAL_MACHINE\Software\Perl

In the case of duplicate entries, the local machine settings are overridden by the current user settings. Entries are of type REG_SZ or REG_EXPAND_SZ and may include any of the standard Perl environment variables (that is, any variable starting with PERL) as described in Table 1-1. We can also specify additional path information to the library include path @INC, in both generic and version specific forms (we use version 5.8.5 here for example purposes) by setting some or all of the entries in Table 1-1.

Table 1-1. Windows Registry Entries for Extending @INC

Entry             Description
lib               Standard library path extension
sitelib           Site library path extension
vendorlib         Vendor library path extension
lib-5.8.5         Version-specific standard library path extension
sitelib-5.8.5     Version-specific site library path extension
vendorlib-5.8.5   Version-specific vendor library path extension

General Environment Variables Used by Perl

There are three general environment variables that are used by Perl, which we should consider here. The first of these is PATH. This is actually the standard shell search path that Perl uses to locate external commands when executed from Perl scripts. Sensible programs often set this variable internally and do not rely on the supplied value. It is also used by the -S command-line option that causes Perl to search for the specified script using the path.

The other two variables are HOME and LOGDIR. If the chdir function is used in Perl without an argument, then Perl checks the value of HOME and changes to that directory if set. If HOME is not set, Perl checks LOGDIR as a last resort. If neither is set, chdir does nothing.
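The behavior of chdir without an argument can be checked with a short experiment. This sketch points HOME at a temporary directory, calls chdir with no argument, and reports where it ended up; chdir_home is a hypothetical helper name, and the temporary directory merely stands in for a real home directory.

```perl
use strict;
use warnings;
use Cwd qw(getcwd abs_path);
use File::Temp qw(tempdir);

# Hypothetical helper: call chdir with no argument while HOME is
# temporarily set to $dir, and return the directory we end up in.
sub chdir_home {
    my ($dir) = @_;
    my $start = getcwd();
    local $ENV{HOME} = $dir;
    chdir() or die "chdir failed: $!";    # no argument: uses HOME
    my $landed = abs_path(getcwd());
    chdir $start or die "cannot return to $start: $!";
    return $landed;
}

my $scratch = tempdir(CLEANUP => 1);      # stand-in for a home directory
print "chdir with no argument went to ", chdir_home($scratch), "\n";
```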

Perl-Specific Environment Variables

Perl looks for and reads the values of several environment variables that, if present, we can use to control the behavior of the interpreter when it starts up. Almost all of these start with PERL5 and the most commonly used of them are PERL5LIB and PERL5OPT.

The variable PERL5LIB defines additional library search paths, a list of directories that Perl searches when the do, require, or use statement is used. The directories in this variable are placed at the front of @INC inside the program, ahead of the default locations defined when Perl was built. (The old name for this variable, PERLLIB, is only used if it is set and PERL5LIB is not.) We can also add directories to @INC using the -I flag.
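To see how PERL5LIB influences @INC, we can start a second interpreter with the variable set and look at the search path it reports. This is a sketch under stated assumptions: child_inc is a hypothetical helper name, a temporary directory stands in for a private library directory, and the list form of a piped open is assumed to be available on the platform.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical helper: start a child perl with PERL5LIB set to $dir and
# return the child's @INC as a list.
sub child_inc {
    my ($dir) = @_;
    local $ENV{PERL5LIB} = $dir;
    open my $child, '-|', $^X, '-e', 'print "$_\n" for @INC'
        or die "cannot run $^X: $!";
    chomp(my @inc = <$child>);
    close $child;
    return @inc;
}

my $extra = tempdir(CLEANUP => 1);   # stand-in for a private library directory
my @inc   = child_inc($extra);

# Report where in the child's search path the extra directory appears
my ($where) = grep { $inc[$_] eq $extra } 0 .. $#inc;
print defined $where
    ? "found at position $where of the child's \@INC\n"
    : "not found in the child's \@INC\n";
```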

The environment variable PERL5OPT, as its name may suggest, may contain a list of the command-line options, in the same format as they would be if supplied directly.

The variable PERL5DB specifies the command used to load the debugger when the -d option is used. By default this is set to

BEGIN { require 'perl5db.pl' }

If we have created our own modified debugger, we can set this variable to have Perl use it by default. Setting it to nothing effectively disables the -d option altogether.

The last of the most commonly used Perl environment variables, PERL5SHELL, is a Windows-specific variable that allows us to override the default shell used to execute external commands with backticks or the qx operator. Perl uses its own variable rather than relying on the value of COMSPEC as the latter is not always set to a desirable value for a Perl-created subshell process. The default is

command.com /c        Windows 9x/ME
cmd.exe /x/d/c        Windows NT/2000/XP

(In general it is desirable to avoid starting an external shell at all, and we devote some time in Chapter 14 to the why and how of avoiding shells.)
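As a brief illustration of what avoiding the shell can look like, the sketch below captures the output of a command by passing the program and its arguments as separate list elements. The capture subroutine is a hypothetical helper, and it assumes a Unix-like platform or a recent Perl on Windows, since list-form pipe opens were not always supported there.

```perl
use strict;
use warnings;

# Run a command and return its output. The program and each argument
# are passed as separate list elements, so the arguments are handed
# over as-is instead of being interpreted by a shell.
sub capture {
    my @command = @_;
    open(my $pipe, '-|', @command)
        or die "Cannot run $command[0]: $!\n";
    local $/;                      # read all output in one go
    my $output = <$pipe>;
    close $pipe or die "Command '$command[0]' failed\n";
    return $output;
}

# $^X is the path of the running Perl, so this does not rely on PATH.
print capture($^X, '-e', 'print "no shell involved\n"');
```

Because nothing is interpolated by a shell, arguments containing spaces or characters such as $ and * reach the program unchanged.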

Less commonly used, and generally only of interest to those involved in developing or debugging Perl itself, are the PERL_DESTRUCT_LEVEL and PERL_DEBUG_MSTATS variables, both of which are advanced options. If the Perl interpreter supports the -D flag, PERL_DESTRUCT_LEVEL controls the behavior of Perl's garbage collector for the destruction of references. If Perl was built to use the malloc library that comes supplied with it (perl -V:d_mymalloc), PERL_DEBUG_MSTATS enables an additional memory statistics debugging mode for the -D flag (which must also be available). If it is set, statistics are dumped after the program completes. If it is set to an integer greater than one, statistics are also dumped after the compilation stage and before the execution stage.

If PerlIO is supported by the interpreter (the default from Perl 5.8), then the environment variable PERLIO can be used to control the default layers set on filehandles. See Chapter 12 for more information.

Plenty of Perl modules respond to their own special environment variables. For example, to pick a module that is close to the core of Perl, the Perl debugger examines the PERL5DB_OPTS environment variable for configuration information. See perldoc perlrun for a full list of standard variables.

Certain Perl features are also sensitive to certain locale environment variables if the use locale directive is specified. We cover locale and internationalization at the end of the book.

Installing Modules

Once we have a working Perl installation, we can install additional Perl modules to expand upon the features provided by the standard library that comes with the interpreter. The majority of Perl modules are available from CPAN, which has mirrors worldwide, and take the form of prepackaged archives containing the module code itself, supporting scripts or data files (if any), and an installation script called Makefile.PL that is used to generate a makefile to actually carry out the installation.

It is relatively easy to download a Perl module distribution and then install it by hand, either directly into the standard library, or into a local or site installation directory. However, Perl provides the cpan tool to automate the process of finding, downloading, building, testing, and installing a new module distribution.

Perl module distributions come in two distinct forms—those that bind to C (or C++) and require a C compiler to install, commonly called extensions, and pure-Perl modules that don't. While the latter can be installed almost anywhere, extensions obviously need a functional compiler available. For most Unix-like platforms, this is rarely an issue, but for Windows it is easier to use the PPM tool, which is essentially "CPAN for Windows" and which retrieves precompiled CPAN modules from the ActiveState PPM repository, eliminating the need for a compiler or other build tools.

In this section, we will first look at installing modules by hand, before moving on to see how the cpan and PPM tools can automate much of the manual labor for us.

Installing Modules by Hand

The process of installing modules is essentially the same for every platform; the most significant difference between different platforms is the tools required to perform each step of unpacking, building, and installing a module.

Installing Modules on Unix

Unix systems can install module packages downloaded directly from a CPAN mirror (see http://cpan.org). The package filename usually takes the form Some-Module-1.23.tar.gz, that is, a gzipped tarball. To install it, we first unpack it by piping the output of gunzip -c (or the equivalent gzip -dc) into tar.

> gunzip -c Some-Module-1.23.tar.gz | tar xvf -

If we have the GNU version of tar (any Linux- or BSD-based system should have this), we can perform both steps in one go with

> tar -zxvf Some-Module-1.23.tar.gz

Either approach extracts the files from the archive without decompressing the archive file itself.

Having unpacked the module, we go into the source directory (which should have the same name, minus the .tar.gz suffix) and generate a makefile from the supplied Makefile.PL file.

> cd Some-Module-1.23
> perl Makefile.PL

Next, we build, test, and install the module using make. The final step usually requires superuser privileges.

> make
> make test
> su
Password:
# make install

Finally, if we want to keep the original source directory but clean out all the additional files created by the building and installation process, we can use the standard

> make clean

If we change our minds and still have the source at hand, we can also uninstall the package again.

# make uninstall
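The generate, build, test, and install sequence described above can also be driven from Perl. The following is a minimal sketch under some assumptions: the subroutine names are hypothetical, the distribution has already been unpacked, and the process has permission to install into the target library. It uses the make program recorded in Perl's own configuration.

```perl
use strict;
use warnings;
use Config;

# Return the commands for a manual build, in order. $Config{make} is
# the make program Perl itself was configured with (make, nmake, ...).
sub build_steps {
    my $make = $Config{make} || 'make';
    return (
        [ $^X, 'Makefile.PL' ],    # generate the makefile
        [ $make ],                 # build
        [ $make, 'test' ],         # run the test suite
        [ $make, 'install' ],      # install into Perl's library
    );
}

# Run each step inside the unpacked distribution directory,
# stopping at the first step that fails.
sub build_and_install {
    my ($dir) = @_;
    chdir $dir or die "Cannot enter $dir: $!\n";
    for my $step (build_steps()) {
        print "Running: @$step\n";
        system(@$step) == 0 or die "Step '@$step' failed\n";
    }
    return 1;
}

# Hypothetical usage: build_and_install('Some-Module-1.23');
```

Using $^X for the first step ensures that the makefile is generated by the same Perl that is running the script, which matters when more than one Perl is installed.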

Installing Modules on Windows

To install CPAN modules by hand on Windows, we use the same general technique as with Unix, only with different decompression software. The .zip package files can be decompressed using the popular WinZip, infozip, or any other ZIP-compatible decompression tool. Most of these tools can also handle Unix-style .tar.gz files of the kind supplied by CPAN.

We can install the unpacked distribution with a make tool (such as nmake or dmake) in the usual way, after first running perl Makefile.PL to generate the makefile. For example, assuming nmake:

> nmake
> nmake test
> nmake install
> nmake clean

This of course relies on a functional C compiler being available if the module happens to contain C code. For pure Perl modules, this isn't a problem.

If we don't have a make equivalent, and we don't have any C code to deal with, we can sometimes get away with simply copying the files that were unpacked into an appropriate part of the standard library tree, for instance the sitelib branch. However, some modules use a module called AutoSplit on installation to carve up the module source. (Another module, AutoLoader, then loads and compiles pieces of the module on demand.) Since we are unable to make use of the standard installation process, we need to perform a split ourselves to completely install the module. If a module file uses the AutoLoader module, we will need to run the following from the top of Perl's installation to finish the installation, replacing Path/To/Module.pm with the location of the module file:

> perl -MAutoSplit -e "autosplit('site/lib/Path/To/Module.pm', 'site/lib/auto')"

(The AutoLoader and AutoSplit modules are covered in more detail in Chapter 10.)

Installing Modules on Macintosh

Again, MacOS X is just a particular case of the more general Unix case. Only the default install locations differ as the layout is a little different from most other Unix platforms.

Precompiled binary versions of some packages for OS 9 and earlier created by Chris Nandor are also available from http://cpan.org/authors/id/CNANDOR/. As usual, use a mirror of http://cpan.org if possible. Also available from this page is cpan-mac, a package of utilities that makes it possible to use the CPAN module with MacPerl. (The very latest versions of MacPerl include this functionality into the distribution.) However, it will not build modules that contain C code.

OS 9 systems can unpack source package files with Stuffit or one of its many cousins to get the unpacked source. If the package requires building C source, then things get tricky—a copy of the MPW is required, plus a certain amount of dexterity. The MacPerl homepage, http://www.macperl.com/, offers some advice and links that may help with this, as well as installing MacPerl itself.

If the package does not contain C code, then we can install it by manually moving the package files to the Perl library, generally under the site_perl folder, replicating the directory structure of the package as we go. Note that the line endings of the source files need to be in Apple format, that is, a single CR, rather than CR/LF or just LF. The decompression should handle this automatically.

As with Windows, we may also need to manually split the module to complete its installation. This will be the case if the module file uses the AutoLoader module. Since there is no command line on OS 9, we will need to create and run the following script to finish the installation:

#!perl -w
use AutoSplit;
autosplit "${MACPERL}site_perl:Path:To:Module.pm",
   "${MACPERL}site_perl:auto";

Installing Modules on Other Platforms

The CPAN website contains instructions for installing modules on several platforms on the Installing CPAN Modules page at http://cpan.org/modules/INSTALL.html. Some of this information is also available from perldoc perlmodinstall, though the two sources are not always equally up to date.

Installing Modules from CPAN

The CPAN module and cpan command provide a Perl-based interface to the CPAN archive for Unix-like platforms. We can use either the module or the tool to scan CPAN for new packages, check for updates on existing packages, and install new ones, including dependent modules and module bundles.

For Windows, ActivePerl provides the PPM tool, which gives access to a repository of many Perl modules from CPAN, preconfigured and built for several different Windows platforms. PPM is covered near the end of the chapter.

A new alternative interface to the CPAN repository is CPAN++, also known as the CPANPLUS project. While this is currently still in development, it can be downloaded from CPAN and installed like any other module. It provides a cpanp tool in place of cpan, and is intended to become the replacement for the CPAN module from Perl 5.10 onwards. While not yet ready for general release, CPANPLUS is well worth a look for those already familiar with the CPAN module or in search of a better tool for the job. For more information, see http://cpanplus.sourceforge.net/.

Starting and Configuring CPAN

To start the CPAN module from the command line, we can invoke the shell mode with

# perl -MCPAN -e shell

This command runs Perl, loading the CPAN module into memory and running the shell subroutine, which provides an interactive interface. If we don't want to use the shell, we can also use the cpan utility to run one command at a time. While its usage is very slightly different, the utility provides access to all the same underlying features as the module.

If we have Perl installed in a privileged place, we will need to be superuser to actually install a module (though we can still perform queries). If CPAN has never been configured, it will run through a set of questions to set itself up. Like Perl's build configuration, most of these are self-evident, and others are computed by default. However, the module needs to fetch several resources, including a current list of CPAN mirrors during the installation process, so it is very helpful (though not absolutely necessary) to have an active Internet connection during the configuration process.

Unix systems will generally have no trouble, but non-Unix systems will need to make sure that they have acceptable versions of at least some of the following command-line utilities:

  • A copy of gzip.
  • A tar program.
  • A zip/unzip program like WinZip or infozip.
  • A make program, e.g., dmake or nmake for Windows. nmake can be downloaded from http://www.microsoft.com and comes as standard with many Visual Studio products and the freely downloadable Microsoft Visual C++ Toolkit 2003 available at http://msdn.microsoft.com/visualc/vctoolkit2003/.
  • A copy of the open source lynx browser (this is only necessary if Net::FTP is not installed yet).
  • A noninteractive FTP command-line client (e.g., ncftpget, freely available and installed on many Linux and other Unix-like platforms).

The configuration process will also ask about FTP and HTTP proxies, and then fetch a list of mirrors from which we should pick two or three in the local area. This is the only part of the configuration process that requires us to make some considered choices, rather than pressing Return to accept the default. Select the appropriate region and country, then enter the numbers of the two or three servers that will be used to download modules.

Note that any of these options can be changed later with the o command in shell mode.

The CPAN module supports command-line editing and history, if supplied by the Term::ReadLine module. This, and the Term::ReadKey module, can of course be installed with the CPAN module.

Once we have the CPAN module configured, we should have a cpan> prompt at which we can enter a variety of commands. Typing h or ? will generate a helpful list of available commands. If we have Term::ReadLine and Term::ReadKey installed and the GNU readline library available (see Chapter 15 for more on all of these), the prompt should be highlighted and underlined. If not, we can install them.

cpan> install Term::ReadKey
...

cpan> install Term::ReadLine
...

We can also install modules directly from the command line using the cpan utility.

> cpan Term::ReadKey Term::ReadLine

Installing these modules significantly improves the CPAN shell by adding better interaction and a command-line history. Alternatively, we can install the CPAN bundle, which contains all the modules that CPAN can use, using the following command:

cpan> install Bundle::CPAN
...

Either way, the CPAN module will try to fetch the module or modules we requested. If we do not have an up-to-date copy of the Net::FTP or LWP module (for FTP or HTTP transfers, respectively) installed, then it will try to use the lynx browser to fetch it. If we don't have any of these installed, we will have to use an FTP client to fetch and install at least one of them manually. We should use reload cpan after executing any of the preceding commands to update the running CPAN shell.

Installing Modules

The CPAN module provides four commands related to installation. The main one, and in practice the one we tend to use the most often, is install. This takes a bundle, distribution, or module name (it is not practical to install an author, although it would be an intriguing concept) and determines the appropriate distribution file or files to fetch, build, and install. This means we can install modules without needing to worry about which actual distribution file they belong in. For example, we can say

cpan> install Apache::Registry

Installation is a multistep process:

  1. The currently installed version (if any) is compared with that available from CPAN. If the installation is up-to-date and we are not doing a force (see "Installing Modules in Stages" later in this chapter), then the installation terminates.
  2. The distribution file (named explicitly, by bundle, or inferred from the module requested) is fetched from the first configured CPAN mirror and unpacked.
  3. Next, the module is built using perl Makefile.PL followed by make.
  4. Next, the module is tested using make test. For most distributions, this executes a test suite implemented using the Test and Test::Harness modules. If this stage fails, the module installation aborts with an error, unless we used the force modifier, which is explained shortly.
  5. Finally, if the distribution passed the test stage, it is installed in Perl's library.

We can, if we choose, also carry out these stages individually, but this is usually only done if there is a problem with the automatic installation.

Module Prerequisites

Many modules have prerequisites, other modules that must be installed first. If we try to install a module with missing prerequisites, the CPAN module can install them for us, either prompting us (in ask mode), downloading and installing them automatically (in follow mode), or ignoring them (in ignore mode, though it is unlikely a module will install if any of its prerequisites are missing). The default behavior is set when we first configure the module, and can be changed later with, for example:

cpan> o conf prerequisites_policy follow

Installing Modules in Stages

Instead of using the install command to fetch, build, test, and install a module in one go, we can use the get, make, and test commands to perform individual steps, and then finish with install. This is occasionally useful for examining the outcome of each stage before proceeding to the next, and is most often needed when the automated install process fails for some reason.

If an installation fails in the test phase, or CPAN thinks that a module or distribution is already installed and up to date, it will decline to install it a second time. We can override this with force, for example:

cpan> force install Term::ReadLine

It is perfectly fine to do this, but please be aware that the distribution might not function entirely correctly (or in extreme cases, at all). Some test failures may involve features that are not applicable to a particular site or operating system. Read the output of the test phase carefully before deciding whether to use force.
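The same stages can be run from a script, since the shell commands are also available as methods of the CPAN::Shell package. The following is a sketch only: the subroutine names are hypothetical, and it assumes the CPAN module has already been configured and that a mirror can be reached when the stages are actually run.

```perl
use strict;
use warnings;

# The stages performed by "install", in the order they are run.
my @STAGES = qw(get make test install);

# Return the stages up to and including the one named.
sub stages_up_to {
    my ($last) = @_;
    my @wanted;
    for my $stage (@STAGES) {
        push @wanted, $stage;
        return @wanted if $stage eq $last;
    }
    die "Unknown stage '$last'\n";
}

# Run the stages one at a time for a module, stopping after $last.
sub run_stages {
    my ($module, $last) = @_;
    my @stages = stages_up_to($last);
    require CPAN;    # loaded only when needed; must already be configured
    for my $stage (@stages) {
        print "cpan> $stage $module\n";
        CPAN::Shell->$stage($module);
    }
    return scalar @stages;
}

# Hypothetical usage: run_stages('Term::ReadLine', 'test');
```

Stopping after the test stage gives us a chance to read the test output before deciding whether to install, or whether a forced install is justified.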

Examining the Contents of a Distribution

It is rare that we would actually want to examine the contents of an unpacked distribution directly, but if we want to it's possible using the look command. This opens a system shell in the root directory of the unpacked distribution where the files can be listed:

cpan> get CGI
...

cpan> look CGI
> ls
...

We can also execute make commands from here directly, if necessary (for example, make install UNINST=1 to install a new module removing any older installed versions at the same time). However, look only opens a shell on an unpacked distribution; if we want to look at something that has not been downloaded yet, it must be fetched first.

Cleaning Up

Once a package has been installed, we can clean up the files generated during the build process with clean. This accepts a bundle, module, or distribution as an argument and issues a make clean on the unpacked sources cached by the CPAN module. Clearly this makes sense only if the package has actually been installed recently. For example:

cpan> clean Net::FTP

Cleaning up is useful if we are very short of disk space, otherwise this step can usually be skipped as the CPAN module automatically clears out the oldest build sources. This occurs whenever the total exceeds the cache size specified when we configured it at the module startup.

Scanning and Searching CPAN

CPAN categorizes its archive in four different ways: by author, by bundle, by distribution, and by module.

  • Author: These are the authors of the distributions available on CPAN, listed by code and full name.
  • Bundle: Groups of module distributions that are commonly installed together are collected into special distributions called bundles, which simply list a set of related distributions. Installing a bundle saves time and effort by avoiding the need to install a collection of modules one by one. Note that bundles do not contain any source code themselves; they are just a list of other distributions to install. All bundles are given names starting "Bundle::" to distinguish them from "real" packages.
  • Distribution: The actual distribution files for Perl packages, including the directory prefixes for the author whose distribution it is.
  • Module: As well as a list of distributions, CPAN keeps track of all the modules provided by those distributions. Since we can install distributions by naming a module inside them, we rarely need to actually type in a distribution name.

To search CPAN and list records, the CPAN module provides the a, b, d, and m commands for specific categories, or the i command, which returns information on all of these categories. On its own, the i command will return a complete list of every record on CPAN.

cpan> i

This of course takes time and produces a very long and unwieldy list, even though the information is fetched from the mirror and stored locally. More usefully, we can narrow down the result by supplying a literal string or a regular expression. The literal string must match exactly.

cpan> i CGI

A regular expression (covered in Chapter 11) can be more specific as well as more flexible. To search for anything with "CGI" in it (case insensitively, incidentally):

cpan> i /CGI/

To search for anything that begins with "CGI", "cgi", "Cgi", and so on, use

cpan> i /^CGI/

To search for anything in the CGI module hierarchy:

cpan> i /^CGI::/

Alternatively, we can use the a, b, d, or m command to search for specific authors, bundles, distributions, or modules. These work identically to the i command but only return information for the specified category. For example, to list all modules, but not distributions, whose names contain XML:

cpan> m /XML/

To find information for a particular author:

cpan> a DOUGM

To find all authors called Bob:

cpan> a /^Bob/

To find distributions with a version number of 1.x:

cpan> d /-1\.\d/

Finally, to list all available bundles:

cpan> b
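To get a feel for how such patterns behave before trying them in the shell, we can apply them to a short list of names in plain Perl. This is only an illustration: search_names is a hypothetical helper, the lists are small samples rather than real CPAN indexes, and the matching is made case insensitive to mirror the behavior described above.

```perl
use strict;
use warnings;

# Return the names that match a pattern, ignoring case.
sub search_names {
    my ($pattern, @names) = @_;
    my $regex = qr/$pattern/i;
    return grep { $_ =~ $regex } @names;
}

# Sample names for illustration only.
my @modules = qw(CGI CGI::Carp CGI::Cookie FCGI XML::Parser);
my @dists   = qw(CGI.pm-3.05.tar.gz DBI-1.45.tar.gz URI-1.34.tar.gz);

for my $pattern ('CGI', '^CGI', '^CGI::') {
    printf "i /%s/ matches: %s\n",
        $pattern, join(', ', search_names($pattern, @modules));
}

# The backslashes make the dot literal and \d match a single digit.
printf "d /%s/ matches: %s\n",
    '-1\.\d', join(', ', search_names('-1\.\d', @dists));
```

Anchoring a pattern with ^ is usually the quickest way to cut a long result list down to one module hierarchy.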

Listing Out-of-Date Modules

We can find out which modules and distributions have been updated on CPAN compared to our locally installed copies with the r command.

cpan> r


Package namespace   installed   latest  in CPAN file
CGI                      3.00     3.05  L/LD/LDS/CGI.pm-3.05.tar.gz
DBI                      1.41     1.45  T/TI/TIMB/DBI-1.45.tar.gz
Devel::Cover             0.47     0.50  P/PJ/PJCJ/Devel-Cover-0.50.tar.gz
Storable                 2.08     2.13  A/AM/AMS/Storable-2.13.tar.gz
URI                      1.23     1.34  G/GA/GAAS/URI-1.34.tar.gz

Since this command requires that the entire contents of the locally installed Perl library are examined for version numbers and compared to the current versions available at CPAN, it takes a short while to execute. We can narrow down the list we want to check for by supplying a string or regular expression. For example, to check for all out-of-date XML modules, the following works:

cpan> r /XML/

This regular expression simply matches any module with "XML" in the title. If we only want to know about XML modules in the XML hierarchy, we can specify a more explicit expression by anchoring it at the front.

cpan> r /^XML::/
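The comparison behind a listing like the one above can be sketched with the version module. This is an assumption made for illustration and is not necessarily how the CPAN module compares versions internally; is_out_of_date is a hypothetical helper, and the data is copied from the sample listing.

```perl
use strict;
use warnings;
use version;

# True if the installed version is older than the latest one.
sub is_out_of_date {
    my ($installed, $latest) = @_;
    return version->parse($installed) < version->parse($latest) ? 1 : 0;
}

# Sample data taken from the listing above: [installed, latest].
my %versions = (
    'CGI'      => [ '3.00', '3.05' ],
    'DBI'      => [ '1.41', '1.45' ],
    'Storable' => [ '2.08', '2.13' ],
    'URI'      => [ '1.23', '1.34' ],
);

for my $name (sort keys %versions) {
    my ($installed, $latest) = @{ $versions{$name} };
    printf "%-10s %6s -> %s\n", $name, $installed, $latest
        if is_out_of_date($installed, $latest);
}
```

Comparing version objects rather than plain strings avoids mistakes with numbers such as 1.10 and 1.9, which are treated as decimal versions here.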

Listing Available Modules

In a similar manner to the r command, we can also list all available but not yet installed modules with u. Not surprisingly, this generates a rather large list of modules, so it is more useful with a regular expression. For example, to find all uninstalled modules in the "XML" family:

cpan> u /^XML::/

Reading Documentation Without Installing

If we want to find out more about a module before installing it, since there are often multiple modules available for some tasks, we can use the readme command. This will look for the readme file within the distribution and, if present, fetch and display it to us without installing the module.

cpan> readme Apache::Session

Reloading CPAN

We can reload both the index information and the CPAN module itself with the reload command, should either be out of date (it is possible, though perhaps not likely, to have a permanent CPAN shell active on a server; updating the index from time to time would then be useful). To reload the index, use

cpan> reload index

To reload the CPAN and associated modules, use reload cpan. For example, to upgrade CPAN itself we can use

cpan> install Bundle::CPAN
cpan> reload cpan

The module will even prompt us to do this if our version of the CPAN module has been superseded.

Configuring CPAN Options

The o command configures CPAN options, and allows us to change any choice that we made during the original setup. To list all current configurations, use

cpan> o conf

To view a specific option, name the option. For example, to find the size of the build cache (where unpacked distribution sources are kept):

cpan> o conf build_cache


build_cache   10

To set an option, name the option and the new value. For example, to increase the build cache to 20 megabytes:

cpan> o conf build_cache 20

We can also set debugging options for various parts of the CPAN module. For general usage we would not normally want to do this, but a list of debugging options can be generated with

cpan> o debug

The cpan utility is convenient for Windows too, if we are using an environment like Cygwin. But for more traditional Windows setups, the ppm tool is an easier option.

Installing Modules on Windows with PPM

As mentioned earlier, the ActiveState port of Perl, ActivePerl, comes with the PPM package tool. The primary interface is the ppm command-line utility, which installs PPM modules. These are CPAN modules that have been prebuilt and repackaged, ready for installation on Windows platforms. The distinction between a PPM module and a CPAN module is moot in the case of pure-Perl modules, but for modules that contain C or C++ code, PPM avoids the need to have a compiler available. The ppm tool works in a very similar way to the CPAN module and cpan command that inspired it, but rather than talking to a CPAN mirror, it by default connects to the PPM archive at http://www.activestate.com/PPMPackages/5.8/.

An example of its simplicity of use is as follows. Say we wish to install a package named Math::Matrix, which is not installed by default. We simply issue the following command at the prompt.

> ppm install Math::Matrix

As if by magic, and in a similar fashion to the CPAN module for Unix, the package is retrieved and installed automatically ready for our use and abuse.

The ppm tool is one of the most attractive reasons to use ActivePerl to run Perl applications on Windows. More information and a list of alternative mirrors is available in the PPM FAQ document at http://ASPN.ActiveState.com/ASPN/docs/ActivePerl/faq/ActivePerl-faq2.html. An excellent resource for an up-to-date list of modules available for use with PPM is http://www.activestate.com/ppmpackages/5.8/.

Summary

In this chapter, we looked at the key features and history of Perl, and saw how to download and install Perl binary distributions. We also examined some of the reasons why we might want to build Perl from source, and the various ways that we can control the features of the resulting Perl interpreter. One significant reason to build Perl from source on Unix systems is to enable support for threads, but there are others too, including 64-bit integer support (even on 32-bit platforms) or customizing the default library search path.

Once Perl is installed, we are ready to run Perl applications and scripts. We looked at starting applications on Unix, Windows, and Macintosh platforms, and covered Perl's many command-line options. Perl is surprisingly versatile when run directly from the command line, and we also spent some time looking at Perl's role as a generic command-line utility.

There is more to Perl than the standard Perl distribution, and in the final part of the chapter we looked at CPAN, the Comprehensive Perl Archive Network, and saw how to download and install new modules, either by hand, or using the CPAN module and cpan utility under Unix, or the ppm utility on Windows platforms.
