The CGI.pm module has become the standard tool for creating CGI scripts in Perl. It provides a simple interface for most of the common CGI tasks. Not only does it easily parse input parameters, but it also provides a clean interface for outputting headers and a powerful yet elegant way to output HTML code from your scripts.
We will cover most of the basics here and will revisit CGI.pm later to look at some of its other features when we discuss other components of CGI programming. For example, CGI.pm provides a simple way to read and write to browser cookies, but we will wait to review that until we get to our discussion about maintaining state, in Chapter 11.
If after reading this chapter you are interested in more information, the author of CGI.pm has written an entire book devoted to it: The Official Guide to Programming with CGI.pm by Lincoln Stein (John Wiley & Sons).
Because CGI.pm offers so many methods, we’ll organize our discussion of CGI.pm into three parts: handling input, generating output, and handling errors. We will look at ways to generate output both with and without CGI.pm. Here is the structure of our chapter:
Handling Input with CGI.pm
Information about the environment. CGI.pm has methods that provide information that is similar to, but somewhat different from, the information available in %ENV.
Form input. CGI.pm automatically parses parameters passed to you via HTML forms and provides a simple method for accessing these parameters.
File uploads. CGI.pm allows your CGI script to handle HTTP file uploads easily and transparently.
Generating Output with CGI.pm
Generating headers. CGI.pm has methods to help you output HTTP headers from your CGI script.
Generating HTML. CGI.pm allows you to generate full HTML documents via corresponding method calls.
Alternatives for Generating Output
Quoted HTML and here documents. We will compare alternative strategies for outputting HTML.
Handling Errors
Trapping die. The standard way to handle errors with Perl, die, does not work cleanly with CGI.
CGI::Carp. The CGI::Carp module distributed with CGI.pm makes it easy to trap die and other error conditions that may kill your script.
Custom solutions. If you want more control when displaying errors to your users, you may want to create a custom subroutine or module.
Let’s start with a general overview of CGI.pm.
CGI.pm requires Perl 5.003_07 or higher and has been included with the standard Perl distribution since 5.004. You can check which version of Perl you are running with the -v option:
$ perl -v
This is perl, version 5.005
Copyright 1987-1997, Larry Wall
Perl may be copied only under the terms of either the Artistic License or the GNU General Public License, which may be found in the Perl 5.0 source kit.
You can verify whether CGI.pm is installed and which version by doing this:
$ perl -MCGI -e 'print "CGI.pm version $CGI::VERSION\n";'
CGI.pm version 2.56
If you get something like the following, then you do not have CGI.pm installed, and you will have to download and install it. Appendix B explains how to do this.
Can't locate CGI.pm in @INC (@INC contains: /usr/lib/perl5/i386-linux/5.005 /usr/lib/perl5 /usr/lib/perl5/site_perl/i386-linux /usr/lib/perl5/site_perl .).
BEGIN failed--compilation aborted.
New versions of CGI.pm are released regularly, and most releases include bug fixes.[6] We therefore recommend that you install the latest version and monitor new releases (you can find a version history at the bottom of the cgi_docs.html file distributed with CGI.pm). This chapter discusses features introduced as late as 2.47.
Before we get started, you should make a minor change to your copy of CGI.pm. CGI.pm handles HTTP file uploads and automatically saves the contents of these uploads to temporary files. This is a very convenient feature, and we’ll talk about this later. However, file uploads are enabled by default in CGI.pm, and it does not impose any limitations on the size of files it will accept. Thus, it is possible for someone to upload multiple large files to your web server and fill up your disk.
Clearly, the vast majority of your CGI scripts do not accept file uploads. Thus, you should disable this feature and enable it only in those scripts where you wish to use it. You may also wish to limit the size of POST requests, which includes file uploads as well as standard forms submitted via the POST method.
To make these changes, locate CGI.pm in your Perl libraries and then search for text that looks like the following:
# Set this to a positive value to limit the size of a POSTing
# to a certain number of bytes:
$POST_MAX = -1;

# Change this to 1 to disable uploads entirely:
$DISABLE_UPLOADS = 0;
Set $DISABLE_UPLOADS to 1. You may wish to set $POST_MAX to a reasonable upper bound as well, such as 100KB. POST requests that are not file uploads are processed in memory, so restricting the size of POST requests avoids someone submitting multiple large POST requests that quickly use up available memory on your server. The result looks like this:
# Set this to a positive value to limit the size of a POSTing
# to a certain number of bytes:
$POST_MAX = 102_400;  # 100 KB

# Change this to 1 to disable uploads entirely:
$DISABLE_UPLOADS = 1;
If you then want to enable uploads and/or allow a greater size for POST requests, you can override these values in your script by setting $CGI::DISABLE_UPLOADS and $CGI::POST_MAX after you use the CGI.pm module, but before you create a CGI.pm object. We will look at how to receive file uploads later in this chapter.
You may need special permission to update your CGI.pm file. If your system administrator for some reason will not make these changes, then you must disable file uploads and limit POST requests on a script by script basis. Your scripts should begin like this:
#!/usr/bin/perl -wT

use strict;
use CGI;

$CGI::DISABLE_UPLOADS = 1;
$CGI::POST_MAX        = 102_400; # 100 KB

my $q = new CGI;
.
.
Throughout our examples, we will assume that the module has been patched and omit these lines.
CGI.pm is a big module. It provides functions for accessing CGI environment variables and printing outgoing headers. It automatically interprets form data submitted via POST or GET and handles multipart-encoded file uploads. It provides many utility functions to do common CGI-related tasks, and it provides a simple interface for outputting HTML. This interface does not eliminate the need to understand HTML, but it makes including HTML inside a Perl script more natural and easier to validate.
Because CGI.pm is so large, some people consider it bloated and complain that it wastes memory. In fact, CGI.pm uses several creative techniques to stay efficient, including a custom implementation of SelfLoader. This means that it loads only the code that you need. If you use CGI.pm only to parse input, but do not use it to produce HTML, then CGI.pm does not load the code for producing HTML.
There have also been some alternative, lightweight CGI modules written. One of the lightweight alternatives to CGI.pm was begun by David James; he got together with Lincoln Stein and the result is a new and improved version of CGI.pm that is even smaller, faster, and more modular than the original. It should be available as CGI.pm 3.0 by the time you read this book.
CGI.pm, like Perl, is powerful yet flexible. It supports two styles of usage: a standard interface and an object-oriented interface. Internally, it is a fully object-oriented module. Not all Perl programmers are comfortable with object-oriented notation, however, so those developers can instead request that CGI.pm make its subroutines available for the developer to call directly.
Here is an example. The object-oriented syntax looks like this:
use strict;
use CGI;

my $q    = new CGI;
my $name = $q->param( "name" );

print $q->header( "text/html" ),
      $q->start_html( "Welcome" ),
      $q->p( "Hi $name!" ),
      $q->end_html;
The standard syntax looks like this:
use strict;
use CGI qw( :standard );

my $name = param( "name" );

print header( "text/html" ),
      start_html( "Welcome" ),
      p( "Hi $name!" ),
      end_html;
Don’t worry about the details of what the code does right now; we will cover all of it during this chapter. The important thing to notice is the different syntax. The first script creates a CGI.pm object and stores it in $q ($q is short for query and is a common convention for CGI.pm objects, although $cgi is used sometimes, too). Thereafter, all the CGI.pm functions are preceded by $q->. The second asks CGI.pm to export the standard functions and simply uses them directly. CGI.pm provides several predefined groups of functions, like :standard, that can be exported into your CGI script.
The standard CGI.pm syntax certainly has less noise. It doesn’t have all those $q-> prefixes. Aesthetics aside, however, there are good arguments for using the object-oriented syntax with CGI.pm.
Exporting functions has its costs. Perl maintains a separate namespace for different chunks of code referred to as packages. Most modules, like CGI.pm, load themselves into their own package. Thus, the functions and variables that modules see are different from the functions and variables you see in your scripts. This is a good thing, because it prevents collisions between variables and functions in different packages that happen to have the same name. When a module exports symbols (whether they are variables or functions), Perl has to create and maintain an alias of each of these symbols in your program’s namespace, the main namespace. These aliases consume memory. This memory usage becomes especially critical if you decide to use your CGI scripts with FastCGI or mod_perl.
The object-oriented syntax also helps you avoid any possible collisions that would occur if you create a subroutine with the same name as one of CGI.pm’s exported subroutines. Also, from a maintenance standpoint, it is clear from looking at the object-oriented script where the code for the header function is: it’s a method of a CGI.pm object, so it must be in the CGI.pm module (or one of its associated modules). Knowing where to look for the header function in the second example is much more difficult, especially if your CGI scripts grow large and complex.
Some people avoid the object-oriented syntax because they believe it is slower. In Perl, methods typically are slower than functions. However, CGI.pm is truly an object-oriented module at heart, and in order to provide the function syntax, it must do some fancy footwork to manage an object for you internally. Thus with CGI.pm, the object-oriented syntax is not any slower than the function syntax. In fact, it can be slightly faster.
We will use CGI.pm’s object-oriented syntax in most of our examples.
CGI.pm primarily handles two separate tasks: it reads and parses input from the user, and it provides a convenient way to return HTML output. Let’s first look at how it collects input.
CGI.pm provides many methods to get information about your environment. Of course, when you use CGI.pm, all of your standard CGI environment variables are still available in Perl’s %ENV hash, but CGI.pm also makes most of these available via method calls. It also provides some unique methods. Table 5-1 shows how CGI.pm’s functions correspond to the standard CGI environment variables.
Table 5-1. CGI.pm Environment Methods and CGI Environment Variables
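CGI.pm Method | CGI Environment Variable |
---|---|
auth_type | AUTH_TYPE |
Not available | CONTENT_LENGTH |
content_type | CONTENT_TYPE |
Not available | DOCUMENT_ROOT |
Not available | GATEWAY_INTERFACE |
path_info | PATH_INFO |
path_translated | PATH_TRANSLATED |
query_string | QUERY_STRING |
remote_addr | REMOTE_ADDR |
remote_host | REMOTE_HOST |
remote_ident | REMOTE_IDENT |
remote_user | REMOTE_USER |
request_method | REQUEST_METHOD |
script_name | SCRIPT_NAME |
self_url | Not available |
server_name | SERVER_NAME |
server_port | SERVER_PORT |
server_protocol | SERVER_PROTOCOL |
server_software | SERVER_SOFTWARE |
url | Not available |
Accept | HTTP_ACCEPT |
http("Accept-charset") | HTTP_ACCEPT_CHARSET |
http("Accept-encoding") | HTTP_ACCEPT_ENCODING |
http("Accept-language") | HTTP_ACCEPT_LANGUAGE |
raw_cookie | HTTP_COOKIE |
http("From") | HTTP_FROM |
virtual_host | HTTP_HOST |
referer | HTTP_REFERER |
user_agent | HTTP_USER_AGENT |
https | HTTPS |
https("Cipher") | HTTPS_CIPHER |
https("Keysize") | HTTPS_KEYSIZE |
https("SecretKeySize") | HTTPS_SECRETKEYSIZE |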
Most of these CGI.pm methods take no arguments and return the same value as the corresponding environment variable. For example, to get the additional path information passed to your CGI script, you can use the following method:
my $path = $q->path_info;
This is the same information that you could also get this way:
my $path = $ENV{PATH_INFO};
However, a few methods differ or have features worth noting. Let’s take a look at these.
As a general rule, if a CGI.pm method has the same name as a built-in Perl function or keyword (e.g., accept or tr), then the CGI.pm method is capitalized. Although there would be no collision if CGI.pm were available only via an object-oriented syntax, the collision creates a problem for people who use it via the standard syntax. accept was originally lowercase, but it was renamed to Accept in version 2.44 of CGI.pm, and the new name affects both syntaxes.
Unlike the other methods that take no arguments and simply return a value, Accept can also be given a content type and it will evaluate to true or false depending on whether that content type is acceptable according to the HTTP Accept header:
if ( $q->Accept( "image/png" ) ) {
    .
    .
    .
Keep in mind that most browsers today send */* in their Accept header. This matches anything, so using the Accept method in this manner is not especially useful. For new file formats like image/png, it is best to get the values for the HTTP header and perform the check yourself, ignoring wildcard matches (this is unfortunate, since it defeats the purpose of wildcards):
my @accept = $q->Accept;
if ( grep $_ eq "image/png", @accept ) {
    .
    .
    .
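To see why the exact-match check behaves differently from a wildcard match, here is a minimal sketch that works on raw Accept header strings rather than on a CGI.pm object. The helper names accepted_types and accepts_explicitly and the two sample header values are our own inventions for illustration; they are not part of CGI.pm.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: split an Accept header value into bare media types,
# discarding parameters such as ";q=0.8".
sub accepted_types {
    my $header = shift;
    my @types;
    foreach my $item ( split /,/, $header ) {
        my( $type ) = split /;/, $item;
        next unless defined $type;
        $type =~ s/^\s+//;
        $type =~ s/\s+$//;
        push @types, lc $type if length $type;
    }
    return @types;
}

# True only if the type is listed by name; "*/*" and "image/*" do not count.
sub accepts_explicitly {
    my( $header, $wanted ) = @_;
    return scalar grep { $_ eq lc $wanted } accepted_types( $header );
}

# Made-up header values for two imaginary browsers
my $old_browser = "image/gif, image/jpeg, */*";
my $new_browser = "image/png, image/gif;q=0.8, */*;q=0.1";

print "old: ", ( accepts_explicitly( $old_browser, "image/png" ) ? "yes" : "no" ), "\n";
print "new: ", ( accepts_explicitly( $new_browser, "image/png" ) ? "yes" : "no" ), "\n";
```

A wildcard-aware check would answer yes for both headers, which is exactly why it tells you little about real PNG support.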
If the http method is called without arguments, it returns the names of the available environment variables that have an HTTP_ prefix. If you call http with an argument, then it will return the value of the corresponding HTTP_ environment variable. When passing an argument to http, the HTTP_ prefix is optional, capitalization does not matter, and hyphens and underscores are interpreted the same. In other words, you can pass the actual HTTP header field name or the environment variable name or even some hybrid of the two, and http will generally figure it out. Here is how you can display all the HTTP_ environment variables your CGI script receives:
#!/usr/bin/perl -wT

use strict;
use CGI;

my $q = new CGI;
print $q->header( "text/plain" );

print "These are the HTTP environment variables I received:\n\n";

foreach ( $q->http ) {
    print "$_:\n";
    print "  ", $q->http( $_ ), "\n";
}
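The name matching that http performs can be pictured with a small sketch. The helper http_env_name below is hypothetical and is not CGI.pm's own code; it only illustrates the rules described above: ignore case, treat hyphens as underscores, and add the HTTP_ prefix when it is missing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: turn a header field name, an environment variable
# name, or a hybrid of the two into the matching HTTP_* key.
sub http_env_name {
    my $name = uc( shift );
    $name =~ tr/-/_/;
    $name = "HTTP_$name" unless $name =~ /^HTTP_/;
    return $name;
}

foreach my $name ( "Accept-Language", "accept_language",
                   "HTTP_ACCEPT_LANGUAGE", "http-accept-language" ) {
    print "$name => ", http_env_name( $name ), "\n";
}
```

All four spellings end up as HTTP_ACCEPT_LANGUAGE, so any of them could be used to look up the same entry in %ENV.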
The https method functions similarly to the http method when it is passed a parameter. It returns the corresponding HTTPS_ environment variable. These variables are set by your web server only if you are receiving a secure request via SSL. When https is called without arguments, it returns the value of the HTTPS environment variable, which indicates whether the connection is secure (its values are server-dependent).
The query_string method does not do what you might think since it does not correspond one-to-one with $ENV{QUERY_STRING}. $ENV{QUERY_STRING} holds the query portion of the URL that called your CGI script. query_string, on the other hand, is dynamic, so if you modify any of the query parameters in your script (see Section 5.2.2.1 later in this chapter), then the value returned by query_string will include these new values. If you want to know what the original query string was, then you should refer to $ENV{QUERY_STRING} instead.
Also, if the request method is POST, then query_string returns the POST parameters that were submitted in the content of the request, and ignores any parameters passed to the CGI script via the query string. This means that if you create a form that submits its values via POST to a URL that also contains a query string, you will not be able to access the parameters on the query string via CGI.pm unless you make a slight modification to CGI.pm to tell it to include parameters from the original query string with POST requests. We’ll see how to do this in Section 5.2.2.2 later in this chapter.
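The difference between the two values can be seen in a minimal sketch. It assumes CGI.pm is installed and simulates a GET request by setting the environment variables by hand so that it can run from the command line; the color parameter is a made-up name.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Simulate a GET request; a web server would normally set these for us.
local $ENV{REQUEST_METHOD} = "GET";
local $ENV{QUERY_STRING}   = "color=red";

my $q = CGI->new;

# Change the parameter inside the script
$q->param( color => "blue" );

print "Original query string: $ENV{QUERY_STRING}\n";
print "query_string method:   ", $q->query_string, "\n";
```

The environment variable still holds what the browser sent, while the method reflects the parameter as the script currently sees it.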
The self_url method does not correspond to a standard CGI environment variable, although you could manually construct it from other environment variables. It provides you with a URL that can call your CGI script with the same parameters. The path information is maintained and the query string is set to the value of the query_string method.
Note that this URL is not necessarily the same URL that was used to call your CGI script. Your CGI script may have been called because of an internal redirection by the web server. Also, because all of the parameters are moved to the query string, this new URL is built to be used with a GET request, even if the current request was a POST request.
The url method functions similarly to the self_url method, except that it returns a URL to the current CGI script without any parameters, i.e., no path information and an empty query string.
The virtual_host method is handy because it returns the value of the HTTP_HOST environment variable, if set, and SERVER_NAME otherwise. Remember that HTTP_HOST is the name of the web server as the browser referred to it, which may differ if multiple domains share the same IP address. HTTP_HOST is available only if the browser supplied the Host HTTP header, added for HTTP 1.1.
param is probably the most useful method CGI.pm provides. It allows you to access the parameters submitted to your CGI script, whether these parameters come to you via a GET request or a POST request. If you call param without arguments, it will return a list of all of the parameter names your script received. If you provide a single argument to it, it will return the value for the parameter with that name. If no parameter with that name was submitted to your script, it returns undef.
It is possible for your CGI script to receive multiple values for a parameter with the same name. This happens when you create two form elements with the same name or you have a select box that allows multiple selections. In this case, param returns a list of all of the values if it is called in a list context and just the first value if it is called in a scalar context. This may sound a little complicated, but in practice it works such that you should end up with what you expect. If you ask param for one value, you will get one value (even if other values were also submitted), and if you ask it for a list, you will always get a list (even if the list contains only one element).
Example 5-1 is a simple example that displays all the parameters your script receives.
Example 5-1. param_list.cgi
#!/usr/bin/perl -wT

use strict;
use CGI;

my $q = new CGI;
print $q->header( "text/plain" );

print "These are the parameters I received:\n\n";

my( $name, $value );

foreach $name ( $q->param ) {
    print "$name:\n";
    foreach $value ( $q->param( $name ) ) {
        print "  $value\n";
    }
}
If you call this CGI script with multiple parameters, like this:
http://localhost/cgi/param_list.cgi?color=red&color=blue&shade=dark
you will get the following output:
These are the parameters I received:

color:
  red
  blue
shade:
  dark
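The scalar and list behavior of param can also be observed directly. This is a minimal sketch that assumes CGI.pm is installed; it builds the object from a query string passed to the constructor instead of a live request so that it runs from the command line, and the size parameter is a made-up name that is deliberately absent.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Same parameters as the URL above, supplied directly to the constructor
my $q = CGI->new( "color=red&color=blue&shade=dark" );

my $first  = $q->param( "color" );   # scalar context: first value only
my @colors = $q->param( "color" );   # list context: every value
my @shades = $q->param( "shade" );   # list context: a one-element list
my $none   = $q->param( "size" );    # never submitted: undef

print "first color:  $first\n";
print "all colors:   @colors\n";
print "shade count:  ", scalar( @shades ), "\n";
print "size is ", ( defined $none ? "defined" : "undefined" ), "\n";
```

Newer releases of CGI.pm may print a warning when param is called in list context; the values returned are the same.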
CGI.pm also lets you add, modify, or delete the value of parameters within your script. To add or modify a parameter, just pass param more than one argument. Using Perl’s => operator instead of a comma makes the code easier to read and allows you to omit the quotes around the parameter name, so long as it’s a word (i.e., contains only letters, numbers, and underscores) that does not conflict with a built-in function or keyword:
$q->param( title => "Web Developer" );
You can create a parameter with multiple values by passing additional arguments:
$q->param( hobbies => "Biking", "Windsurfing", "Music" );
To delete a parameter, use the delete method and provide the name of the parameter:
$q->delete( "age" );
You can clear all of the parameters with delete_all:
$q->delete_all;
It may seem odd that you would ever want to modify parameters yourself, since these will typically be coming from the user. Setting parameters is useful for many reasons, but especially when assigning default values to fields in forms. We will see how to do this later in this chapter.
param automatically determines if the request method is POST or GET. If it is POST, it reads any parameters submitted to it from STDIN. If it is GET, it reads them from the query string. It is possible to POST information to a URL that already has a query string. In this case, you have two sources of input data, and because CGI.pm determines what to do by checking the request method, it will ignore the data in the query string.
You can change this behavior if you are willing to edit CGI.pm. In fact, CGI.pm includes comments to help you do this. You can find this block of code in the init subroutine (the line number will vary depending on the version of CGI.pm you have):
if ($meth eq 'POST') {
    $self->read_from_client(*STDIN,\$query_string,$content_length,0)
        if $content_length > 0;
    # Some people want to have their cake and eat it too!
    # Uncomment this line to have the contents of the query string
    # APPENDED to the POST data.
    # $query_string .= (length($query_string) ? '&' : '') . $ENV{'QUERY_STRING'} if defined $ENV{'QUERY_STRING'};
    last METHOD;
}
By removing the pound sign from the beginning of the line indicated, you will be able to use POST and query string data together. Note that the line you need to uncomment is long and may wrap when displayed, but it is just one line in CGI.pm.
You may receive a query string that contains words that do not comprise name-value pairs. The <ISINDEX> HTML tag, which is not used much anymore, creates a single text field along with a prompt to enter search keywords. When a user enters words into this field and presses Enter, it makes a new request for the same URL, adding the text the user entered as the query string with keywords separated by a plus sign (+), such as this:
http://www.localhost.com/cgi/lookup.cgi?cgi+perl
You can retrieve the list of keywords that the user entered by calling param with “keywords” as the name of the parameter or by calling the separate keywords method:
my @words = $q->keywords;            # these lines do the same thing
my @words = $q->param( "keywords" );
These methods return index keywords only if CGI.pm finds no name-value pair parameters, so you don’t have to worry about using “keywords” as the name of an element in your HTML forms; it will work correctly. On the other hand, if you want to POST form data to a URL with a keyword, CGI.pm cannot return that keyword to you. You must use $ENV{QUERY_STRING} to get it.
Whether you use <INPUT TYPE="IMAGE"> or <INPUT TYPE="SUBMIT">, the form is still sent to the CGI script. However, with the image button, the name is not transmitted by itself. Instead, the web browser splits an image button name into two separate variables: name.x and name.y.
If you want your program to support image and regular submit buttons interchangeably, it is useful to translate the image button names to normal submit button names. Thus, the main program code can use logic based upon which submit button was clicked even if image buttons later replace them.
To accomplish this, we can use the following code that will set a form variable without the coordinates in the name for each variable that ends in “.x”:
foreach ( $q->param ) {
    $q->param( $1, 1 ) if /(.*)\.x$/;
}
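The same translation can be tried without a web server. In this sketch a plain hash stands in for the submitted parameters; the names save, title, and fax are invented for illustration. The pattern escapes the period and anchors the match so that only names really ending in “.x” are affected.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# A plain hash stands in for the parameters a browser might submit
# after the user clicks an image button named "save".
my %param = (
    "save.x" => 12,
    "save.y" => 7,
    "title"  => "Report",
    "fax"    => "555-0100",
);

# Add a coordinate-free entry for every name that ends in ".x".
foreach my $name ( keys %param ) {
    $param{$1} = 1 if $name =~ /^(.*)\.x$/;
}

foreach my $name ( sort keys %param ) {
    print "$name = $param{$name}\n";
}
```

With an unescaped period, a name such as fax would also match (any character followed by x) and would create a stray parameter named f.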
One of the problems with using a method to retrieve the value of a parameter is that it is more work to embed the value in a string. If you wish to print the value of someone’s input, you can use an intermediate variable:
my $user = $q->param( 'user' );
print "Hi, $user!";
Another way to do this is via an odd Perl construct that forces the subroutine to be evaluated as part of an anonymous list:
print "Hi, @{[ $q->param( 'user' ) ]}!";
The first solution is more work and the second can be hard to read. Fortunately, there is a better way. If you know that you are going to need to refer to many output values in a string, you can import all the parameters as variables to a specified namespace:
$q->import_names( "Q" );
print "Hi, $Q::user!";
Parameters with multiple values become arrays in the new namespace, and any characters in a parameter name other than a letter or number become underscores. You must provide a namespace and cannot pass “main”, the default namespace, because that might create security risks.
The price you pay for this convenience is increased memory usage because Perl must create an alias for each parameter.
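Here is a minimal sketch of what the imported namespace looks like, assuming CGI.pm is installed. The query string passed to the constructor and the parameter names in it are made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use CGI;

# Build the object from a query string so the sketch runs from the command line
my $q = CGI->new( "user=Mary&color=red&color=blue&first-name=Mary" );
$q->import_names( "Q" );

{
    # These package variables are created at run time, so we silence
    # Perl's "used only once" warning for them.
    no warnings "once";

    print "Hi, $Q::user!\n";                    # single value as a scalar
    print "Colors: @Q::color\n";                # multiple values as an array
    print "First name: $Q::first_name\n";       # the hyphen became an underscore
}
```

Because the variables are fully qualified with the Q:: prefix, they work under use strict without any declarations.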
As we mentioned in the last chapter, it is possible to create a form with a multipart/form-data media type that permits users to upload files via HTTP. We avoided discussing how to handle this type of input then because handling file uploads properly can be quite complex. Fortunately, there’s no need for us to do this ourselves because, as with other form input, CGI.pm provides a very simple interface for handling file uploads.
You can access the name of an uploaded file with the param method, just like the value of any other form element. For example, if your CGI script were receiving input from the following HTML form:
<FORM ACTION="/cgi/upload.cgi" METHOD="POST" ENCTYPE="multipart/form-data">
  <P>Please choose a file to upload:
    <INPUT TYPE="FILE" NAME="file">
    <INPUT TYPE="SUBMIT">
</FORM>
then you could get the name of the uploaded file this way, by referring to the name of the <FILE> input element, in this case “file”:
my $file = $q->param( "file" );
The name you receive from this parameter is the name of the file as it appeared on the user’s machine when they uploaded it. CGI.pm stores the file as a temporary file on your system, but the name of this temporary file does not correspond to the name you get from this parameter. We will see how to access the temporary file in a moment.
The name supplied by this parameter varies according to platform and browser. Some systems supply just the name of the uploaded file; others supply the entire path of the file on the user’s machine. Because path delimiters also vary between systems, it can be a challenge determining the name of the file. The following command appears to work for Windows, Macintosh, and Unix-compatible systems:
my( $file ) = $q->param( "file" ) =~ m|([^/:\\]+)$|;
However, it may strip parts of filenames, since “report 11/3/99” is a valid filename on Macintosh systems and the above command would in this case set $file to “99”. Another solution is to replace any characters other than letters, digits, underscores, dashes, and periods with underscores and prevent any files from beginning with periods or dashes:
my $file = $q->param( "file" );
$file =~ s/([^\w.-])/_/g;
$file =~ s/^[-.]+//;
The problem with this is that Netscape’s browsers on Windows send the full path to the file as the filename. Thus, $file may be set to something long and ugly like “C__Windows_Favorites_report.doc”.
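The trade-off between the two approaches is easier to judge when they are run side by side. In this sketch the helper names last_component and sanitize and the sample filenames are ours; the patterns are the ones shown above.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Approach 1: keep only what follows the last /, :, or \ delimiter.
sub last_component {
    my $path = shift;
    my( $name ) = $path =~ m|([^/:\\]+)$|;
    return $name;
}

# Approach 2: replace unusual characters with underscores and
# strip leading periods and dashes.
sub sanitize {
    my $name = shift;
    $name =~ s/([^\w.-])/_/g;
    $name =~ s/^[-.]+//;
    return $name;
}

# Made-up names as different browsers and platforms might report them
my @samples = (
    "report.doc",
    "/home/mary/report.doc",
    "C:\\Windows\\Favorites\\report.doc",
    "report 11/3/99",
);

foreach my $sample ( @samples ) {
    printf "%-36s %-12s %s\n",
        $sample, last_component( $sample ), sanitize( $sample );
}
```

The first approach gives a clean name for the Unix and Windows paths but mangles the Macintosh filename; the second never loses characters from the name but keeps the whole path, flattened into underscores.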
You could try to sort out the behaviors of the different operating systems and browsers, check for the user’s browser and operating system, and then treat the filename appropriately, but that would be a very poor solution. You are bound to miss some combinations, you would constantly need to update it, and one of the greatest advantages of the Web is that it works across platforms; you should not build any limitations into your solutions.
So the simple, obvious solution is actually nontechnical. If you do need to know the name of the uploaded file, just add another text field to the form allowing the user to enter the name of the file they are uploading. This has the added advantage of allowing a user to provide a different name than the file has, if appropriate. The HTML form looks like this:
<FORM ACTION="/cgi/upload.cgi" METHOD="POST" ENCTYPE="multipart/form-data">
  <P>Please choose a file to upload:
    <INPUT TYPE="FILE" NAME="file">
  <P>Please enter the name of this file:
    <INPUT TYPE="TEXT" NAME="filename">
</FORM>
You can then get the name from the text field, remembering to strip out any odd characters:
my $filename = $q->param( "filename" );
$filename =~ s/([^\w.-])/_/g;
$filename =~ s/^[-.]+//;
So now that we know how to get the name of the file uploaded, let’s look at how we get at the content. CGI.pm creates a temporary file to store the contents of the upload; you can get a file handle for this file by passing the name of the file according to the file element to the upload method as follows:
my $file = $q->param( "file" );
my $fh   = $q->upload( $file );
The upload method was added to CGI.pm in Version 2.47. Prior to this you could use the value returned by param (in this case $file) as a file handle in order to read from the file; if you use it as a string it returns the name of the file. This actually still works, but there are conflicts with strict mode and other problems, so upload is the preferred way to get a file handle now. Be sure that you pass upload the name of the file according to param, and not a different name (e.g., the name the user supplied, the name with nonalphanumeric characters replaced with underscores, etc.).
Note that transfer errors are much more common with file uploads than with other forms of input. If the user presses the Stop button in the browser as the file is uploading, for example, CGI.pm will receive only a portion of the uploaded file. Because of the format of multipart/form-data requests, CGI.pm will recognize that the transfer is incomplete. You can check for errors such as this by using the cgi_error method after creating a CGI.pm object. It returns the HTTP status code and message corresponding to the error, if applicable, or an empty string if no error has occurred. For instance, if the Content-length of a POST request exceeds $CGI::POST_MAX, then cgi_error will return “413 Request entity too large”. As a general rule, you should always check for an error when you are recording input on the server. This includes file uploads and other POST requests. It doesn’t hurt to check for an error with GET requests either.
Example 5-2 provides the complete code, with error checking, to receive a file upload via our previous HTML form.
Example 5-2. upload.cgi
#!/usr/bin/perl -wT

use strict;
use CGI;
use Fcntl qw( :DEFAULT :flock );

use constant UPLOAD_DIR     => "/usr/local/apache/data/uploads";
use constant BUFFER_SIZE    => 16_384;
use constant MAX_FILE_SIZE  => 1_048_576;        # Limit each upload to 1 MB
use constant MAX_DIR_SIZE   => 100 * 1_048_576;  # Limit total uploads to 100 MB
use constant MAX_OPEN_TRIES => 100;

$CGI::DISABLE_UPLOADS   = 0;
$CGI::POST_MAX          = MAX_FILE_SIZE;

my $q = new CGI;
$q->cgi_error and error( $q, "Error transferring file: " . $q->cgi_error );

my $file      = $q->param( "file" )     || error( $q, "No file received." );
my $filename  = $q->param( "filename" ) || error( $q, "No filename entered." );
my $fh        = $q->upload( $file );
my $buffer    = "";

if ( dir_size( UPLOAD_DIR ) + $ENV{CONTENT_LENGTH} > MAX_DIR_SIZE ) {
    error( $q, "Upload directory is full." );
}

# Allow letters, digits, periods, underscores, dashes
# Convert anything else to an underscore
$filename =~ s/[^\w.-]/_/g;
if ( $filename =~ /^(\w[\w.-]*)/ ) {
    $filename = $1;
}
else {
    error( $q, "Invalid file name; files must start with a letter or number." );
}

# Open output file for writing, making sure the name is unique
until ( sysopen OUTPUT, UPLOAD_DIR . "/" . $filename, O_WRONLY | O_CREAT | O_EXCL ) {
    $filename =~ s/(\d*)(\.\w+)$/($1||0) + 1 . $2/e;
    $1 >= MAX_OPEN_TRIES and error( $q, "Unable to save your file." );
}

# This is necessary for non-Unix systems; does nothing on Unix
binmode $fh;
binmode OUTPUT;

# Write contents to output file
while ( read( $fh, $buffer, BUFFER_SIZE ) ) {
    print OUTPUT $buffer;
}

close OUTPUT;


sub dir_size {
    my $dir = shift;
    my $dir_size = 0;

    # Loop through files and sum the sizes; doesn't descend down subdirs
    opendir DIR, $dir or die "Unable to open $dir: $!";
    while ( defined( my $entry = readdir DIR ) ) {
        $dir_size += ( -s "$dir/$entry" ) || 0;
    }
    closedir DIR;
    return $dir_size;
}


sub error {
    my( $q, $reason ) = @_;

    print $q->header( "text/html" ),
          $q->start_html( "Error" ),
          $q->h1( "Error" ),
          $q->p( "Your upload was not processed because the following error ",
                 "occurred: " ),
          $q->p( $q->i( $reason ) ),
          $q->end_html;
    exit;
}
We start by creating several constants to configure this script.
UPLOAD_DIR
is the path to the directory where we will store uploaded files.
BUFFER_SIZE
is the amount of data to read into memory while transferring from the
temporary file to the output file. MAX_FILE_SIZE
is the maximum file size we will accept; this is important because we
want to limit users from uploading gigabyte-sized files and filling
up all of the server’s disk space.
MAX_DIR_SIZE
is the
maximum size that we will allow our upload directory to grow to. This
restriction is as important as the last because users can fill up our
disks by posting lots of small files just as easily as posting large
files. Finally, MAX_OPEN_TRIES
is the number of
times we try to generate a unique filename and open that file before
we give up; we’ll see why this step is necessary in a moment.
First, we enable file uploads, then we set $CGI::POST_MAX to
MAX_FILE_SIZE. Note that $CGI::POST_MAX limits the size of the entire
content of the request, which includes the data for other form fields
as well as overhead for the multipart/form-data encoding, so the
largest file that the script will accept is a little smaller than
this value. For this form, the difference is minor, but if you add a
file upload field to a complex form with multiple text fields, then
you should keep this distinction in mind.
We then create a CGI object and check for errors. As we said earlier, errors with file uploads are much more common than with other forms of CGI input. Next we get the file’s upload name and the filename the user provided, reporting errors if either of these is missing. Note that a user may be rather upset to get a message saying that the filename is missing after uploading a large file via a modem. There is no way to interrupt that transfer, but in a production application, it might be more user-friendly to save the unnamed file temporarily, prompt the user for a filename, and then rename the file. Of course, you would then need to periodically clean up temporary files that were abandoned.
We get a file handle, $fh
, to
the temporary file where CGI.pm has stored the input. We check
whether our upload directory is full and report an error if this is
the case. Again, this message is likely to create some unhappy users.
In a production application you should add code to notify an
administrator who can see why the upload directory is full and
resolve the problem. See Chapter 9.
Next, we replace any characters in the filename the user supplied
that may cause problems with an underscore and make sure the name
doesn’t start with a
period or a dash. The odd construct that
reassigns the result of the regular expression to
$filename
untaints that variable. We’ll
discuss tainting and why this is important in Chapter 8. We confirm again that
$filename
is not empty (which would happen if it
had consisted of nothing but periods and/or dashes) and generate an
error if this is the case.
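The sanitizing and untainting steps just described can be collected into one small routine. What follows is only a minimal sketch in plain Perl, separate from the upload script; the name clean_filename is hypothetical and chosen for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: returns a sanitized, untainted filename,
# or undef if nothing usable remains.
sub clean_filename {
    my $name = shift;
    return undef unless defined $name;

    # Replace anything other than letters, digits, underscores,
    # periods, and dashes with an underscore
    $name =~ s/[^\w.-]/_/g;

    # Capturing the match untaints the value; the pattern also
    # requires the name to start with a word character
    return $name =~ /^(\w[\w.-]*)/ ? $1 : undef;
}

my $good = clean_filename( "my report.txt" );
my $bad  = clean_filename( "..." );

print "Cleaned name: $good\n";
print "A name made only of periods is ",
      ( defined $bad ? "accepted" : "rejected" ), "\n";
```

Returning undef instead of calling an error routine keeps the helper reusable; the caller decides how to report the problem.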
We try to open a file with this name in our upload directory. If we
fail, then we add a digit to $filename
and try
again. The regular expression allows us to keep the file extension
the same: if there is already a report.txt file,
then the next upload with that name will be named
report1.txt, the next one
report2.txt, etc. This continues until we exceed
MAX_OPEN_TRIES
. It is important that
we create a limit to this loop because there may be a reason other
than a non-unique name that prevents us from saving the file. If the
disk is full or the system has too many open files, for example, we
do not want to start looping endlessly. This error should also notify
an administrator that something is wrong.
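The retry loop can also be written with an explicit counter, which has the side benefit of handling names that have no extension. The following is a sketch under our own assumptions, not the code used in the script above: the helper name open_unique and the temporary directory are hypothetical, and the directory comes from the standard File::Temp module so the example can run anywhere.

```perl
use strict;
use warnings;
use Fcntl qw( :DEFAULT );
use File::Temp qw( tempdir );

# Hypothetical helper: create a new file in $dir based on $name, adding
# a counter before the extension if the name is taken. Returns the
# handle and final name, or an empty list after $max_tries failures.
sub open_unique {
    my( $dir, $name, $max_tries ) = @_;
    my( $base, $ext ) = $name =~ /^(.*?)(\.\w+)?$/;
    $ext = "" unless defined $ext;

    for my $n ( 0 .. $max_tries - 1 ) {
        my $try = $n ? "$base$n$ext" : $name;
        # O_EXCL makes the open fail if the file already exists
        if ( sysopen my $fh, "$dir/$try", O_WRONLY | O_CREAT | O_EXCL ) {
            return ( $fh, $try );
        }
    }
    return;
}

my $dir = tempdir( CLEANUP => 1 );
my @names;
for ( 1 .. 3 ) {
    my( $fh, $name ) = open_unique( $dir, "report.txt", 100 );
    close $fh;
    push @names, $name;
}
print "@names\n";   # report.txt report1.txt report2.txt
```

Because the existence check and the creation happen in the single sysopen call, two requests arriving at the same moment cannot both claim the same name.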
This script is written to handle any type of file upload, including
binary files such as images or audio. By default, whenever Perl
accesses a file handle on non-Unix systems (more specifically,
systems that do not use \n as their end of line character), Perl
translates the native operating system’s end of line characters,
such as \r\n for Windows or \r for MacOS, to \n on input and back to
the native characters on output. This works great for text files,
but it can corrupt binary files. Thus, we enable binary mode with
the binmode function in order to disable this translation. On
systems, like Unix, where no end of line translation occurs, binmode
has no effect.
Finally, we read from our temporary file handle and write to our
output file and exit. We use the read function to read and write a
chunk of data at a time. The size of this chunk is defined by our
BUFFER_SIZE constant. In case you are wondering, CGI.pm will remove
its temporary file automatically when our script exits (technically,
when $q goes out of scope).
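The binmode and read steps can be tried in isolation. The sketch below is a hypothetical copy_handle routine of our own, exercised with in-memory file handles so that it needs neither CGI.pm nor an actual upload; it is meant to illustrate the technique, not to replace the loop in the script.

```perl
use strict;
use warnings;

use constant BUFFER_SIZE => 16_384;

# Hypothetical helper: copy everything from one handle to another in
# binary mode, BUFFER_SIZE bytes at a time; returns the byte count.
sub copy_handle {
    my( $in, $out ) = @_;
    binmode $in;
    binmode $out;

    my( $buffer, $total ) = ( "", 0 );
    while ( my $bytes = read( $in, $buffer, BUFFER_SIZE ) ) {
        print $out $buffer or die "Write failed: $!";
        $total += $bytes;
    }
    return $total;
}

# Demonstrate with in-memory file handles holding binary-looking data
my $data = "\x00\x01\r\n\xff" x 10_000;
my $copy = "";
open my $in,  "<", \$data or die "Cannot open input: $!";
open my $out, ">", \$copy or die "Cannot open output: $!";
my $count = copy_handle( $in, $out );
close $out;

print "Copied $count bytes\n";
```

Reading a fixed-size chunk at a time keeps memory use bounded no matter how large the uploaded file is.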
There is another way we could have moved the file to our
uploads directory. We could use CGI.pm’s undocumented tmpFileName
method to get the name of the temporary file containing the upload
and then use Perl’s rename function to move the file. However,
relying on undocumented code is dangerous, because it may not be
compatible with future versions of CGI.pm. Thus, in our example we
stick to the public API instead.
The
dir_size
subroutine calculates the size of a
directory by summing the size of each of its files. The
error
subroutine prints a message telling the
user why the transfer failed. In a production application, you
probably want to provide links for the user to get help or to notify
someone
about problems.
CGI.pm provides a very elegant solution for outputting both headers and HTML with Perl. It allows you to embed HTML in your code, but it makes this more natural by turning the HTML into code. Every HTML element can be generated via a corresponding method in CGI.pm. We have already seen some examples of this, but here’s another:
    #!/usr/bin/perl -wT

    use strict;
    use CGI;

    my $q = new CGI;
    my $timestamp = localtime;

    print $q->header( "text/html" ),
          $q->start_html( -title => "The Time", -bgcolor => "#ffffff" ),
          $q->h2( "Current Time" ),
          $q->hr,
          $q->p( "The current time according to this system is: ",
                 $q->b( $timestamp ) ),
          $q->end_html;

The resulting output looks like this (the indentation is added to make it easier to read):

    Content-type: text/html

    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
    <HTML>
      <HEAD><TITLE>The Time</TITLE></HEAD>
      <BODY BGCOLOR="#ffffff">
        <H2>Current Time</H2>
        <HR>
        <P>The current time according to this system is:
          <B>Mon May 29 16:48:14 2000</B></P>
      </BODY>
    </HTML>
As you can see, the code looks a lot like Perl and a lot less like HTML. It is also shorter than the corresponding HTML because CGI.pm manages some common tags for us. Another benefit is that it is impossible to forget to close a tag because the methods automatically generate closing tags (except for those elements that CGI.pm knows do not need them, like <HR>).
We’ll look at all of these output methods in this section,
starting with the first method, header
.
CGI.pm has two methods for returning
HTTP headers:
header
and redirect
. They correspond
to the two ways you can return data from CGI scripts: you can return
a document, or you can redirect to another document.
The header
method handles
multiple HTTP headers for you. If you
pass it one argument, it returns the
Content-type
header with that value. If you do not
supply a media type, it defaults to “text/html”. Although
CGI.pm makes outputting HTML much easier, you can of course print any
content type with it. Simply use the header
method to specify the media type and then print your content,
whether it be text, XML, Adobe PDF, etc.:
    print $q->header( "text/plain" );
    print "This is just some boring text.\n";
If you want to set other headers, then you need to pass
name-value pairs for each header. Use
the -type
argument to specify the media type (see
the example under Section 5.3.1.2 later in this
chapter).
You can specify a status other than “200 OK” by
using the -status
argument:
print $q->header( -type => "text/html", -status => "404 Not Found" );
Browsers
can’t always tell if
content is
being dynamically generated by CGI or if it is coming from a static
source, and they may try to cache the output of your script. You can
disable this or
request caching if you want it, by
using the -expires
argument. You can supply either
a full time stamp with this argument or a
relative time. Relative times
are created by supplying a plus or minus sign for forward or
backward, an integer number, and a one letter abbreviation for
second, minute, hour, day, month, or year (each of these
abbreviations is lowercase except for month, which is an uppercase
M). You can also use “now” to indicate that a document
should expire immediately. Specifying a negative value also has this
effect.
This example tells the browser that this document is good for the next 30 minutes:
print $q->header( -type => "text/html", -expires => "+30m" );
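To make the relative time format concrete, here is a small sketch that turns such a string into an offset in seconds. It is our own illustration of the notation described above, not CGI.pm’s code; the name relative_offset is hypothetical, and treating a month as 30 days and a year as 365 days is an assumption made for this example.

```perl
use strict;
use warnings;

# Seconds per unit; months and years are approximated here as
# 30 and 365 days (an assumption made for this sketch)
my %UNIT = (
    s => 1,
    m => 60,
    h => 60 * 60,
    d => 24 * 60 * 60,
    M => 30 * 24 * 60 * 60,
    y => 365 * 24 * 60 * 60,
);

# Hypothetical helper: turn "+30m", "-1d", or "now" into an offset in
# seconds from the current time; returns undef for anything else.
sub relative_offset {
    my $spec = shift;
    return 0 if lc( $spec ) eq "now";
    return undef unless $spec =~ /^([+-])(\d+)([smhdMy])$/;
    my $seconds = $2 * $UNIT{$3};
    return $1 eq "-" ? -$seconds : $seconds;
}

print relative_offset( "+30m" ), "\n";   # 1800
print relative_offset( "-1d" ), "\n";    # -86400
```

Note how the case of the unit letter matters: "+1m" is one minute, while "+1M" is one month.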
If you are using frames or have multiple windows, you may want
links in one document to
update another document. You can use the
-target
argument along with the name of the other document (as set by a
<FRAMESET>
tag or by JavaScript) to specify that clicking on a link in this
document should cause the new resource to load in the other frame (or
window):
print $q->header( -type => "text/html", -target => "main_frame" );
This argument is only meaningful for HTML documents.
If you need to redirect to another URL, you can use the
redirect
method instead of printing the
Location HTTP header:
print $q->redirect( "http://localhost/survey/thanks.html" );
Although the term “redirect” is an action, this method does not perform a redirect for you; it simply returns the corresponding header. So don’t forget you still need to print the result!
If you need to generate other HTTP headers, you can simply
pass the name-value pair to header
and it will
return the header with the appropriate formatting. Underscores are
converted to hyphens for you.
Thus, the following statement:
print $q->header( -content_encoding => "gzip" );
produces the following output:
Content-encoding: gzip
Now let’s look at the methods that you can use to generate HTML. We’ll start by looking at the methods for starting and ending documents.
The start_html
method returns the HTML DTD, the
<HTML> tag, the <HEAD> section including <TITLE>,
and the <BODY> tag. In the previous example, it generates HTML
like the following:
    <!DOCTYPE HTML PUBLIC "-//IETF//DTD HTML//EN">
    <HTML><HEAD><TITLE>The Time</TITLE>
    </HEAD><BODY BGCOLOR="#ffffff">
The most common arguments start_html
recognizes
are as follows:
Setting the -base
argument to a true value tells
CGI.pm to include a
<BASE
HREF="url"> tag in the head of your document that points to the
URL of your script.
The -meta
argument accepts a reference to a hash
containing the name and content of
meta tags that appear in the head of your
document.
The -script
argument allows you to add
JavaScript to the head of your document. You can either provide a
string containing the JavaScript code or a reference to a
hash containing
-language
, -src
, and
-code
as possible keys. This allows you to specify
the language and source attributes of the
<SCRIPT> tag too. CGI.pm
automatically provides comment tags around the code to protect it
from browsers that do not recognize JavaScript.
The -noscript
argument allows you to
specify HTML display if the browser does not support
JavaScript. It is inserted into the head of your document.
The -style
argument allows you to define a
style sheet for the document.
Like -script
, you may either specify a string or a
reference to a hash. The keys that -style
accepts
in the hash are -code
and -src
.
The value of -code
will be inserted into the
document as style sheet information. The value of
-src
will be a URL to a .css
file. CGI.pm automatically provides
comment tags
around the code to protect cascading style sheets from browsers that
do not recognize them.
The -xbase
argument lets you
specify a URL to use in the <BASE HREF="url"> tag. This is
different from the -base
argument that also
generates this tag but sets it to the URL of the current CGI script.
Any other arguments, like -bgcolor
, are passed as
attributes to the <BODY> tag.
HTML elements can be
generated by
using the lowercase name of the element as
a method, with the following exceptions: Accept
,
Delete
, Link
,
Param
, Select
,
Sub
, and Tr
. These methods
have an initial cap to avoid conflicting with built-in Perl functions
and other CGI.pm methods.
The following rules apply to basic HTML tags:
CGI.pm recognizes that some elements, like <HR> and <BR>, do not have closing tags. These methods take no arguments and return the single tag:
print $q->hr;
This outputs:
<HR>
If you provide one argument, it creates an opening and closing tag to enclose the text of your argument. Tags are capitalized:
print $q->p( "This is a paragraph." );
This prints the text:
<P>This is a paragraph.</P>
If you provide multiple arguments, these are simply joined with the tags at the beginning and the end:
print $q->p( "The server name is:", $q->server_name );
This prints the text:
<P>The server name is: localhost</P>
This usage makes it easy to nest elements:
print $q->p( "The server name is:", $q->em( $q->server_name ) );
This prints the text:
<P>The server name is: <EM>localhost</EM></P>
Note that a space is automatically added between each
list element. It appears after the colon
in these examples. If you wish to print multiple items in a list
without intervening
spaces, then you
must set Perl’s list separator variable, $"
,
to an empty string:
    {
        local $" = "";
        print $q->p( "Server=", $q->server_name );
    }

This prints the text:

    <P>Server=localhost</P>
Note that whenever you change global variables like
$"
, you should localize them by enclosing them in
blocks and using
Perl’s
local
function.
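The effect of $" and of localizing it can be seen in plain Perl, without CGI.pm, because the variable controls how any list is interpolated into a double-quoted string. This is a minimal sketch; the @parts array is simply a stand-in for the arguments passed to an element method.

```perl
use strict;
use warnings;

my @parts = ( "Server=", "localhost" );

# By default, interpolated list elements are separated by a space
my $spaced = "<P>@parts</P>";

my $joined;
{
    # Limit the change to this block; the old value is restored on exit
    local $" = "";
    $joined = "<P>@parts</P>";
}

# Outside the block, the default separator is back in effect
my $after = "<P>@parts</P>";

print "$spaced\n";   # <P>Server= localhost</P>
print "$joined\n";   # <P>Server=localhost</P>
print "$after\n";    # <P>Server= localhost</P>
```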
If the first argument is a reference to a hash, then the hash elements are interpreted as attributes for the HTML element:
print $q->a( { -href => "/downloads" }, "Download Area" );
This prints the text:
    <A HREF="/downloads">Download Area</A>
You can specify as many attributes as you want. The leading hyphen as part of the attribute name is not required, but it is the standard convention.
Some attributes do not take arguments and simply appear as a word.
For these, pass undef as the value of the attribute. Prior to version
2.41 of CGI.pm, passing an empty string would accomplish the same
thing, but that was changed so that people could explicitly request
an attribute set to an empty string (e.g.,
<IMG SRC="spacer.gif" ALT="">).
If you provide a reference to an array as an argument, the tag is distributed across each item in the array:
print $q->ol( $q->li( [ "First", "Second", "Third" ] ) );
This corresponds to:
    <OL>
      <LI>First</LI>
      <LI>Second</LI>
      <LI>Third</LI>
    </OL>

This still works fine when the first argument is a reference to a hash of attributes. Here is a table:

    print $q->table(
              { -border => 1, -width => "100%" },
              $q->Tr( [
                  $q->th( { -bgcolor => "#cccccc" }, [ "Name", "Age" ] ),
                  $q->td( [ "Mary", 29 ] ),
                  $q->td( [ "Bill", 27 ] ),
                  $q->td( [ "Sue",  26 ] )
              ] )
          );

This corresponds to:

    <TABLE BORDER="1" WIDTH="100%">
      <TR>
        <TH BGCOLOR="#cccccc">Name</TH>
        <TH BGCOLOR="#cccccc">Age</TH>
      </TR>
      <TR>
        <TD>Mary</TD>
        <TD>29</TD>
      </TR>
      <TR>
        <TD>Bill</TD>
        <TD>27</TD>
      </TR>
      <TR>
        <TD>Sue</TD>
        <TD>26</TD>
      </TR>
    </TABLE>
Aside from the spaces we mentioned above that are introduced between array elements, CGI.pm does not insert any whitespace between HTML elements. It creates no indentation and inserts no new lines. Although this makes it harder for a human to read, it also makes the output smaller and downloads faster. If you wish to generate neatly formatted HTML code, you can use the CGI::Pretty module distributed with CGI.pm. It provides all of the features of CGI.pm (because it is an object-oriented module that extends CGI.pm), but the HTML it produces is neatly indented.
The syntax for generating form elements differs from other elements. These methods only take name-value pairs that correspond to the attributes. See Table 5-2.
Table 5-2. CGI.pm Methods for HTML Form Elements
| CGI.pm Method | HTML Tag |
|---|---|
| start_form | <FORM> |
| end_form | </FORM> |
| textfield | <INPUT TYPE="TEXT"> |
| password_field | <INPUT TYPE="PASSWORD"> |
| filefield | <INPUT TYPE="FILE"> |
| button | <INPUT TYPE="BUTTON"> |
| submit | <INPUT TYPE="SUBMIT"> |
| reset | <INPUT TYPE="RESET"> |
| checkbox, checkbox_group | <INPUT TYPE="CHECKBOX"> |
| radio_group | <INPUT TYPE="RADIO"> |
| popup_menu | <SELECT SIZE="1"> |
| scrolling_list | <SELECT SIZE="n"> where n > 1 |
| textarea | <TEXTAREA> |
| hidden | <INPUT TYPE="HIDDEN"> |
The start_form
and end_form
elements generate the opening and closing form tags.
start_form
takes arguments for each of its
attributes:
    print $q->start_form( -method => "get", -action => "/cgi/myscript.cgi" );
Note that unlike a typical form tag, CGI.pm sets the request method
to POST instead of GET by default (the reverse of the default for
HTML forms). If you want to allow file uploads, use the
start_multipart_form
method instead of
start_form
, which sets enctype to
“multipart/form-data”.
All of the remaining methods create form elements. They all take the
-name
and -default
arguments.
The -default
value for an element is replaced by
the corresponding value from param
if that value
exists. You can disable this and force the default to override a
user’s parameters by passing the -override
argument with a true value.
The -default
option specifies the default
value of the element for elements with single
values:
print $q->textfield( -name => "username", -default => "Anonymous" );
This yields:
<INPUT TYPE="text" NAME="username" VALUE="Anonymous">
By supplying an array reference with the -values argument, the
checkbox_group and radio_group methods generate multiple checkboxes
or radio buttons that share the same name. Likewise, passing an array
reference with the -values argument to the scrolling_list and
popup_menu functions generates both the <SELECT> and <OPTION>
elements. For these elements, -default indicates the values that are
checked or selected; you can pass -default a reference to an array
for checkbox_group and scrolling_list for multiple defaults.
Each method accepts a -labels
argument that takes
a reference to a hash; this hash associates the value of each element
to the label the browser displays to the user.
Here is how you can generate a group of radio buttons:
    print $q->radio_group(
              -name     => "curtain",
              -values   => [ "A", "B", "C" ],
              -default  => "B",
              -labels   => { A => "Curtain A", B => "Curtain B", C => "Curtain C" }
          );

This yields:

    <INPUT TYPE="radio" NAME="curtain" VALUE="A">Curtain A
    <INPUT TYPE="radio" NAME="curtain" VALUE="B" CHECKED>Curtain B
    <INPUT TYPE="radio" NAME="curtain" VALUE="C">Curtain C
For specifying any other attributes for form elements, like SIZE=4,
pass them as additional
arguments (e.g., size => 4
).
There are many different ways that people output HTML from their CGI scripts. We have just looked at how you do this from CGI.pm, and in the next chapter we will look at how we can use HTML templates to keep the HTML separate from the code. However, let’s look here at a couple of other techniques developers use to output HTML from their scripts.
One thing to keep in mind as we look at these techniques is how difficult the HTML is to maintain. Over the lifetime of a CGI application, it is often the HTML that changes the most. Thus much of the maintenance of the application will involve making changes to the design or wording found in the HTML, so the HTML should be easy to edit.
The simplest solution for including HTML in the source code is the
hardest to maintain. Many web developers start out writing CGI
scripts that contain numerous
print
statements to return documents, even for
large sections of static content—content that remains the same
each time the CGI script is called.
Here is an example:
    #!/usr/bin/perl -wT

    use strict;

    my $timestamp = localtime;

    print "Content-type: text/html\n\n";
    print "<html>\n";
    print "<head>\n";
    print "<title>The Time</title>\n";
    print "</head>\n";
    print "<body bgcolor=\"#ffffff\">\n";
    print "<h2>Current Time</h2>\n";
    print "<hr>\n";
    print "<p>The current time according to this system is: \n";
    print "<b>$timestamp</b>\n";
    print "</p>\n";
    print "</body>\n";
    print "</html>\n";
This is a pretty basic example, but you could imagine just how
complicated this can get on a large web page with numerous graphics,
nested tables, style declarations, etc. Not only is this difficult to
read because of the extra noise that each print
statement adds, but each
double quote in the HTML must be escaped
with a backslash. If you forget to do this even once, you will likely
generate a syntax error. Making HTML edits to something that looks
like this is much more work than it should be. You should definitely
avoid this approach in your scripts.
As we have seen in earlier examples, Perl supports a feature called
here documents
that allows you to express a large block of content separately within
your code. To create a here document, simply use
<<
followed by the
token that will be used to indicate
the end of the here document. You can include the token in single or
double quotes, and the
content will be evaluated as if it were a string within those quotes.
In other words, if you use single quotes, variables will not be
interpreted. If you omit the quotes, it acts as though you had used
double quotes.
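A short sketch makes the effect of quoting the token visible. The variable $name and the token END_TEXT are arbitrary names chosen for this illustration.

```perl
use strict;
use warnings;

my $name = "World";

# Double-quoted token (or no quotes): variables are interpolated
my $interpolated = <<"END_TEXT";
Hello, $name!
END_TEXT

# Single-quoted token: the content is taken literally
my $literal = <<'END_TEXT';
Hello, $name!
END_TEXT

print $interpolated;   # Hello, World!
print $literal;        # Hello, $name!
```

Single-quoted here documents are handy for blocks of JavaScript or other content containing $ and @ characters that Perl should leave alone.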
Here is the previous example using a here document instead:
    #!/usr/bin/perl -wT

    use strict;

    my $timestamp = localtime;

    print <<END_OF_MESSAGE;
    Content-type: text/html

    <html>
      <head>
        <title>The Time</title>
      </head>

      <body bgcolor="#ffffff">
        <h2>Current Time</h2>
        <hr>
        <p>The current time according to this system is:
        <b>$timestamp</b></p>
      </body>
    </html>
    END_OF_MESSAGE
This is much cleaner than using lots of print
statements, and it allows us to indent the HTML content. The result
is that this is much easier to read and to update. You could have
accomplished something similar by using one
print
statement and putting all the content
inside one pair of double quotes, but then you would have had to
precede each double quote in the HTML with a backslash, and for
complicated HTML documents this could get tedious.
Another solution is to use
Perl’s qq//
operator, but with a different delimiter, such as
~
. You must find a
delimiter that will not appear in the
HTML, and remember that if your content includes JavaScript, it can
include many characters that HTML might otherwise not.
Here documents are generally a safer solution.
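For comparison, this is what the qq// approach looks like with ~ as the delimiter. It is a minimal sketch that assumes the content contains no ~ character; only a fragment of the page is shown.

```perl
use strict;
use warnings;

my $timestamp = localtime;

# qq~...~ behaves like a double-quoted string, so $timestamp is
# interpolated, but the double quotes inside need no backslashes
my $html = qq~<body bgcolor="#ffffff">
<p>The current time according to this system is:
<b>$timestamp</b></p>
</body>
~;

print $html;
```

If a ~ ever did appear in the content, it would end the string early, which is the risk the text above warns about.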
One drawback to using here documents is that they do not easily indent, so they may look odd inside blocks of otherwise cleanly indented code. Tom Christiansen and Nathan Torkington address this issue in the Perl Cookbook (O’Reilly & Associates, Inc.). The following solutions are adapted from their discussion.
If you do not care about extra leading whitespace in your HTML output, you can simply indent everything. You can also indent the ending token if you use quotes and include the indent in the name (although this is more readable, it may be less maintainable because if the indentation changes, then you must adjust the name of the token to match):
    #!/usr/bin/perl -wT

    use strict;

    my $timestamp = localtime;
    display_document( $timestamp );

    sub display_document {
        my $timestamp = shift;

        print <<"    END_OF_MESSAGE";
        Content-type: text/html

        <html>
          <head>
            <title>The Time</title>
          </head>

          <body bgcolor="#ffffff">
            <h2>Current Time</h2>
            <hr>
            <p>The current time according to this system is:
            <b>$timestamp</b></p>
          </body>
        </html>
        END_OF_MESSAGE
    }
One problem with indenting HTML here documents is that the extra indentation is sent to the client. You can solve this problem by creating a function that “unindents” your text. If you wish to remove all indentation, this is simple; if you want to maintain your HTML’s indentation, this is more complex. The challenge is determining the amount of indentation to remove: what portion belongs to the content and what part is incidental to your script? You could assume the first line contains the smallest indent, but this would not work if you were only printing the end of an HTML document, for example, when the last line would probably contain the smallest indent.
In the following code the
unindent
subroutine looks at all of the
lines being printed, finds the smallest indent, and removes that
amount from all of the lines:
    sub unindent;

    sub display_document {
        my $timestamp = shift;

        print unindent <<"    END_OF_MESSAGE";
            Content-type: text/html

            <html>
              <head>
                <title>The Time</title>
              </head>

              <body bgcolor="#ffffff">
                <h2>Current Time</h2>
                <hr>
                <p>The current time according to this system is:
                <b>$timestamp</b></p>
              </body>
            </html>
        END_OF_MESSAGE
    }

    sub unindent {
        local $_ = shift;
        # Collect the leading whitespace of every non-blank line;
        # the shortest indent sorts first
        my( $indent ) = sort /^([ \t]*)\S/gm;
        s/^$indent//gm;
        return $_;
    }
Predeclaring the unindent
function, as we do on
the first line, allows us to omit parentheses when we use it. This
solution, of course, increases the amount of work the server must do
for each request, so it would not be appropriate on a heavily used
server. Also keep in mind that each additional space increases the
number of bytes you must transfer and the user must download, so you
may actually want to strip all leading whitespace instead. After all,
users probably care more about the page downloading faster than how
it looks if they view the source code.
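Stripping all leading whitespace, as suggested above, takes a single substitution. The following is a sketch of that simpler alternative; the name strip_indent is hypothetical, and the sample markup is shortened for illustration.

```perl
use strict;
use warnings;

# Hypothetical alternative to unindent: remove every leading space and
# tab from every line, discarding the HTML's own indentation as well
sub strip_indent {
    my $text = shift;
    $text =~ s/^[ \t]+//gm;
    return $text;
}

my $page = strip_indent( "    <html>\n      <body>\n"
                       . "        <p>Hello</p>\n"
                       . "      </body>\n    </html>\n" );

print $page;
```

Unlike unindent, this version does not need to compare indents across lines, so it does less work per request.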
Overall, here documents are not a bad solution for large chunks of code, but they do not offer CGI.pm’s advantages, especially the ability to have your HTML code verified syntactically. It’s much harder to forget to close an HTML tag with CGI.pm than it is with a here document. Also, many times you must build HTML programmatically. For example, you may read records from a database and add a row to a table for each record. In these cases, when you are working with small chunks of HTML, CGI.pm is much easier to work with than here documents.
Using CGI.pm’s methods for outputting HTML generates strong reactions in developers. Some love it; others don’t. Don’t worry if it doesn’t match your needs; we will look at a whole class of alternatives in the next chapter.
While we are on the subject of handling output, we should also look at handling errors. One of the things that distinguishes an experienced developer from a novice is adequate error handling. Novices expect things to always work as planned; experienced developers have learned otherwise.
The most common method that Perl developers use for handling errors
is Perl’s built-in
die
function. Here is an example:
open FILE, $filename or die "Cannot open $filename: $!";
If Perl is unable to open the file specified by
$filename
, die
will print an
error message to STDERR and terminate the script. The
open
function, like most Perl commands that
interact with the system, sets $!
to the reason
for the error if it fails.
Unfortunately, die is not always the best solution for handling
errors in your CGI scripts. As you will recall from Chapter 3, output
to STDERR is typically sent to the web server’s error log; if the
script dies before it has printed a complete set of headers, the web
server returns a 500 Internal Server Error. This is certainly not a
very user-friendly response.
You should determine a policy for handling errors on your site. You may decide that 500 Internal Server Error pages are acceptable for very uncommon system errors like the inability to read or write to files. However, you may decide that you wish to display a formatted HTML page instead with information for users such as alternative actions they can take or who to notify about the problem.
It is possible to trap
die
so that it does not
generate a 500 Internal Server Error automatically. This is
especially useful because many common third-party modules use
die
(and variants such as
croak
) as their manner for responding to errors.
If you know that a particular subroutine may call
die
, you can catch this with an
eval
block in Perl:
    eval {
        dangerous_routine( );
        1;
    } or do {
        error( $q, $@ || "Unknown error" );
    };
If dangerous_routine
does call
die
, then eval
will catch
it, set the special variable $@
to the value of
the die
message, pass control to the end of the
block, and return undef
. This allows us to call
another subroutine to display our error more gracefully. Note that an
eval
block will not trap
exit
.
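The behavior of eval and $@ is easy to confirm with a self-contained sketch. Here dangerous_routine is just a stand-in that always fails, and the message text is invented for the example.

```perl
use strict;
use warnings;

# Stand-in for any routine that may call die; it always fails here
sub dangerous_routine {
    die "Disk is full\n";
}

my $message;
my $ok = eval {
    dangerous_routine();
    1;
};

unless ( $ok ) {
    # eval returned undef, and $@ holds the message passed to die
    $message = $@ || "Unknown error";
    print "Trapped: $message";
}

print "The script is still running\n";
```

Ending the block with a true value lets us test the result of eval itself instead of relying only on $@.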
This works, but it certainly makes your code a lot more complex, and
if your CGI script interacts with a lot of subroutines that might
die
, then you must either place your entire
script within an eval
block or include lots of
these blocks throughout your script.
Fortunately, there is a better way. You may already know that it is
possible to create a global signal handler to trap Perl’s
die
and warn
functions.
This involves some rather advanced Perl; you can find specific
information in Programming Perl. Fortunately,
we don’t have to worry about the specifics, because there is a
module that not only does this, but is written specifically for CGI
scripts: CGI::Carp.
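For the curious, a global handler for die can be sketched in a few lines. This is only an illustration of the mechanism, not how CGI::Carp itself is written; the @log array and the message format are our own inventions.

```perl
use strict;
use warnings;

my @log;

# A global handler that runs whenever die is called; here it only
# records a timestamp, the script name, and the message
local $SIG{__DIE__} = sub {
    my $message = shift;
    push @log, "[" . localtime() . "] $0: $message";
};

# The handler runs first, then die proceeds as usual, so we still
# trap it with eval to keep this demonstration alive
eval { die "Cannot open file\n" };

print @log;
```

A matching handler can be installed for warn through $SIG{__WARN__}.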
CGI::Carp is not part
of the CGI.pm module, but it is also by Lincoln Stein, and it is
distributed with CGI.pm (and thus included with the most recent
versions of Perl). It does two things: it creates more informative
entries in your error log, and it allows you to create a custom error
page for fatal calls like die
. Simply by using
the module, it adds a timestamp and the name of the running CGI
script to errors written to the error log by
die
, warn
,
carp
, croak
, and
confess
. The last three functions are provided
by the Carp module (included with Perl) and are often used by module
authors.
This still does not stop your web server from displaying
500
Internal Server Error responses for these calls, however. CGI::Carp
is most useful when you ask it to trap fatal calls. You can have it
display fatal error messages in the
browser instead. This is
especially helpful during development and debugging. To do this,
simply pass the fatalsToBrowser
parameter to it
when you use the module:
use CGI::Carp qw( fatalsToBrowser );
In a production environment, you may not want users to view your full error information if they encounter an error. Fortunately, you can have CGI::Carp trap errors and display your own custom error message. To do this, you pass CGI::Carp::set_message a reference to a subroutine that takes a single argument and displays the content of a response.
    use CGI;
    use CGI::Carp qw( fatalsToBrowser );

    BEGIN {
        sub carp_error {
            my $error_message = shift;
            my $q = new CGI;
            print $q->start_html( "Error" ),
                  $q->h1( "Error" ),
                  $q->p( "Sorry, the following error has occurred: " ),
                  $q->p( $q->i( $error_message ) ),
                  $q->end_html;
        }
        CGI::Carp::set_message( \&carp_error );
    }
We will see how to incorporate this into a more general solution later in Example 5.3.
Most of our examples up to now and throughout the book include subroutines or blocks of code for displaying errors. Here is an example:
    sub error {
        my( $q, $error_message ) = @_;

        print $q->header( "text/html" ),
              $q->start_html( "Error" ),
              $q->h1( "Error" ),
              $q->p( "Sorry, the following error has occurred: " ),
              $q->p( $q->i( $error_message ) ),
              $q->end_html;
        exit;
    }
You can call this with a CGI object and a reason for the error. It will output an error page and then exit in order to stop executing your script. Note that we print the HTTP header here. One of the biggest challenges in creating a general solution for catching errors is knowing whether or not to print an HTTP header: if one has already been printed and you print another, it will appear at the top of your error page; if one has not been printed and you do not print one as part of the error message, you will trigger a 500 Internal Server Error instead.
Fortunately, CGI.pm has a feature that will track whether a header
has been printed for you already. If you enable this feature, it will
only output an HTTP header once per CGI object. Any future calls to
header
will silently do nothing. You can enable
this feature in one of three ways:
You can pass the -unique_headers flag when you load CGI.pm:
use CGI qw( -unique_headers );
You can set the $CGI::HEADERS_ONCE variable to a true value after you use CGI.pm, but before you create an object:
use CGI;
$CGI::HEADERS_ONCE = 1;
my $q = new CGI;
Finally, if you know that you always want this feature, you can enable it globally for all of your scripts by setting $HEADERS_ONCE to a true value within your copy of CGI.pm. You can do this just as you can with the $POST_MAX and $DISABLE_UPLOADS variables we discussed at the beginning of the chapter. You will find $HEADERS_ONCE in the same configurable section of CGI.pm:
# Change this to 1 to suppress redundant HTTP headers
$HEADERS_ONCE = 0;
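As a quick check of this behavior, the following sketch uses the -unique_headers flag and calls header twice on the same object. It assumes CGI.pm is installed and that the script is run from the command line rather than through a web server; the variable names are only illustrative.

```perl
use strict;
use warnings;
use CGI qw( -unique_headers );

my $q = CGI->new;

# The first call returns the header text; with unique headers enabled,
# a second call on the same object returns an empty string
my $first  = $q->header( "text/html" );
my $second = $q->header( "text/html" );

print "first call:  ", ( length $first  ? "header returned" : "empty string" ), "\n";
print "second call: ", ( length $second ? "header returned" : "empty string" ), "\n";
```

Because the tracking is per object, an error subroutine only benefits from it if it is handed the same CGI object that the rest of the script uses.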
Although adding subroutines to each of your CGI scripts is certainly an acceptable way to catch errors, it’s still not a very general solution. You will probably want to create your own error pages that are customized for your site. Once you start including complex HTML in your subroutines, it will quickly become too difficult to maintain them. If you build error subroutines that output error pages according to your site’s template, and then later someone decides they want to change the site’s look, you must go back and update all of your subroutines. Clearly, a much better option is to create a general error handler that all of your CGI scripts can access.
It is a good idea to create your own Perl module that’s specific to your site. If you host different sites, or have different applications within your site with different looks and feels, you may wish to create a module for each. Within this module, you can place subroutines that you find yourself using across many CGI scripts. These subroutines will vary depending on your site, but one should handle errors.
If you have not created your own Perl module before, don’t worry, it’s quite simple. Example 5-3 shows a very minimal module.
Example 5-3. CGIBook::Error.pm
#!/usr/bin/perl -wT

package CGIBook::Error;

# Export the error subroutine
use Exporter;
@ISA = "Exporter";
@EXPORT = qw( error );

$VERSION = "0.01";

use strict;
use CGI;
use CGI::Carp qw( fatalsToBrowser );

BEGIN {
    sub carp_error {
        my $error_message = shift;
        my $q = new CGI;
        # CGI::Carp has already printed a header, so generate one and discard
        # it; with unique headers enabled, error() will not print a second one
        my $discard_this = $q->header( "text/html" );
        error( $q, $error_message );
    }
    # Pass a reference to the handler; a bare &carp_error would call it instead
    CGI::Carp::set_message( \&carp_error );
}

sub error {
    my( $q, $error_message ) = @_;

    print $q->header( "text/html" ),
          $q->start_html( "Error" ),
          $q->h1( "Error" ),
          $q->p( "Sorry, the following error has occurred: " ),
          $q->p( $q->i( $error_message ) ),
          $q->end_html;
    exit;
}

1;
The only differences between a Perl module and a standard Perl script are that you should save your file with a .pm extension, declare the name of the module’s package with the package statement (this should match the file’s name, minus the .pm extension and with :: substituted for /),[7] and make sure that it returns a true value when evaluated (the reason for the 1; at the bottom).
It is standard practice to store the version of the module in $VERSION. For the sake of convenience, we also use the Exporter module to export the error subroutine. This allows us to refer to it in our scripts as error instead of CGIBook::Error::error. Refer to the Exporter manpage or a primary Perl text, such as Programming Perl, for details on using Exporter.
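The effect of exporting can be seen in a single-file sketch. The CGIBook::Demo package and its greet subroutine are hypothetical, invented for this illustration; the package is defined inline so the example runs on its own, and the explicit call to import stands in for what use would do after loading a real .pm file.

```perl
use strict;
use warnings;

# Hypothetical package, defined inline instead of in CGIBook/Demo.pm
package CGIBook::Demo;

use Exporter;
our @ISA    = ( "Exporter" );
our @EXPORT = qw( greet );

sub greet {
    my $name = shift;
    return "Hello, $name";
}

package main;

# "use CGIBook::Demo;" would load the file and then make this call for us
CGIBook::Demo->import;

my $short = greet( "world" );                  # exported name
my $full  = CGIBook::Demo::greet( "world" );   # fully qualified name

print "$short\n";
print "$full\n";
```

Both calls reach the same subroutine; exporting only saves you from typing the package name.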
You have a couple of options for saving this file. The simplest solution is to save it within the site_perl directory of your Perl libraries, such as /usr/lib/perl5/site_perl/5.005/CGIBook/Error.pm. The site_perl directory includes modules that are site-specific (i.e., not included in Perl’s standard distribution). The paths of your Perl libraries may differ; you can locate them on your system with the following command:
$ perl -e 'print map "$_\n", @INC'
You probably want to create a subdirectory that is unique to your organization, as we did with CGIBook, to hold all the Perl modules you create.
You can use the module as follows:
#!/usr/bin/perl -wT

use strict;
use CGI;
use CGIBook::Error;

my $q = new CGI;

unless ( check_something_important( ) ) {
    error( $q, "Something bad happened." );
}
If you do not have permission to install the module in your Perl library directory, and if you cannot get your system administrator to do it, then you can place the module in another location, for example, /usr/local/apache/perl-lib/CGIBook/Error.pm. Then you must remember to include this directory in the list that Perl searches for modules. The simplest way to do this is with the lib pragma:
#!/usr/bin/perl -wT

use strict;
use lib "/usr/local/apache/perl-lib";

use CGI;
use CGIBook::Error;
. . .
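To see what the lib pragma changes, the short sketch below prints the first directory Perl will search and the file it would look for when asked for CGIBook::Error. The directory is the one used in the text and does not have to exist for the demonstration; the variable names are only illustrative.

```perl
use strict;
use warnings;

# Adds the directory to the front of @INC at compile time
use lib "/usr/local/apache/perl-lib";

# Translate the package name into the relative path Perl looks for
my $package = "CGIBook::Error";
( my $relative_path = "$package.pm" ) =~ s{::}{/}g;

print "First directory searched: $INC[0]\n";
print "File Perl looks for:      $INC[0]/$relative_path\n";
```

Because the directory goes to the front of the list, a module found there takes precedence over one with the same name elsewhere in your Perl libraries.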
[6] These are not necessarily bugs in CGI.pm; CGI.pm strives to maintain compatibility with new servers and browsers that sometimes include buggy, or at least nonstandard, code.
[7] When determining the package name, the file’s name should be relative to a library path in @INC. In our example, we store the file at /usr/lib/perl5/site_perl/5.005/CGIBook/Error.pm. /usr/lib/perl5/site_perl/5.005 is a library directory. Thus, the path to the module relative to the library directory is CGIBook/Error.pm, so the package is CGIBook::Error.