Taint checking is just the sort of security blanket you need if you want to catch bogus data you ought to have caught yourself, but didn't think to catch before passing off to the system. It's a bit like the optional warnings Perl can give you--they may not indicate a real problem, but on average the pain of dealing with the false positives is less than the pain of not dealing with the false negatives. With tainting, the latter pain is even more insistent, because using bogus data doesn't just give the wrong answers; it can blow your system right out of the water, along with your last two years of work. (And maybe your next two, if you didn't make good backups.) Taint mode is useful when you trust yourself to write honest code but don't necessarily trust whoever is feeding you data not to try to trick you into doing something regrettable.
Data is one thing. It's quite another matter when you don't even trust the code you're running. What if you fetch an applet off the Net and it contains a virus, or a time bomb, or a Trojan horse? Taint checking is useless here because the data you're feeding the program may be fine--it's the code that's untrustworthy. You're placing yourself in the position of someone who receives a mysterious device from a stranger, with a note that says, "Just hold this to your head and pull the trigger." Maybe you think it will dry your hair, but you might not think so for very long.
In this realm, prudence is synonymous with paranoia.
What you want is a system that lets you impose a quarantine on suspicious code. The code can continue to exist, and even perform certain functions, but you don't let it wander around doing just anything it feels like. In Perl, you can impose a kind of quarantine using the Safe module.
The Safe module lets you set up a sandbox, a special compartment in which all system operations are trapped, and namespace access is carefully controlled. The low-level, technical details of this module are in a state of flux, so here we'll take a more philosophical approach.
At the most basic level, a Safe object is like a safe, except the idea is to keep the bad people in, not out. In the Unix world, there is a syscall known as chroot(2) that can permanently consign a process to running only in a subdirectory of the directory structure--in its own private little hell, if you will. Once the process is put there, there is no way for it to reach files outside, because there's no way for it to name files outside.[13] A Safe object is a little like that, except that instead of being restricted to a subset of the filesystem's directory structure, it's restricted to a subset of Perl's package structure, which is hierarchical just as the filesystem is.
Another way to look at it is that the Safe object is like one of those observation rooms with one-way mirrors that the police put suspicious characters into. People on the outside can look into the room, but those inside can't see out.
When you create a Safe object, you may give it a package name if you want. If you don't, a new one will be chosen for you:
    use Safe;
    my $sandbox = Safe->new("Dungeon");
    $Dungeon::foo = 1;  # Direct access is discouraged, though.
If you fully qualify variables and functions using the package name supplied to the new method, you can access them in that package from the outside, at least in the current implementation. This may change however, since the current plan is to clone the symbol table into a new interpreter. Slightly more upward compatible might be to set things up first before creating the Safe, as shown below. This is likely to continue working and is a handy way to set up a Safe that has to start off with a lot of "state". (Admittedly, $Dungeon::foo isn't a lot of state.)
    use Safe;
    $Dungeon::foo = 1;  # Still direct access, still discouraged.
    my $sandbox = Safe->new("Dungeon");
But Safe also provides a way to access the compartment's globals even if you don't know the name of the compartment's package. So for maximal upward compatibility (though less than maximal speed), we suggest you use the reval method:
    use Safe;
    my $sandbox = Safe->new();
    $sandbox->reval('$foo = 1');
(In fact, that's the same method you'll use to run suspicious code.) When you pass code into the compartment to compile and run, that code thinks that it's really living in the main package. What the outside world calls $Dungeon::foo, the code inside thinks of as $main::foo, or $::foo, or just $foo if you aren't running under use strict. It won't work to say $Dungeon::foo inside the compartment, because that would really access $Dungeon::Dungeon::foo. By giving the Safe object its own notion of main, variables and subroutines in the rest of your program are protected.
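Here is a minimal sketch of that name mapping, assuming the package-based implementation described above. The compartment name Dungeon and the variable $answer are purely illustrative. The code sets a global from inside the compartment, then reads it back from outside through the varglob method rather than by hard-coding the package name:

```perl
use strict;
use warnings;
use Safe;

# "Dungeon" is an illustrative compartment name, nothing special.
my $sandbox = Safe->new("Dungeon");

# Inside the compartment, this is simply $main::answer.
my $result = $sandbox->reval('$answer = 6 * 7');
die "reval failed: $@" if $@;

# Outside, reach the same variable without spelling out the package.
my $seen = ${ $sandbox->varglob('answer') };

print "root package: ", $sandbox->root, "\n";
print "reval returned $result; outside sees $seen\n";
```

Going through varglob keeps the outer program independent of whatever package name the compartment happens to use.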
To compile and run code inside the compartment, use the reval ("restricted eval") method, passing the code string as its argument. Just as with any other eval STRING construct, compilation errors and run-time exceptions in reval don't kill your program. They just abort the reval and leave the exception in $@, so make sure to check it after every reval call.
Using the initializations given earlier, this code will print out that "foo is now 2":
    $sandbox->reval('$foo++; print "foo is now $main::foo\n"');
    if ($@) {
        die "Couldn't compile code in box: $@";
    }
If you just want to compile code and not run it, wrap your string in a subroutine declaration:
    $sandbox->reval(q{
        our $foo;
        sub say_foo {
            print "foo is now $main::foo\n";
        }
    }, 1);
    die if $@;  # check compilation
This time we passed reval a second argument which, since it's true, tells reval to compile the code under the strict pragma. From within the code string, you can't disable strictness, either, because importing and unimporting are just two of the things you can't normally do in a Safe compartment. There are a lot of things you can't do normally in a Safe compartment--see the next section.
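To see what the strict flag changes, here is a small sketch (the variable name $undeclared is made up for the demonstration). It feeds the same string to reval twice, once without the flag and once with it, and saves $@ after each call:

```perl
use strict;
use warnings;
use Safe;

my $sandbox = Safe->new;

# Without the strict flag, an undeclared global is quietly created.
$sandbox->reval('$undeclared = 1');
my $lax_error = $@;

# With a true second argument, the same string fails to compile.
$sandbox->reval('$undeclared = 1', 1);
my $strict_error = $@;

print "lax:    ", ($lax_error    ? "failed" : "ok"), "\n";
print "strict: ", ($strict_error ? "failed" : "ok"), "\n";
```

Copying $@ into a variable right after each call matters, because the next reval overwrites it.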
Once you've created the say_foo function in the compartment, these are pretty much the same:
    $sandbox->reval('say_foo()');       # Best way.
    die if $@;
    $sandbox->varglob('say_foo')->();   # Call through anonymous glob.
    Dungeon::say_foo();                 # Direct call, strongly discouraged.
The other important thing about a Safe object is that Perl limits the available operations within the sandbox. (You might well let your kid take a bucket and shovel into the sandbox, but you'd probably draw the line at a bazooka.) It's not enough to protect just the rest of your program; you need to protect the rest of your computer, too.
When you compile Perl code in a Safe object, either with reval or rdo (the restricted version of the do FILE operator), the compiler consults a special, per-compartment access-control list to decide whether each individual operation is deemed safe to compile. This way you don't have to stress out (much) worrying about unforeseen shell escapes, opening files when you didn't mean to, strange code assertions in regular expressions, or most of the external access problems folks normally fret about. (Or ought to.)
The interface for specifying which operators should be permitted or restricted is currently under redesign, so we only show how to use the default set of them here. For details, consult the online documentation for the Safe module.
The Safe module doesn't offer complete protection against denial-of-service attacks, especially when used in its more permissive modes. Denial-of-service attacks consume all available system resources of some type, denying other processes access to essential system facilities. Examples of such attacks include filling up the kernel process table, dominating the CPU by running forever in a tight loop, exhausting available memory, and filling up a filesystem. These problems are very difficult to solve, especially portably. See the end of Section 23.3.2 for more discussion of denial-of-service attacks.
Imagine you've got a CGI program that manages a form into which the user may enter an arbitrary Perl expression and get back the evaluated result.[14] Like all external input, the string comes in tainted, so Perl won't let you eval it yet--you'll first have to untaint it with a pattern match. The problem is that you'll never be able to devise a pattern that can detect all possible threats. And you don't dare just untaint whatever you get and send it through the built-in eval. (If you do that, we will be tempted to break into your system and delete the script.)
That's where reval comes in. Here's a CGI script that processes a form with a single form field, evaluates (in scalar context) whatever string it finds there, and prints out the formatted result:
    #!/usr/bin/perl -lTw
    use strict;
    use CGI::Carp 'fatalsToBrowser';
    use CGI qw/:standard escapeHTML/;
    use Safe;

    print header(-type => "text/html;charset=UTF-8"),
        start_html("Perl Expression Results");
    my $expr = param("EXPR") =~ /^([^;]+)/
            ? $1    # return the now-taintless portion
            : croak("no valid EXPR field in form");
    my $answer = Safe->new->reval($expr);
    die if $@;

    print p("Result of", tt(escapeHTML($expr)),
                "is", tt(escapeHTML($answer)));
Imagine some evil user feeding you "print `cat /etc/passwd`" (or worse) as the input string. Thanks to the restricted environment that disallows backticks, Perl catches the problem during compilation and returns immediately. The string in $@ is "quoted execution (``, qx) trapped by operation mask", plus the customary trailing information identifying where the problem happened.
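You can watch this happen without a web server. The following sketch uses a hypothetical helper named try_expr, which evaluates a string in a fresh default compartment and hands back both the value and whatever ended up in $@. The exact wording of the error may differ between Perl versions, so the code relies only on the "trapped by operation mask" part of the message:

```perl
use strict;
use warnings;
use Safe;

# Evaluate $expr in a fresh default compartment; return (value, error).
# try_expr is an illustrative helper, not part of the Safe module.
sub try_expr {
    my ($expr) = @_;
    my $sandbox = Safe->new;
    my $value   = $sandbox->reval($expr);
    my $error   = $@;           # save it before anything can reset it
    return ($value, $error);
}

my ($sum, $sum_err)   = try_expr('2 + 3 * 4');
my ($out, $shell_err) = try_expr('`echo pwned`');

print "arithmetic gave $sum\n" unless $sum_err;
print "backticks refused: $shell_err"
    if $shell_err =~ /trapped by operation mask/;
```

The backtick expression never runs at all; it is rejected while the compartment is compiling the string.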
Because we didn't say otherwise, the compartments we've been creating all used the default set of allowable operations. How you go about declaring specific operations permitted or forbidden isn't important here. What is important is that this is completely under the control of your program. And since you can create multiple Safe objects in your program, you can confer various degrees of trust upon various chunks of code, depending on where you got them from.
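As a rough illustration of two compartments with different degrees of trust, the sketch below assumes the deny method documented for the Safe module at the time of this rewrite, and assumes that the join operator is in the default permitted set. The choice of join, and the helper name run_in, are only for the demonstration. Check your own version's documentation before relying on either assumption.

```perl
use strict;
use warnings;
use Safe;

my $trusted   = Safe->new;   # default operator set
my $untrusted = Safe->new;   # same defaults, minus one operator
$untrusted->deny('join');    # 'join' chosen only for illustration

# Run $code in $box; return its value, or the error text on failure.
# run_in is a hypothetical helper name.
sub run_in {
    my ($box, $code) = @_;
    my $value = $box->reval($code);
    return $@ ? "refused: $@" : $value;
}

my $code = 'join("-", 1, 2, 3)';
print "trusted:   ", run_in($trusted,   $code), "\n";
print "untrusted: ", run_in($untrusted, $code), "\n";
```

The same string is acceptable in one compartment and rejected in the other, and the decision rests entirely with the enclosing program.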
If you'd like to play around with Safe, here's a little interactive Perl calculator. It's a calculator in that you can feed it numeric expressions and immediately see their results. But it's not limited to numbers alone. It's more like the looping example under eval in Chapter 29, where you can take whatever they give you, evaluate it, and give them back the result. The difference is that the Safe version doesn't execute just anything you feel like. You can run this calculator interactively at your terminal, typing in little bits of Perl code and checking the answers, to get a feel for what sorts of protection Safe provides.
    #!/usr/bin/perl -w
    # safecalc - demo program for playing with Safe
    use strict;
    use Safe;

    my $sandbox = Safe->new();
    while (1) {
        print "Input: ";
        my $expr = <STDIN>;
        exit unless defined $expr;
        chomp($expr);
        print "$expr produces ";
        local $SIG{__WARN__} = sub { die @_ };
        my $result = $sandbox->reval($expr, 1);
        if ($@ =~ s/at \(eval \d+\).*//) {
            printf "[%s]: %s", $@ =~ /trapped by operation mask/
                ? "Security Violation" : "Exception", $@;
        }
        else {
            print "[Normal Result] $result\n";
        }
    }
Warning: the Safe module is currently being redesigned to run each compartment within a completely independent Perl interpreter inside the same process. (This is the strategy that Apache's mod_perl employs when running precompiled Perl scripts.) Details are still hazy at this time, but our crystal ball suggests that blindly poking at things inside the compartment using a named package won't get you very far after the impending rewrite. If you're running a version of Perl later than 5.6, check the release notes in perldelta(1) to see what's changed, or consult the documentation for the Safe module itself. (Of course, you always do that anyway, right?)
Safe compartments are available for when the really scary stuff is going down, but that doesn't mean you should let down your guard totally when you're doing the everyday stuff around home. You need to cultivate an awareness of your surroundings and look at things from the point of view of someone wanting to break in. You need to take proactive steps like keeping things well lit and trimming the bushes that can hide various lurking problems.
Perl tries to help you in this area, too. Perl's conventional parsing and execution scheme avoids the pitfalls that shell programming languages often fall prey to. There are many extremely powerful features in the language, but by design, they're syntactically and semantically bounded in ways that keep things under the control of the programmer. With few exceptions, Perl evaluates each token only once. Something that looks like it's being used as a simple data variable won't suddenly go rooting around in your filesystem.
Unfortunately, that sort of thing can happen if you call out to the shell to run other programs for you, because then you're running under the shell's rules instead of Perl's. The shell is easy to avoid, though--just use the list argument forms of the system, exec, or piped open functions. Although backticks don't have a list-argument form that is proof against the shell, you can always emulate them as described in Section 23.1.3. (While there's no syntactic way to make backticks take an argument list, a multi-argument form of the underlying readpipe operator is in development; but as of this writing, it isn't quite ready for prime time.)
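As a sketch of the list-argument approach, the following captures a command's output through a piped open without involving the shell. It assumes a Unix-like system with an echo program on the PATH and a Perl recent enough to support the list form of piped open. The helper name run_no_shell is made up for this example.

```perl
use strict;
use warnings;

# Run a command with no shell involved and return its output.
# The list form of piped open hands each argument to the program verbatim.
# run_no_shell is a hypothetical helper name.
sub run_no_shell {
    my ($program, @args) = @_;
    open(my $pipe, '-|', $program, @args)
        or die "can't run $program: $!";
    local $/;                   # slurp all the output at once
    my $output = <$pipe>;
    close($pipe) or die "$program failed: $! $?";
    return $output;
}

# Shell metacharacters arrive as plain text, not as a second command.
my $hostile = 'hello; echo injected';
print run_no_shell('echo', $hostile);
```

Because no shell ever parses the argument, the semicolon is just another character handed to echo.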
When you use a variable in an expression (including when you interpolate it into a double-quoted string), there's No Chance that the variable will contain Perl code that does something you aren't intending.[15] Unlike the shell, Perl never needs defensive quotes around variables, no matter what might be in them.
    $new = $old;                # No quoting needed.
    print "$new items\n";       # $new can't hurt you.
    $phrase = "$new items\n";   # Nor here, neither.
    print $phrase;              # Still perfectly ok.
Perl takes a "what you see is what you get" approach. If you don't see an extra level of interpolation, then it doesn't happen. It is possible to interpolate arbitrary Perl expressions into strings, but only if you specifically ask Perl to do that. (Even so, the contents are still subject to taint checking if you're in taint mode.)
    $phrase = "You lost @{[ 1 + int rand(6) ]} hit points\n";
Interpolation is not recursive, however. You can't just hide an arbitrary expression in a string:
    $count = '1 + int rand(6)';             # Some random code.
    $saying = "$count hit points";          # Merely a literal.
    $saying = "@{[$count]} hit points";     # Also a literal.
Both assignments to $saying would produce "1 + int rand(6) hit points", without evaluating the interpolated contents of $count as code. To get Perl to do that, you have to call eval STRING explicitly:
    $code = '1 + int rand(6)';
    $die_roll = eval $code;
    die if $@;
If $code were tainted, that eval STRING would raise its own exception. Of course, you almost never want to evaluate random user code--but if you did, you should look into using the Safe module. You may have heard of it.
There is one place where Perl can sometimes treat data as code; namely, when the pattern in a qr//, m//, or s/// operator contains either of the new regular expression assertions, (?{ CODE }) or (??{ CODE }). These pose no security issues when used as literals in pattern matches:
    $cnt = $n = 0;
    while ($data =~ /( \d+ (?{ $n++ }) | \w+ )/gx) {
        $cnt++;
    }
    print "Got $cnt words, $n of which were digits.\n";
But existing code that interpolates variables into matches was written with the assumption that the data is data, not code. The new constructs might have introduced a security hole into previously secure programs. Therefore, Perl refuses to evaluate a pattern if an interpolated string contains a code assertion, and raises an exception instead. If you really need that functionality, you can always enable it with the lexically scoped use re 'eval' pragma. (You still can't use tainted data for an interpolated code assertion, though.)
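The sketch below shows both behaviors side by side. The pattern strings are invented for the demonstration, and the test relies only on the "Eval-group not allowed at runtime" part of the error text. A package variable is used as the counter so that the example doesn't depend on how a particular Perl version handles lexicals inside interpolated code assertions.

```perl
use strict;
use warnings;

our $count = 0;
my $from_user = '(?{ print "gotcha\n" })';  # pretend this came from a form

# Interpolating a code assertion is refused at run time by default.
my $ok      = eval { "anything" =~ /$from_user/; 1 };
my $refusal = $ok ? "" : $@;
print "default: ", ($ok ? "pattern ran" : "refused"), "\n";

# Opting in is lexically scoped, so it covers only this block.
{
    use re 'eval';
    my $our_own = '(?{ $count++ })';        # a string we wrote ourselves
    "a" =~ /a$our_own/;
}
print "with use re 'eval': assertion ran $count time(s)\n";
```

Keeping the pragma inside the smallest possible block limits the opt-in to patterns you built yourself.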
A completely different sort of security concern that can come up with regular expressions is denial-of-service problems. These can make your program quit too early, or run too long, or exhaust all available memory--and sometimes even dump core, depending on the phase of the moon.
When you process user-supplied patterns, you don't have to worry about interpreting random Perl code. However, the regular expression engine has its own little compiler and interpreter, and the user-supplied pattern is capable of giving the regular expression compiler heartburn. If an interpolated pattern is not a valid pattern, a run-time exception is raised, which is fatal unless trapped. If you do try to trap it, make sure to use only eval BLOCK, not eval STRING, because the extra evaluation level of the latter would in fact allow the execution of random Perl code. Instead, do something like this:
    if (not eval { "" =~ /$match/; 1 }) {
        # (Now do whatever you want for a bad pattern.)
    }
    else {
        # We know pattern is at least safe to compile.
        if ($data =~ /$match/) { … }
    }
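A variation on the same idea, offered only as a sketch: compile the pattern once with qr// inside eval BLOCK and keep the compiled regex for later matches. The helper name compile_pattern and the sample patterns are hypothetical.

```perl
use strict;
use warnings;

# Return a compiled regex, or undef if the pattern is not valid.
# compile_pattern is a hypothetical helper name.
sub compile_pattern {
    my ($match) = @_;
    my $regex = eval { qr/$match/ };    # eval BLOCK, never eval STRING
    return $regex;
}

for my $candidate ('^\d+$', '(unclosed') {
    my $regex = compile_pattern($candidate);
    print "$candidate => ", (defined $regex ? "compiles" : "rejected"), "\n";
}
```

This guards only against invalid patterns. A pattern that compiles can still take a very long time to match.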
A more troubling denial-of-service problem is that given the right data and the right search pattern, your program can appear to hang forever. That's because some pattern matches require exponential time to compute, and this can easily exceed the MTBF rating on our solar system. If you're especially lucky, these computationally intensive patterns will also require exponential storage. If so, your program will exhaust all available virtual memory, bog down the rest of the system, annoy your users, and either die with an orderly "Out of memory!" error or else leave behind a really big core dump file, though perhaps not as large as the solar system.
Like most denial-of-service attacks, this one is not easy to solve. If your platform supports the alarm function, you could time out the pattern match. Unfortunately, Perl cannot (currently) guarantee that the mere act of handling a signal won't ever trigger a core dump. (This is scheduled to be fixed in a future release.) You can always try it, though, and even if the signal isn't handled gracefully, at least the program won't run forever.
If your system supports per-process resource limits, you could set these in your shell before calling the Perl program, or use the BSD::Resource module from CPAN to do so directly from Perl. The Apache web server allows you to set time, memory, and file size limits on CGI scripts that it launches.
Finally, we hope we've left you with some unresolved feelings of insecurity. Remember, just because you're paranoid doesn't mean they're not out to get you. So you might as well enjoy it.
[13] Some sites do this for executing all CGI scripts, using loopback, read-only mounts. It's something of a pain to set up, but if someone ever escapes, they'll find there's nowhere to go.
[14] Please don't laugh. We really have seen web pages that do this. Without a Safe!
[15] Although if you're generating a web page, it's possible to emit HTML tags, including JavaScript code, that might do something that the remote browser isn't expecting.