Whether you're dealing with a user sitting at the keyboard typing commands or someone sending information across the network, you need to be careful about the data coming into your programs, since the other person may, either maliciously or accidentally, send you data that will do more harm than good. Perl provides a special security-checking mechanism called taint mode, whose purpose is to isolate tainted data so that you won't use it to do something you didn't intend to do. For instance, if you mistakenly trust a tainted filename, you might end up appending an entry to your password file when you thought you were appending to a log file. The mechanism of tainting is covered in Section 23.1.
In multitasking environments, offstage actions by unseen actors can affect the security of your own program. If you presume exclusive ownership of external objects (especially files) as though yours were the only process on the system, you expose yourself to errors substantially subtler than those that come from directly handling data or code of dubious provenance. Perl helps you out a little here by detecting some situations that are beyond your control, but for those that you can control, the key is understanding which approaches are proof against unseen meddlers. Section 23.2 discusses these matters.
If the data you get from a stranger happens to be a bit
of source code to execute, you need to be even more careful than you
would with their data. Perl provides checks to intercept stealthy code
masquerading as data so you don't execute it unintentionally. If you do
want to execute foreign code, though, the Safe
module
lets you quarantine suspect code where it can't do any harm and might
possibly do some good. These are the topics of Section 23.3.
Perl makes it easy to program securely even when your program is being used by someone less trustworthy than the program itself. That is, some programs need to grant limited privileges to their users, without giving away other privileges. Setuid and setgid programs fall into this category on Unix, as do programs running in various privileged modes on other operating systems that support such notions. Even on systems that don't, the same principle applies to network servers and to any programs run by those servers (such as CGI scripts, mailing list processors, and daemons listed in /etc/inetd.conf). All such programs require a higher level of scrutiny than normal.
Even programs run from the command line are sometimes
good candidates for taint mode, especially if they're meant to be run
by a privileged user. Programs that act upon untrusted data, like
those that generate statistics from log files or use
LWP::*
or Net::*
to fetch remote
data, should probably run with tainting explicitly turned on; programs
that are not prudent risk being turned into "Trojan horses". Since
programs don't get any kind of thrill out of risk taking, there's no
particular reason for them not to be careful.
Compared with Unix command-line shells, which are really just frameworks for calling other programs, Perl is easy to program securely because it's straightforward and self-contained. Unlike most shell programming languages, which are based on multiple, mysterious substitution passes on each line of the script, Perl uses a more conventional evaluation scheme with fewer hidden snags. Additionally, because the language has more built-in functionality, it can rely less upon external (and possibly untrustworthy) programs to accomplish its purposes.
Under Unix, Perl's home town, the preferred way to compromise system security was to cajole a privileged program into doing something it wasn't supposed to do. To stave off such attacks, Perl developed a unique approach for coping with hostile environments. Perl automatically enables taint mode whenever it detects its program running with differing real and effective user or group IDs.[1] Even if the file containing your Perl script doesn't have the setuid or setgid bits turned on, that script can still find itself executing in taint mode. This happens if your script was invoked by another program that was itself running under differing IDs. Perl programs that weren't designed to operate under taint mode tend to expire prematurely when caught violating safe tainting policy. This is just as well, since these are the sorts of shenanigans that were historically perpetrated on shell scripts to compromise system security. Perl isn't that gullible.
You can also enable taint mode explicitly with the
-T
command-line switch. You should do this for
daemons, servers, and any programs that run on behalf of someone else,
such as CGI scripts. Programs that can be run remotely and anonymously
by anyone on the Net are executing in the most hostile of
environments. You should not be afraid to say "No!" occasionally.
Contrary to popular belief, you can exercise a great deal of prudence
without dehydrating into a wrinkled prude.
On the more security-conscious sites, running all CGI scripts
under the -T
flag isn't just a good a idea:
it's the law. We're not claiming that running in taint mode is
sufficient to make your script secure. It's not, and it would take a
whole book just to mention everything that would. But if you aren't
executing your CGI scripts under taint mode, you've needlessly
abandoned the strongest protection Perl can give you.
While in taint mode, Perl takes special precautions called taint checks to prevent traps both obvious and subtle. Some of these checks are reasonably simple, such as verifying that dangerous environment variables aren't set and that directories in your path aren't writable by others; careful programmers have always used checks like these. Other checks, however, are best supported by the language itself, and it is these checks especially that contribute to making a privileged Perl program more secure than the corresponding C program, or a Perl CGI script more secure than one written in any language without taint checks. (Which, as far as we know, is any language other than Perl.)
The principle is simple: you may not use data derived from outside your program to affect something else outside your program--at least, not by accident. Anything that comes from outside your program is marked as tainted, including all command-line arguments, environment variables, and file input. Tainted data may not be used directly or indirectly in any operation that invokes a subshell, nor in any operation that modifies files, directories, or processes. Any variable set within an expression that has previously referenced a tainted value becomes tainted itself, even if it is logically impossible for the tainted value to influence the variable. Because taintedness is associated with each scalar, some individual values in an array or hash might be tainted and others might not. (Only the values in a hash can be tainted, though, not the keys.)
The following code illustrates how tainting would work if you executed all these statements in order. Statements marked "Insecure" will trigger an exception, whereas those that are "OK" will not.
$arg = shift(@ARGV); # $arg is now tainted (due to @ARGV). $hid = "$arg, 'bar'"; # $hid also tainted (due to $arg). $line = <>; # Tainted (reading from external file). $path = $ENV{PATH}; # Tainted due to %ENV, but see below. $mine = 'abc'; # Not tainted. system "echo $mine"; # Insecure until PATH set. system "echo $arg"; # Insecure: uses sh with tainted $arg. system "echo", $arg; # OK once PATH set (doesn't use sh). system "echo $hid"; # Insecure two ways: taint, PATH. $oldpath = $ENV{PATH}; # $oldpath is tainted (due to $ENV). $ENV{PATH} = '/bin:/usr/bin'; # (Makes it OK to execute other programs.) $newpath = $ENV{PATH}; # $newpath is NOT tainted. delete @ENV{qw{IFS CDPATH ENV BASH_ENV}}; # Makes %ENV safer. system "echo $mine"; # OK, is secure once path is reset. system "echo $hid"; # Insecure via tainted $hid. open(OOF, "< $arg"); # OK (read-only opens not checked). open(OOF, "> $arg"); # Insecure (trying to write to tainted arg). open(OOF, "echo $arg|") # Insecure due to tainted $arg, but… or die "can't pipe from echo: $!"; open(OOF,"-|") # Considered OK: see below for taint or exec "echo", $arg # exemption on exec'ing a list. or die "can't exec echo: $!"; open(OOF,"-|", "echo", $arg # Same as previous, likewise OKish. or die "can't pipe from echo: $!"; $shout = `echo $arg`; # Insecure via tainted $arg. $shout = `echo abc`; # $shout is tainted due to backticks. $shout2 = `echo $shout`; # Insecure via tainted $shout. unlink $mine, $arg; # Insecure via tainted $arg. umask $arg; # Insecure via tainted $arg. exec "echo $arg"; # Insecure via tainted $arg passed to shell. exec "echo", $arg; # Considered OK! (But see below.) exec "sh", '-c', $arg; # Considered OK, but isn't really!
If you try to do something insecure, you get an exception (which
unless trapped, becomes a fatal error) such as "Insecure
dependency
" or "Insecure $ENV{PATH}
". See
Section 23.1.2
later.
If you pass a LIST
to a
system
, exec
, or pipe
open
, the arguments are not inspected for
taintedness, because with a LIST
of
arguments, Perl doesn't need to invoke the potentially dangerous shell
to run the command. You can still easily write an insecure
system
, exec
, or pipe
open
using the LIST
form, as demonstrated in the final example above. These forms are
exempt from checking because you are presumed to know what you're
doing when you use them.
Sometimes, though, you can't tell how many arguments you're passing. If you supply these functions with an array[2] that contains just one element, then it's just as though you passed one string in the first place, so the shell might be used. The solution is to pass an explicit path in the indirect-object slot:
system @args; # Won't call the shell unless @args == 1. system { $args[0] } @args; # Bypasses shell even with one-argument list.
To test whether a scalar variable contains tainted
data, you can use the following is_tainted
function. It makes use of the fact that eval
STRING
raises an exception if you try to
compile tainted data. It doesn't matter that the
$nada
variable used in the expression to compile
will always be empty; it will still be tainted if
$arg
is tainted. The outer
eval
BLOCK
isn't doing
any compilation. It's just there to catch the exception raised if
the inner eval
is given tainted data. Since the
$@
variable is guaranteed to be nonempty after
each eval
if an exception was raised and empty
otherwise, we return the result of testing whether its length was
zero:
sub is_tainted { my $arg = shift; my $nada = substr($arg, 0, 0); # zero-length local $@; # preserve caller's version eval { eval "# $nada" }; return length($@) != 0; }
But testing for taintedness only gets you so far. Usually you
know perfectly well which variables contain tainted data--you just
have to clear the data's taintedness. The only official way to
bypass the tainting mechanism is by referencing submatches returned
by an earlier regular expression match.[3] When you write a pattern that contains capturing
parentheses, you can access the captured substrings through match
variables like $1
, $2
, and
$+
, or by evaluating the pattern in list context.
Either way, the presumption is that you knew what you were doing
when you wrote the pattern and wrote it to weed out anything
dangerous. So you need to give it some real thought--never blindly
untaint, or else you defeat the entire mechanism.
It's better to verify that the variable contains only good
characters than to check whether it contains any bad characters.
That's because it's far too easy to miss bad characters that you
never thought of. For example, here's a test to make sure
$string
contains nothing but "word" characters
(alphabetics, numerics, and underscores), hyphens, at signs, and
dots:
if ($string =~ /^([-@w.]+)$/) { $string = $1; # $string now untainted. } else { die "Bad data in $string"; # Log this somewhere. }
This renders $string
fairly secure to use
later in an external command, since /w+/
doesn't
normally match shell metacharacters, nor are those other characters
going to mean anything special to the shell.[4] Had we used /(.+)/s
instead, it
would have been unsafe because that pattern lets everything through.
But Perl doesn't check for that. When untainting, be exceedingly
careful with your patterns. Laundering data by using regular
expressions is the only approved internal
mechanism for untainting dirty data. And sometimes it's the wrong
approach entirely. If you're in taint mode because you're running
set-id and not because you intentionally turned on
-T
, you can reduce your risk by forking a
child of lesser privilege; see Section 23.1.2.
The use re 'taint
' pragma disables the
implicit untainting of any pattern matches through the end of the
current lexical scope. You might use this pragma if you just want to
extract a few substrings from some potentially tainted data, but
since you aren't being mindful of security, you'd prefer to leave
the substrings tainted to guard against unfortunate accidents
later.
Imagine you're matching something like this, where
$fullpath
is tainted:
($dir, $file) = $fullpath =~ m!(.*/)(.*)!s;
By default, $dir
and
$file
would now be untainted. But you probably
didn't want to do that so cavalierly, because you never really
thought about the security issues. For example, you might not be
terribly happy if $file
contained the string
"; rm -rf * ;
", just to name one rather egregious
example. The following code leaves the two result variables tainted
if $fullpath
was tainted:
{ use re 'taint'; ($dir, $file) = $fullpath =~ m!(.*/)(.*)!s; }
A good strategy is to leave submatches tainted by default over the whole source file and only selectively permit untainting in nested scopes as needed:
use re 'taint'; # remainder of file now leaves $1 etc tainted { no re 'taint'; # this block now untaints re matches if ($num =~ /^(d+)$/) { $num = $1; } }
Input from a filehandle or a directory handle is automatically
tainted, except when it comes from the special filehandle,
DATA
. If you want to, you can mark other handles
as trusted sources via the IO::Handle
module's
untaint
function:
use IO::Handle; IO::Handle::untaint(*SOME_FH); # Either procedurally SOME_FH->untaint(); # or using the OO style.
Turning off tainting on an entire filehandle is a risky move. How do you really know it's safe? If you're going to do this, you should at least verify that nobody but the owner can write to the file.[5] If you're on a Unix filesystem (and one that prudently restricts chown (2) to the superuser), the following code works:
use File::stat; use Symbol 'qualify_to_ref'; sub handle_looks_safe(*) { my $fh = qualify_to_ref(shift, caller); my $info = stat($fh); return unless $info; # owner neither superuser nor "me", whose # real uid is in the $< variable if ($info->uid != 0 && $info->uid != $<) { return 0; } # check whether group or other can write file. # use 066 to detect for readability also if ($info->mode & 022) { return 0; } return 1; } use IO::Handle; SOME_FH->untaint() if handle_looks_safe(*SOME_FH);
We called stat
on the filehandle, not the
filename, to avoid a dangerous race condition. See Section 23.2.2 later in this
chapter.
Note that this routine is only a good start. A slightly more
paranoid version would check all parent directories as well, even
though you can't reliably stat
a directory
handle. But if any parent directory is world-writable, you know
you're in trouble whether or not there are race conditions.
Perl has its own notion of which operations are dangerous, but it's still possible to get into trouble with other operations that don't care whether they use tainted values. It's not always enough to be careful of input. Perl output functions don't test whether their arguments are tainted, but in some environments, this matters. If you aren't careful of what you output, you might just end up spitting out strings that have unexpected meanings to whoever is processing the output. If you're running on a terminal, special escape and control codes could cause the viewer's terminal to act strangely. If you're in a web environment and you blindly spit back out data that was given to you, you could unknowingly produce HTML tags that would drastically alter the page's appearance. Worse still, some markup tags can even execute code back on the browser.
Imagine the common case of a guest book where visitors enter
their own messages to be displayed when others come calling. A
malicious guest could supply unsightly HTML tags or put in
<SCRIPT>…</SCRIPT>
sequences that
execute code (like JavaScript) back in the browsers of subsequent
guests.
Just as you should carefully check for only good characters when inspecting tainted data that accesses resources on your own system, you should apply the same care in a web environment when presenting data supplied by a user. For example, to strip the data of any character not in the specified list of good characters, try something like this:
$new_guestbook_entry =~ tr[_a-zA-Z0-9 ,./!?()@+*-][]dc;
You certainly wouldn't use that to clean up a filename, since you probably don't want filenames with spaces or slashes, just for starters. But it's enough to keep your guest book free of sneaky HTML tags and entities. Each data-laundering case is a little bit different, so always spend time deciding what is and what is not permitted. The tainting mechanism is intended to catch stupid mistakes, not to remove the need for thought.
When you execute another program from within your Perl
script, no matter how, Perl checks to make sure your
PATH
environment variable is secure. Since it
came from your environment, your PATH
starts out
tainted, so if you try to run another program, Perl raises an
"Insecure $ENV{PATH}
" exception. When you set it
to a known, untainted value, Perl makes sure that each directory in
that path is nonwritable by anyone other than the directory's owner
and group; otherwise, it raises an "Insecure
directory
" exception.
You may be surprised to find that Perl cares about your
PATH
even when you specify the full pathname of
the command you want to execute. It's true that with an absolute
filename, the PATH
isn't used to find the
executable to run. But there's no reason to trust the program you're
running not to turn right around and execute some
other program and get into trouble because of
the insecure PATH
. So Perl forces you to set a
secure PATH
before you call any program, no
matter how you say to call it.
The PATH
isn't the only
environment variable that can bring grief. Because some shells use
the variables IFS
, CDPATH
,
ENV
, and BASH_ENV
, Perl makes
sure that those are all either empty or untainted before it will run
another command. Either set these variables to something known to be
safe, or else delete them from the environment altogether:
delete @ENV{qw(IFS CDPATH ENV BASH_ENV)}; # Make %ENV safer
Features convenient in a normal environment can
become security concerns in a hostile one. Even if you remember to
disallow filenames containing newlines, it's important to understand
that open
accesses more than just named files.
Given appropriate ornamentation on the filename argument, one- or
two-argument calls to open
can also run arbitrary
external commands via pipes, fork extra copies of the current
process, duplicate file descriptors, and interpret the special
filename "-
" as an alias for standard input or
output. It can also ignore leading and trailing whitespace that
might disguise such fancy arguments from your check patterns. While
it's true that Perl's taint checking will catch tainted arguments
used for pipe opens (unless you use a separated argument list) and
any file opens that aren't read-only, the exception this raises is
still likely to make your program misbehave.
If you intend to use any externally derived data as part of a
filename to open, at least include an explicit mode separated by a
space. It's probably safest, though, to use either the low-level
sysopen
or the three-argument form of
open
:
# Magic open--could be anything open(FH, $file) or die "can't magic open $file: $!";# Guaranteed to be a read-only file open and not a pipe # or fork, but still groks file descriptors and "-", # and ignores whitespace at either end of name. open(FH, "< $file") or die "can't open $file: $!"; # WYSIWYG open: disables all convenience features. open(FH, "<", $file) or die "can't open $file: $!"; # Same properties as WYSIWYG 3-arg version. require Fcntl; sysopen(FH, $file, O_RDONLY) or die "can't sysopen $file: $!";
Even these steps aren't quite good enough. Perl doesn't prevent you from opening tainted filenames for reading, so you need to be careful of what you show people. A program that opens an arbitrary, user-supplied filename for reading and then reveals that file's contents is still a security problem. What if it's a private letter? What if it's your system password file? What if it's salary information or your stock portfolio?
Look closely at filenames provided by a potentially
hostile user[6] before opening them. For example, you might want to
verify that there are no sneaky directory components in the path.
Names like "../../../../../../../etc/passwd
" are
notorious tricks of this sort. You can protect yourself by making
sure there are no slashes in the pathname (assuming that's your
system's directory separator). Another common trick is to put
newlines or semicolons into filenames that will later be interpreted
by some poor, witless command-line interpreter that can be fooled
into starting a new command in the middle of the filename. This is
why taint mode discourages uninspected external commands.
The following discussion pertains to some nifty security facilities of Unix-like systems. Users of other systems may safely (or rather, unsafely) skip this section.
If you're running set-id, try to arrange that,
whenever possible, you do dangerous operations with the privileges
of the user, not the privileges of the program. That is, whenever
you're going to call open
,
sysopen
, system
, backticks,
and any other file or process operations, you can protect yourself
by setting your effective UID or GID back to the real UID or GID. In
Perl, you can do this for setuid scripts by saying $> =
$<
(or $EUID = $UID
if you
use English
) and for setgid scripts by saying
$) = $(
($EGID = $GID
). If
both IDs are set, you should reset both. However, sometimes this
isn't feasible, because you might still need those increased
privileges later in your program.
For those cases, Perl provides a reasonably safe way to open a
file or pipe from within a set-id program. First, fork a child using
the special open
syntax that connects the parent
and child by a pipe. In the child, reset the user and group IDs back
to their original or known safe values. You also get to modify any
of the child's per-process attributes without affecting the parent,
letting you change the working directory, set the file creation
mask, or fiddle with environment variables. No longer executing
under extra privileges, the child process at last calls
open
and passes whatever data it manages to
access on behalf of the mundane but demented user back up to its
powerful but justly paranoid parent.
Even though system
and
exec
don't use the shell when you supply them
with more than one argument, the backtick operator admits no such
alternative calling convention. Using the forking technique, we
easily emulate backticks without fear of shell escapes, and with
reduced (and therefore safer) privileges:
use English; # to use $UID, etc die "Can't fork open: $!" unless defined($pid = open(FROMKID, "-|")); if ($pid) { # parent while (<FROMKID>) { # do something } close FROMKID; } else { $EUID = $UID; # setuid(getuid()) $EGID = $GID; # setgid(getgid()), and initgroups(2) on getgroups(2) chdir("/") or die "can't chdir to /: $!"; umask(077); $ENV{PATH} = "/bin:/usr/bin"; exec 'myprog', 'arg1', 'arg2'; die "can't exec myprog: $!"; }
This is by far the best way to call other programs
from a set-id script. You make sure never to use the shell to
execute anything, and you drop your privileges before you yourself
exec
the program. (But because the list forms of
system
, exec
, and pipe
open
are specifically exempted from taint checks
on their arguments, you must still be careful of what you pass
in.)
If you don't need to drop privileges, and just want
to implement backticks or a pipe open
without
risking the shell intercepting your arguments, you could use
this:
open(FROMKID, "-|") or exec("myprog", "arg1", "arg2") or die "can't run myprog: $!";
and then just read from
FROMKID
in the parent. As of the 5.6.1 release of
Perl, you can write that as:
open(FROMKID, "-|", "myprog", "arg1", "arg2");
The forking technique is useful for more than just running
commands from a set-id program. It's also good for opening files
under the ID of whoever ran the program. Suppose you had a setuid
program that needed to open a file for writing. You don't want to
run the open
under your extra privileges, but you
can't permanently drop them, either. So arrange for a forked copy
that's dropped its privileges to do the open
for
you. When you want to write to the file, write to the child, and it
will then write to the file for you.
use English; defined ($pid = open(SAFE_WRITER, "|-")) or die "Can't fork: $!"; if ($pid) { # you're the parent. write data to SAFE_WRITER child print SAFE_WRITER "@output_data "; close SAFE_WRITER or die $! ? "Syserr closing SAFE_WRITER writer: $!" : "Wait status $? from SAFE_WRITER writer"; } else { # you're the child, so drop extra privileges ($EUID, $EGID) = ($UID, $GID); # open the file under original user's rights open(FH, "> /some/file/path") or die "can't open /some/file/path for writing: $!"; # copy from parent (now stdin) into the file while (<STDIN>) { print FH $_; } close(FH) or die "close failed: $!"; exit; # Don't forget to make the SAFE_WRITER disappear. }
Upon failing to open the file, the child prints an error
message and exits. When the parent writes to the now-defunct child's
filehandle, it triggers a broken pipe signal
(SIGPIPE
), which is fatal unless trapped or
ignored. See Section
16.1 in Chapter 16.
[1] The setuid bit in Unix permissions is mode 04000, and the setgid bit is 02000; either or both may be set to grant the user of the program some of the privileges of the owner (or owners) of the program. (These are collectively known as set-id programs.) Other operating systems may confer special privileges on programs in other ways, but the principle is the same.
[2] Or a function that produces a list.
[3] An unofficial way is by storing the tainted string as the key to a hash and fetching back that key. Because keys aren't really full SVs (internal name scalar values), they don't carry the taint property. This behavior may be changed someday, so don't rely on it. Be careful when handling keys, lest you unintentionally untaint your data and do something unsafe with them.
[4] Unless you were using an intentionally broken locale. Perl
assumes that your system's locale definitions are potentially
compromised. Hence, when running under the use
locale
pragma, patterns with a symbolic character
class in them, such as w
or
[[:alpha:]]
, produce tainted results.
[5] Although you can untaint a directory handle, too, this
function only works on a filehandle. That's because given a
directory handle, there's no portable way to extract its file
descriptor to stat
.
[6] And on the Net, the only users you can trust not to be potentially hostile are the ones who are being actively hostile instead.