PHP is an interesting programming language with quite a history—from a security point of view as well as in general. Before we start learning how the language can be used to create obfuscated code and discover the features for creating unreadable snippets, let us take a short journey through the language's history and see how it developed from a small collection of useful scripts to a powerful object-oriented programming (OOP) language. To understand this chapter properly you should have some very basic PHP skills.
History and overview
It all began in 1994, when Greenland-born developer Rasmus Lerdorf attempted to create and publish a set of scripts that would be useful for generating interactive home pages. Most of those small tools and scripts covered logging tasks to ease the process of generating visitor stats and provide basic counters, and all were written in C and Perl. Sometime later, Lerdorf added a form interpreter and renamed the package from PHP (Personal Homepage) to PHP/FI (Personal Homepage and Form Interpreter). The first public release of the language occurred in 1995, when Lerdorf added support for database interaction, and the collection of tools became increasingly powerful in terms of helping users create interactive Web applications. At that time, the syntax did not resemble PHP as it exists today; as the following PHP/FI code example illustrates, instructions were embedded in HTML comments:
<!--getenv HTTP_USER_AGENT-->
<!--ifsubstr $exec_result Mozilla-->
Hey, you are using Netscape!<p>
In 1997, Zeev Suraski and Andi Gutmans joined Lerdorf and started to rewrite the codebase. The result was PHP/FI 2, which became the foundation for the first release of PHP proper in June 1998, with the major version number 3. At this point, the meaning of the acronym changed from Personal Homepage to PHP: Hypertext Preprocessor. Meanwhile, the language continued to grow, and even became the runtime on which Suraski and Gutmans relied as they created an e-commerce solution they were working on at the time. In addition, the first steps toward OOP integration were taken at this time, with PHP 3 offering plain encapsulation of functions into class constructs.
A byproduct of the 1997 rewrite was a PHP scripting engine called the Zend Engine, and this became the flagship product of the Israel-based company Suraski and Gutmans later formed, called Zend Technologies (the name Zend is a combination of the founders' first names, Zeev and Andi). Over the next few years, PHP managed to gain quite a bit of market share among server-side runtimes for Web applications, and in May 2000, PHP 4 was released. Running on the Zend Engine 1.0, PHP 4 introduced numerous rudimentary OOP features, taking the language one step closer to “real” OOP. Four years later, in 2004, PHP 5 was released, complete with abstract classes, interfaces, and other OOP features, all based on the Zend Engine II.
Table 6.1 summarizes this brief history of PHP.
Table 6.1 Major PHP Versions
Date | Version | Major Features |
---|---|---|
June 1995 | 1 | First official release |
November 1997 | 2 | Performance and feature improvements; implemented in C |
June 1998 | 3 | First steps toward OOP; stricter and more consistent language syntax; lots of bug fixes and more thorough beta testing |
May 2000 | 4 | Another core rewrite; support for HTTP Sessions and superglobals; optimization and bug fixes; more support for Web servers |
July 2004 | 5 | Based on Zend Engine II; heavily improved OOP features; namespaces, anonymous functions, and the goto feature as main additions in PHP 5.3 |
Forthcoming | 6 | Promises Unicode support; register_globals, safe_mode, and magic_quotes deprecated |
At the time of this writing, PHP is at version 5.3.x and PHP 6 is in the works. The language is known as a user-friendly way to create Web applications very quickly, while at the same time providing an array of features, classes, libraries, and extras. There are several repositories for existing classes and toolkits, such as PEAR (PHP Extension and Application Repository), as well as PECL (PHP Extension Community Library), which hosts extensions written in C. Countless Web sites offer free scripts and packages, and even more Web sites provide tutorials and courses on how to learn PHP and create applications. Needless to say, most of these tutorials focus on applications that work, not on applications that both work and have a decent level of security, which explains why so many PHP-based applications and Web sites are hopelessly insecure and often broken by design.
PHP's rough history in terms of security and bugs has made people highly critical of the language. Some sources even state that PHP and security is an oxymoron, and analyzing open vulnerability databases rather supports that contention. A lot of problems were, and still are, remotely exploitable and enable code execution on the affected Web server, information theft, data manipulation, and interference with the code flow of the Web application and the runtime. Often, virtual private server (VPS) and shared hosting solutions have been targeted by attackers, since attacking the PHP instance on one virtual server can compromise the entire box, even if the other instances were secured thoroughly. Also, so-called “security improvements,” such as magic_quotes and safe_mode, have been broken and rendered useless quite regularly (see http://php.net/manual/en/security.magicquotes.php and http://php.net/manual/en/features.safe-mode.php).
Several projects have been formed to deal with the aforementioned problems. One of the most powerful and popular of these projects is known as Suhosin, which was created by Stefan Esser, an ex-member of the PHP core team. (It is amusing to follow the discussions which led to Esser's exit from the team and his subsequent creation of the Suhosin project, but the language used might not be suitable for the faint of heart.)
So, to avoid getting stuck in the history of PHP and its countless vulnerabilities, let us look at how we can get PHP code running on a Web server. A CLI module is available, but we will not focus on it. Since PHP files are being parsed whenever they are requested, the language is not really the fastest way to deliver interactive content in Web applications. There are numerous approaches to deal with that issue, among them caching engines such as XCache, Alternative PHP Cache (APC), and comparable solutions, as well as interesting projects such as HipHop (HPHP), designed and implemented by the Facebook development team to generate binary files from complete PHP Web applications to drastically increase Web site performance.
Obfuscation in PHP
There are countless ways to execute PHP code as soon as PHP has been installed. One of the most common and easiest-to-use configurations is known as LAMP, which stands for Linux, Apache, MySQL, and PHP.
For the code samples in this chapter, the Apache 2.2.12 server and PHP 5.2.10-2ubuntu6.3 were used primarily. Some of the code examples use the new features introduced in PHP 5.3 (which was not available as a packaged version at the time of this writing). Other code examples in this chapter will work smoothly only when PHP error reporting is switched off, which is usually the case on production servers and live Web sites.
If you do not have a PHP environment in which to run your own PHP obfuscation tests, visit http://codepad.org, which provides a free tool for evaluating arbitrary PHP code.
A lot of other languages are supported as well. For PHP, be sure you enter starting delimiters, such as <?php or <?, to make it work.
For our obfuscation scenario, let us assume the Web server (Apache in our case) receives a request from a client. Depending on the object and file extension the client is asking for, the Web server decides which runtime to use to deliver the requested data. Usually the following file extensions are connected with the PHP runtime:
AddType application/x-httpd-php .php .phtml .php3
AddType application/x-httpd-php-source .phps
You can find that snippet of code connecting file extensions with the runtime in your Web server configuration file or folder, depending on the operating system distribution being used. In the following examples, we will assume our test files are suffixed with a .php extension. In some situations, we will tamper with this extension to show how to smuggle in files with different extensions and have them be parsed and executed by PHP. We saw a very atavistic example of PHP code coming from the dark ages of PHP/FI at the beginning of this chapter. Now let us look at how to execute PHP code inside PHP files we can use today:
<?php echo 'works fine'; ?>
<? echo 'works too, if short_open_tag is enabled (default=On)'; ?>
<% echo 'works, in case asp_tags are enabled (default=Off)'; %>
<?= 'oh, it echoes directly!' ?>
<%= 'same for ASP-like tags' %>
As you can see, there are several ways to get PHP code to run. The next snippet shows the portion of the main PHP configuration file, the php.ini file, which is responsible for enabling and disabling those methods of delimiting code:
; Allow the <? tag. Otherwise, only <?php and <script> tags are recognized.
; NOTE: Using short tags should be avoided when developing applications or
; libraries that are meant for redistribution, or deployment on PHP
; servers which are not under your control, because short tags may not
; be supported on the target server. For portable, redistributable code,
; be sure not to use short tags.
short_open_tag = On
; Allow ASP-style <% %> tags.
asp_tags = Off
The <? syntax is nice and short and is appreciated by template developers, but it causes trouble for developers who work with XML: the notation overlaps with the syntax for XML processing instructions, which forces the developer to add a lot of overhead to make sure that XML code is not parsed as PHP and vice versa.
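One common workaround, sketched here under the assumption that short_open_tag cannot simply be switched off on the target server, is to let PHP print the XML declaration itself so that the parser never meets a raw <?xml in the template part of a file. The helper name xmlDeclaration() is hypothetical and not part of PHP.

```php
<?php
// Hypothetical helper: builds the XML declaration as an ordinary PHP string.
// Inside PHP code the sequence "<?xml" is plain string data, so it cannot be
// mistaken for a short open tag the way it would be in template mode.
function xmlDeclaration($version = '1.0', $encoding = 'UTF-8')
{
    return '<' . '?xml version="' . $version . '" encoding="' . $encoding . '"?' . '>';
}

// Usage: call this before any other output, in place of a literal declaration.
// echo xmlDeclaration(), "\n";
```

Splitting the string around the question marks is not strictly required inside PHP code, but it keeps the file safe for editors and tools that scan for open and close tags.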
In the preceding code, the <?= delimiter syntax seems to imply that only echoing of strings and variables is possible. We can quickly disprove that by using a simple ternary operator, turning the entire example into arbitrary code. Next, we will attempt to call the phpinfo() function, which will give us nicely formatted output and tell us about the most important configuration and runtime parameters of the currently installed instance.
A Request for Comments (RFC) from 2008 proposes to enable <?= even if short_open_tag is switched off (see http://wiki.php.net/rfc/shortags).
<?= 'Just an echo?' ? eval('phpinfo();') : 0; ?>
Thus far, we have seen how to delimit code inside PHP files, and we learned that the Web server determines the file type based on its extension. Therefore, if a file extension is .php or .php3, or even .phtml, the Web server will delegate the request to the PHP runtime and have it do the dirty work of parsing and processing the requested object. But what if the file extension is not .php, and instead is unknown or is something similar to .php? In this case, the default configuration of Apache 2 tries to walk backward in the filename and figure out what the real extension, and thus the MIME type, could be. This is actually a terrible security problem, since there are many ways to obfuscate the filename and make the Web server think it is a PHP file. Here is a short list of the possible extension obfuscations from which an attacker can choose:
Files with these file extensions will automagically be considered PHP files and will be delegated to the PHP runtime. This is a rather useless feature, and it renders vulnerable those Web applications that provide uploads yet lack proper file extension validation. Additionally, on UNIX-based systems, files prefixed with a dot are usually treated as hidden; thus they are not visible in directory listings and unparameterized calls of the console commands dir and ls. Apache also assists in the other direction, allowing us to request files and objects without an explicitly mentioned extension. So, for example, requesting http://localhost/test will automatically deliver http://localhost/test.php, if there's no other file named test or test.html. Therefore, a file called .php.php can be requested with either .php or .php.php.
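To make "proper file extension validation" concrete, the following is a minimal sketch of a whitelist check, assuming the application only needs to accept a few image types. The function name isAllowedUploadName() and the list of allowed extensions are hypothetical; the point is that the name must consist of exactly one base name and one extension, so stacked or dot-prefixed names never reach the Web server.

```php
<?php
// Hypothetical whitelist check for uploaded filenames. It accepts only
// "name.ext" with exactly one dot, so names such as "shell.gif.php",
// "shell.php.gif" or ".php" are rejected outright.
function isAllowedUploadName($filename, array $allowed = array('gif', 'jpg', 'jpeg', 'png'))
{
    // Exactly one base name and one extension, both from a narrow alphabet.
    if (!preg_match('/\A([A-Za-z0-9_-]+)\.([A-Za-z0-9]+)\z/', $filename, $m)) {
        return false;
    }
    // Compare the extension case-insensitively against the whitelist.
    return in_array(strtolower($m[2]), $allowed, true);
}
```

A stricter deployment would probably also discard the client-supplied name entirely and store the upload under a generated name; the check above only illustrates the validation step the text refers to.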
Of course, it is possible to create chameleon files containing valid Graphics Interchange Format (GIF) image data as well as PHP code.
Figure 6.1 shows a basic example of a small GIF-PHP chameleon. If the targeted application accepts uploads and does not validate the extension properly, it is easy to upload such a chameleon and execute arbitrary PHP code on the box afterward. The easiest way to do so is to add some PHP code inside the comments section of the GIF file and rename it to have an extension such as .gif.php or something similar.
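From the defender's side, a chameleon of this kind can be flagged with a very rough heuristic. The sketch below assumes the upload is available as a byte string and simply checks for a GIF signature combined with the two bytes that start every PHP open tag; the function name looksLikeGifPhpChameleon() is hypothetical. Legitimate binary image data can contain those two bytes by chance, and ASP-style tags are not covered, so treat this as an illustration rather than a reliable filter.

```php
<?php
// Hypothetical heuristic: does a blob that claims to be a GIF also contain
// something the PHP parser would treat as an opening tag?
function looksLikeGifPhpChameleon($bytes)
{
    $isGif = strncmp($bytes, 'GIF87a', 6) === 0
          || strncmp($bytes, 'GIF89a', 6) === 0;

    // "<?php", "<?=" and a bare "<?" all start with the same two bytes.
    return $isGif && strpos($bytes, '<?') !== false;
}
```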
Although this problem is neither new nor very sophisticated, it remains unfixed and affects a lot of Web applications in the wild. The output will be:
At this point, you might be able to see where we are heading in this chapter. We have barely started, and already we discovered several ways to mess with PHP and Web servers utilizing PHP. The problem that is connected with these and the following examples is the fact that PHP is extremely powerful and provides a lot of APIs and native functions that allow evaluation of code, inclusion of files to execute their code or unveil their content, and actual delegation of system commands to the targeted server's console via functions such as exec(), shell_exec(), system(), and passthru().
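As a small illustration of how exposed a given installation is, the following sketch reports which of the command-execution functions named above can still be called. It assumes that the administrator restricts them through the disable_functions setting in php.ini; the helper name callableCommandFunctions() is hypothetical.

```php
<?php
// Hypothetical audit helper: reports which command-execution functions named
// in the text can still be called in the current runtime.
function callableCommandFunctions()
{
    $candidates = array('exec', 'shell_exec', 'system', 'passthru');

    // disable_functions is a comma-separated list; it may be empty or unset.
    $disabled = array_map('trim', explode(',', (string) ini_get('disable_functions')));

    $callable = array();
    foreach ($candidates as $name) {
        if (function_exists($name) && !in_array($name, $disabled, true)) {
            $callable[] = $name;
        }
    }
    return $callable;
}

// Usage: print_r(callableCommandFunctions());
```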
Let us get to the basics of PHP obfuscation, and see how we can solve these and other problems, such as generating numbers, generating strings, and finding ways to mix in code structures and arbitrary characters, to make the code snippet as difficult to find and decode as possible. To start, take a look at the following example:
$${'_x'.array().'_'}=create_function(
'$a', 'retur'.@false.'n ev'.a.'l($a);');$$_x_('echo 1;');
This snippet is nothing more than a small and obfuscated kick-starter for regular string evaluation. You can easily spot the string to evaluate; it's echo 1;. But the evaluation method itself is a bit harder to find.
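One way to find the evaluation method is to rebuild the concatenated strings piece by piece without executing them. The sketch below does only that; the variable names $body and $name are illustrative, and the bare constant a from the snippet is written as the string 'a' on the assumption that the original relied on older PHP versions treating an undefined constant as its own name. Note also that create_function() was removed in PHP 8.0, so the original snippet itself only runs on older versions.

```php
<?php
// Rebuild the two strings from the snippet step by step, without executing
// anything. Error suppression (@) stands in for "error reporting switched off".

// Casting false to a string yields '', so it vanishes inside the concatenation.
$body = 'retur' . @false . 'n ev' . 'a' . 'l($a);';

// An array concatenated into a string becomes the literal text "Array".
$name = @('_x' . array() . '_');

// $body now holds the function body handed to create_function(),
// and $name holds the decoy text used in the variable expression.
```

Read this way, the function body turns out to be a plain return eval($a); wrapped in noise.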
PHP and numerical data types
In PHP obfuscation, numerical values play an important role, just as they do in JavaScript obfuscation. We can use numerical values for a lot of things, including generating huge numbers and converting them to other representations to extract certain characters, or just accessing elements inside an array or even a string. It is also possible to access array elements, but it is not possible to access elements of hash maps, unless the key matches the numerical value accessing it. However, strings count as arrays in terms of accessing their elements. Let us look at an example:
$a=array(1,2,3,4,5); echo $a[1]; // echoes 2
$a=array('1' => 2, '3' => 4); echo $a[1]; // echoes 2
$a=array(0, 1, '1' => 2, '3' => 4); echo $a[1]; // echoes 2
$a='12345'; echo $a[1]; // echoes 2
All four lines of code in the preceding example echo the same value: 2. As you can see, just as in JavaScript, it is not possible to access elements of hash maps in this way. The key ‘1’ is selected in favor of the element with the index 1; otherwise, the output of this script would have been 2212 and not 2222. But how can we create more chaotic-looking numerical values to access array and string elements? PHP provides a lot of possibilities for that purpose.
First, there are a lot of numerical representations that we can choose from. Since PHP is a dynamically typed language, the actual type or format of the numerical value usually does not matter. This often has terrible consequences in terms of application security, because in many situations, an attacker can misuse this fact and cause heavy disturbances in code flow. There is a nice write-up on this so-called type juggling technique in PHP, at http://us3.php.net/manual/en/language.types.type-juggling.php.
If the developer forgot that true can be equivalent to 1, and even to "1" or count(false) and other statements, the consequences can be grave. We will not go into much detail on vulnerabilities such as this, but in the context of obfuscation and circumvention it might be interesting to know that true can be replaced with 1 or "1," or with other statements if the developer was not extra careful.
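The effect can be sketched with two hypothetical access checks that differ only in the comparison operator. The function names isAdminLoose() and isAdminStrict() are made up for this illustration.

```php
<?php
// Hypothetical access check written with a loose comparison.
function isAdminLoose($flag)
{
    return $flag == true;   // type juggling: many values compare equal to true
}

// The same check with a strict comparison: only the Boolean true passes.
function isAdminStrict($flag)
{
    return $flag === true;
}

// Values an attacker might supply instead of a real Boolean.
$inputs = array(1, '1', 'false', array(0));
```

Every value in $inputs passes the loose check, including the string 'false', while none of them passes the strict one.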
The following examples show some of the ways to represent numerical data in PHP. The PHP documentation on number formats is paved with warnings, and not without reason: a dynamically typed language can be expected to show a lot of quirky behavior when working with numbers.
echo $a[1]; //2 (decimal index)
echo $a[000000000000000000000001]; //2 (octal index)
echo $a[0x00000000000000000000001]; //2 (hexadecimal index)
echo $a["000000000000000000000001"]; //2
echo $a[count(false)]; //2
echo $a["1x1abdcefg"]; //2
You can see from this example that the PHP runtime does not care about the actual type when accessing the matching substring. The only important thing here is the actual value. Also, PHP tends to ignore almost arbitrary trailing data; as soon as the numerical value has been parsed, everything else will be ignored, just like in the previous example snippet. However, in addition to using these representations, we can also use the casting functionalities PHP provides. We basically have two ways to do this: we can use functions to do the job and we can use the (datatype) syntax. Let us have a look:
echo $a[(int)"1E+1000"]; //2
echo $a[(float)"1.11"]; //2
echo $a[intval("1abcdefghijk")]; //2
echo $a[(float)array(0)]; //2
echo $a[(float)(int)(float)(int)' 1x ']; //2
These examples made use of not only cast strings but also cast arrays and Booleans. Also, PHP does not really care about the amount of casting used on a string or other token, as the last example shows. Furthermore, whitespace can be used again for additional obfuscation, and therefore make it more difficult to find out that (float)(int)(float)(int)' 1x ' represents nothing more than 1.00.
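When analyzing such code, it helps to reduce every index expression to its plain integer value. The sketch below collects several obfuscated spellings of 1 and uses each of them to pick a character from the string '12345'. It is deliberately limited to explicit casts and literals that need no error suppression, an assumption made so that it behaves the same on old and recent PHP versions; the variable names are illustrative.

```php
<?php
// Each expression below is an obfuscated spelling of the integer 1.
$spellings = array(
    01,                                        // octal literal
    0x1,                                       // hexadecimal literal
    (int) true,                                // Boolean cast
    (int) '1abcdefghijk',                      // cast stops at the first non-digit
    intval(" \t1x "),                          // leading whitespace is skipped
    (int) (float) '1.11',                      // the fraction is truncated
    (int) (float) (int) (float) (int) ' 1x ',  // stacked casts change nothing
);

$a = '12345';
$picked = '';
foreach ($spellings as $index) {
    $picked .= $a[$index];                     // always the second character
}
```

After the loop, $picked consists of the character 2 repeated once per spelling, which confirms that all seven expressions address the same offset.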
This method of generating numbers provides a plethora of possibilities. For instance, we can generate numbers by using strings containing numbers, and by casting and calling functions such as intval(). And of course, we can generate 0 and 1 from all functions and methods returning either false or true, or we can generate numerical values, empty strings, and other data types from calls such as count(false), levenshtein(a,b), rand(0001,00001), and so on. With properly quoted strings, we can even use special characters such as line breaks and tabs for obfuscation, not just the classic whitespace.
We can, of course, also use PHP's automatic casting to perform mathematical operations on strings and other objects, or make use of bit-shift and comparison. The possibilities are endless.
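As a small worked example of this idea, the following sketch assembles the character code for the letter h (104) from a comparison, a Boolean, numeric strings, and a bit shift. The variable names are illustrative only.

```php
<?php
// Build the code point for "h" (104) without writing the number itself.
$two    = true + ('a' < 'b');     // a true comparison plus true is juggled to 1 + 1
$sixty4 = $two << ('2' + '3');    // numeric strings add up to 5, so 2 << 5 = 64
$forty  = '4' . '0';              // concatenation gives the string "40"
$code   = $sixty4 + $forty;       // "40" is juggled back to the integer 40

$char = chr($code);               // the character with code 104
```

The same pattern scales to whole strings: each character is produced by a different arithmetic detour and the results are concatenated.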
Strings
The following sections will shed some light on how strings can be generated in PHP, and what kinds of string delimiters exist. We will learn about what makes double-quoted strings special and how we can use them for obfuscation, as well as what nowdocs and heredocs are and how we can utilize binary strings for extra obfuscation.
Introducing and delimiting strings
PHP features many ways to introduce and create strings. Most of them are known from other programming languages and are listed and explained in the PHP documentation.
The most common way to work with strings in PHP is to make use of single or double quotes for delimiting. Both ways work fine, although a double-quoted string is treated differently by PHP than a single-quoted string. Double-quoted strings, for example, can contain escape sequences for special characters such as line breaks or tabs, and even null bytes, so if the developer uses a construct such as "hello\ngoodbye" it will be treated differently than 'hello\ngoodbye'. The first example will actually contain the newline, while the second version will just show the character sequence backslash and the letter n.
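The difference is easy to verify by comparing string lengths. The following sketch uses illustrative variable names and also shows that octal and hexadecimal escapes are resolved inside double quotes.

```php
<?php
$double = "hello\ngoodbye";   // \n is converted to a single newline byte
$single = 'hello\ngoodbye';   // backslash and n stay two separate characters

// 5 + 1 + 7 bytes versus 5 + 2 + 7 bytes.
$lengths = array(strlen($double), strlen($single));

// Double quotes also accept hexadecimal and octal escapes, including "\0".
$same = ("\x41" === 'A') && ("\101" === 'A') && (strlen("\0") === 1);
```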
Quite a range of escape sequences can be used, starting with the null byte