15.3 Dynamic Websites

Although static websites fascinated the first web users in the 1990s, they were soon eclipsed by sites offering more dynamic content. Dynamic sites made it easy to update the site’s contents continuously and to provide a multimedia experience.

Modern servers construct web pages on demand, based on the visitor’s input. If the visitor clicks a particular link or fills out a form, the site constructs a page tailored to the user’s selections. Sophisticated sites even take a visitor’s earlier activities into account when they build a page.

Web forms provided the original mechanism for dynamic web output. HTML contains “form” tags that display buttons, text fields, menus, selection boxes, and so on. We construct a web form by including these tags in the page markup. When the page displays, the user may fill in the fields and make selections. The user then may submit the form to the web server by clicking a “Submit” link or button. Otherwise, the user may navigate away from the page, often losing the filled-in contents.

Web Forms and POST

When the user clicks “Submit,” the browser transmits a special HTTP command called POST. This command “posts” the form’s fields to the web page. FIGURE 15.14 shows what happens when Alice fills out a form.

FIGURE 15.14 Alice chooses, fills out, and submits a form.

The process takes the following four steps:

  1. Alice selects a link that retrieves the form from the server.

  2. The server transmits the form to the browser, which displays it to Alice.

  3. She fills out the form. When she clicks “Submit,” the browser sends the form’s fields to the server in the POST command.

  4. The server processes the form and sends a reply.

The “Submit” button constructs a list of arguments from the form’s fields. Although a typical GET operation simply identifies a path to a file, a form submission also carries these fields. When the list appears in a URL, the “?” character marks its beginning, separating the arguments from the path. Each field’s name serves as an argument name, and the contents of the corresponding field supply the argument’s value. With POST, the browser normally sends this list in the body of the request rather than in the URL itself.

The form in this example has three fields: clerk, what, and amt. Each serves as a variable that receives its field’s value. Alice filled in “Bob” as the clerk, “Pay” as the what, and $100 as the amount (“amt”). The POST command carries that list of argument values back to the server.
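The name=value argument list can be sketched with Python’s standard urllib.parse module. The script path “/cgi-bin/pay.pl” below is a hypothetical name chosen for illustration; the field names and values come from Alice’s form. The same encoded text appears after the “?” when arguments travel in a URL, or in the request body when a browser submits a form with POST.

```python
from urllib.parse import urlencode, parse_qs

# The three fields from Alice's form, as she filled them in.
fields = {"clerk": "Bob", "what": "Pay", "amt": "$100"}

# Each field becomes name=value, and the pairs are joined with "&".
# Characters such as "$" are percent-encoded ("$" becomes "%24").
encoded = urlencode(fields)
print(encoded)  # clerk=Bob&what=Pay&amt=%24100

# Appended after "?", the list becomes the argument part of a URL.
# "/cgi-bin/pay.pl" is a hypothetical script path.
url = "/cgi-bin/pay.pl?" + encoded

# The server decodes the same text back into names and values.
decoded = parse_qs(encoded)
print(decoded["amt"])  # ['$100']
```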

The POST command doesn’t literally send the results back to the form’s web page on the server; it doesn’t copy them back into “form.html.” A web page might contain a form, but the page itself is static. It can’t process input itself.

Instead, the POST command tells the server to run a command script or other program. Originally, most scripts were in the Perl programming language. Perl is a sophisticated interpreted language that evolved from Unix shell scripts. It is still used heavily for system administration tasks.

When the server starts the script, it provides the arguments sent by the browser, along with their assigned values. The script then performs its actions based on how Alice filled in the form. At minimum, a script might produce a customized page of HTML. The server sends the script’s HTML output back to the browser, which displays it to Alice.
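A script of this kind can be sketched in Python, which is one of the languages later used for such scripts. This is a minimal sketch under simplifying assumptions: a real CGI script reads the posted text from its standard input, while the hypothetical function below takes it as a parameter, and the page wording is invented for illustration.

```python
import html
from urllib.parse import parse_qs

def handle_payment_form(body: str) -> str:
    """Build a customized HTML page from the submitted form fields.

    `body` is the name=value list the browser sent. This function name and
    the page text are hypothetical; only the field names come from the form.
    """
    args = parse_qs(body)
    # parse_qs returns a list of values for each name; take the first.
    clerk = args.get("clerk", [""])[0]
    what = args.get("what", [""])[0]
    amt = args.get("amt", [""])[0]
    # Escape the values so user input cannot inject markup into the page.
    return (
        "<html><body>"
        f"<p>Request received: {html.escape(what)} {html.escape(amt)} "
        f"to {html.escape(clerk)}.</p>"
        "</body></html>"
    )

page = handle_payment_form("clerk=Bob&what=Pay&amt=%24100")
print(page)  # the server would send this HTML back to Alice's browser
```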

15.3.1 Scripts on the Web

As websites became more complex, site builders relied more and more on scripts to simplify site design and operation. Modern sites often use scripts on the server side (called server-side scripts) to construct pages with consistent layout and format, including the latest and most accurate contents for that page. Sites also embed scripts in the pages themselves; the client’s browser executes those scripts on the user’s own computer. Scripts run by the browser are client-side scripts.

Server-side scripts have become essential to site construction and operation. Today, few sites actually store their contents as HTML pages. Instead, most links lead to scripts that construct the HTML page contents based on parameters passed with the URL.

Although the examples in the previous section showed arguments passed with POST commands, any link or other URL may include a list of arguments. The browser includes those arguments when it sends the GET command.

The server executes a server-side script because the URL itself refers to a script or to a page containing a script. FIGURE 15.15 illustrates the process. When the server encounters a script, it passes the script to the interpreter for that particular scripting language. The script interpreter executes the script. The script’s output yields text in HTML format, which the server transmits back to the client.
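In Python, one widely used way for a server to hand a request to a script is the WSGI interface (PEP 3333): the server calls a function, passing the request details in a dictionary, and sends back whatever text the function returns. The sketch below is a minimal example of that interface, not a description of any particular site; the argument name “name” and the path “/greet” are hypothetical. The standard library’s wsgiref.simple_server.make_server can host such a function for local testing.

```python
import html
from urllib.parse import parse_qs

def application(environ, start_response):
    """A minimal server-side script using Python's WSGI interface.

    The server passes the request's path and argument list in `environ`;
    the returned bytes are the HTML text the server sends to the browser.
    """
    path = environ.get("PATH_INFO", "/")
    args = parse_qs(environ.get("QUERY_STRING", ""))
    name = args.get("name", ["visitor"])[0]
    body = (f"<html><body><p>Hello, {html.escape(name)}. "
            f"You requested {html.escape(path)}.</p></body></html>")
    start_response("200 OK", [("Content-Type", "text/html; charset=utf-8")])
    return [body.encode("utf-8")]

# Call the script directly with a hand-built request, as a server would.
status_seen = []

def fake_start_response(status, headers):
    status_seen.append(status)

result = application({"PATH_INFO": "/greet", "QUERY_STRING": "name=Alice"},
                     fake_start_response)
print(status_seen[0])
print(result[0].decode())
```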

FIGURE 15.15 Executing a server-side script.

Some script languages are web-specific, while others are general-purpose languages adapted for use with websites. Perl is a general-purpose language, as is Java. Scripts written in web-specific languages often look superficially like HTML files. The dynamic elements of the web-specific scripts appear in special tags that aren’t part of conventional HTML. The script interpreter intercepts those tags, processes them, and inserts HTML-appropriate text in their place.
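The tag-substitution idea can be sketched in a few lines of Python. The tag syntax here, <?= name ?>, is a hypothetical choice for this sketch; PHP, ASP, JSP, and CFM each define their own tags and far richer processing. The point is only that ordinary HTML passes through untouched while each special tag is replaced by generated text.

```python
import html
import re

# Hypothetical tag syntax for this sketch: <?= name ?> marks a spot where
# the interpreter inserts the value of the variable called "name".
TAG = re.compile(r"<\?=\s*(\w+)\s*\?>")

def render(template, values):
    """Replace each special tag with HTML-safe text; leave other HTML alone."""
    def substitute(match):
        return html.escape(str(values.get(match.group(1), "")))
    return TAG.sub(substitute, template)

page = render("<h1>Welcome, <?= user ?>!</h1><p>Balance: <?= balance ?></p>",
              {"user": "Alice", "balance": "$100"})
print(page)  # <h1>Welcome, Alice!</h1><p>Balance: $100</p>
```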

Scripting Languages

Initially, web-based scripts used the POST command and a mechanism called the Common Gateway Interface (CGI). The web server handled such requests by running scripts, typically identified by the .cgi suffix. Today, a script may carry one of many suffixes, including:

  • PL: Perl

  • ASP: Active Server Pages, a web-specific scripting language supported by Microsoft’s IIS server; its scripts may contain instructions in VBScript (Visual Basic Scripting Edition) or JScript (Microsoft’s version of Javascript) and may use ActiveX components

  • ASPX: ASP.NET pages, ASP scripting extended to support Microsoft’s “.NET” programming framework

  • PHP: Hypertext Preprocessor, originally called “Personal Home Page” format; a highly popular open-source scripting language used with countless websites

  • CFM: ColdFusion Markup Language, a commercial web-specific scripting language

  • JS: Javascript, a web-oriented scripting language whose syntax borrows loosely from Java, although it is a separate language

  • JSP: JavaServer Pages, server-side scripts written in Java

  • SSJS: Server-Side Javascript, a version of Javascript tailored for scripting on the web server

  • PY: Python, a general-purpose language used for server scripting

  • RB: Ruby, another general-purpose language used for server scripting

When a URL refers to a script, it may include one or more arguments. The browser sends these to the server along with the path name in the GET command. The server extracts the path name from the GET command and retrieves the script file. It then extracts the argument list and passes it to the script interpreter along with the path name.

The interpreter executes the script and provides it with the argument list and path name. The script constructs the page based on the arguments it receives.
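The server’s first job, separating the script’s path name from its argument list, can be sketched with Python’s urllib.parse. The request line, script path, and argument names below are hypothetical examples.

```python
from urllib.parse import urlsplit, parse_qs

def split_request_line(request_line: str):
    """Separate an HTTP request line into method, script path, and arguments."""
    method, target, _version = request_line.split()
    parts = urlsplit(target)
    # parts.path names the script file; parts.query is the text after "?".
    return method, parts.path, parse_qs(parts.query)

# A hypothetical request for a script with two arguments.
method, path, args = split_request_line(
    "GET /scripts/report.php?month=june&dept=sales HTTP/1.1")
print(method)  # GET
print(path)    # /scripts/report.php
print(args)    # {'month': ['june'], 'dept': ['sales']}
```

The server would use the path to locate the script file and hand the decoded arguments to the interpreter.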

In some more sophisticated systems, there is a single master script that intercepts all incoming HTTP commands. The master script extracts the URL path name and uses it as an argument to determine the page’s contents.
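A master script of this kind is often organized as a table that maps path names to page-building routines. The following Python sketch assumes a hypothetical site with two pages; the function names and paths are invented for illustration, and a real site would consult databases and templates rather than return fixed strings.

```python
import html
from urllib.parse import urlsplit, parse_qs

# Hypothetical page builders; each receives the decoded argument list.
def home_page(args):
    return "<h1>Home</h1>"

def product_page(args):
    item = args.get("item", ["unknown"])[0]
    return f"<h1>Product: {html.escape(item)}</h1>"

# The master script's table: URL path name -> function that builds the page.
ROUTES = {"/": home_page, "/product": product_page}

def master_script(url: str) -> str:
    """Receive every request and let the path name decide the page contents."""
    parts = urlsplit(url)
    builder = ROUTES.get(parts.path)
    if builder is None:
        return "<h1>Page not found</h1>"
    return builder(parse_qs(parts.query))

print(master_script("/product?item=jacket"))  # <h1>Product: jacket</h1>
print(master_script("/nowhere"))              # <h1>Page not found</h1>
```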

Client-Side Scripts

Client-side scripts appear as short procedures embedded in an HTML page. Whereas the server itself interprets server-side scripts, it ignores client-side scripts and transmits them to the browser within the web page text. FIGURE 15.16 shows a simple client-side script written in JS.

FIGURE 15.16 Client-side HTML script in Javascript.

Most of the web page in the figure consists of conventional HTML. The script begins with the <script> tag. The tag’s type argument indicates the script language being used (JS in this case). The browser interprets everything until the closing </script> tag as part of the script.

The script in the figure uses the “prompt” function to ask the browser’s user to type in a name (FIGURE 15.17); then the script writes out the rest of the web page text, including a personalized message. This takes place entirely within the browser.

FIGURE 15.17 Executing the client-side script in Figure 15.16.

Courtesy of Dr. Richard Smith.

This example uses the JS language, which most browsers support. Microsoft’s Internet Explorer browser also supported Visual Basic scripting (VBScript). Although some browsers have supported other languages, most client scripts use JS.

Client Scripting Risks

Client-side scripting poses special risks because the browser’s chain of control passes to the instructions in the web page’s script. If the browser visits a malicious website or an attacker saves a malicious script on a legitimate site, then the browser may execute a malicious script. Most scripting languages provide enough programming functions to implement a virus or Trojan capability.

These risks pose an interesting dilemma for web developers. On the one hand, client-side scripts improve efficiency by using the client processor to do the work; they may interact with the user and the client file system without the delays of server interaction. On the other hand, client-side programs could modify and damage files and other resources on the client’s host computer. We can increase security by restricting the power of client-side scripting languages, but that reduces their effectiveness.

For example, many sites provide client-side scripts to streamline the uploading of photos or other content. These scripts must be able to select and retrieve files from the host’s file system. However, this also opens the host to malicious scripts that might retrieve files without permission.

A particularly common form of client-scripting attack is called cross-site scripting (XSS). In these attacks, the attacker tricks a benign website into feeding a malicious script to the victim’s browser. A classic example arises when a site allows visitors to post comments that may contain arbitrary HTML text. The attacker either posts the malicious script in a comment or points the comment to a script stored on the attacker’s website. The victim’s browser loads and executes the attack script simply by visiting the page containing the attacker’s comment.

Many sites try to prevent such attacks by limiting the format of posted comments and other user-supplied text. The sites specifically try to filter out scripts or references to scripts. Other defensive strategies try to prevent scripts from doing harm by limiting their capabilities.
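A common complement to filtering is escaping: instead of trying to spot and remove scripts, the site converts the comment’s special characters so the browser displays them as text rather than treating them as markup. The Python sketch below uses the standard html.escape function; the function name render_comment and the attacker URL are hypothetical. Escaping like this covers text placed between HTML tags; text placed inside attributes, URLs, or scripts needs additional, context-specific handling.

```python
import html

def render_comment(comment: str) -> str:
    """Display a visitor's comment as plain text rather than as markup.

    html.escape converts <, >, &, and quotes into HTML character references,
    so an embedded <script> tag appears on the page instead of running.
    """
    return "<div class='comment'>" + html.escape(comment) + "</div>"

attack = '<script src="https://attacker.example/steal.js"></script>'
safe = render_comment(attack)
print(safe)
```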

“Same Origin” Policy

One defensive strategy that many browsers use is the same origin policy, which grants a script access to resources only if they share the same host, port number, and protocol as the script’s own origin. Thus, if the script tries to retrieve other files, it must use the same host, protocol, and port number.

For example, the script can’t retrieve a plaintext “http” file if it was retrieved through an encrypted “https” connection. Likewise, a script loaded from “amawig.com” can retrieve other files from that server, but not from “secret.org,” even if the browser can reach that server. This reduces the risk posed by a client-side script acting as a Trojan that takes advantage of the client’s network access rights. XSS attacks try to subvert the same origin policy by making the malicious script appear to come from a trustworthy website.
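The comparison rule itself is simple enough to sketch in Python. This models only the test a browser applies, not the browser’s enforcement machinery; the function names are hypothetical, and the sketch assumes the standard default ports (80 for http, 443 for https) when a URL omits the port.

```python
from urllib.parse import urlsplit

DEFAULT_PORTS = {"http": 80, "https": 443}

def origin(url: str):
    """Return the (protocol, host, port) triple that defines a URL's origin."""
    parts = urlsplit(url)
    port = parts.port or DEFAULT_PORTS.get(parts.scheme)
    return (parts.scheme, parts.hostname, port)

def same_origin(page_url: str, resource_url: str) -> bool:
    """True if the resource shares the origin the script runs under."""
    return origin(page_url) == origin(resource_url)

page = "https://amawig.com/catalog.html"
print(same_origin(page, "https://amawig.com/prices.json"))  # True
print(same_origin(page, "http://amawig.com/prices.json"))   # False: protocol differs
print(same_origin(page, "https://secret.org/data.json"))    # False: host differs
print(same_origin(page, "https://amawig.com:8443/x"))       # False: port differs
```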

Sandboxing

We may also sandbox the browser to try to reduce these risks. This technique runs the browser in a restricted environment that limits its access to host files and other resources. Some sandboxes block access to all resources outside the sandbox, except those explicitly brought in by the user. Google has built sandboxing into its Chrome browser.

Newer operating systems provide a limited degree of sandboxing when the browser runs as a normal user as opposed to being an administrator. Malicious scripts still may damage the user’s personal resources, but they should not be able to damage system resources. Additional levels of sandboxing may further protect personal resources.

15.3.2 States and HTTP

Because HTTP and HTML are essentially stateless, the output of a web form depends entirely on the data Alice typed in. The form can’t collect information from pages Alice visited earlier or forms she previously filled out. This poses a problem for dynamic websites.

For example, Alice visits a site to find a pair of pants and a jacket to go with it. She clicks on a page and selects the pants she wants. Now she navigates to a different page that contains the desired jacket. Because of statelessness, however, the server already has forgotten which pants she wanted.

We know that modern websites easily keep track of selected merchandise. Most of us realize that sites often track us from one site to the next and use our browsing data to guess our interests and send us targeted advertisements. These actions clearly require “statefulness.”

The modern website shopping cart keeps track of selected purchases while we browse through a site. The cart mechanism also provides a checkout procedure in which we confirm our selections and pay for them electronically.

Beyond shopping carts, Amazon.com pioneered techniques to tailor each page to reflect the interests of each of its visitors. Amazon keeps track of prior searches, of product pages visited, and of purchases. Amazon lists related products and searches in columns along the sides of the page, while the central part of the page focuses on the visitor’s selected items or searches.

Browser Cookies

A browser cookie, usually just called a “cookie,” is a piece of data stored by a browser on behalf of a web server. If the browser has a cookie from a particular web server, then it includes the cookie whenever it sends that server an HTTP message.

The browser won’t have a cookie when it visits a server for the very first time. FIGURE 15.18 shows the client’s HTTP GET and the server’s response for such an initial visit.

FIGURE 15.18 The initial website visit produces a cookie.

The client retrieves the home page from Amalgamated Widget as shown in the upper part of Figure 15.18. Because the GET command did not already contain any cookies, the server generates cookies for that browser. The server includes the cookies in its response to the GET command, shown under “Server’s Response.” The response includes two “Set-Cookie” commands that provide two cookies to the browser.
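A server-side script could produce such a pair of Set-Cookie headers with Python’s standard http.cookies module. This is a sketch, not Amalgamated’s actual software: the cookie names UID and SESSION follow the example, but the random values and the lifetimes (one year for UID, 24 hours for SESSION) are assumptions chosen for illustration.

```python
import secrets
from http.cookies import SimpleCookie

def first_visit_headers():
    """Build Set-Cookie headers for a browser that arrived without cookies."""
    cookies = SimpleCookie()
    # Long-lived visitor ID; the one-year lifetime is an assumption.
    cookies["UID"] = secrets.token_hex(8)
    cookies["UID"]["max-age"] = 365 * 24 * 3600
    # Short-lived session marker, expiring after about 24 hours.
    cookies["SESSION"] = secrets.token_hex(8)
    cookies["SESSION"]["max-age"] = 24 * 3600
    # Each Morsel renders as one "Set-Cookie: name=value; Max-Age=..." line.
    return [morsel.output() for morsel in cookies.values()]

for line in first_visit_headers():
    print(line)
```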

The next time the browser visits the server, it includes the cookies and their corresponding cookie values in its HTTP request. In FIGURE 15.19, the browser visits the “About” page. Note the cookies included in the last line of the GET request.

FIGURE 15.19 The browser adds the cookie to the header in subsequent visits.

When the Amalgamated server receives the request in Figure 15.19, the cookie tells the server that this visitor has visited at least once before. Thus, the cookie provides a simple way of counting visits.
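On the server, a script reads the browser’s Cookie header to tell a returning visitor from a new one. The Python sketch below parses the header with http.cookies.SimpleCookie; the helper names and cookie values are hypothetical.

```python
from http.cookies import SimpleCookie

def parse_cookie_header(header: str) -> dict:
    """Turn a request's Cookie header into a {name: value} dictionary."""
    jar = SimpleCookie()
    jar.load(header)
    return {name: morsel.value for name, morsel in jar.items()}

def classify(header: str) -> str:
    # A UID cookie means this browser received one on an earlier visit.
    return "returning" if "UID" in parse_cookie_header(header) else "new"

print(parse_cookie_header("UID=3f9a1c; SESSION=77b2e0"))
print(classify(""))                              # new
print(classify("UID=3f9a1c; SESSION=77b2e0"))    # returning
```

Counting visits then amounts to keeping a tally on the server, indexed by the UID value.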

The web browser uses the client computer’s file system to maintain a copy of every cookie. The client typically stores the cookies in a folder along with other user-specific data, like the personal desktop background selection, personal email files, desktop layout, and so on. Each user has a personalized database of cookies. Some browsers have stored the cookies in individual files, each named for the server that owns the cookie, while others keep them all in a single database file. Whenever the browser visits a particular server, it includes the cookies received from that server.

When the client visited the Amalgamated website, the browser received two cookies: the UID cookie and the SESSION cookie. The server uses the first cookie to uniquely identify each visitor. The second cookie lets the site distinguish between separate visits by the same person. The SESSION cookie expires after about 24 hours; when the user visits the site again, the server detects the missing SESSION cookie and issues another one, indicating a new session.
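The session logic just described reduces to two checks. In this Python sketch, with a hypothetical function name, the argument holds the cookies the browser sent. Because the browser discards an expired SESSION cookie by itself, the server only has to notice which cookies are missing.

```python
import secrets

def cookies_to_issue(sent: dict) -> dict:
    """Decide which cookies the server must (re)issue for this request.

    A missing UID means a brand-new visitor; a missing SESSION (never set,
    or expired and discarded by the browser) means a new session begins.
    """
    to_set = {}
    if "UID" not in sent:
        to_set["UID"] = secrets.token_hex(8)
    if "SESSION" not in sent:
        to_set["SESSION"] = secrets.token_hex(8)
    return to_set

print(sorted(cookies_to_issue({})))                  # ['SESSION', 'UID']
print(sorted(cookies_to_issue({"UID": "3f9a1c"})))   # ['SESSION']
print(cookies_to_issue({"UID": "3f9a1c", "SESSION": "77b2e0"}))  # {}
```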

The web server software itself (e.g., Apache or IIS) essentially ignores the cookies; it simply passes them along to scripts that implement a web application for that site. The application scripts may run when the server receives particular HTTP commands, and those scripts must interpret the cookies. The example provided here for managing sessions is a simple one; actual sites use different and more sophisticated strategies.
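Returning to Alice’s pants and jacket, an application script can solve the statelessness problem by keeping the cart on the server and using the UID cookie to find it. This Python sketch assumes a hypothetical in-memory store and invented cookie values; a production site would use a database and stronger protections for the identifier.

```python
from collections import defaultdict

# Hypothetical in-memory store: visitor ID -> items in that visitor's cart.
carts = defaultdict(list)

def add_to_cart(cookies: dict, item: str) -> list:
    """Application script: use the UID cookie to find this visitor's cart."""
    uid = cookies["UID"]
    carts[uid].append(item)
    return carts[uid]

alice = {"UID": "3f9a1c", "SESSION": "77b2e0"}
add_to_cart(alice, "pants")           # selected on one page
print(add_to_cart(alice, "jacket"))   # ['pants', 'jacket']
```

HTTP itself still remembers nothing; the cookie simply gives each request a key that the script uses to look up the state it saved earlier.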
