Chapter 5. Web Technologies

Web technology powers the Internet, and databases provide the underlying content and the ability to access, display, and manipulate that content. Although the fundamental relationship between databases and the Web is straightforward, building a web site that has an Oracle backend database is no simple matter. There are a great many interactions between a variety of different programming languages, protocols, and components. This chapter tries to demystify these interactions by examining the main technologies used to power the Web, paying particular attention to how these technologies relate to Oracle. We’ll discuss the various layers of magic operating between the web server and the database, and we’ll touch on the advantages and disadvantages of each approach to building your site. We’ll take a little extra time on these basics because they are so crucial to building modern, web-based applications.

Once we’ve given you a feel for some basic web technologies and concepts, we’ll introduce the various open source implementations on which most of the Oracle applications described in this book are based. Here’s a brief overview:

Apache

The web server is the heart of your web application. In this chapter, we’ll describe Apache, the most popular web server in use today. Apache is open source, but even large corporations like Oracle are now using it.

Perl and the Web

Your web applications themselves are served by the web server. The programming can be done in various languages, but Perl is a common choice. The Perl Database Interface module (DBI), discussed in some detail in Chapter 2 and Chapter 3, provides an interface for accessing data stored in Oracle (or other) databases. In this chapter, we’ll also discuss mod_perl, an Apache module that can speed up your Perl CGI (Common Gateway Interface) scripts dramatically by incorporating the Perl interpreter directly into Apache.

Java and the Web

Java is another choice for web programming, though unlike Perl it is not open source. Even so, it’s important that we look briefly at it here because some Oracle open source web-based applications are built using Java. In this chapter, we’ll discuss various ways to develop web applications with Java and Apache. We’ll introduce JServ, which is to Java what mod_perl is to Perl; it brings a Java interpreter into Apache, allowing us to run Java servlets, the Java counterpart of CGI scripts. We’ll continue the discussion of JServ in Chapter 7, where we’ll also describe the relationship between Apache JServ and the Jakarta Tomcat project. We’ll also briefly describe Turbine, a framework for building web applications with Java, and the ArsDigita Community System, an open source web development suite of applications based on Oracle’s PL/SQL language and Java stored procedures.

HTML embedded scripting and the Web

Embedded scripting is another approach to developing dynamic web sites. This approach differs substantially from CGI-based programming in a few important ways. With CGI, the output of the program is an HTML page. You must follow the rigorous rules of HTML when formatting your various print statements in C, Perl, or Java. With embedded technologies, you leave the HTML formatting to the designers and simply embed snippets of code into your pages, thus making the pages dynamic. In this chapter we’ll describe several embedded scripting alternatives: PHP, EmbPerl, Mason, and Aquarium.

Databases and the Web

One of the most exciting developments for databases today is the proliferation of backend datastores that drive the Internet’s web sites and, indirectly, the world’s economy. Everywhere you dig in the fertile ground of the Internet you’ll find a database hiding just beneath the surface, from news sites such as cnn.com or nytimes.com to information archival sites such as edgar-online.com. And let’s not forget the juggernaut e-commerce sites, such as amazon.com, with their trailer loads of product information. Online trading, financial news, search engines, airlines, auction sites, and portals all use a database on the back end. Luckily for Oracle developers, that database is very often one emanating from Redwood Shores, California.

An interesting aspect of these backend databases is that most end users who access a web site aren’t even aware that they’re interacting with a database server. Backend complexity is almost always hidden behind the scenes. Indeed, when it isn’t, and the loose wiring becomes frayed and visible, commercial web sites often fail, sparking and fizz-popping into business oblivion. So for the outside viewer, how a web page knows who you are when it recommends the latest Stephen King novel to you is often a complete mystery (a feat of magic, especially if you are a Stephen King fan[30]).

We’re aiming to become privileged members of this magic circle, and we need to know what’s going on under the hood. So in the rest of this chapter we’ll take a look behind the scenes at the various web-based technologies used to build the Oracle applications described in Chapter 6, and those you’ll need to understand in order to build your own applications. First, though, we’ll describe the most important concepts that underlie the connection between the Web and the databases behind it: dynamic content, personalization, CGI programming, and caching.

Dynamic Content

Dynamic content is content generated by programs, which lets a web site change from day to day or hour to hour, as well as from one user to another. These days, most sites are dynamic; few present only static content (pages that are the same every time you visit them).

Web sites haven’t always been based on dynamic content. They’ve evolved enormously from the mostly static pages available when the Mosaic genie was first released from the bottle back in 1993. Today’s web sites are captivating and dynamic, featuring news, searching, shopping, and personalized content. Sites have become ever more compelling with animated GIF images, clever use of JavaScript and Macromedia Flash, and, more recently, streaming audio and video content.

In the early days of web site development, when using the <FONT> tag was the height of sophistication, the first dynamic pages were fairly simple scripted programs sitting quietly on the server, minding their own business, and typically situated in the /cgi-bin directory. Normally, a URL (Uniform Resource Locator) referred to a static HTML (HyperText Markup Language) page on the server. However, when a requested URL specified a program in /cgi-bin, the server executed that program and returned its output to the client’s browser. Such programs were called Common Gateway Interface (CGI) applications; we’ll discuss CGI shortly.

These early programs retrieved data from the file system, organized the data however they wanted, picked up the current date and time, and then did other useful work to display a page that was dynamic: different each time it was requested, and tailored specifically to that request.

It didn’t take long for someone to come up with the idea of plugging these early CGI applications into databases to retrieve useful information. From there, the Web we know today was born. As an example, let’s look at today’s banking applications. If you have an account at a bank, the bank keeps records about your financial transactions, your various accounts, how much money is where, fees applied each month, checks written against the account, balance, interest earned, and multitudes of related information. With secure online transactions, companies are now able to make this information available to customers via the Web. All you need now is your account number and various pieces of authentication information, and with this you can connect to your financial institution from virtually anywhere in the world, perform transactions, pay bills, and worry about your increasingly negative balance. This trend has been so significant that some banks are closing entire chains of old-fashioned branch offices and replacing them with jazzy online outlets instead.

For now we’ll ignore the effects of this revolutionary type of business on society and focus on how the technologies work. The secret is dynamic content, which works in the following way.

When you log in to the site and specify your account number, a generic template is filled with specific information about you and your account. That template has been built with one of a number of technologies: PHP, EmbPerl, or JavaServer Pages, which we describe in this chapter, or a commercial product such as StoryServer or Microsoft ASP. The template specifies the general look of the page, where the images will go, what tables there are to organize content, colors, fonts, and so on. The template is basically HTML, but it has added bits of code that, in turn, query the database. In the case of the banking data we’ve been discussing, after you’ve authenticated and verified who you are with the server, requests are made to the bank’s backend database for all of your latest financial data. That data is (we hope) encrypted en route, to avoid interception by prying eyes.
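
To make the template idea concrete, here is a minimal sketch written as Bourne shell functions, in the same spirit as the shell CGI script shown later in this chapter. The helper names render_account_page and html_escape are hypothetical, and the account values are passed in as arguments; on a real banking site they would come from a query against the backend database rather than from the caller.

```sh
#!/bin/sh
# Minimal sketch: fill a generic HTML template with per-user data.
# render_account_page and html_escape are hypothetical helpers; the
# arguments stand in for values fetched from the backend database.

# html_escape TEXT: escape the characters that are special in HTML text
# (& must be replaced first so the other entities are not double-escaped)
html_escape() {
    printf '%s' "$1" | sed -e 's/&/\&amp;/g' -e 's/</\&lt;/g' -e 's/>/\&gt;/g'
}

# render_account_page NAME ACCOUNT BALANCE: print the filled-in page
render_account_page() {
    name=$(html_escape "$1")
    account=$(html_escape "$2")
    balance=$(html_escape "$3")
    # The layout (title, table, colors) is fixed; only the data varies
    cat <<EOF
<html><head><title>Your Account</title></head>
<body bgcolor="white">
<h1>Welcome back, $name</h1>
<table border="1">
<tr><td>Account</td><td>$account</td></tr>
<tr><td>Balance</td><td>$balance</td></tr>
</table>
</body></html>
EOF
}
```

The fixed parts of the page never change; only the three data values do. Escaping the values before inserting them keeps stray < or & characters in the data from breaking the page’s HTML.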

News sites work in a way that’s similar to online banking (though potentially without the authentication). Go to any site from http://www.slashdot.org to http://www.news.com or http://www.cnn.com, and what do you see? Today’s latest news is often moving about in tickertape displays and JavaScript scrollers, trying to grab your attention (as in Figure 5-1), along with pushed advertisements that change every time you reload the page.

Figure 5-1. A typically busy web news page bursting with information

Today’s dynamic web pages are often built on the fly and customized for your particular request. (Well, that’s not completely true: they’re often cached on the server, but more about that later.) When you make your request, the page is built from the latest articles and news for the current date. You also know, if you’ve ever searched the archives on such a site, that the old news is still around too, possibly from many years ago. Again, this is all stored in the database. Depending on what’s requested, the template is filled with different data from the database.

Personalization

Web page personalization is really just a special form of dynamic content. Go to http://www.netscape.com, http://www.themes.org, or http://www.yahoo.com and you’ll find a little button labeled “my” or a preferences area.

How does personalization work? When you visit one of these sites for the first time and select the “my” section or preferences area, you’ll be asked to create a user ID. When you select one (typically an email address), it will be compared against other IDs in the database to be sure that it’s unique. Once you’ve done this, you’ll have an identifier with which the site can keep track of you. Later on, when you visit the site and log in, the site will know what personalized content to deliver for you. You can also use the various navigational and arrangement controls on the page to customize it to your liking, and your settings will be stored in the database according to the user ID that you set up previously. The site will then subsequently know just how to deliver the page the way you like it.
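
The uniqueness check described above can be sketched as follows. This is a minimal illustration only: a flat file with one ID per line (the hypothetical USER_DB path) stands in for the user table in the database, and register_user is a made-up helper name.

```sh
#!/bin/sh
# Minimal sketch of registering a unique user ID.
# USER_DB is a hypothetical flat file (one ID per line) standing in for
# the user table; register_user is a hypothetical helper.

USER_DB=${USER_DB:-/tmp/user_ids.txt}

# register_user ID: print "registered" and record ID if it is new,
# otherwise print "taken" and return a nonzero status
register_user() {
    new_id=$1
    touch "$USER_DB"
    # -F: fixed string, -x: whole-line match, -q: no output
    if grep -Fxq -- "$new_id" "$USER_DB"; then
        echo "taken"
        return 1
    fi
    printf '%s\n' "$new_id" >> "$USER_DB"
    echo "registered"
}
```

With Oracle behind the site, you would more likely declare the user ID column UNIQUE (or make it the primary key) and let the database reject duplicates; unlike this check-then-append sketch, that stays correct when two people try to claim the same ID at the same moment.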

Behind the scenes, personalization relies on cookies. The web site sends your browser a cookie that enables the web site to identify you while you’re using that site. There are two different kinds of cookies: persistent and non-persistent. By default, friendly sites will typically send you a non-persistent cookie, one that will last only while your browser software is left running. It is only stored in memory, not written to your disk. But when you created a user ID and password, if you told the site to “Remember me” (or something similar), the site will actually send you a persistent cookie, one written to your hard drive by your browser software. This cookie won’t automatically be your user ID, but it will typically be a unique identifier that the site can use to get your ID when you revisit the site.

Even persistent cookies usually have an expiration date, which can be set to one day, one week, one month, or one year (or anything in between). This all depends on the site and the type of cookie policy they have.
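
The difference between the two kinds of cookie comes down to a single attribute in the Set-Cookie header that the server sends. The following minimal sketch shows a shell CGI script producing both; the cookie name session_id, its values, and the function names are hypothetical, and the expiration date is passed in ready-formatted rather than computed.

```sh
#!/bin/sh
# Minimal sketch: emitting session and persistent cookies from a CGI script.
# The cookie name session_id and all values are hypothetical.

# Non-persistent cookie: no expires attribute, so the browser keeps it
# only in memory and discards it when the browser exits
session_cookie() {
    printf 'Set-Cookie: session_id=%s; path=/\n' "$1"
}

# Persistent cookie: the expires attribute (an HTTP date in GMT) tells the
# browser to store the cookie on disk until that date
persistent_cookie() {
    printf 'Set-Cookie: session_id=%s; path=/; expires=%s\n' "$1" "$2"
}

# A complete response: cookie header, Content-type, the blank line that
# ends the headers, and then the page body
emit_page() {
    persistent_cookie "a1b2c3" "Fri, 01 Jun 2001 00:00:00 GMT"
    echo "Content-type: text/html"
    echo
    echo "<html><body>Welcome back!</body></html>"
}
```

Because the Set-Cookie lines are HTTP headers, they must be printed before the blank line that separates the headers from the page body.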

Cookie operations typically work as follows:

  1. When you visit a web site, your browser checks for cookies it has stored for that site’s domain. This domain might be netscape.com, themes.org, or yahoo.com, for example.

  2. Your browser then sends those cookies to the site along with your request.

  3. The site looks up your user ID given the unique identifier it retrieves from the cookie.
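
On the server side, steps 2 and 3 amount to reading the cookie string and mapping its identifier to a user. Apache passes the browser’s Cookie header to CGI programs in the HTTP_COOKIE environment variable (for example, theme=blue; session_id=a1b2c3). The sketch below assumes that format; get_cookie and lookup_user are hypothetical helpers, and lookup_user’s hard-coded case statement stands in for a database query.

```sh
#!/bin/sh
# Minimal sketch: read a cookie from HTTP_COOKIE and map it to a user ID.
# get_cookie and lookup_user are hypothetical helpers.

# get_cookie COOKIE_STRING NAME: print the value of cookie NAME, if present
# (NAME is used in a sed pattern, so keep it to plain letters, digits, _)
get_cookie() {
    printf '%s\n' "$1" | tr ';' '\n' | sed -e 's/^ *//' |
        sed -n "s/^$2=//p" | head -n 1
}

# lookup_user ID: stand-in for a database query mapping the cookie's
# unique identifier to a user ID; prints nothing for unknown identifiers
lookup_user() {
    case "$1" in
        a1b2c3) echo "jdoe" ;;
    esac
}

sid=$(get_cookie "${HTTP_COOKIE:-}" session_id)
user=$(lookup_user "$sid")
```

If user comes back empty, the script knows it has no valid cookie and can fall back to asking the visitor to log in.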

The use of cookies accounts for why some sites make you log in each time, and why others can remember you from session to session, even if you shut down your machine and revisit the site several days later.

Some people disable cookies for privacy reasons. What happens if you have cookies disabled in your browser? Different sites try in different ways to personalize the web pages presented to you anyway. One technique is to use hidden form fields. Essentially, you go to the site and log in manually, and when you hit the Submit button on a form, the page returned to you contains a hidden field holding a unique identifier. Every page you view on the site carries this hidden field forward, so each form you submit sends it back and the site can keep track of who you are. This is a bit tricky and seems a little puzzling to the end user, but it works, and it is typically contained within a line such as this example:[31]

<input type=hidden name=user_id value="[email protected]">
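
Here is a minimal sketch of how a shell CGI script might keep such a hidden field alive from page to page. It assumes the form is submitted with the GET method, so the identifier arrives in the QUERY_STRING environment variable; get_param, hidden_form, and the action URL /cgi-bin/next.cgi are hypothetical, and the sketch skips URL-decoding and HTML-escaping, both of which real code needs.

```sh
#!/bin/sh
# Minimal sketch: propagate an identifier in a hidden form field.
# get_param, hidden_form, and /cgi-bin/next.cgi are hypothetical.

# get_param QUERY NAME: print the value of NAME from a string such as
# "page=2&user_id=u42" (no URL-decoding of %XX escapes or "+")
get_param() {
    printf '%s\n' "$1" | tr '&' '\n' | sed -n "s/^$2=//p" | head -n 1
}

# hidden_form ID: print a form that re-embeds ID, so the next
# submission sends it back to the server (ID is not HTML-escaped here)
hidden_form() {
    cat <<EOF
<form method="get" action="/cgi-bin/next.cgi">
<input type="hidden" name="user_id" value="$1">
<input type="submit" value="Next">
</form>
EOF
}
```

In a CGI script, a call such as hidden_form "$(get_param "$QUERY_STRING" user_id)", printed after the Content-type header and blank line, would write the form into each generated page.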

CGI and Web Programming

How does dynamic content actually work? How do web pages change for each visit or even each reload? And how can this world of dynamic content get served through such a simple little protocol as HTTP? Until recently, most dynamic content has been created via little programs called CGIs that ran on the servers. (We’ll describe an alternate method, involving embedded scripting, later in this chapter.)

How do CGIs work? On the client side, the browser calls a CGI in the same way that it would call a static page—by making a request for a file from the web server. By calling a CGI, though, the client is actually telling the server to run a small program. Running that program produces the dynamic content for the web page.

A CGI program can be written to do just about anything, but its output must be a response the server can return over HTTP: at least one header line such as Content-type, a blank line, and then the document itself, usually an HTML page. You can write CGI programs in virtually any language that can be made to obey the protocol. CGI output essentially masquerades as an ordinary HTML file to the browser. (As far as a browser’s concerned, if it gets a standard HTTP response and page, it’s a happy bunny.) Binary CGIs are often written in C and then compiled. Most CGI programs today tend to be scripts, usually written in Perl or another popular scripting language, because scripts are quicker to write, develop, and debug, and turnaround speed is what web publishing is all about.

On the server side, the web server sets aside certain directories for CGI files. These special directories are named cgi-bin by convention (bin stands for binaries, because these directories used to be populated mainly with binaries). The CGI files are programs that, when called, generate an HTML page.

CGI stands for Common Gateway Interface, but even its name doesn’t reveal that much about what exactly it is. Essentially it’s another open protocol. It defines a number of environment variables that will be available to programs called from a browser. Some important pieces of information are passed to the program, including the client’s hostname, IP address, username (when the user has authenticated), and browser, as well as the method used (GET, POST, etc.), and finally the URI (everything after the protocol and hostname in the URL).

So in come these useful bits of information, which your CGI program may or may not use, and out must come an HTTP 1.0- or 1.1-compliant stream of data—for all intents and purposes to an innocent bystander, a static HTML file.

Let’s look at an example of a CGI program. We’re going to assume that you have access to a freshly installed Apache web server (we’ll describe Apache a bit later in this chapter) and that you’ve configured a cgi-bin directory, via the httpd.conf file, in which you can run the CGI program:

               
#!/bin/sh
echo Content-type: text/plain
echo
echo CGI/1.0 test script report:
echo
echo The current date and time is:
date
echo
echo SERVER_SOFTWARE = $SERVER_SOFTWARE
echo SERVER_NAME = $SERVER_NAME
echo REQUEST_METHOD = $REQUEST_METHOD
echo SCRIPT_NAME = "$SCRIPT_NAME"

There are a couple of important things you should take note of. One is that this is a shell script. As we mentioned, a CGI program can be written in lots of different languages, as long as it can write its response (here plain text, but usually an HTML page) to standard output, or STDOUT in Unix terms. Also, you’ll notice that we’ve output, via the standard Unix echo program, the contents of some of those previously mentioned CGI environment variables. Here’s the output:

CGI/1.0 test script report:
The current date and time is:
Wed May 24 03:28:09 EDT 2000
SERVER_SOFTWARE = Apache/1.3.11 (Unix) PHP/3.0.14
SERVER_NAME = www.iheavy.com
REQUEST_METHOD = GET
SCRIPT_NAME = /cgi-bin/test.cgi

You’ll also notice that, each time you run this script, it prints the current date and time. This is important because it begins to illustrate how dynamic content is made possible. When your browser receives the output, it looks like a static page, but the page itself doesn’t exist anywhere on the target server. It’s generated by that target server when you request the program via your browser (or when the web server calls the script auto-magically).

Example 5-1 shows what that script would look like in Perl.[32]

Example 5-1. cgi_env.cgi Perl Program to Interrogate the Web Server

#!/usr/bin/perl -w
use strict;
use CGI;

my $datestr = `date`;                # current date and time from the system
my $cgivar  = CGI->new;

# CGI.pm accessor methods for the CGI environment variables
my $SERVER_SOFTWARE = $cgivar->server_software();
my $SERVER_NAME     = $cgivar->server_name();
my $REQUEST_METHOD  = $cgivar->request_method();
my $SCRIPT_NAME     = $cgivar->script_name();

# The HTTP header must come first, then the start of the HTML document
print $cgivar->header,               # create HTTP header
      $cgivar->start_html('test'),   # start the HTML
      "CGI/1.0 test script report:<br><br>\n",
      "The current date and time is:<br>\n",
      "$datestr<br>\n";
print "<br>\n";
print "SERVER_SOFTWARE = $SERVER_SOFTWARE<br>\n";
print "SERVER_NAME = $SERVER_NAME<br>\n";
print "REQUEST_METHOD = $REQUEST_METHOD<br>\n";
print "SCRIPT_NAME = $SCRIPT_NAME<br>\n";
print $cgivar->end_html;             # end the HTML

Caching

Dividing content up into static and dynamic types is not the end of the story. There is a third type of content called cached content. When you request a particular static page, your browser may cache its images on your local machine, handling this caching for you automatically. What you may not know is that the content may also be cached elsewhere, between you and the originating server. In essence, when you request a page, your request may be hitting a caching server that fetches the actual page from the originating server only if it doesn’t already hold a copy, perhaps from an earlier request by another user. Caching is an effective method for improving your site’s scalability. It provides one more layer of indirection between users and the actual backend database, and thus better performance at the front end.

In some cases, even dynamic content can be cached. For instance, news sites may cache content based on requests, so particular current news stories (for example, the latest presidential election news or scandal) will already be put together and made available to you in what look like static pages, which will come whizzing back to you in mere milliseconds.
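
The idea of serving prebuilt copies of dynamic pages can be sketched in a few lines of shell. This is a minimal illustration, not how a caching server such as Squid works internally: cached_page and the CACHE_DIR location are hypothetical, each cached page gets a companion .stamp file recording when it was built, and date +%s (supported by GNU and BSD date, though not required by POSIX) supplies the current time in seconds.

```sh
#!/bin/sh
# Minimal sketch of a time-to-live page cache.
# cached_page and CACHE_DIR are hypothetical; date +%s works with GNU and
# BSD date but is not required by POSIX.

CACHE_DIR=${CACHE_DIR:-/tmp/page_cache}

# cached_page KEY TTL_SECONDS COMMAND [ARGS...]
# Prints the cached copy of KEY if it was built less than TTL_SECONDS ago;
# otherwise runs COMMAND to rebuild the page, stores it, and prints it.
cached_page() {
    key=$1; ttl=$2; shift 2
    mkdir -p "$CACHE_DIR"
    file="$CACHE_DIR/$key"
    now=$(date +%s)
    if [ -f "$file" ] && [ -f "$file.stamp" ]; then
        built=$(cat "$file.stamp")
        if [ $((now - built)) -lt "$ttl" ]; then
            cat "$file"                  # cache hit: no backend work at all
            return 0
        fi
    fi
    # Cache miss or stale copy: rebuild into a temp file, then swap it in
    "$@" > "$file.tmp" || return 1
    mv "$file.tmp" "$file"
    echo "$now" > "$file.stamp"
    cat "$file"
}
```

Only the first request in each time-to-live window pays for the database work; the rest are served straight from disk. Choosing the TTL is the design trade-off: longer values mean less load on the backend but staler pages.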

Squid is an open source Internet object cache that can cache data retrieved via the HTTP, FTP, and gopher protocols. The Squid server then becomes a proxy for the real web server, standing in for it and making requests to the real server only when necessary. The main Squid web site is:

http://www.squid-cache.org

Squid operates on AIX, Digital Unix, FreeBSD, HP-UX, Irix, Linux, NetBSD, Nextstep, SCO, Solaris, and OS/2 platforms.



[30] Let’s just hope that if you select The Lawnmower Man (within the Night Shift collection) from a web site book list, all of the mobile WAP phones in the world don’t start ringing at once.

[31] You can usually find these slinking in your web page HTML code by selecting View Page Source from a typical browser’s drop-down menus.

[32] Notice the use of Lincoln Stein’s CGI.pm Perl module.
