The Apache Web Server

Apache is a hugely successful piece of open source software. It is the most popular web server in use today, counting open source and commercial web servers alike. Netcraft reports that over 60% of global Internet sites currently employ Apache.[33]

Apache originally evolved from the HTTP daemon program (httpd) developed by Rob McCool at the National Center for Supercomputing Applications (NCSA), University of Illinois, Urbana-Champaign. When McCool left NCSA in 1994, development of the httpd program faltered. A group of eight core programmers, headed up by Brian Behlendorf and Cliff Skolnick, decided to continue McCool’s public domain work, forming the Apache Project (named after “patching,” their standard method of code modification). Other developers in the original core Apache group included Roy T. Fielding, Rob Hartill, David Robinson, Randy Terbush, Robert S. Thau, and Andrew Wilson.

The Apache group has now become the Apache Software Foundation (ASF), whose purpose is to provide organizational and legal support for all of the various Apache software projects and to ensure that these projects continue even if individual volunteers leave.

The main web site for Apache is:

http://www.apache.org

Apache[34] runs on virtually every operating system, including Win32, Linux, BSD, Solaris, and many other varieties of Unix. Apache’s modular design allows the functionality of the basic Apache code to be extended through the use of its easily accessible API; that design greatly enhances Apache’s power and flexibility. Among the important Apache modules are the following:

mod_perl (http://perl.apache.org)

Provides an interface between Apache and Perl. It allows Perl code to be cached in the web server’s memory space. This substantially improves performance over standard CGI applications. The use of mod_perl also reduces the advantage gained by the use of Java servlets (which we’ll get to in Chapter 7).

mod_php (http://www.php.net)

Allows the powerful HTML-embedded PHP scripting language to be incorporated directly into the web server’s kernel, avoiding the performance problems related to running as a separate CGI. We’ll describe PHP later in this chapter.

Java Apache modules (http://java.apache.org)

These include JServ, a Java servlet engine made to serve code written purely in Java, and mod_java, which allows you to extend the Apache kernel by building modules in Java, rather than C. We’ll describe JServ briefly in this chapter and in greater detail in Chapter 8.

XML Apache modules (http://xml.apache.org)

There is also an Apache XML project that, as you might guess, provides an XML support layer to Apache.

Apache and Oracle

Apache is such a solid web server that Oracle has recently decided to include it in the company’s release of the Internet Application Server (iAS) product. Essentially, proprietary iAS modules are bundled with open source Apache, and the whole has been released as a commercial product. By combining Apache with Oracle’s own application servers, iAS provides the stability, performance, and scalability required to run the most demanding of web applications. Oracle also claims the following business benefits:

  • Apache’s proven technology track record and access to the development and support of the large Apache community

  • The Apache-driven Secure Sockets Layer (SSL)

  • The Apache JServ Servlet engine (described in Chapter 8)

  • The ability to employ Perl CGI/DBI programs

  • Integration with Oracle’s PL/SQL language

  • Full Oracle technical support

It’s interesting to see how Oracle has not only accepted the use of open source software in its own product set, but also used the ideology of the open source movement (e.g., community support) in its corporate marketing message. What a change from earlier days when acknowledged corporate distribution of open source software would have been unthinkable. You can find out more about Oracle’s iAS product at:

http://www.oracle.com/ip/deploy/ias/index.html?web.html

Installing Apache

We’ve provided a basic outline of the installation steps for Apache. However, be sure to read the online documentation for your own platform very carefully:

  1. Download the Apache source code from http://www.apache.org/httpd.html.

  2. Once you’ve got the source, you should be able to build the latest Apache version straight out of the box with the Apache Autoconf-style Interface (APACI), contained within the download. If you are determined to compile Apache manually, you can check out the README and INSTALL files for most of the relevant platforms. For Unix systems, the manual installation (which requires a C compiler such as gcc, available at http://www.gnu.org/software/gcc/gcc.html) tends to follow this pattern:

    $ ./Configure
    $ make

    This will usually place the required files in the /usr/local/apache/ directory (for a totally standard install). We’ll run through a non-standard install in Chapter 7.

  3. Edit the configuration files, as described by the full installation instructions you will get with the download. This often involves altering the critically important httpd.conf file.

  4. Run the server:

    $ /usr/local/apache/bin/httpd -f /usr/local/apache/conf/httpd.conf

    In a perfect world, the server should now be running; if you’re anything like us, you’ll encounter one or two slight problems. But by crossing fingers, touching wood, saying three hail Marys, and oh, by actually reading and following the directions in the INSTALL and README files, you should get to the screen shown in Figure 5-2. (Alternatively, do as the README file says, and use the automatic installation.)

You may find the following site a very useful place for quick tips on configuring Apache (and many other pieces of open source software):

http://www.refcards.com

Once you really start pushing Apache (be sure to follow all the available online documentation when you do so), you’ll be amazed at what you can do with it.


Figure 5-2. The default page that runs under Apache directly after a basic installation

Apache and HTTP

To understand how Apache and other web servers work, you need a basic understanding of the HTTP protocol.

When you type a URL into your browser, you’re probably used to typing something like http://www.somesite.com. The “http” at the beginning tells your browser that you’re requesting an HTTP-based web page. There are a few other possibilities—for example, you might be making an FTP request (e.g., ftp://ftp.ora.com) or a Telnet request (e.g., telnet://ora.com). The point is that the name before the colon specifies the protocol to your browser. This convention is used largely because the first browsers back in the NCSA days did more than just view web pages.
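As a small illustration of "the name before the colon," the following Python sketch pulls the protocol out of the three sample URLs mentioned above. The helper name protocol_of is our own invention for this illustration; it simply wraps the standard library's urlsplit function.

```python
from urllib.parse import urlsplit


def protocol_of(url: str) -> str:
    """Return the scheme, i.e. the name before the first colon of a URL.

    protocol_of is a hypothetical helper name used only for this sketch.
    """
    return urlsplit(url).scheme


if __name__ == "__main__":
    for url in ("http://www.somesite.com", "ftp://ftp.ora.com", "telnet://ora.com"):
        # Print the protocol a browser would select for each URL.
        print(protocol_of(url), "<-", url)
```

A browser makes essentially the same decision before it opens any connection: the scheme determines which protocol it will speak to the remote host.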

What is HTTP? It’s a stateless protocol, which means that you make a request and get results, and later make another request and get results, but there is no continuity between these requests. HTTP is built to be lightweight and fast, lending itself well to a distributed web of documents (following the original, inspirational work of Web pioneer Tim Berners-Lee; see http://www.w3.org/People/Berners-Lee/). Displaying a single page usually takes several HTTP requests. One static HTML page might reference many images and, if frames are used, other HTML pages, each of which is fetched with its own request and returned to the browser or other program that originated it.

Let’s take a look at a somewhat more technical example. Telnet to port 80 of your favorite web server as in Example 5-2. We’ve highlighted key statements that we’ll explain at the end of the example.

Example 5-2. Connecting to the Internet via Telnet

$ telnet www.apache.org 80
Trying 63.211.145.10...
Connected to www.apache.org.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 200 OK
Date: Tue, 23 May 2000 08:24:20 GMT
Server: Apache/1.3.9 (Unix) ApacheJServ/1.1 PHP/3.0.12 AuthMySQL/2.20
Connection: close
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
 <HEAD>
  <TITLE>Index of /</TITLE>
 </HEAD>
 <BODY>
<H1>Index of /</H1>
<PRE><IMG SRC="/icons/blank.gif" ALT="     ">
 <A HREF="?N=D">Name</A>
       <A HREF="?M=A">Last modified</A>
       <A HREF="?S=A">Size</A>  <A HREF="?D=A">Description</A>
<HR>
<IMG SRC="/icons/back.gif" ALT="[DIR]">
 <A HREF="/">Parent Directory</A>        18-Mar-2000 20:09      -  
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="dev.apache.org/">dev.apache.org/</A>
         20-May-2000 19:27      -  
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="search.apache.org/">search.apache.org/</A>
      22-May-2000 05:33      -  
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="www.apache.org/">www.apache.org/</A>
         20-May-2000 19:23      -  
</PRE><HR>
<ADDRESS>Apache/1.3.9 Server at locus.apache.org Port 80</ADDRESS>
</BODY></HTML>
Connection closed by foreign host.
$

Try this example yourself and see what happens. Be sure to hit the Return key twice after the GET line, and type everything exactly as you see it above. That’s HTTP in a nutshell—well, it’s a GET request, at least. GET requests are used when you’re receiving information from the Web—for example, when you’ve requested your favorite news site.
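If you would rather script the same exchange than type it, here is a minimal Python sketch of it. To keep it runnable without a network connection, it starts a tiny throwaway web server on the local machine (standing in for Apache) and then sends it the same bare GET line over a plain socket. The names HelloHandler, raw_get, and demo, and the one-line page the server returns, are assumptions made for this illustration only.

```python
import socket
import threading
from http.server import BaseHTTPRequestHandler, HTTPServer


class HelloHandler(BaseHTTPRequestHandler):
    """Hypothetical stand-in for a real web server such as Apache."""

    def do_GET(self):
        body = b"<HTML><BODY><H1>Hello</H1></BODY></HTML>\n"
        self.send_response(200)
        self.send_header("Content-Type", "text/html")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()  # writes the blank line that ends the headers
        self.wfile.write(body)

    def log_message(self, format, *args):
        pass  # suppress per-request logging to keep the demo quiet


def raw_get(host: str, port: int, path: str = "/") -> bytes:
    """Send a bare HTTP/1.0 GET and return everything the server replies."""
    # The request line is followed by an empty line: "Return" pressed twice.
    request = f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")
    with socket.create_connection((host, port), timeout=5) as conn:
        conn.sendall(request)
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # the server closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)


def demo() -> bytes:
    """Start the local server on a free port, fetch "/", and shut down."""
    server = HTTPServer(("127.0.0.1", 0), HelloHandler)
    threading.Thread(target=server.serve_forever, daemon=True).start()
    try:
        return raw_get("127.0.0.1", server.server_port)
    finally:
        server.shutdown()
        server.server_close()


if __name__ == "__main__":
    print(demo().decode("latin-1"))
```

Pointing raw_get at a real host and port 80 should produce a response of the same shape as Example 5-2: a status line, a block of headers, an empty line, and then the page itself.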

You should also understand a bit about POST commands, which are used when you want to send information to the server, for instance, when you enter a string in a web form and click “Search.” If the form is set up to use POST, you’re really sending a POST command to the HTTP web server that is handling requests for that URL.
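To make the difference from GET concrete, the sketch below assembles the text of a simple POST request in Python. It only builds the request; it does not send it anywhere. The function name build_post_request, the /search path, and the form field q are hypothetical choices for this illustration, not details of any particular site.

```python
from urllib.parse import urlencode


def build_post_request(host: str, path: str, fields: dict) -> bytes:
    """Assemble a minimal HTTP/1.0 POST carrying URL-encoded form fields."""
    # The form data travels in the request body, not in the URL.
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"
        "\r\n"  # empty line separating headers from body
    ).encode("ascii")
    return head + body


if __name__ == "__main__":
    # Hypothetical path and field name, chosen only for the example.
    request = build_post_request(
        "www.somesite.com", "/search", {"q": "apache modules"}
    )
    print(request.decode("ascii"))
```

Unlike the GET request, which consists of little more than the request line, the POST request carries the submitted data after the empty line, and the Content-Length header tells the server how many bytes of it to expect.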

There are also some very useful nuggets of information in the response from the server shown in Example 5-2. First of all, take a close look at the “Server:” line. It tells us not only that the server is running Apache (of course), but also the version number and the modules compiled into the kernel, including JServ, PHP, and MySQL authentication. Now glance down a bit more and you can see references to additional items, a number of GIF images. When the browser receives and interprets this page, it will notice these references and send a further request to the server for each one. It will then construct the web page you see in your browser.
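The “Server:” line from Example 5-2 is easy to take apart programmatically. The Python sketch below splits it into name/version pairs, ignoring the parenthesized comment. The function name server_products and the regular expressions are our own assumptions about a reasonable way to do this, not part of Apache.

```python
import re

# A product token looks like Name/Version, e.g. Apache/1.3.9.
PRODUCT = re.compile(r"([^\s/()]+)/([^\s()]+)")


def server_products(header_value: str) -> list:
    """Return (name, version) pairs from a Server header value."""
    # Drop parenthesized comments such as "(Unix)" before matching.
    without_comments = re.sub(r"\([^)]*\)", " ", header_value)
    return PRODUCT.findall(without_comments)


if __name__ == "__main__":
    line = "Server: Apache/1.3.9 (Unix) ApacheJServ/1.1 PHP/3.0.12 AuthMySQL/2.20"
    _, _, value = line.partition(":")
    for name, version in server_products(value.strip()):
        print(f"{name:12} {version}")
```

Run against the header above, this lists Apache itself followed by the JServ, PHP, and MySQL authentication modules, each with its version number.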

The instructions for putting this page together, the exact layout of image and text, size, color, and alignment are all determined by the HTML tags in the page. A text-based browser like Lynx (http://lynx.browser.org) will simply ignore these image tags and build the page based on the text and HTML markup.
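The split between images and text can be sketched in a few lines of Python. The class below, whose name PageScanner is invented for this illustration, reads a fragment of the listing from Example 5-2 and keeps two lists: the image addresses a graphical browser would go on to request, and the plain text a text-based browser would have to work with. It is a rough model of the idea, not a description of how any real browser is implemented.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collect image addresses and visible text from an HTML page."""

    def __init__(self):
        super().__init__()
        self.image_urls = []  # what a graphical browser would fetch next
        self.text_parts = []  # what a text-only browser could still show

    def handle_starttag(self, tag, attrs):
        # HTMLParser reports tag and attribute names in lowercase.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.image_urls.append(src)

    def handle_data(self, data):
        text = data.strip()
        if text:
            self.text_parts.append(text)


SAMPLE = (
    '<IMG SRC="/icons/back.gif" ALT="[DIR]"> <A HREF="/">Parent Directory</A>\n'
    '<IMG SRC="/icons/folder.gif" ALT="[DIR]"> '
    '<A HREF="dev.apache.org/">dev.apache.org/</A>\n'
)

if __name__ == "__main__":
    scanner = PageScanner()
    scanner.feed(SAMPLE)
    scanner.close()
    print("images to request:", scanner.image_urls)
    print("text to display: ", scanner.text_parts)
```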

Apache Security

Apache can also protect the security of HTTP transactions via the module mod_ssl. Encryption is an important aspect of any web server, especially if you want it to have secure pages and forms, which are encrypted in transit between the user and your server. This security layer essentially protects data sent over such connections from being intercepted in transit and read by prying eyes. Any time you wish to send credit card information, for example, you’ll want to encrypt the data in transit.

The mod_ssl module implements strong cryptography through the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. The implementation is based on the open source program OpenSSL, which is itself derived from SSLeay, a project by Eric Young and Tim Hudson. This module is actually based on Ben Laurie’s code developed for the Apache-SSL web server. It can be used free of charge outside the U.S. for commercial and non-commercial purposes, and inside the U.S. for non-commercial purposes. However, if you wish to use it inside the U.S. for commercial purposes, you’ll need to obtain a license from one of the following sites:

http://www.apache.org/related_projects.html#apachessl
http://www.rsasecurity.com

You may also be interested in the commercial versions of the mod_ssl module (known as Raven) and the web server (known as Stronghold). By getting the commercial product, you also get a license to use this software in the U.S. Check out:

http://www.c2.net/products/sh2/

Stronghold in the U.S. and Canada

http://www.int.c2.net/products/sh2/

Stronghold outside the U.S. and Canada



[33] According to Netcraft (http://www.netcraft.com/survey/), Apache’s share of the web server market was over 60% as of May 2000. Following far behind was IIS, placing second at 21%, and Netscape, placing third at 7%. (Microsoft claims that IIS is the most popular web server in the world, which is true only if you limit the scope of the survey to the commercial domain.)

[34] Apache 1.3.12 is the latest version as of this writing, with Apache 2.0 in alpha testing.
