Apache is a hugely successful piece of open source software. It is the most popular web server in use today—including both open source and commercial web servers. Netcraft reports that over 60% of global Internet sites currently employ Apache.[33]
Apache originally evolved from the HTTP daemon program (httpd) developed by Rob McCool at the National Center for Supercomputing Applications (NCSA), University of Illinois, Urbana-Champaign. When McCool left NCSA in 1994, development of the httpd program faltered. A group of eight core programmers, headed up by Brian Behlendorf and Cliff Skolnick, decided to continue McCool’s public domain work, forming the Apache Project (named after “patching,” their standard method of code modification). Other developers in the original core Apache group included Roy T. Fielding, Rob Hartill, David Robinson, Randy Terbush, Robert S. Thau, and Andrew Wilson.
The Apache group has now become the Apache Software Foundation (ASF), whose purpose is to provide organizational and legal support for all of the various Apache software projects and to ensure that these projects continue even if individual volunteers leave.
The main web site for Apache is:
http://www.apache.org
Apache[34] runs on virtually every operating system, including Win32, Linux, BSD, Solaris, and many other varieties of Unix. Apache’s modular design allows the functionality of the basic Apache code to be extended through the use of its easily accessible API; that design greatly enhances Apache’s power and flexibility. Among the important Apache modules are the following:
mod_perl: Provides an interface between Apache and Perl. It allows Perl code to be cached in the web server’s memory space. This substantially improves performance over standard CGI applications. The use of mod_perl also reduces the advantage gained by the use of Java servlets (which we’ll get to in Chapter 7).
PHP: Allows the powerful HTML-embedded PHP scripting language to be incorporated directly into the web server’s kernel, avoiding the performance problems related to running as a separate CGI. We’ll describe PHP later in this chapter.
Java modules: These include JServ, a Java servlet engine made to serve code written purely in Java, and mod_java, which allows you to extend the Apache kernel by building modules in Java, rather than C. We’ll describe JServ briefly in this chapter and in greater detail in Chapter 8.
XML: There is also an Apache XML project that, as you might guess, provides an XML support layer to Apache.
Apache is such a solid web server that Oracle has recently decided to include it in the company’s release of the Internet Application Server (iAS) product. Essentially, proprietary iAS modules are bundled with open source Apache, and the whole has been released as a commercial product. By combining Apache with Oracle’s own application servers, iAS provides the stability, performance, and scalability required to run the most demanding of web applications. Oracle also claims the following business benefits:
Apache’s proven technology track record and access to the development and support of the large Apache community
The Apache-driven Secure Sockets Layer (SSL)
The Apache JServ Servlet engine (described in Chapter 8)
The ability to employ Perl CGI/DBI programs
Integration with Oracle’s PL/SQL language
Full Oracle technical support
It’s interesting to see how Oracle has not only accepted the use of open source software in its own product set, but also used the ideology of the open source movement (e.g., community support) in its corporate marketing message. What a change from earlier days when acknowledged corporate distribution of open source software would have been unthinkable. You can find out more about Oracle’s iAS product at:
http://www.oracle.com/ip/deploy/ias/index.html?web.html
We’ve provided a basic outline of the installation steps for Apache. However, be sure to read the online documentation for your own platform very carefully:
Download the Apache source code from http://www.apache.org/httpd.html.
Once you’ve got the source, you should be able to build the latest Apache version straight out of the box with the Apache Autoconf-style Interface (APACI), contained within the download. If you are determined to compile Apache manually, you can check out the README and INSTALL files for most of the relevant platforms. For Unix systems, the manual installation (which requires a C compiler such as gcc, available at http://www.gnu.org/software/gcc/gcc.html) tends to follow this pattern:
$ ./Configure
$ make
This will usually place the required files in the /usr/local/apache/ directory (for a totally standard install). We’ll run through a non-standard install in Chapter 7.
Edit the configuration files, as described by the full installation instructions you will get with the download. This often involves altering the critically important httpd.conf file.
Run the server:
$ /usr/local/apache/bin/httpd -f /usr/local/apache/conf/httpd.conf
In a perfect world, the server should now be running; if you’re anything like us, you’ll encounter one or two slight problems. But by crossing fingers, touching wood, saying three hail Marys, and oh, by actually reading and following the directions in the INSTALL and README files, you should get to the screen shown in Figure 5-2. (Alternatively, do as the README file says, and use the automatic installation.)
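Before opening a browser, you can check whether anything is answering on the server’s port at all. The following is only a minimal sketch in Python, assuming the server was started on the local machine and listens on port 80; is_listening is an illustrative helper of our own, not something shipped with Apache.

```python
import socket


def is_listening(host: str = "localhost", port: int = 80, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port can be opened."""
    try:
        # create_connection raises OSError (refused, timed out, ...) on failure.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        return False
```

Calling is_listening() with no arguments tries localhost on port 80. A False result only says that no connection could be made; the server’s error log is the place to look for the reason.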
You may find the following site a very useful place for quick tips on configuring Apache (and many other pieces of open source software):
http://www.refcards.com
Once you really start pushing Apache (be sure to follow all the available online documentation when you do so), you’ll be amazed at what you can do with it.
To understand how Apache and other web servers work, you need a basic understanding of the HTTP protocol.
When you type a URL into your browser, you’re probably used to typing something like http://www.somesite.com. The “http” at the beginning tells your browser that you’re requesting an HTTP-based web page. There are a few other possibilities—for example, you might be making an FTP request (e.g., ftp://ftp.ora.com) or a Telnet request (e.g., telnet://ora.com). The point is that the name before the colon specifies the protocol to your browser. This convention is used largely because the first browsers back in the NCSA days did more than just view web pages.
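To see the "name before the colon" convention in action, here is a small sketch that pulls the protocol out of the URLs mentioned above. It uses Python’s standard urllib.parse module; the helper name scheme_of is ours, chosen only for illustration.

```python
from urllib.parse import urlsplit


def scheme_of(url: str) -> str:
    """Return the protocol name that appears before the first colon."""
    return urlsplit(url).scheme


for url in ("http://www.somesite.com", "ftp://ftp.ora.com", "telnet://ora.com"):
    # A browser makes the same decision: the scheme selects the protocol handler.
    print(scheme_of(url), "<-", url)
```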
What is HTTP? It’s a stateless protocol, which means that you make a request and get results, and later make another request and get results, but there is no continuity between these requests. HTTP is built to be lightweight and fast, lending itself well to a distributed web of documents (following the original, inspirational work of Web pioneer Tim Berners-Lee; see http://www.w3.org/People/Berners-Lee/ ). Displaying a single page usually takes several requests. One static HTML page might reference many images and, if frames are used, other HTML pages; each of these is fetched with its own request and returned to the browser or other program that asked for it.
Let’s take a look at a somewhat more technical example. Telnet to port 80 of your favorite web server as in Example 5-2. We’ve highlighted key statements that we’ll explain at the end of the example.
Example 5-2. Connecting to the Internet via Telnet
$ telnet www.apache.org 80
Trying 63.211.145.10...
Connected to www.apache.org.
Escape character is '^]'.
GET / HTTP/1.0

HTTP/1.1 200 OK
Date: Tue, 23 May 2000 08:24:20 GMT
Server: Apache/1.3.9 (Unix) ApacheJServ/1.1 PHP/3.0.12 AuthMySQL/2.20
Connection: close
Content-Type: text/html

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">
<HTML>
<HEAD>
<TITLE>Index of /</TITLE>
</HEAD>
<BODY>
<H1>Index of /</H1>
<PRE><IMG SRC="/icons/blank.gif" ALT=" "> <A HREF="?N=D">Name</A> <A HREF="?M=A">Last modified</A> <A HREF="?S=A">Size</A> <A HREF="?D=A">Description</A>
<HR>
<IMG SRC="/icons/back.gif" ALT="[DIR]"> <A HREF="/">Parent Directory</A> 18-Mar-2000 20:09 -
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="dev.apache.org/">dev.apache.org/</A> 20-May-2000 19:27 -
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="search.apache.org/">search.apache.org/</A> 22-May-2000 05:33 -
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="www.apache.org/">www.apache.org/</A> 20-May-2000 19:23 -
</PRE><HR>
<ADDRESS>Apache/1.3.9 Server at locus.apache.org Port 80</ADDRESS>
</BODY></HTML>
Connection closed by foreign host.
$
Try this example yourself and see what happens. Be sure to hit the Return key twice after the GET line, and type everything exactly as you see it above. That’s HTTP in a nutshell—well, it’s a GET request, at least. GET requests are used when you’re receiving information from the Web—for example, when you’ve requested your favorite news site.
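The same exchange can be scripted. The sketch below is one possible way to do it with Python’s standard socket module, not the only one; build_get_request and fetch are names we made up for illustration. It sends exactly what you typed into Telnet: the request line followed by an empty line (the reason for pressing Return twice).

```python
import socket


def build_get_request(path: str = "/") -> bytes:
    """Build a minimal HTTP/1.0 GET request: request line, then a blank line."""
    return f"GET {path} HTTP/1.0\r\n\r\n".encode("ascii")


def fetch(host: str, port: int = 80, path: str = "/") -> bytes:
    """Send the request and read until the server closes the connection."""
    with socket.create_connection((host, port), timeout=10) as conn:
        conn.sendall(build_get_request(path))
        chunks = []
        while True:
            data = conn.recv(4096)
            if not data:  # empty read: the server has closed the connection
                break
            chunks.append(data)
    return b"".join(chunks)
```

Calling fetch("www.apache.org") would return the raw headers and HTML in one byte string. Servers that host several sites on one address generally also expect a Host: header, which this minimal request omits.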
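You should also understand a bit about POST commands, which are used when you want to send information to the server, for instance when you fill in a form on a web page and click its submit button. The browser then sends a POST command, carrying the form data in the body of the request, to the HTTP web server that is handling requests for that URL. (Simple search forms often use a GET with the search string appended to the URL instead.)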
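For comparison with the GET request, here is a sketch of what a POST looks like on the wire. The host www.example.com, the path /search, and the field name q are all hypothetical, as is the helper name build_post_request; only the general shape (headers, a blank line, then the encoded form data) is the point.

```python
from urllib.parse import urlencode


def build_post_request(host: str, path: str, fields: dict) -> bytes:
    """Build an HTTP/1.0 POST request carrying URL-encoded form fields."""
    body = urlencode(fields).encode("ascii")
    head = (
        f"POST {path} HTTP/1.0\r\n"
        f"Host: {host}\r\n"
        "Content-Type: application/x-www-form-urlencoded\r\n"
        f"Content-Length: {len(body)}\r\n"  # tells the server how much body to read
        "\r\n"
    ).encode("ascii")
    return head + body


# Hypothetical form submission, printed so the request can be inspected.
print(build_post_request("www.example.com", "/search", {"q": "apache oracle"}).decode("ascii"))
```

Unlike GET, the data does not appear in the URL; it follows the blank line that ends the headers.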
There are also some very useful nuggets of information in the response from the server shown in Example 5-2. First of all, take a close look at the “Server:” line. It’s telling us not only that it’s using Apache (of course), but also the version number and that a number of modules are compiled into the kernel, including JServ, PHP, and MySQL authentication. Now glance down a bit more and you can see the subsequent requests for additional items, a number of GIF images. When the browser receives and interprets this page, it will notice these and send them back as further requests to the server. It will then construct the web page you see in your browser.
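Reading the “Server:” line by eye is easy enough, but the same information can be picked apart in a few lines of code. This sketch parses the response headers from Example 5-2 (copied here as a string) and splits the Server value into name/version pairs; parse_head is an illustrative helper, and the simple split on "/" is an assumption that happens to suit this particular header.

```python
def parse_head(raw_head: str):
    """Split a response head into its status line and a dict of headers."""
    lines = raw_head.strip().splitlines()
    status_line = lines[0]
    headers = {}
    for line in lines[1:]:
        # Split at the first colon only; values such as dates contain colons too.
        name, _, value = line.partition(":")
        headers[name.strip().lower()] = value.strip()
    return status_line, headers


HEAD = """\
HTTP/1.1 200 OK
Date: Tue, 23 May 2000 08:24:20 GMT
Server: Apache/1.3.9 (Unix) ApacheJServ/1.1 PHP/3.0.12 AuthMySQL/2.20
Connection: close
Content-Type: text/html
"""

status, headers = parse_head(HEAD)
# Tokens of the form name/version are products; "(Unix)" is a comment and is skipped.
products = dict(tok.split("/", 1) for tok in headers["server"].split() if "/" in tok)
print(status)
print(products)
```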
The instructions for putting this page together, the exact layout of image and text, size, color, and alignment are all determined by the HTML tags in the page. A text-based browser like Lynx (http://lynx.browser.org) will simply ignore these image tags and build the page based on the text and HTML markup.
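The difference between the two kinds of browser can be sketched with Python’s standard html.parser module. The PageScanner class below is our own illustration, fed with a shortened copy of the page from Example 5-2: the images list holds what a graphical browser would go on to request, while the text list is roughly what a text-based browser has to work with.

```python
from html.parser import HTMLParser


class PageScanner(HTMLParser):
    """Collect image URLs and visible text from a page."""

    def __init__(self):
        super().__init__()
        self.images = []
        self.text = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling us.
        if tag == "img":
            src = dict(attrs).get("src")
            if src:
                self.images.append(src)

    def handle_data(self, data):
        if data.strip():
            self.text.append(data.strip())


PAGE = '''<H1>Index of /</H1>
<PRE><IMG SRC="/icons/back.gif" ALT="[DIR]"> <A HREF="/">Parent Directory</A>
<IMG SRC="/icons/folder.gif" ALT="[DIR]"> <A HREF="dev.apache.org/">dev.apache.org/</A>
</PRE>'''

scanner = PageScanner()
scanner.feed(PAGE)
scanner.close()
print(scanner.images)  # each entry would become a further GET request
print(scanner.text)
```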
Apache can also protect the security of HTTP transactions via the module mod_ssl. Encryption is an important aspect of any web server, especially if you want it to have secure pages and forms, which are encrypted in transit between the user and your server. This security layer essentially protects data sent over such connections from being intercepted in transit and read by prying eyes. Any time you wish to send credit card information, for example, you’ll want to encrypt the data in transit.
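Seen from the client’s side, an encrypted request is an ordinary HTTP request sent through a socket that has been wrapped in SSL/TLS. The sketch below uses Python’s standard ssl module and says nothing about how mod_ssl itself is configured; the helper names are ours, and port 443 is simply the conventional port for secure HTTP.

```python
import socket
import ssl


def make_client_context() -> ssl.SSLContext:
    """Client settings that verify the server certificate and host name."""
    return ssl.create_default_context()


def fetch_status_line(host: str, port: int = 443) -> str:
    """Open an encrypted connection, send a GET, and return the status line."""
    ctx = make_client_context()
    with socket.create_connection((host, port), timeout=10) as raw:
        # Everything sent through `tls` is encrypted before it leaves the machine.
        with ctx.wrap_socket(raw, server_hostname=host) as tls:
            tls.sendall(f"GET / HTTP/1.0\r\nHost: {host}\r\n\r\n".encode("ascii"))
            reply = tls.recv(4096)
    return reply.split(b"\r\n", 1)[0].decode("latin-1")
```

The request text is the same as in the Telnet session; only the transport changes, which is why a secure server can reuse all of its normal HTTP handling.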
The mod_ssl module implements strong cryptography through the Secure Sockets Layer (SSL) and Transport Layer Security (TLS) protocols. The implementation is based on the open source program OpenSSL, which is itself derived from SSLeay, a project by Eric Young and Tim Hudson. The module was originally based on Ben Laurie’s code developed for the Apache-SSL web server. It can be used free outside the U.S. for commercial and non-commercial purposes, and inside the U.S. for non-commercial purposes. However, if you wish to use it inside the U.S. for commercial purposes, you’ll need to obtain a license from one of the following sites:
http://www.apache.org/related_projects.html#apachessl
http://www.rsasecurity.com
You may also be interested in the commercial versions of the mod_ssl module (known as Raven) and the web server (known as Stronghold). By getting the commercial product, you also get a license to use this software in the U.S. Check out:
[33] According to Netcraft (http://www.netcraft.com/survey/), Apache’s share of the web server market was over 60% as of May 2000. Following far behind were IIS, in second place at 21%, and Netscape, in third place at 7%. (Microsoft claims that IIS is the most popular web server in the world, which is true only if you limit the scope of the survey to the commercial domain.)
[34] Apache 1.3.12 is the latest version as of this writing, with Apache 2.0 in alpha testing.