Configuring Apache HTTP to proxy your Tomcat(s)

We are going to access the application using a local alias cloudstreetmarket.com (on port 80) rather than the former localhost:8080. Setting this up is sometimes a mandatory step when developing third-party integrations. In our case, the third party will be Yahoo! and its OAuth2 authentication servers.

Getting ready

This recipe is mostly about configuration. We will install an Apache HTTP server and follow the Apache Tomcat How-To. This will lead us to update our Tomcat connector and to create a virtual host in the Apache configuration file.

You will discover how this configuration brings great flexibility and lets us serve web content to customers through an advanced and scalable architecture.

How to do it...

  1. On MS Windows, download and install Apache HTTP Server.
    • The easiest way is probably to download the binaries directly from an official distributor. Select and download the appropriate latest Zip archive from one of the following URLs:

    http://www.apachelounge.com/download

    http://www.apachehaus.com/cgi-bin/download.plx

    • Create a directory C:\apache24 and unzip the downloaded archive into this location.

    Note

    You should be able to reach the bin directory through this form: C:\apache24\bin.

  2. On Linux / Mac OS, download and install Apache HTTP Server.
    1. Download the latest sources (compressed in a tar.gz archive) from the Apache website:

      http://httpd.apache.org/download.cgi#apache24

    2. From the downloaded archive, extract the sources:
      $ tar -xvzf httpd-NN.tar.gz
      $ cd httpd-NN
      

      Note

      NN being the current version of Apache HTTP.

    3. Configure the source tree:
      $ ./configure
      
    4. Compile the package:
      $ make
      
    5. Install the package:
      $ make install
      
  3. On MS Windows, add a new alias in the hosts file.
    1. Using Notepad, edit the file found at the following path:
      %SystemRoot%\system32\drivers\etc\hosts

      Note

      This file has no extension. Notepad doesn't complain about that when you save the file.

    2. Add the following entry at the end of the file:
      127.0.0.1 cloudstreetmarket.com
    3. Save the modification.
  4. On Linux/Mac OS, add a new alias in the hosts file.
    1. Edit the file that can be found at the following path: /etc/hosts
    2. Add the following entry at the end of the file:
      127.0.0.1 cloudstreetmarket.com
    3. Save the modification.
  5. For all operating systems, edit the httpd.conf Apache configuration file.
    1. This file can be found either at C:\apache24\conf (on Windows) or at /usr/local/apache2/conf (on Linux or Mac).
    2. Uncomment the following two lines:
      LoadModule proxy_module modules/mod_proxy.so
      LoadModule proxy_http_module modules/mod_proxy_http.so
      
    3. Add the following block at the very bottom of the file:
      <VirtualHost cloudstreetmarket.com:80>
        ProxyPass        /portal http://localhost:8080/portal
        ProxyPassReverse /portal http://localhost:8080/portal
        ProxyPass        /api  http://localhost:8080/api
        ProxyPassReverse /api  http://localhost:8080/api
        RedirectMatch ^/$ /portal/index
      </VirtualHost>

    Note

    A sample of a modified httpd.conf file (for Apache HTTP 2.4.18) can be found in the chapter_5/source_code/app/apache directory.

  6. Edit the server.xml Tomcat configuration file.
    1. This file can be found either at C:\tomcat8\conf (on Windows) or at /home/usr/{system.username}/tomcat8/conf (on Linux or Mac).
    2. Find the <Connector port="8080" protocol="HTTP/1.1"... > definition and edit it as follows:
          <Connector port="8080" protocol="HTTP/1.1"
          connectionTimeout="20000" redirectPort="8443"
          proxyName="cloudstreetmarket.com" proxyPort="80"/>

    Note

    A sample of a modified server.xml file (for Apache Tomcat 8.0.30) can be found in the chapter_5/source_code/app/tomcat directory.

  7. On MS Windows, start the Apache HTTP server.
    1. Open a command prompt window.
    2. Enter the following command:
      $ cd C:/apache24/bin
    3. Install an Apache service:
      $ httpd.exe -k install
      
    4. Start the server:
      $ httpd.exe -k start
  8. On Linux/Mac OS, start the Apache HTTP server:
    • Start the server:
      $ sudo apachectl start

    Now start the Tomcat server and open your favorite web browser. Go to http://cloudstreetmarket.com; you should reach the landing page of the portal.

How it works...

The Apache HTTP configuration we made here is something of a standard nowadays. It offers a very high level of customization on a network and gives us a starting point for scalability.

DNS configuration or host aliasing

Let's revisit how web browsers work. When we target a URL in the web browser, the final server is accessed from its IP, to establish a TCP connection on a specific port. The browser needs to resolve this IP for the specified name.

To do so, it queries a chain of Domain Name Servers (on the Internet, the chain often starts with the user's Internet Service Provider (ISP)). Each DNS server basically works this way:

  • It tries to resolve the IP by itself, looking it up in its database or its cache
  • If unsuccessful, it asks another DNS server, waits for the response, caches the result, and sends it back to the caller
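
The two steps above can be sketched with a toy resolver. This is only an illustration of the lookup-then-delegate-then-cache idea, not a real DNS implementation: the ToyResolver class, its fields, and the 203.0.113.10 address (taken from a range reserved for documentation) are assumptions made for the example.

```python
class ToyResolver:
    """Toy name server: answers from its own data, otherwise asks upstream."""

    def __init__(self, records=None, upstream=None):
        self.records = dict(records or {})  # names this server knows directly
        self.cache = {}                     # answers learned from upstream
        self.upstream = upstream            # next server in the chain, if any

    def resolve(self, name):
        # Step 1: try to answer alone, from the database or the cache.
        if name in self.records:
            return self.records[name]
        if name in self.cache:
            return self.cache[name]
        # Step 2: ask another server, cache the result, return it.
        if self.upstream is None:
            return None
        answer = self.upstream.resolve(name)
        if answer is not None:
            self.cache[name] = answer
        return answer


# Hypothetical chain: an ISP resolver in front of the server owning the domain.
authoritative = ToyResolver(records={"cloudstreetmarket.com": "203.0.113.10"})
isp = ToyResolver(upstream=authoritative)

first = isp.resolve("cloudstreetmarket.com")   # delegated, then cached
second = isp.resolve("cloudstreetmarket.com")  # served from the ISP cache
```

A real resolver also deals with record types, expiry times, and negative answers, which this sketch leaves out.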

A DNS server managing one specific domain is authoritative for that domain, and its zone is described by a Start Of Authority (SOA) record. Such DNS servers are usually provided by registrars, and we usually use their services to configure records (and our server IP) for a domain zone.

Around the web, each DNS server tries to reach the authoritative server for the requested name. The top of the DNS hierarchy is made of the root name servers. They delegate to the name servers in charge of each Top-Level Domain (TLD such as .com, .net, .org…).

When the browser gets the IP, it tries to establish a TCP connection on the specified port (80 by default for HTTP). The remote server accepts the connection and the HTTP request is sent over the network.

In production – editing DNS records

As soon as we approach the production stage, we need the real domain name to be configured for DNS records, online, with a domain-name provider. There are different types of records to edit. Each one serves a specific purpose or resource type: host, canonical names, mail-exchanger, name server, and others. Specific guidance can usually be found on the domain name provider website.

An alias for the host

Before contacting any kind of DNS, the operating system may be able to resolve the IP by itself. For this purpose, the hosts file is a plain-text registry. Adding aliases to this registry maps host names to the IP of whichever final server we choose. Doing so is a common technique for development environments but isn't restricted to them.

Each line represents an IP address followed by one or more host names. Fields are separated by spaces or tabs. Comments can be specified at the very beginning of a line with a # character. Blank lines are ignored and IPs can be defined in IPv4 or IPv6.

This file is only for host aliasing; we don't deal with ports at this stage!
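
As a minimal sketch of the format just described, the following Python function reads hosts-style text into a name-to-IP mapping. The parse_hosts name and the sample content are assumptions for illustration; the first entry found for a name is kept, and nothing here touches the real hosts file.

```python
def parse_hosts(text):
    """Return a {host name: IP} mapping from hosts-file style text."""
    mapping = {}
    for line in text.splitlines():
        # Drop comments (everything after '#') and surrounding whitespace.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        address, *names = line.split()  # fields separated by spaces or tabs
        for name in names:
            mapping.setdefault(name, address)  # keep the first entry found
    return mapping


# Hypothetical file content, including the alias added in this recipe.
sample = """\
# local development aliases
127.0.0.1   localhost
127.0.0.1   cloudstreetmarket.com

::1         localhost ip6-localhost
"""

hosts = parse_hosts(sample)
```

Note that the mapping holds names and addresses only, with no port anywhere, which is why the port is handled later by the HTTP server.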

Alias definition for OAuth developments

In this chapter, we will authenticate with the OAuth2 protocol. In OAuth, there is an Authentication Server (AS) and a Service Provider (SP). In our case, the authentication server will be a third-party system (Yahoo!) and the service provider will be our application (cloudstreetmarket.com).

The OAuth2 authentication and authorization happen on the third-party side. As soon as these steps are completed, the authentication server redirects the HTTP request to the service provider using a call-back URL passed as a parameter or stored as a variable.

Third parties sometimes block call-back URLs that point to localhost:8080. Yet testing and developing OAuth2 conversations locally remains a necessity.

Configuring an alias for the hostname (in the hosts file) and a virtual host in an HTTP server to manage ports, URL rewriting, and redirections is a good solution for the local environment but also for a production infrastructure.

Apache HTTP configuration

The Apache HTTP server uses the TCP/IP protocol and provides an implementation of HTTP. TCP/IP allows computers to talk with each other throughout a network.

Each computer using TCP/IP on a network (Local Area Network or Wide Area Network) has an IP address. When a request arrives on an interface (an Ethernet connection for example), the system tries to map it to a service on the machine (DNS, SMTP, HTTP, and so on) using the targeted port number. Apache usually listens on port 80. This is the situation where Apache HTTP takes care of one site.

Virtual-hosting

This feature allows us to run and maintain more than one website from a single instance of Apache. We usually group a set of Apache directives for a dedicated site in a <VirtualHost...> section. Each group is identified by a site ID.

Different sites can be defined as follows:

  1. By name:
    NameVirtualHost 192.168.0.1
    <VirtualHost portal.cloudstreetmarket.com>…</VirtualHost>
    <VirtualHost api.cloudstreetmarket.com>…</VirtualHost>
  2. By IP (you will still have to define a ServerName inside the block):
    <VirtualHost 192.168.0.1>…</VirtualHost>
    <VirtualHost 192.168.0.2>…</VirtualHost>
  3. By port:
    Listen 80
    Listen 8080
    <VirtualHost 192.168.0.1:80>…</VirtualHost>
    <VirtualHost 192.168.0.2:8080>…</VirtualHost>

Our current configuration with one machine and one Tomcat server is not the ideal scenario to demonstrate all the benefits of virtual hosting. However, we have delimited one site with its configuration. It's a first step towards scalability and load-balancing.
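
To make the name-based case more concrete, here is a rough Python sketch of how a server could pick a site from the Host header of a request. It is a simplification and not Apache's actual matching code: the VIRTUAL_HOSTS table and the select_site function are hypothetical, and the fallback to the first listed site mirrors Apache's behavior of using the first virtual host as the default.

```python
# Hypothetical site table; the first entry acts as the default site.
VIRTUAL_HOSTS = [
    ("portal.cloudstreetmarket.com", "portal site"),
    ("api.cloudstreetmarket.com", "api site"),
]


def select_site(host_header):
    """Pick the site whose name matches the Host header (port ignored)."""
    name = host_header.split(":", 1)[0].strip().lower()
    for server_name, site in VIRTUAL_HOSTS:
        if server_name == name:
            return site
    # No match: fall back to the first declared site.
    return VIRTUAL_HOSTS[0][1]


api = select_site("api.cloudstreetmarket.com")
portal = select_site("PORTAL.cloudstreetmarket.com:80")
fallback = select_site("unknown.example")
```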

The mod_proxy module

This Apache module offers proxy/gateway capabilities to Apache HTTP server. It's a central feature as it can turn an Apache instance into a unique interface able to manage a complex set of applications balanced across multiple machines on the network.

It pushes Apache beyond its initial purpose: exposing a directory on the filesystem via HTTP. Its capabilities are split across five specific sub-modules: mod_proxy_http, mod_proxy_ftp, mod_proxy_ajp, mod_proxy_balancer, and mod_proxy_connect. Each of them, when needed, requires the main mod_proxy dependency. Proxies can be defined as forward (enabled with ProxyRequests) and/or as reverse (configured with ProxyPass, usually paired with ProxyPassReverse). Reverse proxies are often used to give Internet users access to servers located behind firewalls.

The ProxyPass can be replaced with ProxyPassMatch to offer regex-matching capabilities.

ProxyPassReverse

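Reverse-proxies handle responses and redirections exactly as if they were webservers on their own. To do so, ProxyPassReverse makes Apache rewrite the Location, Content-Location, and URI headers of the responses coming from the underlying server, so that redirections point to the proxy. It is usually bound to a ProxyPass definition as in our use case here: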

ProxyPass         /api  http://localhost:8080/api
ProxyPassReverse  /api  http://localhost:8080/api
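
The pairing of these two directives can be pictured with a small Python model. It is a simplified sketch and not mod_proxy's implementation: the PROXY_RULES table, the to_backend and rewrite_location functions, and the public_base parameter are names invented for the example. One function maps an incoming path to the backend URL (the ProxyPass side), the other rewrites a redirect target sent by the backend (the ProxyPassReverse side).

```python
# Hypothetical rule table mirroring the virtual host of this recipe.
PROXY_RULES = [
    ("/portal", "http://localhost:8080/portal"),
    ("/api", "http://localhost:8080/api"),
]


def to_backend(path):
    """ProxyPass side: map a request path to the backend URL, or None."""
    for prefix, backend in PROXY_RULES:
        if path == prefix or path.startswith(prefix + "/"):
            return backend + path[len(prefix):]
    return None


def rewrite_location(location, public_base="http://cloudstreetmarket.com"):
    """ProxyPassReverse side: make a backend redirect point to the proxy."""
    for prefix, backend in PROXY_RULES:
        if location == backend or location.startswith(backend + "/"):
            return public_base + prefix + location[len(backend):]
    return location  # not one of our backends: leave untouched


forwarded = to_backend("/api/users/1")
redirect = rewrite_location("http://localhost:8080/portal/index")
```

Without the second step, a redirect issued by Tomcat would send the browser straight to localhost:8080, bypassing the proxy.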

Workers

Proxies manage the configuration of underlying servers and also the communication parameters between them with objects called workers (see them as a set of parameters). When used for a reverse-proxy, these workers are configured using ProxyPass or ProxyPassMatch:

ProxyPass /api http://localhost:8080/api connectiontimeout=5 timeout=30

Some examples of worker-parameters are: connectiontimeout (in seconds), keepalive (On/Off), loadfactor (from 1 to 100), route (bound to sessionid when used inside a load balancer), ping (it sends CPING requests to ajp13 connections to ensure Tomcat is not busy), min/max (number of connection pool entries to the underlying server), ttl (expiry time for connections to underlying server).
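
To see a worker as "a set of parameters", the directive above can be split into its path, its target URL, and its key=value pairs. The parse_proxy_pass helper below is a hypothetical illustration written for this purpose; Apache parses its own configuration and validates each parameter, which this sketch does not attempt.

```python
def parse_proxy_pass(directive):
    """Split a ProxyPass line into (path, url, worker parameters)."""
    keyword, path, url, *pairs = directive.split()
    if keyword != "ProxyPass":
        raise ValueError("not a ProxyPass directive: " + directive)
    # Each remaining token is expected to be a key=value worker parameter.
    params = dict(pair.split("=", 1) for pair in pairs)
    return path, url, params


path, url, worker = parse_proxy_pass(
    "ProxyPass /api http://localhost:8080/api connectiontimeout=5 timeout=30"
)
```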

The mod_alias module

This module provides URL aliasing and client-request redirecting. We have used this module for redirecting (by default) the requests to cloudstreetmarket.com to the index page of the portal web application (cloudstreetmarket.com/portal/index).

Note that, in the same way ProxyPassMatch improves ProxyPass, RedirectMatch improves Redirect with regex-matching capability.
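
The pattern we passed to RedirectMatch, ^/$, only matches the bare root path. A quick way to convince ourselves is to try the same expression with Python's re module. The redirect_target function is a hypothetical stand-in for the directive, and Apache's regex engine is not Python's, although both behave the same way for this simple pattern.

```python
import re

ROOT_ONLY = re.compile(r"^/$")  # same expression as in the virtual host


def redirect_target(path):
    """Return the redirect target for a path, or None when no rule applies."""
    if ROOT_ONLY.search(path):
        return "/portal/index"
    return None


root = redirect_target("/")
portal_index = redirect_target("/portal/index")
api_call = redirect_target("/api/users")
```

Only the first call yields a redirection; requests that already target /portal or /api are left to the ProxyPass rules.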

Tomcat connectors

A connector represents a process unit that: listens to a specific port to receive requests, forwards these requests to a specific engine, receives the dynamic content generated by the engine and finally sends back the generated content to the port. Several connectors can be defined in a Service component, sharing one single engine. One or more service(s) can be defined for one Tomcat instance (Server). There are two types of connectors in Tomcat.

HTTP connectors

This connector is set up by default in Tomcat on port 8080. It supports the HTTP/1.1 protocol and allows Catalina to work as a standalone web server. HTTP connectors can be used behind a proxy. Tomcat supports mod_proxy as a load balancer. This is our intended configuration. When implemented behind a proxy, the attributes proxyName and proxyPort can be set so that servlets receive the specified values from request.getServerName() and request.getServerPort().
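
As a quick sanity check, the following Python sketch reads a server.xml fragment and returns the attributes of the connector listening on a given port. The SERVER_XML string is a trimmed, hypothetical stand-in for a real server.xml, and http_connector is a helper invented for the example.

```python
import xml.etree.ElementTree as ET

# Trimmed, hypothetical server.xml holding the connector from this recipe.
SERVER_XML = """\
<Server port="8005" shutdown="SHUTDOWN">
  <Service name="Catalina">
    <Connector port="8080" protocol="HTTP/1.1"
               connectionTimeout="20000" redirectPort="8443"
               proxyName="cloudstreetmarket.com" proxyPort="80"/>
  </Service>
</Server>
"""


def http_connector(xml_text, port="8080"):
    """Return the attributes of the connector on the given port, or None."""
    root = ET.fromstring(xml_text)
    for connector in root.iter("Connector"):
        if connector.get("port") == port:
            return dict(connector.attrib)
    return None


attrs = http_connector(SERVER_XML)
```

With these attributes in place, attrs["proxyName"] and attrs["proxyPort"] hold the name and port that servlets are expected to see, instead of localhost and 8080.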

"This connector features the lowest latency and best overall performance."

The Tomcat documentation also states the following about HTTP proxying:

"It should be noted that the performance of HTTP proxying is usually lower than the performance of AJP."

However, configuring AJP clustering adds an extra layer to the architecture. The need for this extra layer is debatable for a stateless architecture.

AJP connectors

AJP connectors behave as HTTP connectors except that they support the AJP protocol instead of HTTP. Apache JServ Protocol (AJP) is an optimized binary version of HTTP.

It allows Apache HTTP to effectively balance requests among different Tomcats. It also allows Apache HTTP to serve the static content of web applications while Tomcat focuses on the dynamic content.

On the Apache HTTP side, this connector requires mod_proxy_ajp. Our configuration would probably have been:

ProxyPass / ajp://localhost:8009/api
ProxyPassReverse / http://cloudstreetmarket.com/api/

See also

Alternatives to Apache HTTP

The use of Apache HTTP is debatable under very high traffic, especially because the default configuration can lead the program to create a new process for every single connection.

If we only look for a proxy and load-balancer, we should also consider HAProxy. HAProxy is a high-availability load-balancer and proxy server. It is a free and open source (GPL v2) product used by high-traffic sites such as GitHub, StackOverflow, Reddit, Twitter, and others (http://haproxy.org).

Nginx is probably (and currently) the most adopted alternative to Apache HTTP. It focuses on high concurrency and low memory usage, and is distributed under a 2-clause BSD license (http://nginx.org).
