We are going to access the application using a local alias, cloudstreetmarket.com (on port 80), rather than the former localhost:8080. Setting this up is sometimes a mandatory step when developing third-party integrations. In our case, the third party will be Yahoo! and its OAuth2 authentication servers.
It will mostly be about configuration. We will install an Apache HTTP server and stick to the Apache Tomcat How-To. This will lead us to update our Tomcat connector and to create a virtual host in the Apache configuration file.
You will discover how this configuration offers great flexibility and lets us serve web content to customers with an advanced and scalable architecture.
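On Linux or Mac, download the Apache HTTP Server sources (as a tar.gz archive) from the Apache website, then extract, build, and install them:
$ tar -xvzf httpd-NN.tar.gz
$ cd httpd-NN
$ ./configure
$ make
$ make install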
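On Windows, edit the hosts file at %SystemRoot%\system32\drivers\etc\hosts and add the following line:
127.0.0.1 cloudstreetmarket.com
On Linux or Mac, edit /etc/hosts and add the same line:
127.0.0.1 cloudstreetmarket.com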
Edit the httpd.conf Apache configuration file. It should be located at C:\apache24\conf (on Windows) or at /usr/local/apache2/conf (on Linux or Mac). Make sure the following two lines are present and not commented out:
LoadModule proxy_module modules/mod_proxy.so
LoadModule proxy_http_module modules/mod_proxy_http.so
Then add the following virtual host definition:
<VirtualHost cloudstreetmarket.com:80>
  ProxyPass /portal http://localhost:8080/portal
  ProxyPassReverse /portal http://localhost:8080/portal
  ProxyPass /api http://localhost:8080/api
  ProxyPassReverse /api http://localhost:8080/api
  RedirectMatch ^/$ /portal/index
</VirtualHost>
Edit the server.xml Tomcat configuration file. It should be located at C:\tomcat8\conf (on Windows) or at /home/usr/{system.username}/tomcat8/conf (on Linux or Mac). Find the <Connector port="8080" protocol="HTTP/1.1"... > definition and edit it as follows:
<Connector port="8080" protocol="HTTP/1.1" connectionTimeout="20000" redirectPort="8443" proxyName="cloudstreetmarket.com" proxyPort="80"/>
On Windows, install Apache HTTP as a service and start it:
$ cd C:/apache24/bin
$ httpd.exe -k install
$ httpd.exe -k start
On Linux or Mac, start it with:
$ sudo apachectl start
Now start the Tomcat server and open your favorite web browser. Go to http://cloudstreetmarket.com; you should see the landing page of the portal.
The Apache HTTP configuration we made here is something of a standard nowadays. It offers a very high degree of customization on a network. It also gives us a starting point for scalability.
Let's revisit how web browsers work. When we target a URL in the web browser, the final server is reached by its IP address, in order to establish a TCP connection on a specific port. The browser therefore needs to resolve the IP address for the specified name.
To do so, it queries a chain of Domain Name Servers (on the Internet, the chain often starts with the user's Internet Service Provider (ISP)). Each DNS server basically works the same way: it tries to resolve the name from its own records or cache and, if it cannot, it asks another DNS server and relays the answer.
A DNS server that is authoritative for one specific domain is identified by that domain's Start Of Authority (SOA) record. Such DNS servers are usually provided by registrars, and we usually use their services to configure records (and our server IP) for a domain zone.
Around the web, each DNS server tries to reach the server that is ultimately authoritative for the requested name. At the top of the hierarchy are the root name servers. They delegate to the name servers in charge of each Top-Level Domain (TLD, such as .com, .net, .org…).
When the browser gets the IP, it tries to establish a TCP connection on the specified port (it defaults to 80). The remote server accepts the connection and the HTTP request is sent over the network.
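The two steps described above (resolving a name, then opening a TCP connection on a port) can be sketched in Python with the standard socket module. This is only an illustration of the sequence, not how a browser is implemented; the helper names resolve and open_connection are hypothetical.

```python
import socket


def resolve(hostname, port=80):
    """Return the distinct IP addresses the OS resolver gives for hostname.

    The operating system consults its local sources (such as the hosts
    file) and its configured DNS servers; we only collect the answers.
    """
    infos = socket.getaddrinfo(hostname, port, proto=socket.IPPROTO_TCP)
    addresses = []
    for _family, _type, _proto, _canonname, sockaddr in infos:
        ip = sockaddr[0]
        if ip not in addresses:
            addresses.append(ip)
    return addresses


def open_connection(hostname, port=80, timeout=5.0):
    """Open a TCP connection to hostname:port (80 is the HTTP default)."""
    return socket.create_connection((hostname, port), timeout=timeout)
```

Once cloudstreetmarket.com is aliased to 127.0.0.1 in the hosts file, resolve("cloudstreetmarket.com") would be expected to include that address, without any DNS server being contacted.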
As soon as we approach the production stage, we need the real domain name to be configured for DNS records, online, with a domain-name provider. There are different types of records to edit. Each one serves a specific purpose or resource type: host, canonical names, mail-exchanger, name server, and others. Specific guidance can usually be found on the domain name provider website.
Before contacting any kind of DNS, the operating system may be able to resolve the IP by itself. For this purpose, the hosts file is a plain-text registry. Adding aliases to this registry maps host names to whatever final server we choose. Doing so is a common technique for development environments but isn't restricted to them.
Each line represents an IP address followed by one or more host names. Fields are separated by spaces or tabs. Comments can be specified at the very beginning of a line with a # character. Blank lines are ignored and IPs can be defined in IPv4 or IPv6.
This file is only for host aliasing; we don't deal with ports at this stage!
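To make the file format concrete, here is a minimal Python sketch that reads hosts-file text into a name-to-IP mapping. The function name parse_hosts and the sample content are hypothetical. The sketch also assumes that trailing comments after an entry should be dropped and that the first entry for a name wins; real resolvers may differ on both points.

```python
def parse_hosts(text):
    """Map each host name found in hosts-file text to its IP address."""
    mapping = {}
    for line in text.splitlines():
        # Drop comments: everything from '#' to the end of the line.
        line = line.split("#", 1)[0].strip()
        if not line:
            continue  # blank or comment-only line
        fields = line.split()  # fields are separated by spaces or tabs
        if len(fields) < 2:
            continue  # an address without any name: nothing to map
        ip, names = fields[0], fields[1:]
        for name in names:
            # Assumption of this sketch: the first entry for a name wins.
            mapping.setdefault(name.lower(), ip)
    return mapping


sample = """
# local development aliases
127.0.0.1   localhost
127.0.0.1   cloudstreetmarket.com
::1         localhost ip6-localhost
"""
hosts = parse_hosts(sample)
print(hosts["cloudstreetmarket.com"])
```

Note that the mapping contains only names and addresses: there is nowhere to express a port, which is why the port handling is left to the HTTP server.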
In this chapter, we will authenticate using the OAuth2 protocol. In OAuth, there is an Authentication Server (AS) and a Service Provider (SP). In our case, the authentication server will be a third-party system (Yahoo!) and the service provider will be our application (cloudstreetmarket.com).
The OAuth2 authentication and authorization happen on the third-party side. As soon as these steps are completed, the authentication server redirects the HTTP request to the service provider using a call-back URL passed as a parameter or stored as a variable.
Third parties sometimes block call-back URLs that point to localhost:8080. Yet testing and developing OAuth2 conversations locally remains a necessity.
Configuring an alias for the hostname (in the hosts file) and a virtual host in an HTTP server to manage ports, URL rewriting, and redirections is a good solution for the local environment, and also for a production infrastructure.
The Apache HTTP server uses the TCP/IP protocol and provides an implementation of HTTP. TCP/IP allows computers to talk with each other throughout a network.
Each computer using TCP/IP on a network (Local Area Network or Wide Area Network) has an IP address. When a request arrives on an interface (an Ethernet connection, for example), the system tries to map it to a service on the machine (DNS, SMTP, HTTP, and so on) using the targeted port number. Apache usually listens on port 80. This is the situation in which Apache HTTP takes care of a single site.
Virtual hosting allows us to run and maintain more than one website from a single instance of Apache. We usually group a set of Apache directives for a dedicated site in a <VirtualHost...> section. Each group is identified by a site ID.
Different sites can be defined as follows:
By name:
NameVirtualHost 192.168.0.1
<VirtualHost portal.cloudstreetmarket.com>…</VirtualHost>
<VirtualHost api.cloudstreetmarket.com>…</VirtualHost>
By IP (a ServerName is still defined inside the block):
<VirtualHost 192.168.0.1>…</VirtualHost>
<VirtualHost 192.168.0.2>…</VirtualHost>
By port:
Listen 80
Listen 8080
<VirtualHost 192.168.0.1:80>…</VirtualHost>
<VirtualHost 192.168.0.2:8080>…</VirtualHost>
Our current configuration with one machine and one Tomcat server is not the ideal scenario to demonstrate all the benefits of virtual hosting. However, we have delimited one site with its configuration. It's a first step towards scalability and load-balancing.
The mod_proxy Apache module offers proxy/gateway capabilities to the Apache HTTP server. It's a central feature, as it can turn an Apache instance into a single entry point able to manage a complex set of applications balanced across multiple machines on the network.
It pushes Apache beyond its initial purpose: exposing a directory on the filesystem via HTTP. It works with five specific sub-modules: mod_proxy_http, mod_proxy_ftp, mod_proxy_ajp, mod_proxy_balancer, and mod_proxy_connect. Each of them, when needed, requires the main mod_proxy dependency. Proxies can be defined as forward and/or as reverse. In a reverse-proxy setup such as ours, ProxyPass maps incoming request paths to a backend server and ProxyPassReverse adjusts the redirect headers that the backend sends back. Proxies are often used to provide internet access to servers located behind firewalls.
ProxyPass can be replaced with ProxyPassMatch to offer regex-matching capabilities.
Reverse-proxies handle responses and redirections exactly as if they were webservers on their own. To be activated, they are usually bound to a ProxyPass definition, as in our use case here:
ProxyPass /api http://localhost:8080/api
ProxyPassReverse /api http://localhost:8080/api
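The effect of these two directives can be modelled with a small Python sketch. It is a deliberately simplified model, not Apache's actual algorithm: the function names proxy_pass and proxy_pass_reverse and the constants are hypothetical, and only the Location header of a redirect is considered.

```python
BACKEND = "http://localhost:8080"        # Tomcat, hidden behind Apache
PUBLIC = "http://cloudstreetmarket.com"  # what the browser sees
PROXIED_PREFIXES = ("/portal", "/api")   # paths declared with ProxyPass


def proxy_pass(path):
    """Return the backend URL for a public path, or None if not proxied."""
    for prefix in PROXIED_PREFIXES:
        if path == prefix or path.startswith(prefix + "/"):
            return BACKEND + path
    return None


def proxy_pass_reverse(location):
    """Rewrite a backend redirect target so it points at the public host."""
    for prefix in PROXIED_PREFIXES:
        internal = BACKEND + prefix
        if location == internal or location.startswith(internal + "/"):
            return PUBLIC + location[len(BACKEND):]
    return location  # not one of ours: leave it untouched
```

Without the reverse mapping, a redirect issued by Tomcat could expose localhost:8080 to the browser, which is exactly the address we are trying to hide.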
Proxies manage the configuration of underlying servers and also the communication parameters between them with objects called workers (see them as a set of parameters). When used for a reverse-proxy, these workers are configured using ProxyPass or ProxyPassMatch:
ProxyPass /api http://localhost:8080/api connectiontimeout=5 timeout=30
Some examples of worker parameters are: connectiontimeout (in seconds), keepalive (On/Off), loadfactor (from 1 to 100), route (bound to sessionid when used inside a load balancer), ping (it sends CPING requests to ajp13 connections to ensure Tomcat is not busy), min/max (number of connection pool entries to the underlying server), and ttl (expiry time for connections to the underlying server).
The mod_alias module provides URL aliasing and client-request redirecting. We have used this module to redirect (by default) the requests made to cloudstreetmarket.com to the index page of the portal web application (cloudstreetmarket.com/portal/index).
Note that, in the same way ProxyPassMatch improves ProxyPass, RedirectMatch improves Redirect with regex-matching capability.
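The pattern used in our virtual host, ^/$, matches only the bare root path, so requests to /portal or /api are left alone. The following Python sketch illustrates that behaviour; Python's re module stands in for Apache's regex engine here, and the function name redirect_target is hypothetical.

```python
import re

# Same pattern as in: RedirectMatch ^/$ /portal/index
ROOT_ONLY = re.compile(r"^/$")


def redirect_target(path):
    """Return the redirect target for path, or None when no redirect applies."""
    if ROOT_ONLY.search(path):
        return "/portal/index"
    return None
```

A plain Redirect / /portal/index would instead apply to every path starting with /, which is why the regex form is the better fit for this use case.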
A connector represents a processing unit that listens on a specific port to receive requests, forwards these requests to a specific engine, receives the dynamic content generated by the engine, and finally sends the generated content back through the port. Several connectors can be defined in a Service component, sharing one single engine. One or more services can be defined for one Tomcat instance (Server). There are two types of connectors in Tomcat.
The HTTP connector is set up by default in Tomcat on port 8080. It supports the HTTP/1.1 protocol and allows Catalina to work as a standalone webserver. HTTP connectors can be used behind a proxy. Tomcat supports mod_proxy as a load balancer. This is our intended configuration. When the connector sits behind a proxy, the attributes proxyName and proxyPort can be set so that servlets obtain the specified values from request.getServerName() and request.getServerPort().
"This connector features the lowest latency and best overall performance."
The Tomcat documentation also states the following about HTTP proxying:
"It should be noted that the performance of HTTP proxying is usually lower than the performance of AJP."
However, configuring AJP clustering adds an extra layer to the architecture. The need for this extra layer is debatable for a stateless architecture.
AJP connectors behave like HTTP connectors, except that they support the AJP protocol instead of HTTP. Apache JServ Protocol (AJP) is an optimized binary version of HTTP.
It allows Apache HTTP to effectively balance requests among different Tomcat instances. It also allows Apache HTTP to serve the static content of web applications while Tomcat focuses on the dynamic content.
On the Apache HTTP side, this connector requires mod_proxy_ajp. Our configuration would probably have been:
ProxyPass / ajp://localhost:8009/api
ProxyPassReverse / http://cloudstreetmarket.com/api/
Here are a few links for a deeper understanding of these topics:
http://tomcat.apache.org/tomcat-8.0-doc/connectors.html
http://tomcat.apache.org/tomcat-8.0-doc/proxy-howto.html#Apache_2.0_Proxy_Support
http://www.richardnichols.net/2010/08/5-minute-guide-clustering-apache-tomcat/
The use of Apache HTTP can be questioned under very high traffic, especially because the default configuration can lead the program to create a new process for every single connection.
If we are only looking for a proxy and load-balancer, we should also consider HAProxy. HAProxy is a high-availability load-balancer and proxy server. It is a free and open source (GPL v2) product used by sites such as GitHub, StackOverflow, Reddit, Twitter, and others (http://haproxy.org).
Nginx is probably (and currently) the most adopted alternative to Apache HTTP. It focuses on high concurrency and low memory usage, and it is distributed under a 2-clause BSD license (http://nginx.org).