DNS

Let’s start with the basics and look at DNS. For small teams this is likely to be your best choice, particularly in a slowly changing infrastructure. That would include dedicated physical machines and dedicated, long-lived virtual machines. In these environments, IP addresses will remain stable enough for DNS to be useful.

Service Discovery with DNS

“Service discovery” usually implies some kind of automated query and response, but not in this case. When you use DNS to call another service, discovery is more Sherlock Holmes than Siri. Your team needs to find the service owners and pry the DNS name or names out of them. An exchange of favors may be required, maybe a six-pack of beer in the extreme. Once you’ve finished the human protocol, you just put the “host” name into a configuration file and forget about it.

When a client calls a service, the provider of that service may only have a single DNS name. That implies the provider is responsible for load balancing and high availability. If the provider has several names, then it’s up to the caller to balance among them.

When using DNS, it’s important to have a logical service name to call, rather than a physical hostname. Even if that logical name is just an alias to the underlying host, it’s still preferable. An alias only needs to be changed in one place (the name server’s database) rather than in every consuming application.

Load Balancing with DNS

DNS round-robin load balancing is one of the oldest techniques, dating back to the early days of the web. It operates at the application layer (layer 7) of the OSI stack, but instead of operating during a service request, it operates during address resolution.

DNS round-robin simply associates several IP addresses with the service name. So instead of finding a single IP address for “shipping.example.com,” a client would get one of several addresses. Each IP address points to a single server. The client therefore connects to one out of a pool of servers, as shown in the figure.

images/design_for_production/dns_round_robin.png
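To make the rotation concrete, here is a minimal sketch of a toy name server. The class name, the method names, and the 10.0.0.x addresses are all made up for illustration, and it assumes the common behavior where the server returns the whole address set in rotated order and the client simply takes the first entry.

```python
from collections import deque


class RoundRobinZone:
    """Toy stand-in for a name server that rotates its address records."""

    def __init__(self):
        self._records = {}

    def add(self, name, addresses):
        self._records[name] = deque(addresses)

    def resolve(self, name):
        records = self._records[name]
        answer = list(records)   # the order this caller sees
        records.rotate(-1)       # the next caller sees a different first entry
        return answer


zone = RoundRobinZone()
zone.add("shipping.example.com", ["10.0.0.1", "10.0.0.2", "10.0.0.3"])

# Six clients resolve the name and each connects to the first address returned.
first_choices = [zone.resolve("shipping.example.com")[0] for _ in range(6)]
```

Each successive client lands on the next server in the pool, which is all the "load balancing" that DNS round-robin provides.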

Although this serves the basic purpose of distributing work across a group of machines, it does poorly on other fronts. For one thing, all the instances in the pool must be directly “routable” from callers. They may sit behind a firewall, but their front-end IP addresses are visible and reachable from clients.

Second, the DNS round-robin approach suffers from putting too much control in the client’s hands. Since the client connects directly to one of the servers, there’s no opportunity to redirect that traffic if one particular instance is down. The DNS server has no information about the health of the instances, so it can keep vending out IP addresses for instances that are toast. Furthermore, doling out IP addresses in round-robin style does not guarantee that the load is distributed evenly, just the initial connections. Some clients consume more resources than others, leading to unbalanced workloads. Again, when one of the instances gets busy, the DNS server has no way to know, so it just keeps sending every eleventh connection (or whatever) to the staggering instance.
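The gap between "evenly distributed connections" and "evenly distributed load" is easy to see in a small simulation. This is only a sketch: the addresses and the per-client cost numbers are invented, and real workloads are nowhere near this tidy.

```python
from itertools import cycle


def distribute(servers, request_costs):
    """Hand out servers in strict rotation and total up what each one receives."""
    connections = {server: 0 for server in servers}
    load = {server: 0 for server in servers}
    for server, cost in zip(cycle(servers), request_costs):
        connections[server] += 1
        load[server] += cost
    return connections, load


servers = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]
# Hypothetical workload: every third client is ten times as expensive.
costs = [1, 1, 10] * 4

connections, load = distribute(servers, costs)
```

Every server ends up with four connections, but one of them carries ten times the work of the others. Nothing in the rotation can notice that, let alone correct it.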

DNS round-robin load balancing is also inappropriate whenever the calling system is a long-running enterprise system. Java’s built-in networking classes cache DNS lookups inside the JVM, and depending on the networkaddress.cache.ttl security property, that cache may never expire. A client configured that way keeps using the first IP address it received from DNS, so every future connection targets the same instance and load balancing is completely defeated.
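The effect of a client-side cache that never expires can be sketched in a few lines. The resolver and client classes here are hypothetical stand-ins written in Python, not a model of any particular runtime's resolver.

```python
from collections import Counter, deque


class RotatingResolver:
    """Returns a different address from the pool on each lookup."""

    def __init__(self, addresses):
        self._addresses = deque(addresses)

    def resolve(self):
        first = self._addresses[0]
        self._addresses.rotate(-1)
        return first


class CachingClient:
    """Resolves once, then reuses that answer for every later connection."""

    def __init__(self, resolver):
        self._resolver = resolver
        self._cached = None

    def connect(self):
        if self._cached is None:
            self._cached = self._resolver.resolve()
        return self._cached


addresses = ["10.0.0.1", "10.0.0.2", "10.0.0.3"]

# Nine connections from one long-running client that caches forever.
cached = Counter(CachingClient(RotatingResolver(addresses)).connect() for _ in range(9))

# Nine connections that each perform a fresh lookup.
resolver = RotatingResolver(addresses)
fresh = Counter(resolver.resolve() for _ in range(9))
```

The caching client sends all nine connections to one instance, while fresh lookups spread them three apiece.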

Global Server Load Balancing with DNS

DNS has enough limitations when it comes to load balancing across instances that it’s usually worth moving up the stack a bit. However, there’s one place where DNS excels: global server load balancing (GSLB).

GSLB tries to route clients across multiple geographic locations (see the figure that follows). This can be for physical data centers of your own or for multiple regions in a cloud infrastructure. We see this most in the context of external clients routing across the public Internet. Clients will get the best performance by routing to a nearby location—bearing in mind that “nearby” in network terms doesn’t always match physical geography the way you’d expect.

images/design_for_production/gslb.png

Each location has one or more pools of load-balanced instances for the service, as shown in the previous illustration. Each pool has an IP address that goes to the load balancer. (See Migratory Virtual IP Addresses for load balancing with virtual IPs.) The job of GSLB is just to get the request to the virtual IP address for a particular pool. GSLB works via specialized DNS servers at each location. Where an ordinary DNS server just has a static database of names and addresses, a GSLB server keeps track of the health and responsiveness of the pools. It offers up a pool’s address only if that pool passes health checks. If the pool is offline, or doesn’t have any healthy instance to serve the request, the GSLB server won’t even give out the IP address of the pool.
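The difference between a static name server and a GSLB server comes down to a health check in the resolution path. A rough sketch follows, with a made-up class, a health check supplied by the caller, and placeholder addresses from the documentation range; real products probe pools continuously rather than at query time.

```python
class GslbServer:
    """Toy GSLB server: answers only with pool addresses that pass a health check."""

    def __init__(self, pools, health_check):
        self._pools = pools                # service name -> list of pool virtual IPs
        self._health_check = health_check  # callable: virtual IP -> bool

    def resolve(self, name):
        return [vip for vip in self._pools.get(name, []) if self._health_check(vip)]


# Hypothetical health state, standing in for real probes of each pool.
down = {"192.0.2.20"}
server = GslbServer(
    {"price.example.com": ["192.0.2.10", "192.0.2.20"]},
    health_check=lambda vip: vip not in down,
)

answer_before = server.resolve("price.example.com")
down.add("192.0.2.10")   # the last healthy pool goes offline
answer_after = server.resolve("price.example.com")
```

Once every pool is down, the server hands out nothing at all rather than an address that leads nowhere.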

The second trick is that different GSLB servers may give back different IP addresses for the same request. This can be to balance across several local pools, or to provide the closest point of presence for the client. The following figure illustrates this process.

images/design_for_production/gslb_in_action.png

  1. First the caller queries DNS for the address related to “price.example.com.”

  2. Both GSLB servers might respond. Each one returns a different address for “price.example.com.” The European server returns 184.72.248.171, while the North American server returns 151.101.116.113.

  3. In this example, the client is in Europe, so it probably got the response with 184.72.248.171 first.

  4. The client now connects directly to 184.72.248.171, which is served by the load balancer. The load balancer directs traffic to the instances just as it normally would.
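The steps above amount to a race between responses. As a sketch, with invented round-trip times and only the two addresses taken from the example:

```python
def first_answer(responses):
    """Return the address from the response with the lowest round-trip time.

    responses: list of (round_trip_ms, address) tuples, one per GSLB server.
    """
    round_trip_ms, address = min(responses)
    return address


# Hypothetical round-trip times as seen by a client in Europe.
responses = [
    (95.0, "151.101.116.113"),  # North American GSLB server
    (18.0, "184.72.248.171"),   # European GSLB server
]

chosen = first_answer(responses)
```

The client never decides that Europe is closer. It just uses whichever answer wins, which is why "nearby" ends up meaning nearby in network terms.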

It’s important to keep in mind that this sequence operates at two different levels. At the global level, it’s based on DNS and clever schemes for deciding which IP address to offer. After name resolution, the GSLB server is out of the picture. The load balancer (sometimes called a “local traffic manager”) operates as a reverse proxy, so the actual call and response pass through it.

This approach also requires that the caller can reach both the global traffic managers and the local traffic managers.

This scenario just illustrates the most basic use of GSLB. In practice, the global traffic managers can apply a ton of intelligence to the routing decision. For instance, the previous figure assumed that each GSLB server only knew about its local pools. In a real deployment, each would have all the pools configured but would prefer to send traffic nearby. That allows them to direct traffic to the more distant pool if that’s the only one available. They can also apply weighted distribution and a host of load-balancing algorithms. These can be used as part of a disaster recovery strategy or even part of a rolling deployment process.
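Preferring a nearby pool while keeping a distant one as a fallback might look something like the following. The Pool type, the region labels, and the choose_pool function are assumptions made for this sketch, and weighted distribution is left out entirely.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pool:
    vip: str
    region: str
    healthy: bool = True


def choose_pool(pools, client_region):
    """Prefer a healthy pool in the client's region, else any healthy pool."""
    healthy = [pool for pool in pools if pool.healthy]
    if not healthy:
        return None   # nothing to offer, so give out no address at all
    local = [pool for pool in healthy if pool.region == client_region]
    return (local or healthy)[0].vip


normal = [
    Pool("184.72.248.171", "europe"),
    Pool("151.101.116.113", "north-america"),
]
europe_down = [
    Pool("184.72.248.171", "europe", healthy=False),
    Pool("151.101.116.113", "north-america"),
]

nearby = choose_pool(normal, "europe")
fallback = choose_pool(europe_down, "europe")
```

Flipping a pool to unhealthy is also roughly how a rolling deployment or a disaster recovery drill would drain traffic away from one location.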

Availability of DNS

DNS relies on servers that can answer queries. What happens when those servers themselves are unavailable? It doesn’t matter how great the service’s availability is when callers can’t find out how to reach it. DNS can become neglected because it’s part of the invisible infrastructure. But a DNS outage can have a massive impact.

The main emphasis for DNS servers should be diversity. Don’t host them on the same infrastructure as your production systems. Make sure you have more than one DNS provider with servers in different locations. Use a different DNS provider still for your public status page. Make sure there are no failure scenarios that leave you without at least one functioning DNS server.
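One way to audit for diversity is to ask, for each provider and each location, whether losing it would leave any name server standing. This is only a sketch: the NameServer type, the host names, and the provider and location labels are hypothetical, and a real audit would consider more failure domains than these two.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NameServer:
    host: str
    provider: str
    location: str


def single_points_of_failure(servers):
    """List every provider or location whose loss would leave no DNS server."""
    risks = []
    for attribute in ("provider", "location"):
        for value in sorted({getattr(server, attribute) for server in servers}):
            survivors = [s for s in servers if getattr(s, attribute) != value]
            if not survivors:
                risks.append((attribute, value))
    return risks


fragile = [
    NameServer("ns1.example.net", "provider-a", "us-east"),
    NameServer("ns2.example.net", "provider-a", "eu-west"),
]
diverse = fragile + [NameServer("ns3.example.org", "provider-b", "ap-south")]

fragile_risks = single_points_of_failure(fragile)
diverse_risks = single_points_of_failure(diverse)
```

The first setup has servers in two locations but still depends on a single provider. Adding a server from a second provider clears the list.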

Remember This

We covered a lot of ground in this section. It’s worth summarizing the uses and limitations of DNS.

  • Use DNS to call services when they don’t change often.

  • DNS round-robin offers a low-cost way to load-balance.

  • “Discovery” is a human process. DNS names are supplied in configuration.

  • DNS works well for global traffic management in coordination with local load balancers.

  • Diversity is crucial in DNS hosts. Don’t rely on the same infrastructure for DNS hosts and production services.
