Chapter 1.  The Road to Performance

We welcome you on a journey to learn pragmatic ways to use the Scala programming language and the functional programming paradigm to write performant and efficient software. Functional programming concepts, such as pure and higher-order functions, referential transparency, and immutability, are desirable engineering qualities. They allow us to write composable elements, maintainable software, and expressive code that is easy to reason about. However, in spite of all its benefits, functional programming is too often wrongly associated with degraded performance and inefficient code. It is our goal to convince you otherwise! This book explores how to take advantage of functional programming, the features of the Scala language, the Scala standard library, and the Scala ecosystem to write performant software.

Scala is a statically and strongly typed language that tries to elegantly blend both functional and object-oriented paradigms. It has experienced growing popularity in the past few years as both an appealing and pragmatic choice to write production-ready software in the functional paradigm. Scala code compiles to bytecode and runs on the Java Virtual Machine (JVM), which has a widely understood runtime, is configurable, and provides excellent tooling to introspect and debug correctness and performance issues. An added bonus is Scala's great interoperability with Java, which allows you to use all the existing Java libraries. While the Scala compiler and the JVM receive constant improvements and already generate well-optimized bytecode, the onus remains on you, the developer, to achieve your performance goals.

Before diving into the Scala and JVM specifics, let's first develop an intuition for the holy grail that we seek: performance. In this first chapter, we will cover performance basics that are agnostic to the programming language. We will present and explain the terms and concepts that are used throughout this book.

In particular, we will look at the following topics:

  • Defining performance
  • Summarizing performance
  • Collecting measurements

We will also introduce our case study, a fictitious application based on real-world problems that will help us illustrate techniques and patterns that are presented later.

Defining performance

A performance vocabulary arms you with a way to qualify the type of issue at hand and often helps guide you towards a resolution. Particularly when time is of the essence, a strong intuition and a disciplined strategy are assets for resolving performance problems.

Let's begin by forming a common understanding of the term, performance. This term is used to qualitatively or quantitatively evaluate the ability to accomplish a goal. The goal at hand can vary significantly. However, as a professional software developer, the goal ultimately links to a business goal. It is paramount to work with your business team to characterize business domain performance sensitivities. For a consumer-facing shopping website, agreeing upon the number of concurrent app users and acceptable request response times is relevant. In a financial trading company, trade latency might be the most important concern because speed is a competitive advantage. It is also relevant to keep in mind nonfunctional requirements, such as "trade executions can never be lost," because of industry regulations and external audits. These domain constraints will also impact your software's performance characteristics. Building a clear and agreed-upon picture of the domain that you operate in is a crucial first step. If you cannot define these constraints, an acceptable solution cannot be delivered.

Note

Gathering requirements is an involved topic outside the scope of this book. If you are interested in delving deeper into this topic, we recommend two books by Gojko Adzic: Impact Mapping: Making a big impact with software products and projects (http://www.amazon.com/Impact-Mapping-software-products-projects-ebook/dp/B009KWDKVA) and Fifty Quick Ideas to Improve Your User Stories (http://www.amazon.com/Fifty-Quick-Ideas-Improve-Stories-ebook/dp/B00OGT2U7M).

Performant software

Designing performant software is one of our goals as software engineers. Thinking about this goal leads to a commonly asked question, "What performance is good enough?" We use the term performant to characterize performance that satisfies the minimally accepted threshold for "good enough." We aim to meet and, if possible, exceed the minimum thresholds for acceptable performance. Consider this: without an agreed-upon set of criteria for acceptable performance, it is by definition impossible to write performant software! This statement illustrates the overwhelming importance of defining the desired outcome as a prerequisite to writing performant software.

Note

Take a moment to reflect on the meaning of performant for your domain. Have you had struggles maintaining software that meets your definition of performant? Consider the strategies that you applied to solve performance dilemmas. Which ones were effective and which ones were ineffective? As you progress through the book, keep this in mind so that you can check which techniques can help you meet your definition of performant more effectively.

Hardware resources

In order to define criteria for performant software, we must expand the performance vocabulary. First, become aware of your environment's resources. We use the term resource to cover all the infrastructure that your software uses to run. Refer to the following resource checklist, which lists the environment details that you should collect prior to engaging in any performance tuning exercise:

  • Hardware type: physical or virtualized
  • CPUs:
    • Number of cores
    • L1, L2, and L3 cache sizes
    • NUMA zones
  • RAM (for example, 16 GB)
  • Network connectivity rating (for example, 1GbE or 10GbE)
  • OS and kernel versions
  • Kernel settings (for example, TCP socket receive buffer size)
  • JVM version

Itemizing the resource checklist forces you to consider the capabilities and limitations of your operating environment.
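Several items on this checklist can be read directly from a running JVM. The following sketch uses only standard JVM APIs (`Runtime` and `System.getProperty`); the map keys are our own invention, and the values will naturally differ across environments:

```scala
// A minimal sketch that gathers a few resource-checklist items
// using standard JVM APIs. Hardware details such as cache sizes
// and NUMA zones require OS-level tools (for example, lscpu).
object ResourceChecklist {
  def report(): Map[String, String] = {
    val runtime = Runtime.getRuntime
    Map(
      "cpu.logicalCores" -> runtime.availableProcessors.toString,
      "jvm.maxHeapMB"    -> (runtime.maxMemory / (1024 * 1024)).toString,
      "jvm.version"      -> System.getProperty("java.version"),
      "os.name"          -> System.getProperty("os.name"),
      "os.version"       -> System.getProperty("os.version")
    )
  }
}
```

Printing `ResourceChecklist.report()` at startup gives you a lasting record of the environment in which each run executed.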

Note

Excellent resources for kernel optimization include Red Hat Performance Tuning Guide (https://goo.gl/gDS5mY) and presentations and tutorials by Brendan Gregg (http://www.brendangregg.com/linuxperf.html).

Latency and throughput

Latency and throughput define two types of performance, which are often used to establish the criteria for performant software. The illustration of a highway, like the following photo of the German Autobahn, is a great way to develop an intuition of these types of performance:

The Autobahn helps us think about latency and throughput (image: Wikimedia, https://en.wikipedia.org/wiki/Highway#/media/File:Blick_auf_A_2_bei_Rastst%C3%A4tte_Lehrter_See_(2009).jpg, licensed under Creative Commons CC BY-SA 3.0)

Latency describes the amount of time that it takes for an observed process to be completed. Here, the process is a single car driving down one lane of the highway. If the highway is free of congestion, then the car is able to drive down the highway quickly. This is described as a low-latency process. If the highway is congested, the trip time increases, which is characterized as a high-latency or latent process. Performance optimizations that are within your control are also captured by this analogy. You can imagine that reworking an expensive algorithm from polynomial to linear execution time is similar to either improving the quality of the highway or the car's tires to reduce road friction. The reduction in friction allows the car to cross the highway with lower latency. In practice, latency performance objectives are often defined in terms of a maximum tolerable latency for your business domain.

Throughput defines the observed rate at which a process is completed. Using the highway analogy, the number of cars traveling from point A to point B per unit of time is the highway's throughput. For example, if there are three traffic lanes and cars travel in each lane at a uniform rate, then the throughput is: (the number of cars per lane that traveled from point A to point B during the observation period) * 3. Inductive reasoning may suggest that there is a strong negative correlation between throughput and latency. That is, as latency increases, throughput decreases. As it turns out, there are a number of cases where this type of reasoning does not hold true. Keep this in mind as we continue expanding our performance vocabulary to better understand why this happens. In practice, throughput is often defined by the maximum number of transactions per second your software can support. Here, a transaction means a unit of work in your domain (for example, orders processed or trades executed).
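To make these two measures concrete, here is a deliberately naive measurement sketch (the workload and iteration count are arbitrary choices of ours). It reports both the average latency of one operation and the overall throughput. Note that for a single-threaded loop the two are simply reciprocals; it is precisely when concurrency, batching, and queuing enter the picture that this tidy relationship breaks down:

```scala
// Naive single-threaded latency/throughput measurement. A real
// benchmark needs JVM warm-up and statistical rigor; treat this
// only as an illustration of the two definitions.
def measure[A](iterations: Int)(work: () => A): (Double, Double) = {
  val start = System.nanoTime()
  var i = 0
  while (i < iterations) { work(); i += 1 }
  val elapsedSeconds = (System.nanoTime() - start) / 1e9
  val avgLatencySeconds = elapsedSeconds / iterations
  val throughputPerSecond = iterations / elapsedSeconds
  (avgLatencySeconds, throughputPerSecond)
}

val (latency, throughput) = measure(100000)(() => math.sqrt(42.0))
```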

Note

Thinking back to the recent performance issues that you faced, how would you characterize them? Did you have a latency or a throughput problem? Did your solution increase throughput while lowering latency?

Bottlenecks

A bottleneck refers to the slowest part of the system. By definition, all systems, including well-tuned ones, have a bottleneck because there is always one processing step that is measured to be the slowest. Note that the latency bottleneck may not be the throughput bottleneck. That is, multiple types of bottleneck can exist at the same time. This is another illustration of why it is important to understand whether you are combating a throughput or a latency performance issue. Use the process of identifying your system's bottlenecks to provide you with a directed focus to attack your performance dilemmas.

From personal experience, we have seen how time is wasted when the operating environment checklist is ignored. Once, while working in the advertising domain on a high-throughput real-time bidding (RTB) platform, we chased a throughput issue for several days without success. After bootstrapping an RTB platform, we began optimizing for a higher request throughput goal because request throughput is a competitive advantage in our industry. Our business team identified an increase from 40,000 requests per second (RPS) to 75,000 RPS as a major milestone. Our tuning efforts consistently yielded about 60,000 RPS. This was a real head-scratcher because the system did not appear to exhaust system resources. CPU utilization was well under 100%, and previous experiments to increase heap space did not yield improvements.

The "aha!" moment came when we realized that the system was deployed within AWS with the default network connectivity configured to 1 Gigabit Ethernet. Each request processed by the system is about 2 KB in size. We performed some basic arithmetic to identify the theoretical maximum throughput rate. 1 Gigabit per second is equivalent to 125,000 kilobytes per second. 125,000 kilobytes per second divided by 2 kilobytes per request translates to a theoretical maximum of 62,500 RPS. This arithmetic was confirmed by running a test of our network throughput with a tool named iPerf. Sure enough, we had maxed out our network connectivity!
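This back-of-the-envelope calculation is worth encoding so that it can be repeated whenever your link speed or payload size changes. Here is a small sketch using the figures from our RTB example (the function and parameter names are our own):

```scala
// Theoretical maximum request throughput for a given network link.
// 1 Gigabit = 10^9 bits; divide by 8 for bytes, then by 1,000 for kilobytes.
def maxRequestsPerSecond(linkGigabitsPerSecond: Double,
                         requestSizeKilobytes: Double): Double = {
  val linkKilobytesPerSecond = linkGigabitsPerSecond * 1e9 / 8 / 1000
  linkKilobytesPerSecond / requestSizeKilobytes
}

maxRequestsPerSecond(1.0, 2.0) // 62500.0: the ceiling we hit in practice
```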
