Performance tuning

Performance tuning is a large and complex topic that could fill a whole course by itself. We can only scratch the surface in this short section. As with monitoring in the last section, operating system-specific performance tuning techniques are beyond the scope of this book.

Java virtual machine

Based on the information from the monitoring tools and the system log, we can discover opportunities for performance tuning. The first things we usually watch are the Java heap memory and garbage collection. The JVM configuration settings are controlled in Cassandra's environment settings file, cassandra-env.sh, located in /etc/cassandra/. An example is shown in the following screenshot:

[Screenshot: Java virtual machine]

The file already contains boilerplate options calculated to suit the host system. It also comes with explanatory comments that help us tweak specific JVM parameters and the startup options of a Cassandra instance when we experience real issues; otherwise, these boilerplate options should not be altered.

Note

Detailed documentation on how to tune the JVM for Cassandra can be found at http://www.datastax.com/documentation/cassandra/2.0/cassandra/operations/ops_tune_jvm_c.html.

Caching

Another area we should pay attention to is caching. Cassandra includes integrated caching and distributes cache data around the cluster. For caching at the table level, we will focus on the partition key cache and the row cache.

Partition key cache

The partition key cache, or key cache for short, is a cache of the partition index for a table. Using the key cache saves processor time and memory. However, with only the key cache enabled, Cassandra still has to go to disk to read the requested data rows.

Row cache

The row cache is similar to a traditional cache. When a row is accessed, the entire row is pulled into memory, merging from multiple SSTables when required, and cached. This prevents Cassandra from retrieving that row using disk I/O again, which can tremendously improve read performance.

When both row cache and partition key cache are configured, the row cache returns results whenever possible. In the event of a row cache miss, the partition key cache might still provide a hit that makes the disk seek much more efficient.

However, there is one caveat. Cassandra caches all the rows of a partition when reading that partition. So if the partition is large, or only a small portion of the partition is read every time, the row cache might not be beneficial. The row cache is very easy to misuse, and a misused row cache can exhaust the JVM and cause Cassandra to fail. That is why the row cache is disabled by default.
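The lookup order described above (row cache first, then key cache, then the on-disk partition index) can be illustrated with a toy model. This is only a minimal sketch and not Cassandra's actual implementation: the dictionaries, the list standing in for an SSTable, the keys p1 and p2, and the read function are all hypothetical names chosen for illustration.

```python
from typing import Dict, Optional, Tuple

# Toy stand-ins for the real structures; every name here is illustrative.
row_cache: Dict[str, dict] = {}                       # partition key -> cached row
key_cache: Dict[str, int] = {}                        # partition key -> position of the row
partition_index: Dict[str, int] = {"p1": 0, "p2": 1}  # stands in for the on-disk index
sstable = [{"key": "p1"}, {"key": "p2"}]              # stands in for on-disk data


def read(key: str, use_row_cache: bool = True) -> Tuple[Optional[dict], str]:
    """Return (row, path), where path names how the read was served."""
    # 1. A row cache hit answers the read without touching the disk.
    if use_row_cache and key in row_cache:
        return row_cache[key], "row cache hit"

    # 2. On a row cache miss, a key cache hit skips the index lookup,
    #    but the row itself still has to be read from disk.
    if key in key_cache:
        position, path = key_cache[key], "key cache hit + data read"
    elif key in partition_index:
        position, path = partition_index[key], "index lookup + data read"
        key_cache[key] = position  # remember the position for next time
    else:
        return None, "not found"

    row = sstable[position]
    if use_row_cache:
        row_cache[key] = row  # populate the row cache for later reads
    return row, path
```

Calling read("p1") repeatedly shows the path moving from an index lookup to cache hits. The model leaves out the caveat above: a real row cache entry holds the rows of the whole partition, which is what makes it expensive for large partitions.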

Note

We usually enable either the key or row cache for a table, not both at the same time.

Monitoring cache

Either the nodetool info command or JMX MBeans can help us monitor the caches. We should change cache options in small increments, and then monitor the effect of each change using the nodetool utility. The last two lines of output of the nodetool info command, as seen in the following figure, contain the Row Cache and Key Cache metrics of ubtc02:

[Figure: Monitoring cache]

In the event of high memory consumption, we can consider tuning data caches.
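When comparing the effect of each incremental change, it can help to turn the hit and request counters reported by nodetool info into a hit rate. The following is a minimal sketch that assumes the cache lines contain the text "N hits, M requests"; the sample line is an assumed illustration of that layout, not output taken from ubtc02, and the exact format may differ between Cassandra versions. The hit_rate function is a hypothetical helper.

```python
import re
from typing import Optional

# Assumed sample line; the real layout depends on the Cassandra version.
SAMPLE = (
    "Key Cache        : size 2480 (bytes), capacity 51380224 (bytes), "
    "81 hits, 99 requests, 0.818 recent hit rate, "
    "14400 save period in seconds"
)


def hit_rate(line: str) -> Optional[float]:
    """Return hits / requests for one cache line, or None if unavailable."""
    match = re.search(r"(\d+) hits, (\d+) requests", line)
    if match is None:
        return None  # the line does not follow the assumed layout
    hits, requests = int(match.group(1)), int(match.group(2))
    # A cache that has served no requests has no meaningful hit rate.
    return hits / requests if requests else None


if __name__ == "__main__":
    print(hit_rate(SAMPLE))
```

Recording this value before and after each cache change gives a simple way to judge whether the change helped.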

Enabling/disabling cache

We use CQL to enable or disable caching by altering the caching property of a table. For instance, we use the ALTER TABLE statement to enable the row cache for watchlist:

ALTER TABLE watchlist WITH caching='ROWS_ONLY';

Other available table caching options include ALL, KEYS_ONLY, and NONE. They are quite self-explanatory, so we do not go through each of them here.
