Useful websites

Here are some useful websites for getting up-to-date Cassandra information.

Apache Cassandra official site

The official Cassandra website, http://cassandra.apache.org/, is always the first place to go for any information. The latest released version information can be found on its home page. You can also get the source code there if you want to dive deep into the heart of Cassandra or build and install a Cassandra instance from scratch.

As with other projects under the Apache Software Foundation, you are welcome to contribute to the community. The website also explains how to join this enthusiastic team of developers and help improve such a great NoSQL database.

The official site also links to another website called PlanetCassandra, which is worth a separate introduction.

PlanetCassandra

PlanetCassandra, http://planetcassandra.org/, is a community service website supported by DataStax, a commercial company that provides production-ready Apache Cassandra products and services:

PlanetCassandra

This website focuses on the collaboration aspects of the Cassandra community. We can look for meetups, ways to get involved, webinars, conferences and events, and even educational training courses there. The most valuable section of the website is Apache Cassandra Use Cases, a repository of companies that run their applications on Apache Cassandra and enjoy real benefits from it.

The repository is organized into several categories, namely, Product Catalog/Playlist, Recommendation/Personalization, Fraud Detection, Messaging, IOT/Sensor Data, and Undefined. Each entry gives the name and a brief introduction of the company behind the use case, and describes how it uses Cassandra to drive its business. You can certainly pick up some ideas by studying the use cases.

A must-read is the Netflix case study. The use case is a personalization system that understands each person's unique habits and preferences and brings to light products and items that a user might be unaware of and not be looking for. The challenges were to acquire affordable capacity to store and process immense amounts of data, to address a single point of failure in Oracle's legacy relational architecture, and to achieve business agility for international expansion. Netflix used a commercial version of Cassandra that delivers 100 percent uptime and cost-effective scale across multiple data centers. The results are stunning:

  • First, the throughput of the system is more than 10 million transactions per second
  • Second, creating and managing new data clusters across various regions is nearly effortless
  • Lastly, customer viewing and log data can be captured in the finest detail in Cassandra

This case study is highly recommended reading, especially if you are considering migrating from a relational database to Cassandra.

DataStax

The Cassandra version used in this book is an open source one that can be obtained freely on the Internet. It is good enough for most systems. However, many companies still look for enterprise-grade products built on Cassandra and the related support, training, and consultancy services. DataStax, http://www.datastax.com/, is one of the companies that provide them.

DataStax maintains the most comprehensive Cassandra documentation, as shown in the following screenshot. The documentation is freely available on its website. DataStax also develops and supports client drivers for Java, C#, Python, and so on:

DataStax

DataStax offers an enterprise version of Apache Cassandra, known as DataStax Enterprise, with enhanced features such as advanced security and management tools that simplify the day-to-day system management of a Cassandra cluster.

DataStax Enterprise includes a powerful enterprise system management tool, OpsCenter, to allow administrators to easily grasp the status and performance of the system through a dashboard. It monitors the cluster and triggers alerts or notifications of changes in the cluster. Backup and restore operations are greatly streamlined as well.

DataStax Enterprise also extends Cassandra to support Apache Hadoop and Solr, as an integrated enterprise platform.

Hadoop integration

Cassandra integrated with Hadoop can be a powerful platform for Big Data analytics. Cassandra has been able to integrate directly with Hadoop since version 0.6, beginning with MapReduce support. Since then, the support has matured significantly and now includes native support for Pig and Hive. Cassandra's Hadoop support implements the same interface as Hadoop Distributed File System (HDFS) in order to achieve input data locality.

Cassandra provides the ColumnFamilyInputFormat and ColumnFamilyOutputFormat classes for direct integration with Hadoop from MapReduce programs. It involves data being read directly from Cassandra column families in MapReduce mappers and does include data movement.

Setup and configuration involves overlaying a Hadoop cluster on Cassandra nodes, configuring a separate server for the Hadoop JobTracker, and installing a Hadoop TaskTracker and DataNode on each Cassandra node.
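One motivation for installing a TaskTracker on each Cassandra node is the input data locality mentioned earlier: a map task can run on a node that already holds a replica of the data it reads. The following is a toy Python model of that idea only. It does not use the Hadoop or Cassandra API, and the names InputSplit and assign_splits, the node names, and the least-loaded selection rule are all assumptions made for illustration:

```python
from dataclasses import dataclass
from typing import List, Sequence, Tuple


@dataclass(frozen=True)
class InputSplit:
    """Hypothetical slice of a column family: a token range and its replica nodes."""
    token_range: Tuple[int, int]
    replica_nodes: Tuple[str, ...]


def assign_splits(
    splits: Sequence[InputSplit], task_tracker_nodes: Sequence[str]
) -> List[Tuple[InputSplit, str, bool]]:
    """Assign each split to a TaskTracker node, preferring a node with a replica.

    Returns (split, chosen_node, is_local) tuples. This is an illustrative
    rule, not the scheduler that Hadoop actually implements.
    """
    trackers = list(task_tracker_nodes)
    load = {node: 0 for node in trackers}  # number of splits given to each node
    assignments = []
    for split in splits:
        # Nodes that both hold a replica and run a TaskTracker.
        local = [node for node in split.replica_nodes if node in load]
        # Fall back to any TaskTracker when no replica node runs one.
        candidates = local or trackers
        # Pick the least-loaded candidate to spread the work.
        node = min(candidates, key=lambda n: load[n])
        load[node] += 1
        assignments.append((split, node, bool(local)))
    return assignments


if __name__ == "__main__":
    splits = [
        InputSplit((0, 99), ("cass1", "cass2")),
        InputSplit((100, 199), ("cass2", "cass3")),
        InputSplit((200, 299), ("cass3", "cass1")),
    ]
    # TaskTrackers overlaid on the Cassandra nodes: every read can be local.
    for split, node, is_local in assign_splits(splits, ["cass1", "cass2", "cass3"]):
        print(split.token_range, node, "local" if is_local else "remote")
```

In this model, when the TaskTrackers run on the same nodes as Cassandra, every split finds a local replica. If the TaskTrackers ran on separate machines instead, every split would be marked remote and its data would have to cross the network.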

Note

Setup and configuration procedures

The detailed procedures of integrating Cassandra with Hadoop can be found at:

The nodes in the Cassandra data center can draw from data in the HDFS DataNode as well as from Cassandra. The JobTracker receives the MapReduce input from the client application. It then sends a MapReduce job request to the TaskTrackers and optional clients, for example, MapReduce and Pig. The data is written to Cassandra and the results are sent back to the client.

DataStax has also created a simple way to use Hadoop with Cassandra and built it into the enterprise version.
