System design

In this section, we will select the appropriate software for various system components.

The operating system

When considering the implementation, the first fundamental choice is the operating system. The single most important constraint is that it must be supported by Cassandra. For this book, I have selected the 64-bit version of Ubuntu 14.04 LTS, which can be obtained at the official Ubuntu website, http://www.ubuntu.com/. You should be able to set up your Linux box painlessly by following the detailed installation instructions.

However, you are free to use any other operating system supported by Cassandra, such as Microsoft Windows or Mac OS X. Please follow the respective operating system's installation instructions to set up your machine. I have already considered the portability of the Stock Screener. As you will see in the subsequent sections, the Stock Screener Application is designed and developed to be compatible with a wide range of operating systems.

Java Runtime Environment

As Cassandra is Java-based, a Java Runtime Environment (JRE) is a prerequisite. I have used the 64-bit Oracle Java SE Runtime Environment 7, Version 1.7.0_65. It is provided at the following URL: http://www.oracle.com/technetwork/java/javase/downloads/jre7-downloads-1880261.html.

I downloaded the Linux x64 binary and followed the instructions at http://www.datastax.com/documentation/cassandra/2.0/cassandra/install/installJreDeb.html to set up the JRE properly.

At the time of writing, Java SE has been updated to Version 8. However, I have not tested JRE 8, and DataStax recommends JRE 7 for Cassandra 2.0. Therefore, I will stick to JRE 7 in this book.

Java Native Access

If you want to deploy Cassandra for production use on Linux platforms, Java Native Access (JNA) is required to improve Cassandra's memory usage. When JNA is installed and configured, Linux does not swap out the Java virtual machine (JVM), which avoids the performance issues that swapping causes. Installing JNA is recommended as a best practice even when the Cassandra installation is for non-production use.

To install JNA on Ubuntu, simply use the APT package manager with the following command in a terminal:

$ sudo apt-get install libjna-java

Cassandra version

I used Cassandra Version 2.0.9 from the DataStax Community distribution for Debian or Ubuntu. The installation steps are well documented at http://www.datastax.com/documentation/getting_started/doc/getting_started/gettingStartedDeb_t.html.

The installation process typically takes several minutes depending on your Internet bandwidth and the performance of your machine.

Note

DataStax

DataStax is a computer software company based in Santa Clara, California, which offers a commercial, enterprise-grade distribution of Apache Cassandra in its DataStax Enterprise product. It also provides tremendous support for the Apache Cassandra community.

Programming language

It is now time to turn our attention to the programming language for the implementation of the Stock Screener Application. For this book, I have chosen Python. Python is a high-level programming language designed for speed of development. It is open source, free, and cross-platform. It has a rich set of libraries for almost every popular algorithm you can imagine.

You need not be afraid of learning Python if you are not familiar with it. Python is designed to be very easy to learn compared to other programming languages such as C++. Coding a Python program is much like writing pseudocode, which improves the speed of development.

In addition, there are many renowned Python libraries used for data analysis, for example, NumPy, SciPy, pandas, scikit-learn, and matplotlib. You can make use of them to quickly build a full-blown application with all the bells and whistles. For the Stock Screener Application, you will use NumPy and pandas extensively.

When it comes to high performance, Python can also utilize Cython, which is an optimizing static compiler that lets Python programs approach the speed of native C or C++ programs.

The latest major version of Python is Python 3. However, many programs written in Python 2 are still in use. Because Python 3 broke backward compatibility, the migration of the many libraries written in Python 2 to Python 3 still has a long way to go. Hence, Python 2 and Python 3 are expected to coexist for quite a long time. For this book, Python 2.7.x is used.

The following steps are used to install Python 2.7 in Ubuntu using a terminal:

$ sudo apt-get -y update
$ sudo apt-get -y upgrade
$ sudo apt-get install python-pip python-dev python2.7-dev build-essential

Once the installation is complete, type the following command:

$ python --version

You should see the version string returned by Python, which tells you that the installation has been successful.
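The same check can be made from inside Python. The following is a minimal sketch, not part of the original installation steps; the function name describe_python is a hypothetical helper introduced here for illustration. It reports the interpreter's major and minor version and whether it matches the 2.7 series used in this book.

```python
import sys


def describe_python(version_info=None):
    """Return (version_string, is_python_27) for an interpreter version.

    describe_python is a hypothetical helper for illustration only.
    When version_info is omitted, the running interpreter is inspected.
    """
    if version_info is None:
        version_info = sys.version_info
    major, minor = version_info[0], version_info[1]
    version = "%d.%d" % (major, minor)
    # This book targets the Python 2.7.x series.
    return version, (major, minor) == (2, 7)


version, is_27 = describe_python()
print("Running Python %s (2.7 series: %s)" % (version, is_27))
```

If the second value is False, the interpreter on your PATH is not a 2.7 release, and you may need to invoke python2.7 explicitly.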

One problem that many Python beginners face is the cumbersome installation of the various library packages. To avoid this problem, I suggest that you download the Anaconda distribution. Anaconda is completely free and includes almost 200 of the most popular Python packages for science, mathematics, engineering, and data analysis. Although it is rather bulky in size, it frees you from the Python package hassle. Anaconda can be downloaded at http://continuum.io/downloads, where you can select the appropriate versions of Python and the operating system. It is straightforward to install Anaconda by following the installation instructions, so I will not detail the steps here.
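Whichever way you install the packages, it can help to confirm that they are importable before going further. The following is a rough sketch using only the standard library; check_packages is a hypothetical helper name, and the list of package names is simply the two libraries this book says it uses extensively.

```python
import importlib


def check_packages(names):
    """Return a dict mapping each package name to True if it can be imported.

    check_packages is a hypothetical helper for illustration only.
    """
    status = {}
    for name in names:
        try:
            importlib.import_module(name)
            status[name] = True
        except ImportError:
            # The package is not installed for this interpreter.
            status[name] = False
    return status


# numpy and pandas are the libraries used extensively in this book.
for name, available in sorted(check_packages(["numpy", "pandas"]).items()):
    print("%-8s %s" % (name, "OK" if available else "MISSING"))
```

A MISSING entry means the package is not installed for the interpreter that ran the script, which is a common source of confusion when several Python installations coexist on one machine.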

Cassandra driver

The last item of the system environment is the driver software that lets Python connect to a Cassandra database. There are several choices available, for example, pycassa, Cassandra driver, and Thrift. I have chosen Python Driver 2.0 for Apache Cassandra distributed by DataStax. It exclusively supports CQL 3 and Cassandra's new binary protocol, which was introduced in Version 1.2. More detailed information can be found at http://www.datastax.com/documentation/developer/python-driver/2.0/common/drivers/introduction/introArchOverview_c.html.

The driver can be easily installed with pip in an Ubuntu terminal:

$ pip install cassandra-driver
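Once the driver is installed, a quick way to confirm that Python can reach Cassandra is to ask the node for its version. The following is a minimal sketch under stated assumptions: a Cassandra node is running on the local machine (127.0.0.1) and listening on the default native protocol port 9042, and get_cassandra_version is a hypothetical helper name chosen for illustration. It uses the driver's Cluster class and reads the release_version column of the system.local table.

```python
def get_cassandra_version(contact_points=("127.0.0.1",), port=9042):
    """Connect to Cassandra and return its release version string.

    get_cassandra_version is a hypothetical helper for illustration only.
    The default contact point and port assume a local, single-node setup.
    """
    # Imported here so that defining the function does not require the driver.
    from cassandra.cluster import Cluster

    cluster = Cluster(list(contact_points), port=port)
    try:
        session = cluster.connect()
        rows = session.execute("SELECT release_version FROM system.local")
        for row in rows:
            return row.release_version
        return None
    finally:
        # Always release the driver's connections and threads.
        cluster.shutdown()


# Example usage (requires a running Cassandra node):
#     print(get_cassandra_version())
```

If the call raises an exception instead of returning a version string, check that the Cassandra service is running and that the contact point and port match your installation.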

Note

pip

pip is a command-line package management system used to install and manage Python library packages. Its project page can be found on GitHub, https://github.com/pypa/pip.

The integrated development environment

Spyder is an open source, cross-platform integrated development environment (IDE), usually used for scientific programming in Python. It is automatically installed by Anaconda and integrates NumPy, SciPy, matplotlib, IPython, and other open source software. It is also my favorite Python development environment.

There are many other good and popular Python development environments, such as IPython and Eclipse. The code in this book works in these environments as well.

The system overview

We have now gone through the major system components of the Stock Screener Application and decided on their implementation. The following figure depicts the system overview for the implementation of the application:

The system overview

It is worth noting that the system will first be developed on a single Ubuntu machine running a single-node Cassandra cluster (in Chapter 7, Deployment and Monitoring, we will expand the cluster to a two-node cluster). This setup does not exploit the superb clustering capabilities of Cassandra. However, from the software development perspective, the most important thing is to fully implement the required functionality rather than to divert significant effort to the system or infrastructure components, which are of secondary priority.
