Anaconda is a widely used free Python distribution maintained by Continuum (https://www.continuum.io/). In this book, we will use PySpark and the PyData ecosystem, building our apps on the software stack that Anaconda provides. The PyData ecosystem is promoted, supported, and maintained by Continuum and powered by the Anaconda Python distribution. Anaconda saves time and aggravation when installing the Python environment, and we will use it in conjunction with Spark. It has its own package manager, conda, which supplements the traditional pip install and easy_install. Anaconda comes with batteries included, namely some of the most important packages such as Pandas, Scikit-Learn, Blaze, Matplotlib, and Bokeh. Upgrading any of the installed libraries is a simple command at the console:
$ conda update <package-name>
A list of the libraries installed in our environment can be obtained with the command:
$ conda list
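The same check can also be made from inside Python, which is handy when a script needs to confirm that the packages it relies on are present. The following is a minimal sketch, assuming Python 3; the helper name check_packages and the list of import names are illustrative choices rather than part of Anaconda, and Blaze in particular may not be bundled with every Anaconda release.

```python
import importlib.util

# Import names (not distribution names) of the packages mentioned in the text.
# Note that Scikit-Learn is imported as "sklearn".
PACKAGES = ["pandas", "sklearn", "blaze", "matplotlib", "bokeh"]


def check_packages(names):
    """Return a dict mapping each import name to True if it can be located.

    find_spec only looks the module up on the import path; it does not
    import it, so the check is cheap and has no side effects.
    """
    return {name: importlib.util.find_spec(name) is not None for name in names}


if __name__ == "__main__":
    for name, found in check_packages(PACKAGES).items():
        status = "installed" if found else "missing"
        print("{:<12} {}".format(name, status))
```

Any package reported as missing can then be added with conda install followed by the package name.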
The key components of the stack are as follows:
The following figure shows the components of the Anaconda stack: