What you need for this course

The requirements differ from module to module, so I've prepared a table that gives you an overview of what you'll need for each module of the course:

Module 1

Module 2

Module 3

Module 4

All the examples in this module rely on the Python 3 interpreter. Some of the examples in this module rely on third-party libraries that do not ship with Python. These are introduced within the module at the time they are used, so you do not need to install them in advance. However, for completeness, here is a list:

  • pip
  • requests
  • pillow
  • bitarray

Although all the examples can be run interactively in a Python shell, we recommend using IPython for this module. The versions of the libraries used in this module are:

  • NumPy 1.9.2
  • pandas 0.16.2
  • matplotlib 1.4.3
  • tables 3.2.2
  • pymongo 3.0.3
  • redis 2.10.3
  • scikit-learn 0.16.1

Any modern processor (from about 2010 onwards) and 4 GB of RAM will suffice, and you can probably run almost all of the code on a slower system too.

The exception here is with the final two chapters. In these chapters, I step through using Amazon Web Services (AWS) to run the code. This will probably cost you some money, but the advantage is less system setup than running the code locally.

If you don't want to pay for those services, the tools used can all be set up on a local computer, but you will definitely need a modern system to run them: a processor built in 2012 or later and more than 4 GB of RAM.

Although the code examples are also compatible with Python 2.7, it's better to use the latest version of Python 3 (3.4.3 or newer).

Installing Python

Python is a fantastic, versatile, and easy-to-use language. It's available for all three major operating systems (Microsoft Windows, Mac OS X, and Linux), and the installer, as well as the documentation, can be downloaded from the official Python website: https://www.python.org.

Note

Windows users will need to set an environment variable in order to use Python from the command line. First, find where Python 3 is installed; the default location is C:\Python34. Next, enter this command into the command line (cmd program): set PATH=%PATH%;C:\Python34. Remember to change C:\Python34 if Python is installed in a different directory.
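If you are unsure whether the command line can find an interpreter, one way to check is from Python itself (for example, by launching it from its installation directory). The following is a minimal sketch that uses the standard library's shutil.which; the helper name find_interpreter and the list of command names it tries are assumptions made for illustration, not part of any official setup procedure.

```python
import shutil


def find_interpreter(names=("python3", "python", "py")):
    """Return (name, path) for the first command found on the search path, or None.

    The default command names are illustrative guesses; adjust them to
    whatever your system calls the Python 3 executable.
    """
    for name in names:
        path = shutil.which(name)
        if path is not None:
            return name, path
    return None


result = find_interpreter()
if result is None:
    print("No Python interpreter found on the search path")
else:
    print("Found {} at {}".format(*result))
```

If nothing is found, the directory containing the Python executable is probably missing from the search path.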

Once you have Python running on your system, you should be able to open a command prompt and run the following code:

$ python3
Python 3.4.0 (default, Apr 11 2014, 13:05:11)
[GCC 4.8.2] on Linux
Type "help", "copyright", "credits" or "license" for more information.
>>> print("Hello, world!")
Hello, world!
>>> exit()

Note that we will be using the dollar sign ($) to denote that a command is to be typed into the terminal (also called a shell or cmd on Windows). You do not need to type this character (or the space that follows it). Just type in the rest of the line and press Enter.
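The interpreter banner already shows the version, but you can also check it programmatically. The sketch below assumes a minimum of Python 3.4, based on the versions mentioned in this section; the REQUIRED constant and the version_ok helper are hypothetical names used only for this example.

```python
import sys

# Assumed minimum version, taken from the versions mentioned in this section.
REQUIRED = (3, 4)


def version_ok(current=None, required=REQUIRED):
    """Return True if the (major, minor) version is at least `required`."""
    if current is None:
        current = sys.version_info
    return tuple(current[:2]) >= tuple(required)


print("Running Python {}.{}.{}".format(*sys.version_info[:3]))
print("Meets requirement:", version_ok())
```

Save this to a file and run it with the same command you use to start the interpreter, or type the lines at the >>> prompt.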

After you have the above "Hello, world!" example running, exit the program and move on to installing a more advanced environment to run Python code, the IPython Notebook.

Installing IPython

IPython is a platform for Python development that contains a number of tools and environments for running Python and has more features than the standard interpreter. It contains the powerful IPython Notebook, which allows you to write programs in a web browser. It also formats your code, shows output, and allows you to annotate your scripts. It is a great tool for exploring datasets.

To install IPython on your computer, you can type the following into a command-line prompt (not into Python):

$ pip install ipython[all]

You will need administrator privileges to install this system-wide. If you do not want to (or can't) make system-wide changes, you can install it for just the current user by running this command:

$ pip install --user ipython[all]

This will install the IPython package into a user-specific location—you will be able to use it, but nobody else on your computer can. If you are having difficulty with the installation, check the official documentation for more detailed installation instructions: http://ipython.org/install.html.
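To see where that user-specific location is on your machine, you can ask Python's standard site module. This is a small sketch for orientation only; the exact directory depends on your operating system and Python version.

```python
import site

# Directory where "pip install --user" places packages for the current user.
user_site = site.getusersitepackages()
print("Per-user packages directory:", user_site)

# True if Python will search that directory; False or None if it is disabled
# (for example, inside some isolated environments).
print("User site enabled:", site.ENABLE_USER_SITE)
```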

With the IPython Notebook installed, you can launch it with the following:

$ ipython3 notebook

This will do two things. First, it will create an IPython Notebook instance that will run in the command prompt you just used. Second, it will launch your web browser and connect to this instance, allowing you to create a new notebook. It will look something similar to the following screenshot (where home/bob will be replaced by your current working directory):

[Screenshot: the IPython Notebook open in a web browser]

To stop the IPython Notebook from running, open the command prompt that has the instance running (the one you used earlier to run the IPython command). Then, press Ctrl + C; you will be prompted with "Shutdown this notebook server (y/[n])?". Type y and press Enter, and the IPython Notebook will shut down.

Installing additional packages

Python 3.4 includes a program called pip, a package manager that helps you install new libraries on your system. You can verify that pip is working by running the pip3 freeze command, which lists the packages you have installed.

The additional packages can be installed via the pip installer program, which has been bundled with Python since Python 3.4. More information about pip can be found at https://docs.python.org/3/installing/index.html.

After we have successfully installed Python, we can execute pip from the command-line terminal to install additional Python packages:

pip install SomePackage

Already installed packages can be updated via the --upgrade flag:

pip install SomePackage --upgrade
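When several Python installations coexist, the pip on your command line may belong to a different interpreter than the one you run your code with. One common way around this is to invoke pip as a module of a specific interpreter. The sketch below builds such a command from Python; pip_command and install are hypothetical helper names, and SomePackage is the same placeholder used above, not a real package.

```python
import subprocess
import sys


def pip_command(package, upgrade=False, user=False):
    """Build the argument list that installs `package` with this interpreter's pip."""
    cmd = [sys.executable, "-m", "pip", "install"]
    if upgrade:
        cmd.append("--upgrade")
    if user:
        cmd.append("--user")
    cmd.append(package)
    return cmd


def install(package, **options):
    """Run pip in a subprocess; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_command(package, **options))


# Only print the command here; call install("SomePackage") with a real
# package name to actually run it.
print(" ".join(pip_command("SomePackage", upgrade=True)))
```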

A highly recommended alternative Python distribution for scientific computing is Anaconda by Continuum Analytics. Anaconda is a free (including for commercial use), enterprise-ready Python distribution that bundles all the essential Python packages for data science, math, and engineering in one user-friendly, cross-platform distribution. The Anaconda installer can be downloaded at http://continuum.io/downloads#py34, and an Anaconda quick-start guide is available at https://store.continuum.io/static/img/Anaconda-Quickstart.pdf.

After successfully installing Anaconda, we can install new Python packages using the following command:

conda install SomePackage

Existing packages can be updated using the following command:

conda update SomePackage

The major Python packages that were used for writing this course are listed here:

  • NumPy
  • SciPy
  • scikit-learn
  • matplotlib
  • pandas
  • tables
  • pymongo
  • redis

As these packages are all hosted on PyPI, the Python Package Index, they can be easily installed with pip. To install NumPy, you would run:

$ pip install numpy

To install scikit-learn, you would run:

$ pip3 install -U scikit-learn

Note

Windows users may need to install the NumPy and SciPy libraries before installing scikit-learn. Installation instructions are available at www.scipy.org/install.html for those users.

Users of major Linux distributions such as Ubuntu or Red Hat may wish to install the official package from their package manager. Not all distributions have the latest versions of scikit-learn, so check the version before installing it.

Those wishing to install the latest version by compiling the source, or view more detailed installation instructions, can go to http://scikit-learn.org/stable/install.html to view the official documentation on installing scikit-learn.

Most libraries will have an attribute for the version, so if you already have a library installed, you can quickly check its version:

>>> import redis
>>> redis.__version__
'2.10.3'

This works well for most libraries. A few, such as pymongo, use a different attribute (pymongo uses just version, without the underscores).
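To check several libraries at once, the two attribute names can be tried in turn. The following is a minimal sketch, assuming that a library exposes its version as a string under either __version__ or version; library_version is a hypothetical helper, and the list of names at the end is only an example.

```python
import importlib


def library_version(module_name):
    """Return a library's version string, or None if it is missing or unlabelled."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        return None  # the library is not installed
    # Most libraries use __version__; a few use plain version instead.
    for attribute in ("__version__", "version"):
        value = getattr(module, attribute, None)
        if isinstance(value, str):
            return value
    return None


for name in ("numpy", "pandas", "redis", "pymongo"):
    print(name, library_version(name))
```

A result of None means the library is either not installed or does not expose its version under one of these two names.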
