Chapter 2

Installing a Python Distribution

IN THIS CHAPTER

check Determining which Python distribution to use for machine learning

check Performing a Linux, Mac OS X, and Windows installation

check Obtaining the data sets and example code

“For many people my software is something you install and forget. I like to keep it that way.”

— WIETSE VENEMA

Before you can do too much with Python or use it to solve machine learning problems, you need a workable installation. In addition, you need access to the data sets and code used for this book. Downloading the sample code (found at www.dummies.com/go/codingaiodownloads) and installing it on your system is the best way to get a good learning experience from the book. This chapter helps you get your system set up so that you can easily follow the examples in the remainder of the book.

remember Using the downloadable source code doesn’t prevent you from typing the examples on your own, following them using a debugger, expanding them, or working with the code in all sorts of ways. The downloadable source code is there to help you get a good start with your machine learning and Python learning experience. After you see how the code works when it’s correctly typed and configured, you can try to create the examples on your own. If you make a mistake, you can compare what you’ve typed with the downloadable source code and discover precisely where the error exists. You can find the downloadable source for this chapter in the ML4D; 06; Sample.ipynb and ML4D; 06; Dataset Load.ipynb files.

Choosing a Python Distribution with Machine Learning in Mind

It’s entirely possible to obtain a generic copy of Python and add all the required machine learning libraries to it. The process can be difficult because you need to ensure that you have all the required libraries in the correct versions to guarantee success. In addition, you need to perform the configuration required to make sure that the libraries are accessible when you need them. Fortunately, going through the required work is not necessary because a number of Python machine learning products are available for you to use. These products provide everything needed to get started with machine learning projects.
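If you do assemble your own setup, one quick way to see what you have is to try importing each library and reading its __version__ attribute. The following is a minimal sketch. The package names in ML_LIBRARIES are an assumption chosen for illustration, not an official requirements list for this book, and check_libraries is a hypothetical helper.

```python
import importlib

# Illustrative list only: these package names are an assumption,
# not an official requirements list for the book.
ML_LIBRARIES = ["numpy", "scipy", "pandas", "sklearn", "matplotlib"]


def check_libraries(names):
    """Return a dict mapping each module name to its version string,
    or to None when the module can't be imported."""
    report = {}
    for name in names:
        try:
            module = importlib.import_module(name)
        except ImportError:
            report[name] = None
            continue
        # Not every module defines __version__, so fall back to "unknown".
        report[name] = getattr(module, "__version__", "unknown")
    return report


if __name__ == "__main__":
    for name, version in sorted(check_libraries(ML_LIBRARIES).items()):
        print("%-12s %s" % (name, version or "MISSING"))
```

Any entry reported as MISSING is a library you would still need to add before working through the examples.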

remember You can use any of the packages mentioned in the following sections to work with the examples in this book. However, the book’s source code and downloadable source code rely on Continuum Analytics Anaconda because this particular package works on every platform this book supports: Linux, Mac OS X, and Windows. The book doesn’t mention a specific package in the chapters that follow, but all screenshots reflect how things look when using Anaconda on Windows. You may need to tweak the code to use another package, and the screens will look different if you use Anaconda on another platform.

warning Windows 10 can present serious installation issues for Python. Because automatic upgrades keep changing your system, Windows 10 doesn’t provide a great environment for Python. If you’re working with Windows 10, simply be aware that your road to a Python installation may be a rocky one. If you run into problems, try installing Python 3.x and running it from the command line instead of from the Start menu.

Getting Continuum Analytics Anaconda

The basic Anaconda package is a free download that you obtain at www.continuum.io/downloads. Simply click Download Anaconda to obtain access to the free product. You do need to provide an email address to get a copy of Anaconda. After you provide your email address, you go to another page, where you can choose your platform and the installer for that platform. Anaconda supports the following platforms:

  • Windows 32-bit and 64-bit

    technicalstuff The installer may offer you only the 64-bit or 32-bit version, depending on which version of Windows it detects.

  • Linux 32-bit and 64-bit
  • Mac OS X 64-bit

The code written for this book requires Anaconda 2.1.0 using Python 2.7, which you can download at https://repo.continuum.io/archive (refer to the “Using Python 2.7.x for this book” sidebar for details). You can also choose to install Python 3.5 by clicking one of the links where the filename begins with Anaconda3. Both Windows and Mac OS X provide graphical installers. When using Linux, you rely on the bash utility.
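If you’re unsure which Python an installation actually runs, sys.version_info tells you. This small sketch compares the running interpreter against the two versions this chapter mentions; describe_python is a hypothetical helper written for this example.

```python
import sys


def describe_python(version_info=None):
    """Classify an interpreter version against the versions this chapter
    mentions: Python 2.7 (the book's code) or Python 3.5 (Anaconda3)."""
    if version_info is None:
        version_info = sys.version_info
    major_minor = tuple(version_info[:2])
    if major_minor == (2, 7):
        return "Python 2.7: matches the book's code"
    if major_minor == (3, 5):
        # The book's examples use the Python 2 print statement, which
        # Python 3 rejects, so some code needs adjusting.
        return "Python 3.5: the Anaconda3 option; some examples need adjusting"
    return "Python %d.%d: not a version this chapter covers" % major_minor


if __name__ == "__main__":
    print(describe_python())
```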

warning The code exercises and commands in this book won’t work as-is with the latest version of Anaconda, because a newer version with a different command syntax was expected to be released around the time this book was published. Make sure that you download and use Anaconda 2.1.0.

The Miniconda installer can potentially save time by limiting the number of features you install. However, trying to figure out precisely which packages you do need is an error-prone and time-consuming process. In general, you want to perform a full installation to ensure that you have everything needed for your projects. Even a full install doesn’t require much time or effort to download and install on most systems.

The free product is all you need for this book. However, when you look at the site, you see that many other add-on products are available. These products can help you create robust applications. For example, when you add Accelerate to the mix, you obtain the capability to perform multicore and GPU-enabled operations. The use of these add-on products is outside the scope of this book, but the Anaconda site provides details on using them.

tip The following Python software packages are alternatives to the basic Anaconda package, but you do not need to install all or even any of these software programs.

Getting Enthought Canopy Express

Enthought Canopy Express is a free product for producing both technical and scientific applications using Python. You can obtain it at www.enthought.com/canopy-express. Click Download Free on the main page to see a listing of the versions that you can download. Only Canopy Express is free; the full Canopy product comes at a cost. However, you can use Canopy Express to work with the examples in this book. Canopy Express supports the following platforms:

  • Windows 32-bit and 64-bit
  • Linux 32-bit and 64-bit
  • Mac OS X 32-bit and 64-bit

Choose the platform and version you want to download. When you click Download Canopy Express, you see an optional form for providing information about yourself. The download starts automatically, even if you don’t provide personal information to the company.

One of the advantages of Canopy Express is that Enthought is heavily involved in providing support for both students and teachers. People can also take classes, including online classes, that teach the use of Canopy Express in various ways (see https://training.enthought.com/courses).

Getting Python(x,y)

The Python(x,y) Integrated Development Environment (IDE) is a community project hosted on GitHub at http://python-xy.github.io. It’s a Windows-only product, so you can’t easily use it for cross-platform needs. (In fact, it supports only Windows Vista, Windows 7, and Windows 8.) However, it does come with a full set of libraries, and you can easily use it for this book if you want.

Because Python(x,y) uses the GNU General Public License (GPL) v3 (see www.gnu.org/licenses/gpl.html), you have no add-ons, training, or other paid features to worry about. No one will come calling at your door hoping to sell you something. In addition, you have access to all the source code for Python(x,y), so you can make modifications if you want.

Getting WinPython

The name tells you that WinPython is a Windows-only product that you can find at winpython.github.io. This product is actually a spin-off of Python(x,y) and isn’t meant to replace it. Quite the contrary: WinPython is simply a more flexible way to work with Python(x,y). You can read about the motivation for creating WinPython at https://sourceforge.net/p/winpython/wiki/Roadmap.

The bottom line for this product is that you gain flexibility at the cost of friendliness and a little platform integration. However, for developers who need to maintain multiple versions of an IDE, WinPython may make a significant difference. When using WinPython with this book, make sure to pay particular attention to configuration issues, or you’ll find that even the downloadable code has little chance of working.

Installing Python on Linux

You use the command line to install Anaconda on Linux — there is no graphical installation option. Before you can perform the install, you must download a copy of the Linux software from the Continuum Analytics site. You can find the required download information in the “Getting Continuum Analytics Anaconda” section, earlier in this chapter. The following procedure should work fine on any Linux system, whether you use the 32-bit or 64-bit version of Anaconda:

  1. Open a copy of Terminal.

    The Terminal window appears.

  2. Change directories to the downloaded copy of Anaconda on your system.

    The name of this file varies, but normally it appears as Anaconda-2.1.0-Linux-x86.sh for 32-bit systems and Anaconda-2.1.0-Linux-x86_64.sh for 64-bit systems.

    tip The version number is embedded as part of the filename. In this case, the filename refers to version 2.1.0, which is the version used for this book. If you use some other version, you may experience problems with the source code and need to make adjustments when working with it.

  3. Type bash Anaconda-2.1.0-Linux-x86.sh (for the 32-bit version) or bash Anaconda-2.1.0-Linux-x86_64.sh (for the 64-bit version) and press Enter.

    An installation wizard starts that asks you to accept the licensing terms for using Anaconda.

  4. Read the licensing agreement and accept the terms using the method required for your version of Linux.

    The wizard asks you to provide an installation location for Anaconda. The book assumes that you use the default location of ~/anaconda. If you choose some other location, you may have to modify some procedures later in the book to work with your setup.

  5. Provide an installation location (if necessary) and press Enter (or click Next).

    The application extraction process begins. After the extraction is complete, you see a completion message.

  6. Add the installation path to your PATH statement using the method required for your version of Linux.

    You’re ready to begin using Anaconda.
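Step 6 is the one most likely to go wrong. As a hedged check, the following sketch tests whether a directory appears in your PATH. It assumes the default ~/anaconda location and that Anaconda on Linux keeps its executables in the bin subfolder; is_on_path is a hypothetical helper, not part of Anaconda.

```python
import os


def is_on_path(directory, path_value=None):
    """Return True if `directory` appears as an entry in a PATH-style string.

    `path_value` defaults to the current PATH environment variable.
    Entries are compared after expanding ~ and normalizing the path.
    """
    if path_value is None:
        path_value = os.environ.get("PATH", "")
    target = os.path.normpath(os.path.expanduser(directory))
    for entry in path_value.split(os.pathsep):
        # Skip empty entries produced by stray separators.
        if entry and os.path.normpath(os.path.expanduser(entry)) == target:
            return True
    return False


if __name__ == "__main__":
    # Assumes the default install location used throughout the book.
    anaconda_bin = os.path.join("~", "anaconda", "bin")
    print("Anaconda on PATH: %s" % is_on_path(anaconda_bin))
```

If the check prints False, revisit step 6 and then open a new Terminal so that the updated PATH takes effect.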

Installing Python on Mac OS X

The Mac OS X installation comes in only one form: 64-bit.

technicalstuff Before you can perform the install, you must download a copy of the Mac software from the Continuum Analytics site. You can find the required download information in the “Getting Continuum Analytics Anaconda” section, earlier in this chapter.

The following steps help you install Anaconda 64-bit on a Mac system:

  1. Locate the downloaded copy of Anaconda on your system.

    The name of this file varies, but normally it appears as Anaconda-2.1.0-MacOSX-x86_64.pkg. The version number is embedded as part of the filename. In this case, the filename refers to version 2.1.0, which is the version used for this book. If you use some other version, you may experience problems with the source code and need to make adjustments when working with it.

  2. Double-click the installation file.

    An introduction dialog box appears.

  3. Click Continue.

    The wizard asks whether you want to review the Read Me materials.

    tip You can read these materials later. For now, you can safely skip the information.

  4. Click Continue.

    The wizard displays a licensing agreement.

    tip Be sure to read through the licensing agreement so that you know the terms of usage.

  5. Click I Agree if you agree to the licensing agreement.

    The wizard asks you to provide a destination for the installation. The destination controls whether the installation is for an individual user or a group.

    warning You may see an error message stating that you can’t install Anaconda on the system. The error message occurs because of a bug in the installer and has nothing to do with your system. To get rid of the error message, choose the Install Only for Me option. You can’t install Anaconda for a group of users on a Mac system.

  6. Click Continue.

    The installer displays a dialog box containing options for changing the installation type. Click Change Install Location if you want to modify where Anaconda is installed on your system. (The book assumes that you use the default path of ~/anaconda.) Click Customize if you want to modify how the installer works. For example, you can choose not to add Anaconda to your PATH statement. However, the book assumes that you have chosen the default install options, and no good reason exists to change them unless you have another copy of Python 2.7 installed somewhere else.

  7. Click Install.

    The installation begins. A progress bar tells you how the installation process is progressing. When the installation is complete, you see a completion dialog box.

  8. Click Continue.

    You’re ready to begin using Anaconda.

tip Continuum also provides a command-line version of the Mac OS X installation. This file has a filename of Anaconda-2.1.0-MacOSX-x86_64.sh, and you use the bash utility to install it in the same way that you do on any Linux system. However, installing Anaconda from the command line gives you no advantage unless you need to perform it as part of an automated setup. Using the GUI version, as described in this section, is much easier.
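All three installation procedures start from a file whose name encodes the Anaconda version, the platform, and the bitness. The following sketch rebuilds those names from the patterns shown in this chapter; installer_name is a hypothetical helper, and other Anaconda releases may use a different naming scheme.

```python
# Platform tags as they appear in the installer filenames in this chapter,
# keyed by the values that platform.system() returns.
PLATFORM_TAGS = {"Linux": "Linux", "Darwin": "MacOSX", "Windows": "Windows"}


def installer_name(system, bits, version="2.1.0", graphical=True):
    """Build the Anaconda installer filename for a platform.

    system:    "Linux", "Darwin" (Mac OS X), or "Windows"
    bits:      32 or 64
    graphical: on a Mac, True selects the .pkg GUI installer and False the
               command-line .sh installer; ignored on other platforms.
    """
    if system not in PLATFORM_TAGS:
        raise ValueError("unsupported system: %r" % (system,))
    if bits not in (32, 64):
        raise ValueError("bits must be 32 or 64")
    if system == "Darwin" and bits != 64:
        raise ValueError("the Mac OS X installer is 64-bit only")
    arch = "x86_64" if bits == 64 else "x86"
    if system == "Windows":
        ext = ".exe"
    elif system == "Darwin" and graphical:
        ext = ".pkg"
    else:
        ext = ".sh"
    return "Anaconda-%s-%s-%s%s" % (version, PLATFORM_TAGS[system], arch, ext)
```

For example, "bash " + installer_name("Linux", 64) produces the 64-bit command from step 3 of the Linux procedure. Because the system argument uses the values that platform.system() returns, you can pass platform.system() directly.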

Installing Python on Windows

Anaconda comes with a graphical installation application for Windows, so getting a good install means using a wizard, as you would for any other installation. Of course, you need a copy of the installation file before you begin, and you can find the required download information in the “Getting Continuum Analytics Anaconda” section, earlier in this chapter.

The following procedure should work fine on any Windows system, whether you use the 32-bit or the 64-bit version of Anaconda:

  1. Locate the downloaded copy of Anaconda on your system.

    The name of this file varies, but normally it appears as Anaconda-2.1.0-Windows-x86.exe for 32-bit systems and Anaconda-2.1.0-Windows-x86_64.exe for 64-bit systems. The version number is embedded as part of the filename. In this case, the filename refers to version 2.1.0, which is the version used for this book. If you use some other version, you may experience problems with the source code and need to make adjustments when working with it.

  2. Double-click the installation file.

    technicalstuff You may see an Open File – Security Warning dialog box that asks whether you want to run this file. Click Run if this dialog box pops up.

    You see an Anaconda 2.1.0 Setup dialog box similar to the one shown in Figure 2-1. The exact dialog box that you see depends on which version of the Anaconda installation program you download. If you have a 64-bit operating system, using the 64-bit version of Anaconda is always best so that you obtain the best possible performance. This first dialog box tells you when you have the 64-bit version of the product.

  3. Click Next.

    The wizard displays a licensing agreement. Be sure to read through the licensing agreement so that you know the terms of usage.

  4. Click I Agree if you agree to the licensing agreement.

    You’re asked what sort of installation type to perform, as shown in Figure 2-2.

    tip In most cases, you want to install the product just for yourself. The exception is if you have multiple people using your system and they all need access to Anaconda.

  5. Choose one of the installation types and then click Next.

    The wizard asks where to install Anaconda on disk, as shown in Figure 2-3.

    technicalstuff The book assumes that you use the default location. If you choose some other location, you may have to modify some procedures later in the book to work with your setup.

  6. Choose an installation location (if necessary) and then click Next.

    You see the Advanced Installation Options, shown in Figure 2-4.

    tip These options are selected by default, and in most cases no good reason exists to change them. You might need to change them if you don’t want Anaconda to be your default Python 2.7 (or Python 3.5) installation. However, the book assumes that you’ve set up Anaconda using the default options.

  7. Change the advanced installation options (if necessary) and then click Install.

    You see an Installing dialog box with a progress bar.

    tip The installation process can take a few minutes, so get yourself a cup of coffee and read the comics for a while.

    When the installation process is over, a Next button is enabled.

  8. Click Next.

    The wizard tells you that the installation is complete.

  9. Click Finish.

    You’re ready to begin using Anaconda.

FIGURE 2-1: The setup process begins by telling you whether you have the 64-bit version.

FIGURE 2-2: Tell the wizard how to install Anaconda on your system.

FIGURE 2-3: Specify an installation location.

FIGURE 2-4: Configure the advanced installation options.
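After installation, you may want to confirm that you really got the 64-bit version. This sketch reports the pointer size of the running interpreter, which is what makes Python itself 32-bit or 64-bit. Note that a 32-bit Anaconda installed on 64-bit Windows still reports 32; interpreter_bits is a hypothetical helper.

```python
import platform
import struct


def interpreter_bits():
    """Return 32 or 64: the pointer size of the running Python interpreter."""
    # struct.calcsize("P") is the size in bytes of a C pointer.
    return struct.calcsize("P") * 8


if __name__ == "__main__":
    print("Python interpreter: %d-bit" % interpreter_bits())
    # platform.machine() reports the hardware, such as "AMD64" or "x86_64".
    print("Machine: %s" % platform.machine())
```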

Downloading the Data Sets and Example Code

This book is about using Python to perform machine learning tasks. Of course, you can spend all your time creating the example code from scratch, debugging it, and only then discovering how it relates to machine learning, or you can take the easy way and download the prewritten code at www.dummies.com/go/codingaiodownloads so that you can get right to work. Likewise, creating data sets large enough for machine learning purposes would take quite a while. Fortunately, you can access standardized, previously created data sets quite easily using features provided in some of the data science libraries (which also work just fine for machine learning). The following sections help you download and use the example code and data sets so that you can save time and get right to work with data science-specific tasks.

Using Jupyter Notebook

To make working with the relatively complex code in this book easier, you can use Jupyter Notebook. This interface lets you easily create Python notebook files that can contain any number of examples, each of which can run individually. The program runs in your browser, so which platform you use for development doesn’t matter; as long as it has a browser, you should be okay.

Starting Jupyter Notebook

Most platforms provide an icon that you can open to access Jupyter Notebook. For example, on a Windows system, you choose Start ⇒ All Programs ⇒ Anaconda ⇒ Jupyter Notebook. Figure 2-5 shows how the interface looks when viewed in a Firefox browser. The precise appearance on your system depends on the browser you use and the kind of platform you have installed.

FIGURE 2-5: Jupyter Notebook provides an easy method to create machine learning examples.

If you have a platform that doesn’t offer easy access through an icon, you can use these steps to access Jupyter Notebook:

  1. Open a Command Prompt or Terminal Window on your system.

    The window opens so that you can type commands.

  2. Change directories to the Anaconda2\Scripts directory on your machine.

    Most systems let you use the CD command for this task.

  3. Type ..\python ipython2-script.py notebook and press Enter.

    The Jupyter Notebook page opens in your browser.
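If you’d rather start the server from a Python script, you can build the command line yourself. This is a sketch under an assumption: it relies on a Jupyter installation whose notebook package can run as a module (python -m notebook), which is how newer releases work. The book’s Anaconda 2.1.0 setup uses the ipython2-script.py command from the steps above instead. The notebook_command helper is hypothetical.

```python
import sys


def notebook_command(python=None):
    """Build a command line that starts the Notebook server with a given
    interpreter (the current one by default)."""
    # Assumption: the installed `notebook` package can be run as a module.
    # Usage: subprocess.call(notebook_command())
    return [python or sys.executable, "-m", "notebook"]
```

To launch the server, pass the result to subprocess.call. The call doesn’t return until you stop the server with Ctrl+C, as described in “Stopping the Jupyter Notebook server.”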

Stopping the Jupyter Notebook server

No matter how you start Jupyter Notebook (or just Notebook, as it appears in the remainder of the book), the system generally opens a command prompt or terminal window to host Jupyter Notebook. This window contains a server that makes the application work. After you close the browser window when a session is complete, select the server window and press Ctrl+C or Ctrl+Break to stop the server.

Defining the code repository

The code you create and use in this book will reside in a repository on your hard drive. Think of a repository as a kind of filing cabinet where you put your code. Notebook opens a drawer, takes out the folder, and shows the code to you. You can modify it, run individual examples within the folder, add new examples, and simply interact with your code in a natural manner. The following sections get you started with Notebook so that you can see how this whole repository concept works.

Defining the book’s folder

It pays to organize your files so that you can access them more easily later. Book 6 keeps its files in the ML4D folder, which stands for Machine Learning for Dummies. Use these steps within Notebook to create a new folder:

  1. Choose New ⇒ Folder.

    Notebook creates a new folder named Untitled Folder, as shown in Figure 2-6. Because the list appears in alphanumeric order, you may not see the new folder at first; scroll down to the correct location to find it.

  2. Check the box next to the Untitled Folder entry.
  3. Click Rename at the top of the page.

    You see a Rename Directory dialog box like the one shown in Figure 2-7.

  4. Type ML4D and click OK.

    Notebook changes the name of the folder for you.

  5. Click the new ML4D entry in the list.

    Notebook changes the location to the ML4D folder where you perform tasks related to the exercises in this book.

FIGURE 2-6: New folders will appear with a name of Untitled Folder.

FIGURE 2-7: Rename the folder so that you remember the kinds of entries it contains.
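Notebook’s New ⇒ Folder command creates an ordinary directory on disk, so you can also create the ML4D folder from Python. A minimal sketch, assuming you pass the directory that Notebook displays as its home page; make_book_folder is a hypothetical name.

```python
import os


def make_book_folder(root, name="ML4D"):
    """Create `name` inside `root` (if it doesn't already exist) and
    return its full path."""
    path = os.path.join(root, name)
    # Only create the folder when it's missing, so repeated calls are safe.
    if not os.path.isdir(path):
        os.makedirs(path)
    return path
```

For example, make_book_folder(os.getcwd()) creates ML4D in the current directory and returns its path; calling it a second time does nothing but return the same path.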

Creating a new notebook

Every new notebook is like a file folder. You can place individual examples within the file folder, just as you would sheets of paper into a physical file folder. Each example appears in a cell.

Use these steps to create a new notebook:

  1. Click New ⇒ Python 2.

    A new tab opens in the browser with the new notebook, as shown in Figure 2-8. Notice that the notebook contains a cell and that Notebook has highlighted the cell so that you can begin typing code in it. The title of the notebook is Untitled right now. That’s not a particularly helpful title, so you need to change it.

  2. Click Untitled on the page.

    Notebook asks what you want to use as a new name, as shown in Figure 2-9.

  3. Type ML4D; 06; Sample and press Enter.

FIGURE 2-8: A notebook contains cells that you use to hold code.

FIGURE 2-9: Provide a new name for your notebook.

Of course, the Sample notebook doesn’t contain anything just yet. Place the cursor in the cell, type print('Python is really cool!'), and then click the Run button (the button with the right-pointing arrow on the toolbar). You see the output shown in Figure 2-10. The output is part of the same cell as the code; however, Notebook visually separates the output from the code so that you can tell them apart. Notebook also creates a new cell for you automatically.

FIGURE 2-10: Notebook uses cells to store your code.

When you finish working with a notebook, shutting it down is important. To close a notebook, choose File ⇒ Close and Halt. You return to the home page, where you can see the notebook you just created added to the list, as shown in Figure 2-11.

FIGURE 2-11: Any notebooks you create appear in the repository list.

Exporting a notebook

Creating notebooks and keeping them all to yourself isn’t much fun. At some point, you want to share them with other people. To perform this task, you must export your notebook from the repository to a file. You can then send the file to someone else, who will import it into his or her repository.

The previous section shows how to create a notebook named ML4D; 06; Sample. You can open this notebook by clicking its entry in the repository list. The file reopens so that you can see your code again. To export this code, choose File ⇒ Download As ⇒ IPython Notebook. What you see next depends on your browser, but you generally see some sort of dialog box for saving the notebook as a file. Use the same method for saving the IPython Notebook file as you do for any other file you save using your browser.
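An exported .ipynb file is plain JSON text, so you can inspect it without opening Notebook. The sketch below lists the code in each code cell. It assumes the nbformat 4 layout used by newer Jupyter releases and also handles the older nbformat 3 layout (cells nested under worksheets), which is likely what the book’s Anaconda 2.1.0 setup writes. The code_cells function is hypothetical.

```python
import json


def code_cells(notebook_text):
    """Return the source of each code cell in an .ipynb document.

    Handles the nbformat 4 layout (a top-level "cells" list) and the older
    nbformat 3 layout (cells nested under "worksheets").
    """
    nb = json.loads(notebook_text)
    if "cells" in nb:
        cells = nb["cells"]
    else:
        cells = [cell for ws in nb.get("worksheets", [])
                 for cell in ws.get("cells", [])]
    sources = []
    for cell in cells:
        if cell.get("cell_type") != "code":
            continue
        # nbformat 4 stores code in "source"; nbformat 3 used "input".
        src = cell.get("source", cell.get("input", ""))
        # The code may be stored as one string or as a list of lines.
        if isinstance(src, list):
            src = "".join(src)
        sources.append(src)
    return sources
```

For example, read the exported file’s text in Python and pass it to code_cells to get a list of strings, one per code cell.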

Removing a notebook

Sometimes notebooks get outdated or you simply don’t need to work with them any longer. Rather than allow your repository to get clogged with files you don’t need, you can remove these unwanted notebooks from the list. Use these steps to remove the file:

  1. Select the box next to the ML4D; 06; Sample.ipynb entry.
  2. Click the trash can icon (Delete) at the top of the page.

    You see a Delete notebook warning message like the one shown in Figure 2-12.

  3. Click Delete.

    The file is removed from the list.

FIGURE 2-12: Notebook warns you before removing any files from the repository.

Importing a notebook

To use the source code from this book, you must import the downloaded files into your repository. The source code, available at www.dummies.com/go/codingaiodownloads, comes in an archive file that you extract to a location on your hard drive. The archive contains .ipynb (IPython Notebook) files with the book’s source code. The following steps tell how to import these files into your repository:

  1. Click Upload at the top of the page.

    What you see depends on your browser. In most cases, you see some type of File Upload dialog box that provides access to the files on your hard drive.

  2. Navigate to the directory containing the files that you want to import into Notebook.
  3. Highlight one or more files to import and click the Open (or other, similar) button to begin the upload process.

    You see the file added to an upload list, as shown in Figure 2-13. The file isn’t part of the repository yet — you’ve simply selected it for upload.

  4. Click Upload.

    Notebook places the file in the repository so that you can begin using it.

FIGURE 2-13: The files that you want to add to the repository appear as part of an upload list.
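Notebook’s file list shows the contents of a directory on your hard drive, so copying the extracted .ipynb files into that directory is an alternative to the Upload button. The following sketch assumes you know both locations; copy_notebooks is a hypothetical helper, and it overwrites files with the same name in the destination.

```python
import glob
import os
import shutil


def copy_notebooks(source_dir, dest_dir):
    """Copy every .ipynb file from source_dir into dest_dir.

    Files with the same name in dest_dir are overwritten.
    Returns the sorted list of copied file names.
    """
    if not os.path.isdir(dest_dir):
        os.makedirs(dest_dir)
    copied = []
    for path in sorted(glob.glob(os.path.join(source_dir, "*.ipynb"))):
        # copy2 also preserves the original file timestamps.
        shutil.copy2(path, dest_dir)
        copied.append(os.path.basename(path))
    return copied
```

Only names ending in .ipynb match the pattern, so any other files in the extracted archive stay where they are.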

Understanding the data sets used in this book

This book uses a number of data sets, all of which appear in the scikit-learn library. These data sets demonstrate various ways in which you can interact with data, and you use them in the examples to perform a variety of tasks. The following list provides a quick overview of the function used to import each of the data sets into your Python code:

  • load_boston(): Regression analysis with the Boston house-prices data set
  • load_iris(): Classification with the iris data set
  • load_diabetes(): Regression with the diabetes data set
  • load_digits([n_class]): Classification with the digits data set
  • fetch_20newsgroups(subset='train'): Data from 20 newsgroups
  • fetch_olivetti_faces(): Olivetti faces data set from AT&T
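Library versions matter here: recent scikit-learn releases no longer include load_boston (it was removed in version 1.2), so a modern installation can’t run the Boston example as written. The following sketch reports which of the loader functions listed above your scikit-learn provides; BOOK_LOADERS and missing_loaders are hypothetical names chosen for this example.

```python
import importlib

# Loader names as listed in this chapter.
BOOK_LOADERS = ("load_boston", "load_iris", "load_diabetes",
                "load_digits", "fetch_20newsgroups", "fetch_olivetti_faces")


def missing_loaders(module, names=BOOK_LOADERS):
    """Return the names from `names` that `module` doesn't provide."""
    missing = []
    for name in names:
        try:
            getattr(module, name)
        except (AttributeError, ImportError):
            # Some recent scikit-learn releases raise ImportError, rather
            # than AttributeError, when you ask for the removed load_boston.
            missing.append(name)
    return missing


if __name__ == "__main__":
    try:
        datasets = importlib.import_module("sklearn.datasets")
    except ImportError:
        print("scikit-learn is not installed")
    else:
        print("Missing loaders: %s" % (missing_loaders(datasets) or "none"))
```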

The technique for loading each of these data sets is the same across examples. The following example shows how to load the Boston house-prices data set. You can find the code in the ML4D; 06; Dataset Load.ipynb notebook.

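from sklearn.datasets import load_boston  # removed in scikit-learn 1.2; works with the book's Anaconda 2.1.0 setup
Boston = load_boston()  # a dictionary-like object holding the data, target values, and more
print(Boston.data.shape)  # (rows, columns) of the feature matrix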

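To see how the code works, click Run Cell. The output from the print call is (506L, 13L), which means that the data set contains 506 rows (samples) and 13 columns (features). The L suffix marks a Python 2 long integer; Python 3 displays (506, 13). You can see the output in Figure 2-14.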

FIGURE 2-14: The Boston object contains the loaded data set.
