In this chapter, we will explore an elementary yet elegant mathematical data structure: the matrix. Most computer science and mathematics graduates will already be familiar with matrices and their applications. In the context of machine learning, matrices are used to implement several types of machine-learning techniques, such as linear regression and classification. We will study these techniques in later chapters.
Although this chapter may seem mostly theoretical at first, we will soon see that matrices are a very useful abstraction for quickly organizing and indexing data with multiple dimensions. The data used by machine-learning techniques contains a large number of sample values in several dimensions. Thus, matrices can be used to store and manipulate this sample data.
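As a rough sketch of this idea, plain Clojure nested vectors can already model a small matrix of sample data, with one row per sample and one column per dimension. The sample values and the helper names value-at and column are made up for illustration and are not part of any library:

```clojure
;; Hypothetical sample data: each row is one sample,
;; each column is one dimension (feature) of that sample.
(def samples
  [[5.1 3.5 1.4]
   [4.9 3.0 1.4]
   [6.2 3.4 5.4]])

;; Look up a single value by zero-based row and column index.
(defn value-at [m row col]
  (get-in m [row col]))

;; Collect one dimension (column) across all samples.
(defn column [m col]
  (mapv #(nth % col) m))

(value-at samples 2 0) ; => 6.2
(column samples 1)     ; => [3.5 3.0 3.4]
```

Nested vectors are only a convenient stand-in here; a dedicated matrix library offers the same kind of indexing along with the numerical operations.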
An interesting application of matrices is Google Search, which is built on the PageRank algorithm. Although a detailed explanation of this algorithm is beyond the scope of this book, it's worth knowing that Google Search essentially finds the eigenvector of an extremely large matrix of data (for more information, refer to The Anatomy of a Large-Scale Hypertextual Web Search Engine). We do not discuss this eigenvector computation in this book, but matrices have many other applications in computing, and we will encounter a variety of matrix operations while implementing machine-learning algorithms. In this chapter, we will describe the useful operations that we can perform on matrices.
Over the course of this book, we will use Leiningen (http://leiningen.org/) to manage third-party libraries and dependencies. Leiningen, or lein, is the standard Clojure package management and automation tool, and has several powerful features used to manage Clojure projects.
To get instructions on how to install Leiningen, visit the project site at http://leiningen.org/. The first run of the lein program could take a while, as it downloads and installs the Leiningen binaries. We can create a new Leiningen project using the new subcommand of lein, as follows:
$ lein new default my-project
The preceding command creates a new directory, my-project, which will contain all source and configuration files for a Clojure project. This folder contains the source files in the src subdirectory and a single project.clj file. In this command, default is the type of project template to be used for the new project. All the examples in this book use the preceding default project template.
The project.clj file contains all the configuration associated with the project and will have the following structure:
(defproject my-project "0.1.0-SNAPSHOT"
  :description "FIXME: write description"
  :url "http://example.com/FIXME"
  :license {:name "Eclipse Public License"
            :url "http://www.eclipse.org/legal/epl-v10.html"}
  :dependencies [[org.clojure/clojure "1.5.1"]])
Third-party Clojure libraries can be included in a project by adding their declarations to the vector under the :dependencies key. For example, the page for the core.matrix library on Clojars (https://clojars.org/net.mikera/core.matrix) lists the dependency declaration [net.mikera/core.matrix "0.20.0"]. We simply paste this declaration into the :dependencies vector to add the core.matrix library as a dependency of our Clojure project, as shown in the following code:
:dependencies [[org.clojure/clojure "1.5.1"]
               [net.mikera/core.matrix "0.20.0"]])
To download all the dependencies declared in the project.clj file, simply run the following deps subcommand:
$ lein deps
Leiningen also provides a REPL (read-eval-print loop), which is simply an interactive interpreter that has access to all the dependencies declared in the project.clj file. This REPL can also reference all the Clojure namespaces that we have defined in our project. We can start a new REPL session using the following repl subcommand of lein:
$ lein repl
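Once the REPL is running, a quick way to check that the core.matrix dependency was resolved is to load it and build a small matrix. The following is a minimal sketch, assuming the [net.mikera/core.matrix "0.20.0"] declaration is present in project.clj and has been downloaded; the alias m, the name a, and the values are illustrative choices:

```clojure
;; Assumes core.matrix is declared in :dependencies and is on the classpath.
(require '[clojure.core.matrix :as m])

;; Build a 2x2 matrix from nested vectors (illustrative values).
(def a (m/matrix [[1 2] [3 4]]))

(m/row-count a)    ; => 2
(m/column-count a) ; => 2
(m/mget a 0 1)     ; => 2 (element at row 0, column 1)
```

If the require form fails with a file-not-found error, the library is most likely missing from the :dependencies vector or has not been fetched yet.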