In simple terms, Breeze (http://www.scalanlp.org) is a Scala library that extends the Scala collection library with support for vectors and matrices, along with a large set of functions for manipulating them. Breeze is roughly comparable to NumPy (http://www.numpy.org/) in Python. Breeze also forms the foundation of MLlib, the machine learning library in Spark, which we will explore in later chapters.
In this first recipe, we will see how to pull the Breeze libraries into our project using the Scala build tool SBT. We will also look at a brief history of Breeze to better appreciate why it can be considered the "go to" linear algebra library in Scala.
Let's add the Breeze dependencies to our build.sbt so that we can start playing with them in the subsequent recipes. There are just two Breeze dependencies: breeze (core) and breeze-natives.
Create a build.sbt file in the project root and add the breeze libraries to the project dependencies:

    organization := "com.packt"

    name := "chapter1-breeze"

    scalaVersion := "2.10.4"

    libraryDependencies ++= Seq(
      "org.scalanlp" %% "breeze" % "0.11.2",
      // Optional - the 'why' is explained in the How it works section
      "org.scalanlp" %% "breeze-natives" % "0.11.2"
    )

Run the sbt compile command from the project root in order to fetch all your dependencies.

You could import the project into Eclipse using sbt eclipse after installing the sbteclipse plugin (https://github.com/typesafehub/sbteclipse/). For IntelliJ IDEA, you just need to import the project by pointing to the root folder where your build.sbt file is.
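Once the dependencies have been fetched, a quick way to confirm that Breeze is on the classpath is to compile and run a tiny program that uses it. The following is only a minimal sketch: the object name BreezeSmokeTest and the sample values are our own choices for illustration, and the file could live anywhere under src/main/scala.

```scala
import breeze.linalg.{DenseMatrix, DenseVector}

// Hypothetical smoke test: the object name and values are illustrative only.
object BreezeSmokeTest {
  val v = DenseVector(1.0, 2.0, 3.0)
  val w = DenseVector(4.0, 5.0, 6.0)

  // Dot product: 1*4 + 2*5 + 3*6
  val dot: Double = v dot w

  // 2x2 matrix built row by row, multiplied by a vector of ones,
  // which yields the sum of each row
  val m = DenseMatrix((1.0, 2.0), (3.0, 4.0))
  val rowSums: DenseVector[Double] = m * DenseVector(1.0, 1.0)

  def main(args: Array[String]): Unit = {
    println(s"v dot w  = $dot")
    println(s"row sums = $rowSums")
  }
}
```

Running this with sbt run should print a dot product of 32.0 and row sums of 3.0 and 7.0. If the imports fail to resolve, the breeze entry in build.sbt is the first thing to check.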
Let's look at what the breeze and breeze-natives library dependencies we added bring us.
Breeze has a long history in that it isn't written from scratch in Scala. Without the native dependency, Breeze relies on netlib-java, which ships a Java-compiled version of the FORTRAN reference implementation of BLAS/LAPACK, along with thin wrappers over that Java-compiled library. This means that we can still work without the native dependency, but performance will be limited: the best we can get out of this FORTRAN-translated library is the performance of the FORTRAN reference implementation itself. For serious number crunching with the best performance, we should add the breeze-natives dependency too.
With the native dependency added, Breeze looks for machine-specific implementations of the BLAS/LAPACK libraries. The good news is that there are open source and (vendor-provided) commercial implementations for most popular processors and GPUs. The most popular open source implementations include ATLAS (http://math-atlas.sourceforge.net) and OpenBLAS (http://www.openblas.net/).
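To see which implementation was actually picked up at runtime, we can ask netlib-java directly. The sketch below assumes the netlib-java classes are on the classpath (they arrive transitively with breeze); the object name BlasCheck is hypothetical. It only prints class names and makes no claim about which one you will see on your machine.

```scala
import com.github.fommil.netlib.{BLAS, LAPACK}

// Hypothetical helper: reports which BLAS/LAPACK backends netlib-java selected.
object BlasCheck {
  // Fully qualified class names of the implementations chosen at load time
  val blasImpl: String = BLAS.getInstance().getClass.getName
  val lapackImpl: String = LAPACK.getInstance().getClass.getName

  def main(args: Array[String]): Unit = {
    println(s"BLAS:   $blasImpl")
    println(s"LAPACK: $lapackImpl")
  }
}
```

A class name containing F2j typically indicates the Java-compiled fallback, while a name containing Native suggests that a machine-specific library was found. Treat this as a rough diagnostic rather than a guarantee of performance.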
If you are running a Mac, you are in luck: native BLAS libraries come out of the box on Macs. Installing native BLAS on Ubuntu/Debian involves just running the following commands:

    sudo apt-get install libatlas3-base libopenblas-base
    sudo update-alternatives --config libblas.so.3
    sudo update-alternatives --config liblapack.so.3
For Windows, please refer to the installation instructions on https://github.com/xianyi/OpenBLAS/wiki/Installation-Guide.