So far, we have just plotted lines using the default settings. Breeze lets us customize how lines are drawn, at least to some extent.
For this example, we will use the height-weight data discussed in Chapter 2, Manipulating Data with Breeze. We will use the Scala shell here for demonstration purposes, but BreezeDemo.scala contains a program that follows the example shell session.
The code examples for this chapter come with a module, HWData.scala, that loads the data from the CSVs:
scala> val data = HWData.load
data: HWData = HWData [ 181 rows ]

scala> data.heights
breeze.linalg.DenseVector[Double] = DenseVector(182.0, ...

scala> data.weights
breeze.linalg.DenseVector[Double] = DenseVector(77.0, 58.0...
Let's create a scatter plot of the heights against the weights:
scala> val fig = Figure("height vs. weight")
fig: breeze.plot.Figure = breeze.plot.Figure@743f2558

scala> val plt = fig.subplot(0)
plt: breeze.plot.Plot = breeze.plot.Plot@501ea274

scala> plt += plot(data.heights, data.weights, '+', colorcode="black")
breeze.plot.Plot = breeze.plot.Plot@501ea274
This produces a scatter plot of the height-weight data:
Note that we passed a third argument to the plot method, '+'. This controls the plotting style. As of this writing, there are three available styles: '-' (the default), '+', and '.'. Experiment with these to see what they do. Finally, we pass a colorcode="black" argument to control the color of the line. This is either a color name or an RGB triple, written as a string. Thus, to plot red points, we could have passed colorcode="[255,0,0]".
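The RGB string form can be built programmatically rather than typed by hand. The following is a minimal sketch in plain Scala; the ColorCode object and its rgb helper are hypothetical names introduced for illustration and are not part of Breeze. It only assumes the "[r,g,b]" format described above.

```scala
object ColorCode {
  /** Formats an RGB triple in the "[r,g,b]" string form described above. */
  def rgb(r: Int, g: Int, b: Int): String = {
    // Reject values outside the usual 8-bit channel range.
    require(Seq(r, g, b).forall(c => c >= 0 && c <= 255),
      "each channel must be in 0..255")
    s"[$r,$g,$b]"
  }

  def main(args: Array[String]): Unit = {
    // The resulting string could be passed as the colorcode argument, e.g.
    //   plt += plot(xs, ys, '+', colorcode = ColorCode.rgb(255, 0, 0))
    println(rgb(255, 0, 0))
  }
}
```

Keeping the channel values as integers until the last moment makes it easier to compute colors, for instance to shade points by a third variable.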
Looking at the height-weight plot, there is a clear trend: weight increases with height. Let's try to fit a straight line through the data points. We will fit the following function:

weight (kg) = a + b × height (cm)
We will use Breeze's least squares function to find the values of a and b. The leastSquares method expects an input matrix of features and a target vector, just like the LogisticRegression class that we defined in the previous chapter. Recall that in Chapter 2, Manipulating Data with Breeze, when we prepared the training set for logistic regression classification, we introduced a dummy feature that was one for every participant to provide the degree of freedom for the y intercept. We will use the same approach here. Our feature matrix therefore contains two columns, one that is 1 everywhere and one for the height:
scala> val features = DenseMatrix.horzcat(
  DenseMatrix.ones[Double](data.npoints, 1),
  data.heights.toDenseMatrix.t
)
features: breeze.linalg.DenseMatrix[Double] =
1.0  182.0
1.0  161.0
1.0  161.0
1.0  177.0
1.0  157.0
...

scala> import breeze.stats.regression._
import breeze.stats.regression._

scala> val leastSquaresResult = leastSquares(features, data.weights)
leastSquaresResult: breeze.stats.regression.LeastSquaresRegressionResult = <function1>
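To see what the column of ones contributes, it can help to solve the same problem by hand for a single feature. The sketch below is plain Scala and does not use Breeze; SimpleLeastSquares and fit are hypothetical names, and the sample points are made up rather than taken from the height-weight data. It uses the standard closed-form solution for a straight-line fit, which is what a least squares fit on a two-column feature matrix (ones and x) reduces to.

```scala
object SimpleLeastSquares {
  /** Fits y ≈ a + b * x by ordinary least squares and returns (a, b). */
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.length == ys.length && xs.length >= 2,
      "need at least two paired points")
    val n = xs.length.toDouble
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    // Slope: co-variation of x and y divided by the variation of x.
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    require(sxx > 0.0, "x values must not all be equal")
    val b = sxy / sxx
    // Intercept: the degree of freedom supplied by the column of ones.
    val a = meanY - b * meanX
    (a, b)
  }

  def main(args: Array[String]): Unit = {
    // Made-up points lying exactly on y = 2 + 3x.
    val xs = Seq(0.0, 1.0, 2.0, 3.0)
    val ys = xs.map(x => 2.0 + 3.0 * x)
    val (a, b) = fit(xs, ys)
    println(s"intercept = $a, slope = $b")
  }
}
```

Without the dummy feature, the fitted line would be forced through the origin, which is rarely appropriate for data such as heights and weights.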
The leastSquares method returns an instance of LeastSquaresRegressionResult, which has a coefficients attribute holding the coefficients that best fit the data:
scala> leastSquaresResult.coefficients
breeze.linalg.DenseVector[Double] = DenseVector(-131.042322, 1.1521875)
The best-fit line is therefore:

weight (kg) = -131.04 + 1.1522 × height (cm)
Let's extract the coefficients. An elegant way of doing this is to use Scala's pattern matching capabilities:
scala> val Array(a, b) = leastSquaresResult.coefficients.toArray
a: Double = -131.04232269750622
b: Double = 1.1521875435418725
By writing val Array(a, b) = ..., we are telling Scala that the right-hand side of the expression is a two-element array and to bind the first element of that array to the value a and the second to the value b. See Appendix, Pattern Matching and Extractors, for a discussion of pattern matching.
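One caveat is that a pattern in a val definition must match: if the array does not have exactly two elements, the binding fails at runtime with a scala.MatchError. The following plain-Scala sketch shows a more defensive variant using a match expression; ArrayPatternDemo and asPair are hypothetical names introduced for illustration.

```scala
object ArrayPatternDemo {
  /** Returns Some((a, b)) only when the array has exactly two elements. */
  def asPair(values: Array[Double]): Option[(Double, Double)] = values match {
    case Array(a, b) => Some((a, b))
    case _           => None
  }

  def main(args: Array[String]): Unit = {
    // Coefficients rounded from the shell session above.
    val Array(a, b) = Array(-131.0423, 1.1522)
    println(s"a = $a, b = $b")

    // A three-element array does not match the two-element pattern.
    println(asPair(Array(1.0, 2.0, 3.0)))
  }
}
```

For a result such as the regression coefficients, where the length is known from the number of feature columns, the direct val binding is usually fine.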
We can now add the best-fit line to our graph. We start by generating evenly spaced dummy height values:
scala> val dummyHeights = linspace(min(data.heights), max(data.heights), 200)
dummyHeights: breeze.linalg.DenseVector[Double] = DenseVector(148.0, ...

scala> val fittedWeights = a :+ (b :* dummyHeights)
fittedWeights: breeze.linalg.DenseVector[Double] = DenseVector(39.4814...

scala> plt += plot(dummyHeights, fittedWeights, colorcode="red")
breeze.plot.Plot = breeze.plot.Plot@501ea274
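The two steps above, building an evenly spaced grid and evaluating the line on it, can be sketched in plain Scala without Breeze. LineOnGrid and its linspace helper are hypothetical stand-ins written for illustration. The lower bound of 148.0 and the coefficients come from the shell session; the upper bound of 190.0 is an assumption, since the maximum height is not shown.

```scala
object LineOnGrid {
  /** Returns n evenly spaced values from lo to hi, both inclusive. */
  def linspace(lo: Double, hi: Double, n: Int): Vector[Double] = {
    require(n >= 2, "need at least two points")
    val step = (hi - lo) / (n - 1)
    Vector.tabulate(n)(i => lo + i * step)
  }

  def main(args: Array[String]): Unit = {
    // Coefficients from the least squares fit in the shell session.
    val a = -131.04232269750622
    val b = 1.1521875435418725

    // 190.0 is an assumed upper bound, used for illustration only.
    val heights = linspace(148.0, 190.0, 200)

    // Evaluate weight = a + b * height at every grid point.
    val weights = heights.map(h => a + b * h)
    println(weights.head)
  }
}
```

Because the fitted function is a straight line, two points would be enough to draw it; a denser grid only matters when plotting a curve.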
Let's also add the equation for the best-fit line to the graph as an annotation. We will first generate the label:
scala> val label = f"weight = $a%.4f + $b%.4f * height"
label: String = weight = -131.0423 + 1.1522 * height
To add an annotation, we must access the underlying JFreeChart plot:
scala> import org.jfree.chart.annotations.XYTextAnnotation
import org.jfree.chart.annotations.XYTextAnnotation

scala> plt.plot.addAnnotation(new XYTextAnnotation(label, 175.0, 105.0))
The XYTextAnnotation constructor takes three parameters: the annotation string and a pair of (x, y) coordinates defining the centre of the annotation on the graph. The coordinates of the annotation are expressed in the coordinate system of the data. Thus, calling new XYTextAnnotation(label, 175.0, 105.0) generates an annotation whose centroid is at the point corresponding to a height of 175 cm and weight of 105 kg: