- Scala for Data Science
- Pascal Bugnion
- 2021-07-23 14:33:06
Customizing the line type
So far, we have just plotted lines using the default settings. Breeze lets us customize how lines are drawn, at least to some extent.
For this example, we will use the height-weight data discussed in Chapter 2, Manipulating Data with Breeze. We will use the Scala shell here for demonstration purposes, but you will find a program in BreezeDemo.scala that follows the example shell session.
The code examples for this chapter come with a module, HWData.scala, that loads the data from the CSVs:
scala> val data = HWData.load
data: HWData = HWData [ 181 rows ]

scala> data.heights
breeze.linalg.DenseVector[Double] = DenseVector(182.0, ...

scala> data.weights
breeze.linalg.DenseVector[Double] = DenseVector(77.0, 58.0...
Let's create a scatter plot of the heights against the weights:
scala> val fig = Figure("height vs. weight")
fig: breeze.plot.Figure = breeze.plot.Figure@743f2558

scala> val plt = fig.subplot(0)
plt: breeze.plot.Plot = breeze.plot.Plot@501ea274

scala> plt += plot(data.heights, data.weights, '+', colorcode="black")
breeze.plot.Plot = breeze.plot.Plot@501ea274
This produces a scatter plot of the height-weight data:

Note that we passed a third argument to the plot method, '+'. This controls the plotting style. As of this writing, there are three available styles: '-' (the default), '+', and '.'. Experiment with these to see what they do. Finally, we pass a colorcode="black" argument to control the color of the line. This is either a color name or an RGB triple, written as a string. Thus, to plot red points, we could have passed colorcode="[255,0,0]".
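If we want to build the RGB form of colorcode from numeric channel values rather than typing the string by hand, a small helper is enough. The sketch below is plain Scala; rgbCode is a hypothetical name, not part of Breeze, and it only assumes the "[r,g,b]" string format described above.

```scala
// Hypothetical helper (not part of Breeze): builds the "[r,g,b]" string
// form that the text describes for the colorcode argument.
def rgbCode(r: Int, g: Int, b: Int): String = {
  // Each channel must fit in one byte.
  require(Seq(r, g, b).forall(c => c >= 0 && c <= 255), "channels must be in 0..255")
  s"[$r,$g,$b]"
}

// The red triple used in the text.
val red = rgbCode(255, 0, 0)
```

The resulting string can then be passed wherever the session above passes a literal, for example colorcode=red in place of colorcode="black".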
Looking at the height-weight plot, there is clearly a trend between height and weight. Let's try and fit a straight line through the data points. We will fit the following function:

$$\text{weight} = a + b \times \text{height}$$
Note
Scientific literature suggests that a quadratic dependence of weight on height would describe the data better than a straight line. You should find it straightforward to fit a quadratic curve to the data, should you wish to.
We will use Breeze's least squares function to find the values of a and b. The leastSquares method expects an input matrix of features and a target vector, just like the LogisticRegression class that we defined in the previous chapter. Recall that in Chapter 2, Manipulating Data with Breeze, when we prepared the training set for logistic regression classification, we introduced a dummy feature that was one for every participant to provide the degree of freedom for the y-intercept. We will use the same approach here. Our feature matrix, therefore, contains two columns: one that is 1 everywhere and one for the height:
scala> val features = DenseMatrix.horzcat(
  DenseMatrix.ones[Double](data.npoints, 1),
  data.heights.toDenseMatrix.t
)
features: breeze.linalg.DenseMatrix[Double] =
1.0  182.0
1.0  161.0
1.0  161.0
1.0  177.0
1.0  157.0
...

scala> import breeze.stats.regression._
import breeze.stats.regression._

scala> val leastSquaresResult = leastSquares(features, data.weights)
leastSquaresResult: breeze.stats.regression.LeastSquaresRegressionResult = <function1>
The leastSquares method returns an instance of LeastSquaresRegressionResult, which has a coefficients attribute holding the coefficients that best fit the data:
scala> leastSquaresResult.coefficients
breeze.linalg.DenseVector[Double] = DenseVector(-131.042322, 1.1521875)
The best-fit line is therefore:

$$\text{weight} = -131.0423 + 1.1522 \times \text{height}$$
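For a single feature plus a column of ones, the least-squares solution has a well-known closed form: the slope is the covariance of x and y divided by the variance of x, and the intercept follows from the means. The sketch below writes this out in plain Scala to show what the two coefficients represent. The function name fitLine and the sample points are made up for illustration, and this is not a claim about how Breeze computes its result internally.

```scala
// Closed-form ordinary least squares for one feature:
//   slope     = sum((x - meanX) * (y - meanY)) / sum((x - meanX)^2)
//   intercept = meanY - slope * meanX
// fitLine is a hypothetical helper for illustration only.
def fitLine(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
  require(xs.length == ys.length && xs.length >= 2, "need at least two paired points")
  val meanX = xs.sum / xs.length
  val meanY = ys.sum / ys.length
  val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
  val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
  require(sxx > 0.0, "x values must not all be equal")
  val slope = sxy / sxx
  val intercept = meanY - slope * meanX
  (intercept, slope)
}

// Made-up points lying exactly on y = 1 + 2x (not the height-weight data).
val (intercept, slope) = fitLine(Seq(0.0, 1.0, 2.0, 3.0), Seq(1.0, 3.0, 5.0, 7.0))
```

The returned pair is ordered (intercept, slope) to mirror the order of the coefficients vector, where the column of ones comes first.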
Let's extract the coefficients. An elegant way of doing this is to use Scala's pattern matching capabilities:
scala> val Array(a, b) = leastSquaresResult.coefficients.toArray
a: Double = -131.04232269750622
b: Double = 1.1521875435418725
By writing val Array(a, b) = ..., we are telling Scala that the right-hand side of the expression is a two-element array and to bind the first element of that array to the value a and the second to the value b. See Appendix, Pattern Matching and Extractors, for a discussion of pattern matching.
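One thing worth knowing about this idiom is that the pattern is checked at runtime: it only matches an array with exactly two elements. The following sketch needs no Breeze. The coefficients array is a stand-in for leastSquaresResult.coefficients.toArray, filled with the values printed in the session, and the three-element array is made up to show the failure case.

```scala
// Stand-in for leastSquaresResult.coefficients.toArray, using the printed values.
val coefficients = Array(-131.04232269750622, 1.1521875435418725)

// Bind both elements in a single statement.
val Array(a, b) = coefficients

// A length mismatch is not caught by the compiler; the binding throws a
// scala.MatchError at runtime, which Try captures here as a Failure.
val failed = scala.util.Try {
  val Array(p, q) = Array(1.0, 2.0, 3.0)
  p + q
}
```

If the number of coefficients is not known in advance, indexing into the array or matching with a case expression that handles other lengths is the safer choice.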
We can now add the best-fit line to our graph. We start by generating evenly-spaced dummy height values:
scala> val dummyHeights = linspace(min(data.heights), max(data.heights), 200)
dummyHeights: breeze.linalg.DenseVector[Double] = DenseVector(148.0, ...

scala> val fittedWeights = a :+ (b :* dummyHeights)
fittedWeights: breeze.linalg.DenseVector[Double] = DenseVector(39.4814...

scala> plt += plot(dummyHeights, fittedWeights, colorcode="red")
breeze.plot.Plot = breeze.plot.Plot@501ea274
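To make the two vector expressions concrete, here is a rough plain-Scala equivalent that uses ordinary collections. The evenlySpaced function is a hypothetical stand-in for linspace, assuming that both endpoints are included (the session output is consistent with this for the lower bound). The coefficients are copied from the fit, 148.0 is the first dummy height shown in the session, and the upper bound of 200.0 and the point count of 5 are made up to keep the example small.

```scala
// Hypothetical stand-in for linspace: n evenly spaced values from lo to hi,
// with both endpoints included.
def evenlySpaced(lo: Double, hi: Double, n: Int): Vector[Double] = {
  require(n >= 2, "need at least two points")
  val step = (hi - lo) / (n - 1)
  Vector.tabulate(n)(i => lo + i * step)
}

// Coefficients copied from the fit in the text.
val a = -131.04232269750622
val b = 1.1521875435418725

// 148.0 comes from the session output; 200.0 and 5 are illustrative values.
val heights = evenlySpaced(148.0, 200.0, 5)

// Element-wise version of a :+ (b :* dummyHeights).
val weights = heights.map(h => a + b * h)
```

The first fitted weight comes out at roughly 39.48, in line with the first element of fittedWeights in the session.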
Let's also add the equation for the best-fit line to the graph as an annotation. We will first generate the label:
scala> val label = f"weight = $a%.4f + $b%.4f * height"
label: String = weight = -131.0423 + 1.1522 * height
To add an annotation, we must access the underlying JFreeChart plot:
scala> import org.jfree.chart.annotations.XYTextAnnotation
import org.jfree.chart.annotations.XYTextAnnotation

scala> plt.plot.addAnnotation(new XYTextAnnotation(label, 175.0, 105.0))
The XYTextAnnotation constructor takes three parameters: the annotation string and a pair of (x, y) coordinates defining the centre of the annotation on the graph. The coordinates of the annotation are expressed in the coordinate system of the data. Thus, calling new XYTextAnnotation(label, 175.0, 105.0) generates an annotation whose centroid is at the point corresponding to a height of 175 cm and weight of 105 kg:
