
Customizing the line type

So far, we have just plotted lines using the default settings. Breeze lets us customize how lines are drawn, at least to some extent.

For this example, we will use the height-weight data discussed in Chapter 2, Manipulating Data with Breeze. We will use the Scala shell here for demonstration purposes, but you will find a program in BreezeDemo.scala that follows the example shell session.

The code examples for this chapter include a module, HWData.scala, that loads the data from the CSV files:

scala> val data = HWData.load
data: HWData = HWData [ 181 rows ]

scala> data.heights
breeze.linalg.DenseVector[Double] = DenseVector(182.0, ...

scala> data.weights
breeze.linalg.DenseVector[Double] = DenseVector(77.0, 58.0...

Let's create a scatter plot of the heights against the weights:

scala> val fig = Figure("height vs. weight")
fig: breeze.plot.Figure = breeze.plot.Figure@743f2558

scala> val plt = fig.subplot(0)
plt: breeze.plot.Plot = breeze.plot.Plot@501ea274

scala> plt += plot(data.heights, data.weights, '+', colorcode="black")
breeze.plot.Plot = breeze.plot.Plot@501ea274

This produces a scatter plot of the height-weight data.

Note that we passed a third argument to the plot method, '+'. This controls the plotting style. As of this writing, there are three available styles: '-' (the default), '+', and '.'. Experiment with these to see what they do. Finally, we pass a colorcode="black" argument to control the color of the line. This is either a color name or an RGB triple, written as a string. Thus, to plot red points, we could have passed colorcode="[255,0,0]".
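The RGB form of colorcode is just a string, so it can be built from integer channel values. The following is a minimal sketch; rgbCode is a hypothetical helper introduced here for illustration and is not part of Breeze.

```scala
object ColorCodes {
  // Hypothetical helper (not part of Breeze): formats an RGB triple in the
  // "[r,g,b]" string form that the text passes as the colorcode argument.
  def rgbCode(r: Int, g: Int, b: Int): String = {
    require(Seq(r, g, b).forall(c => c >= 0 && c <= 255), "each channel must be in 0..255")
    s"[$r,$g,$b]"
  }

  def main(args: Array[String]): Unit = {
    // Prints the red color code used in the text.
    println(rgbCode(255, 0, 0))
  }
}
```

The resulting string could then be passed in place of the literal, for example colorcode=ColorCodes.rgbCode(255, 0, 0).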

Looking at the height-weight plot, there is clearly a trend between height and weight. Let's try to fit a straight line through the data points. We will fit the following function:

$$\text{weight} = a + b \times \text{height}$$

Note

Scientific literature suggests that it would be better to fit a relation that is quadratic in height, something more like weight ∝ height². You should find it straightforward to fit a quadratic curve to the data, should you wish to.

We will use Breeze's least squares function to find the values of a and b. The leastSquares method expects an input matrix of features and a target vector, just like the LogisticRegression class that we defined in the previous chapter. Recall that in Chapter 2, Manipulating Data with Breeze, when we prepared the training set for logistic regression classification, we introduced a dummy feature that was one for every participant to provide the degree of freedom for the y intercept. We will use the same approach here. Our feature matrix, therefore, contains two columns—one that is 1 everywhere and one for the height:

scala> val features = DenseMatrix.horzcat(
 DenseMatrix.ones[Double](data.npoints, 1),
 data.heights.toDenseMatrix.t
)
features: breeze.linalg.DenseMatrix[Double] =
1.0 182.0
1.0 161.0
1.0 161.0
1.0 177.0
1.0 157.0
...

scala> import breeze.stats.regression._
import breeze.stats.regression._

scala> val leastSquaresResult = leastSquares(features, data.weights)
leastSquaresResult: breeze.stats.regression.LeastSquaresRegressionResult = <function1>
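To make the role of the intercept column concrete, here is a Breeze-free sketch that solves the same two-parameter problem (an intercept a and a slope b) with the textbook closed-form formulas for ordinary least squares. This is not a description of how Breeze implements leastSquares; the object name, the method name, and the sample points are assumptions made up for illustration.

```scala
object SimpleLeastSquares {
  // Fits y ≈ a + b * x by ordinary least squares, using the closed-form
  // solution for a single feature plus an intercept. Returns (a, b).
  def fit(xs: Seq[Double], ys: Seq[Double]): (Double, Double) = {
    require(xs.length == ys.length && xs.length >= 2, "need at least two paired points")
    val n = xs.length.toDouble
    val meanX = xs.sum / n
    val meanY = ys.sum / n
    // Sum of cross-deviations and sum of squared deviations of x.
    val sxy = xs.zip(ys).map { case (x, y) => (x - meanX) * (y - meanY) }.sum
    val sxx = xs.map(x => (x - meanX) * (x - meanX)).sum
    require(sxx > 0.0, "x values must not all be equal")
    val b = sxy / sxx          // slope
    val a = meanY - b * meanX  // intercept: the coefficient of the all-ones column
    (a, b)
  }

  def main(args: Array[String]): Unit = {
    // Made-up points lying exactly on y = 1 + 2x.
    val xs = Seq(0.0, 1.0, 2.0, 3.0)
    val ys = xs.map(x => 1.0 + 2.0 * x)
    val (a, b) = fit(xs, ys)
    println(s"a = $a, b = $b")
  }
}
```

Without the column of ones there would be no a term, and the fitted line would be forced through the origin.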

The leastSquares method returns an instance of LeastSquaresRegressionResult, whose coefficients attribute holds the coefficients that best fit the data:

scala> leastSquaresResult.coefficients
breeze.linalg.DenseVector[Double] = DenseVector(-131.042322, 1.1521875)

The best-fit line is therefore:

$$\text{weight} = -131.0423 + 1.1522 \times \text{height}$$

Let's extract the coefficients. An elegant way of doing this is to use Scala's pattern matching capabilities:

scala> val Array(a, b) = leastSquaresResult.coefficients.toArray
a: Double = -131.04232269750622
b: Double = 1.1521875435418725

By writing val Array(a, b) = ..., we are telling Scala that the right-hand side of the expression is a two-element array and to bind the first element of that array to the value a and the second to the value b. See Appendix, Pattern Matching and Extractors, for a discussion of pattern matching.
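One caveat of this pattern is that it fails at runtime with a MatchError if the array does not have exactly two elements. The sketch below shows the binding from the text next to a variant that returns an Option instead; the object and method names are hypothetical, and newer Scala versions may emit a warning for the refutable val pattern.

```scala
object ArrayPatterns {
  // Binds the two elements of a two-element array to names, as in the text.
  // Throws scala.MatchError if the array has any other length.
  def unpack(coefficients: Array[Double]): (Double, Double) = {
    val Array(a, b) = coefficients
    (a, b)
  }

  // A more defensive variant: None when the array is not exactly two elements long.
  def unpackOption(coefficients: Array[Double]): Option[(Double, Double)] =
    coefficients match {
      case Array(a, b) => Some((a, b))
      case _           => None
    }

  def main(args: Array[String]): Unit = {
    println(unpack(Array(-131.04, 1.15)))
    println(unpackOption(Array(1.0, 2.0, 3.0)))
  }
}
```

In an interactive session the direct form is usually fine; the Option variant is more at home in library code, where the length of the input is not guaranteed.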

We can now add the best-fit line to our graph. We start by generating evenly-spaced dummy height values:

scala> val dummyHeights = linspace(min(data.heights), max(data.heights), 200)
dummyHeights: breeze.linalg.DenseVector[Double] = DenseVector(148.0, ...

scala> val fittedWeights = a :+ (b :* dummyHeights)
fittedWeights: breeze.linalg.DenseVector[Double] = DenseVector(39.4814...

scala> plt += plot(dummyHeights, fittedWeights, colorcode="red")
breeze.plot.Plot = breeze.plot.Plot@501ea274
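For readers who want to see what these two steps compute, the following plain-Scala sketch mirrors them without Breeze: it builds an evenly spaced grid that includes both endpoints and evaluates the line a + b * x on it. The names evenlySpaced and line are hypothetical stand-ins, not Breeze functions.

```scala
object LineGrid {
  // Returns n evenly spaced values from lo to hi, with both endpoints included.
  def evenlySpaced(lo: Double, hi: Double, n: Int): Vector[Double] = {
    require(n >= 2, "need at least two points")
    val step = (hi - lo) / (n - 1)
    Vector.tabulate(n)(i => lo + i * step)
  }

  // Evaluates the straight line a + b * x at each value in xs.
  def line(a: Double, b: Double, xs: Seq[Double]): Seq[Double] =
    xs.map(x => a + b * x)

  def main(args: Array[String]): Unit = {
    // Coefficients copied from the shell session above.
    val a = -131.04232269750622
    val b = 1.1521875435418725
    val grid = evenlySpaced(148.0, 160.0, 4)
    println(grid)
    println(line(a, b, grid))
  }
}
```

Since a straight line is fully determined by two points, the 200 grid points used in the session are more than strictly necessary; a denser grid only matters when plotting curves.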

Let's also add the equation for the best-fit line to the graph as an annotation. We will first generate the label:

scala> val label = f"weight = $a%.4f + $b%.4f * height"
label: String = weight = -131.0423 + 1.1522 * height
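The f interpolator applies printf-style format specifiers, which on the JVM generally follow the default locale, so the decimal separator may differ between machines. As a minimal sketch, assuming a locale-independent label is wanted, the same string can be built with formatLocal and a pinned locale. The object and method names here are hypothetical.

```scala
import java.util.Locale

object FitLabel {
  // Builds the annotation text with four decimal places, pinning the locale
  // so that the decimal separator is always a dot.
  def label(a: Double, b: Double): String =
    "weight = %.4f + %.4f * height".formatLocal(Locale.ROOT, a, b)

  def main(args: Array[String]): Unit = {
    // Coefficients copied from the shell session above.
    println(label(-131.04232269750622, 1.1521875435418725))
  }
}
```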

To add an annotation, we must access the underlying JFreeChart plot:

scala> import org.jfree.chart.annotations.XYTextAnnotation
import org.jfree.chart.annotations.XYTextAnnotation

scala> plt.plot.addAnnotation(new XYTextAnnotation(label, 175.0, 105.0))

The XYTextAnnotation constructor takes three parameters: the annotation string and a pair of (x, y) coordinates defining the centre of the annotation on the graph. The coordinates of the annotation are expressed in the coordinate system of the data. Thus, calling new XYTextAnnotation(label, 175.0, 105.0) generates an annotation centred on the point corresponding to a height of 175 cm and a weight of 105 kg.
