Customizing the line type

So far, we have just plotted lines using the default settings. Breeze lets us customize how lines are drawn, at least to some extent.

For this example, we will use the height-weight data discussed in Chapter 2, Manipulating Data with Breeze. We will use the Scala shell here for demonstration, but BreezeDemo.scala contains a program that follows the same steps as the shell session.

The code examples for this chapter include a module, HWData.scala, that loads the data from the CSV files:

scala> val data = HWData.load
data: HWData = HWData [ 181 rows ]

scala> data.heights
breeze.linalg.DenseVector[Double] = DenseVector(182.0, ...

scala> data.weights
breeze.linalg.DenseVector[Double] = DenseVector(77.0, 58.0...

Let's create a scatter plot of the heights against the weights:

scala> val fig = Figure("height vs. weight")
fig: breeze.plot.Figure = breeze.plot.Figure@743f2558

scala> val plt = fig.subplot(0)
plt: breeze.plot.Plot = breeze.plot.Plot@501ea274

scala> plt += plot(data.heights, data.weights, '+', colorcode="black")
breeze.plot.Plot = breeze.plot.Plot@501ea274

This produces a scatter-plot of the height-weight data:

Note that we passed a third argument to the plot method, '+'. This controls the plotting style. As of this writing, there are three available styles: '-' (the default), '+', and '.'. Experiment with these to see what they do. We also passed colorcode="black" to control the color of the markers. The color code is either a color name or an RGB triple, written as a string. Thus, to plot red points, we could have passed colorcode="[255,0,0]".
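To compare the three styles side by side, a minimal sketch along the following lines may help. It uses made-up data (the xs and ys vectors are purely illustrative and are not part of the height-weight dataset) and draws the same curve three times, shifted vertically, once per style:

```scala
import breeze.linalg._
import breeze.plot._

// Illustrative data only: a parabola sampled at 20 points.
val xs = linspace(0.0, 1.0, 20)
val ys = xs.map(x => x * x)

val styleFig = Figure("plot styles")
val stylePlt = styleFig.subplot(0)

// One series per style, offset vertically so that they do not overlap.
stylePlt += plot(xs, ys, '-', colorcode = "black")
stylePlt += plot(xs, ys + 0.2, '+', colorcode = "[255,0,0]") // RGB triple as a string
stylePlt += plot(xs, ys + 0.4, '.', colorcode = "red")       // color name
```

The names styleFig and stylePlt are chosen here to avoid clashing with the fig and plt values used in the shell session.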

Looking at the height-weight plot, there is clearly a trend between height and weight. Let's try to fit a straight line through the data points. We will fit the following function:

weight (kg) = a + b × height (cm)

Note

Scientific literature suggests that it would be better to fit something closer to a quadratic relationship, with weight proportional to the square of the height. You should find it straightforward to fit a quadratic line to the data, should you wish to.

We will use Breeze's least squares function to find the values of a and b. The leastSquares method expects an input matrix of features and a target vector, just like the LogisticRegression class that we defined in the previous chapter. Recall that in Chapter 2, Manipulating Data with Breeze, when we prepared the training set for logistic regression classification, we introduced a dummy feature that was one for every participant to provide the degree of freedom for the y intercept. We will use the same approach here. Our feature matrix, therefore, contains two columns—one that is 1 everywhere and one for the height:

scala> val features = DenseMatrix.horzcat(
 DenseMatrix.ones[Double](data.npoints, 1),
 data.heights.toDenseMatrix.t
)
features: breeze.linalg.DenseMatrix[Double] =
1.0 182.0
1.0 161.0
1.0 161.0
1.0 177.0
1.0 157.0
...

scala> import breeze.stats.regression._
import breeze.stats.regression._

scala> val leastSquaresResult = leastSquares(features, data.weights)
leastSquaresResult: breeze.stats.regression.LeastSquaresRegressionResult = <function1>

The leastSquares method returns an instance of LeastSquaresRegressionResult, which has a coefficients attribute holding the coefficients that best fit the data:

scala> leastSquaresResult.coefficients
breeze.linalg.DenseVector[Double] = DenseVector(-131.042322, 1.1521875)
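As a sanity check on what these coefficients mean, we can compare leastSquares against the textbook normal equations, (XᵀX) β = Xᵀy, on a small synthetic dataset. The sketch below is an illustration only: the toyHeights and toyWeights vectors are invented, and solving the normal equations directly is not necessarily how Breeze computes the result internally.

```scala
import breeze.linalg._
import breeze.stats.regression.leastSquares

// Invented data lying exactly on the line weight = -70 + 0.8 * height.
val toyHeights = DenseVector(150.0, 160.0, 170.0, 180.0, 190.0)
val toyWeights = DenseVector(50.0, 58.0, 66.0, 74.0, 82.0)

// Same layout as in the text: a column of ones, then the heights.
val toyFeatures = DenseMatrix.horzcat(
  DenseMatrix.ones[Double](toyHeights.length, 1),
  toyHeights.toDenseMatrix.t
)

// Coefficients from Breeze's least squares routine.
val viaLeastSquares = leastSquares(toyFeatures, toyWeights).coefficients

// Coefficients from solving the normal equations with the \ operator.
val viaNormalEquations =
  (toyFeatures.t * toyFeatures) \ (toyFeatures.t * toyWeights)

// The two estimates should agree up to floating-point error.
val discrepancy = norm(viaLeastSquares - viaNormalEquations)
```

Both vectors should be close to (-70.0, 0.8), since the synthetic points were generated from that line without noise.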

The best-fit line is therefore:

weight (kg) = -131.04 + 1.1522 × height (cm)

Let's extract the coefficients. An elegant way of doing this is to use Scala's pattern matching capabilities:

scala> val Array(a, b) = leastSquaresResult.coefficients.toArray
a: Double = -131.04232269750622
b: Double = 1.1521875435418725

By writing val Array(a, b) = ..., we are telling Scala that the right-hand side of the expression is a two-element array and to bind the first element of that array to the value a and the second to the value b. See Appendix, Pattern Matching and Extractors, for a discussion of pattern matching.
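One caveat is worth illustrating: the pattern only matches an array with exactly two elements, and a mismatch raises a scala.MatchError at runtime. The following sketch, which uses made-up values and the hypothetical names intercept, slope, and safeCoefficients, shows both the direct binding and a more defensive variant based on a match expression:

```scala
// Direct binding: works because the array has exactly two elements.
val Array(intercept, slope) = Array(-131.04, 1.15)

// Defensive variant: return None instead of throwing on a size mismatch.
def safeCoefficients(values: Array[Double]): Option[(Double, Double)] =
  values match {
    case Array(first, second) => Some((first, second))
    case _                    => None
  }

val ok = safeCoefficients(Array(-131.04, 1.15))     // two elements: matches
val tooMany = safeCoefficients(Array(1.0, 2.0, 3.0)) // three elements: no match
```

For a straight-line fit, we know that there are exactly two coefficients, so the direct binding used in the shell session is reasonable.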

We can now add the best-fit line to our graph. We start by generating evenly-spaced dummy height values:

scala> val dummyHeights = linspace(min(data.heights), max(data.heights), 200)
dummyHeights: breeze.linalg.DenseVector[Double] = DenseVector(148.0, ...

scala> val fittedWeights = a :+ (b :* dummyHeights)
fittedWeights: breeze.linalg.DenseVector[Double] = DenseVector(39.4814...

scala> plt += plot(dummyHeights, fittedWeights, colorcode="red")
breeze.plot.Plot = breeze.plot.Plot@501ea274

Let's also add the equation for the best-fit line to the graph as an annotation. We will first generate the label:

scala> val label = f"weight = $a%.4f + $b%.4f * height"
label: String = weight = -131.0423 + 1.1522 * height
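The label above reads well because b happens to be positive. If the slope were negative, the same format string would produce something like "+ -0.5000". A possible workaround, sketched here with a hypothetical helper called lineLabel, is to choose the sign separately and format the absolute value:

```scala
// Hypothetical helper: builds the label with a readable sign for the slope.
def lineLabel(a: Double, b: Double): String = {
  val sign = if (b < 0) "-" else "+"
  f"weight = $a%.4f $sign ${math.abs(b)}%.4f * height"
}

// With the coefficients from the fit, this reproduces the label in the text
// (assuming a default locale that uses '.' as the decimal separator).
val fittedLabel = lineLabel(-131.04232269750622, 1.1521875435418725)

// With a made-up negative slope, the sign is folded into the operator.
val negativeSlopeLabel = lineLabel(10.0, -0.5)
```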

To add an annotation, we must access the underlying JFreeChart plot:

scala> import org.jfree.chart.annotations.XYTextAnnotation
import org.jfree.chart.annotations.XYTextAnnotation

scala> plt.plot.addAnnotation(new XYTextAnnotation(label, 175.0, 105.0))

The XYTextAnnotation constructor takes three parameters: the annotation string and a pair of (x, y) coordinates defining the centre of the annotation on the graph. The coordinates of the annotation are expressed in the coordinate system of the data. Thus, calling new XYTextAnnotation(label, 175.0, 105.0) generates an annotation whose centroid is at the point corresponding to a height of 175 cm and weight of 105 kg:
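Since the annotation is an ordinary JFreeChart object, we can also adjust its appearance before adding it to the plot. The sketch below is one way to do this: it sets the font and the color through the setFont and setPaint methods of XYTextAnnotation. The label text, font, and color are arbitrary choices made for illustration.

```scala
import java.awt.{Color, Font}
import org.jfree.chart.annotations.XYTextAnnotation

// Build the annotation at the data coordinates (175 cm, 105 kg).
val annotation = new XYTextAnnotation("weight = a + b * height", 175.0, 105.0)

// Arbitrary styling choices, for illustration only.
annotation.setFont(new Font("SansSerif", Font.PLAIN, 12))
annotation.setPaint(Color.RED)

// In the shell session from the text, we would then attach it with:
// plt.plot.addAnnotation(annotation)
```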