官术网_书友最值得收藏!

The regression

And so, now we're ready to build the regression model. Bear in mind that this section is highly iterative in real life. I will describe the iterations, but will only share the model that I chose to settle on.

The github.com/sajari/regression package does an admirable job. But we want to extend the package a little to be able to compare models and the coefficients of the parameters. So I wrote this function:

func runRegression(Xs [][]float64, Ys []float64, hdr []string) (r *regression.Regression, stdErr []float64) {
r = new(regression.Regression)
dp := make(regression.DataPoints, 0, len(Xs))
for i, h := range hdr {
r.SetVar(i, h)
}
for i, row := range Xs {
if i < 3 {
log.Printf("Y %v Row %v", Ys[i], row)
}
dp = append(dp, regression.DataPoint(Ys[i], row))
}
r.Train(dp...)
r.Run()

// calculate StdErr
var sseY float64
sseX := make([]float64, len(hdr)+1)
meanX := make([]float64, len(hdr)+1)
for i, row := range Xs {
pred, _ := r.Predict(row)
sseY += (Ys[i] - pred) * (Ys[i] - pred)
for j, c := range row {
meanX[j+1] += c
}
}
sseY /= float64(len(Xs) - len(hdr) - 1) // n - df ; df = len(hdr) + 1
vecf64.ScaleInv(meanX, float64(len(Xs)))
sseX[0] = 1
for _, row := range Xs {
for j, c := range row {
sseX[j+1] += (c - meanX[j+1]) * (c - meanX[j+1])
}
}
sseY = math.Sqrt(sseY)
vecf64.Sqrt(sseX)
vecf64.ScaleInvR(sseX, sseY)

return r, sseX
}

runRegression will perform the regression analysis, and print the outputs of the standard errors of the coefficients. It is an estimate of the standard deviation of the coefficients—imagine this model being run many many times: each time the coefficients might be slightly different. The standard error simply reports amount of variation in the coefficients.

The standard errors are calculated with the help of the gorgonia.org/vecf64 package, which performs in-place operations for vectors. Optionally, you may choose to write them as loops.

This function also introduces us to the API for the github.com/sajari/regression package—to predict, simply use r.Predict(vars). This will be useful in cases where one would like to use this model for production.

For now, let us focus on the other half of the main function:

  // do the regessions
r, stdErr := runRegression(it, YsBack, newHdr)
tdist := distuv.StudentsT{Mu: 0, Sigma: 1, Nu: float64(len(it) - len(newHdr) - 1), Src: rand.New(rand.NewSource(uint64(time.Now().UnixNano())))}
fmt.Printf("R^2: %1.3f\n", r.R2)
fmt.Printf("\tVariable \tCoefficient \tStdErr \tt-stat\tp-value\n")
fmt.Printf("\tIntercept: \t%1.5f \t%1.5f \t%1.5f \t%1.5f\n", r.Coeff(0), stdErr[0], r.Coeff(0)/stdErr[0], tdist.Prob(r.Coeff(0)/stdErr[0]))
for i, h := range newHdr {
b := r.Coeff(i + 1)
e := stdErr[i+1]
t := b / e
p := tdist.Prob(t)
fmt.Printf("\t%v: \t%1.5f \t%1.5f \t%1.5f \t%1.5f\n", h, b, e, t, p)
}
...

Here, we run the regression, and then we print the results. We don't just want to output the regression coefficients. We also want to output the standard errors, the t-statistic, and the P-value. This would give us some confidence over the estimated coefficients.

tdist := distuv.StudentsT{Mu: 0, Sigma: 1, Nu: float64(len(it) - len(newHdr) - 1), Src: rand.New(rand.NewSource(uint64(time.Now().UnixNano())))} creates a Student's t-distribution, which we will compare against our data. The t-statistic is very simply calculated by dividing the coefficient by the standard error.

主站蜘蛛池模板: 岳池县| 汉源县| 修武县| 武功县| 精河县| 临汾市| 彰化市| 龙山县| 安国市| 石屏县| 双流县| 泰宁县| 密山市| 昌都县| 禹州市| 南华县| 嘉荫县| 双流县| 天镇县| 博白县| 金堂县| 南岸区| 宁城县| 廉江市| 杨浦区| 会理县| 灵石县| 句容市| 海丰县| 南江县| 维西| 射阳县| 金乡县| 聂荣县| 靖边县| 化隆| 永寿县| 广东省| 巢湖市| 长泰县| 来凤县|