
Discussion and further work

This model is now ready to be used for prediction. Is it the best model? No, it is not. Finding the best model is a never-ending quest, and there are countless ways of improving this one. For example, one can use LASSO methods to determine the importance of variables before using them.
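To make the LASSO suggestion concrete, here is a minimal sketch of LASSO regression fitted by cyclic coordinate descent. It is not code from this chapter: the `Lasso` and `softThreshold` names, the absence of an intercept, the fixed iteration count, and the toy data are all assumptions made for illustration. It also assumes the features are on comparable scales, which real data would need to be standardized to achieve.

```go
package main

import "fmt"

// softThreshold shrinks a toward zero by lambda and returns exactly 0
// when |a| <= lambda. This is what lets LASSO drop variables entirely.
func softThreshold(a, lambda float64) float64 {
	switch {
	case a > lambda:
		return a - lambda
	case a < -lambda:
		return a + lambda
	default:
		return 0
	}
}

// Lasso (hypothetical helper) minimizes
//   (1/2n) * ||y - Xb||^2 + lambda * ||b||_1
// by cyclic coordinate descent, without an intercept term.
func Lasso(xs [][]float64, ys []float64, lambda float64, iters int) []float64 {
	if len(xs) == 0 {
		return nil
	}
	n := float64(len(ys))
	p := len(xs[0])
	b := make([]float64, p)
	for it := 0; it < iters; it++ {
		for j := 0; j < p; j++ {
			var rho, z float64
			for i, row := range xs {
				// Prediction using every feature except j.
				pred := 0.0
				for k, v := range row {
					if k != j {
						pred += v * b[k]
					}
				}
				rho += row[j] * (ys[i] - pred)
				z += row[j] * row[j]
			}
			if z == 0 {
				b[j] = 0 // an all-zero column carries no information
				continue
			}
			b[j] = softThreshold(rho/n, lambda) / (z / n)
		}
	}
	return b
}

func main() {
	// Toy data, made up for illustration: y depends only on the first column.
	xs := [][]float64{{1, 1}, {2, -1}, {3, 1}, {4, -1}}
	ys := []float64{2, 4, 6, 8}
	coef := Lasso(xs, ys, 0.5, 100)
	for j, c := range coef {
		fmt.Printf("feature %d: coefficient %.4f, keep=%v\n", j, c, c != 0)
	}
}
```

Variables whose coefficients are driven to exactly zero are candidates for removal before the ordinary linear regression is fitted. The choice of `lambda` controls how aggressive that pruning is, and would normally be tuned rather than fixed.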

The model is not just the linear regression; it also includes the data cleaning and ingestion functions that come with it. This leads to a very high number of tweakable parameters. If you didn't like the way I imputed data, you can always write your own method!
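One way to make imputation swappable is to treat it as a function value. The sketch below is an assumption about how that could look, not the chapter's actual code: the `Imputer` type, the `Missing` marker (NaN), and the `MeanImpute` and `ImputeAll` helpers are hypothetical names, and mean imputation is used only because it is simple to show.

```go
package main

import (
	"fmt"
	"math"
)

// Missing is the (assumed) marker for a missing value in a column.
var Missing = math.NaN()

// Imputer is a hypothetical function type: it takes a column that may
// contain Missing values and returns a filled-in copy.
type Imputer func(col []float64) []float64

// MeanImpute replaces every missing value with the mean of the observed
// values. The input slice is not modified.
func MeanImpute(col []float64) []float64 {
	var sum float64
	var n int
	for _, v := range col {
		if !math.IsNaN(v) {
			sum += v
			n++
		}
	}
	out := make([]float64, len(col))
	copy(out, col)
	if n == 0 {
		return out // nothing observed, so there is no mean to fill with
	}
	mean := sum / float64(n)
	for i, v := range out {
		if math.IsNaN(v) {
			out[i] = mean
		}
	}
	return out
}

// ImputeAll applies the chosen Imputer to every column.
func ImputeAll(cols [][]float64, imp Imputer) [][]float64 {
	out := make([][]float64, len(cols))
	for i, c := range cols {
		out[i] = imp(c)
	}
	return out
}

func main() {
	cols := [][]float64{
		{1, Missing, 3},
		{10, 20, Missing},
	}
	// Swap MeanImpute for any other function with the same signature.
	fmt.Println(ImputeAll(cols, MeanImpute))
}
```

Because the cleaning step only depends on the `Imputer` signature, a different strategy (a median, a constant, a value predicted from other columns) can be dropped in without touching the rest of the pipeline.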

Furthermore, the code in this chapter can be cleaned up further. Instead of returning so many values from the clean function, a new type can be created to hold the Xs and Ys together, a data frame of sorts. In fact, that's what we're going to build in the upcoming chapters. Several functions can also be made more efficient by using a state-holder struct.
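As a rough idea of what such a type might look like, the sketch below bundles the column names, the Xs, and the Ys into one struct and checks their shapes on construction. The `Dataset` name, its fields, and the sample column names are hypothetical; the structure built in the upcoming chapters may well differ.

```go
package main

import (
	"errors"
	"fmt"
)

// Dataset is a hypothetical container for cleaned data: a data frame of sorts.
type Dataset struct {
	Cols []string    // column names, one per feature
	Xs   [][]float64 // one row per observation
	Ys   []float64   // one target value per observation
}

// NewDataset checks that the shapes agree before bundling the values, so
// callers get one value and one error instead of a long list of returns.
func NewDataset(cols []string, xs [][]float64, ys []float64) (*Dataset, error) {
	if len(xs) != len(ys) {
		return nil, errors.New("dataset: Xs and Ys have different numbers of rows")
	}
	for i, row := range xs {
		if len(row) != len(cols) {
			return nil, fmt.Errorf("dataset: row %d has %d values, want %d", i, len(row), len(cols))
		}
	}
	return &Dataset{Cols: cols, Xs: xs, Ys: ys}, nil
}

// Rows returns the number of observations.
func (d *Dataset) Rows() int { return len(d.Ys) }

func main() {
	// Column names and values are made up for illustration.
	ds, err := NewDataset(
		[]string{"area", "rooms"},
		[][]float64{{120, 3}, {85, 2}},
		[]float64{250000, 180000},
	)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(ds.Rows(), "rows,", len(ds.Cols), "columns")
}
```

A clean function written against this type would return `(*Dataset, error)`, which keeps its signature stable even if more derived fields are added to the struct later.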

You may have noticed that there are not many statistical packages like Pandas for Go. This is not for lack of trying. Go as a language is all about solving problems, not about building generic packages. There are dataframe-like packages in Go, but in my experience, using them tends to blind one to the most obvious and efficient solutions. Often, it's better to build your own data structures that are specific to the problem at hand.

For the most part, model building is an iterative process, while productionizing the model happens only after the model has been built. This chapter shows that in Go, with a little awkwardness, it is possible to build a model using an iterative process that immediately translates to a production-ready system.
