
Discussion and further work

This model is now ready to be used for prediction. Is it the best model? No, it's not. Finding the best model is a never-ending quest, and there are countless ways to improve this one. For example, one can use LASSO methods to determine the importance of variables before using them.
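To make the LASSO idea concrete, here is a minimal sketch of LASSO fitted by cyclic coordinate descent. It is not code from this chapter: the names `lasso` and `softThreshold`, the plain `[][]float64` layout, and the toy data are all assumptions made for illustration. The point is only that the L1 penalty drives the coefficients of unhelpful variables to exactly zero, which is what makes it usable for judging variable importance.

```go
package main

import "fmt"

// softThreshold shrinks z toward zero by gamma and clips it to zero
// when |z| <= gamma. This is what produces exact zeros in LASSO.
func softThreshold(z, gamma float64) float64 {
	switch {
	case z > gamma:
		return z - gamma
	case z < -gamma:
		return z + gamma
	default:
		return 0
	}
}

// lasso (hypothetical helper) minimises
//   (1/2n) * ||y - Xb||^2 + lambda * ||b||_1
// by cyclic coordinate descent. X is row-major: X[i][j] is feature j of row i.
func lasso(X [][]float64, y []float64, lambda float64, iters int) []float64 {
	n, p := len(X), len(X[0])
	b := make([]float64, p)
	// resid holds y - Xb. With b = 0 it starts out as a copy of y.
	resid := append([]float64(nil), y...)
	for it := 0; it < iters; it++ {
		for j := 0; j < p; j++ {
			var rho, z float64
			for i := 0; i < n; i++ {
				xij := X[i][j]
				// Correlate feature j with the residual that excludes feature j.
				rho += xij * (resid[i] + xij*b[j])
				z += xij * xij
			}
			if z == 0 {
				continue // an all-zero column carries no information
			}
			newB := softThreshold(rho/float64(n), lambda) / (z / float64(n))
			delta := newB - b[j]
			for i := 0; i < n; i++ {
				resid[i] -= X[i][j] * delta
			}
			b[j] = newB
		}
	}
	return b
}

func main() {
	// Toy data: y depends only on the first column.
	X := [][]float64{{1, 1}, {-1, 1}, {1, -1}, {-1, -1}}
	y := []float64{2, -2, 2, -2}
	fmt.Println(lasso(X, y, 0.5, 100)) // prints [1.5 0]
}
```

The second coefficient comes out as exactly zero, so that variable would be a candidate for dropping. Note that the surviving coefficient is shrunk (1.5 rather than 2); a common follow-up is to refit an ordinary linear regression on the selected variables only.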

The model is not only the linear regression, but also the data cleaning and ingestion functions that come with it. This leads to a very high number of tweakable parameters. If you don't like the way I imputed data, you can always write your own method!

Furthermore, the code in this chapter can be cleaned up. Instead of returning so many values from the clean function, a new tuple type can be created to hold the Xs and Ys: a data frame of sorts. In fact, that's what we're going to build in the upcoming chapters. Several functions can also be made more efficient by using a state-holder struct.
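As a rough idea of what such a type might look like, here is a minimal sketch. It is not the data structure built in the upcoming chapters: the `Dataset` name, its fields, and the `NewDataset` and `Col` helpers are hypothetical, chosen only to show how one value can replace a long list of return values.

```go
package main

import (
	"errors"
	"fmt"
)

// Dataset (hypothetical type) bundles what a cleaning function would
// otherwise return as separate values.
type Dataset struct {
	Header []string    // column names for Xs
	Xs     [][]float64 // one row per observation
	Ys     []float64   // target value for each row
}

// NewDataset checks that the pieces are consistent before bundling them.
func NewDataset(header []string, xs [][]float64, ys []float64) (*Dataset, error) {
	if len(xs) != len(ys) {
		return nil, errors.New("dataset: Xs and Ys have different lengths")
	}
	for i, row := range xs {
		if len(row) != len(header) {
			return nil, fmt.Errorf("dataset: row %d has %d columns, want %d", i, len(row), len(header))
		}
	}
	return &Dataset{Header: header, Xs: xs, Ys: ys}, nil
}

// Rows reports the number of observations.
func (d *Dataset) Rows() int { return len(d.Ys) }

// Col returns a copy of the named column.
func (d *Dataset) Col(name string) ([]float64, error) {
	for j, h := range d.Header {
		if h == name {
			col := make([]float64, len(d.Xs))
			for i, row := range d.Xs {
				col[i] = row[j]
			}
			return col, nil
		}
	}
	return nil, fmt.Errorf("dataset: no column named %q", name)
}

func main() {
	// Made-up values, purely for illustration.
	ds, err := NewDataset(
		[]string{"area", "rooms"},
		[][]float64{{120, 3}, {80, 2}},
		[]float64{250000, 180000},
	)
	if err != nil {
		fmt.Println(err)
		return
	}
	rooms, _ := ds.Col("rooms")
	fmt.Println(ds.Rows(), rooms) // prints 2 [3 2]
}
```

A cleaning function can then return `(*Dataset, error)` instead of a long tuple, and the length checks live in one place rather than at every call site.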

You may have noticed that there are not many statistical packages like Pandas for Go. This is not for lack of trying. Go as a language is all about solving problems, not about building generic packages. There are definitely dataframe-like packages in Go, but in my experience, using them tends to blind one to the most obvious and efficient solutions. Often, it's better to build your own data structures that are specific to the problem at hand.

For the most part, model building is an iterative process, while productionizing the model happens after the model has been built. This chapter shows that in Go, with a little awkwardness, it is possible to build a model using an iterative process that immediately translates to a production-ready system.
