
Summary

In this chapter, we used several of scikit-learn's methods to build a standard workflow for running and evaluating data mining models. We introduced the Nearest Neighbors algorithm, which scikit-learn implements as an estimator. Using this class is straightforward: first, we call its fit function on the training data, and then we call its predict function to predict the classes of the testing samples.
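A minimal sketch of that fit/predict workflow follows. It assumes scikit-learn's bundled Iris dataset as a stand-in for the chapter's data and uses KNeighborsClassifier with its default settings; the random seed and the default 75/25 train/test split are arbitrary choices for illustration.

```python
import numpy as np
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: the bundled Iris dataset stands in for the chapter's data.
X, y = load_iris(return_X_y=True)

# Hold out 25% of the samples (the default) for testing; fix the seed for repeatability.
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=14)

# Step 1: fit the estimator on the training data.
estimator = KNeighborsClassifier()
estimator.fit(X_train, y_train)

# Step 2: predict the class of each testing sample.
y_predicted = estimator.predict(X_test)

# Accuracy is the fraction of test samples whose predicted class is correct.
accuracy = np.mean(y_test == y_predicted)
print(f"Accuracy: {accuracy:.1%}")
```

Every scikit-learn estimator shares this two-step interface, so swapping in a different classifier only changes the line that constructs it.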

We then looked at preprocessing, using it to fix poor feature scaling. This was done with the MinMaxScaler class, which is a Transformer. Transformers also have a fit method, followed by a transform method that takes a dataset as input and returns a transformed version of it as output.
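The sketch below illustrates the fit/transform split on a tiny hypothetical array whose two features sit on very different scales. MinMaxScaler learns each feature's minimum and maximum during fit, then rescales with (x - min) / (max - min) during transform.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: two features on very different scales.
X_train = np.array([[1.0, 100.0],
                    [2.0, 300.0],
                    [3.0, 500.0]])
X_test = np.array([[2.5, 200.0]])

scaler = MinMaxScaler()
# fit learns each feature's minimum and maximum from the training data only.
scaler.fit(X_train)

# transform applies (x - min) / (max - min) using the values learned in fit.
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)

print(X_train_scaled)  # each column now runs from 0 to 1
print(X_test_scaled)   # [[0.75 0.25]]
```

Fitting on the training data and reusing those statistics for the test data keeps test information out of the preprocessing step. Transformers also offer fit_transform, which performs both steps on the same data in one call.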

To investigate these transformations further, try swapping the MinMaxScaler for some of the other transformers mentioned. Which is the most effective, and why?
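One way to run this experiment is sketched below. It assumes scikit-learn's bundled Wine dataset, whose features span very different ranges, and compares our own choice of transformers (StandardScaler and Normalizer alongside MinMaxScaler), which may not match the ones the chapter lists. Each transformer is placed in a pipeline with the classifier so that cross-validation refits it on every training fold.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler

# Assumption: the bundled Wine dataset, whose features span very different ranges.
X, y = load_wine(return_X_y=True)

candidates = {
    "no scaling": None,
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler(),
    "Normalizer": Normalizer(),
}

scores = {}
for name, transformer in candidates.items():
    if transformer is None:
        model = KNeighborsClassifier()
    else:
        # The pipeline refits the transformer on each training fold,
        # so no information leaks from the validation fold.
        model = make_pipeline(transformer, KNeighborsClassifier())
    # Mean accuracy over 5-fold cross-validation.
    scores[name] = cross_val_score(model, X, y, cv=5).mean()

for name, score in scores.items():
    print(f"{name:>15}: {score:.3f}")
```

Note that Normalizer works differently from the other two: it rescales each sample (row) to unit length rather than each feature (column), which is one clue when reasoning about why results differ.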

scikit-learn also provides other transformers, such as PCA, which we will use later in this book. Try some of these out as well, referring to scikit-learn's excellent documentation at https://scikit-learn.org/stable/modules/preprocessing.html
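PCA lives in sklearn.decomposition rather than sklearn.preprocessing, but it follows the same fit/transform interface, so it slots into a pipeline like any other transformer. The sketch below again assumes the bundled Wine dataset; keeping two components is an arbitrary illustrative choice.

```python
from sklearn.datasets import load_wine
from sklearn.decomposition import PCA
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler

X, y = load_wine(return_X_y=True)

# Standardize, project onto the first 2 principal components, then classify.
# n_components=2 is an arbitrary illustrative choice.
model = make_pipeline(StandardScaler(), PCA(n_components=2), KNeighborsClassifier())
scores = cross_val_score(model, X, y, cv=5)
print(f"Mean accuracy with 2 components: {scores.mean():.3f}")

# Used on its own, PCA behaves like any transformer: fit, then transform.
X_reduced = PCA(n_components=2).fit_transform(StandardScaler().fit_transform(X))
print(X_reduced.shape)  # 13 features reduced to 2: (178, 2)
```

Standardizing before PCA matters here because PCA picks directions of maximum variance, and unscaled features with large ranges would otherwise dominate the components.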

In the next chapter, we will use these concepts in a larger example, predicting the outcome of sports matches using real-world data.
