
Summary

In this chapter, we used several of scikit-learn's methods to build a standard workflow for running and evaluating data mining models. We introduced the Nearest Neighbors algorithm, which is implemented in scikit-learn as an estimator. Using this class is quite easy: first, we call the fit method on our training data, and second, we use the predict method to predict the class of testing samples.
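As a reminder of the fit-then-predict pattern, here is a minimal sketch. It assumes the Iris dataset bundled with scikit-learn and an arbitrary random_state, neither of which is necessarily what the chapter used, and it relies on the default settings of KNeighborsClassifier.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Assumption: Iris stands in for the chapter's dataset.
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=14)

# Step 1: fit the estimator on the training data.
estimator = KNeighborsClassifier()
estimator.fit(X_train, y_train)

# Step 2: predict the class of each testing sample.
y_predicted = estimator.predict(X_test)

# Fraction of testing samples whose class was predicted correctly.
accuracy = (y_predicted == y_test).mean()
print(f"Accuracy: {accuracy:.1%}")
```

The same two calls apply to any scikit-learn estimator, so swapping in a different classifier only changes the line that constructs it.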

We then looked at preprocessing as a way to fix poor feature scaling. This was done using a transformer object, the MinMaxScaler class. Transformers also have a fit method, followed by a transform method, which takes data of one form as input and returns a transformed dataset as output.
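The fit and transform steps can be sketched on a tiny, made-up array whose two features sit on very different scales. The data here is purely illustrative and is not taken from the chapter.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Hypothetical data: the second feature is far larger than the first.
X = np.array([[1.0, 100.0],
              [2.0, 300.0],
              [3.0, 500.0]])

scaler = MinMaxScaler()
scaler.fit(X)                    # learn each feature's minimum and maximum
X_scaled = scaler.transform(X)   # rescale each feature to the range 0 to 1

print(X_scaled)
```

After the transform, both columns run from 0 to 1, so neither feature dominates a distance calculation simply because of its units.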

To investigate these transformations further, try swapping out the MinMaxScaler with some of the other mentioned transformers. Which is the most effective and why would this be the case?
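One way to set up that comparison is sketched below. It rests on several assumptions: the Wine dataset bundled with scikit-learn stands in for your own data, StandardScaler and Normalizer are taken as the alternative transformers, and each scaler is chained to the classifier with a Pipeline and scored by cross-validation. Treat it as a starting harness rather than the chapter's own code.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler

# Assumption: Wine is used because its features have very different ranges.
X, y = load_wine(return_X_y=True)

# Candidate transformers to swap in; the choice of these three is an assumption.
scalers = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler(),
    "Normalizer": Normalizer(),
}

results = {}
for name, scaler in scalers.items():
    # Chain the transformer and the estimator so the scaler is fitted
    # only on the training portion of each cross-validation fold.
    pipeline = Pipeline([
        ("scale", scaler),
        ("predict", KNeighborsClassifier()),
    ])
    scores = cross_val_score(pipeline, X, y, scoring="accuracy")
    results[name] = scores.mean()

for name, mean_accuracy in results.items():
    print(f"{name}: {mean_accuracy:.1%}")
```

Comparing the printed accuracies gives a first answer to which transformer works best on a given dataset; explaining why is left as the exercise.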

scikit-learn also provides other transformers, such as PCA, which we will use later in this book. Try some of these out as well, referencing scikit-learn's excellent documentation at https://scikit-learn.org/stable/modules/preprocessing.html

In the next chapter, we will use these concepts in a larger example, predicting the outcome of sports matches using real-world data.
