Summary

In this chapter, we used several of scikit-learn's methods to build a standard workflow for running and evaluating data mining models. We introduced the Nearest Neighbors algorithm, which scikit-learn implements as an estimator. Using this class is straightforward: first, we call the fit method on our training data; second, we call the predict method to predict the class of the testing samples.
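As a reminder of the fit-then-predict pattern, here is a minimal sketch. It assumes the Iris dataset bundled with scikit-learn and a default KNeighborsClassifier; the dataset choice and the random_state value are illustrative assumptions, not necessarily what was used earlier in the chapter.

```python
from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Illustrative dataset (an assumption): Iris ships with scikit-learn
X, y = load_iris(return_X_y=True)
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=14)

# Step 1: fit the estimator on the training data
estimator = KNeighborsClassifier()
estimator.fit(X_train, y_train)

# Step 2: predict the class of the unseen testing samples
y_predicted = estimator.predict(X_test)

# Fraction of testing samples classified correctly
accuracy = (y_predicted == y_test).mean()
print(f"Accuracy: {accuracy:.1%}")
```

Evaluating on held-out testing samples, rather than on the training data, is what keeps the accuracy estimate honest.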

We then looked at preprocessing to fix poor feature scaling. This was done using a transformer object, the MinMaxScaler class. Transformers also have a fit method, along with a transform method that takes data in one form as input and returns a transformed dataset as output.
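The fit and transform steps can be sketched on a tiny, made-up array whose two columns sit on very different scales. The values below are invented purely for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler

# Made-up data: the second feature is thousands of times larger than the first
X = np.array([[1.0, 1000.0],
              [2.0, 3000.0],
              [3.0, 5000.0]])

scaler = MinMaxScaler()
scaler.fit(X)                    # learn each column's minimum and maximum
X_scaled = scaler.transform(X)   # rescale each column to the range [0, 1]

print(X_scaled)
```

After the transformation, both columns span 0 to 1, so neither feature dominates a distance calculation simply because of its units.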

To investigate these transformations further, try replacing MinMaxScaler with some of the other transformers mentioned. Which is the most effective, and why might that be the case?
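One possible way to set up this experiment is sketched below. It assumes the Wine dataset bundled with scikit-learn (chosen here only because its features have very different ranges) and three candidate transformers: MinMaxScaler, StandardScaler, and Normalizer. The step names "scale" and "predict" are arbitrary labels.

```python
from sklearn.datasets import load_wine
from sklearn.model_selection import cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import MinMaxScaler, Normalizer, StandardScaler

# Illustrative dataset (an assumption): features on very different scales
X, y = load_wine(return_X_y=True)

# Candidate transformers to swap into the same workflow
candidates = {
    "MinMaxScaler": MinMaxScaler(),
    "StandardScaler": StandardScaler(),
    "Normalizer": Normalizer(),
}

scores = {}
for name, transformer in candidates.items():
    # The pipeline fits the transformer on each training fold only
    pipeline = Pipeline([("scale", transformer),
                         ("predict", KNeighborsClassifier())])
    fold_scores = cross_val_score(pipeline, X, y, scoring="accuracy")
    scores[name] = fold_scores.mean()

for name, score in scores.items():
    print(f"{name}: {score:.1%}")
```

Wrapping the transformer and the estimator in a single pipeline means only one line changes per experiment, and the scaling parameters are never learned from the testing fold.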

scikit-learn also provides other transformers, such as PCA, which we will use later in this book. Try some of these out as well, referring to scikit-learn's excellent documentation at https://scikit-learn.org/stable/modules/preprocessing.html

In the next chapter, we will use these concepts in a larger example, predicting the outcome of sports matches using real-world data.
