Summary

In this chapter, we extended our use of scikit-learn's classifiers and introduced the pandas library to manage our data. We analyzed real-world data on basketball results from the NBA, saw some of the problems that even well-curated data introduces, and created new features for our analysis.

We saw the effect that good features have on performance and used an ensemble algorithm, random forests, to further improve the accuracy. To take these concepts further, try to create your own features and test them out. Which features perform better? If you have trouble coming up with features, think about what other datasets can be included. For example, if key players are injured, this might affect the results of a specific match and cause a better team to lose.
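One way to approach this exercise is to score each candidate feature set with the same classifier and the same cross-validation setup, then compare the mean accuracies. The following is a minimal sketch rather than the chapter's actual code. The data is synthetic, not real NBA results, and the column names (`HomeLastWin`, `VisitorLastWin`, `HomeRanksHigher`, `HomeWin`) and the helper `score_feature_set` are assumptions made for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def score_feature_set(data, feature_columns, target_column="HomeWin", seed=14):
    """Mean cross-validated accuracy of a random forest on the chosen columns.

    Hypothetical helper for illustration only.
    """
    X = data[feature_columns].values
    y = data[target_column].values
    clf = RandomForestClassifier(n_estimators=50, random_state=seed)
    scores = cross_val_score(clf, X, y, scoring="accuracy", cv=5)
    return scores.mean()


# Synthetic stand-in data (an assumption; replace with your own dataset).
rng = np.random.RandomState(14)
n_games = 400
# Hidden "home strength minus visitor strength" value that drives the outcome.
strength_gap = rng.normal(size=n_games)
games = pd.DataFrame({
    # Two features that are pure noise in this synthetic setup.
    "HomeLastWin": rng.randint(0, 2, size=n_games),
    "VisitorLastWin": rng.randint(0, 2, size=n_games),
    # A feature that carries real signal about the outcome.
    "HomeRanksHigher": (strength_gap > 0).astype(int),
    # Target: the stronger team usually wins, with some noise.
    "HomeWin": (strength_gap + rng.normal(scale=0.5, size=n_games) > 0).astype(int),
})

baseline = score_feature_set(games, ["HomeLastWin", "VisitorLastWin"])
extended = score_feature_set(
    games, ["HomeLastWin", "VisitorLastWin", "HomeRanksHigher"]
)
print("Baseline accuracy: {:.1f}%".format(baseline * 100))
print("Extended accuracy: {:.1f}%".format(extended * 100))
```

Keeping the classifier, random seed, and number of folds fixed means that any difference in the scores can be attributed to the features rather than to the evaluation setup. To test an idea such as player injuries, you would add a column for it to your own data and pass the longer column list to the same function.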

In the next chapter, we will extend the affinity analysis that we performed in the first chapter to build a program that finds similar books. We will see how to use algorithms for ranking and how to use an approximation to improve the scalability of data mining.
