Summary

In this chapter, we extended our use of scikit-learn's classifiers and introduced the pandas library to manage our data. We analyzed real-world data on basketball results from the NBA, saw some of the problems that even well-curated data introduces, and created new features for our analysis.

We saw the effect that good features have on performance and used an ensemble algorithm, random forests, to further improve the accuracy. To take these concepts further, try to create your own features and test them out. Which features perform better? If you have trouble coming up with features, think about what other datasets can be included. For example, if key players are injured, this might affect the results of a specific match and cause a better team to lose.
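As a starting point for that exercise, the following is a minimal sketch of one way to compare feature sets with a random forest and cross-validation. It is not the chapter's own code: the helper `compare_feature_sets`, the column names (`HomeLastWin`, `VisitorLastWin`, `HomeKeyPlayerInjured`, `HomeWin`), and the randomly generated data are all assumptions made for illustration. Replace the synthetic DataFrame with your own NBA data.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import cross_val_score


def compare_feature_sets(df, feature_sets, target, random_state=14):
    """Return the mean 5-fold cross-validated accuracy per feature set.

    Hypothetical helper for illustration; not part of scikit-learn or pandas.
    """
    y = df[target].values
    scores = {}
    for name, columns in feature_sets.items():
        clf = RandomForestClassifier(n_estimators=100, random_state=random_state)
        fold_scores = cross_val_score(
            clf, df[columns].values, y, scoring="accuracy", cv=5
        )
        scores[name] = fold_scores.mean()
    return scores


# Synthetic stand-in data (NOT real NBA results), used only so the sketch runs.
rng = np.random.RandomState(14)
n_games = 400
games = pd.DataFrame({
    "HomeLastWin": rng.randint(0, 2, n_games),
    "VisitorLastWin": rng.randint(0, 2, n_games),
    "HomeKeyPlayerInjured": rng.randint(0, 2, n_games),
})
# Made-up rule: the home team wins less often when a key player is injured.
noise = rng.rand(n_games)
games["HomeWin"] = (
    noise + 0.4 * (1 - games["HomeKeyPlayerInjured"]) > 0.7
).astype(int)

feature_sets = {
    "baseline": ["HomeLastWin", "VisitorLastWin"],
    "with_injury": ["HomeLastWin", "VisitorLastWin", "HomeKeyPlayerInjured"],
}
results = compare_feature_sets(games, feature_sets, target="HomeWin")
for name, accuracy in results.items():
    print("{}: {:.1f}%".format(name, accuracy * 100))
```

Evaluating every candidate feature set with the same classifier settings and the same cross-validation folds keeps the comparison fair, so any difference in accuracy can be attributed to the features rather than to the evaluation setup.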

In the next chapter, we will extend the affinity analysis that we performed in the first chapter to create a program to find similar books. We will see how to use algorithms for ranking and also use an approximation to improve the scalability of data mining.
