Summary

In this chapter, we introduced data mining using Python. If you could run the code in this section (note that the full code is available in the supplied code package), then your computer is set up for much of the rest of the book. Other Python libraries will be introduced in later chapters to perform more specialized tasks.

We used the Jupyter Notebook to run our code, which allows us to immediately view the results of a small section of the code. Jupyter Notebook is a useful tool that will be used throughout the book.

We introduced a simple affinity analysis, finding products that are purchased together. This type of exploratory analysis gives insight into a business process, an environment, or a scenario. The information from this type of analysis can assist in business processes, find the next big medical breakthrough, or create the next artificial intelligence.
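As a reminder of what such an analysis can look like, here is a minimal sketch. It assumes support (how often a rule holds) and confidence (how often the rule holds when its premise applies) as the measures of interest. The toy baskets and the function name `rule_stats` are hypothetical and are not taken from the supplied code package.

```python
# Hypothetical toy transactions; each set is one customer's basket.
transactions = [
    {"bread", "milk"},
    {"bread", "cheese"},
    {"bread", "milk", "apples"},
    {"milk", "bananas"},
]


def rule_stats(transactions, premise, conclusion):
    """Return (support, confidence) for the rule 'premise -> conclusion'.

    support:    number of baskets containing both items
    confidence: fraction of baskets with the premise that also
                contain the conclusion
    """
    premise_count = sum(1 for basket in transactions if premise in basket)
    both_count = sum(
        1 for basket in transactions
        if premise in basket and conclusion in basket
    )
    # Guard against a premise that never occurs in the data.
    confidence = both_count / premise_count if premise_count else 0.0
    return both_count, confidence


support, confidence = rule_stats(transactions, "bread", "milk")
print(f"bread -> milk: support={support}, confidence={confidence:.3f}")
```

Ranking every candidate rule by these two numbers is one simple way to surface the product pairs worth a closer look.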

This chapter also included a simple classification example using the OneR algorithm. For each feature, OneR predicts the class that most frequently occurred with each of that feature's values in the training dataset, and it keeps the single feature whose predictions make the fewest errors.
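The idea behind OneR can be sketched in a few lines of plain Python. This is an illustrative version under stated assumptions, not the chapter's own implementation: features are assumed to be categorical, and the names `train_one_r` and `predict_one_r` and the toy data are hypothetical.

```python
from collections import Counter, defaultdict


def train_one_r(X, y):
    """Pick the single feature whose value -> class rule makes the fewest
    errors on the training data. Returns (feature_index, rule, errors)."""
    best = None
    for feature in range(len(X[0])):
        # Count how often each class occurs with each value of this feature.
        counts = defaultdict(Counter)
        for row, label in zip(X, y):
            counts[row[feature]][label] += 1
        # Each value predicts the class it was most often seen with.
        rule = {value: c.most_common(1)[0][0] for value, c in counts.items()}
        # Errors are the samples that do not belong to that majority class.
        errors = sum(
            sum(c.values()) - c.most_common(1)[0][1] for c in counts.values()
        )
        if best is None or errors < best[2]:
            best = (feature, rule, errors)
    return best


def predict_one_r(model, X, default=None):
    """Apply the learned rule; values unseen in training map to `default`."""
    feature, rule, _ = model
    return [rule.get(row[feature], default) for row in X]


# Hypothetical toy data: feature 0 separates the classes, feature 1 does not.
X_train = [[0, 1], [0, 0], [1, 1], [1, 0]]
y_train = [0, 0, 1, 1]

model = train_one_r(X_train, y_train)
predictions = predict_one_r(model, [[1, 0], [0, 1]])
print(model, predictions)
```

In practice, the model would be trained on one portion of the data and its predictions scored on a held-out portion, so that the reported accuracy is not inflated by the training samples.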

To expand on the outcomes of this chapter, think about how you would implement a variant of OneR that can take multiple feature/value pairs into consideration. Take a shot at implementing your new algorithm and evaluating it. Remember to test your algorithm on a dataset that is separate from the training data. Otherwise, you run the risk of overfitting your data.

Over the next few chapters, we will expand on the concepts of classification and affinity analysis. We will also introduce classifiers in the scikit-learn package and use them to do our machine learning, rather than writing the algorithms ourselves.
