
Chapter 1. Expanding Your Data Mining Toolbox

When faced with sensory information, human beings naturally look for patterns that help them explain, differentiate, categorize, and predict. Searching for patterns all around us is a fundamental human activity, and the human brain is quite good at it. With this skill, our ancient ancestors became better at hunting, gathering, cooking, and organizing. It is no wonder that pattern recognition and pattern prediction were among the first tasks humans set out to computerize, and that effort continues in earnest today. Depending on the goals of a given project, finding patterns in data with computers can now involve database systems, artificial intelligence, statistics, information retrieval, computer vision, and any number of other subfields of computer science, information systems, mathematics, and business. Whatever we call this activity (knowledge discovery in databases, data mining, or data science), its primary mission is always to find interesting patterns.

Despite this humble-sounding mission, data mining has existed long enough, and accumulated enough variation in how it is practiced, that it has become a large and complicated field to master. Consider a cooking school, where every beginner chef first learns how to boil water and how to use a knife before moving on to more advanced skills, such as making puff pastry or deboning a raw chicken. Data mining likewise has common techniques that even the newest data miners learn: how to build a classifier and how to find clusters in data. This book, however, is titled Mastering Data Mining with Python, so as a mastering-level book, its aim is to teach you some of the techniques you may not have encountered in earlier data mining projects.

In this first chapter, we will cover the following topics:

  • What is data mining? We will situate data mining among the growing number of related concepts, and we will learn a bit about how the discipline has grown and changed over time.
  • How do we do data mining? Here, we compare several processes or methodologies commonly used in data mining projects.
  • What are the techniques used in data mining? In this section, we will summarize each of the data analysis techniques that are typically included in a definition of data mining, and we will highlight the more exotic or underappreciated techniques that we will be covering in this mastering-level book.
  • How do we set up a data mining work environment? Finally, we will walk through setting up a Python-based development environment that we will use to complete the projects in the rest of this book.