
Preface

Over the past decade, cheaper data storage, faster hardware, and impressive advances in algorithms have combined to pave the way for the rapid ascendance of data science as one of the most important opportunities in computing. While the term data science can include everything from cleaning and storing data to visualizing it in graphs and charts, the area that has seen the most significant gains is the invention of intelligent and sophisticated algorithms for analyzing data. Using computers to find the interesting patterns buried within massive amounts of data is called data mining, an area that encompasses elements of database systems, statistics, and machine learning.

Right now there are dozens of great data mining and machine learning books available for software developers to get up to date on all these advances in the field. What most of these books have in common is that they all cover a small set of tried-and-true methods for finding patterns in data: classification, clustering, decision trees, and regression. Of course, all of these are critically important methods for any data miner to know and they are popular because they can be effective. But these same few techniques are not the whole story. Data mining is a rich field encompassing many dozens of techniques to uncover patterns and make predictions. A true master of data mining should have many tools in her toolbox, not just a few. Thus, the mission of this book, Mastering Data Mining with Python, is to introduce some of the lesser-known data mining concepts that are typically only covered in academic textbooks.

This book uses the Python programming language and a project-based approach to introduce diverse and often overlooked data mining concepts, such as association rules, entity matching, network analysis, text mining, and anomaly detection. Each chapter thoroughly illustrates the basics of one particular data mining technique, provides alternatives for evaluating its effectiveness, and then implements the technique using real-world data.

Our focus on real-world data is another feature of this book that sets it apart from many other data mining books. The true test of whether we have mastered a concept is whether we can apply a method to a new, unknown problem. In our case, this means applying each data mining method to a new problem area or a new data set. The emphasis on real data also means that our results may not always be as clean and tidy as results that come from a canned example data set. For this reason, each chapter includes a discussion of how to critically evaluate the method. Do the results make sense? What do the results mean? How can the results be improved?

So, in many ways, this book picks up where some of the other data mining books leave off. If you want to round out your growing data mining toolbox with a set of interesting but often overlooked techniques, then read on to learn the specific topics we will cover and how they will be applied in each chapter.
