
What this book covers

The first three and a half chapters of the book are focused on the procedural nuts and bolts of a data mining project. This includes creating a data mining Python environment, loading data from a variety of sources, and munging the data for downstream analysis. The remaining content in the book is mostly conceptual, and delivered in a conversational style very close to how I would train a new hire at my company. 

Chapter 1, Data Mining and Getting Started with Python Tools, covers setting up your software environment. It explains how to download and install high-speed Python and popular libraries such as pandas, scikit-learn, and seaborn. After reading this chapter and setting up your environment, you should be ready to follow along with the demonstrations throughout the rest of the book.

Chapter 2, Basic Terminology and our End-to-End Example, covers the basic statistics and data terminology required for working in data mining. The final portion of the chapter is dedicated to a full working example, which combines the types of techniques that will be introduced later in this book. You will also come away with a better understanding of the thought processes behind analysis and the common steps taken to address a problem statement that you may encounter in the field.

Chapter 3, Collecting, Exploring, and Visualizing Data, covers the basics of loading data from databases, disks, and web sources. It also covers basic SQL queries and the access and search functions in pandas. The last sections of the chapter introduce the common types of plots using seaborn.

Chapter 4, Cleaning and Readying Data for Analysis, covers the basics of data cleanup and dimensionality reduction. After reading it, you will understand how to work with missing values, rescale input data, and handle categorical variables. You will also understand the troubles of high-dimensional data, and how to combat this with feature reduction techniques including filter, wrapper, and transformation methods.

Chapter 5, Grouping and Clustering Data, introduces the background and thought processes that go into designing a clustering algorithm for data mining work. It then introduces common clustering methods in the field and compares them on toy datasets. After reading this chapter, you will know the difference between algorithms that cluster based on means separation, density, and connectivity. You will also be able to look at a plot of incoming data and have some intuition on whether clustering will fit your mining project.

Chapter 6, Prediction with Regression and Classification, covers the basics behind using a computer to learn prediction models by introducing the loss function and gradient descent. It then introduces the concepts of overfitting, underfitting, and the penalty approach to regularizing your model during fits. It also covers common regression and classification techniques, and the regularized versions of each of these where appropriate. The chapter finishes with a discussion of best practices for model tuning, including cross-validation and grid search.

Chapter 7, Advanced Topics – Building a Data Processing Pipeline and Deploying, covers a strategy for pipelining and deploying using built-in scikit-learn methods. It also introduces the pickle module for model persistence and storage, and discusses Python-specific concerns at deployment time.
