Python Data Mining Quick Start Guide
Data mining is a necessary and predictable response to the dawn of the information age. It is typically defined as the pattern and/or trend discovery phase in the data mining pipeline, and Python is a popular tool for performing these tasks as it offers a wide variety of tools for data mining. This book will serve as a quick introduction to the concept of data mining and putting it to practical use with the help of popular Python packages and libraries. You will get a hands-on demonstration of working with different real-world datasets and extracting useful insights from them using popular Python libraries such as NumPy, pandas, scikit-learn, and matplotlib. You will then learn the different stages of data mining such as data loading, cleaning, analysis, and visualization. You will also get a full conceptual description of popular data transformation, clustering, and classification techniques. By the end of this book, you will be able to build an efficient data mining pipeline using Python without any hassle.
Table of Contents (168 chapters)
- Cover Page
- Title Page
- Copyright and Credits
- Python Data Mining Quick Start Guide
- Dedication
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Data Mining and Getting Started with Python Tools
- Descriptive, predictive, and prescriptive analytics
- What will and will not be covered in this book
- Recommended readings for further explanation
- Setting up Python environments for data mining
- Installing the Anaconda distribution and Conda package manager
- Installing on Linux
- Installing on Windows
- Installing on macOS
- Launching the Spyder IDE
- Launching a Jupyter Notebook
- Installing high-performance Python distribution
- Recommended libraries and how to install
- Recommended libraries
- Summary
- Basic Terminology and Our End-to-End Example
- Basic data terminology
- Sample spaces
- Variable types
- Data types
- Basic summary statistics
- An end-to-end example of data mining in Python
- Loading data into memory – viewing and managing with ease using pandas
- Plotting and exploring data – harnessing the power of Seaborn
- Transforming data – PCA and LDA with scikit-learn
- Quantifying separations – k-means clustering and the silhouette score
- Making decisions or predictions
- Summary
- Collecting, Exploring, and Visualizing Data
- Types of data sources and loading into pandas
- Databases
- Basic Structured Query Language (SQL) queries
- Disks
- Web sources
- From URLs
- From Scikit-learn and Seaborn-included sets
- Access, search, and sanity checks with pandas
- Basic plotting in Seaborn
- Popular types of plots for visualizing data
- Scatter plots
- Histograms
- Jointplots
- Violin plots
- Pairplots
- Summary
- Cleaning and Readying Data for Analysis
- The scikit-learn transformer API
- Cleaning input data
- Missing values
- Finding and removing missing values
- Imputing to replace the missing values
- Feature scaling
- Normalization
- Standardization
- Handling categorical data
- Ordinal encoding
- One-hot encoding
- Label encoding
- High-dimensional data
- Dimension reduction
- Feature selection
- Feature filtering
- The variance threshold
- The correlation coefficient
- Wrapper methods
- Sequential feature selection
- Transformation
- PCA
- LDA
- Summary
- Grouping and Clustering Data
- Introducing clustering concepts
- Location of the group
- Euclidean space (centroids)
- Non-Euclidean space (medoids)
- Similarity
- Euclidean space
- The Euclidean distance
- The Manhattan distance
- Maximum distance
- Non-Euclidean space
- The cosine distance
- The Jaccard distance
- Termination condition
- With known number of groupings
- Without known number of groupings
- Quality score and silhouette score
- Clustering methods
- Means separation
- K-means
- Finding k
- K-means++
- Mini batch K-means
- Hierarchical clustering
- Reuse the dendrogram to find number of clusters
- Plot dendrogram
- Density clustering
- Spectral clustering
- Summary
- Prediction with Regression and Classification
- Scikit-learn Estimator API
- Introducing prediction concepts
- Prediction nomenclature
- Mathematical machinery
- Loss function
- Gradient descent
- Fit quality regimes
- Regression
- Metrics of regression model prediction
- Regression example dataset
- Linear regression
- Extension to multivariate form
- Regularization with penalized regression
- Regularization penalties
- Classification
- Classification example dataset
- Metrics of classification model prediction
- Multi-class classification
- One-versus-all
- One-versus-one
- Logistic regression
- Regularized logistic regression
- Support vector machines
- Soft-margin with C
- The kernel trick
- Tree-based classification
- Decision trees
- Node splitting with Gini
- Random forest
- Avoid overfitting and speed up the fits
- Built-in validation with bagging
- Tuning a prediction model
- Cross-validation
- Introduction of the validation set
- Multiple validation sets with k-fold method
- Grid search for hyperparameter tuning
- Summary
- Advanced Topics - Building a Data Processing Pipeline and Deploying It
- Pipelining your analysis
- Scikit-learn's pipeline object
- Deploying the model
- Serializing a model and storing with the pickle module
- Loading a serialized model and predicting
- Python-specific deployment concerns
- Summary
- Other Books You May Enjoy
- Leave a review - let other readers know what you think

Updated: 2021-06-24 15:20:20