Python Data Mining Quick Start Guide
Data mining is a necessary and predictable response to the dawn of the information age. It is typically defined as the pattern and/or trend discovery phase in the data mining pipeline, and Python is a popular tool for performing these tasks as it offers a wide variety of tools for data mining. This book will serve as a quick introduction to the concept of data mining and putting it to practical use with the help of popular Python packages and libraries. You will get a hands-on demonstration of working with different real-world datasets and extracting useful insights from them using popular Python libraries such as NumPy, pandas, scikit-learn, and matplotlib. You will then learn the different stages of data mining such as data loading, cleaning, analysis, and visualization. You will also get a full conceptual description of popular data transformation, clustering, and classification techniques. By the end of this book, you will be able to build an efficient data mining pipeline using Python without any hassle.
Contents (168 chapters)
- coverpage
- Title Page
- Copyright and Credits
- Python Data Mining Quick Start Guide
- Dedication
- About Packt
- Why subscribe?
- Packt.com
- Contributors
- About the author
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Data Mining and Getting Started with Python Tools
- Descriptive, predictive, and prescriptive analytics
- What will and will not be covered in this book
- Recommended readings for further explanation
- Setting up Python environments for data mining
- Installing the Anaconda distribution and Conda package manager
- Installing on Linux
- Installing on Windows
- Installing on macOS
- Launching the Spyder IDE
- Launching a Jupyter Notebook
- Installing high-performance Python distribution
- Recommended libraries and how to install
- Recommended libraries
- Summary
- Basic Terminology and Our End-to-End Example
- Basic data terminology
- Sample spaces
- Variable types
- Data types
- Basic summary statistics
- An end-to-end example of data mining in Python
- Loading data into memory – viewing and managing with ease using pandas
- Plotting and exploring data – harnessing the power of Seaborn
- Transforming data – PCA and LDA with scikit-learn
- Quantifying separations – k-means clustering and the silhouette score
- Making decisions or predictions
- Summary
- Collecting, Exploring, and Visualizing Data
- Types of data sources and loading into pandas
- Databases
- Basic Structured Query Language (SQL) queries
- Disks
- Web sources
- From URLs
- From Scikit-learn and Seaborn-included sets
- Access, search, and sanity checks with pandas
- Basic plotting in Seaborn
- Popular types of plots for visualizing data
- Scatter plots
- Histograms
- Jointplots
- Violin plots
- Pairplots
- Summary
- Cleaning and Readying Data for Analysis
- The scikit-learn transformer API
- Cleaning input data
- Missing values
- Finding and removing missing values
- Imputing to replace the missing values
- Feature scaling
- Normalization
- Standardization
- Handling categorical data
- Ordinal encoding
- One-hot encoding
- Label encoding
- High-dimensional data
- Dimension reduction
- Feature selection
- Feature filtering
- The variance threshold
- The correlation coefficient
- Wrapper methods
- Sequential feature selection
- Transformation
- PCA
- LDA
- Summary
- Grouping and Clustering Data
- Introducing clustering concepts
- Location of the group
- Euclidean space (centroids)
- Non-Euclidean space (medoids)
- Similarity
- Euclidean space
- The Euclidean distance
- The Manhattan distance
- Maximum distance
- Non-Euclidean space
- The cosine distance
- The Jaccard distance
- Termination condition
- With known number of groupings
- Without known number of groupings
- Quality score and silhouette score
- Clustering methods
- Means separation
- K-means
- Finding k
- K-means++
- Mini batch K-means
- Hierarchical clustering
- Reuse the dendrogram to find number of clusters
- Plot dendrogram
- Density clustering
- Spectral clustering
- Summary
- Prediction with Regression and Classification
- Scikit-learn Estimator API
- Introducing prediction concepts
- Prediction nomenclature
- Mathematical machinery
- Loss function
- Gradient descent
- Fit quality regimes
- Regression
- Metrics of regression model prediction
- Regression example dataset
- Linear regression
- Extension to multivariate form
- Regularization with penalized regression
- Regularization penalties
- Classification
- Classification example dataset
- Metrics of classification model prediction
- Multi-class classification
- One-versus-all
- One-versus-one
- Logistic regression
- Regularized logistic regression
- Support vector machines
- Soft-margin with C
- The kernel trick
- Tree-based classification
- Decision trees
- Node splitting with Gini
- Random forest
- Avoid overfitting and speed up the fits
- Built-in validation with bagging
- Tuning a prediction model
- Cross-validation
- Introduction of the validation set
- Multiple validation sets with k-fold method
- Grid search for hyperparameter tuning
- Summary
- Advanced Topics - Building a Data Processing Pipeline and Deploying It
- Pipelining your analysis
- Scikit-learn's pipeline object
- Deploying the model
- Serializing a model and storing with the pickle module
- Loading a serialized model and predicting
- Python-specific deployment concerns
- Summary
- Other Books You May Enjoy
- Leave a review - let other readers know what you think

Updated: 2021-06-24 15:20:20