Feature Engineering Made Easy
If you are a data science professional or a machine learning engineer looking to strengthen your predictive analytics model, then this book is a perfect guide for you. Some basic understanding of machine learning concepts and Python scripting would be enough to get started with this book.
Table of Contents (174 sections)
- Cover
- Copyright information
- Packt Upsell
- Why subscribe?
- PacktPub.com
- Contributors
- About the authors
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Introduction to Feature Engineering
- Motivating example – AI-powered communications
- Why feature engineering matters
- What is feature engineering?
- Understanding the basics of data and machine learning
- Supervised learning
- Unsupervised learning
- Unsupervised learning example – marketing segments
- Evaluation of machine learning algorithms and feature engineering procedures
- Example of feature engineering procedures – can anyone really predict the weather?
- Steps to evaluate a feature engineering procedure
- Evaluating supervised learning algorithms
- Evaluating unsupervised learning algorithms
- Feature understanding – what’s in my dataset?
- Feature improvement – cleaning datasets
- Feature selection – say no to bad attributes
- Feature construction – can we build it?
- Feature transformation – enter math-man
- Feature learning – using AI to better our AI
- Summary
- Feature Understanding – What's in My Dataset?
- The structure or lack thereof of data
- An example of unstructured data – server logs
- Quantitative versus qualitative data
- Salary ranges by job classification
- The four levels of data
- The nominal level
- Mathematical operations allowed
- The ordinal level
- Mathematical operations allowed
- The interval level
- Mathematical operations allowed
- Plotting two columns at the interval level
- The ratio level
- Mathematical operations allowed
- Recap of the levels of data
- Summary
- Feature Improvement – Cleaning Datasets
- Identifying missing values in data
- The Pima Indian Diabetes Prediction dataset
- The exploratory data analysis (EDA)
- Dealing with missing values in a dataset
- Removing harmful rows of data
- Imputing the missing values in data
- Imputing values in a machine learning pipeline
- Pipelines in machine learning
- Standardization and normalization
- Z-score standardization
- The min-max scaling method
- The row normalization method
- Putting it all together
- Summary
- Feature Construction
- Examining our dataset
- Imputing categorical features
- Custom imputers
- Custom category imputer
- Custom quantitative imputer
- Encoding categorical variables
- Encoding at the nominal level
- Encoding at the ordinal level
- Bucketing continuous features into categories
- Creating our pipeline
- Extending numerical features
- Activity recognition from the Single Chest-Mounted Accelerometer dataset
- Polynomial features
- Parameters
- Exploratory data analysis
- Text-specific feature construction
- Bag of words representation
- CountVectorizer
- CountVectorizer parameters
- The Tf-idf vectorizer
- Using text in machine learning pipelines
- Summary
- Feature Selection
- Achieving better performance in feature engineering
- A case study – a credit card defaulting dataset
- Creating a baseline machine learning pipeline
- The types of feature selection
- Statistical-based feature selection
- Using Pearson correlation to select features
- Feature selection using hypothesis testing
- Interpreting the p-value
- Ranking the p-value
- Model-based feature selection
- A brief refresher on natural language processing
- Using machine learning to select features
- Tree-based model feature selection metrics
- Linear models and regularization
- A brief introduction to regularization
- Linear model coefficients as another feature importance metric
- Choosing the right feature selection method
- Summary
- Feature Transformations
- Dimension reduction – feature transformations versus feature selection versus feature construction
- Principal Component Analysis
- How PCA works
- PCA with the Iris dataset – manual example
- Creating the covariance matrix of the dataset
- Calculating the eigenvalues of the covariance matrix
- Keeping the top k eigenvalues (sorted by the descending eigenvalues)
- Using the kept eigenvectors to transform new data-points
- Scikit-learn's PCA
- How centering and scaling data affects PCA
- A deeper look into the principal components
- Linear Discriminant Analysis
- How LDA works
- Calculating the mean vectors of each class
- Calculating within-class and between-class scatter matrices
- Calculating eigenvalues and eigenvectors for SW⁻¹SB
- Keeping the top k eigenvectors by ordering them by descending eigenvalues
- Using the top eigenvectors to project onto the new space
- How to use LDA in scikit-learn
- LDA versus PCA – iris dataset
- Summary
- Feature Learning
- Parametric assumptions of data
- Non-parametric fallacy
- The algorithms of this chapter
- Restricted Boltzmann Machines
- Not necessarily dimension reduction
- The graph of a Restricted Boltzmann Machine
- The restriction of a Boltzmann Machine
- Reconstructing the data
- MNIST dataset
- The BernoulliRBM
- Extracting PCA components from MNIST
- Extracting RBM components from MNIST
- Using RBMs in a machine learning pipeline
- Using a linear model on raw pixel values
- Using a linear model on extracted PCA components
- Using a linear model on extracted RBM components
- Learning text features – word vectorizations
- Word embeddings
- Two approaches to word embeddings – Word2vec and GloVe
- Word2Vec – another shallow neural network
- The gensim package for creating Word2vec embeddings
- Application of word embeddings – information retrieval
- Summary
- Case Studies
- Case study 1 – facial recognition
- Applications of facial recognition
- The data
- Some data exploration
- Applied facial recognition
- Case study 2 – predicting topics of hotel reviews data
- Applications of text clustering
- Hotel review data
- Exploration of the data
- The clustering model
- SVD versus PCA components
- Latent semantic analysis
- Summary
- Other Books You May Enjoy
- Leave a review – let other readers know what you think