Feature Engineering Made Easy
If you are a data science professional or a machine learning engineer looking to strengthen your predictive analytics model, then this book is a perfect guide for you. A basic understanding of machine learning concepts and Python scripting is enough to get started with this book.
Brand: 中圖公司
Listed: 2021-06-25 22:41:30
Updated: 2021-06-25 22:46:20
Publisher: Packt Publishing
The digital rights to this book are provided by 中圖公司, which has authorized 上海閱文信息技術有限公司 (Shanghai Yuewen Information Technology Co., Ltd.) to produce and distribute it.
- Cover
- Copyright Information
- Packt Upsell
- Why subscribe?
- PacktPub.com
- Contributors
- About the authors
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Introduction to Feature Engineering
- Motivating example – AI-powered communications
- Why feature engineering matters
- What is feature engineering?
- Understanding the basics of data and machine learning
- Supervised learning
- Unsupervised learning
- Unsupervised learning example – marketing segments
- Evaluation of machine learning algorithms and feature engineering procedures
- Example of feature engineering procedures – can anyone really predict the weather?
- Steps to evaluate a feature engineering procedure
- Evaluating supervised learning algorithms
- Evaluating unsupervised learning algorithms
- Feature understanding – what’s in my dataset?
- Feature improvement – cleaning datasets
- Feature selection – say no to bad attributes
- Feature construction – can we build it?
- Feature transformation – enter math-man
- Feature learning – using AI to better our AI
- Summary
- Feature Understanding – What's in My Dataset?
- The structure or lack thereof of data
- An example of unstructured data – server logs
- Quantitative versus qualitative data
- Salary ranges by job classification
- The four levels of data
- The nominal level
- Mathematical operations allowed
- The ordinal level
- Mathematical operations allowed
- The interval level
- Mathematical operations allowed
- Plotting two columns at the interval level
- The ratio level
- Mathematical operations allowed
- Recap of the levels of data
- Summary
- Feature Improvement – Cleaning Datasets
- Identifying missing values in data
- The Pima Indian Diabetes Prediction dataset
- The exploratory data analysis (EDA)
- Dealing with missing values in a dataset
- Removing harmful rows of data
- Imputing the missing values in data
- Imputing values in a machine learning pipeline
- Pipelines in machine learning
- Standardization and normalization
- Z-score standardization
- The min-max scaling method
- The row normalization method
- Putting it all together
- Summary
- Feature Construction
- Examining our dataset
- Imputing categorical features
- Custom imputers
- Custom category imputer
- Custom quantitative imputer
- Encoding categorical variables
- Encoding at the nominal level
- Encoding at the ordinal level
- Bucketing continuous features into categories
- Creating our pipeline
- Extending numerical features
- Activity recognition from the Single Chest-Mounted Accelerometer dataset
- Polynomial features
- Parameters
- Exploratory data analysis
- Text-specific feature construction
- Bag of words representation
- CountVectorizer
- CountVectorizer parameters
- The Tf-idf vectorizer
- Using text in machine learning pipelines
- Summary
- Feature Selection
- Achieving better performance in feature engineering
- A case study – a credit card defaulting dataset
- Creating a baseline machine learning pipeline
- The types of feature selection
- Statistical-based feature selection
- Using Pearson correlation to select features
- Feature selection using hypothesis testing
- Interpreting the p-value
- Ranking the p-value
- Model-based feature selection
- A brief refresher on natural language processing
- Using machine learning to select features
- Tree-based model feature selection metrics
- Linear models and regularization
- A brief introduction to regularization
- Linear model coefficients as another feature importance metric
- Choosing the right feature selection method
- Summary
- Feature Transformations
- Dimension reduction – feature transformations versus feature selection versus feature construction
- Principal Component Analysis
- How PCA works
- PCA with the Iris dataset – manual example
- Creating the covariance matrix of the dataset
- Calculating the eigenvalues of the covariance matrix
- Keeping the top k eigenvalues (sorted by the descending eigenvalues)
- Using the kept eigenvectors to transform new data-points
- Scikit-learn's PCA
- How centering and scaling data affects PCA
- A deeper look into the principal components
- Linear Discriminant Analysis
- How LDA works
- Calculating the mean vectors of each class
- Calculating within-class and between-class scatter matrices
- Calculating eigenvalues and eigenvectors for S_W⁻¹S_B
- Keeping the top k eigenvectors by ordering them by descending eigenvalues
- Using the top eigenvectors to project onto the new space
- How to use LDA in scikit-learn
- LDA versus PCA – iris dataset
- Summary
- Feature Learning
- Parametric assumptions of data
- Non-parametric fallacy
- The algorithms of this chapter
- Restricted Boltzmann Machines
- Not necessarily dimension reduction
- The graph of a Restricted Boltzmann Machine
- The restriction of a Boltzmann Machine
- Reconstructing the data
- MNIST dataset
- The BernoulliRBM
- Extracting PCA components from MNIST
- Extracting RBM components from MNIST
- Using RBMs in a machine learning pipeline
- Using a linear model on raw pixel values
- Using a linear model on extracted PCA components
- Using a linear model on extracted RBM components
- Learning text features – word vectorizations
- Word embeddings
- Two approaches to word embeddings – Word2vec and GloVe
- Word2Vec – another shallow neural network
- The gensim package for creating Word2vec embeddings
- Application of word embeddings – information retrieval
- Summary
- Case Studies
- Case study 1 – facial recognition
- Applications of facial recognition
- The data
- Some data exploration
- Applied facial recognition
- Case study 2 – predicting topics of hotel reviews data
- Applications of text clustering
- Hotel review data
- Exploration of the data
- The clustering model
- SVD versus PCA components
- Latent semantic analysis
- Summary
- Other Books You May Enjoy
- Leave a review - let other readers know what you think