Feature Engineering Made Easy
If you are a data science professional or a machine learning engineer looking to strengthen your predictive analytics model, then this book is a perfect guide for you. Some basic understanding of machine learning concepts and Python scripting would be enough to get started with this book.
Contents (174 sections)
- Cover
- Copyright information
- Packt Upsell
- Why subscribe?
- PacktPub.com
- Contributors
- About the authors
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Introduction to Feature Engineering
- Motivating example – AI-powered communications
- Why feature engineering matters
- What is feature engineering?
- Understanding the basics of data and machine learning
- Supervised learning
- Unsupervised learning
- Unsupervised learning example – marketing segments
- Evaluation of machine learning algorithms and feature engineering procedures
- Example of feature engineering procedures – can anyone really predict the weather?
- Steps to evaluate a feature engineering procedure
- Evaluating supervised learning algorithms
- Evaluating unsupervised learning algorithms
- Feature understanding – what’s in my dataset?
- Feature improvement – cleaning datasets
- Feature selection – say no to bad attributes
- Feature construction – can we build it?
- Feature transformation – enter math-man
- Feature learning – using AI to better our AI
- Summary
- Feature Understanding – What's in My Dataset?
- The structure or lack thereof of data
- An example of unstructured data – server logs
- Quantitative versus qualitative data
- Salary ranges by job classification
- The four levels of data
- The nominal level
- Mathematical operations allowed
- The ordinal level
- Mathematical operations allowed
- The interval level
- Mathematical operations allowed
- Plotting two columns at the interval level
- The ratio level
- Mathematical operations allowed
- Recap of the levels of data
- Summary
- Feature Improvement - Cleaning Datasets
- Identifying missing values in data
- The Pima Indian Diabetes Prediction dataset
- The exploratory data analysis (EDA)
- Dealing with missing values in a dataset
- Removing harmful rows of data
- Imputing the missing values in data
- Imputing values in a machine learning pipeline
- Pipelines in machine learning
- Standardization and normalization
- Z-score standardization
- The min-max scaling method
- The row normalization method
- Putting it all together
- Summary
- Feature Construction
- Examining our dataset
- Imputing categorical features
- Custom imputers
- Custom category imputer
- Custom quantitative imputer
- Encoding categorical variables
- Encoding at the nominal level
- Encoding at the ordinal level
- Bucketing continuous features into categories
- Creating our pipeline
- Extending numerical features
- Activity recognition from the Single Chest-Mounted Accelerometer dataset
- Polynomial features
- Parameters
- Exploratory data analysis
- Text-specific feature construction
- Bag of words representation
- CountVectorizer
- CountVectorizer parameters
- The Tf-idf vectorizer
- Using text in machine learning pipelines
- Summary
- Feature Selection
- Achieving better performance in feature engineering
- A case study – a credit card defaulting dataset
- Creating a baseline machine learning pipeline
- The types of feature selection
- Statistical-based feature selection
- Using Pearson correlation to select features
- Feature selection using hypothesis testing
- Interpreting the p-value
- Ranking the p-value
- Model-based feature selection
- A brief refresher on natural language processing
- Using machine learning to select features
- Tree-based model feature selection metrics
- Linear models and regularization
- A brief introduction to regularization
- Linear model coefficients as another feature importance metric
- Choosing the right feature selection method
- Summary
- Feature Transformations
- Dimension reduction – feature transformations versus feature selection versus feature construction
- Principal Component Analysis
- How PCA works
- PCA with the Iris dataset – manual example
- Creating the covariance matrix of the dataset
- Calculating the eigenvalues of the covariance matrix
- Keeping the top k eigenvalues (sorted by the descending eigenvalues)
- Using the kept eigenvectors to transform new data-points
- Scikit-learn's PCA
- How centering and scaling data affects PCA
- A deeper look into the principal components
- Linear Discriminant Analysis
- How LDA works
- Calculating the mean vectors of each class
- Calculating within-class and between-class scatter matrices
- Calculating eigenvalues and eigenvectors for SW-1SB
- Keeping the top k eigenvectors by ordering them by descending eigenvalues
- Using the top eigenvectors to project onto the new space
- How to use LDA in scikit-learn
- LDA versus PCA – iris dataset
- Summary
- Feature Learning
- Parametric assumptions of data
- Non-parametric fallacy
- The algorithms of this chapter
- Restricted Boltzmann Machines
- Not necessarily dimension reduction
- The graph of a Restricted Boltzmann Machine
- The restriction of a Boltzmann Machine
- Reconstructing the data
- MNIST dataset
- The BernoulliRBM
- Extracting PCA components from MNIST
- Extracting RBM components from MNIST
- Using RBMs in a machine learning pipeline
- Using a linear model on raw pixel values
- Using a linear model on extracted PCA components
- Using a linear model on extracted RBM components
- Learning text features – word vectorizations
- Word embeddings
- Two approaches to word embeddings – Word2vec and GloVe
- Word2vec – another shallow neural network
- The gensim package for creating Word2vec embeddings
- Application of word embeddings – information retrieval
- Summary
- Case Studies
- Case study 1 – facial recognition
- Applications of facial recognition
- The data
- Some data exploration
- Applied facial recognition
- Case study 2 – predicting topics of hotel reviews data
- Applications of text clustering
- Hotel review data
- Exploration of the data
- The clustering model
- SVD versus PCA components
- Latent semantic analysis
- Summary
- Other Books You May Enjoy
- Leave a review – let other readers know what you think