Effective Amazon Machine Learning
This book is intended for data scientists and managers of predictive analytics projects; it will teach beginner- to advanced-level machine learning practitioners how to leverage Amazon Machine Learning and complement their existing Data Science toolbox. No substantive prior knowledge of Machine Learning, Data Science, statistics, or coding is required.
Table of Contents (224 chapters)
- Cover
- Copyright Information
- Credits
- About the Author
- About the Reviewer
- www.PacktPub.com
- Customer Feedback
- Dedication
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Errata
- Piracy
- Questions
- Introduction to Machine Learning and Predictive Analytics
- Introducing Amazon Machine Learning
- Machine Learning as a Service
- Leveraging full AWS integration
- Comparing performances
- Engineering data versus model variety
- Amazon's expertise and the gradient descent algorithm
- Pricing
- Understanding predictive analytics
- Building the simplest predictive analytics algorithm
- Regression versus classification
- Expanding regression to classification with logistic regression
- Extracting features to predict outcomes
- Diving further into linear modeling for prediction
- Validating the dataset
- Missing from Amazon ML
- The statistical approach versus the machine learning approach
- Summary
- Machine Learning Definitions and Concepts
- What's an algorithm? What's a model?
- Dealing with messy data
- Classic datasets versus real-world datasets
- Assumptions for multiclass linear models
- Missing values
- Normalization
- Imbalanced datasets
- Addressing multicollinearity
- Detecting outliers
- Accepting non-linear patterns
- Adding features?
- Preprocessing recapitulation
- The predictive analytics workflow
- Training and evaluation in Amazon ML
- Identifying and correcting poor performances
- Underfitting
- Overfitting
- Regularization on linear models
- L2 regularization and Ridge
- L1 regularization and Lasso
- Evaluating the performance of your model
- Summary
- Overview of an Amazon Machine Learning Workflow
- Opening an Amazon Web Services Account
- Security
- Setting up the account
- Creating a user
- Defining policies
- Creating login credentials
- Choosing a region
- Overview of a standard Amazon Machine Learning workflow
- The dataset
- Loading the data on S3
- Declaring a datasource
- Creating the datasource
- The model
- The evaluation of the model
- Comparing with a baseline
- Making batch predictions
- Summary
- Loading and Preparing the Dataset
- Working with datasets
- Finding open datasets
- Introducing the Titanic dataset
- Preparing the data
- Splitting the data
- Loading data on S3
- Creating a bucket
- Loading the data
- Granting permissions
- Formatting the data
- Creating the datasource
- Verifying the data schema
- Reusing the schema
- Examining data statistics
- Feature engineering with Athena
- Introducing Athena
- A brief tour of AWS Athena
- Creating a titanic database
- Using the wizard
- Creating the database and table directly in SQL
- Data munging in SQL
- Missing values
- Handling outliers in the fare
- Extracting the title from the name
- Inferring the deck from the cabin
- Calculating family size
- Wrapping up
- Creating an improved datasource
- Summary
- Model Creation
- Transforming data with recipes
- Managing variables
- Grouping variables
- Naming variables with assignments
- Specifying outputs
- Data processing through seven transformations
- Using simple transformations
- Text mining
- Coupling variables
- Binning numeric values
- Creating a model
- Editing the suggested recipe
- Applying recipes to the Titanic dataset
- Choosing between recipes and data pre-processing
- Parametrizing the model
- Setting model memory
- Setting the number of data passes
- Choosing regularization
- Creating an evaluation
- Evaluating the model
- Evaluating binary classification
- Exploring the model performances
- Evaluating linear regression
- Evaluating multiclass classification
- Analyzing the logs
- Optimizing the learning rate
- Visualizing convergence
- Impact of regularization
- Comparing different recipes on the Titanic dataset
- Keeping variables as numeric or applying quantile binning?
- Parsing the model logs
- Summary
- Predictions and Performances
- Making batch predictions
- Creating the batch prediction job
- Interpreting prediction outputs
- Reading the manifest file
- Reading the results file
- Assessing our predictions
- Evaluating the held-out dataset
- Finding out who will survive
- Multiplying trials
- Making real-time predictions
- Manually exploring variable influence
- Setting up real-time predictions
- AWS SDK
- Setting up AWS credentials
- AWS access keys
- Setting up AWS CLI
- Python SDK
- Summary
- Command Line and SDK
- Getting started and setting up
- Using the CLI versus SDK
- Installing AWS CLI
- Picking up CLI syntax
- Passing parameters using JSON files
- Introducing the Ames Housing dataset
- Splitting the dataset with shell commands
- A simple project using the CLI
- An overview of Amazon ML CLI commands
- Creating the datasource
- Creating the model
- Evaluating our model with create-evaluation
- What is cross-validation?
- Implementing Monte Carlo cross-validation
- Generating the shuffled datasets
- Generating the datasources template
- Generating the models template
- Generating the evaluations template
- The results
- Conclusion
- Boto3, the Python SDK
- Working with the Python SDK for Amazon Machine Learning
- Waiting on operation completion
- Wrapping up the Python-based workflow
- Implementing recursive feature selection with Boto3
- Managing schema and recipe
- Summary
- Creating Datasources from Redshift
- Choosing between RDS and Redshift
- Creating a Redshift instance
- Connecting through the command line
- Executing Redshift queries using psql
- Creating our own non-linear dataset
- Uploading the nonlinear data to Redshift
- Introducing polynomial regression
- Establishing a baseline
- Polynomial regression in Amazon ML
- Driving the trials in Python
- Interpreting the results
- Summary
- Building a Streaming Data Analysis Pipeline
- Streaming Twitter sentiment analysis
- Popularity contest on Twitter
- The training dataset and the model
- Kinesis
- Kinesis Stream
- Kinesis Analytics
- Setting up Kinesis Firehose
- Producing tweets
- The Redshift database
- Adding Redshift to the Kinesis Firehose
- Setting up the roles and policies
- Dependencies and debugging
- Data format synchronization
- Debugging
- Preprocessing with Lambda
- Analyzing the results
- Downloading the dataset from Redshift
- Sentiment analysis with TextBlob
- Removing duplicate tweets
- And what is the most popular vegetable?
- Going beyond classification and regression
- Summary