Hands-On Data Science and Python Machine Learning
If you are a budding data scientist or a data analyst who wants to analyze and gain actionable insights from data using Python, this book is for you. Programmers with some experience in Python who want to enter the lucrative world of Data Science will also find this book to be very useful, but you don't need to be an expert Python coder or mathematician to get the most from this book.
Brand: 中圖公司
Listed on: 2021-07-15 16:58:32
Publisher: Packt Publishing
The digital rights to this book are provided by 中圖公司, which has authorized 上海閱文信息技術有限公司 to produce and distribute it.
- cover
- Title Page
- Copyright
- Hands-On Data Science and Python Machine Learning
- Credits
- About the Author
- www.PacktPub.com
- Why subscribe?
- Customer Feedback
- Preface
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Downloading the color images of this book
- Errata
- Piracy
- Questions
- Getting Started
- Installing Enthought Canopy
- Giving the installation a test run
- If you occasionally get problems opening your IPYNB files
- Using and understanding IPython (Jupyter) Notebooks
- Python basics - Part 1
- Understanding Python code
- Importing modules
- Data structures
- Experimenting with lists
- Pre colon
- Post colon
- Negative syntax
- Adding list to list
- The append function
- Complex data structures
- Dereferencing a single element
- The sort function
- Reverse sort
- Tuples
- Dereferencing an element
- List of tuples
- Dictionaries
- Iterating through entries
- Python basics - Part 2
- Functions in Python
- Lambda functions - functional programming
- Understanding boolean expressions
- The if statement
- The if-else loop
- Looping
- The while loop
- Exploring activity
- Running Python scripts
- More options than just the IPython/Jupyter Notebook
- Running Python scripts in command prompt
- Using the Canopy IDE
- Summary
- Statistics and Probability Refresher and Python Practice
- Types of data
- Numerical data
- Discrete data
- Continuous data
- Categorical data
- Ordinal data
- Mean, median, and mode
- Mean
- Median
- The factor of outliers
- Mode
- Using mean, median, and mode in Python
- Calculating mean using the NumPy package
- Visualizing data using matplotlib
- Calculating median using the NumPy package
- Analyzing the effect of outliers
- Calculating mode using the SciPy package
- Some exercises
- Standard deviation and variance
- Variance
- Measuring variance
- Standard deviation
- Identifying outliers with standard deviation
- Population variance versus sample variance
- The mathematical explanation
- Analyzing standard deviation and variance on a histogram
- Using Python to compute standard deviation and variance
- Try it yourself
- Probability density function and probability mass function
- The probability density function and probability mass functions
- Probability density functions
- Probability mass functions
- Types of data distributions
- Uniform distribution
- Normal or Gaussian distribution
- The exponential probability distribution or Power law
- Binomial probability mass function
- Poisson probability mass function
- Percentiles and moments
- Percentiles
- Quartiles
- Computing percentiles in Python
- Moments
- Computing moments in Python
- Summary
- Matplotlib and Advanced Probability Concepts
- A crash course in Matplotlib
- Generating multiple plots on one graph
- Saving graphs as images
- Adjusting the axes
- Adding a grid
- Changing line types and colors
- Labeling axes and adding a legend
- A fun example
- Generating pie charts
- Generating bar charts
- Generating scatter plots
- Generating histograms
- Generating box-and-whisker plots
- Try it yourself
- Covariance and correlation
- Defining the concepts
- Measuring covariance
- Correlation
- Computing covariance and correlation in Python
- Computing correlation – The hard way
- Computing correlation – The NumPy way
- Correlation activity
- Conditional probability
- Conditional probability exercises in Python
- Conditional probability assignment
- My assignment solution
- Bayes' theorem
- Summary
- Predictive Models
- Linear regression
- The ordinary least squares technique
- The gradient descent technique
- The coefficient of determination or r-squared
- Computing r-squared
- Interpreting r-squared
- Computing linear regression and r-squared using Python
- Activity for linear regression
- Polynomial regression
- Implementing polynomial regression using NumPy
- Computing the r-squared error
- Activity for polynomial regression
- Multivariate regression and predicting car prices
- Multivariate regression using Python
- Activity for multivariate regression
- Multi-level models
- Summary
- Machine Learning with Python
- Machine learning and train/test
- Unsupervised learning
- Supervised learning
- Evaluating supervised learning
- K-fold cross validation
- Using train/test to prevent overfitting of a polynomial regression
- Activity
- Bayesian methods - Concepts
- Implementing a spam classifier with Naïve Bayes
- Activity
- K-Means clustering
- Limitations to k-means clustering
- Clustering people based on income and age
- Activity
- Measuring entropy
- Decision trees - Concepts
- Decision tree example
- Walking through a decision tree
- Random forests technique
- Decision trees - Predicting hiring decisions using Python
- Ensemble learning – Using a random forest
- Activity
- Ensemble learning
- Support vector machine overview
- Using SVM to cluster people by using scikit-learn
- Activity
- Summary
- Recommender Systems
- What are recommender systems?
- User-based collaborative filtering
- Limitations of user-based collaborative filtering
- Item-based collaborative filtering
- Understanding item-based collaborative filtering
- How item-based collaborative filtering works
- Collaborative filtering using Python
- Finding movie similarities
- Understanding the code
- The corrwith function
- Improving the results of movie similarities
- Making movie recommendations to people
- Understanding movie recommendations with an example
- Using the groupby command to combine rows
- Removing entries with the drop command
- Improving the recommendation results
- Summary
- More Data Mining and Machine Learning Techniques
- K-nearest neighbors - concepts
- Using KNN to predict a rating for a movie
- Activity
- Dimensionality reduction and principal component analysis
- Dimensionality reduction
- Principal component analysis
- A PCA example with the Iris dataset
- Activity
- Data warehousing overview
- ETL versus ELT
- Reinforcement learning
- Q-learning
- The exploration problem
- The simple approach
- The better way
- Fancy words
- Markov decision process
- Dynamic programming
- Summary
- Dealing with Real-World Data
- Bias/variance trade-off
- K-fold cross-validation to avoid overfitting
- Example of k-fold cross-validation using scikit-learn
- Data cleaning and normalisation
- Cleaning web log data
- Applying a regular expression on the web log
- Modification one - filtering the request field
- Modification two - filtering post requests
- Modification three - checking the user agents
- Filtering the activity of spiders/robots
- Modification four - applying website-specific filters
- Activity for web log data
- Normalizing numerical data
- Detecting outliers
- Dealing with outliers
- Activity for outliers
- Summary
- Apache Spark - Machine Learning on Big Data
- Installing Spark
- Installing Spark on Windows
- Installing Spark on other operating systems
- Installing the Java Development Kit
- Installing Spark
- Spark introduction
- It's scalable
- It's fast
- It's young
- It's not difficult
- Components of Spark
- Python versus Scala for Spark
- Spark and Resilient Distributed Datasets (RDD)
- The SparkContext object
- Creating RDDs
- Creating an RDD using a Python list
- Loading an RDD from a text file
- More ways to create RDDs
- RDD operations
- Transformations
- Using map()
- Actions
- Introducing MLlib
- Some MLlib Capabilities
- Special MLlib data types
- The vector data type
- LabeledPoint data type
- Rating data type
- Decision Trees in Spark with MLlib
- Exploring decision trees code
- Creating the SparkContext
- Importing and cleaning our data
- Creating a test candidate and building our decision tree
- Running the script
- K-Means Clustering in Spark
- Within set sum of squared errors (WSSSE)
- Running the code
- TF-IDF
- TF-IDF in practice
- Using TF-IDF
- Searching Wikipedia with Spark MLlib
- Import statements
- Creating the initial RDD
- Creating and transforming a HashingTF object
- Computing the TF-IDF score
- Using the Wikipedia search engine algorithm
- Running the algorithm
- Using the Spark 2.0 DataFrame API for MLlib
- How Spark 2.0 MLlib works
- Implementing linear regression
- Summary
- Testing and Experimental Design
- A/B testing concepts
- A/B tests
- Measuring conversion for A/B testing
- How to attribute conversions
- Variance is your enemy
- T-test and p-value
- The t-statistic or t-test
- The p-value
- Measuring t-statistics and p-values using Python
- Running A/B test on some experimental data
- When there's no real difference between the two groups
- Does the sample size make a difference?
- Sample size increased to six-digits
- Sample size increased to seven-digits
- A/A testing
- Determining how long to run an experiment for
- A/B test gotchas
- Novelty effects
- Seasonal effects
- Selection bias
- Auditing selection bias issues
- Data pollution
- Attribution errors
- Summary (updated: 2021-07-15 17:16:18)