Hands-On Data Science and Python Machine Learning
If you are a budding data scientist or a data analyst who wants to analyze and gain actionable insights from data using Python, this book is for you. Programmers with some experience in Python who want to enter the lucrative world of Data Science will also find this book to be very useful, but you don't need to be an expert Python coder or mathematician to get the most from this book.
Table of contents (311 chapters)
- cover
- Title Page
- Copyright
- Hands-On Data Science and Python Machine Learning
- Credits
- About the Author
- www.PacktPub.com
- Why subscribe?
- Customer Feedback
- Preface
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Downloading the color images of this book
- Errata
- Piracy
- Questions
- Getting Started
- Installing Enthought Canopy
- Giving the installation a test run
- If you occasionally get problems opening your IPYNB files
- Using and understanding IPython (Jupyter) Notebooks
- Python basics - Part 1
- Understanding Python code
- Importing modules
- Data structures
- Experimenting with lists
- Pre colon
- Post colon
- Negative syntax
- Adding list to list
- The append function
- Complex data structures
- Dereferencing a single element
- The sort function
- Reverse sort
- Tuples
- Dereferencing an element
- List of tuples
- Dictionaries
- Iterating through entries
- Python basics - Part 2
- Functions in Python
- Lambda functions - functional programming
- Understanding boolean expressions
- The if statement
- The if-else loop
- Looping
- The while loop
- Exploring activity
- Running Python scripts
- More options than just the IPython/Jupyter Notebook
- Running Python scripts in command prompt
- Using the Canopy IDE
- Summary
- Statistics and Probability Refresher and Python Practice
- Types of data
- Numerical data
- Discrete data
- Continuous data
- Categorical data
- Ordinal data
- Mean median and mode
- Mean
- Median
- The factor of outliers
- Mode
- Using mean median and mode in Python
- Calculating mean using the NumPy package
- Visualizing data using matplotlib
- Calculating median using the NumPy package
- Analyzing the effect of outliers
- Calculating mode using the SciPy package
- Some exercises
- Standard deviation and variance
- Variance
- Measuring variance
- Standard deviation
- Identifying outliers with standard deviation
- Population variance versus sample variance
- The Mathematical explanation
- Analyzing standard deviation and variance on a histogram
- Using Python to compute standard deviation and variance
- Try it yourself
- Probability density function and probability mass function
- The probability density function and probability mass functions
- Probability density functions
- Probability mass functions
- Types of data distributions
- Uniform distribution
- Normal or Gaussian distribution
- The exponential probability distribution or Power law
- Binomial probability mass function
- Poisson probability mass function
- Percentiles and moments
- Percentiles
- Quartiles
- Computing percentiles in Python
- Moments
- Computing moments in Python
- Summary
- Matplotlib and Advanced Probability Concepts
- A crash course in Matplotlib
- Generating multiple plots on one graph
- Saving graphs as images
- Adjusting the axes
- Adding a grid
- Changing line types and colors
- Labeling axes and adding a legend
- A fun example
- Generating pie charts
- Generating bar charts
- Generating scatter plots
- Generating histograms
- Generating box-and-whisker plots
- Try it yourself
- Covariance and correlation
- Defining the concepts
- Measuring covariance
- Correlation
- Computing covariance and correlation in Python
- Computing correlation – The hard way
- Computing correlation – The NumPy way
- Correlation activity
- Conditional probability
- Conditional probability exercises in Python
- Conditional probability assignment
- My assignment solution
- Bayes' theorem
- Summary
- Predictive Models
- Linear regression
- The ordinary least squares technique
- The gradient descent technique
- The co-efficient of determination or r-squared
- Computing r-squared
- Interpreting r-squared
- Computing linear regression and r-squared using Python
- Activity for linear regression
- Polynomial regression
- Implementing polynomial regression using NumPy
- Computing the r-squared error
- Activity for polynomial regression
- Multivariate regression and predicting car prices
- Multivariate regression using Python
- Activity for multivariate regression
- Multi-level models
- Summary
- Machine Learning with Python
- Machine learning and train/test
- Unsupervised learning
- Supervised learning
- Evaluating supervised learning
- K-fold cross validation
- Using train/test to prevent overfitting of a polynomial regression
- Activity
- Bayesian methods - Concepts
- Implementing a spam classifier with Naïve Bayes
- Activity
- K-Means clustering
- Limitations to k-means clustering
- Clustering people based on income and age
- Activity
- Measuring entropy
- Decision trees - Concepts
- Decision tree example
- Walking through a decision tree
- Random forests technique
- Decision trees - Predicting hiring decisions using Python
- Ensemble learning – Using a random forest
- Activity
- Ensemble learning
- Support vector machine overview
- Using SVM to cluster people by using scikit-learn
- Activity
- Summary
- Recommender Systems
- What are recommender systems?
- User-based collaborative filtering
- Limitations of user-based collaborative filtering
- Item-based collaborative filtering
- Understanding item-based collaborative filtering
- How item-based collaborative filtering works?
- Collaborative filtering using Python
- Finding movie similarities
- Understanding the code
- The corrwith function
- Improving the results of movie similarities
- Making movie recommendations to people
- Understanding movie recommendations with an example
- Using the groupby command to combine rows
- Removing entries with the drop command
- Improving the recommendation results
- Summary
- More Data Mining and Machine Learning Techniques
- K-nearest neighbors - concepts
- Using KNN to predict a rating for a movie
- Activity
- Dimensionality reduction and principal component analysis
- Dimensionality reduction
- Principal component analysis
- A PCA example with the Iris dataset
- Activity
- Data warehousing overview
- ETL versus ELT
- Reinforcement learning
- Q-learning
- The exploration problem
- The simple approach
- The better way
- Fancy words
- Markov decision process
- Dynamic programming
- Summary
- Dealing with Real-World Data
- Bias/variance trade-off
- K-fold cross-validation to avoid overfitting
- Example of k-fold cross-validation using scikit-learn
- Data cleaning and normalisation
- Cleaning web log data
- Applying a regular expression on the web log
- Modification one - filtering the request field
- Modification two - filtering post requests
- Modification three - checking the user agents
- Filtering the activity of spiders/robots
- Modification four - applying website-specific filters
- Activity for web log data
- Normalizing numerical data
- Detecting outliers
- Dealing with outliers
- Activity for outliers
- Summary
- Apache Spark - Machine Learning on Big Data
- Installing Spark
- Installing Spark on Windows
- Installing Spark on other operating systems
- Installing the Java Development Kit
- Installing Spark
- Spark introduction
- It's scalable
- It's fast
- It's young
- It's not difficult
- Components of Spark
- Python versus Scala for Spark
- Spark and Resilient Distributed Datasets (RDD)
- The SparkContext object
- Creating RDDs
- Creating an RDD using a Python list
- Loading an RDD from a text file
- More ways to create RDDs
- RDD operations
- Transformations
- Using map()
- Actions
- Introducing MLlib
- Some MLlib Capabilities
- Special MLlib data types
- The vector data type
- LabeledPoint data type
- Rating data type
- Decision Trees in Spark with MLlib
- Exploring decision trees code
- Creating the SparkContext
- Importing and cleaning our data
- Creating a test candidate and building our decision tree
- Running the script
- K-Means Clustering in Spark
- Within set sum of squared errors (WSSSE)
- Running the code
- TF-IDF
- TF-IDF in practice
- Using TF-IDF
- Searching wikipedia with Spark MLlib
- Import statements
- Creating the initial RDD
- Creating and transforming a HashingTF object
- Computing the TF-IDF score
- Using the Wikipedia search engine algorithm
- Running the algorithm
- Using the Spark 2.0 DataFrame API for MLlib
- How Spark 2.0 MLlib works
- Implementing linear regression
- Summary
- Testing and Experimental Design
- A/B testing concepts
- A/B tests
- Measuring conversion for A/B testing
- How to attribute conversions
- Variance is your enemy
- T-test and p-value
- The t-statistic or t-test
- The p-value
- Measuring t-statistics and p-values using Python
- Running A/B test on some experimental data
- When there's no real difference between the two groups
- Does the sample size make a difference?
- Sample size increased to six-digits
- Sample size increased to seven-digits
- A/A testing
- Determining how long to run an experiment for
- A/B test gotchas
- Novelty effects
- Seasonal effects
- Selection bias
- Auditing selection bias issues
- Data pollution
- Attribution errors
- Summary
Last updated: 2021-07-15 17:16:18