Hands-On Exploratory Data Analysis with Python
Exploratory Data Analysis (EDA) is an approach to data analysis that involves the application of diverse techniques to gain insights into a dataset. This book will help you gain practical knowledge of the main pillars of EDA: data cleaning, data preparation, data exploration, and data visualization. You'll start by performing EDA using open source datasets and performing simple to advanced analyses to turn data into meaningful insights. You'll then learn various descriptive statistical techniques to describe the basic characteristics of data and progress to performing EDA on time-series data. As you advance, you'll learn how to implement EDA techniques for model development and evaluation and build predictive models to visualize results. Using Python for data analysis, you'll work with real-world datasets, understand data, summarize its characteristics, and visualize it for business intelligence. By the end of this EDA book, you'll have developed the skills required to carry out a preliminary investigation on any dataset, yield insights into data, present your results with visual aids, and build a model that correctly predicts future outcomes.
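Brand: China National Publications Import & Export (Group) Corporation (中圖公司)
Listed: 2021-06-24 15:29:00
Publisher: Packt Publishing
The digital rights to this book are provided by China National Publications Import & Export (Group) Corporation, which has authorized Shanghai Yuewen Information Technology Co., Ltd. to produce and distribute this edition.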
- Cover
- Title Page
- Copyright and Credits
- Hands-On Exploratory Data Analysis with Python
- About Packt
- Why subscribe?
- Contributors
- About the authors
- About the reviewer
- Packt is searching for authors like you
- Preface
- Who this book is for
- What this book covers
- To get the most out of this book
- Download the example code files
- Download the color images
- Conventions used
- Get in touch
- Reviews
- Section 1: The Fundamentals of EDA
- Exploratory Data Analysis Fundamentals
- Understanding data science
- The significance of EDA
- Steps in EDA
- Making sense of data
- Numerical data
- Discrete data
- Continuous data
- Categorical data
- Measurement scales
- Nominal
- Ordinal
- Interval
- Ratio
- Comparing EDA with classical and Bayesian analysis
- Software tools available for EDA
- Getting started with EDA
- NumPy
- Pandas
- SciPy
- Matplotlib
- Summary
- Further reading
- Visual Aids for EDA
- Technical requirements
- Line chart
- Steps involved
- Bar charts
- Scatter plot
- Bubble chart
- Scatter plot using seaborn
- Area plot and stacked plot
- Pie chart
- Table chart
- Polar chart
- Histogram
- Lollipop chart
- Choosing the best chart
- Other libraries to explore
- Summary
- Further reading
- EDA with Personal Email
- Technical requirements
- Loading the dataset
- Data transformation
- Data cleansing
- Loading the CSV file
- Converting the date
- Removing NaN values
- Applying descriptive statistics
- Data refactoring
- Dropping columns
- Refactoring timezones
- Data analysis
- Number of emails
- Time of day
- Average emails per day and hour
- Number of emails per day
- Most frequently used words
- Summary
- Further reading
- Data Transformation
- Technical requirements
- Background
- Merging database-style dataframes
- Concatenating along an axis
- Using df.merge with an inner join
- Using the pd.merge() method with a left join
- Using the pd.merge() method with a right join
- Using the pd.merge() method with an outer join
- Merging on index
- Reshaping and pivoting
- Transformation techniques
- Performing data deduplication
- Replacing values
- Handling missing data
- NaN values in pandas objects
- Dropping missing values
- Dropping by rows
- Dropping by columns
- Mathematical operations with NaN
- Filling missing values
- Backward and forward filling
- Interpolating missing values
- Renaming axis indexes
- Discretization and binning
- Outlier detection and filtering
- Permutation and random sampling
- Random sampling without replacement
- Random sampling with replacement
- Computing indicators/dummy variables
- String manipulation
- Benefits of data transformation
- Challenges
- Summary
- Further reading
- Section 2: Descriptive Statistics
- Descriptive Statistics
- Technical requirements
- Understanding statistics
- Distribution function
- Uniform distribution
- Normal distribution
- Exponential distribution
- Binomial distribution
- Cumulative distribution function
- Descriptive statistics
- Measures of central tendency
- Mean/average
- Median
- Mode
- Measures of dispersion
- Standard deviation
- Variance
- Skewness
- Kurtosis
- Types of kurtosis
- Calculating percentiles
- Quartiles
- Visualizing quartiles
- Summary
- Further reading
- Grouping Datasets
- Technical requirements
- Understanding groupby()
- Groupby mechanics
- Selecting a subset of columns
- Max and min
- Mean
- Data aggregation
- Group-wise operations
- Renaming grouped aggregation columns
- Group-wise transformations
- Pivot tables and cross-tabulations
- Pivot tables
- Cross-tabulations
- Summary
- Further reading
- Correlation
- Technical requirements
- Introducing correlation
- Types of analysis
- Understanding univariate analysis
- Understanding bivariate analysis
- Understanding multivariate analysis
- Discussing multivariate analysis using the Titanic dataset
- Outlining Simpson's paradox
- Correlation does not imply causation
- Summary
- Further reading
- Time Series Analysis
- Technical requirements
- Understanding the time series dataset
- Fundamentals of TSA
- Univariate time series
- Characteristics of time series data
- TSA with Open Power System Data
- Data cleaning
- Time-based indexing
- Visualizing time series
- Grouping time series data
- Resampling time series data
- Summary
- Further reading
- Section 3: Model Development and Evaluation
- Hypothesis Testing and Regression
- Technical requirements
- Hypothesis testing
- Hypothesis testing principle
- statsmodels library
- Average reading time
- Types of hypothesis testing
- T-test
- p-hacking
- Understanding regression
- Types of regression
- Simple linear regression
- Multiple linear regression
- Nonlinear regression
- Model development and evaluation
- Constructing a linear regression model
- Model evaluation
- Computing accuracy
- Understanding accuracy
- Implementing a multiple linear regression model
- Summary
- Further reading
- Model Development and Evaluation
- Technical requirements
- Types of machine learning
- Understanding supervised learning
- Regression
- Classification
- Understanding unsupervised learning
- Applications of unsupervised learning
- Clustering using MiniBatch K-means clustering
- Extracting keywords
- Plotting clusters
- Word cloud
- Understanding reinforcement learning
- Difference between supervised and reinforcement learning
- Applications of reinforcement learning
- Unified machine learning workflow
- Data preprocessing
- Data collection
- Data analysis
- Data cleaning, normalization, and transformation
- Data preparation
- Training sets and corpus creation
- Model creation and training
- Model evaluation
- Best model selection and evaluation
- Model deployment
- Summary
- Further reading
- EDA on Wine Quality Data Analysis
- Technical requirements
- Disclosing the wine quality dataset
- Loading the dataset
- Descriptive statistics
- Data wrangling
- Analyzing red wine
- Finding correlated columns
- Alcohol versus quality
- Alcohol versus pH
- Analyzing white wine
- Red wine versus white wine
- Adding a new attribute
- Converting into a categorical column
- Concatenating dataframes
- Grouping columns
- Univariate analysis
- Multivariate analysis on the combined dataframe
- Discrete categorical attributes
- 3-D visualization
- Model development and evaluation
- Summary
- Further reading
- Appendix
- String manipulation
- Creating strings
- Accessing characters in Python
- String slicing
- Deleting/updating from a string
- Escape sequencing in Python
- Formatting strings
- Using pandas vectorized string functions
- Using string functions with a pandas DataFrame
- Using regular expressions
- Further reading
- Other Books You May Enjoy
- Leave a review - let other readers know what you think

Updated: 2021-06-24 16:45:36