Statistics for Data Science
By James D. Miller
Updated: 2021-07-02 14:59:37
Latest chapter: Summary
This book is intended for developers who want to enter the field of data science and are looking for concise information on statistics, supported by insightful programs and simple explanations. Some basic hands-on experience with R will be useful.
Brand: 中圖公司
Listed: 2021-07-02 12:39:58
Publisher: Packt Publishing
Digital rights to this book are provided by 中圖公司, which has authorized 上海閱文信息技術有限公司 to produce and distribute it.
- coverpage
- Title Page
- Copyright
- Statistics for Data Science
- Credits
- About the Author
- About the Reviewer
- www.PacktPub.com
- Why subscribe?
- Customer Feedback
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Downloading the color images of this book
- Errata
- Piracy
- Questions
- Transitioning from Data Developer to Data Scientist
- Data developer thinking
- Objectives of a data developer
- Querying or mining
- Data quality or data cleansing
- Data modeling
- Issue or insights
- Thought process
- Developer versus scientist
- New data new source
- Quality questions
- Querying and mining
- Performance
- Financial reporting
- Visualizing
- Tools of the trade
- Advantages of thinking like a data scientist
- Developing a better approach to understanding data
- Using statistical thinking during program or database designing
- Adding to your personal toolbox
- Increased marketability
- Perpetual learning
- Seeing the future
- Transitioning to a data scientist
- Let's move ahead
- Summary
- Declaring the Objectives
- Key objectives of data science
- Collecting data
- Processing data
- Exploring and visualizing data
- Analyzing the data and/or applying machine learning to the data
- Deciding (or planning) based upon acquired insight
- Thinking like a data scientist
- Bringing statistics into data science
- Common terminology
- Statistical population
- Probability
- False positives
- Statistical inference
- Regression
- Fitting
- Categorical data
- Classification
- Clustering
- Statistical comparison
- Coding
- Distributions
- Data mining
- Decision trees
- Machine learning
- Munging and wrangling
- Visualization
- D3
- Regularization
- Assessment
- Cross-validation
- Neural networks
- Boosting
- Lift
- Mode
- Outlier
- Predictive modeling
- Big Data
- Confidence interval
- Writing
- Summary
- A Developer's Approach to Data Cleaning
- Understanding basic data cleaning
- Common data issues
- Contextual data issues
- Cleaning techniques
- R and common data issues
- Outliers
- Step 1 – Profiling the data
- Step 2 – Addressing the outliers
- Domain expertise
- Validity checking
- Enhancing data
- Harmonization
- Standardization
- Transformations
- Deductive correction
- Deterministic imputation
- Summary
- Data Mining and the Database Developer
- Data mining
- Common techniques
- Visualization
- Cluster analysis
- Correlation analysis
- Discriminant analysis
- Factor analysis
- Regression analysis
- Logistic analysis
- Purpose
- Mining versus querying
- Choosing R for data mining
- Visualizations
- Current smokers
- Missing values
- A cluster analysis
- Dimensional reduction
- Calculating statistical significance
- Frequent patterning
- Frequent item-setting
- Sequence mining
- Summary
- Statistical Analysis for the Database Developer
- Data analysis
- Looking closer
- Statistical analysis
- Summarization
- Comparing groups
- Samples
- Group comparison conclusions
- Summarization modeling
- Establishing the nature of data
- Successful statistical analysis
- R and statistical analysis
- Summary
- Database Progression to Database Regression
- Introducing statistical regression
- Techniques and approaches for regression
- Choosing your technique
- Does it fit?
- Identifying opportunities for statistical regression
- Summarizing data
- Exploring relationships
- Testing significance of differences
- Project profitability
- R and statistical regression
- A working example
- Establishing the data profile
- The graphical analysis
- Predicting with our linear model
- Step 1: Chunking the data
- Step 2: Creating the model on the training data
- Step 3: Predicting the projected profit on test data
- Step 4: Reviewing the model
- Step 4: Accuracy and error
- Summary
- Regularization for Database Improvement
- Statistical regularization
- Various statistical regularization methods
- Ridge
- Lasso
- Least angles
- Opportunities for regularization
- Collinearity
- Sparse solutions
- High-dimensional data
- Classification
- Using data to understand statistical regularization
- Improving data or a data model
- Simplification
- Relevance
- Speed
- Transformation
- Variation of coefficients
- Causal inference
- Back to regularization
- Reliability
- Using R for statistical regularization
- Parameter Setup
- Summary
- Database Development and Assessment
- Assessment and statistical assessment
- Objectives
- Baselines
- Planning for assessment
- Evaluation
- Development versus assessment
- Planning
- Data assessment and data quality assurance
- Categorizing quality
- Relevance
- Cross-validation
- Preparing data
- R and statistical assessment
- Questions to ask
- Learning curves
- Example of a learning curve
- Summary
- Databases and Neural Networks
- Ask any data scientist
- Defining neural network
- Nodes
- Layers
- Training
- Solution
- Understanding the concepts
- Neural network models and database models
- No single or main node
- Not serial
- No memory address to store results
- R-based neural networks
- References
- Data prep and preprocessing
- Data splitting
- Model parameters
- Cross-validation
- R packages for ANN development
- ANN
- ANN2
- NNET
- Black boxes
- A use case
- Popular use cases
- Character recognition
- Image compression
- Stock market prediction
- Fraud detection
- Neuroscience
- Summary
- Boosting your Database
- Definition and purpose
- Bias
- Categorizing bias
- Causes of bias
- Bias data collection
- Bias sample selection
- Variance
- ANOVA
- Noise
- Noisy data
- Weak and strong learners
- Weak to strong
- Model bias
- Training and prediction time
- Complexity
- Which way?
- Back to boosting
- How it started
- AdaBoost
- What you can learn from boosting (to help) your database
- Using R to illustrate boosting methods
- Prepping the data
- Training
- Ready for boosting
- Example results
- Summary
- Database Classification using Support Vector Machines
- Database classification
- Data classification in statistics
- Guidelines for classifying data
- Common guidelines
- Definitions
- Definition and purpose of an SVM
- The trick
- Feature space and cheap computations
- Drawing the line
- More than classification
- Downside
- Reference resources
- Predicting credit scores
- Using R and an SVM to classify data in a database
- Moving on
- Summary
- Database Structures and Machine Learning
- Data structures and data models
- Data structures
- Data models
- What's the difference?
- Relationships
- Machine learning
- Overview of machine learning concepts
- Key elements of machine learning
- Representation
- Evaluation
- Optimization
- Types of machine learning
- Supervised learning
- Unsupervised learning
- Semi-supervised learning
- Reinforcement learning
- Most popular
- Applications of machine learning
- Machine learning in practice
- Understanding
- Preparation
- Learning
- Interpretation
- Deployment
- Iteration
- Using R to apply machine learning techniques to a database
- Understanding the data
- Preparing
- Data developer
- Understanding the challenge
- Cross-tabbing and plotting
- Summary (updated 2021-07-02 14:59:37)