Table of Contents (314 chapters)
- Cover page
- Title Page
- Copyright
- Statistics for Data Science
- Credits
- About the Author
- About the Reviewer
- www.PacktPub.com
- Why subscribe?
- Customer Feedback
- Preface
- What this book covers
- What you need for this book
- Who this book is for
- Conventions
- Reader feedback
- Customer support
- Downloading the example code
- Downloading the color images of this book
- Errata
- Piracy
- Questions
- Transitioning from Data Developer to Data Scientist
- Data developer thinking
- Objectives of a data developer
- Querying or mining
- Data quality or data cleansing
- Data modeling
- Issue or insights
- Thought process
- Developer versus scientist
- New data new source
- Quality questions
- Querying and mining
- Performance
- Financial reporting
- Visualizing
- Tools of the trade
- Advantages of thinking like a data scientist
- Developing a better approach to understanding data
- Using statistical thinking during program or database designing
- Adding to your personal toolbox
- Increased marketability
- Perpetual learning
- Seeing the future
- Transitioning to a data scientist
- Let's move ahead
- Summary
- Declaring the Objectives
- Key objectives of data science
- Collecting data
- Processing data
- Exploring and visualizing data
- Analyzing the data and/or applying machine learning to the data
- Deciding (or planning) based upon acquired insight
- Thinking like a data scientist
- Bringing statistics into data science
- Common terminology
- Statistical population
- Probability
- False positives
- Statistical inference
- Regression
- Fitting
- Categorical data
- Classification
- Clustering
- Statistical comparison
- Coding
- Distributions
- Data mining
- Decision trees
- Machine learning
- Munging and wrangling
- Visualization
- D3
- Regularization
- Assessment
- Cross-validation
- Neural networks
- Boosting
- Lift
- Mode
- Outlier
- Predictive modeling
- Big Data
- Confidence interval
- Writing
- Summary
- A Developer's Approach to Data Cleaning
- Understanding basic data cleaning
- Common data issues
- Contextual data issues
- Cleaning techniques
- R and common data issues
- Outliers
- Step 1 – Profiling the data
- Step 2 – Addressing the outliers
- Domain expertise
- Validity checking
- Enhancing data
- Harmonization
- Standardization
- Transformations
- Deductive correction
- Deterministic imputation
- Summary
- Data Mining and the Database Developer
- Data mining
- Common techniques
- Visualization
- Cluster analysis
- Correlation analysis
- Discriminant analysis
- Factor analysis
- Regression analysis
- Logistic analysis
- Purpose
- Mining versus querying
- Choosing R for data mining
- Visualizations
- Current smokers
- Missing values
- A cluster analysis
- Dimensional reduction
- Calculating statistical significance
- Frequent patterning
- Frequent item-setting
- Sequence mining
- Summary
- Statistical Analysis for the Database Developer
- Data analysis
- Looking closer
- Statistical analysis
- Summarization
- Comparing groups
- Samples
- Group comparison conclusions
- Summarization modeling
- Establishing the nature of data
- Successful statistical analysis
- R and statistical analysis
- Summary
- Database Progression to Database Regression
- Introducing statistical regression
- Techniques and approaches for regression
- Choosing your technique
- Does it fit?
- Identifying opportunities for statistical regression
- Summarizing data
- Exploring relationships
- Testing significance of differences
- Project profitability
- R and statistical regression
- A working example
- Establishing the data profile
- The graphical analysis
- Predicting with our linear model
- Step 1: Chunking the data
- Step 2: Creating the model on the training data
- Step 3: Predicting the projected profit on test data
- Step 4: Reviewing the model
- Step 5: Accuracy and error
- Summary
- Regularization for Database Improvement
- Statistical regularization
- Various statistical regularization methods
- Ridge
- Lasso
- Least angles
- Opportunities for regularization
- Collinearity
- Sparse solutions
- High-dimensional data
- Classification
- Using data to understand statistical regularization
- Improving data or a data model
- Simplification
- Relevance
- Speed
- Transformation
- Variation of coefficients
- Causal inference
- Back to regularization
- Reliability
- Using R for statistical regularization
- Parameter Setup
- Summary
- Database Development and Assessment
- Assessment and statistical assessment
- Objectives
- Baselines
- Planning for assessment
- Evaluation
- Development versus assessment
- Planning
- Data assessment and data quality assurance
- Categorizing quality
- Relevance
- Cross-validation
- Preparing data
- R and statistical assessment
- Questions to ask
- Learning curves
- Example of a learning curve
- Summary
- Databases and Neural Networks
- Ask any data scientist
- Defining neural network
- Nodes
- Layers
- Training
- Solution
- Understanding the concepts
- Neural network models and database models
- No single or main node
- Not serial
- No memory address to store results
- R-based neural networks
- References
- Data prep and preprocessing
- Data splitting
- Model parameters
- Cross-validation
- R packages for ANN development
- ANN
- ANN2
- NNET
- Black boxes
- A use case
- Popular use cases
- Character recognition
- Image compression
- Stock market prediction
- Fraud detection
- Neuroscience
- Summary
- Boosting your Database
- Definition and purpose
- Bias
- Categorizing bias
- Causes of bias
- Bias data collection
- Bias sample selection
- Variance
- ANOVA
- Noise
- Noisy data
- Weak and strong learners
- Weak to strong
- Model bias
- Training and prediction time
- Complexity
- Which way?
- Back to boosting
- How it started
- AdaBoost
- What you can learn from boosting (to help) your database
- Using R to illustrate boosting methods
- Prepping the data
- Training
- Ready for boosting
- Example results
- Summary
- Database Classification using Support Vector Machines
- Database classification
- Data classification in statistics
- Guidelines for classifying data
- Common guidelines
- Definitions
- Definition and purpose of an SVM
- The trick
- Feature space and cheap computations
- Drawing the line
- More than classification
- Downside
- Reference resources
- Predicting credit scores
- Using R and an SVM to classify data in a database
- Moving on
- Summary
- Database Structures and Machine Learning
- Data structures and data models
- Data structures
- Data models
- What's the difference?
- Relationships
- Machine learning
- Overview of machine learning concepts
- Key elements of machine learning
- Representation
- Evaluation
- Optimization
- Types of machine learning
- Supervised learning
- Unsupervised learning
- Semi-supervised learning
- Reinforcement learning
- Most popular
- Applications of machine learning
- Machine learning in practice
- Understanding
- Preparation
- Learning
- Interpretation
- Deployment
- Iteration
- Using R to apply machine learning techniques to a database
- Understanding the data
- Preparing
- Data developer
- Understanding the challenge
- Cross-tabbing and plotting
- Summary
Updated: 2021-07-02 14:59:37