- Statistical Application Development with R and Python(Second Edition)
- Prabhanjan Narayanachar Tattar
- 709 words
- 2021-07-02 18:44:04
What this book covers
Chapter 1, Data Characteristics, introduces the different types of data through a questionnaire and a dataset. The need for statistical models is elaborated in some interesting contexts. This is followed by a brief explanation of how to install R and Python and their related packages. Discrete and continuous random variables are discussed through introductory programs. The programs are available in both languages. Readers need not follow them, since they are mainly expository in nature.
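The following standard-library Python sketch is not one of the book's programs. It is a minimal illustration of the discrete/continuous distinction. A binomial probability mass function puts probability on individual values. A normal cumulative distribution function, built here from `math.erf`, assigns probability to intervals. The function names `binomial_pmf` and `normal_cdf` are hypothetical.

```python
import math


def binomial_pmf(k: int, n: int, p: float) -> float:
    """P(X = k) for a discrete binomial(n, p) random variable."""
    return math.comb(n, k) * p**k * (1 - p) ** (n - k)


def normal_cdf(x: float, mu: float = 0.0, sigma: float = 1.0) -> float:
    """P(X <= x) for a continuous normal(mu, sigma^2) random variable."""
    return 0.5 * (1.0 + math.erf((x - mu) / (sigma * math.sqrt(2.0))))


# A discrete variable has positive probability at single points...
print(binomial_pmf(3, 10, 0.5))  # 0.1171875
# ...while a continuous variable only has positive probability on intervals.
print(normal_cdf(1.96) - normal_cdf(-1.96))  # approximately 0.95
```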
Chapter 2, Import/Export Data, begins with a concise development of R basics. Data frames, vectors, matrices, and lists are discussed with clear and simple examples. Importing data from external files in CSV, XLS, and other formats is elaborated next. Writing data/objects from R for use in other languages is considered, and the chapter concludes with a discussion of R session management. Python basics, mathematical operations, and other essential operations are then explained. Reading data from external files in different formats is also illustrated, along with the required session management.
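The chapter's own Python code is not reproduced here. The following is a hedged, dependency-free sketch of a CSV round trip using the standard-library `csv` module. It writes to an in-memory buffer (`io.StringIO`) in place of a real file. The column names and values are made up for illustration.

```python
import csv
import io

# Hypothetical records; with a real file you would use open("data.csv", "w", newline="")
rows = [{"id": "1", "score": "72.5"}, {"id": "2", "score": "88.0"}]

# Export: write a header line followed by one line per record
buffer = io.StringIO()
writer = csv.DictWriter(buffer, fieldnames=["id", "score"])
writer.writeheader()
writer.writerows(rows)

# Import: rewind, parse the text back into dicts, and convert the numeric column
buffer.seek(0)
records = list(csv.DictReader(buffer))
scores = [float(r["score"]) for r in records]
print(scores)  # [72.5, 88.0]
```

CSV fields always come back as strings, so type conversion is an explicit step after reading.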
Chapter 3, Data Visualization, discusses efficient graphics separately for categorical and numeric datasets. For categorical data, this translates into techniques for bar charts, dot charts, spine and mosaic plots, and fourfold plots. For continuous/numeric data, it covers histograms, box plots, and scatter plots. A very brief introduction to ggplot2 is also provided here. Throughout the chapter, similar plots are generated in both R and Python.
Chapter 4, Exploratory Analysis, encompasses highly intuitive techniques for the preliminary analysis of data. EDA offers visualization techniques, such as stem-and-leaf displays and letter values, and modeling techniques, such as resistant lines, data smoothing, and median polish. Together they provide rich insight as a preliminary analysis step. This chapter relies mainly on R.
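The chapter itself works mainly in R, where the stats package's `line()` function fits a resistant line. For consistency with the other sketches here, the idea is shown in Python. This is a simplified version of Tukey's three-group resistant line, with several assumptions:

- the outer thirds each hold `n // 3` points after sorting by x;
- the middle group takes the rest;
- ties in x get no special treatment;
- there is no iterative refinement.

Results may therefore differ from R's `line()`.

```python
from statistics import median


def resistant_line(x, y):
    """Simplified three-group resistant line; returns (intercept, slope).

    Assumed grouping: left and right thirds of n // 3 points each (sorted by x),
    middle group takes the remainder. Requires at least 3 points.
    """
    pts = sorted(zip(x, y))
    n = len(pts)
    k = n // 3
    groups = (pts[:k], pts[k:n - k], pts[n - k:])
    # Summary point of each group: (median x, median y)
    xs = [median(p[0] for p in g) for g in groups]
    ys = [median(p[1] for p in g) for g in groups]
    # Slope from the outer summary points only, so extreme y values have little pull
    slope = (ys[2] - ys[0]) / (xs[2] - xs[0])
    # Intercept: average of the intercepts implied by the three summary points
    intercept = sum(yy - slope * xx for xx, yy in zip(xs, ys)) / 3
    return intercept, slope


x = list(range(1, 10))
y = [2 * v + 1 for v in x]
y[-1] = 100  # one wild observation
print(resistant_line(x, y))  # (1.0, 2.0)
```

Because only group medians enter the fit, the single outlier leaves the underlying line y = 1 + 2x unchanged.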
Chapter 5, Statistical Inference, begins with an emphasis on the likelihood function and computing the maximum likelihood estimate. Confidence intervals for parameters of interest are developed using functions defined for specific problems. The chapter also considers important statistical tests: the z-test and t-test for comparing means, and chi-square tests and the F-test for comparing variances. The reader will learn how to create new R and Python functions.
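The following sketch computes a maximum likelihood estimate numerically, under the following assumptions:

- the data are hypothetical Poisson counts;
- the log-likelihood is maximized by golden-section search, a generic optimizer chosen here and not necessarily the book's;
- the confidence interval is a large-sample (Wald) 95% interval based on the Fisher information n/λ.

The Poisson MLE has the closed form λ̂ = x̄, so the numerical answer can be checked against the sample mean.

```python
import math


def poisson_loglik(lam, data):
    """log L(lambda) = sum(x * log(lambda) - lambda - log(x!))."""
    return sum(x * math.log(lam) - lam - math.lgamma(x + 1) for x in data)


def maximize(f, lo, hi, tol=1e-10):
    """Golden-section search for the maximum of a unimodal f on [lo, hi]."""
    g = (math.sqrt(5) - 1) / 2
    a, b = lo, hi
    c, d = b - g * (b - a), a + g * (b - a)
    while b - a > tol:
        if f(c) > f(d):  # maximum lies in [a, d]
            b, d = d, c
            c = b - g * (b - a)
        else:  # maximum lies in [c, b]
            a, c = c, d
            d = a + g * (b - a)
    return (a + b) / 2


data = [2, 3, 1, 4, 0, 2, 3, 5, 2, 3]  # hypothetical counts
lam_hat = maximize(lambda lam: poisson_loglik(lam, data), 0.01, 20.0)

# Wald interval: lam_hat +/- 1.96 * sqrt(lam_hat / n)
se = math.sqrt(lam_hat / len(data))
ci = (lam_hat - 1.96 * se, lam_hat + 1.96 * se)
print(lam_hat, ci)  # lam_hat is close to 2.5, the sample mean
```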
Chapter 6, Linear Regression Analysis, builds a linear relationship between an output and a set of explanatory variables. The linear regression model rests on many underlying assumptions, and these are verified using validation techniques. A model may be affected by a single observation, a single output value, or an explanatory variable. Statistical metrics that help detect and remove one or more of these types of anomalies are discussed in depth. When there are many covariates, an efficient model is developed using model selection techniques. While R's core stats package suffices, the statsmodels package is very useful in Python.
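One way a single observation can dominate a fit is through high leverage. The following from-scratch sketch fits a simple least-squares line and computes each point's leverage, the diagonal of the hat matrix. For one predictor plus an intercept, the leverage is h_ii = 1/n + (x_i − x̄)²/Sxx. Points are flagged with the common 2p/n rule of thumb, where p = 2 parameters. The data are made up. In practice, statsmodels exposes such diagnostics through a fitted model's `get_influence()` method.

```python
def simple_ols_with_leverage(x, y):
    """Least-squares fit y = b0 + b1*x plus the leverage h_ii of each point."""
    n = len(x)
    xbar, ybar = sum(x) / n, sum(y) / n
    sxx = sum((xi - xbar) ** 2 for xi in x)
    sxy = sum((xi - xbar) * (yi - ybar) for xi, yi in zip(x, y))
    b1 = sxy / sxx
    b0 = ybar - b1 * xbar
    # Hat-matrix diagonal for a model with an intercept and one slope
    leverage = [1 / n + (xi - xbar) ** 2 / sxx for xi in x]
    return b0, b1, leverage


x = [1, 2, 3, 4, 5, 20]  # the last x value is far from the others
y = [3, 5, 7, 9, 11, 41]  # hypothetical data lying exactly on y = 1 + 2x
b0, b1, h = simple_ols_with_leverage(x, y)
p = 2  # number of fitted parameters
flagged = [i for i, hi in enumerate(h) if hi > 2 * p / len(x)]
print(flagged)  # [5]
```

Leverage depends only on x. Here the flagged point lies on the line, so it is high-leverage but not harmful. Moving its y value would change the fit sharply.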
Chapter 7, The Logistic Regression Model, covers a model that is useful for classification when the output is a binary variable. Diagnostics and model validation through residuals lead to an improved model. ROC curves are discussed next; they help in identifying a better classification model. The R packages pscl and ROCR are useful, while pysal and sklearn are useful in Python.
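The following dependency-free sketch traces an ROC curve and computes the area under it. In the book's setting, the scores would be fitted probabilities from a logistic regression model. Here they are hypothetical numbers. The threshold is swept over the distinct scores, and each step records a (false positive rate, true positive rate) pair. The AUC is computed by the trapezoidal rule, and the sketch assumes both classes are present. ROCR in R and `sklearn.metrics.roc_auc_score` in Python are the practical tools.

```python
def roc_points(scores, labels):
    """(FPR, TPR) pairs obtained by lowering the threshold through the scores."""
    pos = sum(labels)
    neg = len(labels) - pos
    points = [(0.0, 0.0)]
    for t in sorted(set(scores), reverse=True):
        # Classify as positive every case whose score is at least the threshold
        tp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 1)
        fp = sum(1 for s, l in zip(scores, labels) if s >= t and l == 0)
        points.append((fp / neg, tp / pos))
    return points


def auc(points):
    """Area under the ROC curve by the trapezoidal rule."""
    return sum((x2 - x1) * (y1 + y2) / 2 for (x1, y1), (x2, y2) in zip(points, points[1:]))


scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4, 0.3, 0.2]  # hypothetical fitted probabilities
labels = [1, 1, 0, 1, 0, 1, 0, 0]
print(auc(roc_points(scores, labels)))  # 0.8125
```

With no tied scores, this area equals the fraction of (positive, negative) pairs in which the positive case scores higher: 13 of 16 here.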
Chapter 8, Regression Models with Regularization, discusses the problem of overfitting, which can arise with the models developed in the previous two chapters. Ridge regression significantly reduces the probability of an overfit model. The development of natural spline models also lays the basis for the models considered in the next chapter. Regularization in R is achieved using the ridge and MASS packages, while sklearn and statsmodels help in Python.
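To make the shrinkage effect of ridge regression concrete, the following sketch solves β = (X′X + λI)⁻¹X′y directly for two predictors. It works on centred data, so the intercept is not penalized, and uses the explicit 2×2 inverse. The function name and data are hypothetical. This is an illustration of the formula, not the interface of the ridge, MASS, or sklearn implementations.

```python
import math


def ridge_two_predictors(x1, x2, y, lam):
    """Ridge coefficients (b1, b2) for two predictors via the 2x2 closed form."""
    n = len(y)
    m1, m2, my = sum(x1) / n, sum(x2) / n, sum(y) / n
    # Centre everything so the unpenalised intercept drops out
    u = [v - m1 for v in x1]
    w = [v - m2 for v in x2]
    r = [v - my for v in y]
    a = sum(p * p for p in u) + lam  # (X'X + lam*I)[0][0]
    b = sum(p * q for p, q in zip(u, w))  # off-diagonal entry
    d = sum(q * q for q in w) + lam  # (X'X + lam*I)[1][1]
    g1 = sum(p * t for p, t in zip(u, r))  # (X'y)[0]
    g2 = sum(q * t for q, t in zip(w, r))  # (X'y)[1]
    det = a * d - b * b
    return (d * g1 - b * g2) / det, (a * g2 - b * g1) / det


x1 = [1, 2, 3, 4, 5]
x2 = [2, 1, 4, 3, 6]
y = [3 * p - 2 * q + 5 for p, q in zip(x1, x2)]  # hypothetical noiseless data
for lam in (0.0, 1.0, 10.0, 100.0):
    b1, b2 = ridge_two_predictors(x1, x2, y, lam)
    print(lam, round(b1, 3), round(b2, 3), round(math.hypot(b1, b2), 3))
```

With λ = 0 the ordinary least-squares coefficients (3, −2) are recovered. As λ grows, the coefficient vector shrinks steadily toward zero.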
Chapter 9, Classification and Regression Trees, provides a tree-based regression model. The trees are initially built using raw R functions, and the final trees are also reproduced using rudimentary code, leading to a clear understanding of the CART mechanism. The pruning procedure is illustrated in one of the languages, and the reader is encouraged to work out the equivalent in the other.
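At the heart of a regression tree is the search for a single best split. The following sketch finds, for one numeric predictor, the threshold that minimizes the total within-node sum of squared errors. This is the usual regression-tree criterion; classification trees typically use an impurity measure such as the Gini index instead. Growing a full tree means repeating this search inside each child node. The data are hypothetical, and the code is not the book's implementation.

```python
def sse(values):
    """Sum of squared deviations from the node mean (0 for an empty node)."""
    if not values:
        return 0.0
    m = sum(values) / len(values)
    return sum((v - m) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) of the best binary split 'x <= threshold'."""
    pairs = sorted(zip(x, y))
    best_threshold, best_total = None, float("inf")
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # no threshold can separate equal x values
        left = [v for _, v in pairs[:i]]
        right = [v for _, v in pairs[i:]]
        total = sse(left) + sse(right)
        if total < best_total:
            # Place the threshold midway between neighbouring x values
            best_threshold = (pairs[i - 1][0] + pairs[i][0]) / 2
            best_total = total
    return best_threshold, best_total


threshold, total = best_split([1, 2, 3, 4, 5, 6], [1.0, 1.2, 0.9, 5.0, 5.1, 4.8])
print(threshold)  # 3.5
```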
Chapter 10, CART and Beyond, considers two enhancements to CART: bagging and random forests. A consolidation of all the models from Chapter 6, Linear Regression Analysis, to Chapter 10, CART and Beyond, is also provided through a dataset. Ensemble methods are fast emerging as very effective and popular machine learning techniques, and working with them in both languages will improve the user's confidence.
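The following sketch shows bagging (bootstrap aggregation), using one-split regression trees ("stumps") as the base learner to keep it short. Each stump is fitted to a bootstrap resample of the rows, and the predictions are averaged. The stump learner, the number of estimators, and the step-function data are assumptions made for illustration. Random forests go further by also sampling a subset of predictors at each split, which this single-predictor sketch does not show.

```python
import random


def fit_stump(x, y):
    """Fit a one-split regression tree by minimising squared error."""
    pairs = sorted(zip(x, y))
    ys = [v for _, v in pairs]
    best_sse, best = float("inf"), None
    for i in range(1, len(pairs)):
        if pairs[i - 1][0] == pairs[i][0]:
            continue  # cannot split between identical x values
        left, right = ys[:i], ys[i:]
        ml, mr = sum(left) / len(left), sum(right) / len(right)
        err = sum((v - ml) ** 2 for v in left) + sum((v - mr) ** 2 for v in right)
        if err < best_sse:
            best_sse, best = err, ((pairs[i - 1][0] + pairs[i][0]) / 2, ml, mr)
    if best is None:  # every x in this sample is identical: predict the mean
        mean = sum(ys) / len(ys)
        return lambda x_new: mean
    threshold, ml, mr = best
    return lambda x_new: ml if x_new <= threshold else mr


def bagged_predict(x, y, x_new, n_estimators=50, seed=0):
    """Average the predictions of stumps fitted to bootstrap resamples."""
    rng = random.Random(seed)
    n = len(x)
    total = 0.0
    for _ in range(n_estimators):
        idx = [rng.randrange(n) for _ in range(n)]  # n rows drawn with replacement
        stump = fit_stump([x[i] for i in idx], [y[i] for i in idx])
        total += stump(x_new)
    return total / n_estimators


x = list(range(1, 11))
y = [0.0] * 5 + [10.0] * 5  # hypothetical step-shaped response
print(bagged_predict(x, y, 1.0), bagged_predict(x, y, 10.0))
```

Averaging over resamples smooths the prediction near the step, where individual stumps disagree about the threshold. This variance reduction is what bagging contributes over a single tree.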