- Hands-On Data Science with R
- Vitor Bianchi Lanzetta Nataraj Dasgupta Ricardo Anjoleto Farias
- 2021-06-10 19:12:32
Useful functions to draw automated summaries
A standard first step in any data analysis with R is to get a glimpse of the data. Calling head() and tail() on a data frame is common among R users, who tend to use both to check whether the data was read correctly. The former shows the first few observations, while the latter shows the last few. That's useful, but not what we're looking for.
Another function commonly called at the beginning of an analysis is summary(). A short demonstration follows:
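As a quick illustration, here is a minimal sketch of both calls. It uses mtcars, a data frame that ships with R, purely as a stand-in; it is not the data used in this chapter. The optional n argument controls how many rows are returned.

```r
# mtcars ships with R; it is only a stand-in for your own data
first_rows <- head(mtcars)         # first 6 rows by default
last_rows  <- tail(mtcars, n = 3)  # last 3 rows

first_rows
last_rows
```

If both ends of the data look the way you expect, the file was most likely read in full and with the right columns.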
summary(big_sample)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -11.317 6.586 9.980 9.971 13.345 32.341
This function behaves differently depending on the class of the object you pass to it. For numeric vectors and the numeric columns of data frames, it displays measures of central tendency (median and mean) along with other useful information about how your variables are distributed (minimum, maximum, first quartile, and third quartile).
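To see how the output changes with the class of the input, consider the sketch below. The objects x, g, and df are hypothetical stand-ins created only for this illustration; they are not the big_sample object used above, and the parameters passed to rnorm() are assumptions.

```r
# Stand-in data (assumed values, NOT the book's big_sample)
set.seed(42)
x  <- rnorm(1000, mean = 10, sd = 5)                          # numeric vector
g  <- factor(sample(c("a", "b", "c"), 1000, replace = TRUE))  # factor
df <- data.frame(x = x, g = g)                                # data frame

summary(x)   # numeric: min, quartiles, median, mean, max
summary(g)   # factor: number of observations per level
summary(df)  # data frame: one summary per column
```

For the factor, summary() reports counts rather than quartiles, since a mean or median would make no sense for categories.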
Some packages have similar functions. Let's make sure the psych, Hmisc, and pastecs packages are already installed:
pkgs <- c('psych', 'Hmisc', 'pastecs')
# keep only the packages that are not installed yet
pkgs <- pkgs[!(pkgs %in% rownames(installed.packages()))]
if (length(pkgs) != 0) install.packages(pkgs)
rm(pkgs)
Now, we can try some descriptive summaries from these packages:
psych::describe(big_sample)
Hmisc::describe(big_sample)
pastecs::stat.desc(big_sample)
Each of these functions outputs a different set of information about the data passed to it. I encourage you to try them all. Which do you like best?
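If you want to compare the three side by side on reproducible data, something along these lines may help. The vector x is a hypothetical stand-in (its distribution parameters are assumptions), and each call is guarded with requireNamespace() so the script still runs when a package is missing.

```r
# Stand-in vector (assumed values); replace it with your own data
set.seed(42)
x <- rnorm(1000, mean = 10, sd = 5)

# Run each summary only if its package is available
if (requireNamespace("psych", quietly = TRUE)) {
  print(psych::describe(x))
}
if (requireNamespace("Hmisc", quietly = TRUE)) {
  print(Hmisc::describe(x))
}
if (requireNamespace("pastecs", quietly = TRUE)) {
  print(pastecs::stat.desc(x))
}
```

The package::function() form matters here: psych and Hmisc both export a function named describe(), so the prefix makes it explicit which one is being called.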
This section has introduced you to some of the most popular measures of central tendency and dispersion. They are used not only for descriptive analysis but also for inference. It's hard to find a model that doesn't rely on the mean and the variance (or standard deviation) in some way.
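As a brief recap, the sketch below computes these three measures with base R and checks how they relate to each other. The sample x is a hypothetical stand-in with assumed parameters, not data from the book.

```r
# Stand-in sample (assumed values, not the book's data)
set.seed(42)
x <- rnorm(1000, mean = 10, sd = 5)

m <- mean(x)  # central tendency
v <- var(x)   # sample variance (divides by n - 1)
s <- sd(x)    # sample standard deviation

# sd() is the square root of var()
all.equal(s, sqrt(v))

# the variance written out by hand matches var()
all.equal(v, sum((x - m)^2) / (length(x) - 1))
```

Note that var() and sd() use the n - 1 denominator, which is the usual choice when the data is a sample rather than the whole population.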
With mean and standard deviation at hand, it's time to move on to inference. The inferences discussed next can be found under an umbrella called statistical hypothesis testing.