Useful functions to draw automated summaries

A standard first step in any data analysis with R is to get a glimpse of the data. Calling head() and tail() on a data frame is common among R users, who often use both to check whether the data was read correctly. The former shows the first few observations and the latter shows the last few. That's useful, but not what we're looking for.
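As a quick illustration of that check, here is a minimal sketch. The data frame df and its columns are hypothetical stand-ins, not the data used elsewhere in this chapter:

```r
# Hypothetical stand-in data; the column names are illustrative only
set.seed(42)
df <- data.frame(id = 1:100, value = rnorm(100, mean = 10, sd = 5))

first_rows <- head(df)        # first 6 rows by default
last_rows  <- tail(df, n = 3) # n controls how many rows are returned

print(first_rows)
print(last_rows)
```

Both functions accept an n argument, so you can widen or narrow the window when six rows are not enough to spot a reading problem.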

There is another function commonly called at the beginning of a data analysis process. It's called summary(). A short demonstration lies ahead:

summary(big_sample)
# Min. 1st Qu. Median Mean 3rd Qu. Max.
# -11.317 6.586 9.980 9.971 13.345 32.341

This function works differently depending on the class of the object you pass to it. For numeric vectors and the numeric columns of a data frame, it displays measures of central tendency (median and mean) along with other useful information about how your variables are distributed (minimum value, maximum value, first quartile, and third quartile).
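To see how the result changes with the class of the input, consider the sketch below. The objects num_vec and fct_vec are made-up examples, chosen only to show the behavior on a numeric vector, a factor, and a data frame:

```r
# Made-up inputs of different classes (illustrative only)
num_vec <- c(2, 4, 4, 5, 10)
fct_vec <- factor(c("a", "b", "a", "c", "a"))

num_summary <- summary(num_vec)  # min, quartiles, median, mean, max
fct_summary <- summary(fct_vec)  # counts per factor level
df_summary  <- summary(data.frame(num_vec, fct_vec))  # one summary per column

print(num_summary)
print(fct_summary)
print(df_summary)
```

For the factor, summary() reports how many times each level occurs rather than quartiles, which is why the output looks so different from the numeric case.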

Some packages have similar functions. Let's make sure the psych, Hmisc, and pastecs packages are already installed: 

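# Packages used for the descriptive summaries below
pkgs <- c('psych', 'Hmisc', 'pastecs')
# Keep only the ones that are not installed yet (package names are the row names)
pkgs <- pkgs[!(pkgs %in% rownames(installed.packages()))]
# Install whatever is missing, then remove the helper variable
if (length(pkgs) != 0) install.packages(pkgs)
rm(pkgs)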

Now, we can try some descriptive summaries from these packages:

psych::describe(big_sample)
Hmisc::describe(big_sample)
pastecs::stat.desc(big_sample)

Each of these functions reports a different set of statistics about the data it receives. I encourage the reader to try them all. Which of them do you like best?

This section has introduced you to some of the most popular measures of central tendency and dispersion. They are used not only for descriptive analysis but also as the basis for inference. It's hard to find a model that doesn't rely on the mean and the variance (or the standard deviation) in some way.
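As a brief recap, the sketch below computes the mean, sample variance, and standard deviation by hand and compares them with the built-in var() and sd(). The vector x is a made-up example:

```r
# Made-up data for illustration
x <- c(2, 4, 4, 5, 10)

m <- mean(x)                           # arithmetic mean
v <- sum((x - m)^2) / (length(x) - 1)  # sample variance (n - 1 denominator)
s <- sqrt(v)                           # standard deviation

# The hand-computed values should agree with the built-ins
print(all.equal(v, var(x)))
print(all.equal(s, sd(x)))
```

Note that var() and sd() use the n - 1 denominator, so they estimate the sample variance rather than the population variance.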

The arithmetic mean of many individual predictions is usually more accurate than a typical individual prediction. This phenomenon is known as the wisdom of the crowd.
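A small simulation can make this concrete. It is only a sketch under strong assumptions: the true value, the number of guessers, and the independent, unbiased normal noise are all invented for illustration, and real crowds may be biased or correlated:

```r
# Simulated crowd: every value here is an assumption, not real data
set.seed(123)
truth   <- 100
guesses <- truth + rnorm(500, mean = 0, sd = 20)  # independent, unbiased noise

crowd_error           <- abs(mean(guesses) - truth)  # error of the averaged guess
individual_errors     <- abs(guesses - truth)        # error of each single guess
mean_individual_error <- mean(individual_errors)

# Share of individuals whose guess is worse than the crowd's average
share_beaten <- mean(individual_errors > crowd_error)

print(c(crowd = crowd_error, typical_individual = mean_individual_error))
print(share_beaten)
```

The error of the averaged guess can never exceed the average individual error, since that follows from the triangle inequality. How much better the crowd does depends on how independent and unbiased the individual guesses are.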

With mean and standard deviation at hand, it's time to move on to inference. The inferences discussed next can be found under an umbrella called statistical hypothesis testing.
