Descriptive statistics

Traditionally, we could use the base R summary() function to get some basic statistics. Lately, I prefer the sjmisc package and its descr() function. It produces more readable output, and you can assign that output to a dataframe. What works well is to create that dataframe, save it as a .csv file, and explore it at your leisure. It automatically selects numeric features only. It also fits well with the tidyverse, so you can incorporate dplyr functions such as group_by() and filter(). Here's an example in which we examine the descriptive statistics for the infantry of the Confederate Army. The output will consist of the following:

  • var: feature name
  • type: data type of the feature (for example, integer)
  • n: number of observations
  • NA.prc: percent of missing values
  • mean
  • sd: standard deviation
  • se: standard error
  • md: median
  • trimmed: trimmed mean
  • range
  • skew
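library(magrittr)  # provides the %>% pipe

# Descriptive statistics for Confederate infantry only
gettysburg %>%
  dplyr::filter(army == "Confederate" & type == "Infantry") %>%
  sjmisc::descr() -> descr_stats

# Save the result so it can be explored in a spreadsheet
readr::write_csv(descr_stats, 'descr_stats.csv')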

The following is abbreviated output from the preceding code saved to a spreadsheet:

In this one table, we can discern some rather interesting tidbits. Of particular interest is the percent of missing values per feature. If you modify the preceding code to examine the Union Army, you'll find that there are no missing values. The usurpers from the South have missing values for one of two likely reasons: either shoddy staff work in compiling the numbers on July 3rd, or records lost over the years. Note that, for the number of men captured, once you remove the missing value, all other values are zero, so we could just replace the missing value with zero. The Rebels did not report troops as captured, but rather as missing, in contrast with the Union.
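As a rough illustration of the two ideas above, the following sketch computes the percent of missing values per column and then replaces a missing captured count with zero. It runs on a small made-up dataframe, not the real gettysburg data. The name toy and its values are hypothetical, and the column name captured is assumed here for illustration only.

```r
library(magrittr)  # provides the %>% pipe

# Toy stand-in for the gettysburg data; the values are invented
# purely for illustration and are not historical figures.
toy <- data.frame(
  army     = c("Confederate", "Confederate", "Confederate", "Union"),
  captured = c(0L, NA, 0L, 12L)
)

# Percent of missing values per column, comparable in spirit to
# the NA.prc column described earlier.
na_pct <- colMeans(is.na(toy)) * 100

# Replace the missing captured count with zero, as suggested in the text.
toy_filled <- toy %>%
  dplyr::mutate(captured = tidyr::replace_na(captured, 0L))
```

Whether zero is the right replacement depends on the data. Here it is justified only because every non-missing Confederate value in that feature is already zero.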

Once you feel comfortable with the descriptive statistics, move on to exploring the categorical features in the next section.