
About the dataset

The dataset we focus on throughout this chapter is the Auto.MPG dataset, which is widely used with the R language. It contains fuel economy data for 38 popular car models for the years 1999 and 2008. The dataset also ships with the ggplot2 package, which we will cover in the coming chapters.

For now, we will focus on importing the dataset from the CSV file, which you can download from the following link: 

https://github.com/PacktPublishing/Hands-On-Exploratory-Data-Analysis-with-R/tree/master/ch03

For more details pertaining to the dataset, you can refer to the following link:

https://archive.ics.uci.edu/ml/datasets/auto+mpg

Once the download is complete, we can import the CSV file into a data frame, which makes the dataset available in the R workspace:

> mpg <- read.csv("highway_mpg.csv", stringsAsFactors = FALSE)
> View(mpg)

From this, we get the following output:

As shown in the preceding screenshot, the Auto.MPG dataset includes various attributes, as follows:

The dataset, which is represented in tabular format, is as follows:
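Where the data viewer is not available, the import can also be checked from the console. The following is a minimal sketch rather than the book's own code: it rebuilds the first four rows listed in the `str(mpg)` output of this section, writes them to a temporary CSV, and reads them back with the same `read.csv` call. The names `sample_rows`, `csv_path`, and `mpg_sample` are illustrative assumptions, and the temporary file stands in for `highway_mpg.csv`.

```r
# Illustrative sample: the first four rows as listed in the str(mpg) output.
sample_rows <- data.frame(
  manufacturer = c("audi", "audi", "audi", "audi"),
  model        = c("a4", "a4", "a4", "a4"),
  displ        = c(1.8, 1.8, 2.0, 2.0),
  year         = c(1999L, 1999L, 2008L, 2008L),
  cyl          = c(4L, 4L, 4L, 4L),
  trans        = c("auto(l5)", "manual(m5)", "manual(m6)", "auto(av)"),
  drv          = c("f", "f", "f", "f"),
  cty          = c(18L, 21L, 20L, 21L),
  hwy          = c(29L, 29L, 31L, 30L),
  fl           = c("p", "p", "p", "p"),
  class        = c("compact", "compact", "compact", "compact"),
  stringsAsFactors = FALSE
)

# Write to a temporary CSV so the sketch does not depend on the download.
csv_path <- tempfile(fileext = ".csv")
write.csv(sample_rows, csv_path, row.names = FALSE)

# Same call as in the text, pointed at the temporary file (an assumption).
mpg_sample <- read.csv(csv_path, stringsAsFactors = FALSE)

dim(mpg_sample)          # number of rows and columns
names(mpg_sample)        # attribute (column) names
head(mpg_sample, n = 2)  # first two rows in tabular form
```

With `stringsAsFactors = FALSE`, text columns such as `manufacturer` are kept as character vectors instead of being converted to factors.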

A description of the dataset, including the data type of each attribute, can be obtained with the following command:

> str(mpg)
'data.frame':  234 obs. of  11 variables:
 $ manufacturer: chr  "audi" "audi" "audi" "audi" ...
 $ model       : chr  "a4" "a4" "a4" "a4" ...
 $ displ       : num  1.8 1.8 2 2 2.8 2.8 3.1 1.8 1.8 2 ...
 $ year        : int  1999 1999 2008 2008 1999 1999 2008 1999 1999 2008 ...
 $ cyl         : int  4 4 4 4 6 6 6 4 4 4 ...
 $ trans       : chr  "auto(l5)" "manual(m5)" "manual(m6)" "auto(av)" ...
 $ drv         : chr  "f" "f" "f" "f" ...
 $ cty         : int  18 21 20 21 16 18 18 18 16 20 ...
 $ hwy         : int  29 29 31 30 26 26 27 26 25 28 ...
 $ fl          : chr  "p" "p" "p" "p" ...
 $ class       : chr  "compact" "compact" "compact" "compact" ...

The str function serves as an alternative to the summary function. It compactly displays the internal structure of an R object: here, the number of observations and variables, followed by the type and first few values of each column.
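The difference between the two functions is easiest to see side by side. The sketch below is illustrative only: `cars_sample` and `col_types` are hypothetical names, and the three columns are a small subset of the values shown in the `str(mpg)` output of this section.

```r
# Illustrative three-column subset of the values shown in this section.
cars_sample <- data.frame(
  model = c("a4", "a4", "a4", "a4"),
  cty   = c(18L, 21L, 20L, 21L),
  hwy   = c(29L, 29L, 31L, 30L),
  stringsAsFactors = FALSE
)

# str(): one line per column, giving its type and leading values.
str(cars_sample)

# summary(): per-column statistics (quartiles and mean for numeric columns;
# length, class, and mode for character columns).
summary(cars_sample)

# Column types can also be collected programmatically.
col_types <- sapply(cars_sample, class)
col_types
```

In short, `str` answers "what is in this object?", while `summary` answers "how are the values distributed?".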
