Dataframes, lists, arrays, and matrices

Dataframes have several important features that make them useful for data analysis:

  • They are rectangular data structures, typically with cases (for example, the days in one month) listed down the rows and variables (page views, unique visitors, or referrers) listed along the columns
  • They support a mix of data types. A typical dataframe might include variables containing dates, numbers (integers or floats), and text
  • Through subsetting and variable extraction, R provides a lot of built-in functionality to select rows and variables within a dataframe
  • Many functions include a data argument, which makes it simple to pass a dataframe into a function and process only the relevant variables and cases, resulting in cleaner code
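
As a minimal sketch of the data argument mentioned in the last point, the following uses aggregate() from base R on a small made-up dataframe. The name analyticsSketch, the date and referrer columns, and all values are assumptions for illustration only; they are not the analyticsData object used in this section.

```r
# Hypothetical data: the object name, the date and referrer columns,
# and the values are invented for illustration
analyticsSketch <- data.frame(
  date      = as.Date("2024-01-01") + 0:5,
  pageViews = c(836L, 676L, 940L, 689L, 647L, 899L),
  referrer  = c("search", "social", "search", "social", "search", "social"),
  stringsAsFactors = FALSE
)

# The formula names the variables, data supplies the dataframe,
# and subset keeps only the cases of interest
meanViews <- aggregate(pageViews ~ referrer,
                       data   = analyticsSketch,
                       subset = pageViews > 650,
                       FUN    = mean)
print(meanViews)
```

Only pageViews and referrer are touched, and only the rows that pass the subset condition, without any manual extraction beforehand.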

We can inspect the first few rows of the dataframe using the head(analyticsData) command. The following screenshot shows the output of this command:

As you can see, there are four variables within the dataframe: one contains dates, two contain integers, and one contains numeric (floating-point) values.
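
The type of each variable can also be checked in code rather than read off the printed output. The sketch below builds a hypothetical stand-in with the same mix of types; the object name analyticsSketch, the column names date, uniqueVisitors, and bounceRate, and all values are assumptions, not the contents of analyticsData.

```r
# Hypothetical stand-in for analyticsData; everything except the
# pageViews name is invented for illustration
analyticsSketch <- data.frame(
  date           = as.Date("2024-01-01") + 0:2,
  pageViews      = c(836L, 676L, 940L),
  uniqueVisitors = c(412L, 350L, 498L),
  bounceRate     = c(0.42, 0.51, 0.38)
)

# class() applied to every column gives one class name per variable
columnClasses <- sapply(analyticsSketch, class)
print(columnClasses)

# str() gives a compact overview of the structure, including types
str(analyticsSketch)
```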

Variables can be extracted from dataframes very simply using the $ operator, as follows:

> analyticsData$pageViews
[1] 836 676 940 689 647 899 934 718 776 570 651 816
[13] 731 604 627 946 634 990 994 599 657 642 894 983
[25] 646 540 756 989 965 821

Variables can also be extracted from dataframes using [], as shown in the following command:

> analyticsData[, "pageViews"]

Note the use of a comma with nothing before it to indicate that all rows are required. In general, dataframes can be accessed using dataObject[x,y], with x being the number(s) or name(s) of the rows required and y being the number(s) or name(s) of the columns required. For example, if the first 10 rows were required from the pageViews column, it could be achieved like this:

> analyticsData[1:10,"pageViews"]
[1] 836 676 940 689 647 899 934 718 776 570
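
The same dataObject[x,y] pattern can be sketched on a small hypothetical dataframe (analyticsSketch and the uniqueVisitors column are assumed names with invented values). It also shows a detail worth knowing: selecting a single column returns a plain vector, whereas selecting several columns, or passing drop = FALSE, returns a dataframe.

```r
# Hypothetical dataframe; names and values are for illustration only
analyticsSketch <- data.frame(
  pageViews      = c(836L, 676L, 940L, 689L),
  uniqueVisitors = c(412L, 350L, 498L, 371L)
)

# Rows 1 to 2 of one column: an integer vector
oneColumn <- analyticsSketch[1:2, "pageViews"]

# Rows 1 to 2 of two columns: a dataframe
twoColumns <- analyticsSketch[1:2, c("pageViews", "uniqueVisitors")]

# drop = FALSE keeps the dataframe structure for a single column
keptFrame <- analyticsSketch[1:2, "pageViews", drop = FALSE]

# Rows and columns can also be given by number: third row, second column
byPosition <- analyticsSketch[3, 2]
```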

Leaving the space before the comma blank returns all rows, and leaving the space after the comma blank returns all variables. For example, the following command returns the first three rows of all variables:

> analyticsData[1:3,]

The following screenshot shows the output of this command:

Dataframes are a special type of list. Lists can hold many different types of data, including other lists. As with many data types in R, their elements can be named, which helps to write code that is easy to understand. Let's make a list of the options for dinner, with drink quantities expressed in milliliters.

In the following example, note the use of the c() function, which combines its comma-separated arguments into a vector (or into a list, if any of the arguments is a list). R picks an appropriate class for the return value: character for vectors that contain strings, numeric for those that contain only numbers, logical for Boolean values, and so on:

> dinnerList <- list("Vegetables" =
  c("Potatoes", "Cabbage", "Carrots"),
  "Dessert" = c("Ice cream", "Apple pie"),
  "Drinks" = c(250, 330, 500)
  )

Note that code is indented throughout for readability; entering code directly into the console will not produce indentations.
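
The class that c() picks can be checked with class(). The following is a small sketch using values from the dinner example; the variable names are made up for illustration.

```r
textVector    <- c("Potatoes", "Cabbage", "Carrots")
numberVector  <- c(250, 330, 500)
logicalVector <- c(TRUE, FALSE)

class(textVector)     # "character"
class(numberVector)   # "numeric"
class(logicalVector)  # "logical"

# Mixing types in one vector coerces everything to a common type;
# here the number becomes the string "250"
mixedVector <- c(250, "Cabbage")
class(mixedVector)    # "character"

# If any argument is a list, c() returns a list instead of a vector
listResult <- c(list("Ice cream"), 250)
class(listResult)     # "list"
```

Because a vector holds only one type, values of different types that need to be kept as they are belong in a list, as in dinnerList.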

Indexing lists is similar to indexing dataframes (which are, after all, just a special kind of list). Lists can be indexed by number, as shown in the following command:

> dinnerList[1:2]
$Vegetables
[1] "Potatoes" "Cabbage"  "Carrots"
    
$Dessert
[1] "Ice cream" "Apple pie"

This returns a list. To extract the element itself, with its own class, rather than a list containing it, use [[]]:

> dinnerList[[3]]
[1] 250 330 500

In this case, a numeric vector is returned. Lists can also be indexed by name, as shown in the following code:

> dinnerList["Drinks"]
$Drinks
[1] 250 330 500

Note that this also returns a list.
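
The difference between [], [[]], and $ can be summarized in a short sketch. It rebuilds dinnerList so that it runs on its own; the other variable names and the drinksFrame dataframe are assumptions added for illustration.

```r
dinnerList <- list(
  "Vegetables" = c("Potatoes", "Cabbage", "Carrots"),
  "Dessert"    = c("Ice cream", "Apple pie"),
  "Drinks"     = c(250, 330, 500)
)

asList    <- dinnerList["Drinks"]    # single brackets: a list of length 1
asVector  <- dinnerList[["Drinks"]]  # double brackets: the numeric vector
viaDollar <- dinnerList$Drinks       # $ behaves like [[ ]] with a name

class(asList)    # "list"
class(asVector)  # "numeric"

# Dataframes are lists of columns, so the same operators apply
drinksFrame <- data.frame(size = c(250, 330, 500))
is.list(drinksFrame)                                # TRUE
identical(drinksFrame[["size"]], drinksFrame$size)  # TRUE
```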

Matrices and arrays, unlike dataframes, hold only one type of data. They also use square brackets for indexing: analyticsMatrix[, 3:6] returns all rows of the third to sixth columns, analyticsMatrix[1, 3] returns the single element in the first row of the third column, and analyticsArray[1, 2, ] returns the elements in the first row and second column across every slice of the third dimension.
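
A minimal sketch of these three indexing forms follows. The contents of analyticsMatrix and analyticsArray are not given in the text, so the objects below are filled with consecutive integers purely as an assumption, which makes the results easy to follow.

```r
# Assumed contents: consecutive integers, filled column by column
analyticsMatrix <- matrix(1:12, nrow = 2)          # 2 rows, 6 columns
analyticsArray  <- array(1:24, dim = c(2, 3, 4))   # 2 rows, 3 columns, 4 slices

laterColumns <- analyticsMatrix[, 3:6]    # all rows, columns 3 to 6
singleCell   <- analyticsMatrix[1, 3]     # row 1, column 3
acrossSlices <- analyticsArray[1, 2, ]    # row 1, column 2, every slice

dim(laterColumns)   # 2 4
singleCell          # 5
acrossSlices        # 3 9 15 21

# A matrix holds a single type: mixing numbers and text gives characters
mixedMatrix <- matrix(c(1, "a", 2, "b"), nrow = 2)
typeof(mixedMatrix) # "character"
```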
