Operations on data structures

The R environment has a rich set of options available for performing operations on data within the various data structures. These operations can be performed in a variety of ways and can be restricted according to various criteria. The focus of this section is the purpose and formats of the various apply commands.

The apply commands instruct R to run a given function on specific parts of a list, vector, or array. Different versions of the apply commands are available for different data structures. Before discussing the individual commands, it is important to define the notion of the margins of a table or array. A margin can be defined along any dimension, and the dimension to use must be specified. The margin.table command can be used to compute the sum of each row, the sum of each column, or the sum of all the entries of an array or table:

> A <- matrix(1:12,nrow=3,byrow=TRUE)
> A
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> margin.table(A)
[1] 78
> margin.table(A,1)
[1] 10 26 42
> margin.table(A,2)
[1] 15 18 21 24

The last two commands specify the optional margin argument. The margin.table(A,1) command computes the sums along the first dimension, giving one sum per row. The margin.table(A,2) command computes the sums along the second dimension, giving one sum per column. Specifying which dimension to use is just as important when using the apply commands.
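Because margins can be defined along any dimension, the same idea extends beyond two-way tables. The following is a minimal sketch, assuming a made-up 2 x 3 x 4 array named B (not part of the original example), that sums along the third dimension and along the first two dimensions together:

```r
# Hypothetical example data: a 2 x 3 x 4 array filled with 1..24
B <- array(1:24, dim = c(2, 3, 4))

# Keep the third dimension: one total for each of the four 2 x 3 layers
layerSums <- margin.table(B, 3)
layerSums            # 21 57 93 129

# Keep dimensions 1 and 2: each cell is summed across the four layers
cellSums <- margin.table(B, c(1, 2))
dim(cellSums)        # 2 3
```

The margin argument names the dimensions that are kept; every other dimension is summed over.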

The apply commands

The various apply commands operate on different data structures. Each one (apply, lapply, sapply, tapply, and mapply) is briefly discussed in order in the following sections.

apply

The apply command is used to apply a given function across a given margin of an array or table. For example, to take the sum of each row or each column of a two-way table, use the apply command with three arguments: the table, the dimension to use, and the sum command:

> A <- matrix(1:12,nrow=3,byrow=TRUE)
> A
     [,1] [,2] [,3] [,4]
[1,]    1    2    3    4
[2,]    5    6    7    8
[3,]    9   10   11   12
> apply(A,1,sum)
[1] 10 26 42
> apply(A,2,sum)
[1] 15 18 21 24

You should be able to verify these results using the rowSums and colSums commands as well as the margin.table command discussed previously.
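One way to carry out that check is sketched below. The variable names rowTotals, colTotals, and rowSpread are illustrative only. The last line also shows that apply accepts any function of a vector, not just built-in commands such as sum:

```r
A <- matrix(1:12, nrow = 3, byrow = TRUE)

rowTotals <- apply(A, 1, sum)
colTotals <- apply(A, 2, sum)

# all.equal compares the values and ignores integer vs. double storage
all.equal(rowTotals, rowSums(A))    # TRUE
all.equal(colTotals, colSums(A))    # TRUE

# An anonymous function works too: largest minus smallest entry in each row
rowSpread <- apply(A, 1, function(r) max(r) - min(r))
rowSpread                           # 3 3 3
```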

lapply and sapply

The lapply command is used to apply a function to each element in a list. The result is a list of the same length, in which each component holds the result of applying the function to the component of the original list with the same name:

> theList <- list(one=c(1,2,3),two=c(TRUE,FALSE,TRUE,TRUE))
> sumResult <- lapply(theList,sum)
> sumResult
$one
[1] 6

$two
[1] 3

> typeof(sumResult)
[1] "list"
> sumResult$one
[1] 6

The sapply command performs the same operation as the lapply command. The difference is that the result is simplified to a vector (or a matrix) when possible; otherwise, a list is returned:

> theList <- list(one=c(1,2,3),two=c(TRUE,FALSE,TRUE,TRUE))
> meanResult <- sapply(theList,mean)
> meanResult
 one  two 
2.00 0.75 
> typeof(meanResult)
[1] "double"
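The shape of the result from sapply depends on what the applied function returns. The following sketch reuses theList from above; the names rangeResult and uniqueResult are illustrative. It shows a matrix result when every call returns a vector of the same length, and a plain list when the lengths differ:

```r
theList <- list(one = c(1, 2, 3), two = c(TRUE, FALSE, TRUE, TRUE))

# range returns two values for every component, so the result is a matrix
# with one column per list component
rangeResult <- sapply(theList, range)
is.matrix(rangeResult)    # TRUE
dim(rangeResult)          # 2 2

# unique returns three values for "one" but two for "two",
# so no simplification is possible and a list comes back
uniqueResult <- sapply(theList, unique)
is.list(uniqueResult)     # TRUE
```

When the type of the result matters to later code, checking it with is.list or is.matrix as above is a cheap safeguard.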

tapply

The tapply command is used to apply a function to subsets of a vector, where the subsets are defined by the levels of one or more factors. It is typically called with three arguments. The first is the data to operate on, the second is the factor (or list of factors) that assigns each observation to a group, and the third is the function to apply to each group. In the following example, a vector is defined that holds the diameters of trees. A second vector is defined, which specifies what kind of tree was measured for each observation. The goal is to find the standard deviation for each type of tree:

> diameters <- c(28.8, 27.3, 45.8, 34.8, 25.3)
> tree <- as.factor(c("pine","pine","oak","pine","oak"))
> tapply(diameters,tree,sd)
      oak      pine 
14.495689  3.968627 
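The grouping that tapply performs can be made explicit by splitting the data first. This sketch reuses the tree data above and swaps in mean as the function. The names meanByTree and viaSplit are illustrative, and the split-then-sapply route is shown only as one equivalent way to reach the same numbers:

```r
diameters <- c(28.8, 27.3, 45.8, 34.8, 25.3)
tree <- as.factor(c("pine", "pine", "oak", "pine", "oak"))

# Mean diameter for each level of the factor
meanByTree <- tapply(diameters, tree, mean)
meanByTree

# The same computation in two steps: split the vector into one piece
# per factor level, then apply the function to each piece
pieces <- split(diameters, tree)
viaSplit <- sapply(pieces, mean)

all.equal(as.vector(meanByTree), as.vector(viaSplit))   # TRUE
```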

mapply

The last command to examine is the mapply command. The mapply command takes a function to apply, followed by one or more vectors or lists. It calls the function with the first element of each vector as its arguments, then with the second element of each vector, and so on until every element has been used. Note that if one of the vectors has fewer elements than the others, the mapply command recycles it, starting again at its beginning to fill in the missing values:

> a <- c(1,2,3)
> b <- c(1,2,3)
> mapply(sum,a,b)
[1] 2 4 6
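The recycling behavior is easier to see when the vectors differ in length. In this sketch the vectors a and b and the result names are made up for illustration. Here b has two elements and a has four, so b is used twice over:

```r
a <- c(1, 2, 3, 4)
b <- c(10, 20)

# Calls are sum(1, 10), sum(2, 20), sum(3, 10), sum(4, 20)
recycled <- mapply(sum, a, b)
recycled                  # 11 22 13 24

# When the calls return vectors of different lengths, the result is a list:
# rep(1, 3), rep(2, 2), rep(3, 1)
repeated <- mapply(rep, 1:3, 3:1)
is.list(repeated)         # TRUE
```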