
  • R Programming Fundamentals
  • Kaelen Medeiros
  • 590 words
  • 2021-07-23 16:58:22

For loop

For loops are often used to go through every column or row of a dataframe in R.

Say, for example, that we're interested in the mean of each numeric column of the built-in iris dataset. That is four of the five columns: everything but Species, which is a factor variable whose labels give the species of each iris. We could type mean(iris$Sepal.Length) four times, changing the variable name each time. However, a far more efficient way to complete this exercise is to use a for loop.

If we simply want to print the means to the console, we could use a for loop as follows:

for(i in seq_along(iris)){
  # iris[[i]] extracts the i-th column of iris
  print(mean(iris[[i]]))
}

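The output will be as follows:

[1] 5.843333
[1] 3.057333
[1] 3.758
[1] 1.199333
[1] NA
Warning message:
In mean.default(iris[[i]]) :
  argument is not numeric or logical: returning NA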

We'll come back to the output, especially that warning message, in a second—first, let's break down the components of the for loop. The syntax will always be as follows:

for(i in a range of numbers){
some_action
}

In this particular for loop, we chose i as our iterator variable. A for loop in R sets this variable to each element of the sequence in turn: here, i is 1 on the first pass, 2 on the second, and so on up to 5. You might have noticed that once the loop has finished, i is listed in the global environment as a value, 5L (the L marks it as an integer, the number 5). The iterator stays in the environment after a loop concludes, holding the last value it took.
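As a quick check of this behavior, here is a minimal sketch using only base R and the built-in iris dataset. The variable name `name` in the second loop is our own choice for illustration.

```r
# Loop over the column positions of iris; the body does nothing on purpose.
for (i in seq_along(iris)) {
  # i is 1, 2, 3, 4, 5 in turn
}

# After the loop, i still exists and holds the last value it took.
print(i)         # [1] 5
print(class(i))  # [1] "integer"

# The iterator does not have to be a number: a for loop walks over
# the elements of any vector, such as the column names.
for (name in names(iris)) {
  print(name)
}
```

The second loop prints the five column names one by one and leaves `name` in the environment as well.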

The R function seq_along() is very helpful in for loops: given an object, it returns the integer sequence 1, 2, and so on, up to the number of items the object contains. For a dataframe, that is one index per column.
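To see what seq_along() returns for different inputs, consider the following sketch. The vectors `x` and `empty` are made up for illustration; the last two lines show one reason seq_along() is often preferred over writing 1:length(x) by hand.

```r
x <- c("a", "b", "c")
along_x <- seq_along(x)          # 1 2 3, one index per element
along_iris <- seq_along(iris)    # 1 2 3 4 5, one index per column

empty <- character(0)
along_empty <- seq_along(empty)  # integer(0): a loop over this runs zero times
by_hand <- 1:length(empty)       # 1 0: a loop over this would run twice

print(along_x)
print(along_iris)
print(along_empty)
print(by_hand)
```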

We also chose to print the mean of each column in this particular for loop. Columns are accessed by index, so when i = 1, iris[[i]] is the Sepal.Length variable, which is column 1, and so on. For column 5 we got NA and a warning rather than a mean, because Species isn't numeric: it's a factor, and mean() only works on numeric or logical input.
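The behavior for the fifth column can be reproduced on its own. This is a small sketch; suppressWarnings() is used here only to keep the console quiet while we inspect the result, and `species_mean` is a name chosen for illustration.

```r
# Species is stored as a factor, not as numbers.
print(class(iris$Species))   # [1] "factor"
print(levels(iris$Species))  # [1] "setosa"     "versicolor" "virginica"

# mean() cannot average a factor: it warns and returns NA.
species_mean <- suppressWarnings(mean(iris$Species))
print(is.na(species_mean))   # [1] TRUE
```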

This is actually a great example of where we can combine for loops with an if statement. Take a look at the following code:

for(i in seq_along(iris)){
  # only compute the mean when column i is numeric
  if(class(iris[[i]]) == "numeric"){
    print(mean(iris[[i]]))
  }
}

The if statement here prints the mean of an iris column only if the class of that column is numeric, which makes sense, since only numeric columns have a meaningful mean. The output is now only the four means:

[1] 5.843333
[1] 3.057333
[1] 3.758
[1] 1.199333

If we're really feeling fancy, we could have even added an else statement with a different message for when the class of a column isn't numeric, such as in this loop:

for(i in seq_along(iris)){
  if(class(iris[[i]]) == "numeric"){
    print(mean(iris[[i]]))
  }else{
    # report the position of any non-numeric column
    print(paste("Variable", i, "isn't numeric"))
  }
}

The output is as follows:

[1] 5.843333
[1] 3.057333
[1] 3.758
[1] 1.199333
[1] "Variable 5 isn't numeric"
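As a variation on the loops above, the following sketch swaps two things: it tests columns with is.numeric() instead of comparing class() to a string, and it stores the means in a named vector rather than printing them. Both choices are ours, not part of the original example; `col_means` is a hypothetical name.

```r
# Start with an empty numeric vector to collect the results.
col_means <- numeric(0)

for (i in seq_along(iris)) {
  # is.numeric() is TRUE for both integer and double columns
  if (is.numeric(iris[[i]])) {
    # store the mean under the column's name
    col_means[names(iris)[i]] <- mean(iris[[i]])
  }
}

print(col_means)
```

Storing the results means they can be reused later, for instance col_means["Petal.Length"], instead of being read off the console.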

seq_along() returns a sequence of numbers and makes for loops more straightforward. However, if you want to iterate over something else, such as the rows of a dataframe, the syntax of the for statement changes slightly. The following code will print every row of the Sepal.Width column in iris:

for(i in 1:nrow(iris)){
  # row i of the Sepal.Width column
  print(iris[i, "Sepal.Width"])
}

You have to spell out 1:nrow(iris) in the for statement. nrow() returns a single number, the row count, so for(i in nrow(iris)) would run the body only once, with i equal to 150. seq_along(iris), by contrast, already returns the whole sequence of column positions, as shown below:

nrow(iris)
[1] 150
seq_along(iris)
[1] 1 2 3 4 5
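The difference can be checked by counting how many times each loop body runs. This is a sketch with counter names of our own choosing; it also shows seq_len(), a base R function that builds the sequence 1 to n and is a common alternative to writing 1:n.

```r
# Looping over nrow(iris) alone: the "sequence" is the single number 150.
runs_single <- 0
for (i in nrow(iris)) {
  runs_single <- runs_single + 1
}
print(runs_single)  # [1] 1

# Looping over 1:nrow(iris): one pass per row.
runs_colon <- 0
for (i in 1:nrow(iris)) {
  runs_colon <- runs_colon + 1
}
print(runs_colon)   # [1] 150

# seq_len(nrow(iris)) gives the same sequence here, and it also
# behaves sensibly for a dataframe with zero rows.
empty_df <- iris[0, ]
print(seq_len(nrow(empty_df)))  # integer(0)
print(1:nrow(empty_df))         # [1] 1 0
```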