
Book title: R for Data Science Cookbook
Author: Yu Wei Chiu (David Chiu)
Chapter word count: 198
Last updated: 2021-07-14 10:51:29

Dropping data

In the previous recipes, we introduced how to revise and filter datasets. These steps almost conclude the data preprocessing and preparation phase. However, the dataset may still contain bad data or unwanted records, which we should discard so that they do not produce misleading results. Here, we introduce some practical methods to remove this unnecessary data.

Getting ready

Refer to the Converting data types recipe and convert each attribute of imported data into the proper data type. Also, rename the columns of the employees and salaries datasets by following the steps from the Renaming the data variable recipe.

How to do it…

Perform the following steps to drop attributes and rows from the current dataset:

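First, you can drop the last_name column by excluding its position with a negative column index (this assumes last_name is the fifth column; check with names(employees) before running it):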
```
> employees <- employees[,-5]
```
Or, you can assign NULL to the attribute you wish to drop:
```
> employees$hire_date <- NULL
```
To drop rows, you can pass the indexes of the rows that you want to drop as negative values:
```
> employees <- employees[c(-2,-4,-6),]
```
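
Positional indexes are easy to get wrong when the column order changes. As a minimal sketch, the same drop can be expressed by column name instead. The data frame `emp` and the variables `drop_cols`, `emp_by_name`, `pos`, and `emp_by_pos` are hypothetical names made up for this illustration and are not part of the recipe:

```r
# Hypothetical toy data standing in for the employees dataset
emp <- data.frame(
  emp_no     = c(10001, 10002, 10003),
  first_name = c("Georgi", "Bezalel", "Parto"),
  last_name  = c("Facello", "Simmel", "Bamford"),
  gender     = c("M", "F", "M"),
  stringsAsFactors = FALSE
)

# Keep every column whose name is not in the drop list
drop_cols <- c("last_name")
emp_by_name <- emp[, !(names(emp) %in% drop_cols), drop = FALSE]

# Alternatively, look up the position by name and negate it.
# This is only safe when the name exists: an empty index would
# select zero columns instead of dropping nothing.
pos <- which(names(emp) == "last_name")
emp_by_pos <- emp[, -pos, drop = FALSE]

names(emp_by_name)
```

Both approaches leave emp_no, first_name, and gender, and neither depends on where last_name happens to sit in the data frame.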

How it works…

Dropping data works much like data filtering: you specify the negative index of the rows or columns that you want to exclude, and then replace the original dataset with the filtered subset. To drop a column, place the negative index on the right-hand side of the comma within the square brackets; the first step uses -5 on the assumption that last_name is the fifth column, so confirm the position with names(employees) before running it. Alternatively, assigning NULL to an attribute removes it in place, without reassigning the whole dataset. To remove rows, place the negative indexes on the left-hand side of the comma, and again replace the original dataset with the filtered subset.
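
The following sketch walks through the same mechanics on a small made-up data frame, so it can be run without the employees data. The name `df` and its columns are assumptions for illustration only. It also shows one pitfall worth knowing: when a column drop leaves a single column, R returns a plain vector unless drop = FALSE is given.

```r
# Hypothetical toy data; not part of the recipe's datasets
df <- data.frame(
  id    = 1:6,
  score = c(90, 85, 70, 60, 95, 80),
  grade = c("A", "B", "C", "D", "A", "B"),
  stringsAsFactors = FALSE
)

# Negative indexes left of the comma drop rows 2, 4, and 6;
# -c(2, 4, 6) is equivalent to c(-2, -4, -6)
rows_dropped <- df[-c(2, 4, 6), ]
rows_dropped$id   # ids of the remaining rows: 1 3 5

# Assigning NULL removes a column in place
df$grade <- NULL
names(df)         # "id" "score"

# Dropping down to one column simplifies to a vector by default ...
one_col <- df[, -2]
is.data.frame(one_col)      # FALSE

# ... unless drop = FALSE keeps the data frame structure
one_col_df <- df[, -2, drop = FALSE]
is.data.frame(one_col_df)   # TRUE
```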

There's more…

In addition to filtering by index or assigning NULL to an attribute, you can use the within function to remove unwanted attributes. All you need to do is list the unwanted attribute names inside the rm function. Note that within returns a modified copy, so assign the result back to employees if you want to keep the change:

```
> within(employees, rm(birth_date, hire_date))
   emp_no first_name last_name gender
1   10001     Georgi   Facello      M
2   10002    Bezalel    Simmel      F
3   10003      Parto   Bamford      M
4   10004  Chirstian   Koblick      M
5   10005    Kyoichi  Maliniak      M
6   10006     Anneke   Preusig      F
7   10007    Tzvetan Zielinski      F
8   10008     Saniya  Kalloufi      M
9   10009     Sumant      Peac      F
10  10010  Duangkaew  Piveteau      F
```
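
As a self-contained sketch of the within approach, the example below uses a small made-up data frame (`emp`, with illustrative values that are assumptions, not the recipe's data) and stores the result. It also shows base R's subset function with a negated select argument, an alternative that the recipe itself does not cover:

```r
# Hypothetical toy data with the two date columns to be removed
emp <- data.frame(
  emp_no     = c(10001, 10002),
  birth_date = c("1953-09-02", "1964-06-02"),
  first_name = c("Georgi", "Bezalel"),
  hire_date  = c("1986-06-26", "1985-11-21"),
  stringsAsFactors = FALSE
)

# within() evaluates rm() against the columns and returns a modified copy;
# emp itself is unchanged until the result is assigned
trimmed <- within(emp, rm(birth_date, hire_date))
names(trimmed)   # "emp_no" "first_name"
names(emp)       # still has all four columns

# Alternative: subset() with a negated select drops the same columns
trimmed_subset <- subset(emp, select = -c(birth_date, hire_date))
names(trimmed_subset)
```

Both calls take unquoted column names, which is convenient interactively; inside functions, name-based indexing with character vectors is usually easier to reason about.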
