
Dropping data

In the previous recipes, we introduced how to revise and filter datasets. These steps nearly complete the data preprocessing and preparation phase. However, the dataset may still contain bad data or unwanted records. We should discard them so that they do not produce misleading results. This recipe introduces some practical methods to remove such data.

Getting ready

Refer to the Converting data types recipe and convert each attribute of the imported data into the proper data type. Also, rename the columns of the employees and salaries datasets by following the steps in the Renaming the data variable recipe.

How to do it…

Perform the following steps to drop attributes and rows from the current dataset:

  1. First, you can drop the last_name column by passing its negative column index, which excludes it from the filtered subset:
    > employees <- employees[,-5]
    
  2. Alternatively, you can assign NULL to the attribute you wish to drop:
    > employees$hire_date <- NULL
    
  3. To drop rows, you can pass the negative indexes of the rows that you want to remove:
    > employees <- employees[c(-2,-4,-6),]
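
The three steps can be tried end to end on a small stand-in data frame. The following is a minimal sketch, not the recipe's actual data: the toy values and the column order are assumptions, so the position of last_name is looked up by name rather than hardcoded.

```r
# Toy stand-in for the recipe's employees data frame (all values are made up).
employees <- data.frame(
  emp_no     = 1:6,
  first_name = c("A", "B", "C", "D", "E", "F"),
  last_name  = c("U", "V", "W", "X", "Y", "Z"),
  hire_date  = as.Date("2000-01-01") + 0:5,
  stringsAsFactors = FALSE
)

# Step 1: drop a column with a negative column index.
# Looking the position up by name avoids relying on a fixed column order.
pos <- which(names(employees) == "last_name")
employees <- employees[, -pos]

# Step 2: drop a column by assigning NULL to it.
employees$hire_date <- NULL

# Step 3: drop the second, fourth, and sixth rows with negative row indexes.
employees <- employees[c(-2, -4, -6), ]

print(names(employees))
print(nrow(employees))
```

Checking names() and nrow() after each step is a cheap way to confirm that the intended column or rows were removed.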
    

How it works…

Dropping rows or columns works much like data filtering: you specify the negative index of the rows (or columns) that you want to drop, and then replace the original dataset with the filtered subset. Because the last_name column is at the fifth index, you can remove it by specifying -5 on the right-hand side of the comma within the square brackets. Instead of reassigning a subset, you can also assign NULL to the attribute that you want to drop, which removes the column in place. To remove rows, place the negative indexes on the left-hand side of the comma within the square brackets, and again replace the original dataset with the filtered subset.
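
Positional indexes depend on the current column and row order, so it can help to drop by name or by condition instead. The sketch below uses a hypothetical data frame df (its columns and values are invented for illustration) to show name-based column dropping, the drop = FALSE argument that keeps a one-column result as a data frame, and a logical condition for removing rows.

```r
# Hypothetical data frame used only for illustration.
df <- data.frame(
  id    = 1:4,
  score = c(90, NA, 75, NA),
  note  = c("a", "b", "c", "d"),
  stringsAsFactors = FALSE
)

# Drop a column by name: keep every column except the unwanted one.
keep    <- setdiff(names(df), "note")
by_name <- df[, keep]

# Selecting a single column returns a plain vector by default;
# drop = FALSE keeps the result as a data frame.
as_vector <- df[, "id"]
one_col   <- df[, "id", drop = FALSE]

# Drop rows by condition: here, rows whose score is missing.
complete <- df[!is.na(df$score), ]

print(names(by_name))
print(is.data.frame(one_col))
print(complete)
```

A logical condition also sidesteps a known pitfall of negative indexing: if the vector of row indexes to drop happens to be empty, df[-idx, ] selects zero rows instead of leaving the data untouched.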

There's more…

In addition to data filtering or assigning NULL to a specific attribute, you can use the within function to remove unwanted attributes. All you need to do is place the unwanted attribute names inside the rm function. Note that within returns a modified copy, so assign the result if you want to keep it:

> within(employees, rm(birth_date, hire_date))
   emp_no first_name last_name gender
1   10001     Georgi   Facello      M
2   10002    Bezalel    Simmel      F
3   10003      Parto   Bamford      M
4   10004  Chirstian   Koblick      M
5   10005    Kyoichi  Maliniak      M
6   10006     Anneke   Preusig      F
7   10007    Tzvetan Zielinski      F
8   10008     Saniya  Kalloufi      M
9   10009     Sumant      Peac      F
10  10010  Duangkaew  Piveteau      F
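
Because within leaves its input unchanged, the result has to be assigned to persist. The following sketch, built on a made-up three-row stand-in for employees, shows this and, as an alternative not covered in the recipe, the base R subset function with a negated select argument.

```r
# Made-up stand-in for the employees data frame.
employees <- data.frame(
  emp_no     = 1:3,
  birth_date = as.Date("1960-01-01") + 0:2,
  first_name = c("A", "B", "C"),
  hire_date  = as.Date("1990-01-01") + 0:2,
  stringsAsFactors = FALSE
)

# within() returns a modified copy; employees itself is not changed.
trimmed <- within(employees, rm(birth_date, hire_date))

# subset() with a negated select drops the same columns by name.
same_cols <- subset(employees, select = -c(birth_date, hire_date))

print(ncol(employees))   # the original still has all of its columns
print(names(trimmed))
print(names(same_cols))
```

Both forms refer to columns by name, so they keep working if the column order changes.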