- R for Data Science Cookbook
- Yu Wei Chiu (David Chiu)
- 2021-07-14 10:51:29
Dropping data
In the previous recipes, we introduced how to revise and filter datasets. Following these steps almost concludes the data preprocessing and preparation phase. However, the dataset may still contain bad data. We should discard such bad or unwanted records so that they do not produce misleading results. Here, we introduce some practical methods to remove this unnecessary data.
Getting ready
Refer to the Converting data types recipe and convert each attribute of imported data into the proper data type. Also, rename the columns of the employees and salaries datasets by following the steps from the Renaming the data variable recipe.
How to do it…
Perform the following steps to drop attributes and rows from the current dataset:
- First, you can drop the last_name column by excluding last_name in our filtered subset:
> employees <- employees[,-5]
- Or, you can assign NULL to the attribute you wish to drop:
> employees$hire_date <- NULL
- To drop rows, you can specify the index of the row that you want to drop by assigning a negative index:
> employees <- employees[c(-2,-4,-6),]
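The three steps above can be tried on a small stand-in dataset. This is only a sketch: the toy employees data frame below is hypothetical, its dates are made-up placeholders, and its column order is assumed so that last_name sits at position 5, as the recipe states. On your own data, check names(employees) before dropping by position.

```r
# Toy stand-in for the employees data; the dates are made-up placeholders,
# and the column order is assumed so that last_name is the fifth column.
employees <- data.frame(
  emp_no     = 10001:10006,
  birth_date = as.Date("1960-01-01") + 0:5,
  hire_date  = as.Date("1990-01-01") + 0:5,
  first_name = c("Georgi", "Bezalel", "Parto", "Chirstian", "Kyoichi", "Anneke"),
  last_name  = c("Facello", "Simmel", "Bamford", "Koblick", "Maliniak", "Preusig"),
  gender     = c("M", "F", "M", "M", "M", "F"),
  stringsAsFactors = FALSE
)

names(employees)[5]                      # confirm which column sits at position 5
employees <- employees[, -5]             # drop the fifth column (last_name)
employees$hire_date <- NULL              # drop hire_date by assigning NULL
employees <- employees[c(-2, -4, -6), ]  # drop rows 2, 4, and 6

names(employees)  # remaining columns
nrow(employees)   # remaining rows
```

Note that each step overwrites employees, so the positions used in later steps refer to the already reduced data frame.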
How it works…
Dropping data works much like data filtering: you specify the negative index of the rows (or columns) that you want to drop, and then replace the original dataset with the filtered subset. Thus, as the last_name column is at the fifth index, you can remove the attribute by specifying -5 on the right-hand side of the comma within the square brackets. Instead of reassigning a subset, you can also assign NULL to the attribute that you want to drop. As for removing rows, you can place negative indexes on the left-hand side of the comma within the square brackets, and then replace the original dataset with the filtered subset.
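Because a negative index depends on the column's position, it may be safer to drop a column by name when the column order is uncertain. The following is a minimal sketch on a hypothetical toy data frame; the to_drop vector and the no_such_column name are introduced here purely for illustration.

```r
# Toy data frame; it only mimics the shape of the employees data.
employees <- data.frame(
  emp_no     = c(10001, 10002, 10003),
  first_name = c("Georgi", "Bezalel", "Parto"),
  last_name  = c("Facello", "Simmel", "Bamford"),
  stringsAsFactors = FALSE
)

# Keep every column whose name is not in the drop list.
# drop = FALSE keeps the result a data frame even if one column remains.
to_drop <- c("last_name")
by_name <- employees[, !(names(employees) %in% to_drop), drop = FALSE]

# Asking to drop a name that does not exist leaves the columns untouched...
unchanged <- employees[, !(names(employees) %in% "no_such_column"), drop = FALSE]

# ...whereas negating the result of which() selects zero columns in that case,
# because -integer(0) is still an empty index.
pos <- which(names(employees) == "no_such_column")
emptied <- employees[, -pos, drop = FALSE]
ncol(emptied)
```

The logical-mask form is chosen here because it fails gently: a misspelled column name simply drops nothing instead of emptying the data frame.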
There's more…
In addition to data filtering or assigning NULL to a specific attribute, you can use the within function to remove unwanted attributes. All you need to do is place the unwanted attribute names inside the rm function:
> within(employees, rm(birth_date, hire_date))
   emp_no first_name last_name gender
1   10001     Georgi   Facello      M
2   10002    Bezalel    Simmel      F
3   10003      Parto   Bamford      M
4   10004  Chirstian   Koblick      M
5   10005    Kyoichi  Maliniak      M
6   10006     Anneke   Preusig      F
7   10007    Tzvetan Zielinski      F
8   10008     Saniya  Kalloufi      M
9   10009     Sumant      Peac      F
10  10010  Duangkaew  Piveteau      F
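The call above only prints its result: within returns a modified copy and leaves employees itself unchanged. A minimal sketch, using a hypothetical two-row data frame with placeholder dates, shows the difference and how to keep the change by assigning the result back.

```r
# Toy data frame; the dates are made-up placeholders.
employees <- data.frame(
  emp_no     = c(10001, 10002),
  birth_date = as.Date(c("1960-01-01", "1960-01-02")),
  first_name = c("Georgi", "Bezalel"),
  hire_date  = as.Date(c("1990-01-01", "1990-01-02")),
  stringsAsFactors = FALSE
)

# within() evaluates rm() against a copy of the data and returns that copy.
trimmed <- within(employees, rm(birth_date, hire_date))

names(trimmed)    # columns left in the returned copy
names(employees)  # the original still has all four columns

# Assign the result back to make the removal permanent.
employees <- within(employees, rm(birth_date, hire_date))
```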