- R for Data Science Cookbook
- Yu Wei Chiu (David Chiu)
- 2021-07-14 10:51:29
Working with the date format
After converting each data attribute to the proper data type, we may find that some attributes in employees and salaries are in the date format. We can therefore calculate the number of years between each employee's date of birth and the current year to estimate the employee's age. Here, we will show you how to use some built-in date functions and the lubridate package to manipulate date format data.
Getting ready
Refer to the previous recipe and convert each attribute of the imported data into the correct data type. You also have to rename the columns of the employees and salaries datasets by following the steps from the Renaming the data variable recipe.
How to do it…
Perform the following steps to work with the date format in employees and salaries:
- We can add days to, or subtract days from, a date attribute using the following:
> employees$hire_date + 30
- We can obtain the time differences in days between hire_date and birth_date using the following:
> employees$hire_date - employees$birth_date
Time differences in days
[1] 11985 7842 9765 11902 12653 13192 11586 13357
[9] 11993 9581
- Besides getting time differences in days, we can obtain differences in weeks using the difftime function:
> difftime(employees$hire_date, employees$birth_date, units = "weeks")
Time differences in weeks
[1] 1712.143 1120.286 1395.000 1700.286 1807.571
[6] 1884.571 1655.143 1908.143 1713.286 1368.714
- In addition to the built-in date functions, we can install and load the lubridate package to manipulate dates:
> install.packages("lubridate")
> library(lubridate)
- Next, we can parse the date data with the ymd function, which returns POSIX date-times in older lubridate releases such as 1.3.3 and Date objects in recent ones:
> ymd(employees$hire_date)
- Then, we can examine the period between hire_date and birth_date using the as.period function:
> span <- interval(ymd(employees$birth_date), ymd(employees$hire_date))
> time_period <- as.period(span)
- Furthermore, we can obtain the time difference in whole years using the year function:
> year(time_period)
- Moreover, we can retrieve the current date and time using the now function:
> now()
- Finally, we can calculate the age of each employee by measuring the interval from the birth date to the current time:
> span2 <- interval(ymd(employees$birth_date), now())
> year(as.period(span2))
How it works…
After following the steps in the previous section, both the employees data and the salaries data should now be renamed, and each attribute should already have been converted to the proper data type. As some of the attributes are in the date format, we can use date functions to calculate the time difference between these attributes.
Date type data allows arithmetic operations; we can add days to or subtract days from its value. Thus, we first demonstrate that we can add 30 to hire_date, and we can then check that 30 days have been added to every hire date. Next, we calculate the time difference in days between the birth_date and hire_date attributes in order to find out the age at which each employee started working at the company. However, the minus operation only shows the time differences in days; we need further calculations to express the differences in another unit. Thus, we can use the difftime function to obtain time differences in a different unit (for example, hours, days, or weeks). While difftime provides more choices of unit, it does not offer months or years, so we still need some further calculation to obtain those.
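The base R operations described above can be tried without the original datasets. The following is a minimal sketch, assuming a small stand-in employees data frame; the two rows and the name age_years_approx are made up for illustration, and dividing by 365.25 is only a rough approximation of years, not a calendar-aware calculation.

```r
# Hypothetical stand-in for the employees data frame (two made-up rows)
employees <- data.frame(
  emp_no     = c(10001, 10002),
  birth_date = as.Date(c("1953-09-02", "1964-06-02")),
  hire_date  = as.Date(c("1986-06-26", "1985-11-21"))
)

# Adding a number to a Date shifts it by that many days
hire_plus_30 <- employees$hire_date + 30

# Subtracting two Date vectors yields a difftime measured in days
age_at_hire <- employees$hire_date - employees$birth_date

# difftime() lets us choose the unit directly
age_weeks <- difftime(employees$hire_date, employees$birth_date, units = "weeks")

# Rough conversion to years; 365.25 days per year is an assumed average
age_years_approx <- as.numeric(age_at_hire, units = "days") / 365.25

print(hire_plus_30)
print(age_at_hire)
print(floor(age_years_approx))
```

Because the division ignores where leap days actually fall, the result can be off by a day or so near birthdays, which is one reason a calendar-aware approach is preferable.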
To simplify date computation, we can use lubridate, a convenient package for date operations. As the data is in year-month-day format, we can use the ymd function to parse it into a date object first (a POSIX date-time in the lubridate version used here). Then, we can use the interval function to calculate the time span between birth_date and hire_date. Subsequently, we can use the as.period function to express the time span as a period of years, months, and days. This allows us to use the year function to obtain the number of whole years between each employee's birthday and hire date.
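As a small illustration of the interval, as.period, and year chain, here is a sketch on a single made-up pair of dates (the values are illustrative and not taken from the datasets). It assumes the lubridate package is installed.

```r
library(lubridate)

# Illustrative dates only; not taken from the original datasets
birth <- ymd("1953-09-02")
hire  <- ymd("1986-06-26")

# An interval is anchored to a concrete start and end instant
span <- interval(birth, hire)

# A period expresses the same span in calendar units (years, months, days)
time_period <- as.period(span)

# year() extracts the whole-years component of the period
years_at_hire <- year(time_period)
print(years_at_hire)  # 32
```

The period keeps the remaining months and days separately, so year returns completed years only rather than a rounded value.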
Finally, to calculate the age of each employee, we can use the now function to obtain the current time. We then use interval to obtain the time interval from the employee's birth date to the current time. With this information, we can use the year function to obtain the employee's age in whole years.
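The age calculation can be wrapped in a small helper so that it is easy to test against a fixed reference date. This is only a sketch: age_in_years and its ref_date argument are hypothetical names introduced here, not part of lubridate or of the recipe.

```r
library(lubridate)

# Hypothetical helper: age in completed years at a reference date.
# ref_date defaults to the current time, as in the recipe.
age_in_years <- function(birth_date, ref_date = now()) {
  # The interval runs from birth to the reference date, so the period is positive
  span <- interval(ymd(birth_date), ref_date)
  year(as.period(span))
}

# A fixed reference date keeps the result reproducible
age_fixed <- age_in_years("1953-09-02", ymd("2021-07-14"))
print(age_fixed)  # 67
```

Passing the dates in the opposite order would produce a negative period, so the birth date should always be the start of the interval.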
There's more…
When using the lubridate package (version 1.3.3), you might encounter the following error message:
Error in (function (..., deparse.level = 1) : (converted from warning) number of columns of result is not a multiple of vector length (arg 3)
This error occurs due to a locale configuration bug. You can fix the problem by setting the locale to English_United States.1252:
> Sys.setlocale(category = "LC_ALL", locale = "English_United States.1252")
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
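The locale name above is specific to Windows. If you only want to change the locale temporarily, one possible pattern is to save the current setting and restore it afterwards. The sketch below uses the "C" locale for LC_TIME as a portable stand-in; that choice is an assumption made for illustration and is not the setting the recipe prescribes.

```r
# Remember the current time locale so it can be restored later
old_locale <- Sys.getlocale("LC_TIME")

# "C" is available on every platform; it is used here only as a portable example
Sys.setlocale("LC_TIME", "C")

# Month names now follow the "C" (English) conventions
month_name <- format(as.Date("1986-06-26"), "%B")
print(month_name)  # "June"

# Restore the original setting
Sys.setlocale("LC_TIME", old_locale)
```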