
Working with the date format

After converting each attribute to the proper data type, we find that some attributes in employees and salaries are dates. We can therefore calculate the number of years between each employee's date of birth and the current date to estimate the employee's age. Here, we show how to use some built-in date functions and the lubridate package to manipulate date data.

Getting ready

Refer to the previous recipe and convert each attribute of the imported data into the correct data type. You also need to rename the columns of the employees and salaries datasets by following the steps in the Renaming the data variable recipe.

How to do it…

Perform the following steps to work with the date format in employees and salaries:

  1. We can add days to, or subtract days from, a date attribute using the following:
    > employees$hire_date + 30
    
  2. We can obtain time differences in days between hire_date and birth_date using the following:
    > employees$hire_date - employees$birth_date
    Time differences in days
     [1] 11985 7842 9765 11902 12653 13192 11586 13357
     [9] 11993 9581
    
  3. Besides getting time differences in days, we can obtain differences in weeks using the difftime function:
    > difftime(employees$hire_date, employees$birth_date, units="weeks")
    Time differences in weeks
     [1] 1712.143 1120.286 1395.000 1700.286 1807.571
     [6] 1884.571 1655.143 1908.143 1713.286 1368.714
    
  4. In addition to built-in date operation functions, we can install and load the lubridate package to manipulate dates:
    > install.packages("lubridate")
    > library(lubridate)
    
  5. Next, we can parse the date data using the ymd function (depending on the lubridate version, the result is a POSIXct or a Date object):
    > ymd(employees$hire_date)
    
  6. Then, we can examine the period between hire_date and birth_date using the as.period function:
    > span <- interval(ymd(employees$birth_date), ymd(employees$hire_date))
    > time_period <- as.period(span)
    
  7. Furthermore, we can obtain the time difference in years using the year function:
    > year(time_period)
    
  8. Moreover, we can retrieve the current date using the now function:
    > now()
    
  9. Finally, we can calculate the age of each employee, taking birth_date as the start of the interval and the current time as its end:
    > span2 <- interval(ymd(employees$birth_date), now())
    > year(as.period(span2))
    

How it works…

After following the previous recipes, the columns of both the employees and salaries datasets should be renamed, and each attribute should already be converted to the proper data type. As some of the attributes are dates, we can use date functions to calculate the time differences between them.

Date type data allows arithmetic operations; we can add days to or subtract days from its value. Thus, we first demonstrate that we can add 30 to hire_date and check that every hire date has moved 30 days forward. Next, we calculate the time difference in days between the birth_date and hire_date attributes in order to find out the age at which each employee started working at the company. However, the minus operation only reports time differences in days; expressing them in another measurement requires further calculation. Thus, we can use the difftime function to obtain time differences in a different unit (for example, hours, days, or weeks). While difftime provides more measurement choices, it does not offer months or years, so we still need further calculations to obtain differences in those units.
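The base R operations described above can be sketched on a small example. The two birth and hire dates below are hypothetical sample values, not rows from the employees dataset, and dividing by 365.25 is only a rough approximation of years added here for illustration.

```r
# Hypothetical sample values; the real employees data will differ.
birth_date <- as.Date(c("1990-01-01", "1985-06-15"))
hire_date  <- as.Date(c("2020-01-01", "2010-03-01"))

# Adding a number to a Date shifts it by that many days.
hire_plus_30 <- hire_date + 30

# Subtracting two Date vectors gives a difftime measured in days.
diff_days <- hire_date - birth_date

# difftime() reports the same gap in another unit, here weeks.
diff_weeks <- difftime(hire_date, birth_date, units = "weeks")

# Years are not a difftime unit, so one rough option is to divide the
# day count by the average year length (an approximation only).
approx_years <- as.numeric(diff_days, units = "days") / 365.25

print(hire_plus_30)
print(diff_days)
print(diff_weeks)
print(approx_years)
```

Because the division by 365.25 ignores where the leap days actually fall, the result can land slightly below a whole number of years even when the two dates share the same month and day.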

To simplify date computation, we can use lubridate, a convenient package for date operations. As the data is in year-month-day format, we first use the ymd function to parse it into a date-time object (POSIXct or Date, depending on the lubridate version). Then, we use the interval function to calculate the time span between birth_date and hire_date. Subsequently, we use the as.period function to express that span as a period of years, months, and days. This allows us to use the year function to obtain the number of whole years between each employee's birthday and hire date.
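As a minimal sketch of this sequence, assuming the lubridate package is installed, the following uses two hypothetical birth and hire dates in place of the employees columns:

```r
library(lubridate)

# Hypothetical sample values standing in for employees$birth_date
# and employees$hire_date.
birth_date <- c("1990-01-01", "1985-06-15")
hire_date  <- c("2020-01-01", "2010-03-01")

# interval() takes the start first and the end second.
span <- interval(ymd(birth_date), ymd(hire_date))

# as.period() expresses each span as years, months, days, and so on.
time_period <- as.period(span)

# year() extracts the whole-year component of each period.
years_at_hire <- year(time_period)

print(time_period)
print(years_at_hire)
```

The year component counts only completed years, so the second employee, hired a few months before a 25th birthday, is reported as 24.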

Finally, to calculate the age of each employee, we use the now function to obtain the current time. We then use interval to obtain the time interval from the employee's birth date to the current time. With this information, we can use the year function to obtain the age of the employee in whole years.
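The age calculation can be wrapped in a small helper. The function name age_in_years and its on argument are hypothetical names introduced here for illustration; the sketch assumes lubridate is installed and uses a fixed reference date so that the result is reproducible.

```r
library(lubridate)

# Hypothetical helper: age in completed years on a given date.
# By default the reference point is the current time from now().
age_in_years <- function(birth_date, on = now()) {
  # The birth date is the start of the interval, the reference date its end.
  span <- interval(ymd(birth_date), on)
  year(as.period(span))
}

# A fixed reference date keeps the example reproducible.
reference <- ymd("2024-06-14")
ages <- age_in_years(c("1990-01-01", "1985-06-15"), on = reference)
print(ages)
```

Calling age_in_years(employees$birth_date) without the on argument would use the current time, as in the recipe. Placing the birth date first matters: with the arguments swapped, the interval is negative and so is the resulting year count.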

There's more…

When using the lubridate package (version 1.3.3), you might encounter the following error message:

Error in (function (..., deparse.level = 1) : 
 (converted from warning) number of columns of result is not a multiple of vector length (arg 3)

This error occurs due to a locale configuration bug. You can fix the problem by setting the locale to English_United States.1252 (a Windows locale name):

 > Sys.setlocale(category = "LC_ALL", locale = "English_United States.1252")
[1] "LC_COLLATE=English_United States.1252;LC_CTYPE=English_United States.1252;LC_MONETARY=English_United States.1252;LC_NUMERIC=C;LC_TIME=English_United States.1252"
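Since the locale name above is specific to Windows, it may help to inspect the current setting before changing it. The following is a minimal sketch that swaps in the portable "C" locale for the time category only; this is an assumption made for illustration, not the fix described above, and it restores the original setting afterwards.

```r
# Remember the current time locale so it can be restored afterwards.
old_time_locale <- Sys.getlocale("LC_TIME")
print(old_time_locale)

# "C" is a portable fallback locale; it is swapped in here as an
# assumption and is not the Windows locale named in the recipe.
Sys.setlocale("LC_TIME", "C")

# Under the "C" locale, month names are formatted in English.
month_name <- format(as.Date("2020-01-15"), "%B")
print(month_name)

# Restore the original setting.
Sys.setlocale("LC_TIME", old_time_locale)
```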