Adding new records

If you are familiar with databases, you may already know how to perform an insert operation to append a new record to a dataset, or an alter operation to add a new column (attribute) to a table. In R, you can perform the equivalent operations with little effort. This recipe introduces the rbind and cbind functions, which let you append a new record or a new attribute to an existing dataset.

Getting ready

Refer to the Converting data types recipe and convert each attribute of the imported data into the proper data type. Also, rename the columns of the employees and salaries datasets by following the steps in the Renaming the data variable recipe.

How to do it…

Perform the following steps to add a new record or new variable into the dataset:

  1. First, use rbind to combine employees with a new record; this call only returns the result and leaves employees unchanged:
    > rbind(employees, c(10011, '1960-01-01', 'Jhon', 'Doe', 'M', '1988-01-01'))
    
  2. We can then assign the combined result of employees and the new record back to employees:
    > employees <- rbind(employees, c(10011, '1960-01-01', 'Jhon', 'Doe', 'M', '1988-01-01'))
    
  3. Besides adding a new record to the original dataset, we can add a new position attribute with NA as the default value:
    > cbind(employees, position = NA)
    
  4. Furthermore, we can add a new age attribute, based on a calculation using the current date and birth_date of each employee:
    > library(lubridate)
    > span <- interval(ymd(employees$birth_date), now())
    > time_period <- as.period(span)
    > employees$age <- year(time_period)
    
  5. Alternatively, we can use the transform function to add multiple variables:
    > transform(employees, age = year(time_period), position = "RD", marital = NA)
    

How it works…

As with database operations, a new record must follow the schema of the dataset (the number of attributes and the data type of each attribute). Here, we first showed how to use the rbind function to add a new record to a data frame. As the employees dataset consists of six columns, we can add a record with six values using rbind. The first column, emp_no, is in integer format, so we do not have to wrap its input value in single quotes. For the first_name and last_name attributes, we can input any character string because we already converted these columns to the character type. For the gender attribute, which is a factor, we can only input either M or F as a value.
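One caveat worth keeping in mind: c() builds a single atomic vector, so mixing numbers and strings yields a character vector before rbind ever sees it. A minimal sketch of an alternative, assuming a toy employees data frame with the same six columns (the two starting rows are illustrative values, not the recipe's imported data), passes the new record as a one-row data frame so that each value keeps its intended type:

```r
# Toy stand-in for the employees dataset (illustrative values only)
employees <- data.frame(
  emp_no     = c(10001L, 10002L),
  birth_date = as.Date(c("1953-09-02", "1964-06-02")),
  first_name = c("Georgi", "Bezalel"),
  last_name  = c("Facello", "Simmel"),
  gender     = factor(c("M", "F"), levels = c("M", "F")),
  hire_date  = as.Date(c("1986-06-26", "1985-11-21")),
  stringsAsFactors = FALSE
)

# The new record as a one-row data frame: every column carries its own type
new_record <- data.frame(
  emp_no     = 10011L,
  birth_date = as.Date("1960-01-01"),
  first_name = "Jhon",
  last_name  = "Doe",
  gender     = factor("M", levels = c("M", "F")),
  hire_date  = as.Date("1988-01-01"),
  stringsAsFactors = FALSE
)

# rbind matches columns by name and returns the combined data frame
employees <- rbind(employees, new_record)
str(employees)
```

Calling str afterwards is a quick way to confirm that emp_no is still an integer, the date columns are still Date objects, and gender is still a factor.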

In addition to adding a new record to a target dataset, we can add a new variable with the cbind function by supplying the variable name and a default value in the call. Here, we use NA as the default value for a new position variable. We can also assign values calculated from other columns to a new variable. In this demonstration, we first computed each employee's age as the period between their birth date and the current date. Then, we used the dollar sign to assign the year component of that period to a new attribute, age. Besides using the dollar sign, we can use the transform function to create the age, position, and marital variables in the employees dataset in a single call.
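Note that cbind and transform return a new data frame rather than changing employees in place, so the result has to be assigned if it should persist. The sketch below illustrates this on a toy two-row data frame (illustrative values). It also computes age with base R only, as a dependency-free stand-in for the lubridate approach in the recipe; the fixed ref_date is a hypothetical reference date chosen so that the result is reproducible, and Sys.Date() could be used instead.

```r
# Toy stand-in for employees (illustrative values only)
employees <- data.frame(
  emp_no     = c(10001L, 10002L),
  birth_date = as.Date(c("1953-09-02", "1964-06-02"))
)

# cbind returns a new data frame; employees is untouched until we assign
with_position <- cbind(employees, position = NA)
ncol(employees)      # 2
ncol(with_position)  # 3

# Age in whole years on a fixed, hypothetical reference date
ref_date   <- as.Date("2016-01-01")
birth_year <- as.integer(format(employees$birth_date, "%Y"))
ref_year   <- as.integer(format(ref_date, "%Y"))
# Subtract one year when the birthday has not yet occurred in the reference year
not_yet <- format(ref_date, "%m%d") < format(employees$birth_date, "%m%d")
employees$age <- ref_year - birth_year - as.integer(not_yet)

# transform adds several variables at once; assign the result to keep them
employees <- transform(employees, position = "RD", marital = NA)
```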

There's more…

Besides using the dollar sign and transform function, we can use the with function to create new variables:

> with(employees, year(birth_date))
 [1] 1953 1964 1959 1954 1955 1953 1957 1958 1952 1963
> employees$birth_year <- with(employees, year(birth_date))
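As a related option, base R's within function evaluates assignments inside the data frame and returns a modified copy, which saves the separate dollar-sign assignment. The following is a minimal sketch on a toy data frame (illustrative values); it uses format in place of lubridate's year only to stay dependency-free.

```r
# Toy stand-in for employees (illustrative values only)
employees <- data.frame(
  emp_no     = c(10001L, 10002L),
  birth_date = as.Date(c("1953-09-02", "1964-06-02"))
)

# with() only evaluates the expression; the result is a plain vector
birth_year <- with(employees, as.integer(format(birth_date, "%Y")))

# within() runs the assignment inside the data frame and returns a modified copy
employees <- within(employees, {
  birth_year <- as.integer(format(birth_date, "%Y"))
})
```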