
Studying the Relationship between Two Numeric Variables

To study the relationship between two numeric variables, we can use a scatter plot. A scatter plot is a two-dimensional visualization of the data in which one variable is plotted along the x axis and the other along the y axis, with one point per record. A relationship between the variables shows up as a trend in the cloud of points. Let's take a look at an example in the following exercise.

Exercise 30: Studying the Relationship between Employment Variation Rate and Number of Employees

Let's study the relationship between the employment variation rate (emp.var.rate) and the number of employees (nr.employed). We would expect the number of employees to increase as the employment variation rate increases.

Perform the following steps to complete the exercise:

  1. First, import the ggplot2 package using the following command:

    library(ggplot2)

  2. Create a data frame object, df, by reading the bank-additional-full.csv file (a semicolon-separated file) using the following command:

    df <- read.csv("/Chapter 2/Data/bank-additional/bank-additional-full.csv",sep=';')

  3. Now, plot the scatter plot using the following command:

    ggplot(data=df,aes(x=emp.var.rate,y=nr.employed)) + geom_point(size=4) +
    ggtitle("Scatterplot of Employment variation rate v/s Number of Employees")

    The output is as follows:

Figure 2.15: Scatterplot of employment variation versus the number of employees

We use the same base function, ggplot, and add a new layer for the scatterplot. The geom_point function in ggplot2 draws one point per record at the position given by the x and y aesthetics, which is what turns the base plot into a scatterplot.
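The layering idea can be seen in isolation with a minimal sketch. It assumes R's built-in mtcars dataset as a stand-in for df (the columns wt and mpg are not part of the exercise data), and the alpha value is only an illustrative choice:

```r
library(ggplot2)

# Base object: data plus aesthetic mappings, no geometry yet.
# mtcars is a stand-in here, not the exercise's bank data.
base <- ggplot(data = mtcars, aes(x = wt, y = mpg))

# Adding geom_point() appends a point layer, which makes this a scatterplot.
p <- base +
  geom_point(size = 4, alpha = 0.5) +
  ggtitle("Scatterplot of weight vs. miles per gallon")

# layer_data() returns the data computed for a layer: one row per drawn point.
pts <- layer_data(p, 1)
```

Printing p draws the plot. Inspecting pts is a quick way to confirm that the point layer holds one row per record, even when several points end up at the same position on screen.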

The plot shows an overall increasing trend: as the employment variation rate increases, the number of employees also increases. Relatively few dots appear because many records share the same emp.var.rate and nr.employed values, so their points are drawn on top of one another.
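One way to check how much overlap a scatterplot hides is to compare the number of rows with the number of distinct positions. The following is a rough sketch rather than part of the exercise: count_distinct_points is a hypothetical helper, and the built-in mtcars dataset (columns cyl and gear, which take only a few values) stands in for df.

```r
library(ggplot2)

# Hypothetical helper: number of rows in a data frame versus the number of
# distinct (x, y) positions a scatterplot of two of its columns would show.
count_distinct_points <- function(data, x, y) {
  c(rows = nrow(data), distinct = nrow(unique(data[, c(x, y)])))
}

# Stand-in data: cyl and gear repeat heavily, so many points overlap.
counts <- count_distinct_points(mtcars, "cyl", "gear")
print(counts)

# Pearson correlation as a rough numeric check on a linear trend.
r <- cor(mtcars$cyl, mtcars$gear)

# geom_count() sizes each dot by the number of records at that position,
# which makes the overlap visible instead of hiding it.
p <- ggplot(mtcars, aes(x = cyl, y = gear)) +
  geom_count()
```

With the exercise data, the equivalent call would be count_distinct_points(df, "emp.var.rate", "nr.employed"), and replacing geom_point with geom_count in the plot would show how many records sit behind each dot.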
