
Initial Analysis of the Reason for Absence

Let's start with a simple analysis of the Reason for absence column. We will try to address questions such as: What is the most common reason for absence? Does being a drinker or smoker have some effect on the reasons? Does the distance to work have some effect on the reasons? Starting with these types of questions is often useful when performing data analysis, as it is a good way to build confidence in, and understanding of, the data.

The first thing we are interested in is the overall distribution of the absence reasons in the data, that is, how many entries our dataset contains for each reason for absence. We can address this question by using the countplot() function from the seaborn package:

import matplotlib.pyplot as plt
import seaborn as sns

# get the number of entries for each reason for absence,
# colored by whether the reason is disease-related
plt.figure(figsize=(10, 5))
ax = sns.countplot(data=preprocessed_data, x="Reason for absence",
                   hue="Disease")
ax.set_ylabel("Number of entries per reason of absence")
plt.savefig('figs/absence_reasons_distribution.png',
            format='png', dpi=300)

The output will be as follows:

Figure 2.6: Number of entries for all reasons for absence

Note that we also used the Disease column as the hue parameter. This helps us to distinguish between disease-related reasons (those listed in the ICD encoding) and those that are not. Following Figure 2.3, we can assert that the most frequent reasons for absence are medical consultations (23), dental consultations (28), and physiotherapy (27). Among the ICD-encoded reasons, the most frequent are diseases of the musculoskeletal system and connective tissue (13) and injury, poisoning, and certain other consequences of external causes (19).
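The ranking read off the plot can also be checked numerically. The following is a minimal sketch, not the book's own code: it assumes the Disease column holds "Yes"/"No" labels, and toy_data is a hypothetical stand-in for preprocessed_data whose values are made up purely for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for preprocessed_data; these rows
# are illustrative only and are not taken from the real dataset.
toy_data = pd.DataFrame({
    "Reason for absence": [23, 23, 23, 28, 28, 27, 13, 13, 19],
    "Disease": ["No", "No", "No", "No", "No", "No", "Yes", "Yes", "Yes"],
})

# Count entries per (Disease, Reason for absence) pair, which is the
# quantity that countplot() draws as bar heights when hue is set.
counts = (
    toy_data.groupby(["Disease", "Reason for absence"])
    .size()
    .rename("entries")
    .reset_index()
    .sort_values("entries", ascending=False)
)

# After sorting, the first row seen for each Disease value is the most
# frequent reason within that group.
top_per_group = counts.drop_duplicates(subset="Disease", keep="first")
print(top_per_group)
```

Running the same grouping on preprocessed_data should give the bar heights of the count plot as a table, which makes it easier to compare reasons whose bars look similar in height.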

To perform a more accurate and in-depth analysis of the data, we will investigate the impact of the various features on the Reason for absence and Absenteeism in hours columns in the following sections. We begin with the data on social drinkers and smokers in the next section.
