- The Data Analysis Workshop
- Gururajan Govindan Shubhangi Hora Konstantin Palagachev
- 2021-06-18 18:18:26
Initial Analysis of the Reason for Absence
Let's start with a simple analysis of the Reason for absence column. We will try to address questions such as: What is the most common reason for absence? Does being a drinker or smoker affect the reasons for absence? Does the distance to work affect them? Starting with questions like these is a good way to build confidence in, and an understanding of, the data.
The first thing we are interested in is the overall distribution of the absence reasons in the data, that is, how many entries the dataset contains for each reason for absence. We can answer this question with the countplot() function from the seaborn package:
# get the number of entries for each reason for absence,
# colored by whether the reason is disease-related (Disease column)
import matplotlib.pyplot as plt
import seaborn as sns

plt.figure(figsize=(10, 5))
ax = sns.countplot(data=preprocessed_data, x="Reason for absence",
                   hue="Disease")
ax.set_ylabel("Number of entries per reason of absence")
plt.savefig('figs/absence_reasons_distribution.png',
            format='png', dpi=300)
The output will be as follows:

Figure 2.6: Number of entries for all reasons for absence
Note that we also used the Disease column as the hue parameter. This helps us to distinguish between disease-related reasons (listed in the ICD encoding) and those that aren't. Following Figure 2.3, we can assert that the most frequent reasons for absence are related to medical consultations (23), dental consultations (28), and physiotherapy (27). Among the reasons listed in the ICD encoding, the most frequent are diseases of the musculoskeletal system and connective tissue (13) and injury, poisoning, and certain other consequences of external causes (19).
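The counts behind such a plot can also be inspected directly as numbers. The following is a minimal sketch rather than the book's own code: toy_data is a small, made-up stand-in for preprocessed_data, and the rule used to build the Disease column (reason codes 1 to 21 are treated as ICD-encoded) is an assumption made for illustration.

```python
import pandas as pd

# Hypothetical toy data standing in for preprocessed_data; the values are
# made up for illustration and are not taken from the real dataset.
toy_data = pd.DataFrame({
    "Reason for absence": [23, 23, 28, 27, 13, 13, 19, 23],
})

# Assumption: reason codes 1 to 21 are the ICD-encoded (disease-related) ones.
toy_data["Disease"] = toy_data["Reason for absence"].apply(
    lambda reason: "Yes" if 1 <= reason <= 21 else "No"
)

# Number of entries per reason, split by the Disease flag. These are the
# bar heights a countplot with hue="Disease" would draw.
counts = toy_data.groupby(["Disease", "Reason for absence"]).size()
print(counts)

# Most frequent reason within each Disease group.
top_reason = toy_data.groupby("Disease")["Reason for absence"].agg(
    lambda reasons: reasons.value_counts().idxmax()
)
print(top_reason)
```

Running the same two groupby calls on the real preprocessed_data would give the exact numbers to compare against the bars in the figure.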
In order to perform a more accurate and in-depth analysis of the data, we will investigate the impact of the various features on the Reason for absence and Absenteeism in hours columns in the following sections. First, we will analyze the data on social drinkers and smokers in the next section.