
Introduction

In the previous chapter, we looked at some of the main techniques that are used in data analysis. We saw how hypothesis testing can be used when analyzing data, we got a brief introduction to visualizations, and finally, we explored some concepts related to time series analysis. In this chapter, we will elaborate on some of the topics we've already looked at (such as plotting and hypothesis testing) while introducing new ones coming from probability theory and data transformations.

Nowadays, work relationships are becoming increasingly trust-oriented, and conservative contracts (in which working time is strictly monitored) are being replaced with more agile ones in which employees themselves are responsible for accounting for their working time. This liberty may lead to unregulated absenteeism, which may reflect poorly on an employee's candidature even when the absent hours can be accounted for with genuine reasons. This can significantly undermine healthy working relationships. Furthermore, unregulated absenteeism can also have a negative impact on work productivity.

In this chapter, we'll analyze absenteeism data from a Brazilian courier company, collected between July 2007 and July 2010.

Note

The original dataset can be found here: https://archive.ics.uci.edu/ml/datasets/Absenteeism+at+work.

If you're interested, take a look at the following paper, which discusses the problem from a machine learning perspective: Martiniano, A., Ferreira, R.P., Sassi, R.J., & Affonso, C. (2012). Application of a neuro fuzzy network in prediction of absenteeism at work. In Information Systems and Technologies (CISTI), 7th Iberian Conference on (pp. 1-4). IEEE.

This dataset can also be found on our GitHub repository here: https://packt.live/3e4rorX.

Our goal is to discover hidden patterns in the data that might be useful for distinguishing genuine work absences from fraudulent ones. In this chapter, the following topics will be addressed:

  • Introduction to probability, conditional probability, and Bayes' theorem
  • Kolmogorov-Smirnov tests for equality of probability distributions
  • Box-Cox and Yeo-Johnson transformations

We will apply these techniques to our analysis as we try to identify the main drivers for absenteeism.
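
As a small preview of the first topic, the following is a minimal sketch of Bayes' theorem in Python. The probabilities are made-up illustrative values and are not taken from the absenteeism dataset; the function name `bayes_posterior` is likewise hypothetical.

```python
def bayes_posterior(prior, likelihood, evidence):
    """Return P(A|B) = P(B|A) * P(A) / P(B)."""
    if evidence <= 0:
        raise ValueError("P(B) must be positive")
    return likelihood * prior / evidence


# Illustrative (made-up) values, NOT derived from the dataset
p_a = 0.2              # P(A): prior probability of event A
p_b_given_a = 0.5      # P(B|A): probability of B when A holds
p_b_given_not_a = 0.1  # P(B|not A): probability of B when A does not hold

# Law of total probability: P(B) = P(B|A)P(A) + P(B|not A)P(not A)
p_b = p_b_given_a * p_a + p_b_given_not_a * (1 - p_a)

# Posterior probability P(A|B)
posterior = bayes_posterior(p_a, p_b_given_a, p_b)
print(f"P(A|B) = {posterior:.4f}")
```

Here, observing B raises the probability of A above its prior of 0.2, because B is more likely when A holds than when it does not.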
