  • The Data Analysis Workshop
  • Gururajan Govindan, Shubhangi Hora, Konstantin Palagachev
  • 324 words
  • 2021-06-18 18:18:25

Introduction

In the previous chapter, we looked at some of the main techniques that are used in data analysis. We saw how hypothesis testing can be used when analyzing data, we got a brief introduction to visualizations, and finally, we explored some concepts related to time series analysis. In this chapter, we will elaborate on some of the topics we've already looked at (such as plotting and hypothesis testing) while introducing new ones coming from probability theory and data transformations.

Work relationships are becoming increasingly trust-oriented: conservative contracts, in which working time is strictly monitored, are being replaced with more agile ones in which employees themselves are responsible for accounting for their working time. This freedom may lead to unregulated absenteeism, which can reflect poorly on an employee's standing even when the absent hours have genuine reasons, and can significantly undermine healthy working relationships. Furthermore, unregulated absenteeism can also have a negative impact on work productivity.

In this chapter, we'll analyze absenteeism data from a Brazilian courier company, collected between July 2007 and July 2010.

Note

The original dataset can be found here: https://archive.ics.uci.edu/ml/datasets/Absenteeism+at+work.

If you're interested, take a look at the following paper, which approaches the problem from a machine learning perspective: Martiniano, A., Ferreira, R.P., Sassi, R.J., & Affonso, C. (2012). Application of a neuro fuzzy network in prediction of absenteeism at work. In Information Systems and Technologies (CISTI), 7th Iberian Conference on (pp. 1-4). IEEE.

This dataset can also be found on our GitHub repository here: https://packt.live/3e4rorX.

Our goal is to discover hidden patterns in the data that might be useful for distinguishing genuine work absences from fraudulent ones. This chapter addresses the following topics:

  • Introduction to probability, conditional probability, and Bayes' theorem
  • Kolmogorov-Smirnov tests for equality of probability distributions
  • Box-Cox and Yeo-Johnson transformations

We will apply these techniques to our analysis as we try to identify the main drivers for absenteeism.
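
As a rough preview of the techniques listed above, the following is a minimal sketch rather than the chapter's actual analysis. It assumes NumPy and SciPy are installed and runs on synthetic, right-skewed numbers that only stand in for absence hours; none of the values come from the absenteeism dataset. The names group_a, group_b, and bayes_posterior are hypothetical and chosen purely for illustration.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)

# Synthetic, strictly positive, right-skewed samples (NOT the real dataset).
group_a = rng.exponential(scale=4.0, size=200) + 1.0
group_b = rng.exponential(scale=8.0, size=200) + 1.0

# Two-sample Kolmogorov-Smirnov test: the null hypothesis is that both
# samples were drawn from the same distribution.
ks_stat, ks_pvalue = stats.ks_2samp(group_a, group_b)

# Box-Cox transformation: requires strictly positive input. With no lambda
# given, SciPy estimates it by maximum likelihood and returns it as well.
boxcox_values, boxcox_lambda = stats.boxcox(group_a)

# Yeo-Johnson transformation: also accepts zero and negative values,
# so it can be applied to the mean-centered sample.
centered = group_a - group_a.mean()
yj_values, yj_lambda = stats.yeojohnson(centered)


def bayes_posterior(prior, likelihood, evidence):
    """Bayes' theorem: P(A|B) = P(B|A) * P(A) / P(B)."""
    return likelihood * prior / evidence


print(f"KS statistic: {ks_stat:.3f}, p-value: {ks_pvalue:.3g}")
print(f"Box-Cox lambda: {boxcox_lambda:.3f}")
print(f"Yeo-Johnson lambda: {yj_lambda:.3f}")
```

A small p-value from the Kolmogorov-Smirnov test would suggest that the two samples follow different distributions, while the two power transformations are typically used to bring skewed data closer to a normal shape before applying methods that assume normality.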
