- The Data Analysis Workshop
- Gururajan Govindan, Shubhangi Hora, Konstantin Palagachev
- 2021-06-18 18:18:25
Introduction
In the previous chapter, we looked at some of the main techniques used in data analysis. We saw how hypothesis testing can be used when analyzing data, we got a brief introduction to visualizations, and finally, we explored some concepts related to time series analysis. In this chapter, we will elaborate on some of the topics we've already looked at (such as plotting and hypothesis testing) while introducing new ones drawn from probability theory and data transformations.
Nowadays, work relationships are becoming increasingly trust-oriented, and conservative contracts (in which working time is strictly monitored) are being replaced with more agile ones in which employees themselves are responsible for accounting for their working time. This freedom may lead to unregulated absenteeism, which can reflect poorly on an employee's candidature even when the absent hours are justified by genuine reasons. This can significantly undermine healthy working relationships. Unregulated absenteeism can also have a negative impact on work productivity.
In this chapter, we'll analyze absenteeism data from a Brazilian courier company, collected between July 2007 and July 2010.
Note
The original dataset can be found here: https://archive.ics.uci.edu/ml/datasets/Absenteeism+at+work.
If you're interested, take a look at the following paper, which approaches the problem from a machine learning perspective: Martiniano, A., Ferreira, R.P., Sassi, R.J., & Affonso, C. (2012). Application of a neuro fuzzy network in prediction of absenteeism at work. In Information Systems and Technologies (CISTI), 7th Iberian Conference on (pp. 1-4). IEEE.
This dataset can also be found on our GitHub repository here: https://packt.live/3e4rorX.
Our goal is to discover hidden patterns in the data that might help distinguish genuine work absences from fraudulent ones. In this chapter, we will cover the following topics:
- Introduction to probability, conditional probability, and Bayes' theorem
- Kolmogorov-Smirnov tests for equality of probability distributions
- Box-Cox and Yeo-Johnson transformations
We will apply these techniques to our analysis as we try to identify the main drivers for absenteeism.
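To make the three topics listed above a little more concrete before turning to the data, here is a minimal, standard-library-only Python sketch. It is not the chapter's own code: the function names (`bayes_posterior`, `ks_statistic`, `box_cox`, `yeo_johnson`) and the probabilities and samples in the usage lines are hypothetical, chosen only for illustration. The KS function computes just the two-sample statistic D (no p-value), and the transformations take a fixed λ rather than estimating it. In practice, SciPy offers `scipy.stats.ks_2samp`, `scipy.stats.boxcox`, and `scipy.stats.yeojohnson`, which also report p-values or fit λ from the data.

```python
import math
from bisect import bisect_right


def bayes_posterior(p_b_given_a: float, p_a: float, p_b_given_not_a: float) -> float:
    """Return P(A | B) via Bayes' theorem.

    The evidence P(B) is expanded with the law of total probability:
    P(B) = P(B | A) P(A) + P(B | not A) P(not A).
    """
    p_b = p_b_given_a * p_a + p_b_given_not_a * (1.0 - p_a)
    return p_b_given_a * p_a / p_b


def ks_statistic(sample1, sample2) -> float:
    """Two-sample Kolmogorov-Smirnov statistic D = sup_x |F1(x) - F2(x)|."""
    s1, s2 = sorted(sample1), sorted(sample2)
    n1, n2 = len(s1), len(s2)
    d = 0.0
    # Empirical CDFs only jump at observed values, so checking those is enough.
    for x in s1 + s2:
        f1 = bisect_right(s1, x) / n1  # fraction of sample1 <= x
        f2 = bisect_right(s2, x) / n2  # fraction of sample2 <= x
        d = max(d, abs(f1 - f2))
    return d


def box_cox(x: float, lmbda: float) -> float:
    """Box-Cox transform for a single value; defined only for x > 0."""
    if x <= 0:
        raise ValueError("Box-Cox requires strictly positive values")
    if lmbda == 0:
        return math.log(x)
    return (x ** lmbda - 1.0) / lmbda


def yeo_johnson(x: float, lmbda: float) -> float:
    """Yeo-Johnson transform for a single value; also accepts x <= 0."""
    if x >= 0:
        if lmbda == 0:
            return math.log1p(x)
        return ((x + 1.0) ** lmbda - 1.0) / lmbda
    if lmbda == 2:
        return -math.log1p(-x)
    return -((1.0 - x) ** (2.0 - lmbda) - 1.0) / (2.0 - lmbda)


if __name__ == "__main__":
    # Hypothetical probabilities, NOT estimated from the absenteeism data:
    # A = "absence has a medical reason", B = "absence is long".
    print(bayes_posterior(p_b_given_a=0.6, p_a=0.3, p_b_given_not_a=0.2))
    # Toy samples standing in for absence hours of two hypothetical groups.
    print(ks_statistic([1, 2, 3, 4], [3, 4, 5, 6]))
    # A right-skewed toy sample: Box-Cox needs positive values,
    # Yeo-Johnson also copes with the shifted (partly negative) values.
    hours = [1, 2, 2, 3, 8, 40]
    print([round(box_cox(h, 0.0), 3) for h in hours])
    print([round(yeo_johnson(h - 3, 0.5), 3) for h in hours])
```

A useful sanity check on the design: for non-negative inputs, Yeo-Johnson reduces to Box-Cox applied to `x + 1`, which is why it is the usual choice when a feature (such as absence hours) can be zero.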