
Steps in EDA

Now that we know what EDA is and why it matters, let's look at the steps involved in data analysis. There are four of them, each summarized briefly below:

Problem definition: Before trying to extract useful insights from the data, it is essential to define the business problem to be solved. The problem definition drives the execution of the data analysis plan. The main tasks involved are defining the main objective of the analysis, defining the main deliverables, outlining the main roles and responsibilities, obtaining the current status of the data, defining the timetable, and performing a cost/benefit analysis. An execution plan can then be created from this problem definition.

Data preparation: This step involves methods for preparing the dataset before the actual analysis. Here, we define the sources of data, define data schemas and tables, understand the main characteristics of the data, clean the dataset, discard irrelevant datasets, transform the data, and divide the data into the chunks required for analysis.
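
As a rough illustration of the cleaning part of this step, the following sketch parses a small CSV sample, drops duplicate rows, and discards rows with missing or invalid values. The sample data, the column names, and the prepare function are all hypothetical and exist only for this example; a real project would typically read from files or a database and would often use a library such as pandas instead of the standard library.

```python
import csv
import io

# Hypothetical raw data; in practice this would come from a file or database.
RAW_CSV = """name,age,city
Alice,34,Paris
Bob,,London
Alice,34,Paris
Carol,29,
Dave,abc,Berlin
Eve,41,Madrid
"""


def prepare(raw_text):
    """Parse CSV text, then drop duplicates and rows with missing or invalid values."""
    reader = csv.DictReader(io.StringIO(raw_text))
    seen = set()
    cleaned = []
    for row in reader:
        key = tuple(row.values())
        if key in seen:
            continue  # exact duplicate of an earlier row
        seen.add(key)
        if any(v is None or v.strip() == "" for v in row.values()):
            continue  # at least one field is missing
        try:
            age = int(row["age"])  # transform: age must be an integer
        except ValueError:
            continue  # non-numeric age is treated as invalid
        cleaned.append(
            {"name": row["name"].strip(), "age": age, "city": row["city"].strip()}
        )
    return cleaned


rows = prepare(RAW_CSV)
for row in rows:
    print(row)
```

Dropping incomplete rows is only one possible policy; depending on the problem definition, imputing missing values may be more appropriate.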

Data analysis: This is one of the most crucial steps; it deals with descriptive statistics and analysis of the data. The main tasks involve summarizing the data, finding hidden correlations and relationships in the data, developing predictive models, evaluating the models, and calculating their accuracy. Some of the techniques used for data summarization are summary tables, graphs, descriptive statistics, inferential statistics, correlation statistics, searching, grouping, and mathematical models.
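
To make the summarization and correlation tasks concrete, here is a minimal sketch that computes a few descriptive statistics and a Pearson correlation coefficient using only the Python standard library. The two lists (hours studied and exam scores) are made-up values, and the summarize and pearson helpers are names chosen for this example rather than part of any library.

```python
import math
import statistics

# Hypothetical sample: hours studied vs. exam score.
hours = [1.0, 2.0, 3.0, 4.0, 5.0]
scores = [52.0, 58.0, 65.0, 70.0, 78.0]


def summarize(values):
    """Return basic descriptive statistics for a list of numbers."""
    return {
        "count": len(values),
        "mean": statistics.mean(values),
        "median": statistics.median(values),
        "stdev": statistics.stdev(values),  # sample standard deviation
        "min": min(values),
        "max": max(values),
    }


def pearson(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    mean_x, mean_y = statistics.mean(x), statistics.mean(y)
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    return cov / math.sqrt(var_x * var_y)


summary = summarize(scores)
r = pearson(hours, scores)
print(summary)
print(f"Pearson r = {r:.3f}")
```

A coefficient close to 1 suggests a strong positive linear relationship between the two variables in this sample; it does not by itself establish causation.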

Development and representation of the results: This step involves presenting the results to the target audience in the form of graphs, summary tables, maps, and diagrams. This is also an essential step, as the results of the analysis should be interpretable by the business stakeholders, which is one of the major goals of EDA. Common graphical analysis techniques include scatter plots, character plots, histograms, box plots, residual plots, mean plots, and others. We will explore several types of graphical representation in Chapter 2, Visual Aids for EDA.
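
As a small taste of what a histogram conveys, the sketch below groups values into fixed-width bins and renders the counts as text. It is a deliberately simplified, text-only stand-in: real charts would normally be drawn with a plotting library, and the sample values and the histogram and render helpers are assumptions made for this example.

```python
from collections import Counter

# Hypothetical exam scores to visualize.
values = [52, 58, 65, 70, 78, 61, 67, 73, 55, 69]


def histogram(data, bin_width):
    """Count how many values fall into each bin, keyed by the bin's lower edge."""
    counts = Counter((v // bin_width) * bin_width for v in data)
    return dict(sorted(counts.items()))


def render(bins, bin_width):
    """Render the bin counts as rows of '#' characters, one row per bin."""
    lines = []
    for start, count in bins.items():
        label = f"{start}-{start + bin_width - 1}"
        lines.append(f"{label:<7} {'#' * count}")
    return "\n".join(lines)


bins = histogram(values, 10)
print(render(bins, 10))
```

Even this crude rendering shows at a glance which score range is most common, which is the kind of insight stakeholders expect from a presentation of results.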
