官术网_书友最值得收藏!

Introduction

Chapter 1, R for Advanced Analytics, introduced to you the R language and its ecosystem for data science. We are now ready to enter a crucial part of data science and machine learning, that is, Exploratory Data Analysis (EDA), the art of understanding the data.

In this chapter, we will approach EDA with the same banking dataset used in the previous chapter, but in a more problem-centric way. We will start by defining the problem statement with industry standard artifacts, design a solution for the problem, and learn how EDA fits in the larger problem framework. We will then tackle the EDA for the direct marketing campaigns (phone calls) of a Portuguese banking institution use case using a combination of data engineering, data wrangling, and data visualization techniques in R, backed up by a business-centric approach.

In any data science use case, understanding the data consumes the bulk of the time and effort. Most data science professionals spend around 80% of their time understanding data. Given that this is the most crucial part of your journey, it is important to have a macro-view of the overall process for any data science use case.

A typical data science use case takes the path of a core business-analytics problem or a machine-learning problem. With either path approached, EDA is inevitable. Figure 2.1 demonstrates the life cycle of a basic data science use case. It starts by defining the problem statement using one or more standard frameworks, and then it delves into data gathering and reaches EDA. The majority of efforts and time in any project is consumed in EDA. Once the process of understanding the data is complete, a project may take a different path based on the scope of the use case. In most business analytics-based use cases, the next step is to assimilate all the observed patterns into meaningful insights. Though this might sound trivial, it is an iterative and arduous task. This step then evolves into story-telling, where the condensed insights are tailored into a meaningful story for the business stakeholders. Similarly, in scenarios where the objective is to develop a predictive model, the next step would be to actually develop a machine learning model and then deploy it into a production system/product.

Figure 2.1: Life cycle of a data science use case

Let's take a brief look at the first step, Defining the Problem Statement.

主站蜘蛛池模板: 铅山县| 新巴尔虎右旗| 富锦市| 铅山县| 吉木萨尔县| 永兴县| 星座| 施甸县| 新乡市| 枣强县| 新乡县| 通海县| 平和县| 台南县| 无棣县| 碌曲县| 南陵县| 沛县| 巴塘县| 宕昌县| 平乡县| 宜兰市| 桦甸市| 朔州市| 鹤岗市| 宁陵县| 宜城市| 长乐市| 泗洪县| 靖州| 雅安市| 惠安县| 繁峙县| 称多县| 巫山县| 克山县| 淄博市| 泽州县| 佛山市| 唐海县| 龙川县|