
Introduction

In the previous chapter, we saw how to use pandas to transform raw data into the attributes and values we expect. Once the data is structured in tabular form, with each field containing correct and clean values, it is prepared for further analysis, which means using it to solve business problems. To get the best outcomes from a project, we need to be clear about the scope of the data, the questions it can answer, and the problems it can solve before we draw any inference from it.

To do that, we need to understand not only the kind of data we have, but also how attributes relate to one another, which attributes are useful to us, and how they vary across the data provided. Analyzing data in this way and exploring how we can use it is not a straightforward task. We have to perform several initial exploratory tests on our data, interpret their results, and possibly create and analyze more statistics and visualizations before we can make a statement about the scope of the dataset or the analysis it supports. In data science pipelines, this process is referred to as Exploratory Data Analysis (EDA).
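As a rough illustration of what such initial exploratory tests can look like, the following minimal sketch uses pandas on a small made-up dataset. The column names (`ad_spend`, `visits`, `region`) and the values are assumptions invented for this example, not data from the chapter.

```python
import pandas as pd

# Hypothetical toy data; the columns and values are illustrative only.
df = pd.DataFrame({
    "ad_spend": [100, 200, 300, 400, 500],
    "visits": [12, 24, 33, 47, 55],
    "region": ["north", "south", "north", "south", "north"],
})

# How each numeric attribute varies: count, mean, std, min, quartiles, max.
summary = df.describe()

# How the numeric attributes relate to one another (Pearson correlation).
correlations = df[["ad_spend", "visits"]].corr()

# How a numeric attribute differs across the groups of a categorical one.
visits_by_region = df.groupby("region")["visits"].mean()

print(summary)
print(correlations)
print(visits_by_region)
```

Each of the three results answers one of the questions raised above: how attributes vary, how they are related, and which ones might be useful for splitting the data into meaningful groups.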

In this chapter, we will go through techniques for exploring and analyzing data by solving problems that are critical for businesses, such as identifying attributes useful for marketing, analyzing key performance indicators, performing comparative analyses, and generating insights and visualizations. We will use the pandas, Matplotlib, and seaborn libraries in Python to solve these problems.
