
Introduction

In the previous chapter, we saw how to use pandas to transform data and attributes obtained from raw sources into the attributes and values we expect. Once the data is structured in tabular form, with each field containing correct and clean values, it is prepared for further analysis, which means using it to solve business problems. To get the best outcome from a project, we need to be clear about the scope of the data, the questions it can answer, and the problems it can solve before we draw any inference from it.

To do that, we need to understand not only the kind of data we have, but also how attributes relate to one another, which attributes are useful to us, and how they vary across the data provided. Analyzing data in this way and exploring how it can be used is not a straightforward task. We have to run several initial exploratory tests on the data, interpret their results, and possibly create and analyze further statistics and visualizations before we can make a statement about the scope of the dataset or its analysis. In data science pipelines, this process is referred to as Exploratory Data Analysis.
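As a rough illustration of what such initial exploratory checks can look like, here is a minimal sketch in pandas. The dataset, its column names (`ad_spend`, `visits`, `region`), and its values are hypothetical and exist only to show the calls; they are not taken from this book's data.

```python
import pandas as pd

# Hypothetical toy dataset; columns and values are invented for illustration.
df = pd.DataFrame({
    "ad_spend": [100, 200, 300, 400, 500],
    "visits": [10, 20, 30, 40, 50],
    "region": ["north", "south", "north", "south", "north"],
})

# What kind of data do we have? (data type of each attribute)
dtypes = df.dtypes

# How do the numeric attributes vary? (count, mean, std, min, quartiles, max)
summary = df.describe()

# How are the numeric attributes related to each other? (Pearson correlation)
correlations = df[["ad_spend", "visits"]].corr()

# How are the values of a categorical attribute distributed?
region_counts = df["region"].value_counts()

print(dtypes, summary, correlations, region_counts, sep="\n\n")
```

Each result answers one of the questions raised above: the kind of data, how it varies, and how attributes relate. On a real dataset, these outputs would typically suggest which further statistics or plots are worth producing.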

In this chapter, we will go through techniques for exploring and analyzing data by solving problems that are critical for businesses, such as identifying attributes useful for marketing, analyzing key performance indicators, performing comparative analyses, and generating insights and visualizations. We will use the pandas, Matplotlib, and seaborn libraries in Python to solve these problems.
