Summary

Data processing and wrangling is the first, and a very important, stage of the data science pipeline. It generally helps if the people preparing the data have some domain knowledge about it, since that knowledge tells them when the data has been processed enough and lets them use their intuition to build the pipeline better and more quickly. Data processing also calls for inventive solutions and the occasional hack.

In this chapter, we learned how to structure large datasets by arranging them in tabular form. We then loaded this tabular data into pandas and distributed it between the right columns. Once we were sure that the data was arranged correctly, we combined it with other data sources. We also got rid of duplicates and needless columns and, finally, dealt with missing data. After these steps, the data was ready for analysis and could be put directly into a data science pipeline.
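As a compact recap, the steps above can be sketched in a few lines of pandas. This is only an illustrative sketch: the `orders` and `customers` tables, their column names, and the choice to fill missing amounts with the median are all assumptions made for the example, not data or decisions taken from the chapter.

```python
import io

import pandas as pd

# Hypothetical raw data: a small CSV with a duplicated row, a needless
# column ("notes"), and one missing value in "amount".
orders_csv = io.StringIO(
    "order_id,customer_id,amount,notes\n"
    "1,10,25.0,internal\n"
    "1,10,25.0,internal\n"
    "2,11,,internal\n"
    "3,10,40.0,internal\n"
)

# Hypothetical second data source to combine with the orders.
customers = pd.DataFrame(
    {"customer_id": [10, 11], "region": ["North", "South"]}
)

# 1. Get the tabular data into pandas.
orders = pd.read_csv(orders_csv)

# 2. Combine it with another data source.
combined = orders.merge(customers, on="customer_id", how="left")

# 3. Get rid of duplicate rows and needless columns.
deduped = combined.drop_duplicates()
trimmed = deduped.drop(columns=["notes"])

# 4. Deal with missing data (assumption: fill with the column median).
cleaned = trimmed.fillna({"amount": trimmed["amount"].median()})

print(cleaned)
```

Filling with the median is just one option; depending on the data, dropping the incomplete rows with `dropna` or filling with a fixed value may be more appropriate.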

In the next chapter, we will deepen our understanding of pandas and talk about reshaping and analyzing DataFrames to visualize and summarize data better. We will also see how to solve generic business-critical problems directly and efficiently.
