Summary

Data processing and wrangling is the first, and one of the most important, stages of the data science pipeline. It helps if the people preparing the data have some domain knowledge about it, since that lets them judge when the data has been processed enough and use their intuition to build the pipeline better and more quickly. Data processing also often calls for inventive solutions and workarounds.

In this chapter, you learned how to structure large datasets by arranging them in tabular form. We then loaded this tabular data into pandas and distributed it across the right columns. Once we were sure that the data was arranged correctly, we combined it with other data sources. We also removed duplicates and unneeded columns and, finally, dealt with missing data. After these steps, the data is ready for analysis and can be fed directly into a data science pipeline.
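
As a compact reminder of those steps, here is a minimal sketch that chains them together on toy data. The DataFrames, the column names, the `prepare` helper, and the choice to fill missing amounts with the median are all assumptions made for illustration; they are not taken from the chapter's own examples.

```python
import pandas as pd

# Hypothetical toy data; the column names are illustrative only.
orders = pd.DataFrame({
    "order_id": [1, 2, 2, 3],
    "customer_id": [10, 11, 11, 12],
    "amount": [25.0, None, None, 40.0],
    "notes": ["", "gift", "gift", ""],
})
customers = pd.DataFrame({
    "customer_id": [10, 11, 12],
    "region": ["north", "south", "east"],
})


def prepare(orders, customers):
    """Combine, de-duplicate, trim, and fill a small orders table."""
    # Combine with another data source on the shared key.
    merged = orders.merge(customers, on="customer_id", how="left")
    # Remove fully duplicated rows.
    deduped = merged.drop_duplicates()
    # Drop a column that is not needed for analysis.
    trimmed = deduped.drop(columns=["notes"])
    # Handle missing data; the median is one possible strategy among many.
    filled = trimmed.fillna({"amount": trimmed["amount"].median()})
    return filled.reset_index(drop=True)


clean = prepare(orders, customers)
print(clean)
```

Which strategy is appropriate for missing values (filling, dropping, or leaving them in place) depends on the dataset, which is where the domain knowledge mentioned earlier comes in.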

In the next chapter, we will deepen our understanding of pandas and look at reshaping and analyzing DataFrames to visualize and summarize data more effectively. We will also see how to solve common business-critical problems efficiently.
