Loading data into memory – viewing and managing with ease using pandas

First, we will need to load data into memory so that Python can interact with it. Pandas will be our data management and manipulation library:

# load data into Pandas
import pandas as pd
df = pd.read_csv("./data/iris.csv")

Let's use some built-in pandas features to sanity-check the load and confirm that everything was read in properly. First, we use the .shape attribute to check the size of the data as a (rows, columns) tuple. Next, we check the contents of the DataFrame with the .head() method, which returns the first five rows as a new, smaller DataFrame for easy viewing. Finally, we use the .describe() method to show summary statistics for each numeric feature.

Pandas has many more sanity-check and quick-view features. For example, .tail() returns the final five rows of the data. Becoming proficient in pandas is undoubtedly worth the time investment. The dedicated chapter that appears later in the book is a good place to start, as is the Essential basic functionality page (https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html) on the pandas documentation site.

# sanity check with pandas
print("shape of data in (rows, columns) is " + str(df.shape))
print(df.head())
print(df.describe().transpose())

You will see the following output after executing the preceding code:

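As a small extension of the checks above, the following sketch shows a few more quick-view features, including the .tail() method mentioned earlier. So that it runs without ./data/iris.csv, it reads a handful of stand-in rows from an in-memory string; those rows and the column names (including species) are assumptions made for illustration, not the contents of the book's file.

```python
import io
import pandas as pd

# Hypothetical stand-in rows so the sketch runs without ./data/iris.csv;
# the column names are assumed, not taken from the book's file.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
4.9,3.0,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.4,3.2,4.5,1.5,versicolor
6.3,3.3,6.0,2.5,virginica
5.8,2.7,5.1,1.9,virginica
"""
df = pd.read_csv(io.StringIO(csv_text))

last_rows = df.tail(3)                        # final 3 rows (default is 5)
column_types = df.dtypes                      # dtype of each column
missing_counts = df.isna().sum()              # missing values per column
class_counts = df["species"].value_counts()   # number of rows per label

print(last_rows)
print(column_types)
print(missing_counts)
print(class_counts)
```

With the real file, replace the io.StringIO(...) argument with the path used earlier and adjust the label column name to match the CSV header.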