Loading data into memory – viewing and managing with ease using pandas

First, we will need to load data into memory so that Python can interact with it. Pandas will be our data management and manipulation library:

# load data into Pandas
import pandas as pd
df = pd.read_csv("./data/iris.csv")
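If ./data/iris.csv is not at hand, the same loading step can be tried on an in-memory stand-in. This is only a minimal sketch: the column names and the three rows below are illustrative assumptions, not necessarily the layout of the book's file, and demo_df is a hypothetical name used to avoid clashing with df.

```python
import io
import pandas as pd

# Stand-in for ./data/iris.csv: assumed column names and three illustrative rows.
csv_text = """sepal_length,sepal_width,petal_length,petal_width,species
5.1,3.5,1.4,0.2,setosa
7.0,3.2,4.7,1.4,versicolor
6.3,3.3,6.0,2.5,virginica
"""

# pd.read_csv accepts a file path or any file-like object, so StringIO works here.
demo_df = pd.read_csv(io.StringIO(csv_text))

print(type(demo_df))
print(demo_df.columns.tolist())
```

Reading from a string keeps the example self-contained; swapping io.StringIO(csv_text) for a real path gives the same kind of DataFrame.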

Let's use some built-in pandas features to sanity-check the load and make sure that everything was read in properly. First, we use the .shape attribute to check the size of the data as a (rows, columns) tuple. Next, we check the contents of the DataFrame with the .head() method, which returns the first five rows as a new, smaller DataFrame for easy viewing. Finally, we use the .describe() method to show summary statistics for each numeric feature, and transpose the result so that each feature appears on its own row.

Pandas has many more sanity-check and quick-view features. For example, .tail() returns the final five rows of the data. Becoming proficient in pandas is undoubtedly worth the time investment. The dedicated chapter that appears later in the book is a good place to start, as is the Essential basic functionality page (https://pandas.pydata.org/pandas-docs/stable/getting_started/basics.html) on the pandas documentation site.

# sanity check with Pandas
print("shape of data in (rows, columns) is " + str(df.shape))
print(df.head())
print(df.describe().transpose())
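A few of the other quick-view features can be sketched in the same way. The snippet below is an illustration under stated assumptions rather than part of the original walkthrough: it builds a small placeholder DataFrame (hypothetical column names and values) so that it runs without the CSV file, then applies .tail(), .isna().sum(), and .dtypes.

```python
import pandas as pd

# Small placeholder DataFrame; names and values are illustrative assumptions.
demo_df = pd.DataFrame({
    "sepal_length": [5.1, 7.0, 6.3, 5.8, 4.9, 6.4],
    "petal_length": [1.4, 4.7, 6.0, 5.1, 1.4, 4.5],
    "species": ["setosa", "versicolor", "virginica",
                "virginica", "setosa", "versicolor"],
})

last_rows = demo_df.tail()                  # final five rows by default
missing_per_column = demo_df.isna().sum()   # number of missing values per column
column_types = demo_df.dtypes               # data type inferred for each column

print(last_rows)
print(missing_per_column)
print(column_types)
```

Checking for missing values and column types right after loading tends to catch parsing problems, such as a numeric column read in as text, before they affect later steps.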

You will see the following output after executing the preceding code:
