Basics – summary, dimensions, and structure

After reading in the data, a few tasks need to be performed to get a feel for it:

  • To check whether the data has been read in correctly
  • To determine how the data looks, that is, its shape and size
  • To summarize and visualize the data
  • To get the column names and summary statistics of numerical variables

Let us go back to the example of the Titanic dataset and import it again. The head() method is used to look at the first few rows of the data, as shown:

import pandas as pd
data=pd.read_csv('E:/Personal/Learning/Datasets/Book/titanic3.csv')
data.head()

The result will look similar to the following screenshot:

Fig. 2.6: Thumbnail view of the Titanic dataset obtained using the head() method

The head() method also accepts the number of rows to display. For example, head(10) will show the first 10 rows; with no argument, it shows the first five.
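The effect of the row-count argument can be sketched on a small stand-in table. The toy data frame and its values below are hypothetical and are not taken from titanic3.csv; only the head() call itself comes from the text.

```python
import pandas as pd

# Hypothetical stand-in for the Titanic data: 12 rows, 2 columns.
toy = pd.DataFrame({
    "pclass": [1, 1, 2, 2, 3, 3, 1, 2, 3, 1, 2, 3],
    "age": [29.0, 2.0, 30.0, 25.0, 48.0, 63.0,
            39.0, 53.0, 71.0, 47.0, 18.0, 24.0],
})

first_default = toy.head()   # no argument: the first 5 rows
first_ten = toy.head(10)     # explicit argument: the first 10 rows

print(len(first_default), len(first_ten))
```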

The next attribute of the dataset that concerns us is its dimension, that is, the number of rows and columns present in the dataset. This can be obtained from the shape attribute by typing data.shape (no parentheses, because shape is an attribute and not a method).

The result obtained is (1310,14), indicating that the dataset has 1310 rows and 14 columns.
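Because shape is a plain (rows, columns) tuple, it can be unpacked directly. A minimal sketch, again using a hypothetical toy data frame rather than the real file:

```python
import pandas as pd

# Hypothetical toy data: 4 rows, 3 columns.
toy = pd.DataFrame({
    "pclass": [1, 2, 3, 3],
    "survived": [1, 0, 0, 1],
    "age": [29.0, 30.0, 25.0, 48.0],
})

# shape is an attribute holding a (rows, columns) tuple.
n_rows, n_cols = toy.shape
print(n_rows, n_cols)  # prints: 4 3
```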

As discussed earlier, the column names of a data frame can be listed using data.columns.values, which gives the following output as the result:

Fig. 2.7: Column names of the Titanic dataset
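As a small illustration, the columns attribute returns an Index object, and its values attribute exposes the names as a NumPy array. The toy data frame below is hypothetical and only borrows a few plausible column names:

```python
import pandas as pd

# Hypothetical toy data with three named columns.
toy = pd.DataFrame({
    "pclass": [1, 2],
    "survived": [1, 0],
    "name": ["A", "B"],
})

names_array = toy.columns.values   # NumPy array of column names
names_list = list(toy.columns)     # plain Python list, often handier

print(names_list)
```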

Another important thing to do while glancing at the data is to create summary statistics for the numerical variables. This can be done with the describe() method:

data.describe()

We get the following result:

Fig. 2.8: Summary statistics for the numerical variables in the Titanic dataset
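Two behaviors of describe() are worth seeing on a small example: by default it skips non-numeric columns in a mixed data frame, and its count row excludes missing values. The sketch below uses a hypothetical toy data frame with made-up values:

```python
import pandas as pd

# Hypothetical toy data: two numeric columns (one with a missing
# value) and one text column.
toy = pd.DataFrame({
    "age": [29.0, float("nan"), 30.0, 25.0],
    "fare": [10.0, 20.0, 30.0, 40.0],
    "name": ["A", "B", "C", "D"],
})

summary = toy.describe()

# Only the numeric columns appear in the summary.
print(list(summary.columns))

# Individual statistics can be looked up by row label and column name.
age_count = summary.loc["count", "age"]   # non-missing ages only
fare_mean = summary.loc["mean", "fare"]
```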

The type of a column determines how it behaves under numerical and other manipulation operations, so it is important to know the type of each column. The types can be listed as follows:

data.dtypes

We get the following result from the preceding code snippet:

Fig. 2.9: Variable types of the columns in the Titanic dataset
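One practical use of the column types is to separate numeric columns from the rest before doing arithmetic. The following is a minimal sketch on a hypothetical toy data frame; select_dtypes and pd.api.types.is_numeric_dtype are standard pandas calls, but the data is made up:

```python
import pandas as pd

# Hypothetical toy data mixing integer, float, and text columns.
toy = pd.DataFrame({
    "pclass": [1, 2, 3],
    "age": [29.0, 30.0, 25.0],
    "name": ["A", "B", "C"],
})

print(toy.dtypes)   # one type per column

# Keep only the columns on which arithmetic makes sense.
numeric = toy.select_dtypes(include="number")
numeric_names = list(numeric.columns)

# Check a single column's type programmatically.
age_is_numeric = pd.api.types.is_numeric_dtype(toy["age"])
name_is_numeric = pd.api.types.is_numeric_dtype(toy["name"])
```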
