- Hands-On Exploratory Data Analysis with Python
- Suresh Kumar Mukhiya Usman Ahmed
- 2021-06-24 16:44:56
Applying descriptive statistics
Having preprocessed the dataset, let's do some sanity checking using descriptive statistics techniques.
We can implement this as shown here:
dfs.info()
The output of the preceding code is as follows:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 37554 entries, 1 to 78442
Data columns (total 6 columns):
subject 37367 non-null object
from 37554 non-null object
date 37554 non-null datetime64[ns, UTC]
to 36882 non-null object
label 36962 non-null object
thread 37554 non-null object
dtypes: datetime64[ns, UTC](1), object(5)
memory usage: 2.0+ MB
We will learn more about descriptive statistics in Chapter 5, Descriptive Statistics. Note that there are 37,554 emails, each described by six columns: subject, from, date, to, label, and thread. The non-null counts also show that the subject, to, and label columns have some missing values, because their counts fall below 37,554. Let's check the first few entries of the email dataset:
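The non-null counts reported by info() can be turned into missing-value counts directly. The following is a minimal sketch, not the book's code: the `sample` dataframe and its values are invented for illustration, and only the column names are taken from the output above.

```python
import pandas as pd

# Hypothetical stand-in for dfs; the rows are invented for illustration.
sample = pd.DataFrame({
    "subject": ["Hello", None, "Re: Hello"],
    "from": ["a@example.com", "b@example.com", "c@example.com"],
    "to": ["x@example.com", None, None],
    "label": ["inbox", "sent", None],
})

# isna() marks missing cells; sum() counts them per column.
# This is the complement of the non-null counts that info() prints.
missing = sample.isna().sum()
print(missing)
```

Applied to the real dataframe, the same call would presumably report 187 missing subjects (37,554 minus 37,367), 672 missing recipients, and 592 missing labels, as implied by the info() output.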
dfs.head(10)
The output of the preceding code is a table showing the first 10 rows of the dataframe (not reproduced here).
Note that our dataframe so far contains six columns. Take a look at the from field: it contains both the sender's name and the email address. For our analysis, we need only the email address, so we can use a regular expression to extract it and rewrite the column.
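One way this extraction might look is sketched below. It is an illustration under stated assumptions rather than the book's implementation: the `sample` data, the `EMAIL_PATTERN` expression, and the `extract_email` helper are all hypothetical, and the pattern is a deliberately simplified match for common address shapes, not a full validator.

```python
import re

import pandas as pd

# Hypothetical sample of "from" values; the real column may be formatted differently.
sample = pd.DataFrame({
    "from": [
        "Jane Doe <jane.doe@example.com>",
        "bob@example.org",
        "No Address Here",
    ]
})

# Simplified pattern (an assumption): local part, "@", then a dotted domain.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_email(value):
    """Return the first email-like substring, or None if there is none."""
    if not isinstance(value, str):
        return None
    match = EMAIL_PATTERN.search(value)
    return match.group(0) if match else None


# Overwrite the column so that it holds only the address.
sample["from"] = sample["from"].apply(extract_email)
print(sample)
```

Returning None for values without an address keeps those rows visible as missing data instead of silently dropping them.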