- Hands-On Exploratory Data Analysis with Python
- Suresh Kumar Mukhiya Usman Ahmed
- 2021-06-24 16:44:56
Applying descriptive statistics
Having preprocessed the dataset, let's sanity-check it using descriptive statistics.
The info() method summarizes the index, the columns, their non-null counts, and their data types:
dfs.info()
The output of the preceding code is as follows:
<class 'pandas.core.frame.DataFrame'>
Int64Index: 37554 entries, 1 to 78442
Data columns (total 6 columns):
subject    37367 non-null object
from       37554 non-null object
date       37554 non-null datetime64[ns, UTC]
to         36882 non-null object
label      36962 non-null object
thread     37554 non-null object
dtypes: datetime64[ns, UTC](1), object(5)
memory usage: 2.0+ MB
We will learn more about descriptive statistics in Chapter 5, Descriptive Statistics. Note that there are 37,554 emails, each described by six columns: subject, from, date, to, label, and thread. The subject, to, and label columns have fewer non-null values than there are rows, so some of their entries are missing. Let's check the first few entries of the email dataset:
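To make the missing-value observation concrete, here is a minimal sketch that derives the number of missing entries per column from the non-null counts reported by dfs.info(). The variable names (total_rows, non_null, missing) are illustrative and not taken from the book.

```python
# Non-null counts copied from the dfs.info() output shown earlier.
total_rows = 37554
non_null = {
    "subject": 37367,
    "from": 37554,
    "date": 37554,
    "to": 36882,
    "label": 36962,
    "thread": 37554,
}

# Missing values per column = total rows - non-null count.
missing = {column: total_rows - count for column, count in non_null.items()}

for column, count in missing.items():
    print(f"{column}: {count} missing")
```

On the real dataframe, dfs.isna().sum() should report the same per-column figures directly.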
dfs.head(10)
The output of the preceding code is as follows:
Note that our dataframe so far contains six columns. Take a look at the from field: it contains both the sender's name and the email address. For our analysis, we only need the email address. We can use a regular expression to extract it and refactor the column.
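As a rough sketch of that idea, the following helper pulls the first email-like substring out of a sender string. The function name extract_email, the pattern, and the sample sender strings are assumptions made for illustration; the book's own pattern may differ, and this simplified expression does not cover every valid address format.

```python
import re

# Simplified pattern for illustration; it does not cover every valid address.
EMAIL_PATTERN = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")


def extract_email(sender):
    """Return the first email-like substring in sender, or None if absent."""
    match = EMAIL_PATTERN.search(str(sender))
    return match.group(0) if match else None


# Hypothetical sender values in the two shapes the from field can take.
print(extract_email("Jane Doe <jane.doe@example.com>"))
print(extract_email("john@example.org"))
```

Applied to the dataframe, this could look like dfs['from'] = dfs['from'].apply(extract_email), which would replace each name-and-address string with the address alone.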