- Hands-On Exploratory Data Analysis with Python
- Suresh Kumar Mukhiya Usman Ahmed
- 2021-06-24 16:44:56
Data refactoring
We noticed that the from field contains more information than we need: besides the email address, it can include the sender's display name. We only need the email address, so let's do some refactoring:
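1. First of all, import the regular expression package. We also import NumPy, because the extraction function returns np.nan when it finds no address:
import re
import numpy as np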
2. Next, let's create a function that takes an entire string from any column and extracts an email address:
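def extract_email_ID(string):
    # Look for an address wrapped in angle brackets, e.g. "Name <user@example.com>"
    email = re.findall(r'<(.+?)>', string)
    if not email:
        # Fall back to whitespace-separated tokens that contain '@'
        email = list(filter(lambda y: '@' in y, string.split()))
    # Return the first match, or NaN if nothing looks like an address
    return email[0] if email else np.nan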
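The preceding function is straightforward. It first uses the regular expression <(.+?)> to capture the text between angle brackets. The non-greedy .+? stops at the first closing >. If that finds nothing, it splits the string on whitespace and keeps the tokens that contain @. If neither approach finds an address, it returns NaN. If you are not comfortable with regular expressions, don't worry. The Appendix covers them.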
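To see both branches of the function at work, here is a minimal sketch that runs it on a few sample values. The sample strings are hypothetical and are not taken from the book's mailbox data. The first string hits the angle-bracket regex, the second string hits the whitespace fallback, and the third string contains no address at all.

```python
import re
import numpy as np

def extract_email_ID(string):
    # Look for an address wrapped in angle brackets, e.g. "Name <user@example.com>"
    email = re.findall(r'<(.+?)>', string)
    if not email:
        # Fall back to whitespace-separated tokens that contain '@'
        email = list(filter(lambda y: '@' in y, string.split()))
    # Return the first match, or NaN if nothing looks like an address
    return email[0] if email else np.nan

# Hypothetical sample values, for illustration only
samples = [
    'Jane Doe <jane.doe@example.com>',  # angle-bracket branch
    'john@example.org',                 # whitespace-split fallback
    'Mailer Daemon',                    # no address: returns NaN
]
results = [extract_email_ID(s) for s in samples]
print(results)
```

The function expects a string. If the column can contain missing values such as None, re.findall raises a TypeError, so those values would need to be handled before the function is applied.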
3. Next, let's apply the function to the from column:
dfs['from'] = dfs['from'].apply(lambda x: extract_email_ID(x))
We used a lambda function to call extract_email_ID on every value in the column. Passing the function directly, as in dfs['from'].apply(extract_email_ID), is equivalent.
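The following is a minimal, self-contained sketch of this step. It assumes that dfs is a pandas DataFrame with a string from column. The three-row frame below is a hypothetical stand-in for the real mailbox data. The sketch also checks that the lambda form and the direct form produce the same Series.

```python
import re
import numpy as np
import pandas as pd

def extract_email_ID(string):
    # Angle-bracket address first, then any token containing '@', else NaN
    email = re.findall(r'<(.+?)>', string)
    if not email:
        email = list(filter(lambda y: '@' in y, string.split()))
    return email[0] if email else np.nan

# Hypothetical stand-in for the mailbox DataFrame `dfs` used in the text
dfs = pd.DataFrame({'from': [
    'Jane Doe <jane.doe@example.com>',
    'john@example.org',
    'Mailer Daemon',
]})

# Lambda form, as in the text
with_lambda = dfs['from'].apply(lambda x: extract_email_ID(x))
# Passing the function object directly gives the same result
direct = dfs['from'].apply(extract_email_ID)

# Overwrite the column with the extracted addresses
dfs['from'] = direct
print(dfs)
```

Passing the function directly avoids an extra layer of function calls and reads more clearly. The lambda is only needed when you must supply extra arguments.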
4. Next, we are going to refactor the label field. The logic is simple: if an email was sent from your email address, it is a sent email; otherwise, it is a received email, that is, an inbox email:
myemail = 'itsmeskm99@gmail.com'
dfs['label'] = dfs['from'].apply(lambda x: 'sent' if x==myemail else 'inbox')
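The preceding code compares each extracted address with myemail. It labels the row sent when they match and inbox otherwise. Rows whose address could not be extracted (NaN) never match, so they are labeled inbox.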
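Here is a small sketch of the labeling step on hypothetical data. The address me@example.com is a placeholder for your own address, and the rows are invented for illustration. The sketch shows the row-by-row apply from the text alongside a vectorized alternative using numpy.where, which compares the whole column in one operation. The vectorized form is our substitution and is not the book's method.

```python
import numpy as np
import pandas as pd

# Hypothetical placeholder for your own address
myemail = 'me@example.com'

# Hypothetical data: 'from' already holds extracted addresses (NaN if none)
dfs = pd.DataFrame({'from': ['me@example.com', 'jane.doe@example.com', np.nan]})

# Row-by-row labeling, as in the text
dfs['label'] = dfs['from'].apply(lambda x: 'sent' if x == myemail else 'inbox')

# Vectorized alternative: NaN compares unequal, so those rows become 'inbox'
label_vec = np.where(dfs['from'] == myemail, 'sent', 'inbox')

print(dfs)
```

On large mailboxes, the vectorized comparison is usually faster than calling a Python lambda once per row. Both versions produce the same labels.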