
Data cleansing

Let's create a CSV file that keeps only the required fields, using the following steps:

1. Import the csv package:

import csv

2. Create a CSV file with only the required attributes:

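# Assumes `mbox` is the mailbox.mbox object loaded earlier.
# newline='' is required by the csv module to avoid blank lines on Windows.
with open('mailbox.csv', 'w', newline='', encoding='utf-8') as outputfile:
    writer = csv.writer(outputfile)
    # Header row: one column per attribute we keep
    writer.writerow(['subject', 'from', 'date', 'to', 'label', 'thread'])

    for message in mbox:
        # Gmail exports store labels and thread IDs in custom X- headers;
        # a missing header comes back as None and is written as an empty field
        writer.writerow([
            message['subject'],
            message['from'],
            message['date'],
            message['to'],
            message['X-Gmail-Labels'],
            message['X-GM-THRID'],
        ])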
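To try step 2 without a real Gmail export, the following self-contained sketch uses the standard-library `mailbox` module to build a tiny mbox file in a temporary directory, then runs the same extraction on it. The message contents, the helper names `export_mbox_to_csv` and `build_sample_mbox`, and the file names are all hypothetical. The `X-Gmail-Labels` and `X-GM-THRID` headers normally appear only in Gmail exports, so here they are set by hand.

```python
import csv
import mailbox
import os
import tempfile
from email.message import EmailMessage

# CSV column names, matching the header row used in step 2.
FIELDS = ['subject', 'from', 'date', 'to', 'label', 'thread']
# Message headers to read, in the same order as FIELDS.
HEADERS = ['subject', 'from', 'date', 'to', 'X-Gmail-Labels', 'X-GM-THRID']


def export_mbox_to_csv(mbox, csv_path):
    """Write one CSV row per message, keeping only the selected headers (hypothetical helper)."""
    with open(csv_path, 'w', newline='', encoding='utf-8') as outputfile:
        writer = csv.writer(outputfile)
        writer.writerow(FIELDS)
        for message in mbox:
            # Header lookup is case-insensitive; missing headers yield None,
            # which csv.writer writes as an empty field.
            writer.writerow([message[h] for h in HEADERS])


def build_sample_mbox(path):
    """Create a small mbox file containing two made-up messages (hypothetical data)."""
    box = mailbox.mbox(path)  # creates the file if it does not exist
    for i, label in enumerate(['Inbox', 'Sent'], start=1):
        msg = EmailMessage()
        msg['Subject'] = f'Test message {i}'
        msg['From'] = 'alice@example.com'
        msg['To'] = 'bob@example.com'
        msg['Date'] = 'Mon, 01 Jan 2024 10:00:00 +0000'
        msg['X-Gmail-Labels'] = label
        msg['X-GM-THRID'] = str(1000 + i)
        msg.set_content('Hello')
        box.add(msg)
    box.flush()
    return box


if __name__ == '__main__':
    tmpdir = tempfile.mkdtemp()
    box = build_sample_mbox(os.path.join(tmpdir, 'sample.mbox'))
    csv_path = os.path.join(tmpdir, 'mailbox.csv')
    export_mbox_to_csv(box, csv_path)
    box.close()
    # Print the rows back to check what was written
    with open(csv_path, newline='', encoding='utf-8') as f:
        for row in csv.reader(f):
            print(row)
```

Building the column list from a `HEADERS` sequence keeps the header row and the extracted values in the same order, so adding or dropping a field means editing one list instead of two.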

The code in step 2 writes a CSV file named mailbox.csv. From here on, instead of loading the mbox file, we can load this CSV file, which is smaller than the original dataset.
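A minimal sketch of loading the smaller CSV file instead of the mbox file is shown below. It uses the standard-library `csv.DictReader` so that no third-party packages are needed (`pandas.read_csv` would work equally well). It assumes mailbox.csv has the header row written in step 2 and was saved as UTF-8. The function name `load_mailbox_csv` is hypothetical.

```python
import csv


def load_mailbox_csv(path='mailbox.csv'):
    """Read the exported CSV into a list of dicts keyed by the header row.

    `load_mailbox_csv` is a hypothetical helper; the default path assumes
    the file written in step 2.
    """
    with open(path, newline='', encoding='utf-8') as inputfile:
        # DictReader uses the first row (subject, from, date, ...) as keys
        return list(csv.DictReader(inputfile))
```

For example, `messages = load_mailbox_csv()` followed by `messages[0]['subject']` gives the subject of the first exported message. All values come back as strings, so fields such as `date` still need parsing before any time-based analysis.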
