
Most frequently used words

One of the easiest things to analyze in your emails is which words appear most often, and a word cloud is a quick way to see them. Let's first filter out the automated arXiv emails and collect the subject lines into one string:

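from wordcloud import WordCloud

# Drop the automated arXiv notifications from the full set of emails
df_no_arxiv = dfs[dfs['from'] != 'no-reply@arXiv.org']
# Join the subject lines of the sent emails into a single string
text = ' '.join(map(str, sent['subject'].values))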

Next, let's plot the word cloud:

# Extra stop words: reply/forward prefixes and an encoding leftover
stopwords = ['Re', 'Fwd', '3A_']
wrd = WordCloud(width=700, height=480, margin=0, collocations=False)
for sw in stopwords:
    wrd.stopwords.add(sw)
wordcloud = wrd.generate(text)

import matplotlib.pyplot as plt

# Render the word cloud as an image without axes or margins
plt.figure(figsize=(25, 15))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.margins(x=0, y=0)

I added a few extra stop words (Re, Fwd, and 3A_) so that they are filtered out of the plot. The output for me is as follows:

This tells me what I mostly communicate about. In my emails from 2011 to 2019, the most frequently used words are new, site, project, Data, WordPress, and website. What is presented in this chapter is just a starting point; you can take the analysis further in several other directions.
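
One such direction is to check the word cloud against plain counts. The following is a minimal sketch, not part of the original analysis: the subject lines are made-up placeholders standing in for sent['subject'].values, and top_words is a hypothetical helper. It only filters the three extra stop words used above, whereas WordCloud also applies its own default stop word list, so common words such as "the" are still counted here.

```python
import re
from collections import Counter

# Hypothetical subject lines standing in for sent['subject'].values
subjects = [
    "Re: New project site",
    "Fwd: WordPress website update",
    "New WordPress site is live",
    "Re: Project data for the new website",
]

# The extra stop words from the word cloud, lowercased for matching
stopwords = {"re", "fwd", "3a_"}


def top_words(lines, stopwords, n=5):
    """Return the n most common words across lines, ignoring stop words."""
    counts = Counter()
    for line in lines:
        # Lowercase the line and pull out runs of letters, digits, and underscores
        for word in re.findall(r"\w+", str(line).lower()):
            if word not in stopwords:
                counts[word] += 1
    return counts.most_common(n)


print(top_words(subjects, stopwords, n=3))
```

Passing the real subject lines instead of the placeholder list gives a ranked table that can be compared with the largest words in the plot.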
