Most frequently used words

One of the easiest things to analyze in your emails is which words you use most often, and a word cloud is a quick way to visualize them. Let's first remove the automated arXiv notification emails:

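import matplotlib.pyplot as plt
from wordcloud import WordCloud

# Drop the automated arXiv notifications
df_no_arxiv = dfs[dfs['from'] != 'no-reply@arXiv.org']
# Join the remaining subject lines into one string for the word cloud
text = ' '.join(map(str, df_no_arxiv['subject'].values))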

Next, let's plot the word cloud:

# Extra stop words: reply/forward prefixes and an encoding fragment
stopwords = ['Re', 'Fwd', '3A_']
wrd = WordCloud(width=700, height=480, margin=0, collocations=False)
for sw in stopwords:
    # Added on top of WordCloud's built-in stop words
    wrd.stopwords.add(sw)
wordcloud = wrd.generate(text)

plt.figure(figsize=(25, 15))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()

I added a few extra stop words (Re, Fwd, and 3A_) so that reply and forward prefixes and leftover encoding fragments don't crowd the graph. The output for me is as follows:

This tells me what I mostly write about. Across my emails from 2011 to 2019, the most frequent words in subject lines are new, site, project, Data, WordPress, and website. This is really good, right? What is presented in this chapter is just a starting point, and you can take the analysis further in several other directions.
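Reading a ranking off a word cloud is approximate, because font size only roughly tracks frequency. If you want exact counts, a minimal sketch using only the standard library might look like the following. It assumes the messages are available as (sender, subject) pairs; `top_subject_words` is a hypothetical helper, and its simple tokenizer and lowercase merging are simplifications, not WordCloud's exact text processing (which also normalizes plurals and applies its own built-in stop-word list).

```python
import re
from collections import Counter
from typing import Iterable, List, Set, Tuple


def top_subject_words(
    emails: Iterable[Tuple[str, str]],
    stopwords: Set[str],
    exclude_sender: str = "no-reply@arXiv.org",
    n: int = 10,
) -> List[Tuple[str, int]]:
    """Hypothetical helper: return the n most common subject-line words.

    `emails` is assumed to be (sender, subject) pairs. Stop words are
    matched case-insensitively, and counts are merged on the lowercase form.
    """
    stop = {w.lower() for w in stopwords}
    counts: Counter = Counter()
    for sender, subject in emails:
        if sender == exclude_sender:
            continue  # same idea as the df_no_arxiv filter
        # Words of two or more characters; a simplification of WordCloud's tokenizer
        for word in re.findall(r"\w[\w']+", str(subject)):
            if word.lower() not in stop:
                counts[word.lower()] += 1
    return counts.most_common(n)


if __name__ == "__main__":
    sample = [
        ("alice@example.com", "Re: New site launch"),
        ("bob@example.com", "Fwd: New project data"),
        ("no-reply@arXiv.org", "New submissions"),
        ("carol@example.com", "Data for the new website"),
    ]
    print(top_subject_words(sample, {"Re", "Fwd", "for", "the"}, n=2))
    # prints [('new', 3), ('data', 2)]
```

To get closer to what the cloud shows, you could pass `STOPWORDS | {'Re', 'Fwd', '3A_'}` (with `STOPWORDS` imported from `wordcloud`) as the stop-word set, and build the pairs with something like `zip(dfs['from'], dfs['subject'])`.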
