
Most frequently used words

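One of the easiest things to analyze in your emails is which words you use most often. A word cloud is a quick way to visualize this: the more often a word appears, the larger it is drawn. Let's first filter out the emails sent by arXiv and combine the subject lines of the remaining emails into a single string: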

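from wordcloud import WordCloud

# Drop the automated emails sent by the arXiv mailing list
df_no_arxiv = dfs[dfs['from'] != 'no-reply@arXiv.org']
# Combine the remaining subject lines into one string
text = ' '.join(map(str, df_no_arxiv['subject'].values))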
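The same filter-and-join logic can be written in plain Python, which makes it easy to check without the real mailbox. This is a minimal sketch. It assumes each email is a dict with `from` and `subject` keys and uses made-up sample records. It also skips missing subjects, which `map(str, ...)` would otherwise turn into a placeholder word such as `nan`.

```python
# Hypothetical sample records standing in for the real mailbox data.
emails = [
    {"from": "no-reply@arXiv.org", "subject": "arXiv daily digest"},
    {"from": "alice@example.com", "subject": "Re: New project site"},
    {"from": "bob@example.com", "subject": "Fwd: WordPress website update"},
    {"from": "carol@example.com", "subject": None},  # missing subject
]

ARXIV_SENDER = "no-reply@arXiv.org"

# Keep every email that was not sent by the arXiv mailing list.
non_arxiv = [e for e in emails if e["from"] != ARXIV_SENDER]

# Join the subjects into one string, skipping missing ones so that
# a placeholder such as "None" does not end up in the word cloud.
text = " ".join(str(e["subject"]) for e in non_arxiv if e["subject"] is not None)
print(text)  # Re: New project site Fwd: WordPress website update
```

In the pandas version, calling `df_no_arxiv['subject'].dropna()` before joining would skip missing subjects in the same way.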

Next, let's plot the word cloud:

import matplotlib.pyplot as plt

# Extra stop words: reply/forward prefixes and the '3A_' token
stopwords = ['Re', 'Fwd', '3A_']
wrd = WordCloud(width=700, height=480, margin=0, collocations=False)
for sw in stopwords:
    wrd.stopwords.add(sw)  # extend the default stop word set
wordcloud = wrd.generate(text)

plt.figure(figsize=(25, 15))
plt.imshow(wordcloud, interpolation='bilinear')
plt.axis("off")
plt.margins(x=0, y=0)
plt.show()

I added a few extra stop words (Re, Fwd, and 3A_) so that they are filtered out of the graph. The output for me is as follows:

The word cloud shows what I mostly communicate about. In my emails from 2011 to 2019, the most frequently used words in subject lines are new, site, project, Data, WordPress, and website. What is presented in this chapter is just a starting point, and you can take this analysis further in several other directions.
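To read off exact counts rather than judging word size by eye, you can rank the subject text with `collections.Counter`. This sketch is an approximation, not WordCloud's own processing. It assumes a simple regex tokenizer and matches stop words case-insensitively. It also applies only the stop words passed in, whereas WordCloud adds its built-in list. The helper `top_words` and the sample `subjects` string are hypothetical.

```python
import re
from collections import Counter


def top_words(text, stopwords, n=5):
    """Return the n most common words in text, skipping the given stop words.

    Hypothetical helper: an approximation of word-cloud frequencies,
    not WordCloud's exact processing.
    """
    stop = {w.lower() for w in stopwords}  # case-insensitive stop word matching
    # Assumed tokenizer: a word character followed by one or more word
    # characters or apostrophes, so "Re:" becomes "Re".
    words = re.findall(r"\w[\w']+", text)
    kept = [w for w in words if w.lower() not in stop]
    # Counter.most_common breaks ties by first occurrence.
    return Counter(kept).most_common(n)


# Made-up subject lines for illustration.
subjects = "Re: New project site Fwd: New WordPress website Re: Data project"
print(top_words(subjects, ["Re", "Fwd", "3A_"], n=3))
# [('New', 2), ('project', 2), ('site', 1)]
```

On the real data, passing the `text` string built earlier would list the top subject-line words with their counts.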
