To answer the next question, "At what times of the day do I send and receive emails with Gmail?", let's create a graph that covers both sent emails and received emails.
1. Let's create two sub-dataframes, one for sent emails and another for received emails:
sent = dfs[dfs['label'] == 'sent']
received = dfs[dfs['label'] == 'inbox']
This should be straightforward. Remember, we set a couple of labels, sent and inbox, earlier, so each filter keeps only the rows carrying the matching label. Now, let's create a plot.
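Before plotting, the filtering step can be checked in isolation. The following is a minimal sketch on a toy dataframe: the rows and the `subject` column are made up for illustration, and only the `label` column and its `sent` and `inbox` values come from the text above.

```python
import pandas as pd

# Toy stand-in for the chapter's `dfs` dataframe; these rows are invented.
dfs = pd.DataFrame({
    "label": ["sent", "inbox", "inbox", "sent", "inbox"],
    "subject": ["a", "b", "c", "d", "e"],
})

# Comparing a column to a value gives a boolean Series, one entry per row.
is_sent = dfs["label"] == "sent"

# Indexing with that Series keeps only the rows where it is True.
sent = dfs[is_sent]
received = dfs[dfs["label"] == "inbox"]

print(len(sent), len(received))  # 2 3
```

Each result is a new dataframe that keeps the original index, so its rows can still be traced back to `dfs`.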
2. First, let's import the required libraries:
import matplotlib.pyplot as plt
from matplotlib.ticker import MaxNLocator
from scipy import ndimage
import matplotlib.gridspec as gridspec
import matplotlib.patches as mpatches
3. Now, let's create a function that takes a dataframe as an input and creates a plot. See the following function:
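def plot_todo_vs_year(df, ax, color='C0', s=0.5, title=''):
    # Assumes numpy (np), pytz, and datetime were imported earlier in the chapter.
    ind = np.zeros(len(df), dtype='bool')  # all-False mask, so df[~ind] keeps every row
    est = pytz.timezone('US/Eastern')

    # One point per email: year on the x axis, hour of the day on the y axis
    df[~ind].plot.scatter('year', 'timeofday', s=s, alpha=0.6, ax=ax, color=color)
    ax.set_ylim(0, 24)
    ax.yaxis.set_major_locator(MaxNLocator(8))
    # Relabel the hour ticks as 12-hour clock times, for example "03 PM"
    ax.set_yticklabels([datetime.datetime.strptime(str(int(np.mod(ts, 24))), "%H").strftime("%I %p") for ts in ax.get_yticks()])

    # Apply the title argument and hand the axes back to the caller
    ax.set_title(title)
    return ax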
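The tick-label expression in the function is dense, so here is a minimal sketch of the same conversion pulled out into a helper. The name `hour_label` is hypothetical and not part of the chapter's code, and the AM/PM text assumes the default C/English locale.

```python
import datetime

def hour_label(ts):
    """Format a tick value given in hours as a 12-hour clock label."""
    hour = int(ts % 24)  # wrap 24 back to 0 and drop any fractional part
    # Parse the hour as a 24-hour value, then print it in 12-hour form
    return datetime.datetime.strptime(str(hour), "%H").strftime("%I %p")

for ts in (0, 3, 12, 15.5, 24):
    print(ts, "->", hour_label(ts))
# 0 -> 12 AM
# 3 -> 03 AM
# 12 -> 12 PM
# 15.5 -> 03 PM
# 24 -> 12 AM
```

The modulo matters because the y axis runs from 0 to 24, and `%H` only accepts hours from 0 to 23.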
By now, you should be familiar with how to create a scatter plot. We discussed doing so in detail in Chapter 2, Visual Aids for EDA. If any of the terms are unclear, revisit that chapter.
4. Now, let's plot both received and sent emails. Check out the code given here:
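A minimal sketch of this step, under stated assumptions: the two dataframes below are synthetic stand-ins with only the `year` and `timeofday` columns, and `plot_time_of_day` is a hypothetical, simplified stand-in for the function from step 3. With the real data, the same two-panel layout would presumably call that function on `sent` and `received` instead.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

def fake_emails(n):
    """Synthetic emails: a fractional year and an hour of the day."""
    return pd.DataFrame({
        "year": rng.uniform(2018, 2020, n),
        "timeofday": rng.uniform(0, 24, n),
    })

# Invented data, sized only so the two panels look different
sent = fake_emails(200)
received = fake_emails(600)

def plot_time_of_day(df, ax, color="C0", s=0.5, title=""):
    """Simplified stand-in for the chapter's plotting function."""
    df.plot.scatter("year", "timeofday", s=s, alpha=0.6, ax=ax, color=color)
    ax.set_ylim(0, 24)
    ax.set_title(title)
    return ax

# One row, two panels: sent emails on the left, received on the right
fig, axes = plt.subplots(nrows=1, ncols=2, figsize=(15, 4))
plot_time_of_day(sent, axes[0], title="Sent")
plot_time_of_day(received, axes[1], title="Received")
```

Placing the panels side by side with the same y-axis range makes the two point densities directly comparable.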
Check out the preceding graph. The denser the data points, the more emails there were at that time. Note that there are fewer sent emails than received emails: I received more emails than I sent from 2018 to 2020. Note also that I received most of the emails between 03:00 PM and 09:00 AM. This graph gives a nice overview of email activity by time of day, and it answers the second question.