- Feature Engineering Made Easy
- Sinan Ozdemir Divya Susarla
- 2021-06-25 22:45:55
Plotting two columns at the interval level
A major advantage of having two columns of data at the interval level or higher is that we can use scatter plots: we put one column on each axis and draw every data point as a literal point on the graph. The year and AverageTemperature columns of our climate change dataset are both at the interval level, because differences between their values are meaningful, so let's take a crack at plotting all of the monthly recorded US temperatures as a scatter plot, where the x axis is the year and the y axis is the temperature. We hope to see an upward trend in temperature, as the earlier line graph suggested:
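import matplotlib.pyplot as plt  # needed here if not already imported earlier

# x axis: year of the reading; y axis: monthly average temperature
x = climate_sub_us['year']
y = climate_sub_us['AverageTemperature']
fig, ax = plt.subplots(figsize=(10,5))
ax.scatter(x, y)
plt.show()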
The following is the output of the preceding code:
Oof, that's not pretty. There seems to be a lot of noise, and that is to be expected. Every year has multiple towns reporting multiple average temperatures, so it makes sense that we see a vertical column of points at each year.
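To see where that stacking comes from, it helps to count how many readings share each year. The sketch below uses a small made-up frame (the name toy and its values are hypothetical stand-ins for climate_sub_us, which is not reproduced here) with the same year and AverageTemperature columns:

```python
import pandas as pd

# Hypothetical stand-in for climate_sub_us: several readings per year,
# as when many towns each report monthly averages.
toy = pd.DataFrame({
    'year': [2000, 2000, 2000, 2001, 2001, 2001],
    'AverageTemperature': [10.0, 14.0, 18.0, 11.0, 15.0, 19.0],
})

# Number of readings that share each x value in the scatter plot
readings_per_year = toy.groupby('year').size()

# Lowest and highest reading within each year (the vertical extent of each column)
spread_per_year = toy.groupby('year')['AverageTemperature'].agg(['min', 'max'])

print(readings_per_year)
print(spread_per_year)
```

Every reading in a given year lands on the same x value, so the within-year spread is what shows up as vertical noise in the scatter plot.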
Let's group by the year column and average within each year to remove much of this noise:
# Let's use a groupby to reduce the amount of noise in the US data.
# Selecting the column before calling mean() averages only the temperature.
climate_sub_us.groupby('year')['AverageTemperature'].mean().plot()
The following is the output of the preceding code:
Better! We can definitely see the increase over the years, but let's smooth it out slightly by taking a 10-year rolling mean:
# A moving average over a 10-year window to smooth it all out:
climate_sub_us.groupby('year')['AverageTemperature'].mean().rolling(10).mean().plot()
The following is the output of the preceding code:
So, our ability to plot two columns of data at the interval level has reconfirmed what the previous line graph suggested: there does seem to be a general upward trend in average temperature across the US.
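As a rough illustration of what the grouping and the 10-year rolling mean do to the data, the sketch below runs the same two steps on synthetic readings. The frame toy, its noise level, and the built-in drift of 0.02 degrees per year are all assumptions made up for this example, not values taken from the climate dataset:

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)

# Hypothetical data: 12 noisy readings per year with a small upward drift
years = np.repeat(np.arange(1900, 1950), 12)
temps = 10 + 0.02 * (years - 1900) + rng.normal(0, 3, size=years.size)
toy = pd.DataFrame({'year': years, 'AverageTemperature': temps})

# One value per year: the mean of that year's readings
yearly = toy.groupby('year')['AverageTemperature'].mean()

# Each point averages the current year and the nine years before it
smoothed = yearly.rolling(10).mean()

# Years without a full 10-year window behind them come out as NaN
print(smoothed.isna().sum())
```

With the default settings, the first nine years have no complete window, so the smoothed line starts nine years later than the unsmoothed one. Passing min_periods=1 to rolling would fill in those early years, at the cost of averaging over shorter windows there.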
The interval level of data provides a whole new level of understanding of our data, but we aren't done yet.