- Python Data Mining Quick Start Guide
- Nathan Greeneltch
- 2021-06-24 15:19:48
Plotting and exploring data – harnessing the power of Seaborn
Let's start our analysis with pairplot, Seaborn's canned plotting routine for visualizing pairwise feature relationships. You can use this routine to spot relationships, candidate groupings, and possible outliers, and to build an intuition for which downstream analysis strategies to investigate. Each off-diagonal cell is a pairwise scatter plot, and the diagonal cells are filled with univariate distributions:
# explore with Seaborn pairplot
import seaborn as sns
sns.pairplot(df, hue='species')
You will see the following output after executing the preceding code:

Sometimes, histograms are easier to read than probability-density plots when you want to understand a distribution. With Seaborn, we can pass diag_kind='hist' and re-plot to show histograms on the diagonal.
We can also change the aesthetics with the palette and markers args. Refer to the Seaborn documentation for the other available args; let's re-plot as follows:
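# add histograms to diagonals of Seaborn pairplot
# note: recent Seaborn releases raise an error when filled ('o', 'v') and
# unfilled ('x') markers are mixed in pairplot, so 's' replaces 'x' here
sns.pairplot(df, hue='species', diag_kind='hist',
             palette='bright', markers=['o', 's', 'v'])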
You will see the following output after executing the preceding code:

At this point, we can choose two variables and plot them against each other with Seaborn's lmplot. By default, lmplot also fits and draws a regression line, so we pass fit_reg=False to get a plain scatter plot. If your dataset has more than five features, important variable relationships may not all fit in the same view of the pair plot. You can use this bivariate scatter plot to isolate and view important pairings:
# plot bivariate scatter with Seaborn
sns.lmplot(x='petal length in cm', y='petal width in cm',
           hue='species', data=df, fit_reg=False,
           palette='bright', markers=['o', 'x', 'v'])
You will see the following output after executing the preceding code:

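When there are too many features to judge the pairings by eye, one possible heuristic (an assumption on our part, not a method prescribed here) is to rank feature pairs by absolute Pearson correlation and inspect the top ones first. A minimal sketch with synthetic data, where 'sepal width in cm' is an assumed column name:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: petal width is built to track petal length
rng = np.random.default_rng(2)
n = 90
length = rng.normal(3.8, 1.7, n)
df = pd.DataFrame({
    "petal length in cm": length,
    "petal width in cm": 0.4 * length + rng.normal(0, 0.2, n),
    "sepal width in cm": rng.normal(3.0, 0.4, n),  # assumed column name
})

# Absolute pairwise correlations between the numeric features
corr = df.corr().abs()

# Keep only the upper triangle so that each pair appears once
mask = np.triu(np.ones(corr.shape, dtype=bool), k=1)
ranked = corr.where(mask).stack().dropna().sort_values(ascending=False)
print(ranked)
```

Correlation only captures linear relationships, so treat the ranking as a starting point for choosing which pairs to plot, not as a replacement for looking at them.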
A popular quick view of a single feature vector is the violin plot. Many practitioners prefer violins for understanding raw value distributions and class spreads on a single plot. Each violin is the univariate distribution of the values within a given class, displayed as a probability density and plotted vertically like a box plot. This may sound convoluted in words, but one look at the plot should make the concept clear. The more violin plots you see, the more you will learn to love them:
# plot violin plot with Seaborn
sns.violinplot(x='species', y='petal length in cm', data=df)
You will see the following output after executing the preceding code:

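If you want numbers to read alongside the violins, a per-class summary gives the same quantities a box plot would show. This is a minimal sketch using pandas, with synthetic data invented for illustration:

```python
import numpy as np
import pandas as pd

# Synthetic stand-in data: one feature, three classes (invented values)
rng = np.random.default_rng(3)
df = pd.DataFrame({
    "species": np.repeat(["setosa", "versicolor", "virginica"], 30),
    "petal length in cm": np.concatenate([
        rng.normal(1.5, 0.2, 30),
        rng.normal(4.3, 0.5, 30),
        rng.normal(5.6, 0.6, 30),
    ]),
})

# Minimum, quartiles, and maximum of the feature within each class
summary = (df.groupby("species")["petal length in cm"]
             .describe()[["min", "25%", "50%", "75%", "max"]])
print(summary)
```

The table does not capture the shape of each distribution, which is exactly what the violin adds on top of these summary values.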