Plotting and exploring data – harnessing the power of Seaborn

Now let's start our analysis with Seaborn's canned plotting routine, pairplot, to visualize pairwise feature relationships. You can use this routine to hunt down relationships, candidates for groupings, and possible outliers, and to build an intuition for which downstream analysis strategies to investigate. Each off-diagonal cell is a pairwise scatter plot, and the diagonal cells are filled with univariate distributions:

# explore with Seaborn pairplot
import seaborn as sns
sns.pairplot(df, hue='species')

You will see the following output after executing the preceding code:

Sometimes, a histogram is easier to read than a probability-density plot for understanding a distribution. With Seaborn, we can pass the diag_kind argument and re-plot to show histograms on the diagonal.

Also, we can change the aesthetics with the palette and markers arguments. You can refer to the Seaborn documentation for the other available arguments; let's re-plot as follows:

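# add histograms to diagonals of Seaborn pairplot
# note: newer Seaborn releases may raise a ValueError when filled ('o', 'v')
# and unfilled ('x') markers are mixed here; if so, try markers=['o','s','v']
sns.pairplot(df, hue='species', diag_kind='hist',
             palette='bright', markers=['o','x','v'])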

You will see the following output after executing the preceding code:

At this point, we can pick two variables and draw them in a scatter plot with Seaborn's lmplot. If your dataset has more than five features, important variable relationships may not fit in the same window of the pair plot. A bivariate scatter plot lets you isolate and view the important pairings:

# plot bivariate scatter with Seaborn
sns.lmplot(x='petal length in cm', y='petal width in cm',
           hue='species', data=df, fit_reg=False,
           palette='bright', markers=['o','x','v'])

You will see the following output after executing the preceding code:

A popular quick view of a single feature is the violin plot. Many practitioners prefer violins for understanding raw value distributions and class spreads on a single plot. Each violin is the univariate distribution of the values within a given class, displayed as a probability density and plotted vertically like a box plot. This probably sounds convoluted, but one look at the plot should get the idea across with ease. The more violin plots you see, the more you will learn to love them:

sns.violinplot(x='species',y='petal length in cm', data=df)

You will see the following output after executing the preceding code:

By default, Seaborn draws a miniature box plot inside each violin, marking the median and the interquartile range (the middle 50% of the values). You can change this with the inner argument, which is explained in the Seaborn online documentation for violin plots: https://seaborn.pydata.org/generated/seaborn.violinplot.html.
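To make the effect of inner concrete, here is a minimal sketch that draws the same violins with three inner styles side by side and then computes the median and interquartile range with pandas as a numeric cross-check. As before, demo_df is a hypothetical stand-in built from random numbers rather than the real iris data, and the subplot layout is just one possible arrangement.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend so the sketch runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Hypothetical stand-in for df: random values, NOT the real iris measurements.
rng = np.random.default_rng(0)
species = np.repeat(["setosa", "versicolor", "virginica"], 30)
demo_df = pd.DataFrame({
    "species": species,
    "petal length in cm": rng.normal(3.8, 1.7, species.size),
})

# Draw one panel per inner style: the default mini box plot, quartile
# lines, and one stick per observation.
inner_styles = ["box", "quartile", "stick"]
fig, axes = plt.subplots(1, len(inner_styles), figsize=(12, 4), sharey=True)
for ax, style in zip(axes, inner_styles):
    sns.violinplot(x="species", y="petal length in cm",
                   data=demo_df, inner=style, ax=ax)
    ax.set_title(f"inner='{style}'")

# Numeric cross-check: quartiles and interquartile range per class.
summary = (demo_df.groupby("species")["petal length in cm"]
           .quantile([0.25, 0.5, 0.75])
           .unstack())
summary["iqr"] = summary[0.75] - summary[0.25]
print(summary)
```

Passing inner=None removes the interior marks entirely, which can be useful when you only care about the shape of each distribution.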