
Plotting and exploring data – harnessing the power of Seaborn

Now let's start our analysis with pairplot, Seaborn's canned plotting routine for visualizing pairwise feature relationships. You can use this routine to hunt down relationships, candidate groupings, and possible outliers, and to build an intuition for which downstream analysis strategies to investigate. Each off-diagonal cell is a pairwise scatter plot, and the diagonal cells are filled with univariate distributions:

# explore with Seaborn pairplot
import seaborn as sns
sns.pairplot(df,hue='species')

You will see the following output after executing the preceding code:

Sometimes a histogram is easier to read than a probability-density plot when you want to understand a distribution. With Seaborn, we can pass the diag_kind argument and re-plot to show histograms on the diagonal.

We can also change the aesthetics with the palette and markers arguments. Refer to the Seaborn documentation for the other available arguments; let's do the re-plot as follows:

# add histograms to diagonals of Seaborn pairplot
sns.pairplot(df, hue='species', diag_kind='hist',
             palette='bright', markers=['o', 'x', 'v'])

You will see the following output after executing the preceding code:

At this point, we can choose two variables and plot them in a scatter plot with Seaborn's lmplot. By default, lmplot also fits and draws a regression line, so we pass fit_reg=False to keep only the points. If your dataset has more than five features, important variable relationships may not fit in the same window of the pair plot. You can use this bivariate scatter plot to isolate and view important pairings:

# plot bivariate scatter with Seaborn
sns.lmplot(x='petal length in cm', y='petal width in cm',
           hue='species', data=df, fit_reg=False,
           palette='bright', markers=['o', 'x', 'v'])

You will see the following output after executing the preceding code:

A popular quick view of a single feature vector is the violin plot. Many practitioners prefer violins because they show both the raw value distribution and the class spread on a single plot. Each violin is the univariate distribution of the values within a given class, displayed as a probability density and plotted vertically like a box plot. This may sound convoluted, but one look at the plot should get the idea across with ease. The more violin plots you see, the more you will learn to love them:

sns.violinplot(x='species',y='petal length in cm', data=df)

You will see the following output after executing the preceding code:

By default, Seaborn draws a small box plot inside each violin, marking the median and the interquartile range (the middle 50% of the values). You can change this with the inner argument, which is explained in the Seaborn online documentation for violin plots: https://seaborn.pydata.org/generated/seaborn.violinplot.html.
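As a rough illustration of the inner argument, the sketch below first computes the per-class quartiles that the default inner box summarizes, then redraws the violins with inner='stick', which draws one line per observation instead of the box. The small DataFrame is a hand-typed stand-in for df with illustrative values only, and the Agg backend is an assumption made so that the sketch runs without a display:

```python
import matplotlib
matplotlib.use("Agg")  # headless backend; drop this line in a notebook
import pandas as pd
import seaborn as sns

# Hand-typed stand-in for df: illustrative values, not real measurements
df = pd.DataFrame({
    "petal length in cm": [1.3, 1.4, 1.5, 1.6,
                           4.0, 4.4, 4.6, 5.0,
                           5.1, 5.6, 5.9, 6.1],
    "species": ["setosa"] * 4 + ["versicolor"] * 4 + ["virginica"] * 4,
})

# Quartiles per class: the median (0.5) and the interquartile range
# (0.25 to 0.75) are what the default inner box marks
quartiles = (df.groupby("species")["petal length in cm"]
               .quantile([0.25, 0.5, 0.75])
               .unstack())
print(quartiles)

# Replace the inner box with one line per observation
ax = sns.violinplot(x="species", y="petal length in cm", data=df,
                    inner="stick")
```

Passing inner=None leaves the violins empty, which can be useful when you plan to overlay the individual points yourself.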