- Python Data Analysis Cookbook
- Ivan Idris
- 435字
- 2021-07-14 11:05:44
Estimating kernel density
Often, we have an idea about the kind of distribution that is appropriate for our data. If that is not the case, we can apply a procedure called kernel density estimation. This method doesn't make any assumptions and is nonparametric. We basically smooth the data in an attempt to get a handle on the probability density. To smooth data, we can use various functions. These functions are called kernel functions in this context. The following equation defines the estimator:

In the preceding formula, K is the kernel function, a function with properties similar to a PDF. The bandwidth h parameter controls the smoothing process and can be kept fixed or varied. Some libraries use rules of thumb to calculate h, while others let you specify its value. SciPy, statsmodels, scikit-learn, and Seaborn implement kernel density estimation using different algorithms.
How to do it...
In this recipe, we will estimate bivariate kernel density using weather data:
- The imports are as follows:
import seaborn as sns import matplotlib.pyplot as plt import dautil as dl from dautil.stats import zscores import statsmodels.api as sm from sklearn.neighbors import KernelDensity import numpy as np from scipy import stats from IPython.html import widgets from IPython.core.display import display from IPython.display import HTML
- Define the following function to plot the estimated kernel density:
def plot(ax, a, b, c, xlabel, ylabel): dl.plotting.scatter_with_bar(ax, 'Kernel Density', a.values, b.values, c=c, cmap='Blues') ax.set_xlabel(xlabel) ax.set_ylabel(ylabel)
- In the following notebook cell, load the data and define widgets for the selection of weather variables:
df = dl.data.Weather.load().resample('M').dropna() columns = [str(c) for c in df.columns.values] var1 = widgets.Dropdown(options=columns, selected_label='RAIN') display(var1) var2 = widgets.Dropdown(options=columns, selected_label='TEMP') display(var2)
- In the next notebook cell, define variables using the values of the widgets we created:
x = df[var1.value] xlabel = dl.data.Weather.get_header(var1.value) y = df[var2.value] ylabel = dl.data.Weather.get_header(var2.value) X = [x, y]
- The next notebook cell does the heavy lifting with the most important lines highlighted:
# need to use zscores to avoid errors Z = [zscores(x), zscores(y)] kde = stats.gaussian_kde(Z) _, [[sp_ax, sm_ax], [sk_ax, sns_ax]] = plt.subplots(2, 2) plot(sp_ax, x, y, kde.pdf(Z), xlabel, ylabel) sp_ax.set_title('SciPy') sm_kde = sm.nonparametric.KDEMultivariate(data=X, var_type='cc', bw='normal_reference') sm_ax.set_title('statsmodels') plot(sm_ax, x, y, sm_kde.pdf(X), xlabel, ylabel) XT = np.array(X).T sk_kde = KernelDensity(kernel='gaussian', bandwidth=0.2).fit(XT) sk_ax.set_title('Scikit Learn') plot(sk_ax, x, y, sk_kde.score_samples(XT), xlabel, ylabel) sns_ax.set_title('Seaborn') sns.kdeplot(x, y, ax=sns_ax) sns.rugplot(x, color="b", ax=sns_ax) sns.rugplot(y, vertical=True, ax=sns_ax) sns_ax.set_xlabel(xlabel) sns_ax.set_ylabel(ylabel) plt.tight_layout() HTML(dl.report.HTMLBuilder().watermark())
Refer to the following screenshot for the end result (see the kernel_density_estimation.ipynb
file in this book's code bundle):

See also
- The kernel density estimation Wikipedia page at https://en.wikipedia.org/wiki/Kernel_density_estimation (retrieved August 2015)
- The related statsmodels documentation at http://statsmodels.sourceforge.net/devel/generated/statsmodels.nonparametric.kernel_density.KDEMultivariate.html (retrieved August 2015)
- The related scikit-learn documentation at http://scikit-learn.org/stable/modules/density.html (retrieved August 2015)
- The Complete Rust Programming Reference Guide
- Java逍遙游記
- Java面向對象思想與程序設計
- Offer來了:Java面試核心知識點精講(原理篇)
- Machine Learning with R Cookbook(Second Edition)
- INSTANT CakePHP Starter
- Java性能權威指南(第2版)
- Learning ELK Stack
- 零基礎入門學習Python(第2版)
- Visual C#.NET Web應用程序設計
- Java編程從入門到精通
- 深入解析Java編譯器:源碼剖析與實例詳解
- JavaEE架構與程序設計
- Python Django Web從入門到項目實戰(視頻版)
- C# 10核心技術指南