官术网_书友最值得收藏!

Sampling with probability weights

To create the nuclear bomb during the Second World War, physicists needed to perform pretty complicated calculations. Stanislaw Ulam got the idea to treat this challenge as a game of chance. Later, the method he came up with was given the code name Monte Carlo. Games of chance usually have very simple rules, but playing in an optimal way can be difficult. According to quantum mechanics, subatomic particles are also unpredictable. If we simulate many experiments with subatomic particles, we still can get an idea of how they are likely to behave. The Monte Carlo method is not deterministic, but it approaches the correct result for a complex computation for a sufficiently large number of simulations.

The statsmodels.distributions.empirical_distribution.ECDF class defines the cumulative distribution function of a data array. We can use its output to simulate a complex process. This simulation is not perfect, because we lose information in the process.

How to do it...

In this recipe, we will simulate weather processes. In particular, I am interested in annual temperature values. I am interested in finding out whether the simulated data sets also show an upward trend:

  1. The imports are as follows:
    from statsmodels.distributions.empirical_distribution import ECDF
    import dautil as dl
    import numpy as np
    import matplotlib.pyplot as plt
    from sklearn.utils import check_random_state
    from IPython.html.widgets.interaction import interact
    from IPython.core.display import HTML
  2. Define the following function to calculate the slope:
    def slope(x, y):
        return np.polyfit(x, y, 1)[0]
  3. Define the following function to generate data for a single year:
    def simulate(x, years, rs, p):
        N = len(years)
        means = np.zeros(N)
    
        for i in range(N):
            sample = rs.choice(x, size=365, p=p)
            means[i] = sample.mean()
    
        return means, np.diff(means).mean(), slope(years, means)
  4. Define the following function to run multiple simulations:
    def run_multiple(times, x, years, p):
        sims = []
        rs = check_random_state(20)
    
        for i in range(times):
            sims.append(simulate(x, years, rs, p))
    
        return np.array(sims)
  5. Define the following function, which by default loads temperature values:
    def main(var='TEMP'):
        df = dl.data.Weather.load().dropna()[var]
        cdf = ECDF(df)
        x = cdf.x[1:]
        p = np.diff(cdf.y)
    
        df = df.resample('A')
        years = df.index.year
        sims = run_multiple(500, x, years, p)
    
        sp = dl.plotting.Subplotter(2, 1, context)
        plotter = dl.plotting.CyclePlotter(sp.ax)
        plotter.plot(years, df.values, label='Data')
        plotter.plot(years, sims[0][0], label='Sim 1')
        plotter.plot(years, sims[1][0], label='Sim 2')
        header = dl.data.Weather.get_header(var)
        sp.label(title_params=header, ylabel_params=header)
        sp.ax.legend(loc='best')
    
        sp.next_ax()
        sp.label()
        sp.ax.hist(sims.T[2], normed=True)
        plt.figtext(0.2, 0.3, 'Slope of the Data {:.3f}'.format(slope(years, df.values)))
        plt.tight_layout()

The notebook stored in the sampling_weights.ipynb file in this book's code bundle gives you the option to select other weather variables too. Refer to the following screenshot for the end result:

See also

主站蜘蛛池模板: 介休市| 莲花县| 浦北县| 五莲县| 广饶县| 芮城县| 福建省| 龙江县| 监利县| 洪洞县| 烟台市| 沅江市| 抚顺县| 龙山县| 大兴区| 海伦市| 望城县| 绵阳市| 潼关县| 友谊县| 广饶县| 金平| 阳高县| 丰镇市| 西吉县| 卓尼县| 汉中市| 崇义县| 甘德县| 凌源市| 连州市| 西藏| 宿州市| 龙川县| 柯坪县| 邻水| 宜君县| 旬邑县| 三都| 福州市| 通州区|