官术网_书友最值得收藏!

Determining bias

When teaching probability, it is customary to give examples of coin tosses. Whether it is going to rain or not is more or less like a coin toss. If we have two possible outcomes, the binomial distribution is appropriate. This distribution requires two parameters: the probability and the sample size.

In statistics, there are two generally accepted approaches. In the frequentist approach, we measure the number of coin tosses and use that frequency for further analysis. Bayesian analysis is named after its founder the Reverend Thomas Bayes. The Bayesian approach is more incremental and requires a prior distribution, which is the distribution we assume before performing experiments. The posterior distribution is the distribution we are interested in and which we obtain after getting new data from experiments. Let's first have a look at the following equations:

(3.7) and (3.8) describe the probability mass function for the binomial distribution. (3.9) comes from an essay published by Bayes. The equation is about an experiment with m successes and n failures and assumes a uniform prior distribution for the probability parameter of the binomial distribution.

How to do it...

In this recipe, we will apply the frequentist and Bayesian approach to rain data:

  1. The imports are as follows:
    import dautil as dl
    from scipy import stats
    import matplotlib.pyplot as plt
    import numpy as np
    from IPython.html.widgets.interaction import interact
    from IPython.display import HTML
  2. Define the following function to load the data:
    def load():
        rainy = dl.data.Weather.rain_values() > 0
        n = len(rainy)
        nrains = np.cumsum(rainy)
    
        return n, nrains
  3. Define the following function to compute the posterior:
    def posterior(i, u, data):
        return stats.binom(i, u).pmf(data[i])
  4. Define the following function to plot the posterior for the subset of the data:
    def plot_posterior(ax, day, u, nrains):
        ax.set_title('Posterior distribution for day {}'.format(day))
        ax.plot(posterior(day, u, nrains),
                label='rainy days in period={}'.format(nrains[day]))
        ax.set_xlabel('Uniform prior parameter')
        ax.set_ylabel('Probability rain')
        ax.legend(loc='best')
  5. Define the following function to do the plotting:
    def plot(day1=1, day2=30):
        fig, [[upleft, upright], [downleft, downright]] = plt.subplots(2, 2)
        plt.suptitle('Determining bias of rain data')
        x = np.arange(n) + 1
        upleft.set_title('Frequentist Approach')
        upleft.plot(x, nrains/x, label='Probability rain')
        upleft.set_xlabel('Days')
        set_ylabel(upleft)
    
        max_p = np.zeros(n)
        u = np.linspace(0, 1, 100)
    
        for i in x - 1:
            max_p[i] = posterior(i, u, nrains).argmax()/100
    
        downleft.set_title('Bayesian Approach')
        downleft.plot(x, max_p)
        downleft.set_xlabel('Days')
        set_ylabel(downleft)
    
        plot_posterior(upright, day1, u, nrains)
        plot_posterior(downright, day2, u, nrains)
        plt.tight_layout()
  6. The following lines call the other functions and place a watermark:
    interact(plot, day1=(1, n), day2=(1, n))
    HTML(dl.report.HTMLBuilder().watermark())

Refer to the following screenshot for the end result (see the determining_bias.ipynb file in this book's code bundle):

See also

主站蜘蛛池模板: 本溪市| 正安县| 长白| 岗巴县| 东光县| 偏关县| 宁安市| 尼勒克县| 宜兰市| 桑日县| 岚皋县| 巴中市| 黄石市| 永丰县| 张家口市| 霍城县| 雅安市| 将乐县| 葵青区| 远安县| 奈曼旗| 新闻| 会昌县| 镇雄县| 伊金霍洛旗| 嘉义市| 健康| 宁安市| 德兴市| 隆林| 黄浦区| 浪卡子县| 北票市| 龙南县| 绵阳市| 达尔| 卢龙县| 澜沧| 长阳| 疏附县| 东城区|