書名： Training Systems Using Python Statistical Modeling
作者名： Curtis Miller
本章字數： 403字
更新時間： 2021-06-24 14:20:47

Comparing two proportions

Sometimes, we may want to compare two proportions from two populations. Crucially, we will assume that they are independent of each other. It's difficult to analytically compute the probability that one proportion is less than another, so we often rely on Monte Carlo methods, otherwise known as simulation or random sampling.

We randomly generate the two proportions from their respective posterior distributions, and then track how often one is less than the other. We use the frequency we observed in our simulation to estimate the desired probability.

So, let's see this in action; we have two parameters: θ_A and θ_B. These correspond to the proportion of individuals who click on an ad from format A or format B. Users are randomly assigned to one format or the other, and the website tracks how many viewers click on the ad in the different formats.

516 visitors saw format A and 108 of them clicked it. 510 visitors saw format B and 144 of them clicked it. We use the same prior for both θ_A and θ_B, which is beta (3, 3). Additionally, the posterior distribution for θ_A will be B (111, 411) and for θ_B, it will be B (147, 369). This results in the following output:

We now want to know the probability of θ_A being less than θ_B—this is difficult to compute analytically. We can randomly simulate θ_A and θ_B, and then use that to estimate this probability. So, let's randomly simulate one θ_A, as follows:

Then, randomly simulate one θ_B, as follows:

Finally, we're going to do 1,000 simulations by computing 1,000 θ_A values and 1,000 θ_B values, as follows:

This is what we end up with; here, we can see how often θ_A is less than θ_B, that is, θ_Awas 996 times less than θ_B. So, what's the average of this? Well, it is 0.996; this is the probability that θ_A is less than θ_B, or an estimate of that probability. Given this, it seems highly likely that more people clicked on the ad for format B than people who clicked on the ad for format A.

That's it for proportions. Next up, we will look at Bayesian methods for analyzing the means of quantitative data.

官术网_书友最值得收藏!

Training Systems Using Python Statistical Modeling

Comparing two proportions