Various forms of distribution

There are various kinds of probability distributions, and each one gives the probability of the different outcomes of a random experiment. In this section, we'll explore the most common ones.

A normal distribution

A normal distribution is the most common and widely used distribution in statistics. It is also called a "bell curve" or a "Gaussian curve", after the mathematician Carl Friedrich Gauss. Normal distributions occur commonly in nature. Let's take the height example we saw previously. If you have the heights of all the people of a particular gender in Hong Kong, and you plot a bar chart where each bar represents the number of people of a particular height, then the resulting curve will look very similar to the following graph. The numbers on the horizontal axis of the plot are the numbers of standard deviations from the mean, which is at zero. The concept will become clearer as we proceed through the chapter.

[Figure: a normal distribution, with the horizontal axis in standard deviations from the mean]
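To make the idea of "standard deviations from the mean" concrete, the short sketch below computes how much of a standard normal distribution (mean 0, standard deviation 1) lies within one, two, and three standard deviations of the mean. It uses NormalDist from Python's standard statistics module, which is an illustrative choice and not something this chapter relies on elsewhere.

```python
from statistics import NormalDist

# Standard normal distribution: mean 0, standard deviation 1.
std_normal = NormalDist(mu=0.0, sigma=1.0)

# Probability of a value falling within k standard deviations of the mean.
within = {k: std_normal.cdf(k) - std_normal.cdf(-k) for k in (1, 2, 3)}

for k, prob in within.items():
    print(f"within {k} standard deviation(s) of the mean: {prob:.4f}")
```

The three values come out at roughly 68%, 95%, and 99.7%, which is why almost the entire bell curve sits between -3 and 3 on the horizontal axis.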

Also, if you take an hourglass and watch how the sand stacks up after the hourglass is inverted, the pile takes roughly the shape of a normal distribution. This is a good example of how the normal distribution shows up in nature.

[Figure: a normal distribution]

Take the following figure: it shows three normal distribution curves. Curve A has a standard deviation of 1, curve C has a standard deviation of 2, and curve B has a standard deviation of 3, which means that curve B has the widest spread of values, whereas curve A has the narrowest. Another way of looking at it is this: if curve B represented the heights of the people of a country, then that country would have people of very diverse heights, whereas a country with the curve A distribution would have people whose heights are similar to each other.

[Figure: normal distribution curves A, B, and C]
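The effect of the standard deviation on spread can also be checked numerically. The following sketch takes the three standard deviations quoted for curves A, C, and B and computes the probability of a value landing within one unit of the mean. It assumes that all three curves are centered at zero, which the text does not state.

```python
from statistics import NormalDist

# Standard deviations quoted in the text for the three curves.
curves = {"A": 1.0, "C": 2.0, "B": 3.0}

# Probability of a value falling within one unit of the mean.
# Assumption: every curve has a mean of 0.
near_mean = {}
for name, sigma in curves.items():
    dist = NormalDist(mu=0.0, sigma=sigma)
    near_mean[name] = dist.cdf(1.0) - dist.cdf(-1.0)
    print(f"curve {name} (standard deviation {sigma}): {near_mean[name]:.3f}")
```

The larger the standard deviation, the smaller the share of values that stay close to the mean, which is another way of saying that the values are more spread out.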

A normal distribution from a binomial distribution

Let's take a coin and flip it. The probability of getting a head is 50%, and so is the probability of getting a tail. If you flip the same coin six times, the probability of getting exactly three heads can be computed using the following formula:

$$P(X = k) = \binom{n}{k} p^{k} q^{n-k}$$

In the preceding formula, n is the number of times the coin is flipped, k is the number of heads for which the probability is computed, p is the probability of success, and q is (1 - p), which is the probability of failure.
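As a worked instance of the binomial formula, the sketch below evaluates it for three heads in six flips of a fair coin using only the standard library. The helper name binomial_pmf and the symbol k (the number of successes) are introduced here for illustration.

```python
from math import comb

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n independent trials."""
    q = 1 - p  # probability of failure
    return comb(n, k) * p**k * q**(n - k)

# Three heads in six flips of a fair coin: 20 * 0.5**3 * 0.5**3
prob_three_heads = binomial_pmf(3, 6, 0.5)
print(prob_three_heads)  # 0.3125

# The probabilities of all possible head counts add up to 1.
total = sum(binomial_pmf(k, 6, 0.5) for k in range(7))
print(total)
```

There are 20 ways of choosing which three of the six flips are heads, and each specific sequence of six flips has a probability of 1/64, giving 20/64 = 0.3125.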

The SciPy package for Python provides useful functions for statistical computations. You can install it from http://www.scipy.org/. The following commands plot the binomial distribution:

>>> from scipy.stats import binom
>>> import matplotlib.pyplot as plt

>>> fig, ax = plt.subplots(1, 1)
>>> x = [0, 1, 2, 3, 4, 5, 6]  # possible numbers of heads
>>> n, p = 6, 0.5  # six flips of a fair coin
>>> rv = binom(n, p)
>>> # Draw one vertical line per outcome, as tall as its probability
>>> ax.vlines(x, 0, rv.pmf(x), colors='k', linestyles='-', lw=1,
...           label='Probability')

>>> ax.legend(loc='best', frameon=False)
>>> plt.show()

The binom function in the SciPy package generates a binomial distribution and the statistics related to it. Parts of the preceding commands come from matplotlib, which we use here to plot the binomial distribution. The matplotlib library will be covered in detail in later chapters. The plt.subplots function creates a figure and its axes, and can lay out multiple plots on a screen. The binom function takes the number of attempts and the probability of success. The ax.vlines function plots vertical lines, and rv.pmf within it calculates the probability at each value of x. The ax.legend function adds a legend to the graph, and finally, plt.show displays the graph. The result is as follows:

[Figure: binomial distribution for six flips with a probability of success of 0.5]

As you can see in the graph, if the coin is flipped six times, then getting three heads has the maximum probability, whereas getting no heads or six heads has the least probability.

Now, let's increase the number of attempts and see the distribution:

>>> fig, ax = plt.subplots(1, 1)
>>> x = range(101)  # 0 to 100 heads
>>> n, p = 100, 0.5  # 100 flips of a fair coin
>>> rv = binom(n, p)
>>> ax.vlines(x, 0, rv.pmf(x), colors='k', linestyles='-', lw=1,
...           label='Probability')

>>> ax.legend(loc='best', frameon=False)
>>> plt.show()

Here is the distribution for 100 flips of the coin:

[Figure: binomial distribution for 100 flips with a probability of success of 0.5]

When the probability of success is changed to 0.4, this is what you see:

[Figure: binomial distribution with a probability of success of 0.4]

When the probability is 0.6, this is what you see:

[Figure: binomial distribution with a probability of success of 0.6]
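The shift that comes from changing the probability of success can also be read off numerically. The sketch below assumes the number of flips stays at 100 and reports, for each probability, the mean, the standard deviation, and the most likely number of heads.

```python
from scipy.stats import binom

n = 100  # assumption: the number of flips is kept at 100
peaks = {}
for p in (0.4, 0.5, 0.6):
    rv = binom(n, p)
    # The most likely head count is where the probability is largest.
    peaks[p] = max(range(n + 1), key=rv.pmf)
    print(f"p={p}: mean={rv.mean():.1f}, "
          f"standard deviation={rv.std():.2f}, most likely count={peaks[p]}")
```

The peak of the distribution moves to about n times p, so it sits near 40 heads for a probability of 0.4 and near 60 heads for a probability of 0.6.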

When you flip the coin 1000 times at 0.5 probability:

[Figure: binomial distribution for 1000 flips with a probability of success of 0.5]

As you can see, the binomial distribution has started to resemble a normal distribution.
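One way to quantify this resemblance is to compare the binomial probabilities with a normal curve that has the same mean, n times p, and the same standard deviation, the square root of n times p times (1 - p). The following is a minimal sketch of that comparison for 1000 flips; the variable names are illustrative.

```python
import numpy as np
from scipy.stats import binom, norm

n, p = 1000, 0.5
mean = n * p                   # 500 heads on average
sd = np.sqrt(n * p * (1 - p))  # about 15.81

k = np.arange(n + 1)
binom_probs = binom.pmf(k, n, p)
# Normal density with the same mean and standard deviation,
# evaluated at every possible head count.
normal_probs = norm.pdf(k, loc=mean, scale=sd)

max_gap = np.max(np.abs(binom_probs - normal_probs))
print(f"highest binomial probability: {binom_probs.max():.5f}")
print(f"largest gap to the normal curve: {max_gap:.2e}")
```

The largest gap is a tiny fraction of the peak probability, so the two curves are practically indistinguishable when plotted.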

A Poisson distribution

A Poisson distribution is the probability distribution of the number of independent events that occur in a fixed interval. A binomial distribution is used to determine the probability of binary occurrences, whereas a Poisson distribution is used for count-based distributions. If lambda is the mean number of occurrences per interval, then the probability of having k occurrences within a given interval is given by the following formula:

$$P(k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$

Here, e is Euler's number, k is the number of occurrences for which the probability is to be determined, and lambda ($\lambda$ in the formula) is the mean number of occurrences per interval.

Let's understand this with an example. On average, 20 cars pass over a bridge in an hour. What is the probability of exactly 23 cars passing over the bridge in an hour?

For this, we'll use the poisson function from SciPy:

>>> from scipy.stats import poisson
>>> rv = poisson(20)
>>> rv.pmf(23)

0.066881473662401172

With the poisson function, we define the mean value, which is 20 cars. The rv.pmf function gives the probability, which is around 6.7%, that exactly 23 cars will pass over the bridge.
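The same probability can be reproduced directly from the Poisson formula, which is a useful check on what rv.pmf computes. The sketch below uses only the standard library; the helper name poisson_pmf is made up for this illustration.

```python
from math import exp, factorial

def poisson_pmf(k, lam):
    """Probability of exactly k events when the mean per interval is lam."""
    return lam**k * exp(-lam) / factorial(k)

# Mean of 20 cars per hour, probability of exactly 23 cars.
prob_23_cars = poisson_pmf(23, 20)
print(round(prob_23_cars, 4))  # 0.0669
```

The result agrees with the value returned by SciPy in the preceding example.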

A Bernoulli distribution

You can perform an experiment with two possible outcomes: success or failure. Success has a probability of p, and failure has a probability of 1 - p. A random variable that takes the value 1 in case of a success and 0 in case of a failure is said to follow a Bernoulli distribution. The probability distribution function can be written as:

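$$P(n) = \begin{cases} 1 - p & \text{for } n = 0 \\ p & \text{for } n = 1 \end{cases}$$

It can also be written like this:

$$P(n) = p^{n} (1 - p)^{1 - n}$$

The distribution function can be written like this:

$$D(n) = \begin{cases} 1 - p & \text{for } n = 0 \\ 1 & \text{for } n = 1 \end{cases}$$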
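The probability function and the distribution function of a Bernoulli variable can be checked numerically with SciPy. The sketch below assumes a probability of success of 0.7, chosen only for illustration, and compares the values from bernoulli.pmf with the single-expression form p to the power n times (1 - p) to the power (1 - n).

```python
from scipy.stats import bernoulli

p = 0.7  # assumed probability of success, for illustration only

# Probability of each outcome: 0 is a failure, 1 is a success.
pmf = {n: float(bernoulli.pmf(n, p)) for n in (0, 1)}
# Cumulative probability of getting an outcome less than or equal to n.
cdf = {n: float(bernoulli.cdf(n, p)) for n in (0, 1)}
# Single-expression form: p**n * (1 - p)**(1 - n)
closed_form = {n: p**n * (1 - p)**(1 - n) for n in (0, 1)}

for n in (0, 1):
    print(f"n={n}: pmf={pmf[n]:.2f}, closed form={closed_form[n]:.2f}, "
          f"cdf={cdf[n]:.2f}")
```

For n = 0 both forms give 1 - p, and for n = 1 both give p, while the cumulative value reaches 1 at n = 1 because the variable can take no larger value.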

The following plot shows a Bernoulli distribution:

[Figure: a Bernoulli distribution]

Voting in an election is a good example of the Bernoulli distribution: each voter either votes for a given candidate (success) or does not (failure).

Random samples from a Bernoulli distribution can be generated using the bernoulli.rvs() function of the SciPy package. The following call generates 100 samples with a probability of success of 0.7:

>>> from scipy import stats
>>> stats.bernoulli.rvs(0.7, size=100)
array([1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0,
 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0,
 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1,
 1, 0, 1, 1, 1, 0, 1, 1])

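If each 1 in the preceding output is a vote for a candidate and each 0 is a vote against, then the candidate is expected to get about 70% of the votes. Because the output is a random sample, the observed share only approximates 70%; the sample shown here contains 65 ones.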
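How close the observed share gets to the underlying probability depends on the sample size. The following sketch draws a larger sample and compares its share of ones with the theoretical mean. The sample size of 10,000 and the fixed random_state are choices made here only so that the result is stable and repeatable.

```python
from scipy import stats

p = 0.7
# random_state is fixed only so that reruns give the same sample.
votes = stats.bernoulli.rvs(p, size=10_000, random_state=0)

share = votes.mean()  # fraction of ones in the sample
print(f"observed share of votes: {share:.3f}")
print(f"theoretical mean: {stats.bernoulli.mean(p):.2f}")
print(f"theoretical variance: {stats.bernoulli.var(p):.2f}")
```

With more samples, the observed share settles closer to 0.7; the theoretical variance of p times (1 - p) = 0.21 describes how much an individual vote varies around that mean.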
