Various forms of distribution

There are various kinds of probability distributions, and each distribution shows the probability of different outcomes for a random experiment. In this section, we'll explore the various kinds of probability distributions.

A normal distribution

A normal distribution is the most common and widely used distribution in statistics. It is also called a "bell curve", or a "Gaussian curve" after the mathematician Carl Friedrich Gauss. Normal distributions occur commonly in nature. Let's take the height example we saw previously. If you have data for the heights of all the people of a particular gender in Hong Kong, and you plot a bar chart where each bar represents the number of people of a particular height, then the curve that is obtained will look very similar to the following graph. The numbers on the horizontal axis are the number of standard deviations from the mean, which sits at zero. The concept will become clearer as we proceed through the chapter.

[Figure: a bell-shaped normal distribution curve, with the horizontal axis in standard deviations from the mean]

Also, if you invert an hourglass and observe the way the sand stacks up, the pile takes on a shape close to a normal distribution. This is a good example of how the normal distribution appears in nature.

[Figure: sand piling up in an hourglass]

Take the following figure: it shows three normal distribution curves. Curve A has a standard deviation of 1, curve C has a standard deviation of 2, and curve B has a standard deviation of 3, which means that curve B has the greatest spread of values, whereas curve A has the least. Put another way, if curve B represented the heights of people in a country, then that country would have people with widely varying heights, whereas a country with the curve A distribution would have people whose heights are similar to each other.

[Figure: three normal distribution curves A, B, and C with different standard deviations]
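The effect of the standard deviation on the shape of the curve can also be checked numerically. The following is a minimal sketch, assuming all three curves are centered at a mean of zero (the text does not state the means); the `summary` dictionary is a name introduced here for illustration only:

```python
from scipy.stats import norm

# Standard deviations of curves A, C, and B as described in the text.
# Assumption: every curve has a mean of 0.
sigmas = {"A": 1.0, "C": 2.0, "B": 3.0}

summary = {}
for name, sigma in sigmas.items():
    dist = norm(loc=0.0, scale=sigma)
    peak = dist.pdf(0.0)                          # height of the curve at the mean
    within_one = dist.cdf(1.0) - dist.cdf(-1.0)   # probability of a value in [-1, 1]
    summary[name] = (peak, within_one)
    print(f"curve {name}: sigma={sigma}, peak={peak:.3f}, "
          f"P(-1 <= X <= 1)={within_one:.3f}")
```

A larger standard deviation gives a lower, wider curve: curve A has the tallest peak and the most probability concentrated near the mean, while curve B has the flattest peak.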

A normal distribution from a binomial distribution

Let's take a coin and flip it. The probability of getting a head is 50%, and so is the probability of getting a tail. If you flip the same coin six times, the probability of getting exactly three heads can be computed using the following formula:

$$P(X = k) = \binom{n}{k} p^{k} q^{n-k}$$

In the preceding formula, n is the number of times the coin is flipped, k is the number of heads (successes), p is the probability of success, and q is (1 - p), which is the probability of failure.
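The binomial formula can be evaluated directly to check the coin example. The following sketch computes C(n, k) * p**k * q**(n - k) by hand and compares it with SciPy's `binom.pmf`; the helper name `binomial_pmf` is introduced here for illustration and is not part of any library:

```python
from math import comb  # requires Python 3.8 or later
from scipy.stats import binom

n, p = 6, 0.5  # six flips of a fair coin

def binomial_pmf(k, n, p):
    """Probability of exactly k successes in n trials, from the formula."""
    q = 1 - p
    return comb(n, k) * p**k * q**(n - k)

ks = list(range(n + 1))
by_formula = [binomial_pmf(k, n, p) for k in ks]
by_scipy = binom.pmf(ks, n, p)

print(by_formula[3])  # probability of exactly three heads: 0.3125
```

With n = 6 and p = 0.5, there are 20 ways to choose which three flips are heads, and each sequence of six flips has a probability of 1/64, which gives 20/64 = 0.3125.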

The SciPy package of Python provides useful functions to perform statistical computations. You can install it from http://www.scipy.org/. The following commands help in plotting the binomial distribution:

>>> from scipy.stats import binom
>>> import matplotlib.pyplot as plt

>>> fig, ax = plt.subplots(1, 1)
>>> x = [0, 1, 2, 3, 4, 5, 6]   # possible numbers of heads
>>> n, p = 6, 0.5               # six flips of a fair coin
>>> rv = binom(n, p)            # binomial distribution with fixed n and p
>>> ax.vlines(x, 0, rv.pmf(x), colors='k', linestyles='-', lw=1,
...           label='Probability')
>>> ax.legend(loc='best', frameon=False)
>>> plt.show()

The binom function in the SciPy package represents a binomial distribution and provides the statistics related to it. Parts of the preceding commands come from matplotlib, which we use here to plot the binomial distribution. The matplotlib library will be covered in detail in later chapters. The plt.subplots function creates a figure and a grid of plots, which here is a single plot. The binom function takes the number of attempts and the probability of success. The ax.vlines function draws vertical lines, and rv.pmf within it calculates the probability at each value of x. The ax.legend function adds a legend to the graph, and finally, plt.show displays the graph. The result is as follows:

[Figure: binomial distribution for n = 6 and p = 0.5]

As you can see in the graph, if the coin is flipped six times, then getting three heads has the maximum probability, whereas getting no heads or six heads has the least probability.

Now, let's increase the number of attempts and see the distribution:

>>> fig, ax = plt.subplots(1, 1)
>>> x = range(101)              # possible numbers of heads, 0 to 100
>>> n, p = 100, 0.5             # 100 flips of a fair coin
>>> rv = binom(n, p)
>>> ax.vlines(x, 0, rv.pmf(x), colors='k', linestyles='-', lw=1,
...           label='Probability')
>>> ax.legend(loc='best', frameon=False)
>>> plt.show()

Here, we try to flip the coin 100 times and see the distribution:

[Figure: binomial distribution for n = 100 and p = 0.5]

When the probability of success is changed to 0.4, this is what you see:

[Figure: binomial distribution with a probability of success of 0.4]

When the probability is 0.6, this is what you see:

[Figure: binomial distribution with a probability of success of 0.6]
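Changing the probability of success moves the center of the distribution. The following sketch prints the mean and standard deviation for each probability, assuming the number of flips stays at 100 as in the preceding code (the text does not restate n for these plots); `moments` is a name introduced here for illustration:

```python
from scipy.stats import binom

n = 100  # assumption: same number of flips as the preceding example

moments = {}
for p in (0.4, 0.5, 0.6):
    rv = binom(n, p)
    moments[p] = (rv.mean(), rv.std())  # mean is n*p, std is sqrt(n*p*(1-p))
    print(f"p={p}: mean={rv.mean():.1f}, std={rv.std():.3f}")
```

The peak of the plot sits near n * p, so it moves from around 40 heads at p = 0.4 to around 60 heads at p = 0.6, while the spread changes only slightly.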

When you flip the coin 1000 times at 0.5 probability:

[Figure: binomial distribution for n = 1000 and p = 0.5]

As you can see, the binomial distribution has started to resemble a normal distribution.
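One way to quantify this resemblance is to compare the binomial probabilities with a normal curve that has the same mean and standard deviation. The following is a rough sketch, assuming the matching normal distribution has mean n * p and standard deviation sqrt(n * p * (1 - p)); the variable names are chosen here for illustration:

```python
import numpy as np
from scipy.stats import binom, norm

n, p = 1000, 0.5
mu = n * p                          # mean of the binomial distribution
sigma = np.sqrt(n * p * (1 - p))    # standard deviation of the binomial distribution

k = np.arange(n + 1)
exact = binom.pmf(k, n, p)                  # exact binomial probabilities
approx = norm.pdf(k, loc=mu, scale=sigma)   # normal density at the same points

max_gap = np.max(np.abs(exact - approx))
print(f"largest probability: {exact.max():.5f}")
print(f"largest gap between binomial and normal: {max_gap:.2e}")
```

The largest gap is tiny compared with the largest probability, which is why the two curves are hard to tell apart on a plot.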

A Poisson distribution

A Poisson distribution is the probability distribution of the number of independent events occurring in a fixed interval. A binomial distribution is used to determine the probability of binary occurrences, whereas a Poisson distribution is used for count-based distributions. If lambda is the mean number of events per interval, then the probability of having k occurrences within a given interval is given by the following formula:

$$P(k) = \frac{\lambda^{k} e^{-\lambda}}{k!}$$

Here, e is Euler's number, k is the number of occurrences for which the probability is to be determined, and lambda is the mean number of occurrences per interval.

Let's understand this with an example. On average, 20 cars pass over a bridge in an hour. What would be the probability of exactly 23 cars passing over the bridge in an hour?

For this, we'll use the poisson function from SciPy:

>>> from scipy.stats import poisson
>>> rv = poisson(20)
>>> rv.pmf(23)

0.066881473662401172

With the poisson function, we define the mean value, which is 20 cars. The rv.pmf function gives the probability, which is around 6.7%, that exactly 23 cars will pass over the bridge in an hour.
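The same number can be reproduced from the Poisson formula, lambda**k * e**(-lambda) / k!, without the distribution object. The following sketch does that and, as an extra step not covered in the text, also computes the probability of 23 or more cars using the survival function `poisson.sf`:

```python
from math import exp, factorial
from scipy.stats import poisson

lam, k = 20, 23  # mean of 20 cars per hour, 23 cars observed

# Direct evaluation of the Poisson formula.
by_formula = lam**k * exp(-lam) / factorial(k)

# The same probability from SciPy.
by_scipy = poisson.pmf(k, lam)

# Probability of 23 or more cars: sf(x) is P(X > x), so use k - 1.
at_least_k = poisson.sf(k - 1, lam)

print(by_formula, by_scipy, at_least_k)
```

The first two values agree to within floating-point precision, and the third is larger because it adds up the probabilities of every count from 23 upwards.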

A Bernoulli distribution

Consider an experiment with two possible outcomes: success or failure. Success has a probability of p, and failure has a probability of 1 - p. A random variable that takes the value 1 in case of a success and 0 in case of a failure is said to follow a Bernoulli distribution. The probability distribution function can be written as:

$$f(k; p) = \begin{cases} p & \text{if } k = 1 \\ 1 - p & \text{if } k = 0 \end{cases}$$

It can also be written like this:

$$f(k; p) = p^{k} (1 - p)^{1 - k}, \quad k \in \{0, 1\}$$

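The cumulative distribution function can be written like this:

$$F(k; p) = \begin{cases} 0 & \text{if } k < 0 \\ 1 - p & \text{if } 0 \le k < 1 \\ 1 & \text{if } k \ge 1 \end{cases}$$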

The following plot shows a Bernoulli distribution:

[Figure: a Bernoulli distribution]
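The probabilities of the two outcomes can be checked with SciPy's `bernoulli` object. The following is a small sketch, assuming a success probability of 0.7 (the value used in the sampling example later in this section); the variable names are chosen here for illustration:

```python
from scipy.stats import bernoulli

p = 0.7  # assumption: probability of success
rv = bernoulli(p)

# Probability of failure (0) and of success (1).
pmf_0, pmf_1 = rv.pmf(0), rv.pmf(1)

# The single-expression form p**k * (1 - p)**(1 - k) gives the same values.
closed_form = [p**k * (1 - p)**(1 - k) for k in (0, 1)]

# Cumulative probabilities below 0, at 0, between 0 and 1, and at 1.
cdf_values = [rv.cdf(x) for x in (-1, 0, 0.5, 1)]

print(pmf_0, pmf_1)
print(closed_form)
print(cdf_values)
```

The cumulative probability is 0 below zero, jumps to 1 - p at zero, and reaches 1 at one, since no other outcomes are possible.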

A single vote in an election is a good example of the Bernoulli distribution: each voter either votes for a given candidate (1) or does not (0).

Samples from a Bernoulli distribution can be generated using the bernoulli.rvs() function of the SciPy package. The following call draws 100 samples from a Bernoulli distribution with a success probability of 0.7:

>>> from scipy import stats
>>> stats.bernoulli.rvs(0.7, size=100)
array([1, 1, 1, 1, 1, 0, 0, 1, 1, 0, 1, 1, 1, 0, 1, 1, 1, 1, 1, 1, 1, 1, 0,
 1, 1, 1, 0, 1, 1, 0, 1, 0, 0, 1, 0, 0, 1, 0, 1, 0, 1, 1, 1, 1, 1, 0,
 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 1, 0, 1, 0, 0, 0, 0, 0, 1, 0, 0, 0,
 1, 1, 1, 0, 1, 0, 1, 1, 1, 1, 1, 1, 0, 0, 1, 1, 1, 0, 0, 0, 1, 1, 1,
 1, 0, 1, 1, 1, 0, 1, 1])

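If each value in the preceding output represents one person's vote (1 for the candidate and 0 against), then the candidate receives roughly 70% of the votes. The exact share in a random sample of 100 varies around the underlying probability of 0.7.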
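The share of ones in a sample can be computed as the mean of the array. The following sketch draws samples of increasing size to show the share settling near 0.7; the seed value and the sample sizes are arbitrary choices made here, and passing a NumPy generator through `random_state` assumes a reasonably recent SciPy release:

```python
import numpy as np
from scipy.stats import bernoulli

rng = np.random.default_rng(0)  # fixed seed so that the run is repeatable

shares = {}
for size in (100, 10_000, 1_000_000):
    votes = bernoulli.rvs(0.7, size=size, random_state=rng)
    shares[size] = votes.mean()  # fraction of ones, that is, votes for the candidate
    print(f"{size} voters: share of votes = {shares[size]:.4f}")
```

Small samples can land a few percentage points away from 70%, while large samples stay very close to it.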