Probability distributions

Probability distributions determine how the values of random variables are spread. For example, the number of heads in a sequence of coin tosses follows a binomial distribution. The means of large samples drawn from a population approximately follow a normal distribution, which is the most common and useful distribution.

The features of these distributions are well known and can be used to draw inferences about the population. In this chapter, we discuss some of the most common probability distributions and how to compute them.

Normal distribution

Normal distribution is the most widely used probability distribution in the financial industry. It is a bell-shaped curve, and its mean, median, and mode are equal. It is denoted by $N(\mu, \sigma^2)$, where $\mu$ is the mean and $\sigma^2$ is the variance of the sample. If the mean is 0 and the variance is 1, the normal distribution is known as the standard normal distribution, $N(0, 1)$.

Now let us discuss the main functions to compute the important features associated with normal distribution. Please note we will be using the dataset DataChap2.csv for all the calculations in this chapter. A sample is displayed in the following table. Let the imported dataset in R be Sampledata.

In the given sample, Date is the time when the data was captured. Open, High, Low, and Close are the opening, highest, lowest, and closing prices of the day, respectively. Adj.Close is the adjusted closing price of the day, and Return is the return calculated using the Adj.Close prices of today and yesterday. Flag and Sentiments are dummy variables created for the purpose of analysis.

dnorm

dnorm returns the height of the normal distribution (its density) at each given point, and the function is defined by the following:

dnorm(x, mean, sd) 

Here, x is the vector of numbers, mean is the mean, and sd is the standard deviation.

When we execute the following code, it generates the given plot showing the height of all the points:

> y <- dnorm(Sampledata$Return, mean = mean(Sampledata$Return), sd =sd(Sampledata$Return, na.rm = FALSE))
> plot(Sampledata$Return,y) 

The graphical representation is as follows:

Figure 2.1: Plot showing height of normal distribution
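As a minimal sketch of what dnorm computes, the following compares it with the closed-form normal density. The vector returns is a hypothetical stand-in for Sampledata$Return, simulated here because the dataset itself is not reproduced in this section.

```r
# Hypothetical stand-in for Sampledata$Return: 250 simulated daily returns.
set.seed(42)
returns <- rnorm(250, mean = 0.001, sd = 0.02)

m <- mean(returns)
s <- sd(returns)

# Density heights from dnorm()
d_builtin <- dnorm(returns, mean = m, sd = s)

# The same heights from the closed-form normal density
d_manual <- exp(-(returns - m)^2 / (2 * s^2)) / (s * sqrt(2 * pi))

# Both approaches agree up to floating-point tolerance
all.equal(d_builtin, d_manual)
```

The density is largest near the mean and falls off symmetrically on both sides, which is what produces the bell shape in the plot.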

pnorm

pnorm is the cumulative distribution function. It gives the probability that a random variable takes a value less than or equal to a given value, and is given by the following:

pnorm(x, mean, sd) 

We execute the following code:

>  pnorm(.02, mean = mean(Sampledata$Return), sd = sd(Sampledata$Return, na.rm = FALSE)) 

This yields 0.159837, which can be interpreted as a 16% probability of getting a return less than or equal to 2%.
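By default, pnorm returns the lower-tail probability; the upper tail is obtained with lower.tail = FALSE. The sketch below illustrates the two tails; the values of m and s are hypothetical and stand in for the sample mean and standard deviation of the returns.

```r
# Hypothetical parameters standing in for the sample mean and standard deviation.
m <- 0.001
s <- 0.02

# P(X <= 0.02): lower tail (the default)
p_lower <- pnorm(0.02, mean = m, sd = s)

# P(X > 0.02): upper tail
p_upper <- pnorm(0.02, mean = m, sd = s, lower.tail = FALSE)

# The two tails together cover the whole distribution
p_lower + p_upper
```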

qnorm

qnorm takes the probability value and returns a number for which the cumulative value matches the probability and the function is defined as follows:

qnorm(x, mean, sd)  

Here, x is the probability value.

We execute the following code:

> qnorm(0.159837, mean = mean(Sampledata$Return), sd = sd(Sampledata$Return, na.rm = FALSE)) 

This gives the output 0.02, which means that 2% is the return at or below which 16% of the probability lies.
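Because qnorm is the inverse of pnorm, passing a probability from pnorm back through qnorm recovers the original value. A minimal sketch, again with hypothetical values for m and s:

```r
# Hypothetical parameters standing in for the sample mean and standard deviation.
m <- 0.001
s <- 0.02

# Lower-tail probability at a 2% return
p <- pnorm(0.02, mean = m, sd = s)

# Inverting the lower tail recovers the original value
x_back <- qnorm(p, mean = m, sd = s)

# With lower.tail = FALSE, qnorm expects the upper-tail probability instead
x_back_upper <- qnorm(1 - p, mean = m, sd = s, lower.tail = FALSE)

c(x_back, x_back_upper)
```

Note that lower.tail must be used consistently in both calls; mixing a lower-tail probability with an upper-tail quantile lookup returns a different point.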

rnorm

rnorm is used to generate random numbers that follow a normal distribution. It is given by the following:

rnorm(x, mean, sd) 

Here, x is the number of random variables to be generated.

If we run the following code, it will generate five random values with the mean and standard deviation of the return:

> rnorm(5, mean = mean(Sampledata$Return), sd = sd(Sampledata$Return, na.rm = FALSE)) 

When this code gets executed, it generates five normal random variables with the specified mean and standard deviation.
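Since rnorm draws random values, the output changes on every run unless a seed is set. The following sketch, using hypothetical values for m and s, fixes the seed and checks that a large sample has roughly the requested mean and standard deviation.

```r
# Hypothetical parameters standing in for the sample mean and standard deviation.
m <- 0.001
s <- 0.02

# Fix the seed so the draws are reproducible
set.seed(123)
draws <- rnorm(10000, mean = m, sd = s)

# With many draws, the sample statistics are close to the requested parameters
mean(draws)
sd(draws)
```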

Lognormal distribution

In a financial time series, the lognormal distribution plays a more critical role than the normal distribution. We will discuss the same features for the lognormal distribution as we did for the normal distribution.

dlnorm

dlnorm is used to find the density function of the lognormal distribution. The general syntax for computing the density function is given by the following:

dlnorm(x, meanlog, sdlog) 

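Let us find the density function of the volume of the sample data. Note that meanlog and sdlog are the mean and standard deviation of the logarithm of the data, so they are computed on log(Volume). This can be done by executing the following code:

> y <- dlnorm(Sampledata$Volume, meanlog = mean(log(Sampledata$Volume)), sdlog = sd(log(Sampledata$Volume), na.rm = FALSE))
> plot(Sampledata$Volume, y) 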

The graphical representation is as follows:

Figure 2.2: Plot showing density function of lognormal distribution
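The parameters meanlog and sdlog of dlnorm describe the distribution on the log scale, so they are estimated from the logarithm of the data. The sketch below illustrates this and checks dlnorm against the normal density of log(x); the vector volume is a hypothetical stand-in for Sampledata$Volume.

```r
# Hypothetical stand-in for Sampledata$Volume: simulated positive volumes.
set.seed(1)
volume <- rlnorm(250, meanlog = 13, sdlog = 0.5)

# Parameters are estimated on the log scale
ml <- mean(log(volume))
sl <- sd(log(volume))

# Lognormal density from dlnorm()
d_builtin <- dlnorm(volume, meanlog = ml, sdlog = sl)

# Equivalent form: normal density of log(x), divided by x
d_manual <- dnorm(log(volume), mean = ml, sd = sl) / volume

all.equal(d_builtin, d_manual)
```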

plnorm

plnorm gives the cumulative distribution function of the lognormal distribution. The general syntax is given here:

plnorm(x, meanlog, sdlog) 

Now let us find the cdf for volume, again with the parameters computed on log(Volume), which is given by the following code:

> y <- plnorm(Sampledata$Volume, meanlog = mean(log(Sampledata$Volume)), sdlog = sd(log(Sampledata$Volume), na.rm = FALSE))
> plot(Sampledata$Volume, y) 

This gives the cdf plot as shown here:

Figure 2.3: Plot showing cumulative distribution function of lognormal distribution
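The lognormal cdf is the normal cdf evaluated at log(x). A minimal sketch of that relationship, with a hypothetical volume vector and hypothetical parameter values:

```r
# Hypothetical volumes and log-scale parameters.
volume <- c(2e5, 4e5, 6e5, 8e5, 1e6)
ml <- 13
sl <- 0.5

# Lognormal cdf from plnorm()
p_builtin <- plnorm(volume, meanlog = ml, sdlog = sl)

# Equivalent form: normal cdf of log(x)
p_manual <- pnorm(log(volume), mean = ml, sd = sl)

all.equal(p_builtin, p_manual)
```

As with any cdf, the values lie between 0 and 1 and do not decrease as the volume increases.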

qlnorm

qlnorm is used to generate p quantiles of the lognormal distribution, which can be done by using the following syntax:

qlnorm(p, meanlog, sdlog) 

rlnorm

rlnorm generates n random values from a lognormal distribution whose logarithm has the given mean and standard deviation. The syntax is as follows:

rlnorm(n, meanlog, sdlog) 
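The following sketch shows both functions in use. The parameter values are hypothetical; the point is that qlnorm inverts plnorm, and that the logarithm of values drawn with rlnorm has approximately the requested mean and standard deviation.

```r
# Hypothetical log-scale parameters.
ml <- 13
sl <- 0.5

# Quantiles at the 5%, 50%, and 95% levels
probs <- c(0.05, 0.5, 0.95)
q <- qlnorm(probs, meanlog = ml, sdlog = sl)

# plnorm recovers the original probabilities
plnorm(q, meanlog = ml, sdlog = sl)

# The median of a lognormal distribution is exp(meanlog)
all.equal(q[2], exp(ml))

# Random draws: statistics of log(sim) are close to meanlog and sdlog
set.seed(7)
sim <- rlnorm(10000, meanlog = ml, sdlog = sl)
mean(log(sim))
sd(log(sim))
```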

Poisson distribution

Poisson distribution is the probability distribution of the occurrence of independent events in an interval. If $\lambda$ is the mean occurrence per interval, then the probability of having x occurrences within a given interval is given by the following:

$$f(x) = \frac{\lambda^{x} e^{-\lambda}}{x!}$$

Here, $x = 0, 1, 2, 3, \ldots$

If, on average, 10 stocks per minute show a positive return, we can find the probability that 15 or fewer stocks show a positive return in a particular minute by using the following code:

>ppois(15, lambda=10) 

This gives the output value 0.9512596.

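Hence the lower tail probability, that is, the probability that at most 15 stocks have positive returns, is 0.95.

Similarly, we can find the upper tail probability, that is, the probability that more than 15 stocks have positive returns, by executing the following code:

> ppois(15, lambda = 10, lower.tail = FALSE) 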
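To make the distinction between the point probability and the tail probabilities concrete, the sketch below evaluates the Poisson probability formula directly and compares it with dpois and ppois. It uses only base R and the same values as the example above.

```r
lambda <- 10
x <- 15

# P(X = 15) from dpois() and from the Poisson probability formula
p_exact <- dpois(x, lambda)
p_manual <- lambda^x * exp(-lambda) / factorial(x)
all.equal(p_exact, p_manual)

# P(X <= 15): the lower tail is the sum of the point probabilities 0..15
p_lower <- ppois(x, lambda)
all.equal(p_lower, sum(dpois(0:x, lambda)))

# P(X > 15): the upper tail is the complement of the lower tail
p_upper <- ppois(x, lambda, lower.tail = FALSE)
p_lower + p_upper
```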

Uniform distribution

Continuous uniform distribution is the probability distribution of a random number selection from the continuous interval between a and b. Its density function is given as follows:

$$f(x) = \frac{1}{b - a}$$

Here $a \le x \le b$ and $a < b$; the density is 0 outside this interval.

Now let us generate 10 random numbers between 1 and 5. This can be done by executing the following code:

>runif(10, min=1, max=5) 

This generates output such as the following (the values differ from run to run):

3.589514 2.979528 3.454022 2.731393 4.416726 1.560019 4.592588 1.500221 4.067229 3.515988
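A short sketch of the related base R functions: setting a seed makes the runif draws reproducible, dunif returns the constant density 1/(b-a), and punif gives the cumulative probability. The seed value is arbitrary.

```r
# Arbitrary seed so that the draws are reproducible
set.seed(2024)
u <- runif(10, min = 1, max = 5)

# Every draw lies inside the interval [1, 5]
all(u >= 1 & u <= 5)

# Density inside the interval: 1 / (5 - 1) = 0.25
dunif(3, min = 1, max = 5)

# Cumulative probability at the midpoint: (3 - 1) / (5 - 1) = 0.5
punif(3, min = 1, max = 5)
```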

Extreme value theory

Most commonly known statistical distributions focus on the center of the distribution and pay little attention to the tails, which contain the extreme/outlier values. One of the toughest challenges for a risk manager is to develop risk models that can account for rare and extreme events. Extreme value theory (EVT) attempts to provide the best possible estimate of the tail area of a distribution.

There are two types of models for estimating extreme values: block maxima models fitted with the generalized extreme value (GEV) distribution, and peaks over threshold (POT) models fitted with the generalized Pareto distribution (GPD). POT is generally the approach used these days, so we give an example of POT in this chapter. Let us use a subset of the dataset available in the POT package as an example.

To find the tail distribution, first we need to find a threshold point, which can be done by executing the following code:

> library(POT) 
> data(ardieres) 
> abc<-ardieres[1:10000,] 
> events <- clust(abc, u = 1.5, tim.cond = 8/365, clust.max = TRUE) 
> par(mfrow = c(2, 2)) 
> mrlplot(events[, "obs"]) 
> diplot(events) 
> tcplot(events[, "obs"], which = 1) 
> tcplot(events[, "obs"], which = 2) 

This gives the following plot:

Figure 2.4: Analysis for threshold selection for EVT
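The mean residual life plot drawn by mrlplot is based on the mean excess over a threshold, that is, the average amount by which observations above u exceed u. The following base R sketch computes that quantity by hand; it is not the POT package's implementation, and the vector obs is simulated, hypothetical data rather than the ardieres dataset.

```r
# Hypothetical data: simulated positive observations (not the ardieres data).
set.seed(99)
obs <- rexp(2000, rate = 0.5)

# Mean excess over threshold u: average of (x - u) for observations above u
mean_excess <- function(x, u) {
  exceedances <- x[x > u] - u
  mean(exceedances)
}

# Evaluate the mean excess over a grid of candidate thresholds
thresholds <- quantile(obs, probs = seq(0.50, 0.95, by = 0.05))
me <- vapply(unname(thresholds), function(u) mean_excess(obs, u), numeric(1))

# One row per candidate threshold
data.frame(threshold = unname(thresholds), mean_excess = me)
```

For a GPD tail, the mean excess is linear in the threshold, so a threshold is typically chosen at the start of a region where these values look roughly linear.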

After analyzing these plots, the threshold point can be set and the parameters of GPD models can be estimated. This is done by executing the following code:

> obs <- events[, "obs"] 
> ModelFit <- fitgpd(obs, threshold = 5, est = "pwmu") 
> ModelFit 

This gives the parameter estimates of the GPD model:

Figure 2.5: Parameter estimates of GPD model for EVT