Book title: Learning Quantitative Finance with R
Authors: Dr. Param Jeet, Prashant Vats
Chapter word count: 1222
Updated: 2021-07-09 19:06:52

Probability distributions

Probability distributions describe how the values of a random variable are spread. For example, the number of heads in a sequence of coin tosses follows a binomial distribution. The means of large samples drawn from a population approximately follow a normal distribution, which is the most common and useful distribution.
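Both examples can be checked with a small simulation. The following is only a sketch on simulated data; the seed, sample sizes, and variable names are illustrative assumptions and are not taken from the book's dataset.

```r
set.seed(42)                      # arbitrary seed, for reproducibility only

# Number of heads in 10 fair coin tosses, repeated 10,000 times
heads <- rbinom(10000, size = 10, prob = 0.5)

# Empirical relative frequencies versus the binomial probabilities
emp  <- as.numeric(table(factor(heads, levels = 0:10))) / length(heads)
theo <- dbinom(0:10, size = 10, prob = 0.5)
round(cbind(heads = 0:10, empirical = emp, binomial = theo), 3)

# Means of 2,000 samples (each of size 50) drawn from a skewed population
sample_means <- replicate(2000, mean(rexp(50, rate = 1)))
hist(sample_means, breaks = 40)   # roughly bell-shaped around 1
```

The exponential population is deliberately skewed, so the bell shape of the histogram comes from averaging and not from the population itself.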

The properties of these distributions are well known and can be used to draw inferences about the population. In this chapter, we discuss some of the most common probability distributions and how to compute them in R.

Normal distribution

The normal distribution is the most widely used probability distribution in the financial industry. It is a bell-shaped curve whose mean, median, and mode coincide. It is denoted by N(μ, σ²), where μ is the mean and σ² is the variance of the distribution. If the mean is 0 and the variance is 1, the distribution is known as the standard normal distribution, N(0, 1).

Now let us discuss the main functions to compute the important features associated with normal distribution. Please note we will be using the dataset DataChap2.csv for all the calculations in this chapter. A sample is displayed in the following table. Let the imported dataset in R be Sampledata.

In the given sample, Date is the time when the data was captured. Open, High, Low, and Close are the opening, highest, lowest, and closing prices of the day, respectively. Adj.Close is the adjusted closing price of the day, and Return is the return calculated from the Adj.Close prices of today and yesterday. Flag and Sentiments are dummy variables created for the purpose of analysis.

dnorm

dnorm returns the height (density) of the normal distribution at a given point, and the function is defined by the following:

dnorm(x, mean, sd)

Here, x is a vector of values, mean is the mean, and sd is the standard deviation of the distribution.

When we execute the following code, it generates the given plot showing the height of all the points:

> y <- dnorm(Sampledata$Return, mean = mean(Sampledata$Return), sd = sd(Sampledata$Return, na.rm = FALSE))
> plot(Sampledata$Return, y)

The graphical representation is as follows:

Figure 2.1: Plot showing height of normal distribution
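Because DataChap2.csv is not reproduced here, the following sketch uses simulated returns in place of Sampledata$Return; the vector returns and its parameters are assumptions made purely for illustration. It evaluates dnorm and compares the result with the normal density formula written out by hand.

```r
set.seed(1)
# Hypothetical stand-in for Sampledata$Return: 250 simulated daily returns
returns <- rnorm(250, mean = 0.0005, sd = 0.02)

m <- mean(returns)
s <- sd(returns)

# Height of the fitted normal curve at each observed return
y <- dnorm(returns, mean = m, sd = s)

# Same values from the formula exp(-(x - m)^2 / (2 s^2)) / (s sqrt(2 pi))
y_manual <- exp(-(returns - m)^2 / (2 * s^2)) / (s * sqrt(2 * pi))
all.equal(y, y_manual)

plot(returns, y)   # the points trace out the bell-shaped curve
```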

pnorm

pnorm is the cumulative distribution function: it gives the probability that a random variable takes a value less than or equal to a given value x. Its syntax is as follows:

pnorm(x, mean, sd)

We execute the following code:

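> pnorm(0.02, mean = mean(Sampledata$Return), sd = sd(Sampledata$Return, na.rm = FALSE), lower.tail = FALSE)

This yields 0.159837, which means there is roughly a 16% probability of getting a return greater than 2%. The lower.tail = FALSE argument requests the upper-tail probability P(X > x); by default, pnorm returns the lower-tail probability P(X ≤ x).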
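The relationship between the two tails can be sketched as follows. The mean and standard deviation used here are hypothetical placeholders for the sample statistics of the returns, not values from the dataset.

```r
# Hypothetical parameters standing in for the sample mean and sd of returns
m <- 0.001
s <- 0.02

lower <- pnorm(0.02, mean = m, sd = s)                      # P(X <= 0.02)
upper <- pnorm(0.02, mean = m, sd = s, lower.tail = FALSE)  # P(X >  0.02)

lower + upper            # the two tails always sum to 1

# Standardizing the value first gives the same lower-tail probability
pnorm((0.02 - m) / s)
```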

qnorm

qnorm takes a probability value and returns the number whose cumulative probability matches it; in other words, it is the inverse of pnorm. The function is defined as follows:

qnorm(p, mean, sd)

Here, p is the probability value.

We execute the following code:

> qnorm(0.159837, mean = mean(Sampledata$Return), sd = sd(Sampledata$Return, na.rm = FALSE), lower.tail = FALSE)

This gives the output 0.02, which means that a return of 2% is the level that is exceeded with a probability of about 16%.
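A minimal sketch of qnorm as the inverse of pnorm, again with hypothetical parameters in place of the sample statistics:

```r
m <- 0.001   # hypothetical mean return
s <- 0.02    # hypothetical standard deviation of returns

# Upper-tail probability of a return above 2%
p_upper <- pnorm(0.02, mean = m, sd = s, lower.tail = FALSE)

# Feeding that probability back recovers 0.02 (up to floating-point error)
x_back <- qnorm(p_upper, mean = m, sd = s, lower.tail = FALSE)
x_back

# Another use: the return below which 5% of outcomes are expected to fall
q05 <- qnorm(0.05, mean = m, sd = s)
q05
```

The tail setting has to match in both calls; mixing a lower-tail probability with lower.tail = FALSE returns a different quantile.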

rnorm

rnorm is used to generate random numbers that follow a normal distribution. It is given by the following:

rnorm(n, mean, sd)

Here, n is the number of random values to be generated.

The following code generates five random values from a normal distribution with the mean and standard deviation of the return:

> rnorm(5, mean = mean(Sampledata$Return), sd = sd(Sampledata$Return, na.rm = FALSE))

Because the values are drawn at random, each run produces five different numbers unless a seed is fixed with set.seed().
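The following sketch shows how a seed makes the draws repeatable and how the sample moments approach the requested parameters as the number of draws grows. The seed and the parameter values are arbitrary assumptions.

```r
set.seed(123)                       # arbitrary seed so that runs are repeatable
x1 <- rnorm(5, mean = 0.001, sd = 0.02)
set.seed(123)
x2 <- rnorm(5, mean = 0.001, sd = 0.02)
identical(x1, x2)                   # same seed, same five draws

# With many draws the sample moments approach the requested parameters
big <- rnorm(100000, mean = 0.001, sd = 0.02)
c(mean = mean(big), sd = sd(big))
```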

Lognormal distribution

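In financial time series, the lognormal distribution often plays a more critical role than the normal distribution, because quantities such as prices and volumes cannot be negative and are typically right-skewed. A variable is lognormally distributed if its logarithm is normally distributed. We will discuss the same four functions for the lognormal distribution as we did for the normal distribution.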

dlnorm

dlnorm is used to find the density function of the lognormal distribution. The general syntax for computing the density function is given by the following:

dlnorm(x, meanlog, sdlog)

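Here, meanlog and sdlog are the mean and standard deviation of the distribution on the log scale, so they are estimated from the logarithm of the data. Let us find the density function of the volume of the sample data, which can be done by executing the following code:

> y <- dlnorm(Sampledata$Volume, meanlog = mean(log(Sampledata$Volume)), sdlog = sd(log(Sampledata$Volume), na.rm = FALSE))
> plot(Sampledata$Volume, y)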

The graphical representation is as follows:

Figure 2.2: Plot showing density function of lognormal distribution
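The meanlog and sdlog arguments of dlnorm are parameters on the log scale. The sketch below illustrates this on simulated data; the vector volume is a hypothetical stand-in for Sampledata$Volume, and its parameters are assumptions. It also checks the identity that links the lognormal density to the normal density of log(x).

```r
set.seed(7)
# Hypothetical stand-in for Sampledata$Volume: positive, right-skewed values
volume <- rlnorm(500, meanlog = 13, sdlog = 0.5)

ml <- mean(log(volume))   # mean of the logged data
sl <- sd(log(volume))     # standard deviation of the logged data

y <- dlnorm(volume, meanlog = ml, sdlog = sl)

# Lognormal density expressed through the normal density of log(x)
y_check <- dnorm(log(volume), mean = ml, sd = sl) / volume
all.equal(y, y_check)

plot(volume, y)
```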

plnorm

plnorm gives the cumulative distribution function of the lognormal distribution. The general syntax is given here:

plnorm(x, meanlog, sdlog)

Now let us find the cdf for volume, with meanlog and sdlog estimated from the logarithm of the data, which is given by the following code:

> y <- plnorm(Sampledata$Volume, meanlog = mean(log(Sampledata$Volume)), sdlog = sd(log(Sampledata$Volume), na.rm = FALSE))
> plot(Sampledata$Volume, y)

This gives the cdf plot as shown here:

Figure 2.3: Plot showing cumulative distribution function of lognormal distribution
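The lognormal cdf is the normal cdf evaluated at log(x). A short sketch with hypothetical log-scale parameters and evaluation points:

```r
ml <- 13     # hypothetical mean of log(volume)
sl <- 0.5    # hypothetical standard deviation of log(volume)

x <- c(2e5, 4e5, 8e5)                       # illustrative volume levels

p       <- plnorm(x, meanlog = ml, sdlog = sl)
p_check <- pnorm(log(x), mean = ml, sd = sl)
all.equal(p, p_check)

# A cdf never decreases as x grows
all(diff(p) >= 0)
```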

qlnorm

qlnorm is used to compute the p quantiles of the lognormal distribution, which can be done by using the following syntax:

qlnorm(p, meanlog, sdlog)
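As a hedged illustration, the following computes a few lognormal quantiles with hypothetical log-scale parameters and confirms that qlnorm inverts plnorm.

```r
ml <- 13     # hypothetical mean on the log scale
sl <- 0.5    # hypothetical standard deviation on the log scale

probs <- c(0.05, 0.5, 0.95)
q <- qlnorm(probs, meanlog = ml, sdlog = sl)
q

# Quantiles of the lognormal are exponentiated normal quantiles
all.equal(q, exp(qnorm(probs, mean = ml, sd = sl)))

# plnorm maps the quantiles back to the probabilities
all.equal(plnorm(q, meanlog = ml, sdlog = sl), probs)

# The median of a lognormal distribution equals exp(meanlog)
all.equal(q[2], exp(ml))
```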

rlnorm

rlnorm generates n random values from a lognormal distribution whose logarithm has the given mean and standard deviation. The syntax is as follows:

rlnorm(n, meanlog, sdlog)
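A small sketch of rlnorm with arbitrary, assumed parameters. It highlights that the mean and standard deviation arguments apply to the logarithm of the generated values and not to the values themselves.

```r
set.seed(99)                                  # arbitrary seed
x <- rlnorm(10000, meanlog = 0, sdlog = 0.25)

all(x > 0)                                    # lognormal draws are positive

# The parameters describe log(x): these are close to 0 and 0.25
c(mean_log = mean(log(x)), sd_log = sd(log(x)))

# The mean of x itself is exp(meanlog + sdlog^2 / 2), not exp(meanlog)
c(sample = mean(x), theory = exp(0 + 0.25^2 / 2))
```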

Poisson distribution

The Poisson distribution is the probability distribution of the number of occurrences of independent events in an interval. If λ is the mean number of occurrences per interval, then the probability of having x occurrences within a given interval is given by the following:

P(X = x) = (λ^x e^(-λ)) / x!

Here, x = 0, 1, 2, 3, ...

If, on average, 10 stocks per minute have a positive return, we can find the probability of having 15 or fewer stocks with positive returns in a particular minute by using the following code:

>ppois(15, lambda=10)

This gives the output value 0.9512596.

Hence, the lower-tail probability, that is, the probability that at most 15 stocks have positive returns, is about 0.95.

Similarly, we can find the upper-tail probability, that is, the probability that more than 15 stocks have positive returns, by executing the following code:

> ppois(15, lambda = 10, lower.tail = FALSE)
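The difference between the probability of exactly 15 occurrences and the cumulative probabilities can be sketched with dpois and ppois. The check against the formula assumes the standard Poisson probability mass function exp(-λ) λ^x / x!.

```r
lambda <- 10

# P(X = 15): exactly 15 stocks with positive returns
p_exact   <- dpois(15, lambda = lambda)
p_formula <- exp(-lambda) * lambda^15 / factorial(15)
all.equal(p_exact, p_formula)

# P(X <= 15) is the sum of the point probabilities for 0, 1, ..., 15
p_lower <- sum(dpois(0:15, lambda = lambda))
all.equal(p_lower, ppois(15, lambda = lambda))

# P(X > 15): the lower and upper tails sum to 1
p_upper <- ppois(15, lambda = lambda, lower.tail = FALSE)
p_lower + p_upper
```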

Uniform distribution

The continuous uniform distribution is the probability distribution of a random number selected from the continuous interval between a and b. Its density function is given as follows:

f(x) = 1/(b-a)

Here, a ≤ x ≤ b, and f(x) = 0 otherwise.

Now let us generate 10 random numbers between 1 and 5 by executing the following code:

>runif(10, min=1, max=5)

This generates output such as the following (the values change on each run):

3.589514 2.979528 3.454022 2.731393 4.416726 1.560019 4.592588 1.500221 4.067229 3.515988
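The density and cumulative functions of the uniform distribution follow the same naming pattern as the other distributions. A minimal sketch for the interval from 1 to 5, with an arbitrary seed:

```r
a <- 1
b <- 5

dunif(3, min = a, max = b)   # 1 / (b - a) = 0.25 inside the interval
dunif(6, min = a, max = b)   # 0 outside the interval
punif(3, min = a, max = b)   # (3 - a) / (b - a) = 0.5

set.seed(2024)               # arbitrary seed so that the draws are repeatable
u <- runif(10, min = a, max = b)
range(u)                     # all draws lie between a and b
```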

Extreme value theory

Most commonly used statistical distributions focus on the center of the distribution and pay little attention to the tails, which contain the extreme or outlier values. One of the toughest challenges for a risk manager is to develop risk models that can account for rare and extreme events. Extreme value theory (EVT) attempts to provide the best possible estimate of the tail area of a distribution.

There are two types of models for estimating extreme values: block maxima models, fitted with the generalized extreme value (GEV) distribution, and peaks over threshold (POT) models, fitted with the generalized Pareto distribution (GPD). POT is generally preferred these days, so we will give an example of POT in this chapter. Let us use a subset of the ardieres dataset available in the POT package as an example.

To find the tail distribution, first we need to find a threshold point, which can be done by executing the following code:

> library(POT)
> data(ardieres)
> abc <- ardieres[1:10000, ]
> events <- clust(abc, u = 1.5, tim.cond = 8/365, clust.max = TRUE)
> par(mfrow = c(2, 2))
> mrlplot(events[, "obs"])
> diplot(events)
> tcplot(events[, "obs"], which = 1)
> tcplot(events[, "obs"], which = 2)

This gives the following plot:

Figure 2.4: Analysis for threshold selection for EVT
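The idea behind a mean residual life plot can be sketched in base R. This is not a reimplementation of the POT package's mrlplot; it only computes the empirical mean excess over a grid of thresholds on simulated data. The data, the threshold grid, and the helper mean_excess are hypothetical. For exponential data, which corresponds to a GPD with shape parameter 0, the mean excess should stay roughly constant across thresholds.

```r
set.seed(11)
# Hypothetical data: exponential observations with mean 2
x <- rexp(5000, rate = 0.5)

# Empirical mean excess over threshold u: average of (x - u) for x > u
mean_excess <- function(x, u) mean(x[x > u] - u)

thresholds <- seq(0, 4, by = 0.5)
me <- sapply(thresholds, mean_excess, x = x)

plot(thresholds, me, type = "b",
     xlab = "Threshold u", ylab = "Mean excess")
```

In practice, a threshold is often chosen in a region where such a plot looks approximately linear, since the mean excess of a GPD is linear in the threshold.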

After analyzing these plots, the threshold point can be set and the parameters of the GPD model can be estimated. This is done by executing the following code:

> obs <- events[, "obs"]
> ModelFit <- fitgpd(obs, threshold = 5, est = "pwmu")
> ModelFit

This gives the parameter estimates of the GPD model:

Figure 2.5: Parameter estimates of GPD model for EVT
