- Learning Quantitative Finance with R
- Dr. Param Jeet, Prashant Vats
- 2021-07-09 19:06:53
Statistics
Given a dataset, we often summarize it by its central position, using what is known as a measure of central tendency (one kind of summary statistic). There are several measures of central tendency, such as the mean, median, and mode. The mean is the most widely used, but different scenarios call for different measures. The following examples show how to compute each of them in R.
Mean
The mean is the equally weighted average of the sample. For example, we can compute the arithmetic mean of Volume in the dataset Sampledata by executing the following code:
mean(Sampledata$Volume)
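As a minimal sketch, the following uses a small hypothetical vector (not Sampledata, which is not reproduced here) to show that mean() matches the equal-weight formula sum(x) / n, and that missing values have to be dropped explicitly with na.rm = TRUE.

```r
# Hypothetical toy volumes standing in for Sampledata$Volume
volume <- c(100, 200, 300, 400)

# Arithmetic mean: every observation gets the same weight 1/n
manual_mean <- sum(volume) / length(volume)

manual_mean    # 250
mean(volume)   # 250

# A single NA makes mean() return NA unless na.rm = TRUE is set
mean(c(volume, NA))                # NA
mean(c(volume, NA), na.rm = TRUE)  # 250
```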
Median
The median is the middle value of the data once it is sorted (for an even number of observations, it is the average of the two middle values). It can be computed by executing the following code:
median(Sampledata$Volume)
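To make the sorting rule concrete, here is a small sketch with a hypothetical helper, manual_median(), that sorts the data and picks the middle value (or averages the two middle values). It is an illustration only; in practice the built-in median() is the function to use.

```r
# Hypothetical helper illustrating how the median is defined
manual_median <- function(x) {
  s <- sort(x)
  n <- length(s)
  if (n %% 2 == 1) {
    s[(n + 1) / 2]                 # odd n: the single middle value
  } else {
    mean(s[c(n / 2, n / 2 + 1)])   # even n: average of the two middle values
  }
}

x_odd  <- c(7, 1, 5)      # sorted: 1 5 7
x_even <- c(7, 1, 5, 3)   # sorted: 1 3 5 7

manual_median(x_odd)    # 5
manual_median(x_even)   # 4
median(x_even)          # 4
```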
Mode
The mode is the value that occurs most frequently in an attribute. Base R has no built-in function for the statistical mode (the built-in mode() function returns an object's storage mode instead), so we write our own function to compute it:
findmode <- function(x) {
  uniqx <- unique(x)                            # distinct values, in order of first appearance
  uniqx[which.max(tabulate(match(x, uniqx)))]   # the value with the highest count
}
findmode(Sampledata$return)
Executing the preceding code gives the mode of the return attribute of the dataset.
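Here is a quick check of findmode() on hypothetical return values; the function is repeated so the snippet runs on its own. match() maps each observation to the index of its unique value, tabulate() counts those indices, and which.max() picks the most frequent one.

```r
findmode <- function(x) {
  uniqx <- unique(x)                            # distinct values
  uniqx[which.max(tabulate(match(x, uniqx)))]   # value with the highest count
}

# Hypothetical daily returns
returns <- c(0.01, -0.02, 0.01, 0.03, 0.01, -0.02)
findmode(returns)          # 0.01, which appears three times

# Tie between 2 and 3: the value seen first wins
findmode(c(2, 3, 3, 2))    # 2
```

Because which.max() returns the position of the first maximum, ties are broken in favor of the value that appears first in the data, so a multimodal attribute reports only one of its modes.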
Summary
We can also generate basic statistics of a column by executing the following code:
summary(Sampledata$Volume)
This reports the minimum, first quartile (Q1), median, mean, third quartile (Q3), and maximum of the column.
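A small sketch on the hypothetical vector 1:10 shows which statistics summary() returns and that its quartiles agree with quantile(), both using R's default type 7 quantile definition.

```r
x <- 1:10   # hypothetical data
s <- summary(x)

names(s)         # "Min." "1st Qu." "Median" "Mean" "3rd Qu." "Max."
s[["1st Qu."]]   # 3.25 (Q1)
s[["Median"]]    # 5.5  (Q2, the median)
s[["3rd Qu."]]   # 7.75 (Q3)

# The same quartiles computed directly
quantile(x, c(0.25, 0.5, 0.75))
```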
Moment
Moments describe characteristics of a distribution such as its variance and skewness. The following code computes the third-order central moment of the attribute Volume; one can change the order argument to obtain other moments. The moment() function comes from the package e1071, which has to be installed and loaded first:
library(e1071)  # install once with install.packages("e1071")
moment(Sampledata$Volume, order = 3, center = TRUE)
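To see what a central moment is, here is a minimal base-R sketch assuming the usual definition m_k = (1/n) * sum((x_i - mean(x))^k). The helper central_moment() and the vector x are hypothetical illustrations, not part of the book's code. If e1071 happens to be installed, the result is also cross-checked against e1071::moment().

```r
# Hypothetical helper: k-th central moment, dividing by n
central_moment <- function(x, k) {
  mean((x - mean(x))^k)
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)   # mean is 5

central_moment(x, 2)   # 4: the population variance (divides by n)
var(x)                 # 4.571429: the sample variance divides by n - 1
central_moment(x, 3)   # 5.25: positive, so the right tail dominates

# Optional cross-check against e1071 when it is available
if (requireNamespace("e1071", quietly = TRUE)) {
  print(all.equal(e1071::moment(x, order = 3, center = TRUE), central_moment(x, 3)))
}
```

The second central moment differs from var() only by the factor (n - 1) / n, which is why the two printed values are close but not equal.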
Kurtosis
Kurtosis measures whether the data is heavy-tailed or light-tailed relative to a normal distribution. Datasets with high kurtosis tend to have heavy tails, that is, more outliers; datasets with low kurtosis tend to have light tails and fewer outliers. The computed kurtosis is interpreted by comparing it with the kurtosis of a normal distribution, which is 3. The kurtosis() function in e1071 reports excess kurtosis (kurtosis minus 3), so values above 0 indicate a leptokurtic (heavy-tailed) distribution and values below 0 a platykurtic (light-tailed) one.
The kurtosis of Volume is given by the following code:
kurtosis(Sampledata$Volume)
This returns 5.777117, a positive excess kurtosis, which shows that the distribution of Volume is leptokurtic.
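As a sketch of what the statistic measures, the hypothetical helper excess_kurtosis() below implements the simple type-1 definition g2 = m4 / m2^2 - 3, where m2 and m4 are central moments. Note the assumption: e1071's kurtosis() defaults to type = 3, which applies a small-sample correction, so its value differs slightly from this one except when type = 1 is requested. All data here is made up for illustration.

```r
# Hypothetical helper: type-1 excess kurtosis g2 = m4 / m2^2 - 3
excess_kurtosis <- function(x) {
  d <- x - mean(x)
  m2 <- mean(d^2)   # second central moment
  m4 <- mean(d^4)   # fourth central moment
  m4 / m2^2 - 3
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
excess_kurtosis(x)        # -0.21875: slightly light-tailed (platykurtic)

heavy <- c(rep(0, 9), 10) # one extreme value among many zeros
excess_kurtosis(heavy)    # about 5.11: heavy-tailed (leptokurtic)

set.seed(1)
normal_k <- excess_kurtosis(rnorm(1e5))
normal_k                  # close to 0 for normally distributed data

# Optional cross-check against e1071 when it is available
if (requireNamespace("e1071", quietly = TRUE)) {
  print(all.equal(e1071::kurtosis(x, type = 1), excess_kurtosis(x)))
}
```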
Skewness
Skewness measures the asymmetry of a distribution. As a rule of thumb, if the mean of the data is less than the median, the distribution is left-skewed (negative skewness), and if the mean is greater than the median, it is right-skewed (positive skewness). This rule does not hold for every distribution; the skewness statistic itself is based on the third central moment.
The skewness of Volume is computed as follows in R:
skewness(Sampledata$Volume)
This returns 1.723744; the positive value means that the distribution of Volume is right-skewed.
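The following minimal sketch computes skewness by hand with a hypothetical helper, sample_skewness(), using the type-1 definition g1 = m3 / m2^(3/2). As with kurtosis, e1071's skewness() defaults to type = 3, which differs from this by a small-sample correction factor, so the two agree exactly only when type = 1 is requested. The vectors are made-up examples.

```r
# Hypothetical helper: type-1 sample skewness g1 = m3 / m2^(3/2)
sample_skewness <- function(x) {
  d <- x - mean(x)
  m2 <- mean(d^2)   # second central moment
  m3 <- mean(d^3)   # third central moment
  m3 / sqrt(m2)^3
}

x <- c(2, 4, 4, 4, 5, 5, 7, 9)
sample_skewness(x)            # 0.65625

# A long right tail pulls the mean above the median
right_tail <- c(1, 2, 2, 3, 10)
mean(right_tail)              # 3.6
median(right_tail)            # 2
sample_skewness(right_tail)   # positive: right-skewed

# Optional cross-check against e1071 when it is available
if (requireNamespace("e1071", quietly = TRUE)) {
  print(all.equal(e1071::skewness(x, type = 1), sample_skewness(x)))
}
```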
Note
For computing skewness and kurtosis, we need to install and load the package e1071, for example with install.packages("e1071") followed by library(e1071).