Sampling

When building a model in finance, the dataset may be so large that fitting the model is very time-consuming, and every later tweak to the model pays that cost again. It is therefore often better to draw a random or proportionate sample from the population data and build the model on that sample, which is easier and faster. This section shows how to select a random sample and a stratified sample from the data. Both play a critical role when building a model on sample data drawn from the population data.

Random sampling

In random sampling, every observation in the population has an equal chance of being selected. The sample can be drawn in two ways: without replacement or with replacement.

A random sample without replacement can be done by executing the following code:

> RandomSample <- Sampledata[sample(1:nrow(Sampledata), 10,
+                            replace=FALSE),]

This generates the following output:

Figure 2.6: Table showing a random sample without replacement

With replacement, an observation can be drawn more than once: once a particular observation is selected, it is put back into the population and can be selected again. A random sample with replacement can be drawn by executing the following code:

> RandomSample <- Sampledata[sample(1:nrow(Sampledata), 10,
+                            replace=TRUE),]

This generates the following output:

Figure 2.7: Table showing random sampling with replacement
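The difference between the two draws can be checked directly. The following is a minimal sketch, not the book's own listing: it builds a small made-up data frame in place of Sampledata (the Id and Price columns are assumptions for illustration) and fixes the seed with set.seed() so that the same rows are drawn on every run.

```r
# Hypothetical stand-in for Sampledata; the real columns are not shown in this section
set.seed(42)  # fix the random number generator so the draw is repeatable
Sampledata <- data.frame(
  Id    = 1:50,
  Price = round(runif(50, min = 10, max = 100), 2)
)

# Without replacement: each row can appear at most once
idx_wor   <- sample(nrow(Sampledata), 10, replace = FALSE)
SampleWOR <- Sampledata[idx_wor, ]

# With replacement: the same row may be drawn more than once
idx_wr   <- sample(nrow(Sampledata), 10, replace = TRUE)
SampleWR <- Sampledata[idx_wr, ]

anyDuplicated(idx_wor)   # 0 means no row was drawn twice
length(unique(idx_wr))   # can be smaller than 10 when rows repeat
```

Both samples contain 10 rows, but only the draw without replacement is guaranteed to contain 10 distinct observations.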

Stratified sampling

In stratified sampling, we divide the population into separate groups, called strata. Then, a probability sample (often a simple random sample) is drawn from each group. Stratified sampling has several advantages over simple random sampling. In particular, when the observations within each stratum are more alike than observations across strata, it usually gives better precision for a given sample size, so the same precision can be reached with a smaller sample.
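The idea can be sketched in base R before turning to a dedicated package. This is only an illustration under stated assumptions: the data frame below is made up, with Flag and Sentiments as the grouping columns, and n_per_stratum is a hypothetical name for the number of rows taken from each group.

```r
set.seed(1)
# Hypothetical stand-in for Sampledata with two grouping columns
Sampledata <- data.frame(
  Id         = 1:40,
  Flag       = rep(c(0, 1), each = 20),
  Sentiments = rep(c("Bad", "Good"), times = 20)
)

# Split the row numbers by stratum (every Flag/Sentiments combination)
groups <- split(seq_len(nrow(Sampledata)),
                list(Sampledata$Flag, Sampledata$Sentiments),
                drop = TRUE)

# Draw a simple random sample without replacement inside each stratum
n_per_stratum <- 3
picked <- unlist(lapply(groups, function(rows) {
  rows[sample.int(length(rows), min(n_per_stratum, length(rows)))]
}), use.names = FALSE)

StratSample <- Sampledata[picked, ]
table(StratSample$Flag, StratSample$Sentiments)
```

Indexing with sample.int(length(rows), ...) rather than sample(rows, ...) avoids the surprising behaviour of sample() when a stratum contains a single row number.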

Now let us see which groups exist, and how many observations fall into each, by cross-tabulating Flag and Sentiments as given in the following code:

> library(sampling)
> table(Sampledata$Flag,Sampledata$Sentiments)

The output is as follows:

Figure 2.8: Table showing the frequencies across different groups

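Now you can select the sample from the different groups according to your requirement. Here, size gives the number of observations to draw from each group, and method="srswor" requests simple random sampling without replacement: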

> Stratsubset=strata(Sampledata,c("Flag","Sentiments"),size=c(6,5,
+ 4,3), method="srswor")
> Stratsubset

The output is as follows:

Figure 2.9: Table showing output for stratified sampling
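The group sizes above were chosen by hand. If a proportionate sample is wanted instead, the sizes can be derived from the group frequencies. The following is a rough sketch with made-up group labels; the 10 percent sampling fraction and the names counts, alloc, and fraction are assumptions for illustration, not values from the text.

```r
# Hypothetical stratum labels; in practice use Sampledata$Flag and Sampledata$Sentiments
Flag       <- c(rep(0, 60), rep(1, 40))
Sentiments <- c(rep("Bad", 20), rep("Good", 40), rep("Bad", 30), rep("Good", 10))

# Frequency of each Flag/Sentiments combination, as one row per stratum
counts <- table(Flag, Sentiments)
alloc  <- as.data.frame(counts)      # columns: Flag, Sentiments, Freq
alloc  <- alloc[alloc$Freq > 0, ]    # drop combinations that never occur

# Proportionate allocation: the same sampling fraction in every stratum,
# with at least one observation taken from each non-empty stratum
fraction <- 0.10                     # assumed sampling fraction
alloc$n  <- pmax(1, round(alloc$Freq * fraction))
alloc
```

Sizes computed this way could be supplied as the size argument of strata(). The sizes must be listed in the same order in which the strata appear in the data, so it is worth checking the sampling package documentation on ordering before relying on the result.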