Logistic regression model

The logistic regression model is a binary classification model. It belongs to the class of generalized linear models, in which the response follows a distribution from the exponential family (here, the Bernoulli distribution). Now, let $Y$ denote the binary label:

$$Y = \begin{cases} 1, & \text{if the event of interest occurs}, \\ 0, & \text{otherwise}. \end{cases}$$

Using the information contained in the explanatory vector $\mathbf{x} = (x_1, x_2, \ldots, x_p)$, we try to build a model that predicts the label $Y$. The logistic regression model is the following:

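$$\pi(\mathbf{x}) = P(Y = 1 \mid \mathbf{x}) = \frac{e^{\mathbf{x}^\top \boldsymbol{\beta}}}{1 + e^{\mathbf{x}^\top \boldsymbol{\beta}}}$$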

Here, $\boldsymbol{\beta}$ is the vector of regression coefficients. Note that the logit function $\ln\left(\frac{\pi(\mathbf{x})}{1 - \pi(\mathbf{x})}\right) = \mathbf{x}^\top \boldsymbol{\beta}$ is linear in the regression coefficients, hence the name logistic regression model. A logistic regression model can be equivalently written as follows:
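The relationship between the probability scale and the logit scale can be checked numerically. The following is a small sketch in base R; the helper names inv_logit and logit and the coefficient values in beta are hypothetical and chosen only for illustration.

```r
# Inverse logit (logistic) function: maps a linear predictor to a probability.
inv_logit <- function(eta) 1 / (1 + exp(-eta))

# Logit function: maps a probability back to the log-odds scale.
logit <- function(p) log(p / (1 - p))

beta <- c(-1, 0.8)              # hypothetical intercept and slope
x <- seq(-3, 3, by = 1)         # a single explanatory variable
eta <- beta[1] + beta[2] * x    # linear predictor
p <- inv_logit(eta)             # probabilities, all strictly between 0 and 1

# The log-odds of p recover the linear predictor.
print(all.equal(logit(p), eta))

# Base R ships the same pair of functions as plogis() and qlogis().
print(all.equal(p, plogis(eta)))
print(all.equal(logit(p), qlogis(p)))
```

The probabilities follow an S-shaped curve in x, while their log-odds lie on a straight line, which is why the model can be fitted with the generalized linear model machinery.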

$$Y = \pi(\mathbf{x}) + \epsilon$$

Here, $\epsilon$ is the binary error term, which follows a Bernoulli distribution shifted to have mean zero. For more information, refer to Chapter 17 of Tattar, et al. (2016). The maximum likelihood estimates of the logistic regression parameters have no closed-form solution and are obtained with the iteratively reweighted least squares (IRLS) algorithm; we will use the R function glm to get this task done. We will use the Hypothyroid dataset in this section. The training and test datasets and the formulas were already created in the previous section, and we will carry on from that point.
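The IRLS iteration can be sketched in a few lines of R. The following is a minimal illustration on simulated data and is not the internal code of glm; the function name irls_logistic, the simulated covariates x1 and x2, and the values in beta_true are all hypothetical and unrelated to the Hypothyroid data.

```r
# Simulated data (hypothetical): two covariates and a binary response.
set.seed(123)
n <- 500
x1 <- rnorm(n)
x2 <- rnorm(n)
X <- cbind(Intercept = 1, x1 = x1, x2 = x2)    # design matrix with intercept
beta_true <- c(-0.5, 1.0, -2.0)                # assumed "true" coefficients
y <- rbinom(n, size = 1, prob = plogis(drop(X %*% beta_true)))

# A bare-bones IRLS loop for the logit link (no safeguards for separation).
irls_logistic <- function(X, y, tol = 1e-8, max_iter = 25) {
  beta <- rep(0, ncol(X))
  for (iter in seq_len(max_iter)) {
    eta <- drop(X %*% beta)        # linear predictor
    mu  <- plogis(eta)             # current fitted probabilities
    w   <- mu * (1 - mu)           # working weights
    z   <- eta + (y - mu) / w      # working response
    # Weighted least squares step: solve (X'WX) beta = X'Wz
    beta_new <- drop(solve(crossprod(X, w * X), crossprod(X, w * z)))
    converged <- max(abs(beta_new - beta)) < tol
    beta <- beta_new
    if (converged) break
  }
  beta
}

beta_irls <- irls_logistic(X, y)
beta_glm  <- coef(glm(y ~ x1 + x2, family = binomial()))

# The two sets of estimates should agree up to numerical tolerance.
print(cbind(IRLS = beta_irls, glm = beta_glm))
```

Each pass is an ordinary weighted least squares fit of the working response on the covariates, with weights recomputed from the current fitted probabilities. In practice glm is preferable, since it adds convergence checks and warnings that this sketch omits.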

Logistic regression for hypothyroid classification

For the hypothyroid dataset, we had HT2_Train as the training dataset. The test dataset is split into the covariate matrix HT2_TestX and the outputs HT2_TestY, while the formula for the logistic regression model is available in HT2_Formula. First, the logistic regression model is fitted to the training dataset using the glm function, and the fitted model is christened LR_fit. We then inspect the model fit summaries using summary(LR_fit). Next, the fitted model is applied to the covariate data in the test part using the predict function to create LR_Predict. The predicted probabilities are converted to class labels in LR_Predict_Bin using a threshold of 0.5, with the labels coded as 1 and 2 to match the numeric codes in testY_numeric. Finally, these labels are compared with the actual values in testY_numeric and the overall accuracy is obtained:

> ntr <- nrow(HT2_Train) # Training size
> nte <- nrow(HT2_TestX) # Test size
> p <- ncol(HT2_TestX)
> testY_numeric <- as.numeric(HT2_TestY)
> LR_fit <- glm(HT2_Formula,data=HT2_Train,family = binomial())
Warning message:
glm.fit: fitted probabilities numerically 0 or 1 occurred 
> summary(LR_fit)
Call:
glm(formula = HT2_Formula, family = binomial(), data = HT2_Train)
Deviance Residuals: 
    Min       1Q   Median       3Q      Max  
-3.6390   0.0076   0.0409   0.1068   3.5127  
Coefficients:
             Estimate Std. Error z value Pr(>|z|)    
(Intercept) -8.302025   2.365804  -3.509 0.000449 ***
Age         -0.024422   0.012145  -2.011 0.044334 *  
GenderMALE  -0.195656   0.464353  -0.421 0.673498    
TSH         -0.008457   0.007530  -1.123 0.261384    
T3           0.480986   0.347525   1.384 0.166348    
TT4         -0.089122   0.028401  -3.138 0.001701 ** 
T4U          3.932253   1.801588   2.183 0.029061 *  
FTI          0.197196   0.035123   5.614 1.97e-08 ***
---
Signif. codes:  0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

(Dispersion parameter for binomial family taken to be 1)

    Null deviance: 609.00  on 1363  degrees of freedom
Residual deviance: 181.42  on 1356  degrees of freedom
AIC: 197.42
Number of Fisher Scoring iterations: 9
> LR_Predict <- predict(LR_fit,newdata=HT2_TestX,type="response")
> LR_Predict_Bin <- ifelse(LR_Predict>0.5,2,1)
> LR_Accuracy <- sum(LR_Predict_Bin==testY_numeric)/nte
> LR_Accuracy
[1] 0.9732704

It can be seen from the summary of the fitted GLM (the output following the line summary(LR_fit)) that four variables are significant at the 5% level: Age, TT4, T4U, and FTI. Using the predict function, we apply the fitted model to the unseen test cases in HT2_TestX, compare the predictions with the actual labels, and find the accuracy to be 97.33%. Consequently, logistic regression is easily deployed in the R software.
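Overall accuracy hides how the errors are split between the two classes, so it can help to tabulate predictions against actual labels. The following sketch uses simulated data, because the HT2_* objects come from the previous section; the data frame dat, its columns, and the labels "neg" and "pos" are hypothetical.

```r
# Simulated two-class data (hypothetical), split into training and test parts.
set.seed(42)
n <- 400
dat <- data.frame(x1 = rnorm(n), x2 = rnorm(n))
success <- rbinom(n, size = 1, prob = plogis(0.5 + 1.5 * dat$x1 - dat$x2))
dat$y <- factor(ifelse(success == 1, "pos", "neg"))   # levels: "neg", "pos"
train <- dat[1:300, ]
test  <- dat[301:400, ]

# With a factor response, glm models the probability of the second level ("pos").
fit  <- glm(y ~ x1 + x2, data = train, family = binomial())
prob <- predict(fit, newdata = test, type = "response")

# Threshold the probabilities at 0.5 and keep the same factor levels as the actuals.
pred <- factor(ifelse(prob > 0.5, "pos", "neg"), levels = levels(test$y))

# Confusion matrix and the accuracy derived from its diagonal.
conf_mat <- table(Predicted = pred, Actual = test$y)
accuracy <- sum(diag(conf_mat)) / sum(conf_mat)
print(conf_mat)
print(accuracy)
```

The off-diagonal cells show the two kinds of misclassification separately, which matters when one class is much rarer than the other.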
