
Logistic regression – introduction and advantages

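Logistic regression transforms the dependent variable into a logit variable (the natural log of the odds that the event occurs) and then estimates the coefficients of the independent variables by maximum likelihood. In this way, logistic regression estimates the probability of a certain event occurring. In the following equation, the log of the odds changes linearly as a function of the explanatory variables:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \dots + \beta_k x_k$$

Here p is the probability of the event occurring, x_1, ..., x_k are the explanatory variables, and β_0, ..., β_k are the coefficients to be estimated.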
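To make the maximum likelihood step concrete, here is a minimal sketch that fits a one-variable model, log(p / (1 - p)) = b0 + b1 * x, to simulated data using Newton-Raphson updates. The data, the "true" coefficients, and the names sigmoid and fit_logistic are assumptions made purely for illustration; a real project would normally rely on an established statistics library rather than hand-written code like this.

```python
import math
import random


def sigmoid(z):
    """Map a log-odds value z to a probability in (0, 1)."""
    # Two branches keep math.exp from overflowing for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)


def fit_logistic(xs, ys, n_iter=25):
    """Estimate b0, b1 in log(p / (1 - p)) = b0 + b1 * x by maximum likelihood."""
    b0, b1 = 0.0, 0.0
    for _ in range(n_iter):
        g0 = g1 = 0.0          # gradient of the log-likelihood
        h00 = h01 = h11 = 0.0  # entries of the negated Hessian (X'WX)
        for x, y in zip(xs, ys):
            p = sigmoid(b0 + b1 * x)
            w = p * (1.0 - p)
            g0 += y - p
            g1 += (y - p) * x
            h00 += w
            h01 += w * x
            h11 += w * x * x
        # Newton-Raphson step: solve the 2x2 system (X'WX) * step = gradient.
        det = h00 * h11 - h01 * h01
        b0 += (h11 * g0 - h01 * g1) / det
        b1 += (h00 * g1 - h01 * g0) / det
    return b0, b1


# Simulated data (illustrative assumption): the true log-odds are -0.5 + 1.5 * x.
random.seed(42)
TRUE_B0, TRUE_B1 = -0.5, 1.5
xs = [random.uniform(-3.0, 3.0) for _ in range(2000)]
ys = [1 if random.random() < sigmoid(TRUE_B0 + TRUE_B1 * x) else 0 for x in xs]

b0_hat, b1_hat = fit_logistic(xs, ys)
print(f"estimated intercept: {b0_hat:.3f}, estimated slope: {b1_hat:.3f}")
```

The estimates should land close to the coefficients used to simulate the data, and each fitted probability sigmoid(b0_hat + b1_hat * x) stays strictly between 0 and 1.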

One can simply ask: why odds and log(odds), and not probability? In fact, this is a favorite question of interviewers in analytics interviews.

The reason is as follows:

By converting probability to log(odds), we expand the range from [0, 1] to (-∞, +∞). Fitting a model directly on probability runs into a restricted-range problem, because a linear combination of variables is unbounded while a probability must stay between 0 and 1. The log transformation also absorbs the non-linearity involved, so we can fit a simple linear combination of variables.
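The range expansion can be checked numerically. The short sketch below converts a few probabilities to odds and then to log(odds); the helper names logit and inverse_logit are chosen here for illustration.

```python
import math


def logit(p):
    """Convert a probability in (0, 1) to log(odds) on the whole real line."""
    return math.log(p / (1.0 - p))


def inverse_logit(z):
    """Convert log(odds) back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))


# Probabilities near 0 and 1 map to large negative and positive log(odds).
for p in (0.001, 0.1, 0.5, 0.9, 0.999):
    odds = p / (1.0 - p)
    print(f"p={p:<6} odds={odds:10.4f} log(odds)={logit(p):8.4f}")
```

A probability of 0.5 maps to log(odds) of 0, and probabilities that are equally far from 0.5 map to log(odds) of equal size and opposite sign, so the transformed scale is symmetric and unbounded.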

A further question one may ask: what happens if someone fits linear regression to a 0-1 problem instead of logistic regression?

A brief explanation is provided with the following image:

  • Error terms tend to be large at the middle values of X (the independent variable) and small at the extreme values, which violates the linear regression assumptions that errors have zero mean and are normally distributed
  • The model generates nonsensical predictions greater than 1 and less than 0 at the end values of X
  • The ordinary least squares (OLS) estimates are inefficient and the standard errors are biased
  • Error variance is high at the middle values of X and low at the ends, which violates the constant-variance (homoscedasticity) assumption

All the preceding issues are solved by using logistic regression.
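Two of the issues listed above, out-of-range predictions and unequal error variance, can be reproduced with a small simulation. The sketch below fits a straight line by closed-form OLS to a simulated 0-1 outcome; the data-generating coefficients and all names are hypothetical and chosen only for illustration.

```python
import math
import random

random.seed(0)
TRUE_B0, TRUE_B1 = 0.0, 2.0  # hypothetical coefficients used to simulate data


def true_prob(x):
    """Probability of the event under the simulated logistic model."""
    return 1.0 / (1.0 + math.exp(-(TRUE_B0 + TRUE_B1 * x)))


xs = [random.uniform(-3.0, 3.0) for _ in range(2000)]
ys = [1 if random.random() < true_prob(x) else 0 for x in xs]

# Closed-form simple linear regression (OLS) of the 0-1 outcome on x.
n = len(xs)
mean_x = sum(xs) / n
mean_y = sum(ys) / n
sxx = sum((x - mean_x) ** 2 for x in xs)
sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
slope = sxy / sxx
intercept = mean_y - slope * mean_x


def ols_predict(x):
    """Prediction of the straight-line fit; nothing bounds it to [0, 1]."""
    return intercept + slope * x


def outcome_variance(x):
    """Variance of a 0-1 outcome at x, p * (1 - p), largest when p = 0.5."""
    p = true_prob(x)
    return p * (1.0 - p)


for x in (-3.0, 0.0, 3.0):
    print(f"x={x:+.1f}  OLS prediction={ols_predict(x):+.3f}  "
          f"logistic probability={true_prob(x):.3f}  "
          f"outcome variance={outcome_variance(x):.4f}")
```

At the ends of the X range the straight line predicts values below 0 and above 1, while the logistic curve stays inside (0, 1). The variance of the outcome is also largest in the middle of the range and shrinks toward the ends.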
