
Logistic regression – introduction and advantages

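Logistic regression applies maximum likelihood estimation after transforming the dependent variable into a logit variable (the natural log of the odds of the dependent variable occurring or not) with respect to the independent variables. In this way, logistic regression estimates the probability of a certain event occurring. In the following equation, the log of odds changes linearly as a function of the explanatory variables:

$$\log\left(\frac{p}{1-p}\right) = \beta_0 + \beta_1 x_1 + \beta_2 x_2 + \cdots + \beta_n x_n$$

Here $p$ is the probability of the event occurring, $x_1, \ldots, x_n$ are the explanatory variables, and $\beta_0, \ldots, \beta_n$ are the coefficients to be estimated.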

One might simply ask: why odds and log(odds), and not probability? In fact, this is a favorite question of interviewers in analytics interviews.

The reason is as follows:

By converting probability to log(odds), we expand the range from [0, 1] to (-∞, +∞). If we fit the model directly on probability, we run into a restricted-range problem, because a linear combination of variables can produce values outside [0, 1]. Applying the log transformation to the odds also absorbs the non-linearity involved, so we can simply fit a linear combination of variables.
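The range expansion can be checked numerically. The following is a minimal sketch using only the Python standard library; the function names logit and inverse_logit and the sample probabilities are illustrative choices, not part of the original text.

```python
import math

def logit(p):
    """Map a probability in (0, 1) to log(odds), which can be any real number."""
    return math.log(p / (1.0 - p))

def inverse_logit(z):
    """Map log(odds) back to a probability in (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Illustrative probabilities, from very unlikely to very likely.
probabilities = [0.001, 0.1, 0.5, 0.9, 0.999]
log_odds = [logit(p) for p in probabilities]
recovered = [inverse_logit(z) for z in log_odds]

for p, z in zip(probabilities, log_odds):
    odds = p / (1.0 - p)
    print(f"p = {p:<6} odds = {odds:<10.4f} log(odds) = {z:+.4f}")
```

Probabilities close to 0 map to large negative log(odds), probabilities close to 1 map to large positive log(odds), and p = 0.5 maps to 0, so the transformed target is unbounded and symmetric around even odds.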

Another question one might ask is: what happens if someone fits linear regression, rather than logistic regression, to a 0-1 problem?

A brief explanation is provided with the following image:

  • Error terms tend to be large at the middle values of X (the independent variable) and small at the extreme values, which violates the linear regression assumptions that errors should have zero mean and be normally distributed
  • It generates nonsensical predictions greater than 1 and less than 0 at the end values of X
  • The ordinary least squares (OLS) estimates are inefficient and the standard errors are biased
  • Error variance is high at the middle values of X and low at the ends, so the constant-variance (homoscedasticity) assumption is violated

All the preceding issues are solved by using logistic regression.
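The points above can be illustrated with a small sketch. It assumes a made-up toy dataset with a single predictor and a 0/1 outcome, fits a straight line by OLS in closed form, and fits a logistic model by plain gradient ascent on the log-likelihood. The data, variable names, step size, and iteration count are all assumptions chosen for illustration; in practice a library routine would normally be used for fitting.

```python
import math

# Hypothetical toy data: one predictor x and a 0/1 outcome y (not perfectly separable).
x = [0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 10]
y = [0, 0, 0, 0, 1, 0, 1, 1, 1, 1, 1]
n = len(x)
x_mean = sum(x) / n
y_mean = sum(y) / n

# Ordinary least squares for a single predictor (closed-form slope and intercept).
sxx = sum((xi - x_mean) ** 2 for xi in x)
sxy = sum((xi - x_mean) * (yi - y_mean) for xi, yi in zip(x, y))
ols_slope = sxy / sxx
ols_intercept = y_mean - ols_slope * x_mean
ols_pred = [ols_intercept + ols_slope * xi for xi in x]

def sigmoid(z):
    """Inverse of the logit: maps any real number to (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# Logistic regression by gradient ascent on the log-likelihood.
# x is centered so that a small fixed step size converges.
b0, b1 = 0.0, 0.0
step = 0.01
for _ in range(20000):
    p = [sigmoid(b0 + b1 * (xi - x_mean)) for xi in x]
    grad_b0 = sum(yi - pi for yi, pi in zip(y, p))
    grad_b1 = sum((yi - pi) * (xi - x_mean) for xi, yi, pi in zip(x, y, p))
    b0 += step * grad_b0
    b1 += step * grad_b1
logit_pred = [sigmoid(b0 + b1 * (xi - x_mean)) for xi in x]

# Variance of a 0/1 outcome under the fitted model: p * (1 - p), largest near p = 0.5.
bernoulli_var = [pi * (1.0 - pi) for pi in logit_pred]

print(f"{'x':>3} {'OLS':>8} {'logistic':>9} {'p(1-p)':>8}")
for xi, o, l, v in zip(x, ols_pred, logit_pred, bernoulli_var):
    print(f"{xi:>3} {o:>8.3f} {l:>9.3f} {v:>8.3f}")
```

On this toy data the OLS line predicts roughly -0.09 at x = 0 and 1.18 at x = 10, both outside [0, 1], while every logistic prediction stays strictly between 0 and 1. The p(1 - p) column is largest at the middle values of x and smallest at the ends, which is the non-constant error variance described in the list.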
