
  • Advanced Machine Learning with R
  • Cory Lesmeister Dr. Sunil Kumar Chinnamgari
  • 2021-06-24 14:24:36

Logistic regression

As previously discussed, our classification problem is best modeled with probabilities that are bounded by 0 and 1. Several functions can do this for all of our observations, but here we'll focus on the logistic function. The logistic function used in logistic regression is as follows:

$$P(Y) = \frac{e^{B_0 + B_1 x}}{1 + e^{B_0 + B_1 x}}$$
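
To see how the logistic function keeps predictions between 0 and 1, here is a minimal R sketch. The coefficient values `b0` and `b1` and the inputs `x` are hypothetical, chosen only for illustration; they don't come from the text.

```r
# Logistic function: maps any real-valued linear predictor z to (0, 1).
# 1 / (1 + exp(-z)) is algebraically the same as exp(z) / (1 + exp(z)).
logistic <- function(z) 1 / (1 + exp(-z))

# Hypothetical coefficients, for illustration only
b0 <- -1.5
b1 <- 0.8

# A few input values, from strongly negative to strongly positive
x <- c(-10, 0, 2, 10)
p <- logistic(b0 + b1 * x)
print(round(p, 4))

# Base R's plogis() computes the same function
print(all.equal(p, plogis(b0 + b1 * x)))
```

However large or small the linear predictor gets, the result stays strictly between 0 and 1, which is what makes it usable as a probability.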

If you've ever placed a friendly wager on horse races or the World Cup, you may understand the concept better as odds. A probability can be turned into odds with the formulation Probability(Y) / (1 - Probability(Y)). For instance, if the probability of Brazil winning the World Cup is 20 percent, then the odds are 0.2 / (1 - 0.2), which is equal to 0.25, translating to odds of one to four.

To translate the odds back to probability, take the odds and divide by one plus the odds. The World Cup example is hence 0.25 / (1 + 0.25), which is equal to 0.2, or 20 percent. Additionally, let's consider the odds ratio. Assume that the odds of Germany winning the Cup are 0.18. We can compare the odds of Brazil and Germany with the odds ratio. In this example, the odds ratio is the odds of Brazil divided by the odds of Germany: 0.25 / 0.18, which is approximately 1.39. Here, we'll say that Brazil's odds of winning the World Cup are 1.39 times those of Germany.
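
The World Cup arithmetic can be checked with a few lines of R. The helper names `prob_to_odds` and `odds_to_prob` are made up for this sketch; they aren't functions from any package.

```r
# Hypothetical helpers for converting between probability and odds
prob_to_odds <- function(p) p / (1 - p)
odds_to_prob <- function(odds) odds / (1 + odds)

# Brazil: 20 percent probability of winning
brazil_odds <- prob_to_odds(0.2)
print(brazil_odds)                 # 0.25

# Converting the odds back recovers the probability
print(odds_to_prob(brazil_odds))   # 0.2

# Germany's odds are given in the text as 0.18
germany_odds <- 0.18
odds_ratio <- brazil_odds / germany_odds
print(round(odds_ratio, 2))        # 1.39
```

Note that the odds ratio compares odds, not probabilities: Germany's odds of 0.18 correspond to a probability of about 15 percent, so the ratio of the two probabilities is smaller than 1.39.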

One way to look at the relationship between logistic regression and linear regression is to express logistic regression in terms of the log odds: log(P(Y) / (1 - P(Y))) is equal to B0 + B1x. The coefficients are estimated using maximum likelihood rather than ordinary least squares (OLS). The intuition behind maximum likelihood is that we're calculating the estimates for B0 and B1 that produce a predicted probability for each observation that's as close as possible to the actual observed outcome of Y; how well the predictions agree with the observed outcomes is measured by the so-called likelihood. R does what other software packages do for maximum likelihood, which is to find the combination of beta values that maximizes the likelihood.
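
As a rough illustration of maximum likelihood, the sketch below fits a logistic regression two ways: by minimizing the negative log-likelihood directly with `optim()`, and with `glm()` using `family = binomial`. The data are simulated, and the "true" coefficients, sample size, and seed are assumptions made up for this example.

```r
# Simulated data; all values here are illustrative assumptions
set.seed(123)
n <- 500
x <- rnorm(n)
true_b0 <- -0.5
true_b1 <- 1.2
y <- rbinom(n, size = 1, prob = plogis(true_b0 + true_b1 * x))

# Negative log-likelihood of the logistic model for a given (B0, B1)
neg_log_lik <- function(beta) {
  p <- plogis(beta[1] + beta[2] * x)
  -sum(dbinom(y, size = 1, prob = p, log = TRUE))
}

# Maximize the likelihood by minimizing its negative with a general optimizer
opt <- optim(c(0, 0), neg_log_lik, method = "BFGS")

# The standard route: glm() with a binomial family fits the same model
fit <- glm(y ~ x, family = binomial)

print(opt$par)
print(coef(fit))

# Exponentiating the slope gives the odds ratio for a one-unit increase in x
print(exp(coef(fit)["x"]))
```

The two sets of estimates should agree closely, since both search for the beta values that maximize the same likelihood. In practice `glm()` is the usual choice; the `optim()` version is only there to make the objective explicit.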

With these facts in mind, logistic regression is a potent technique for classification problems and is often the starting point for model creation in such problems. Therefore, in this chapter, we'll attack the upcoming problem with logistic regression first.
