官术网_书友最值得收藏!

Linear and logistic regression

Regression algorithms are a type of supervised algorithm that uses features of the input data to predict a value, such as the cost of a house, given certain features, such as size, age, number of bathrooms, number of floors, and location. Regression analysis tries to find the value of the parameters for the function that best fits an input dataset.

In a linear-regression algorithm, the goal is to minimize a cost function by finding appropriate parameters for the function, over the input data that best approximates the target values. A cost function is a function of the error, that is, how far we are from getting a correct result. A popular cost function is the mean square error (MSE), where we take the square of the difference between the expected value and the predicted result. The sum over all the input examples gives us the error of the algorithm and represents the cost function.

Say we have a 100-square-meter house that was built 25 years ago with 3 bathrooms and 2 floors. Let's also assume that the city is divided into 10 different neighborhoods, which we'll denote with integers from 1 to 10, and say this house is located in the area denoted by 7. We can parameterize this house with a five-dimensional vector, x = (100, 25, 3, 2, 7). Say that we also know that this house has an estimated value of €100,000. What we want is to create a function, f, such that f(x) = 100000.

In linear regression, this means finding a vector of weights, w= (w1, w2, w3, w4, w5)such that the dot product of the vectors, x ? w = 10000, would be 100*w1 + 25*w2 + 3*w3 + 2*w4 + 7*w5 = 100000 or  . If we had 1,000 houses, we could repeat the same process for every house, and ideally we would like to find a single vector, wthat can predict the correct value that is close enough for every house. The most common way to train a linear regression model can be seen in the following pseudocode block:

First, we iterate over the training data to compute the cost function, MSE. Once we know the value of MSE, we'll use the gradient-descent algorithm to update w. To do this, we'll calculate the derivatives of the cost function with respect to each weight, wi . In this way, we'll know how the cost function changes (increase or decrease) with respect to wi . Then we'll update its value accordingly. In Chapter 2, Neural Networks, we will see that training neural networks and linear/logistic regressions have a lot in common.

We demonstrated how to solve a regression problem with linear regression. Let's take another task: trying to determine whether a house is overvalued or undervalued. In this case, the target data would be categorical [1, 0] - 1 for overvalued, 0 for undervalued, if the price of the house will be an input parameter instead of target value as before. To solve the task, we'll use logistic regression. This is similar to linear regression but with one difference: in linear regression, the output is . However, here the output will be a special logistic function (https://en.wikipedia.org/wiki/Logistic_function),  . This will squash the value of  in the (0:1) interval. You can think of the logistic function as a probability, and the closer the result is to 1, the more chance there is that the house is overvalued, and vice versa. Training is the same as with linear regression, but the output of the function is in the (0:1) interval and the labels is either 0 or 1.

Logistic regression is not a classification algorithm, but we can turn it into one. We just have to introduce a rule that determines the class based on the logistic function output. For example, we can say that a house is overvalued if the value of  and undervalued otherwise.

主站蜘蛛池模板: 亚东县| 岚皋县| 毕节市| 平凉市| 白城市| 孝昌县| 峨眉山市| 融水| 锦州市| 闽清县| 南陵县| 海林市| 通榆县| 汽车| 大理市| 华池县| 上思县| 渝北区| 松潘县| 韶关市| 高平市| 山丹县| 大英县| 蛟河市| 房山区| 伊川县| 东乌珠穆沁旗| 青田县| 长兴县| 金华市| 察隅县| 黑山县| 浦城县| 宁强县| 兴城市| 苍南县| 嘉黎县| 彩票| 如东县| 株洲县| 临邑县|