
Logistic regression as a neural network

Logistic regression is a classification algorithm. Here, we try to predict the probability of each of the output classes, and the class with the highest probability becomes the predicted output. The error between the actual and predicted output is calculated using cross-entropy and minimized through backpropagation. Check the following diagram for binary logistic regression and multi-class logistic regression. The difference lies in the problem statement: if the number of unique output classes is two, it's called binary classification; if it's more than two, it's called multi-class classification. If there are no hidden layers and we use the sigmoid function for binary classification, we get the architecture for binary logistic regression. Similarly, if there are no hidden layers and we use the softmax function for multi-class classification, we get the architecture for multi-class logistic regression.
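To make the two architectures concrete, here is a minimal sketch in NumPy (the variable names and the three-class example are illustrative, not from the original text) of the forward pass of each model: a single weight layer followed by a sigmoid for binary logistic regression, and a single weight layer followed by a softmax for multi-class logistic regression:

import numpy as np

def sigmoid(z):
    # Squash each score into the (0, 1) range
    return 1.0 / (1.0 + np.exp(-z))

def softmax(z):
    # Subtract the max score for numerical stability before exponentiating
    e = np.exp(z - np.max(z))
    return e / e.sum()

# An illustrative input with d = 4 dimensions
x = np.array([0.5, -1.2, 3.0, 0.1])

# Binary logistic regression: one weight vector, sigmoid output
w, b = np.random.randn(4), 0.0
p_class1 = sigmoid(w @ x + b)      # p(y = 1 | x)
p_class0 = 1.0 - p_class1          # p(y = 0 | x)

# Multi-class logistic regression: one weight matrix for N = 3 classes, softmax output
W, b_vec = np.random.randn(3, 4), np.zeros(3)
probs = softmax(W @ x + b_vec)     # N probabilities that sum to 1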

Now a question arises: why not use the sigmoid function for multi-class logistic regression?

The answer, which holds for the predicted output layer of any neural network, is that the predicted outputs should follow a probability distribution. In simple terms, say the output has N classes. This results in N probabilities for a single input with, say, d dimensions. The sum of these N probabilities for that one input must be 1, and each probability must be between 0 and 1, inclusive.

On the one hand, the sum of sigmoid outputs over N different classes will, in most cases, not be 1. Therefore, in the binary case, the sigmoid function is applied to obtain the probability of one class, that is, p(y = 1|x), and the probability of the other class is p(y = 0|x) = 1 - p(y = 1|x). On the other hand, the outputs of a softmax function always satisfy the properties of a probability distribution. In the diagram, σ refers to the sigmoid function:
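The following quick check (a sketch reusing the sigmoid and softmax helpers defined above; the scores are made up) illustrates the difference: applying the sigmoid independently to three class scores produces values that do not sum to 1, while the softmax over the same scores does:

scores = np.array([2.0, -1.0, 0.5])   # raw scores for N = 3 classes

print(sigmoid(scores).sum())          # ~1.77 -- not a valid probability distribution
print(softmax(scores).sum())          # 1.0 -- a valid probability distribution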

A follow-up question might also arise: what if we use softmax in binary logistic regression?

As mentioned previously, as long as the predicted output follows the rules of a probability distribution, everything is fine. Later, we will discuss cross-entropy and the importance of probability distributions as a building block for any machine learning problem, especially those dealing with classification tasks.
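It is fine in particular because a two-class softmax reduces algebraically to a sigmoid: softmax([z1, z0])[0] = 1 / (1 + e^-(z1 - z0)) = sigmoid(z1 - z0). A quick numeric check (a sketch reusing the helpers above, with made-up scores):

z1, z0 = 1.5, -0.3
print(softmax(np.array([z1, z0]))[0])   # probability of class 1 from softmax
print(sigmoid(z1 - z0))                 # the same value from the sigmoid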

A probability distribution is valid if the probabilities of all the values in the distribution are between 0 and 1, inclusive, and the sum of those probabilities must be 1.
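This rule can be expressed directly as a small check (a minimal sketch; is_valid_distribution is an illustrative helper, not part of the original text):

def is_valid_distribution(p, tol=1e-8):
    # Every probability in [0, 1] and the total equal to 1 (within tolerance)
    p = np.asarray(p)
    return bool(np.all(p >= 0) and np.all(p <= 1) and abs(p.sum() - 1.0) < tol)

print(is_valid_distribution([0.2, 0.3, 0.5]))   # True
print(is_valid_distribution([0.9, 0.9]))        # False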

Logistic regression can be viewed as a very small neural network. Let's try to go through a step-by-step process to implement a binary logistic regression, as shown here:
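Since the step-by-step figure is not reproduced here, the sketch below is one possible end-to-end implementation under the assumptions already stated in this section (sigmoid output, cross-entropy loss minimized by gradient descent); the toy data, learning rate, and epoch count are illustrative:

import numpy as np

rng = np.random.default_rng(0)

# Step 1: toy data -- 100 samples, d = 2 features, binary labels
X = rng.normal(size=(100, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)

# Step 2: initialize the weights and bias
w, b = np.zeros(2), 0.0
lr = 0.1

for epoch in range(500):
    # Step 3: forward pass -- predicted probability p(y = 1 | x)
    p = 1.0 / (1.0 + np.exp(-(X @ w + b)))

    # Step 4: binary cross-entropy between actual and predicted output
    loss = -np.mean(y * np.log(p + 1e-12) + (1 - y) * np.log(1 - p + 1e-12))

    # Step 5: backpropagation -- gradients of the loss with respect to w and b
    grad_w = X.T @ (p - y) / len(y)
    grad_b = np.mean(p - y)

    # Step 6: gradient descent update
    w -= lr * grad_w
    b -= lr * grad_b

print(loss)   # close to 0, since this toy data is linearly separable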
