
Loss functions

First, we'll cover loss functions, and then, prior to diving into hill climbing and descent, we'll take a quick math refresher.

There's going to be some math in this lesson, and while we try to shy away from purely theoretical concepts, this is something we simply have to get through in order to understand the guts of most of these algorithms. There will be a brief applied section at the end. Don't panic if you can't remember some of the calculus; just try to grasp what is happening inside the black box.

So, as mentioned before, a machine learning algorithm has to measure how close it is to some objective. We define this as a cost function, or a loss function. Sometimes, we hear it referred to as an objective function. Although not all machine learning algorithms are designed to directly minimize a loss function, we're going to learn the rule here rather than the exception. The point of a loss function is to determine the goodness of a model fit. It is typically evaluated over the course of a model's learning procedure and converges when the model has maximized its learning capacity.

A typical loss function computes a scalar value from the true labels and the predicted labels. That is, given our actual $y$ and our predicted $\hat{y}$. This notation might be cryptic, but all it means is that some function, $L$, which we're going to call our loss function, is going to accept the ground truth, $y$, and the predictions, $\hat{y}$, and return some scalar value. The typical form of the loss function is given as follows:

$$\text{loss} = L(y, \hat{y})$$

So, I've listed several common loss functions here, which may or may not look familiar. Sum of Squared Error (SSE) is a metric that we're going to be using for our regression models:

$$SSE = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$
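As a minimal sketch of how SSE might be computed in practice (the function name and NumPy usage are our own choices, not from the text):

```python
import numpy as np

def sse(y_true, y_pred):
    """Sum of squared errors between the ground truth and the predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Square each residual (y_i - yhat_i) and sum over all samples.
    return float(np.sum((y_true - y_pred) ** 2))

# A perfect fit yields zero loss; worse fits yield larger scalars.
print(sse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(sse([1.0, 2.0], [2.0, 4.0]))            # 1 + 4 = 5.0
```

Note that the loss accepts two arrays and returns a single scalar, exactly as the $L(y, \hat{y})$ signature describes.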

Cross entropy is a very commonly used classification metric:

$$L_{CE} = -\sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
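A minimal sketch of binary cross entropy follows; the clipping constant is an assumption we add to guard against taking the log of zero:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross entropy between labels in {0, 1} and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 so log() stays finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    return float(-np.sum(y_true * np.log(y_pred)
                         + (1.0 - y_true) * np.log(1.0 - y_pred)))

# Maximally uncertain predictions (0.5) cost log(2) per sample.
print(cross_entropy([1, 0], [0.5, 0.5]))  # ~1.386, i.e. 2 * log(2)
```

Confident, correct predictions drive the loss toward zero, while confident, wrong predictions are punished heavily by the logarithm.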

In the following diagram, the $L$ function on the left is simply indicating that it is our loss function over $y$ and $\hat{y}$, given the parameters, $\theta$. So, for any algorithm, we want to find the set of $\theta$ parameters that minimizes the loss. That is, if we're predicting the cost of a house, for example, we may want to estimate the cost per square foot as accurately as possible so as to minimize how wrong we are.

Parameters are often in a much higher dimensional space than can be represented visually. So, the big question we're concerned with is the following: How can we minimize the cost? It is typically not feasible for us to attempt every possible value to determine the true minimum of a problem. So, we have to find a way to descend this nebulous hill of loss. The tough part is that, at any given point, we don't know whether the curve goes up or down without some kind of evaluation. And that's precisely what we want to avoid, because it's very expensive:

We can describe this problem as waking up in a pitch-black room with an uneven floor and trying to find the lowest point in the room. You don't know how big the room is. You don't know how deep or how high it gets. Where do you step first? One thing we can do is to examine exactly where we stand and determine which direction around us slopes downward. To do that, we have to measure the slope of the curve.
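The "measure the slope where you stand" idea can be sketched numerically with a central finite difference. This is an illustrative approximation of the derivative, not the analytic gradients that most algorithms use in practice:

```python
def numerical_slope(f, x, h=1e-6):
    """Central-difference estimate of the slope of f at point x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# A toy one-dimensional loss: (theta - 3)^2, whose true slope is 2 * (theta - 3).
loss = lambda theta: (theta - 3.0) ** 2

# At theta = 5 the floor slopes upward (positive slope), so we should step
# in the negative direction to descend toward the minimum at theta = 3.
print(numerical_slope(loss, 5.0))  # ~4.0
```

Each slope estimate costs two evaluations of the loss, which is exactly why, in high-dimensional parameter spaces, we want to avoid probing blindly and instead compute gradients efficiently.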
