
Loss functions

First, we'll cover loss functions, and then, prior to diving into hill climbing and descent, we'll take a quick math refresher.

There's going to be some math in this lesson, and while we try to shy away from purely theoretical concepts, this is something we simply have to get through in order to understand the guts of most of these algorithms. There will be a brief applied section at the end. Don't panic if you can't remember some of the calculus; just try to grasp what is happening inside the black box.

So, as mentioned before, a machine learning algorithm has to measure how close it is to some objective. We define this as a cost function, or a loss function. Sometimes, we hear it referred to as an objective function. Although not all machine learning algorithms are designed to directly minimize a loss function, we're going to learn the rule here rather than the exception. The point of a loss function is to determine the goodness of a model fit. It is typically evaluated over the course of a model's learning procedure and converges when the model has maximized its learning capacity.

A typical loss function computes a scalar value from the true labels and the predicted labels. That is, given our actual $y$ and our predicted $\hat{y}$. This notation might be cryptic, but all it means is that some function, $L$, which we're going to call our loss function, is going to accept the ground truth, $y$, and the predictions, $\hat{y}$, and return some scalar value. The typical form of the loss function is given as follows:

$$\text{loss} = L(y, \hat{y})$$

So, I've listed several common loss functions here, which may or may not look familiar. Sum of Squared Error (SSE) is a metric that we're going to be using for our regression models:

$$SSE = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$
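As a minimal sketch of how SSE might be computed in practice (the function name and NumPy usage are our own choices, not from the text):

```python
import numpy as np

def sse(y_true, y_pred):
    """Sum of squared errors between the ground truth and the predictions."""
    y_true = np.asarray(y_true, dtype=float)
    y_pred = np.asarray(y_pred, dtype=float)
    # Square each residual (y_i - yhat_i) and sum over all samples.
    return float(np.sum((y_true - y_pred) ** 2))

# A perfect fit yields zero loss; worse fits yield larger scalars.
print(sse([1.0, 2.0, 3.0], [1.0, 2.0, 3.0]))  # 0.0
print(sse([1.0, 2.0], [2.0, 4.0]))            # 1 + 4 = 5.0
```

Note that the loss accepts two arrays and returns a single scalar, exactly as the $L(y, \hat{y})$ signature describes.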

Cross entropy is a very commonly used classification metric:

$$L_{CE} = -\sum_{i=1}^{n} \left[ y_i \log(\hat{y}_i) + (1 - y_i) \log(1 - \hat{y}_i) \right]$$
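A minimal sketch of binary cross entropy follows; the clipping constant is an assumption we add to guard against taking the log of zero:

```python
import numpy as np

def cross_entropy(y_true, y_pred, eps=1e-12):
    """Binary cross entropy between labels in {0, 1} and predicted probabilities."""
    y_true = np.asarray(y_true, dtype=float)
    # Clip predictions away from 0 and 1 so log() stays finite.
    y_pred = np.clip(np.asarray(y_pred, dtype=float), eps, 1.0 - eps)
    return float(-np.sum(y_true * np.log(y_pred)
                         + (1.0 - y_true) * np.log(1.0 - y_pred)))

# Maximally uncertain predictions (0.5) cost log(2) per sample.
print(cross_entropy([1, 0], [0.5, 0.5]))  # ~1.386, i.e. 2 * log(2)
```

Confident, correct predictions drive the loss toward zero, while confident, wrong predictions are punished heavily by the logarithm.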

In the following diagram, the $L$ function on the left is simply indicating that it is our loss function over $y$ and $\hat{y}$, given the parameters, $\theta$. So, for any algorithm, we want to find the set of $\theta$ parameters that minimizes the loss. That is, if we're predicting the cost of a house, for example, we may want to estimate the cost per square foot as accurately as possible so as to minimize how wrong we are.

Parameters are often in a much higher dimensional space than can be represented visually. So, the big question we're concerned with is the following: How can we minimize the cost? It is typically not feasible for us to attempt every possible value to determine the true minimum of a problem. So, we have to find a way to descend this nebulous hill of loss. The tough part is that, at any given point, we don't know whether the curve goes up or down without some kind of evaluation. And that's precisely what we want to avoid, because it's very expensive:

We can describe this problem as waking up in a pitch-black room with an uneven floor and trying to find the lowest point in the room. You don't know how big the room is. You don't know how deep or how high it gets. Where do you step first? One thing we can do is to examine exactly where we stand and determine which direction around us slopes downward. To do that, we have to measure the slope of the curve.
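The "measure the slope where you stand" idea can be sketched numerically with a central finite difference. This is an illustrative approximation of the derivative, not the analytic gradients that most algorithms use in practice:

```python
def numerical_slope(f, x, h=1e-6):
    """Central-difference estimate of the slope of f at point x."""
    return (f(x + h) - f(x - h)) / (2.0 * h)

# A toy one-dimensional loss: (theta - 3)^2, whose true slope is 2 * (theta - 3).
loss = lambda theta: (theta - 3.0) ** 2

# At theta = 5 the floor slopes upward (positive slope), so we should step
# in the negative direction to descend toward the minimum at theta = 3.
print(numerical_slope(loss, 5.0))  # ~4.0
```

Each slope estimate costs two evaluations of the loss, which is exactly why, in high-dimensional parameter spaces, we want to avoid probing blindly and instead compute gradients efficiently.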
