
Mathematical optimization – how learning works

The magic behind the learning process is delivered by the branch of mathematics called mathematical optimization. Somewhat misleadingly, it is also referred to as mathematical programming; that term was coined long before widespread computer programming and is not directly related to it. Optimization is the science of choosing the best option among the available alternatives; for example, choosing the best ML model.

Mathematically speaking, ML models are functions. You, as an engineer, choose the function family depending on your preferences: linear models, trees, neural networks, support vector machines, and so on. Learning is the process of picking from that family the function which serves your goals best. This notion of the best model is often defined by another function, the loss function. It estimates the goodness of a model according to some criteria; for instance, how well the model fits the data, how complex it is, and so on. You can think of the loss function as a judge at a competition whose role is to assess the models. The objective of learning is to find the model that minimizes the loss function (minimizes the loss), so the whole learning process is formalized in mathematical terms as a task of function minimization.
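To make this concrete, here is a minimal sketch in Python of a one-parameter family of linear models and a mean squared error loss acting as the judge. The function names and toy data are illustrative assumptions, not code from this book:

```python
# A minimal illustration: a one-parameter family of linear models
# y = w * x, assessed by the mean squared error (MSE) loss.

def predict(w, xs):
    # The model family: each value of w picks one linear function.
    return [w * x for x in xs]

def mse_loss(w, xs, ys):
    # The "judge": measures how badly the model with parameter w fits the data.
    predictions = predict(w, xs)
    return sum((p - y) ** 2 for p, y in zip(predictions, ys)) / len(ys)

# Toy data generated by y = 2x, so the loss is minimized at w = 2.
xs, ys = [1.0, 2.0, 3.0], [2.0, 4.0, 6.0]
print(mse_loss(1.0, xs, ys))  # poor model, high loss (~4.67)
print(mse_loss(2.0, xs, ys))  # best model in the family, zero loss
```

Learning, in this picture, is nothing more than searching over w for the value that makes mse_loss as small as possible.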

A function's minimum can be found in two ways: analytically (calculus) or numerically (iterative methods). In ML, we usually go for numerical optimization because the loss functions are too complex for analytical solutions.
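For a flavor of how numerical minimization works, here is a hedged sketch of gradient descent, the iterative method most widely used in ML, applied to a simple one-dimensional function. The function, starting point, and learning rate are illustrative choices:

```python
# Gradient descent on f(x) = (x - 3)^2 + 1, whose minimum is at x = 3.

def df(x):
    # Derivative of f; it points uphill, so we step in the opposite direction.
    return 2.0 * (x - 3.0)

x = 0.0              # initial guess
learning_rate = 0.1  # step size for each iteration
for _ in range(100):
    x -= learning_rate * df(x)  # one iteration: a small step downhill

print(x)  # ~3.0, the minimizer found numerically
```

Note that the algorithm never solves for the minimum directly; it just keeps stepping downhill until the steps stop making a difference.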

A nice interactive tutorial on numerical optimization can be found here: http://www.benfrederickson.com/numerical-optimization/.

From the programmer's point of view, learning is an iterative process of adjusting the model's parameters until the optimal solution is found. In practice, after a number of iterations, the algorithm stops improving because it is stuck in a local optimum or has reached the global optimum (see the following diagram). If the algorithm always finds a local or global optimum, we say that it converges. If, on the other hand, you see your algorithm oscillating more and more and never approaching a useful result, it diverges (the short sketch after the figure demonstrates both behaviors):

Figure 1.4: The learner represented as a ball on a complex surface: it can fall into a local minimum and never reach the global one
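The following toy sketch illustrates both behaviors using gradient descent on f(x) = x²: a small step size converges toward the minimum, while a too-large one makes the iterates oscillate with growing amplitude and diverge. Both step sizes are arbitrary illustrative values:

```python
# Gradient descent on f(x) = x^2, run with two different step sizes.

def minimize(learning_rate, steps=20, x=1.0):
    for _ in range(steps):
        x -= learning_rate * 2.0 * x  # gradient of x^2 is 2x
    return x

print(minimize(0.1))  # ~0.012: iterates shrink toward the minimum (converges)
print(minimize(1.1))  # ~38: iterates flip sign and grow each step (diverges)
```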