
Summary

In this chapter, we introduced the main concepts of machine learning. We started with some basic mathematical definitions, to establish a clear view of data formats, standards, and the kinds of functions involved. This notation is adopted throughout the remaining chapters and is also the most widespread in technical publications. We also discussed how scikit-learn seamlessly handles multi-class problems, and when one strategy (such as one-vs-rest or one-vs-one) is preferable to another.
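To make the one-vs-rest idea concrete, here is a minimal sketch of its decision rule in plain Python. This is a toy illustration, not scikit-learn's actual implementation: the centroid-based scorer and all names below are hypothetical stand-ins for a real per-class binary classifier.

```python
# One-vs-rest decision rule (toy sketch): each class gets its own scorer,
# and the predicted label is the class whose scorer is most confident.

def make_centroid_scorer(points):
    """Hypothetical binary scorer: negative squared distance to the class centroid."""
    n = len(points)
    cx = sum(p[0] for p in points) / n
    cy = sum(p[1] for p in points) / n
    return lambda p: -((p[0] - cx) ** 2 + (p[1] - cy) ** 2)

def ovr_predict(scorers, p):
    """Pick the class whose scorer assigns the highest score to point p."""
    return max(scorers, key=lambda label: scorers[label](p))

# Toy 2-D dataset with three classes
data = {
    "a": [(0.0, 0.0), (0.2, 0.1)],
    "b": [(5.0, 5.0), (5.1, 4.9)],
    "c": [(0.0, 5.0), (0.1, 5.2)],
}
scorers = {label: make_centroid_scorer(pts) for label, pts in data.items()}
print(ovr_predict(scorers, (4.8, 5.2)))  # "b"
```

In scikit-learn, the same pattern is provided by `OneVsRestClassifier` (K binary models) and `OneVsOneClassifier` (K(K-1)/2 pairwise models); the latter trades more models for smaller training sets per model.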

The next step was the introduction of some fundamental theoretical concepts about learnability. The main questions we tried to answer were: how can we decide whether a problem can be learned by an algorithm, and what is the maximum accuracy we can achieve? PAC learning is a generic but powerful framework that can be adopted to define the boundaries of an algorithm: a PAC-learnable problem is not only manageable by a suitable algorithm, but can also be learned in polynomial time.

We then introduced some common statistical learning concepts, in particular the MAP and maximum likelihood learning approaches. The former picks the hypothesis that maximizes the a posteriori probability, while the latter maximizes the likelihood, looking for the hypothesis that best fits the data. The maximum likelihood strategy is one of the most widespread in machine learning because it is not affected by prior probabilities and is easy to implement in many different contexts. We also gave a physical interpretation of a loss function as an energy function: the goal of a training algorithm is always to find the global minimum, which corresponds to the deepest valley in the error surface.

At the end of this chapter, there was a brief introduction to information theory and how we can reinterpret our problems in terms of information gain and entropy. Every machine learning approach should aim to minimize the amount of information needed to recover the original (desired) outcomes from the predictions.
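Both ideas above can be sketched in a few lines of plain Python. This is an illustrative toy (the Bernoulli coin model and grid of candidate hypotheses are assumptions, not something specific to any library): maximizing the likelihood over candidate head probabilities recovers the sample frequency, and Shannon entropy quantifies how much information an outcome carries.

```python
import math

# Maximum likelihood for a coin (Bernoulli model): the log-likelihood of
# observing `heads` heads and `tails` tails under head probability p.
def bernoulli_log_likelihood(p, heads, tails):
    return heads * math.log(p) + tails * math.log(1 - p)

heads, tails = 7, 3
candidates = [0.1 * k for k in range(1, 10)]
best = max(candidates, key=lambda p: bernoulli_log_likelihood(p, heads, tails))
print(round(best, 1))  # 0.7 -- the sample frequency maximizes the likelihood

# Shannon entropy in bits: H(p) = -sum_i p_i * log2(p_i).
def entropy(probs):
    return -sum(p * math.log2(p) for p in probs if p > 0)

# A uniform distribution carries maximum uncertainty; a deterministic one, none.
print(entropy([0.5, 0.5]))  # 1.0 bit
print(entropy([1.0]))       # 0.0 bits
```

The same computation underlies information gain in decision trees: a split is chosen to maximize the reduction in entropy between the parent node and its children.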

In the next chapter, we're going to discuss the fundamental concepts of feature engineering, which is the first step in almost every machine learning pipeline. We'll show how to manage different kinds of data (numerical and categorical) and how to reduce the dimensionality of a dataset without a dramatic loss of information.
