
  • Machine Learning Algorithms
  • Giuseppe Bonaccorso
  • 268 words
  • 2021-07-02 18:53:32

Summary

Feature selection is the first (and sometimes the most important) step in a machine learning pipeline. Not all features are useful for our purposes, and some of them are expressed using different notations, so it's often necessary to preprocess the dataset before any further operations.

We saw how to split the data into training and test sets using a random shuffle and how to manage missing elements. Another very important section covered the techniques used to manage categorical data or labels, which are very common when a certain feature assumes only a discrete set of values.
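As a reminder of how these steps fit together, here is a minimal sketch using scikit-learn. The toy arrays, the 70/30 split ratio, the mean imputation strategy, and the variable names are all assumptions chosen for illustration; they are not taken from the chapter.

```python
import numpy as np
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.preprocessing import LabelEncoder, OneHotEncoder

# Hypothetical toy dataset: 10 samples, 2 numeric features, 2 missing values
X = np.array([[1.0, 2.0], [np.nan, 1.0], [3.0, 4.0], [2.0, np.nan], [5.0, 6.0],
              [4.0, 3.0], [6.0, 5.0], [7.0, 8.0], [8.0, 7.0], [9.0, 9.0]])
y = np.array(['a', 'b', 'a', 'c', 'b', 'a', 'c', 'b', 'a', 'c'])

# Random shuffle and split (30% of the samples go to the test set)
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.3, random_state=1000)

# Replace missing values with the column mean learned on the training set only
imputer = SimpleImputer(strategy='mean')
X_train_imp = imputer.fit_transform(X_train)
X_test_imp = imputer.transform(X_test)

# Map categorical labels to integers 0..n_classes-1
label_encoder = LabelEncoder()
y_int = label_encoder.fit_transform(y)

# One-hot encode the same labels (one binary column per category)
onehot_encoder = OneHotEncoder()
y_onehot = onehot_encoder.fit_transform(y.reshape(-1, 1)).toarray()
```

Fitting the imputer on the training set and only applying it to the test set keeps test information from leaking into the preprocessing step.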

Then we analyzed the problem of dimensionality. Some datasets contain many features that are correlated with each other, so they don't provide any new information but increase the computational complexity and reduce the overall performance. Principal component analysis is a method that projects the data onto a small number of components (linear combinations of the original features) that retain the largest share of the total variance. This approach, together with its variants, makes it possible to decorrelate the features and reduce the dimensionality without a drastic loss in terms of accuracy. Dictionary learning is another technique used to extract a limited number of building blocks from a dataset, together with the information needed to rebuild each sample. This approach is particularly useful when the dataset is made up of different versions of similar elements (such as images, letters, or digits).
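The decorrelation and variance-retention properties of principal component analysis can be checked with a short sketch. The choice of scikit-learn's bundled digits dataset, the 36 components, the whitening option, and the full SVD solver are assumptions made for this example.

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.decomposition import PCA

# 1797 handwritten digits, each an 8x8 image flattened to 64 features
digits = load_digits()
X = digits.data / 16.0  # pixel values range from 0 to 16

# Project onto the 36 directions of largest variance (assumed value)
pca = PCA(n_components=36, whiten=True, svd_solver='full')
X_pca = pca.fit_transform(X)

# Fraction of the total variance retained by the kept components
explained = pca.explained_variance_ratio_.sum()

# Correlation matrix of the projected features: off-diagonal terms
# should be numerically zero, meaning the components are decorrelated
corr = np.corrcoef(X_pca, rowvar=False)
off_diag = corr - np.diag(np.diag(corr))

print('Shape after PCA:', X_pca.shape)
print('Explained variance ratio:', explained)
print('Max absolute off-diagonal correlation:', np.abs(off_diag).max())
```

The number of components is a trade-off: fewer components give a more compact representation, while more components preserve more of the original variance.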

In the next chapter, we're going to discuss linear regression, which is the simplest and most widely used supervised approach to predicting continuous values. We'll also analyze how to overcome some of its limitations and how to solve non-linear problems using the same algorithms.
