Unsupervised learning

How would you summarize and group a dataset if the labels were not given? You would probably try to answer this question by finding the underlying structure of the dataset and measuring statistical properties such as the frequency distribution, mean, and standard deviation. And if the question were how to represent data effectively in a compressed format, you would probably reply that you would use some software to do the compression, although you might have no idea how that software does it. The following diagram shows the typical workflow of an unsupervised learning task:

These are exactly two of the main goals of unsupervised learning, which is largely a data-driven process. We call this type of learning unsupervised because you have to deal with unlabeled data. The following quote comes from Yann LeCun, director of AI research at Facebook (source: Predictive Learning, NIPS 2016, Yann LeCun, Facebook Research):

"Most of human and animal learning is unsupervised learning. If intelligence was a cake, unsupervised learning would be the cake, supervised learning would be the icing on the cake, and reinforcement learning would be the cherry on the cake. We know how to make the icing and the cherry, but we don't know how to make the cake. We need to solve the unsupervised learning problem before we can even think of getting to true AI."

The most widely used unsupervised learning tasks include the following:

  • Clustering: Grouping data points based on similarity (or statistical properties). For example, a company such as Airbnb often groups its apartments and houses into neighborhoods so that customers can navigate the listed ones more easily.
  • Dimensionality reduction: Compressing the data while preserving its structure and statistical properties as much as possible. For example, the number of dimensions of a dataset often needs to be reduced for modeling and visualization.
  • Anomaly detection: Useful in several applications, such as detecting credit card fraud, identifying faulty pieces of hardware in an industrial engineering process, and identifying outliers in large-scale datasets.
  • Association rule mining: Often used in market basket analysis, for example, asking which items are frequently bought together.
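To make three of these tasks more concrete, here is a minimal sketch in plain Python using only the standard library. The function names (`kmeans_1d`, `zscore_outliers`, `frequent_pairs`), the toy data, and the chosen thresholds are all hypothetical and exist only for illustration. The sketch swaps in deliberately simple techniques: a one-dimensional k-means loop for clustering, a z-score rule for anomaly detection, and pair co-occurrence counting as a first step toward association rules. Dimensionality reduction is left out because it normally relies on a linear algebra library.

```python
import statistics
from collections import Counter
from itertools import combinations


def kmeans_1d(values, k=2, iterations=20):
    """Cluster 1-D values with a plain k-means loop (illustrative only)."""
    # Deterministic initialization: spread k centroids across the sorted values.
    ordered = sorted(values)
    step = max(k - 1, 1)
    centroids = [ordered[i * (len(ordered) - 1) // step] for i in range(k)]
    clusters = [[] for _ in range(k)]
    for _ in range(iterations):
        # Assignment step: attach each value to its nearest centroid.
        clusters = [[] for _ in range(k)]
        for v in values:
            nearest = min(range(k), key=lambda i: abs(v - centroids[i]))
            clusters[nearest].append(v)
        # Update step: move each centroid to the mean of its cluster.
        centroids = [
            statistics.mean(c) if c else centroids[i]
            for i, c in enumerate(clusters)
        ]
    return clusters


def zscore_outliers(values, threshold=2.0):
    """Return values whose distance from the mean exceeds threshold stdevs."""
    mean = statistics.mean(values)
    stdev = statistics.pstdev(values)  # population standard deviation
    if stdev == 0:
        return []  # all values identical, so nothing stands out
    return [v for v in values if abs(v - mean) / stdev > threshold]


def frequent_pairs(baskets, min_count=2):
    """Count item pairs that appear together in at least min_count baskets."""
    counts = Counter()
    for basket in baskets:
        # Sort and deduplicate so each pair has one canonical ordering.
        counts.update(combinations(sorted(set(basket)), 2))
    return {pair: n for pair, n in counts.items() if n >= min_count}


if __name__ == "__main__":
    # Toy, made-up data: no labels are supplied anywhere.
    prices = [1.0, 1.2, 0.8, 10.0, 10.5, 9.5]
    amounts = [20, 22, 19, 21, 20, 23, 18, 500]
    baskets = [
        ["bread", "milk"],
        ["bread", "milk", "eggs"],
        ["milk", "eggs"],
        ["bread", "butter"],
    ]
    print(kmeans_1d(prices, k=2))
    print(zscore_outliers(amounts))
    print(frequent_pairs(baskets))
```

In practice you would more likely reach for a dedicated library than hand-written loops, but the sketch shows the common thread: each function finds structure from the data alone, without any labels.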