
Unsupervised learning

In unsupervised learning, you don't have labels for the cases in your dataset. Typical tasks solved with unsupervised learning include clustering, anomaly detection, dimensionality reduction, and association rule learning.

Sometimes you don't have labels for your data points, but you still want to group them in some meaningful way; you may or may not know the exact number of groups in advance. This is the setting where clustering algorithms are used. The most obvious example is clustering users into groups, such as students, parents, gamers, and so on. The important detail here is that a group's meaning is not predefined from the very beginning; you name it only after you have finished grouping your samples. Clustering can also be useful for extracting additional features from the data as a preliminary step for supervised learning. We will discuss clustering in Chapter 4, K-Means Clustering.
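
To make this more concrete, here is a minimal k-means sketch in Python (the function `k_means` and the toy points are illustrative, not code from Chapter 4): the algorithm alternates between assigning every point to its nearest centroid and moving each centroid to the mean of the points assigned to it.

```python
import numpy as np

def k_means(points, k, n_iters=100, seed=0):
    """A minimal k-means sketch: alternate between assigning points to the
    nearest centroid and recomputing each centroid as the cluster mean."""
    rng = np.random.default_rng(seed)
    # Initialize centroids by picking k distinct points at random.
    centroids = points[rng.choice(len(points), size=k, replace=False)]
    for _ in range(n_iters):
        # Assign each point to its nearest centroid (Euclidean distance).
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        labels = distances.argmin(axis=1)
        # Move each centroid to the mean of the points assigned to it.
        new_centroids = np.array([
            points[labels == j].mean(axis=0) if np.any(labels == j) else centroids[j]
            for j in range(k)
        ])
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Toy usage: two well-separated groups of 2-D points.
X = np.array([[0.0, 0.1], [0.2, 0.0], [0.1, 0.2],
              [5.0, 5.1], [5.2, 4.9], [4.9, 5.0]])
labels, centroids = k_means(X, k=2)
print(labels)  # for example, [0 0 0 1 1 1]; cluster numbering is arbitrary
```

Notice that nothing in the code knows what the two groups "mean"; as the paragraph above says, it is up to you to name the clusters after the fact.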

Outlier/anomaly detection algorithms are used when the goal is to find anomalous patterns in the data, that is, unusual data points. This can be especially useful for automated fraud or intrusion detection. Outlier analysis is also an important part of data cleansing.
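
As a simple illustration of the idea (not a production fraud detector), the following Python sketch flags values that lie unusually far from the mean; the threshold and the toy transaction amounts are made up for the example.

```python
import numpy as np

def z_score_outliers(values, threshold=2.0):
    """Flag values lying more than `threshold` standard deviations from the
    mean -- a crude but common heuristic for spotting anomalous data points."""
    values = np.asarray(values, dtype=float)
    z = np.abs(values - values.mean()) / values.std()
    return np.where(z > threshold)[0]

# Toy usage: daily transaction amounts with one suspicious spike.
amounts = [12.5, 14.0, 13.2, 11.8, 12.9, 13.5, 950.0, 12.2, 14.1, 13.0]
print(z_score_outliers(amounts))  # [6] -- the index of the 950.0 entry
```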

Dimensionality reduction is a way to distill data into its most informative and, at the same time, most compact representation. The goal is to reduce the number of features without losing important information. It can be used as a preprocessing step before supervised learning or for data visualization.
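
A classic technique for this is principal component analysis (PCA), which projects the data onto the directions of largest variance. The following Python sketch (again illustrative, not the book's implementation) computes such a projection with a singular value decomposition.

```python
import numpy as np

def pca(X, n_components=2):
    """Project the data onto its `n_components` directions of largest
    variance, obtained from the SVD of the centered data matrix."""
    X_centered = X - X.mean(axis=0)
    # Rows of Vt are the principal directions, sorted by explained variance.
    _, _, Vt = np.linalg.svd(X_centered, full_matrices=False)
    return X_centered @ Vt[:n_components].T

# Toy usage: 3-D points that actually lie on a 2-D plane,
# so two components retain all the information.
rng = np.random.default_rng(0)
plane = rng.normal(size=(100, 2))
X = np.column_stack([plane[:, 0], plane[:, 1], plane[:, 0] + plane[:, 1]])
print(pca(X).shape)  # (100, 2)
```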

Association rule learning looks for repeated patterns of user behavior and peculiar co-occurrences of items. An example from retail practice: if a customer buys milk, isn't it more likely that they will also buy cereal? If so, then perhaps it's better to move the shelves with cereal closer to the shelf with milk. With rules like this, business owners can make informed decisions and adapt their services to customers' needs. In the context of software development, this can empower anticipatory design, where the app seemingly knows what you want to do next and provides suggestions accordingly. In Chapter 5, Association Rule Learning, we will implement Apriori, one of the most well-known rule learning algorithms.
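
Before getting to Apriori itself, the two measures that such rules are judged by are easy to show in a few lines of Python. The baskets below are invented for the example; support is the fraction of baskets containing an itemset, and confidence answers exactly the milk-and-cereal question above.

```python
# Toy transactions: each set is one customer's basket.
transactions = [
    {"milk", "cereal", "bread"},
    {"milk", "cereal"},
    {"milk", "bread"},
    {"cereal", "bread"},
    {"milk", "cereal", "eggs"},
]

def support(itemset, transactions):
    """Fraction of baskets that contain every item in `itemset`."""
    itemset = set(itemset)
    return sum(itemset <= basket for basket in transactions) / len(transactions)

def confidence(antecedent, consequent, transactions):
    """Of the baskets containing the antecedent, what fraction also
    contain the consequent? This estimates P(consequent | antecedent)."""
    both = set(antecedent) | set(consequent)
    return support(both, transactions) / support(antecedent, transactions)

print(support({"milk", "cereal"}, transactions))       # 0.6
print(confidence({"milk"}, {"cereal"}, transactions))   # 0.75
```

Apriori's contribution is not these formulas but an efficient way to search for all itemsets whose support exceeds a chosen threshold, which is what Chapter 5 covers.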

Figure 1.2: Datasets for three types of learning: supervised, unsupervised, and semi-supervised
Labeling data manually is usually costly, especially when special qualifications are required. Semi-supervised learning can help when only some of your samples are labeled and others are not (see Figure 1.2). It is a hybrid of supervised and unsupervised learning. First, it looks for unlabeled instances that are similar to the labeled ones, in an unsupervised manner, and includes them in the training dataset. After this, the algorithm can be trained on the expanded dataset in the usual supervised manner.
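
A minimal sketch of this idea in Python (often called self-training; the distance threshold and toy points are assumptions made for the example) gives each unlabeled point the label of its nearest labeled neighbor, but only when that neighbor is close enough, and then returns the expanded training set.

```python
import numpy as np

def expand_training_set(X_labeled, y_labeled, X_unlabeled, distance_threshold=1.0):
    """Copy the label of the nearest labeled neighbor onto each unlabeled
    point that lies within `distance_threshold`; leave the rest out."""
    X_new, y_new = list(X_labeled), list(y_labeled)
    for x in X_unlabeled:
        distances = np.linalg.norm(X_labeled - x, axis=1)
        nearest = distances.argmin()
        if distances[nearest] < distance_threshold:
            X_new.append(x)
            y_new.append(y_labeled[nearest])
    return np.array(X_new), np.array(y_new)

# Toy usage: two labeled points and three unlabeled ones.
X_labeled = np.array([[0.0, 0.0], [5.0, 5.0]])
y_labeled = np.array([0, 1])
X_unlabeled = np.array([[0.2, 0.1], [4.8, 5.1], [2.5, 2.5]])
X_train, y_train = expand_training_set(X_labeled, y_labeled, X_unlabeled)
print(X_train.shape, y_train)  # (4, 2) [0 1 0 1] -- the ambiguous point is skipped
```

Any ordinary supervised classifier can then be trained on `X_train` and `y_train`.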