Clustering Refresher
Chapter 1, Introduction to Clustering, covered both the high-level intuition and the in-depth details of one of the most basic clustering algorithms: k-means. While it is indeed a simple approach, do not dismiss it; it will be a valuable addition to your toolkit as you continue your exploration of the unsupervised learning world. In many real-world use cases, companies make groundbreaking discoveries using the simplest methods, such as k-means or linear regression (for supervised learning). As a refresher, let's quickly walk through what clusters are and how k-means works to find them:

Figure 2.1: The attributes that separate supervised and unsupervised problems
If you were given a random collection of data without any guidance, you would likely start your exploration with basic statistics – for example, the mean, median, and mode of each of the features. Remember that, for a dataset that simply exists, whether the problem is supervised or unsupervised is determined by the goals you have set for yourself or that were set by your manager. If you were to determine that one of the features was actually a label and you wanted to see how the remaining features in the dataset influence it, this would become a supervised learning problem. However, if after initial exploration you realize that the data you have is simply a collection of features without a target in mind (such as a collection of health metrics, purchase invoices from a web store, and so on), then you could analyze it through unsupervised methods.
A classic example of unsupervised learning is finding clusters of similar customers in a collection of invoices from a web store. Your hypothesis is that by understanding which people are most similar, you can create more granular marketing campaigns that appeal to each cluster's interests. One way to achieve these clusters of similar users is through k-means.
k-means Refresher
k-means clustering works by finding "k" clusters in your data through pairwise Euclidean distance calculations. "K" points (also called centroids) are randomly initialized in your data, and the distance is calculated from each data point to each of the centroids. The minimum of these distances designates which cluster a data point belongs to. Once every point has been assigned to a cluster, the mean of the data points within each cluster is calculated and becomes that cluster's new centroid. This process is repeated until the newly calculated cluster centroids no longer change position.
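The following is a minimal from-scratch sketch of the loop just described, written with NumPy. The data, the value of k, the iteration cap, and the random seed are illustrative assumptions rather than values from the text:

```python
import numpy as np

def k_means(points, k, max_iterations=100, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly initialize k centroids by sampling k distinct points from the data
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    for _ in range(max_iterations):
        # Pairwise Euclidean distance from every point to every centroid (n_points x k)
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        # Each point is assigned to the cluster whose centroid is nearest
        labels = distances.argmin(axis=1)
        # Recompute each centroid as the mean of the points assigned to it
        # (this simple sketch assumes no cluster ever ends up empty)
        new_centroids = np.array([points[labels == i].mean(axis=0) for i in range(k)])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on two synthetic, well-separated blobs of 2D points
data = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centroids = k_means(data, k=2)
print(centroids)
```

In practice, you would typically reach for an existing implementation such as sklearn.cluster.KMeans, which adds smarter initialization and handles edge cases, but the sketch above mirrors the assign-then-update cycle at the heart of the algorithm.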