Clustering Refresher
Chapter 1, Introduction to Clustering, covered both the high-level concepts and the in-depth details of one of the most basic clustering algorithms: k-means. While it is indeed a simple approach, do not discredit it; it will be a valuable addition to your toolkit as you continue exploring the world of unsupervised learning. In many real-world use cases, companies make valuable discoveries with the simplest methods, such as k-means or linear regression (for supervised learning). Consider evaluating a large selection of customer data: if you were to examine it directly in a table, you would be unlikely to find anything helpful. However, even a simple clustering algorithm can identify where groups within the data are similar and dissimilar. As a refresher, let's quickly walk through what clusters are and how k-means finds them:

Figure 2.1: The attributes that separate supervised and unsupervised problems
If you were given a random collection of data without any guidance, you would probably start your exploration with basic statistics – for example, the mean, median, and mode of each feature. Given a dataset, choosing between supervised and unsupervised learning as the approach for deriving insights depends on the goals you have set. If you were to determine that one of the features is actually a label and you wanted to see how the remaining features in the dataset influence it, this would become a supervised learning problem. However, if, after initial exploration, you realized that the data you have is simply a collection of features without a target in mind (such as a collection of health metrics, or purchase invoices from a web store), then you could analyze it through unsupervised methods.
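To make that first exploratory step concrete, here is a quick sketch using pandas. This is illustrative only, not code from the book; the CSV file name is a hypothetical stand-in for whatever unlabeled data you are working with:

```python
# A quick first pass over an unlabeled dataset -- a sketch,
# assuming the data lives in a CSV file of (mostly) numeric features.
import pandas as pd

df = pd.read_csv("customer_metrics.csv")   # hypothetical file name

print(df.describe())                  # count, mean, std, quartiles per feature
print(df.median(numeric_only=True))   # median of each numeric feature
print(df.mode().iloc[0])              # most frequent value in each column
```

If nothing in this summary looks like a natural target to predict, that is a strong hint the problem belongs in unsupervised territory.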
A classic example of unsupervised learning is finding clusters of similar customers in a collection of invoices from a web store. Your hypothesis is that by finding out which people are the most similar, you can create more granular marketing campaigns that appeal to each cluster's interests. One way to find these clusters of similar users is through k-means.
The k-means Refresher
k-means clustering works by finding "k" clusters in your data using a distance metric such as Euclidean, Manhattan, Hamming, or Minkowski distance. First, k points (also called centroids) are randomly initialized in your data, and the distance from each data point to each centroid is calculated. The smallest of these distances determines which cluster a data point belongs to. Once every point has been assigned to a cluster, the mean of the data points within each cluster is computed and becomes that cluster's new centroid. This process repeats until the newly calculated centroids no longer change position, or until a maximum number of iterations is reached.
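To make the loop just described concrete, here is a minimal NumPy sketch of the algorithm. This is illustrative code, not the book's implementation; the synthetic points, the fixed random seed, and the choice of Euclidean distance are assumptions made for the example:

```python
# A minimal k-means sketch in NumPy (illustrative, not the book's code).
import numpy as np

def k_means(data, k, max_iterations=100, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly initialize k centroids by sampling points from the data
    centroids = data[rng.choice(len(data), size=k, replace=False)]
    for _ in range(max_iterations):
        # Euclidean distance from every point to every centroid
        distances = np.linalg.norm(data[:, None, :] - centroids[None, :, :], axis=2)
        # Assign each point to its nearest centroid
        labels = distances.argmin(axis=1)
        # Recompute each centroid as the mean of its assigned points
        new_centroids = np.array([
            data[labels == i].mean(axis=0) if np.any(labels == i) else centroids[i]
            for i in range(k)
        ])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on a tiny synthetic dataset
points = np.array([[1.0, 1.0], [1.5, 2.0], [8.0, 8.0], [9.0, 8.5], [0.5, 1.5]])
labels, centroids = k_means(points, k=2)
print(labels)
print(centroids)
```

In practice, you would typically use a library implementation such as sklearn.cluster.KMeans, which adds smarter centroid initialization (k-means++) and multiple restarts, but the core loop is exactly the assign-then-update cycle shown above.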