Clustering Refresher
Chapter 1, Introduction to Clustering, covered both the high-level intuition and the in-depth details of one of the most basic clustering algorithms: k-means. While it is indeed a simple approach, do not dismiss it; it will be a valuable addition to your toolkit as you continue your exploration of the unsupervised learning world. In many real-world use cases, companies make groundbreaking discoveries using the simplest methods, such as k-means or linear regression (for supervised learning). As a refresher, let's quickly walk through what clusters are and how k-means works to find them:

Figure 2.1: The attributes that separate supervised and unsupervised problems
If you were given a random collection of data without any guidance, you would likely start your exploration with basic statistics – for example, the mean, median, and mode of each of the features. Remember that, for a dataset that simply exists, whether the problem is supervised or unsupervised is determined by the goals you have set for yourself or that were set by your manager. If you were to determine that one of the features was actually a label and you wanted to see how the remaining features in the dataset influence it, this would become a supervised learning problem. However, if after initial exploration you realize that the data you have is simply a collection of features without a target in mind (such as a collection of health metrics, purchase invoices from a web store, and so on), then you could analyze it through unsupervised methods.
A classic example of unsupervised learning is finding clusters of similar customers in a collection of invoices from a web store. Your hypothesis is that by understanding which people are most similar, you can create more granular marketing campaigns that appeal to each cluster's interests. One way to achieve these clusters of similar users is through k-means.
k-means Refresher
k-means clustering works by finding "k" clusters in your data through pairwise Euclidean distance calculations. "K" points (also called centroids) are randomly initialized in your data, and the distance is calculated from each data point to each of the centroids. The minimum of these distances designates which cluster a data point belongs to. Once every point has been assigned to a cluster, the mean of the data points within each cluster is calculated and becomes that cluster's new centroid. This process is repeated until the newly calculated cluster centroids no longer change position.
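The following is a minimal from-scratch sketch of the loop just described, written with NumPy. The data, the value of k, the iteration cap, and the random seed are illustrative assumptions rather than values from the text:

```python
import numpy as np

def k_means(points, k, max_iterations=100, seed=0):
    rng = np.random.default_rng(seed)
    # Randomly initialize k centroids by sampling k distinct points from the data
    centroids = points[rng.choice(len(points), size=k, replace=False)]

    for _ in range(max_iterations):
        # Pairwise Euclidean distance from every point to every centroid (n_points x k)
        distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
        # Each point is assigned to the cluster whose centroid is nearest
        labels = distances.argmin(axis=1)
        # Recompute each centroid as the mean of the points assigned to it
        # (this simple sketch assumes no cluster ever ends up empty)
        new_centroids = np.array([points[labels == i].mean(axis=0) for i in range(k)])
        # Stop once the centroids no longer move
        if np.allclose(new_centroids, centroids):
            break
        centroids = new_centroids
    return labels, centroids

# Example usage on two synthetic, well-separated blobs of 2D points
data = np.vstack([np.random.randn(50, 2), np.random.randn(50, 2) + 5])
labels, centroids = k_means(data, k=2)
print(centroids)
```

In practice, you would typically reach for an existing implementation such as sklearn.cluster.KMeans, which adds smarter initialization and handles edge cases, but the sketch above mirrors the assign-then-update cycle at the heart of the algorithm.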