- The Data Science Workshop
- Anthony So, Thomas V. Joseph, Robert Thas John, Andrew Worsley, Dr. Samuel Asare
- 2021-06-11 18:27:26
Introduction
The previous chapters introduced several popular and powerful machine learning algorithms. They have one thing in common: they all belong to the same category, supervised learning. Algorithms of this kind learn patterns based on a specified outcome column (the target variable), such as sales, employee churn, or customer class.
But what if your dataset has no such variable, or you don't want to specify a target variable? Can you still run machine learning algorithms on it and find interesting patterns? The answer is yes, by using clustering algorithms, which belong to the unsupervised learning category.
Clustering algorithms are widely used in the data science industry for grouping similar data points and detecting outliers. For instance, banks can use clustering algorithms for fraud detection by identifying unusual clusters in the data. E-commerce companies can also use them to identify groups of users with similar browsing behaviors, as in the following figures:

Figure 5.1: Example of data on customers with similar browsing behaviors without clustering analysis performed
Clustering analysis performed on this data uncovers natural patterns by grouping similar data points, and might give the following result:

Figure 5.2: Clustering analysis performed on the data on customers with similar browsing behaviors
The data is now segmented into three customer groups depending on their recurring visits and time spent on the website, and different marketing plans can then be used for each of these groups in order to maximize sales.
In this chapter, you will learn how to perform such analysis using a widely used clustering algorithm called k-means.
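As a rough preview of the idea, the sketch below groups made-up customer data into three segments using a from-scratch version of k-means. Everything in it is an assumption made for illustration: the synthetic data (recurring visits and minutes spent on the website), the helper names `sq_dist`, `init_centroids`, and `kmeans`, and the simple farthest-point rule used to pick the starting centroids. It is not the implementation used later in the chapter, only a minimal picture of the assign-then-update loop behind the algorithm.

```python
import random

def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def init_centroids(points, k):
    """Farthest-point initialisation (an illustrative, deterministic choice)."""
    centroids = [points[0]]
    while len(centroids) < k:
        # Pick the point whose nearest already-chosen centroid is farthest away
        centroids.append(
            max(points, key=lambda p: min(sq_dist(p, c) for c in centroids))
        )
    return centroids

def kmeans(points, k, max_iter=100):
    """Return (centroids, labels) after alternating assignment and update steps."""
    centroids = init_centroids(points, k)
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster of its nearest centroid
        new_labels = [
            min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points
        ]
        if new_labels == labels:  # no point changed cluster: converged
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # keep the old centroid if a cluster ends up empty
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return centroids, labels

# Synthetic data (hypothetical): (recurring visits, minutes spent on the website)
rng = random.Random(42)
group_centers = [(2, 5), (10, 20), (20, 45)]
customers = [
    (cx + rng.uniform(-1, 1), cy + rng.uniform(-3, 3))
    for cx, cy in group_centers
    for _ in range(30)
]

centroids, labels = kmeans(customers, k=3)
for c, (visits, minutes) in enumerate(centroids):
    print(f"group {c}: {labels.count(c)} customers, "
          f"~{visits:.1f} visits, ~{minutes:.1f} minutes")
```

Note that no target column is passed in: the groups come only from how close the points are to each other, which is what makes this an unsupervised method. The cluster numbers themselves are arbitrary labels, so it is up to the analyst to interpret each group.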