官术网_书友最值得收藏!

Introduction

The previous chapters introduced you to very popular and extremely powerful machine learning algorithms. They all have one thing in common, which is that they belong to the same category of algorithms: supervised learning. This kind of algorithm tries to learn patterns based on a specified outcome column (target variable) such as sales, employee churn, or class of customer.

But what if you don't have such a variable in your dataset or you don't want to specify a target variable? Will you still be able to run some machine learning algorithms on it and find interesting patterns? The answer is yes, with the use of clustering algorithms that belong to the unsupervised learning category.

Clustering algorithms are very popular in the data science industry for grouping similar data points and detecting outliers. For instance, clustering algorithms can be used by banks for fraud detection by identifying unusual clusters from the data. They can also be used by e-commerce companies to identify groups of users with similar browsing behaviors, as in the following figures:

Figure 5.1: Example of data on customers with similar browsing behaviors without clustering analysis performed

Clustering analysis performed on this data would uncover natural patterns by grouping similar data points such that you may get the following result:

Figure 5.2: Clustering analysis performed on the data on customers with similar browsing behaviors

The data is now segmented into three customer groups depending on their recurring visits and time spent on the website, and different marketing plans can then be used for each of these groups in order to maximize sales.

In this chapter, you will learn how to perform such analysis using a very famous clustering algorithm called k-means.

主站蜘蛛池模板: 利津县| 墨脱县| 辛集市| 鄄城县| 柘荣县| 筠连县| 南昌县| 米脂县| 治县。| 恭城| 崇义县| 潜山县| 宣化县| 阳春市| 张家界市| 太保市| 武山县| 兴业县| 同德县| 龙泉市| 灵宝市| 宁安市| 南阳市| 大连市| 含山县| 蒙阴县| 黑龙江省| 安化县| 黄冈市| 光泽县| 托里县| 德州市| 七台河市| 水城县| 综艺| 濉溪县| 新宁县| 青浦区| 衡阳市| 高要市| 惠来县|