官术网_书友最值得收藏!

Unsupervised learning example – marketing segments

Suppose we are given a large (one million rows) dataset where each row/observation is a single person with basic demographic information (age, gender, and so on) as well as the number of items purchased, which represents how many items this person has bought from a particular store:

This is a sample of our marketing dataset where each row represents a single customer with three basic attributes about each person. Our goal will be to segment this dataset into types or clusters of people so that the company performing the analysis can understand the customer profiles much better.

Now, of course, We’ve only shown 8 out of one million rows, which can be daunting. Of course, we can perform basic descriptive statistics on this dataset and get averages, standard deviations, and so on of our numerical columns; however, what if we wished to segment these one million people into different types so that the marketing department can have a much better sense of the types of people who shop and create more appropriate advertisements for each segment?

Each type of customer would exhibit particular qualities that make that segment unique. For example, they may find that 20% of their customers fall into a category they like to call young and wealthy that are generally younger and purchase several items.

This type of analysis and the creation of these types can fall under a specific type of unsupervised learning called clustering. We will discuss this machine learning algorithm in further detail later on in this book, but for now, clustering will create a new feature that separates out the people into distinct types or clusters:

This shows our customer dataset after a clustering algorithm has been applied. Note the new column at the end called cluster that represents the types of people that the algorithm has identified. The idea is that the people who belong to similar clusters behave similarly in regards to the data (have similar ages, genders, purchase behaviors). Perhaps cluster six might be renamed as young buyers.

This example of clustering shows us why sometimes we aren’t concerned with predicting anything, but instead wish to understand our data on a deeper level by adding new and interesting features, or even removing irrelevant features.

Note that we are referring to every column as a feature because there is no response in unsupervised learning since there is no prediction occurring.

It’s all starting to make sense now, isn’t it? These features that we talk about repeatedly are what this book is primarily concerned with. Feature engineering involves the understanding and transforming of features in relation to both unsupervised and supervised learning.

主站蜘蛛池模板: 五寨县| 信宜市| 于田县| 霍林郭勒市| 左贡县| 荣昌县| 乐安县| 青田县| 榆林市| 扎兰屯市| 肃南| 正阳县| 和林格尔县| 加查县| 光泽县| 和硕县| 沅江市| 苏尼特左旗| 尼勒克县| 冷水江市| 永寿县| 廊坊市| 新乡市| 阿鲁科尔沁旗| 加查县| 静安区| 东方市| 正定县| 平阴县| 渝中区| 南陵县| 张家界市| 金川县| 军事| 肇源县| 泰来县| 西乌珠穆沁旗| 新蔡县| 湾仔区| 迭部县| 江津市|