官术网_书友最值得收藏!

Unsupervised learning example – marketing segments

Suppose we are given a large (one million rows) dataset where each row/observation is a single person with basic demographic information (age, gender, and so on) as well as the number of items purchased, which represents how many items this person has bought from a particular store:

This is a sample of our marketing dataset where each row represents a single customer with three basic attributes about each person. Our goal will be to segment this dataset into types or clusters of people so that the company performing the analysis can understand the customer profiles much better.

Now, of course, We’ve only shown 8 out of one million rows, which can be daunting. Of course, we can perform basic descriptive statistics on this dataset and get averages, standard deviations, and so on of our numerical columns; however, what if we wished to segment these one million people into different types so that the marketing department can have a much better sense of the types of people who shop and create more appropriate advertisements for each segment?

Each type of customer would exhibit particular qualities that make that segment unique. For example, they may find that 20% of their customers fall into a category they like to call young and wealthy that are generally younger and purchase several items.

This type of analysis and the creation of these types can fall under a specific type of unsupervised learning called clustering. We will discuss this machine learning algorithm in further detail later on in this book, but for now, clustering will create a new feature that separates out the people into distinct types or clusters:

This shows our customer dataset after a clustering algorithm has been applied. Note the new column at the end called cluster that represents the types of people that the algorithm has identified. The idea is that the people who belong to similar clusters behave similarly in regards to the data (have similar ages, genders, purchase behaviors). Perhaps cluster six might be renamed as young buyers.

This example of clustering shows us why sometimes we aren’t concerned with predicting anything, but instead wish to understand our data on a deeper level by adding new and interesting features, or even removing irrelevant features.

Note that we are referring to every column as a feature because there is no response in unsupervised learning since there is no prediction occurring.

It’s all starting to make sense now, isn’t it? These features that we talk about repeatedly are what this book is primarily concerned with. Feature engineering involves the understanding and transforming of features in relation to both unsupervised and supervised learning.

主站蜘蛛池模板: 丽水市| 南雄市| 宁晋县| 同仁县| 陆良县| 枣强县| 大兴区| 高尔夫| 西宁市| 钦州市| 灵宝市| 鸡泽县| 屏东市| 克拉玛依市| 邯郸市| 汝南县| 桦川县| 门源| 尉氏县| 河间市| 卓资县| 洱源县| 马鞍山市| 大荔县| 通州市| 彭州市| 海盐县| 兰坪| 安福县| 勐海县| 太仆寺旗| 安丘市| 龙山县| 那坡县| 桂林市| 九江市| 长泰县| 汝阳县| 睢宁县| 历史| 河西区|