- Feature Engineering Made Easy
- Sinan Ozdemir Divya Susarla
- 2021-06-25 22:45:50
Unsupervised learning example – marketing segments
Suppose we are given a large (one million rows) dataset where each row/observation is a single person with basic demographic information (age, gender, and so on) as well as the number of items purchased, which represents how many items this person has bought from a particular store:

This is a sample of our marketing dataset where each row represents a single customer with three basic attributes about each person. Our goal will be to segment this dataset into types or clusters of people so that the company performing the analysis can understand the customer profiles much better.
Of course, we have only shown 8 of the one million rows, and a dataset of that size can be daunting. We can compute basic descriptive statistics on it, such as the averages and standard deviations of the numerical columns. But what if we wished to segment these one million people into different types, so that the marketing department has a much better sense of who shops at the store and can create more appropriate advertisements for each segment?
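As a rough illustration of the descriptive statistics mentioned above, here is a minimal sketch using only Python's standard library. The four rows and the column names `age` and `items_purchased` are made up for the example; they are not taken from the book's dataset.

```python
import statistics

# Hypothetical sample rows; values are invented for illustration only.
customers = [
    {"age": 20, "gender": "F", "items_purchased": 2},
    {"age": 30, "gender": "M", "items_purchased": 4},
    {"age": 40, "gender": "F", "items_purchased": 6},
    {"age": 50, "gender": "M", "items_purchased": 8},
]

# Mean and sample standard deviation for each numerical column.
summary = {}
for column in ("age", "items_purchased"):
    values = [row[column] for row in customers]
    summary[column] = {
        "mean": statistics.mean(values),
        "stdev": statistics.stdev(values),
    }

for column, stats in summary.items():
    print(column, stats)
```

Summaries like these describe each column on its own, but they say nothing about which customers resemble one another, which is the question segmentation tries to answer.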
Each type of customer would exhibit particular qualities that make that segment unique. For example, the company may find that 20% of its customers fall into a category it calls young and wealthy: people who are generally younger and purchase several items.
This kind of analysis, and the creation of these customer types, falls under a specific form of unsupervised learning called clustering. We will discuss clustering in further detail later on in this book, but for now, it is enough to know that clustering will create a new feature that separates the people into distinct types or clusters:

This shows our customer dataset after a clustering algorithm has been applied. Note the new column at the end, called cluster, that represents the types of people the algorithm has identified. The idea is that people who belong to the same cluster behave similarly with regard to the data (they have similar ages, genders, and purchase behaviors). Perhaps cluster six might be renamed young buyers.
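To make the idea of a new cluster column concrete, here is a minimal sketch of one common clustering method, k-means, written with only the standard library. This passage does not say which algorithm produced the clusters, so k-means is an assumption here, as are the made-up rows, the column names, and the choice of two clusters. Gender is left out of the distance calculation for brevity, since it would first need a numeric encoding.

```python
import random

# Hypothetical customers; values are invented for illustration only.
customers = [
    {"age": 22, "gender": "F", "items_purchased": 9},
    {"age": 25, "gender": "M", "items_purchased": 8},
    {"age": 24, "gender": "F", "items_purchased": 10},
    {"age": 58, "gender": "M", "items_purchased": 1},
    {"age": 61, "gender": "F", "items_purchased": 2},
    {"age": 64, "gender": "M", "items_purchased": 1},
]

def scale(values):
    """Min-max scale a column to [0, 1] so no column dominates the distance."""
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) if hi > lo else 0.0 for v in values]

def squared_distance(a, b):
    return sum((x - y) ** 2 for x, y in zip(a, b))

def k_means(points, k, iterations=20, seed=0):
    """Plain k-means: alternate assignment and centroid-update steps."""
    rng = random.Random(seed)
    centroids = rng.sample(points, k)  # start from k randomly chosen points
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(k), key=lambda c: squared_distance(p, centroids[c]))
            for p in points
        ]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, label in zip(points, labels) if label == c]
            if members:
                centroids[c] = tuple(sum(dim) / len(members) for dim in zip(*members))
    return labels

# Build scaled (age, items_purchased) vectors, one per customer.
points = list(zip(
    scale([row["age"] for row in customers]),
    scale([row["items_purchased"] for row in customers]),
))

# Attach the resulting label to each row as a new "cluster" feature.
for row, label in zip(customers, k_means(points, k=2)):
    row["cluster"] = label

for row in customers:
    print(row)
```

The cluster numbers themselves carry no meaning; it is up to the analyst to inspect each group and give it a descriptive name such as young buyers.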
This example of clustering shows us why sometimes we aren’t concerned with predicting anything, but instead wish to understand our data on a deeper level by adding new and interesting features, or even removing irrelevant features.
Note that we refer to every column as a feature: unsupervised learning makes no prediction, so there is no response column.
It’s all starting to make sense now, isn’t it? These features that we talk about repeatedly are what this book is primarily concerned with. Feature engineering involves understanding and transforming features in both unsupervised and supervised learning.