- Advanced Machine Learning with R
- Cory Lesmeister Dr. Sunil Kumar Chinnamgari
- 2021-06-24 14:24:37
Model training and evaluation
As mentioned previously, we'll be predicting customer satisfaction. The data is based on a former online competition. I've taken the training portion of the data and cleaned it up for our use.
This is an excellent dataset for a classification problem for many reasons. Like so much customer data, it's very messy, especially before I removed a bunch of useless features (there were something like four dozen zero-variance features). As discussed in the prior two chapters, I addressed missing values, linear dependencies, and highly correlated pairs. I also found the feature names lengthy and useless, so I coded them V1 through V142. The resulting data deals with what's usually a difficult thing to measure: satisfaction. Because of proprietary methods, no description or definition of satisfaction is given.
Having worked previously in the world of banking, I can assure you that measuring satisfaction is a somewhat challenging proposition, fraught with measurement error. As such, there's quite a bit of noise relative to the signal, and you can expect model performance to be rather poor. Also, the outcome of interest, customer dissatisfaction, is relatively rare compared to customers who aren't dissatisfied. The classic problem is that you end up with quite a few false positives when trying to classify the minority label.
As always, you can find the data on GitHub: https://github.com/PacktPublishing/Advanced-Machine-Learning-with-R/blob/master/Data/santander_prepd.RData.
So, let's start by loading the data and training a logistic regression model.
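Before touching the real data, it can help to see the general shape of this workflow on a toy problem. The following is a minimal sketch in base R, not the book's own code. It simulates a rare binary outcome with a weak signal, fits a logistic regression with glm(), and tabulates predictions against actual labels. The data frame sim, the outcome column y, the three features, and the cutoff choice are all assumptions made for illustration. The prepared file would instead be read with load(), and the name of the object it contains isn't given in this section.

```r
# Simulated stand-in for the prepared data (all names here are hypothetical):
# a rare binary outcome and a few weakly informative numeric features.
set.seed(123)
n <- 5000
sim <- data.frame(V1 = rnorm(n), V2 = rnorm(n), V3 = rnorm(n))

# Low base rate plus a weak signal, mimicking a noisy minority class
lin_pred <- -3 + 0.6 * sim$V1 - 0.4 * sim$V2
sim$y <- rbinom(n, size = 1, prob = plogis(lin_pred))

# Simple 80/20 train/test split
train_idx <- sample(seq_len(n), size = round(0.8 * n))
train <- sim[train_idx, ]
test <- sim[-train_idx, ]

# Class balance in the training data
print(prop.table(table(train$y)))

# Logistic regression on all features
fit <- glm(y ~ ., data = train, family = binomial)

# Predicted probabilities of the minority class on the held-out rows
test_prob <- predict(fit, newdata = test, type = "response")

# Classify with a cutoff equal to the training base rate (an assumption;
# the default 0.5 would label almost nothing as the minority class)
cutoff <- mean(train$y)
pred_class <- factor(as.integer(test_prob > cutoff), levels = c(0, 1))
actual <- factor(test$y, levels = c(0, 1))

# Confusion matrix: rows are predictions, columns are actual labels
conf_mat <- table(predicted = pred_class, actual = actual)
print(conf_mat)
```

In the confusion matrix, the cell with predicted = 1 and actual = 0 counts the false positives. With a rare outcome and a lowered cutoff, that cell tends to be large relative to the true positives, which is the problem described above.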