- Mastering Machine Learning with R (Second Edition)
- Cory Lesmeister
- 2021-07-09 18:23:56
Logistic Regression and Discriminant Analysis
"The true logic of this world is the calculus of probabilities."
- James Clerk Maxwell, Scottish physicist
In the previous chapter, we used Ordinary Least Squares (OLS), in other words linear regression, to predict a quantitative outcome. It is now time to shift gears and examine how we can develop algorithms to predict qualitative outcomes. Such outcome variables can be binary (male versus female, purchases versus does not purchase, tumor is benign versus malignant) or multinomial (education level or eye color). Regardless of whether the outcome of interest is binary or multinomial, the analyst's task is to predict the probability that an observation belongs to a particular category of the outcome variable. In other words, we develop an algorithm to classify the observations.
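As a minimal sketch of what "predicting a probability and then classifying" can look like in R, the following uses simulated data, not the chapter's dataset. The variable names, the coefficients used to simulate the outcome, and the 0.5 cutoff are all assumptions made for illustration.

```r
# Simulated data: one predictor and a binary outcome (hypothetical, for illustration only)
set.seed(123)
n <- 200
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(-0.5 + 2 * x))
dat <- data.frame(x = x, y = y)

# Logistic regression models the probability that y equals 1
fit <- glm(y ~ x, data = dat, family = binomial)

# Predicted probabilities, one per observation
prob <- predict(fit, type = "response")

# Turn probabilities into class labels with a 0.5 cutoff (an assumed threshold)
pred_class <- ifelse(prob > 0.5, 1, 0)

# Cross-tabulate predicted versus actual classes
table(predicted = pred_class, actual = dat$y)
```

The cutoff is a modeling choice rather than part of the algorithm; a different threshold trades one kind of misclassification for the other.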
To begin exploring classification problems, we will discuss why OLS linear regression is not the correct technique and how the algorithms introduced in this chapter address its shortcomings. We will then look at the problem of predicting whether a biopsied tumor mass is benign or malignant, using the well-known and widely available Wisconsin Breast Cancer Data. To tackle this problem, we will begin by building and interpreting logistic regression models, and we will start examining methods for selecting features and the most appropriate model. Next, we will discuss linear and quadratic discriminant analysis, compare and contrast them with logistic regression, and build predictive models on the breast cancer data. Finally, we will wrap up by looking at multivariate adaptive regression splines and at ways to select the best overall algorithm for the question at hand. These methods (creating test/train datasets and cross-validation) will set the stage for the more advanced machine learning methods in subsequent chapters.
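One reason OLS is a poor fit for a binary outcome can be seen in a small simulation: a straight line is not constrained to the interval from 0 to 1, so its fitted values cannot always be read as probabilities. The sketch below is an illustration on simulated data, not the chapter's own analysis; the data-generating coefficients and the prediction grid are assumptions.

```r
# Simulated binary outcome with a strong dependence on x (hypothetical data)
set.seed(42)
n <- 500
x <- rnorm(n)
y <- rbinom(n, size = 1, prob = plogis(3 * x))
dat <- data.frame(x = x, y = y)

# OLS treats the 0/1 outcome as if it were a quantitative response
ols_fit <- lm(y ~ x, data = dat)

# Logistic regression models the log-odds, so fitted probabilities stay in (0, 1)
logit_fit <- glm(y ~ x, data = dat, family = binomial)

# Compare predictions over a grid that includes fairly extreme x values
grid <- data.frame(x = seq(-3, 3, by = 0.5))
ols_pred <- predict(ols_fit, newdata = grid)
logit_pred <- predict(logit_fit, newdata = grid, type = "response")

# Side-by-side view of the two sets of predictions
data.frame(x = grid$x, ols = round(ols_pred, 3), logistic = round(logit_pred, 3))

# Range check: OLS predictions can fall below 0 or above 1
range(ols_pred)
range(logit_pred)
```

At the extremes of the grid the linear fit leaves the unit interval, while the logistic fit flattens toward 0 and 1.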