Regression and Classification Problems

Classification and regression problems are all around us in daily life: the chance of rain reported on https://weather.com, emails being filtered into the inbox or the spam folder, personal and home loans being accepted or rejected, choosing our next holiday destination, exploring options for buying a new house, making investment decisions for short- and long-term gains, and buying the next book from Amazon. The list goes on. The world around us is increasingly run by algorithms that help us with our choices (which is not always a good thing).

As discussed in Chapter 2, Exploratory Analysis of Data, we will use the Situation–Complication–Question (SCQ) framework from the Minto Pyramid Principle to define our problem statement. The following table shows the SCQ approach for Beijing's PM2.5 problem:

Figure 3.3: Applying SCQ on Beijing's PM2.5 problem.

Given the SCQ construct described in the previous table, we can either run a simple correlation analysis to establish which factors affect the PM2.5 levels, or frame a predictive problem that estimates the PM2.5 levels from all the factors. Here, prediction means finding an approximate function that maps input variables to an output. For clarity of terminology, we will refer to the factors as input variables; PM2.5 then becomes the dependent variable (often referred to as the output variable). The dependent variable can be either categorical or continuous.
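The two options can be contrasted with a minimal sketch. The wind-speed and PM2.5 numbers are made up for illustration and are not taken from the Beijing dataset; the helper names `pearson` and `fit_line` are hypothetical, and a straight line is only one possible choice of approximate function.

```python
from math import sqrt

# Made-up illustrative values, NOT taken from the Beijing dataset.
wind_speed = [1.0, 2.0, 3.0, 4.0, 5.0]    # input variable
pm25 = [110.0, 95.0, 70.0, 60.0, 40.0]    # dependent (output) variable


def mean(values):
    return sum(values) / len(values)


def pearson(x, y):
    """Correlation analysis: strength of the linear association of x and y."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    return cov / sqrt(var_x * var_y)


def fit_line(x, y):
    """Prediction: least-squares fit of y = intercept + slope * x."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    slope = cov / var_x
    intercept = my - slope * mx
    return intercept, slope


# Option 1: how strongly is the factor related to PM2.5?
r = pearson(wind_speed, pm25)

# Option 2: learn a function and use it to estimate PM2.5 for a new input.
intercept, slope = fit_line(wind_speed, pm25)
predicted = intercept + slope * 6.0

print(f"correlation: {r:.3f}")
print(f"estimated PM2.5 at wind speed 6.0: {predicted:.1f}")
```

Correlation only summarizes how strongly a factor moves with PM2.5, while the fitted function can produce an estimate for an input that was never observed.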

For example, in the problem of classifying emails as SPAM or NOT SPAM, the dependent variable is categorical, whereas a measured PM2.5 level is continuous. The following table highlights some critical differences between regression and classification problems:

Figure 3.4: Difference between regression and classification problems.
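As a minimal sketch of the continuous-versus-categorical distinction, the same readings can serve as a regression target or, after thresholding, as a classification target. The readings, the predictions, and the cut-off of 75 are all made up for illustration; the cut-off is not an official air-quality standard.

```python
# Made-up PM2.5 readings; illustrative only.
pm25 = [35.0, 80.0, 150.0, 20.0, 95.0]

# Hypothetical cut-off chosen for illustration, not an official standard.
THRESHOLD = 75.0


def to_label(value):
    """Turn a continuous reading into a category."""
    return "HIGH" if value > THRESHOLD else "LOW"


# Regression: the dependent variable stays continuous.
regression_target = pm25

# Classification: the dependent variable is a category.
classification_target = [to_label(v) for v in pm25]

# Made-up model outputs, used only to show how each problem is scored.
predicted_values = [40.0, 70.0, 140.0, 30.0, 100.0]
predicted_labels = [to_label(v) for v in predicted_values]

# Regression error: how far off the numbers are (mean absolute error).
mae = sum(
    abs(p - t) for p, t in zip(predicted_values, regression_target)
) / len(pm25)

# Classification score: fraction of labels that match (accuracy).
accuracy = sum(
    p == t for p, t in zip(predicted_labels, classification_target)
) / len(pm25)

print(classification_target)
print(f"MAE: {mae}, accuracy: {accuracy}")
```

A regression model is judged by how far its numeric estimates are from the truth, while a classifier is judged by how often it picks the right category.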
