- Hands-On Machine Learning with Microsoft Excel 2019
- Julio Cesar Rodriguez Martino
- 573字
- 2021-06-24 15:10:57
Learning by example – the linear regression model
Imagine that you and a friend own a small ice cream shop. You are discussing how many kilograms (kg) of ice cream to produce each day and you both agree on the fact that the hotter the weather is, the more ice cream will be sold. You add that this is not the only factor to take into account, but there are other variables that can also affect the number of sales. As rational people and good analysts, you decide to run a small experiment by recording the mean temperature during the shop opening hours and the amount of ice cream that is sold. The summer turns out to be particularly rainy, and the temperature variation is high, which helps you to achieve a good range for the variables. The final dataset looks like the following table:

Your model states that the amount of ice cream sold is (directly) proportional to the mean temperature. In order to test this hypothesis, we can make a scatter plot of the collected data:
- Select the full range of cells containing the table, click on Insert menu, and select Charts:

- Now, click on Scatter, as follows:

After writing the names of the axis titles, you should get a chart that is similar to the following chart:

We see that there is indeed a linear correlation and that it is positive (the larger the temperature value, the more ice cream you sell). We can then represent the model using a linear equation, as follows:
IC = a * T + b (1)
Here, IC is the amount of ice cream sold, T is the mean temperature, and a and b are constant values to be calculated by a linear regression.
To obtain the values of a and b, we can use Excel's Analysis ToolPak data analysis add-in. If you have not enabled it, refer link https://support.office.com/en-ie/article/use-the-analysis-toolpak-to-perform-complex-data-analysis-6c67ccf0-f4a9-487c-8dec-bdb5a2cefab6 for instructions on how to do it.
- Select the data range in your worksheet, go to Data in the main menu and then select Data Analysis:

- In the pop-up menu, select Regression and click on OK:

- Make sure that the x and y ranges are correct (x is temperature and y is ice cream amount). Select Line Fit Plots to see the regression line on top of the data points in a new diagram:

Looking at the output, we see that the line that best fits the data can be written as follows:
IC = 1.5* T + 6 (2)
There is a standard error for a of ±0.2, and for b of ±4. The R2 value is 0.78, which means that the fit is not very good and only 78% of the variation in ice cream sales can be explained by the mean temperature. So, you and your friend were both right!
The following diagram shows the fitted line:

It is clear that the line represents the data quite well, but some points are a little bit off, showing that you need to take other factors into consideration when predicting ice cream consumption. In any case, given the mean forecasted temperature for one day, you can use equation (2) to have a rough estimation of how much ice cream to produce to cover the possible demand.
Keep the rest of the linear regression results to hand, as we are going to use some of them in the following sections.
- Word 2010中文版完全自學(xué)手冊(cè)
- MySQL從入門到精通(第3版)
- Oracle高性能SQL引擎剖析:SQL優(yōu)化與調(diào)優(yōu)機(jī)制詳解
- 計(jì)算機(jī)視覺
- Mastering ROS for Robotics Programming(Second Edition)
- Oracle 11g+ASP.NET數(shù)據(jù)庫系統(tǒng)開發(fā)案例教程
- 數(shù)據(jù)分析思維:產(chǎn)品經(jīng)理的成長(zhǎng)筆記
- 精通Neo4j
- C# 7 and .NET Core 2.0 High Performance
- 工業(yè)大數(shù)據(jù)分析實(shí)踐
- 敏捷數(shù)據(jù)分析工具箱:深入解析ADW+OAC
- 信息技術(shù)導(dǎo)論
- 創(chuàng)新求索錄:第二集(精裝版)
- 21天學(xué)通Oracle(第2版)
- SQL面試寶典:圖解數(shù)據(jù)庫求職題