官术网_书友最值得收藏!

Learning by example – the linear regression model

Imagine that you and a friend own a small ice cream shop. You are discussing how many kilograms (kg) of ice cream to produce each day and you both agree on the fact that the hotter the weather is, the more ice cream will be sold. You add that this is not the only factor to take into account, but there are other variables that can also affect the number of sales. As rational people and good analysts, you decide to run a small experiment by recording the mean temperature during the shop opening hours and the amount of ice cream that is sold. The summer turns out to be particularly rainy, and the temperature variation is high, which helps you to achieve a good range for the variables. The final dataset looks like the following table:

Your model states that the amount of ice cream sold is (directly) proportional to the mean temperature. In order to test this hypothesis, we can make a scatter plot of the collected data:

  1. Select the full range of cells containing the table, click on Insert menu, and select Charts:

  1. Now, click on Scatter, as follows:

After writing the names of the axis titles, you should get a chart that is similar to the following chart:

We see that there is indeed a linear correlation and that it is positive (the larger the temperature value, the more ice cream you sell). We can then represent the model using a linear equation, as follows:

IC = a * T + b (1)

Here, IC is the amount of ice cream sold, T is the mean temperature, and a and b are constant values to be calculated by a linear regression.

To obtain the values of a and b, we can use Excel's Analysis ToolPak data analysis add-in. If you have not enabled it, refer link https://support.office.com/en-ie/article/use-the-analysis-toolpak-to-perform-complex-data-analysis-6c67ccf0-f4a9-487c-8dec-bdb5a2cefab6 for instructions on how to do it.

  1. Select the data range in your worksheet, go to Data in the main menu and then select Data Analysis:
  1. In the pop-up menu, select Regression and click on OK:
  1. Make sure that the x and y ranges are correct (x is temperature and y is ice cream amount). Select Line Fit Plots to see the regression line on top of the data points in a new diagram:

Looking at the output, we see that the line that best fits the data can be written as follows:

IC = 1.5* T + 6 (2)

There is a standard error for a of ±0.2, and for b of ±4. The R2 value is 0.78, which means that the fit is not very good and only 78% of the variation in ice cream sales can be explained by the mean temperature. So, you and your friend were both right!

The following diagram shows the fitted line:

It is clear that the line represents the data quite well, but some points are a little bit off, showing that you need to take other factors into consideration when predicting ice cream consumption. In any case, given the mean forecasted temperature for one day, you can use equation (2) to have a rough estimation of how much ice cream to produce to cover the possible demand.

Keep the rest of the linear regression results to hand, as we are going to use some of them in the following sections.

主站蜘蛛池模板: 盐亭县| 来宾市| 抚顺市| 建始县| 济南市| 秦皇岛市| 贵南县| 永仁县| 日喀则市| 邵东县| 图们市| 壶关县| 晴隆县| 济南市| 金沙县| 夏邑县| 印江| 邹平县| 西吉县| 章丘市| 富阳市| 汕头市| 工布江达县| 莱芜市| 大安市| 祁东县| 从化市| 米易县| 清涧县| 克什克腾旗| 织金县| 井研县| 樟树市| 丹巴县| 龙州县| 偃师市| 和平县| 汕头市| 长岭县| 鹤峰县| 乐亭县|