官术网_书友最值得收藏!

LR for predicting insurance severity claims

As you have already seen, the loss to be predicted contains continuous values, that is, it will be a regression task. So in using regression analysis here, the goal is to predict a continuous target variable, whereas another area called classification predicts a label from a finite set.

Logistic regression (LR) belongs to the family of regression algorithms. The goal of regression is to find relationships and dependencies between variables. It models the relationship between a continuous scalar dependent variable y (that is, label or target) and one or more (a D-dimensional vector) explanatory variable (also independent variables, input variables, features, observed data, observations, attributes, dimensions, and data points) denoted as x using a linear function:

Figure 9: A regression graph separates data points (in red dots) and the blue line is regression

LR models the relationship between a dependent variable y, which involves a linear combination of interdependent variables xi. The letters A and B represent constants that describe the y axis intercept and the slope of the line respectively:

y = A+Bx

Figure 9, Regression graph separates data points (in red dots) and the blue line is regression shows an example of simple LR with one independent variable—that is, a set of data points and a best fit line, which is the result of the regression analysis itself. It can be observed that the line does not actually pass through all of the points.

The distance between any data points (measured) and the line (predicted) is called the regression error. Smaller errors contribute to more accurate results in predicting unknown values. When the errors are reduced to their smallest levels possible, the line of best fit is created for the final regression error. Note that there are no single metrics in terms of regression errors; there are several as follows:

  • Mean Squared Error (MSE): It is a measure of how close a fitted line is to data points. The smaller the MSE, the closer the fit is to the data.
  • Root Mean Squared Error (RMSE): It is the square root of the MSE but probably the most easily interpreted statistic, since it has the same units as the quantity plotted on the vertical axis.
  • R-squared: R-squared is a statistical measure of how close the data is to the fitted regression line. R-squared is always between 0 and 100%. The higher the R-squared, the better the model fits your data.
  • Mean Absolute Error (MAE): MAE measures the average magnitude of the errors in a set of predictions without considering their direction. It's the average over the test sample of the absolute differences between prediction and actual observation where all individual differences have equal weight.
  • Explained variance: In statistics, explained variation measures the proportion to which a mathematical model accounts for the variation of a given dataset.
主站蜘蛛池模板: 湘潭市| 万宁市| 洱源县| 广河县| 漠河县| 黄梅县| 策勒县| 嘉荫县| 临泉县| 天台县| 揭阳市| 华坪县| 奇台县| 万全县| 余姚市| 霍林郭勒市| 营山县| 蓬安县| 武定县| 乌鲁木齐市| 得荣县| 柞水县| 商水县| 霸州市| 阳山县| 通榆县| 山东| 潼南县| 安吉县| 定结县| 宁海县| 留坝县| 峡江县| 洛南县| 连平县| 包头市| 荣成市| 琼结县| 门源| 新干县| 柘荣县|