SQL Server and machine learning

SQL Server first integrated machine learning in the 2016 release, which introduced R Services. The 2017 release added Python language support, and the feature was renamed Machine Learning Services. Machine Learning Services can be used to solve complex problems; the tasks and algorithms are chosen based on the data and the expected prediction.

There are two main flavors of machine learning:

  • Supervised
  • Unsupervised

The main difference between the two is that supervised learning relies on an available ground truth. This means that the predictive model builds its capabilities from prior knowledge of what the output value for each sample should be. Unsupervised learning, on the other hand, has no such known outputs, and the goal is to find natural structures present in the datasets. This can mean grouping data points into clusters or finding different ways of looking at complex data so that it appears simpler or more organized.
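The difference can be illustrated with a minimal sketch in plain Python. The data, the labels, and the helper names (`nearest_centroid_fit`, `nearest_centroid_predict`, `two_means`) are hypothetical and chosen only for illustration; they are not part of SQL Server Machine Learning Services. The supervised function needs the labels, while the unsupervised one finds two groups from the values alone.

```python
from statistics import mean

def nearest_centroid_fit(points, labels):
    """Supervised: use the known labels (ground truth) to compute one centroid per class."""
    return {
        lab: mean([p for p, l in zip(points, labels) if l == lab])
        for lab in set(labels)
    }

def nearest_centroid_predict(centroids, x):
    """Assign x to the class whose centroid is closest."""
    return min(centroids, key=lambda lab: abs(x - centroids[lab]))

def two_means(points, iterations=10):
    """Unsupervised: no labels; split 1-D points into two clusters (k-means with k=2)."""
    c0, c1 = min(points), max(points)  # simple deterministic starting centers
    for _ in range(iterations):
        g0 = [p for p in points if abs(p - c0) <= abs(p - c1)]
        g1 = [p for p in points if abs(p - c0) > abs(p - c1)]
        if not g0 or not g1:  # degenerate case: everything fell into one group
            break
        c0, c1 = mean(g0), mean(g1)
    return sorted([c0, c1])

# Hypothetical toy data: six measurements, with labels used only by the supervised model
points = [1.0, 1.5, 2.0, 8.0, 8.5, 9.0]
labels = ["small", "small", "small", "large", "large", "large"]

centroids = nearest_centroid_fit(points, labels)      # learns from labeled samples
prediction = nearest_centroid_predict(centroids, 7.0)  # predicts a label for a new sample
clusters = two_means(points)                           # discovers structure without labels
```

Here the supervised model can name its prediction ("small" or "large") because it was given those names, whereas the clustering step only reports where the two groups are centered.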

Each type of machine learning uses different algorithms to solve the problem. Typical supervised learning tasks include the following:

  • Classification
  • Regression

When the data is used to predict a category, the supervised learning task is called classification. This is the case when labeling an image as a picture of either a car or a boat, for example. When there are only two options, it's called two-class classification. When there are more categories, such as when predicting the winner of the World Cup, the problem is known as multi-class classification.

The major classification algorithms include the following:

  • Logistic regression
  • Decision tree/forest/jungle
  • Support vector machine
  • Bayes point machine

Regression algorithms predict a value and can be applied to both continuous and discrete target variables: linear regression for continuous values and logistic regression for discrete ones. In linear regression, the relationship between the input variable (x) and the output variable (y) is expressed as an equation of the form y = a + bx. Thus, the goal of linear regression is to find the values of the coefficients a and b and fit the line that lies closest to the data points. Some good rules of thumb when using this technique are to remove variables that are very similar (correlated) and to remove noise from your data, if possible. It is a fast and simple technique and a good first algorithm to try.
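As a minimal sketch of how a and b can be found, the following plain Python code applies the ordinary least squares formulas for a single input variable. The function name `fit_line` and the sample data are assumptions made for illustration; in practice a Python or R package would do this work.

```python
from statistics import mean

def fit_line(xs, ys):
    """Ordinary least squares estimates of a (intercept) and b (slope) in y = a + b*x."""
    x_bar, y_bar = mean(xs), mean(ys)
    sxx = sum((x - x_bar) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be equal")
    sxy = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
    b = sxy / sxx          # slope: covariance of x and y divided by variance of x
    a = y_bar - b * x_bar  # intercept: the fitted line passes through the mean point
    return a, b

# Hypothetical data generated from y = 1 + 2x with no noise
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.0, 5.0, 7.0, 9.0, 11.0]

a, b = fit_line(xs, ys)
predicted = a + b * 6.0  # predict y for a new input x = 6
```

Because the sample points lie exactly on a line, the fit recovers a = 1 and b = 2; with noisy data the same formulas return the line that minimizes the sum of squared vertical distances to the points.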

Logistic regression is best suited for binary classification (datasets where y = 0 or 1, where 1 denotes the default class). So, if you're predicting whether an event will occur, the instance of the event occurring will be classified as 1. It measures the relationship between the categorical dependent variable and one or more independent variables by estimating probabilities using a logistic function.
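The logistic function maps any real number to a probability between 0 and 1, which is then compared with a threshold to pick the class. The sketch below shows only the prediction step; the coefficients `a` and `b` and the 0.5 threshold are hypothetical values assumed for illustration, not the result of training on real data.

```python
import math

def sigmoid(z):
    """Logistic function, written to avoid overflow for large negative z."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    ez = math.exp(z)
    return ez / (1.0 + ez)

def predict_proba(x, a, b):
    """Estimated probability that y = 1 for input x, using the linear score a + b*x."""
    return sigmoid(a + b * x)

def predict_class(x, a, b, threshold=0.5):
    """Classify as 1 (the event occurs) when the probability reaches the threshold."""
    return 1 if predict_proba(x, a, b) >= threshold else 0

# Hypothetical coefficients, assumed to have been estimated beforehand
a, b = -4.0, 2.0

p_low = predict_proba(1.0, a, b)    # score -2, probability well below 0.5
p_high = predict_proba(3.0, a, b)   # score +2, probability well above 0.5
class_low = predict_class(1.0, a, b)
class_high = predict_class(3.0, a, b)
```

Note that the two probabilities are symmetric around 0.5 because their scores are -2 and +2, which is a property of the logistic function itself.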

Numerous kinds of regression are available, depending on the Python or R package that is used; for example, we could use any of the following:

  • Bayesian linear regression
  • Linear regression
  • Poisson regression
  • Decision forest regression
  • Many others

We'll focus on implementing the machine-learning algorithms and models in the following chapters with SQL Server Machine Learning Services to train and predict the values from the dataset.