
Supervised learning algorithms

There are many algorithms available for supervised learning, and we choose one based on the task and the data at our disposal. If we don't have much data and there is already some domain knowledge around our problem, deep learning is probably not the best approach to start with. We should rather try simpler algorithms and come up with relevant features based on the knowledge we have.

Starting simple is always a good practice; for example, for classification, a good starting point can be a decision tree. Random forest, an ensemble of decision trees, is harder to overfit and also gives good results out of the box. For regression problems, linear regression is still very popular, especially in domains where it's necessary to justify the decision taken. For other problems, such as recommender systems, a good starting point can be matrix factorization. Each domain has a standard algorithm that it's better to start with.
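To illustrate how little work such a baseline requires, here is a minimal sketch of a random forest classifier trained with its default hyperparameters on a standard scikit-learn dataset (the iris dataset is chosen here just for illustration):

```python
from sklearn.datasets import load_iris
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

# Load a small standard classification dataset
iris = load_iris()
X_train, X_test, y_train, y_test = train_test_split(
    iris.data, iris.target, test_size=0.2, random_state=0)

# The default hyperparameters already give reasonable results out of the box
clf = RandomForestClassifier(random_state=0)
clf.fit(X_train, y_train)

print('Accuracy: {:.2f}'.format(clf.score(X_test, y_test)))
```

Without any tuning or feature engineering, this kind of baseline gives us a reference score that any more complex model should beat.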

A simple example of a task could be to predict the price of a house for sale, given the location and some information about the house. This is a regression problem, and there are a set of algorithms in scikit-learn that can perform the task. If we want to use a linear regression, we can do the following:

from sklearn.datasets import fetch_california_housing
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error

# Using a standard dataset that we can find in scikit-learn
cal_house = fetch_california_housing()

# Split the data and the targets into training/testing sets
cal_house_X_train = cal_house.data[:-20]
cal_house_X_test = cal_house.data[-20:]
cal_house_y_train = cal_house.target[:-20]
cal_house_y_test = cal_house.target[-20:]

# Create the linear regression object
regr = LinearRegression()

# Train the model using the training sets
regr.fit(cal_house_X_train, cal_house_y_train)

# Calculating the predictions
predictions = regr.predict(cal_house_X_test)

# Calculating the loss
print('MSE: {:.2f}'.format(mean_squared_error(cal_house_y_test, predictions)))

After activating our virtual environment (or conda environment) and saving the code in a file named house_LR.py, we can run it from the directory where we placed the file with the following command:

 python house_LR.py

The interesting part about NNs is that they can be used for any of the tasks mentioned previously, provided that enough data is available. Moreover, a trained neural network has effectively learned its own features, and part of the network itself can be reused to do the feature engineering for similar tasks. This method is called transfer learning (TL), and we will dedicate a chapter to it later on.
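The idea of reusing part of a trained network can be sketched even with scikit-learn. In the following illustrative example (the dataset, layer size, and helper function are our own choices, not a prescribed recipe), we train a small MLP, then treat its trained hidden layer as a fixed feature extractor and fit a simple classifier on top of those features:

```python
import numpy as np
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.model_selection import train_test_split
from sklearn.neural_network import MLPClassifier

digits = load_digits()
X_train, X_test, y_train, y_test = train_test_split(
    digits.data, digits.target, random_state=0)

# Train a small network on the original task
net = MLPClassifier(hidden_layer_sizes=(64,), max_iter=500, random_state=0)
net.fit(X_train, y_train)

# Reuse the trained hidden layer as a feature extractor: the hidden
# activations (ReLU, the MLPClassifier default) become the new features
def extract_features(X):
    return np.maximum(0, X @ net.coefs_[0] + net.intercepts_[0])

# Train a simple model on top of the extracted features
clf = LogisticRegression(max_iter=1000)
clf.fit(extract_features(X_train), y_train)
print('Accuracy: {:.2f}'.format(clf.score(extract_features(X_test), y_test)))
```

In real transfer learning the network is usually trained on one task and the extracted features are reused on a different but related task; here both steps use the same dataset purely to keep the sketch short.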
