Book title: Machine Learning with Swift
Author: Alexander Sosnovshchenko
Updated: 2021-06-24 18:54:56

Splitting the data

Finally, we split our data into training and test sets. We train the classifier only on the training set, so it never sees the test set until we evaluate its performance. This step is very important because, as we will see later, the quality of predictions on the test set can differ dramatically from the quality measured on the training set. Data splitting is specific to machine learning tasks, so we import scikit-learn (a machine learning package) and use its train_test_split function:

In []: 
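from sklearn.model_selection import train_test_split
# Hold out 30% of the samples for testing; random_state fixes the shuffle so the split is reproducible
X_train, X_test, y_train, y_test = train_test_split(features, labels, test_size=0.3, random_state=42)
# Check the resulting shapes
X_train.shape, y_train.shape, X_test.shape, y_test.shape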
Out[]: 
 ((700, 6), (700,), (300, 6), (300,))

Now we have 700 training samples and 300 test samples (the 70/30 split requested by test_size=0.3), each with 6 features.
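
To make the idea behind the split concrete, here is a minimal sketch in plain Python of what a shuffled 70/30 split does conceptually. It is an illustration only, not scikit-learn's actual implementation: the helper name split_indices, its use of round() to size the test set, and the sample count of 1000 are assumptions made for this example.

```python
import random

def split_indices(n_samples, test_size=0.3, seed=42):
    """Hypothetical helper: shuffle sample indices and cut them into two parts."""
    indices = list(range(n_samples))
    # A seeded generator makes the shuffle, and therefore the split, repeatable.
    random.Random(seed).shuffle(indices)
    # Number of samples to hold out for testing (assumed rounding rule).
    n_test = round(n_samples * test_size)
    # The first n_test shuffled indices form the test set, the rest the training set.
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = split_indices(1000)
print(len(train_idx), len(test_idx))  # prints: 700 300
```

Every index lands in exactly one of the two lists, so no test sample can leak into training, and calling the helper again with the same seed returns the same split.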
