- Machine Learning for OpenCV
- Michael Beyeler
- 2021-07-02 19:47:25
Splitting the data into training and test sets
We learned in the previous chapter that it is essential to keep training and test data separate. We can easily split the data using one of scikit-learn's many helper functions:
In [11]: X_train, X_test, y_train, y_test = model_selection.train_test_split(
... data, target, test_size=0.1, random_state=42
... )
Here we split the data into 90 percent training data and 10 percent test data, which we specify with test_size=0.1. Setting random_state=42 seeds the random shuffle so that the split is reproducible. Inspecting the shapes of the returned arrays shows that, out of 100 data points, we ended up with exactly 90 training data points and 10 test data points:
In [12]: X_train.shape, y_train.shape
Out[12]: ((90, 4), (90,))
In [13]: X_test.shape, y_test.shape
Out[13]: ((10, 4), (10,))
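For readers who want to try the split outside the interactive session, the following is a minimal, self-contained sketch. The arrays here are stand-ins (an assumption): randomly generated data with 100 samples and 4 features, chosen only to match the shapes shown above, not the dataset used in the chapter. The names X_test_again and same_split are hypothetical, introduced only for illustration.

```python
import numpy as np
from sklearn import model_selection

# Stand-in data (assumption): 100 samples, 4 features, binary labels.
# This only mimics the shapes above; it is not the chapter's dataset.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4))
target = np.repeat([0, 1], 50)

# Hold out 10 percent of the samples for testing; random_state seeds the shuffle.
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    data, target, test_size=0.1, random_state=42
)

print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

# Repeating the call with the same random_state should reproduce the split.
_, X_test_again, _, _ = model_selection.train_test_split(
    data, target, test_size=0.1, random_state=42
)
same_split = np.array_equal(X_test, X_test_again)
print(same_split)
```

Because the shuffle is seeded, rerunning the script should give the same partition each time, which makes later accuracy numbers comparable across runs.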