
Splitting the data into training and test sets

We learned in the previous chapter that it is essential to keep training and test data separate. We can easily split the data using train_test_split, one of scikit-learn's many helper functions:

In [11]: X_train, X_test, y_train, y_test = model_selection.train_test_split(
... data, target, test_size=0.1, random_state=42
... )

Here we want to split the data into 90 percent training data and 10 percent test data, which we specify with test_size=0.1. Inspecting the shapes of the returned arrays confirms that we ended up with exactly 90 training data points and 10 test data points:

In [12]: X_train.shape, y_train.shape
Out[12]: ((90, 4), (90,))
In [13]: X_test.shape, y_test.shape
Out[13]: ((10, 4), (10,))
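
To try the split in isolation, here is a minimal, self-contained sketch. It assumes a hypothetical stand-in dataset of 100 random samples with 4 features and binary labels, not the chapter's actual data; the names rng, X_test_again, and same_split are introduced only for illustration. It also shows that fixing random_state makes the split reproducible:

```python
import numpy as np
from sklearn import model_selection

# Hypothetical stand-in for the chapter's data: 100 samples, 4 features.
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4))
target = np.repeat([0, 1], 50)  # 50 samples per class

# Hold out 10 percent of the samples for testing.
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    data, target, test_size=0.1, random_state=42
)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

# Repeating the call with the same random_state yields the same split.
_, X_test_again, _, _ = model_selection.train_test_split(
    data, target, test_size=0.1, random_state=42
)
same_split = np.array_equal(X_test, X_test_again)
print(same_split)
```

Using a different random_state (or none at all) would generally select a different set of 10 test points, so fixing it is useful when results need to be repeatable.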