
Splitting the data into training and test sets

We learned in the previous chapter that it is essential to keep training and test data separate. We can easily split the data using one of scikit-learn's many helper functions:

In [11]: X_train, X_test, y_train, y_test = model_selection.train_test_split(
... data, target, test_size=0.1, random_state=42
... )

Here we want to split the data into 90 percent training data and 10 percent test data, which we specify with test_size=0.1. Passing a fixed random_state makes the shuffled split reproducible from run to run. Inspecting the shapes of the returned arrays confirms that, of the 100 data points, exactly 90 ended up in the training set and 10 in the test set:

In [12]: X_train.shape, y_train.shape
Out[12]: ((90, 4), (90,))
In [13]: X_test.shape, y_test.shape
Out[13]: ((10, 4), (10,))
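
The split above can also be reproduced as a standalone script. The following is a minimal sketch, not the chapter's own dataset: `data` and `target` are filled with hypothetical stand-in values (100 random samples with 4 features and binary labels), and the extra `indices` array is an assumption added only to check that no sample lands in both sets.

```python
import numpy as np
from sklearn import model_selection

# Hypothetical stand-in data with the same shape as in the text:
# 100 samples, 4 features, binary class labels.
rng = np.random.RandomState(0)
data = rng.rand(100, 4)
target = np.repeat([0, 1], 50)

# Track the original row indices so we can verify the split afterwards.
indices = np.arange(len(data))

# train_test_split accepts any number of equally long arrays and
# returns a (train, test) pair for each of them, in order.
(X_train, X_test,
 y_train, y_test,
 idx_train, idx_test) = model_selection.train_test_split(
    data, target, indices, test_size=0.1, random_state=42
)

print(X_train.shape, y_train.shape)  # (90, 4) (90,)
print(X_test.shape, y_test.shape)    # (10, 4) (10,)

# No sample appears in both the training set and the test set.
print(set(idx_train).isdisjoint(idx_test))  # True
```

Because the features and the labels are shuffled together, row i of X_train still corresponds to entry i of y_train after the split.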