
Splitting the data into training and test sets

We learned in the previous chapter that it is essential to keep training and test data separate. We can easily split the data using one of scikit-learn's many helper functions:

In [11]: X_train, X_test, y_train, y_test = model_selection.train_test_split(
... data, target, test_size=0.1, random_state=42
... )

Here we want to split the data into 90 percent training data and 10 percent test data, which we specify with test_size=0.1. Fixing random_state=42 makes the shuffle reproducible, so the same split is produced on every run. By inspecting the shapes of the returned arrays, we can confirm that we ended up with exactly 90 training data points and 10 test data points:

In [12]: X_train.shape, y_train.shape
Out[12]: ((90, 4), (90,))
In [13]: X_test.shape, y_test.shape
Out[13]: ((10, 4), (10,))
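
For readers who want to try the split in isolation, the following is a minimal, self-contained sketch. It assumes synthetic stand-in arrays (100 random samples with 4 features and binary labels) in place of the chapter's data and target, so only the shapes, not the values, match the session above. The stratify argument in the second call is not used in the text; it is shown here only as an optional variation that keeps the class proportions roughly equal in both subsets.

```python
import numpy as np
from sklearn import model_selection

# Assumption: synthetic stand-ins for the chapter's `data` and `target`,
# chosen only to match the shapes (100, 4) and (100,).
rng = np.random.default_rng(0)
data = rng.normal(size=(100, 4))
target = np.repeat([0, 1], 50)  # 50 samples per class

# Same call as in the text: 10 percent held out, fixed seed for reproducibility.
X_train, X_test, y_train, y_test = model_selection.train_test_split(
    data, target, test_size=0.1, random_state=42
)
print(X_train.shape, y_train.shape)
print(X_test.shape, y_test.shape)

# Optional variation (not in the text): stratify on the labels so that
# both subsets preserve the 50/50 class balance.
X_train_s, X_test_s, y_train_s, y_test_s = model_selection.train_test_split(
    data, target, test_size=0.1, random_state=42, stratify=target
)
print(np.bincount(y_test_s))
```

Because random_state is fixed, rerunning the first call returns the same rows in the same order, which makes later accuracy numbers comparable across runs.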