
How to do it...

The following steps demonstrate how to take a dataset, consisting of features X and labels y, and split it into training, validation, and testing subsets:

  1. Start by importing the train_test_split function and the pandas library, then read your features into X and your labels into y:
from sklearn.model_selection import train_test_split
import pandas as pd

df = pd.read_csv("north_korea_missile_test_database.csv")
y = df["Missile Name"]
X = df.drop("Missile Name", axis=1)
  1. Next, randomly split the dataset and its labels into a training set containing 80% of the original dataset and a testing set containing the remaining 20%:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=31
)
  1. Apply the train_test_split function once more, this time to the training set, to obtain a validation set, X_val and y_val:
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=31
)
  1. We end up with a training set that's 60% of the size of the original data, a validation set of 20%, and a testing set of 20%. The validation share follows from taking 25% of the 80% training portion (0.25 × 0.8 = 0.2).

The following screenshot shows the output:
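The 60/20/20 proportions can also be checked directly. The following is a minimal sketch, not the recipe's own output: it assumes a small synthetic DataFrame in place of the CSV file, and the feature columns feature_a and feature_b, the label values, and the sizes dictionary are all hypothetical names introduced for illustration.

```python
import numpy as np
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for the CSV file: 100 rows, two made-up
# numeric features, and a made-up label column with the same name
# as in the recipe.
rng = np.random.default_rng(0)
df = pd.DataFrame(
    {
        "feature_a": rng.normal(size=100),
        "feature_b": rng.normal(size=100),
        "Missile Name": rng.choice(["type_a", "type_b", "type_c"], size=100),
    }
)
y = df["Missile Name"]
X = df.drop("Missile Name", axis=1)

# First split: 80% training, 20% testing.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=31
)

# Second split: 25% of the training portion becomes the validation set,
# which is 0.25 * 0.8 = 20% of the original data.
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=31
)

sizes = {"train": len(X_train), "val": len(X_val), "test": len(X_test)}
print(sizes)  # {'train': 60, 'val': 20, 'test': 20}
```

Because both calls split by row, the three subsets share no rows, and each label Series keeps the same index as its matching feature DataFrame.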
