- Machine Learning for Cybersecurity Cookbook
- Emmanuel Tsukerman
- 2021-06-24 12:28:55
How to do it...
The following steps demonstrate how to take a dataset consisting of features X and labels y and split it into training and testing subsets:
- Start by importing the train_test_split function from scikit-learn and the pandas library, then read your features into X and your labels into y:
from sklearn.model_selection import train_test_split
import pandas as pd
df = pd.read_csv("north_korea_missile_test_database.csv")
y = df["Missile Name"]
X = df.drop("Missile Name", axis=1)
- Next, randomly split the dataset and its labels into a training set containing 80% of the original dataset and a testing set containing the remaining 20%:
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=31
)
- Apply the train_test_split function once more, this time to the training set from the previous step, to obtain a validation set, X_val and y_val:
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=31
)
- Because the second split takes 25% of the 80% training portion (0.25 × 0.8 = 0.2), we end up with a training set that's 60% of the size of the original data, a validation set of 20%, and a testing set of 20%.
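The two-stage split above can be wrapped in a small helper that rescales the validation fraction automatically. This is only a sketch: the function name three_way_split, its parameters, and the synthetic DataFrame with columns feature_a, feature_b, and label are assumptions made for illustration and are not part of the recipe. The only library calls used are train_test_split and basic pandas operations.

```python
import pandas as pd
from sklearn.model_selection import train_test_split


def three_way_split(X, y, val_frac=0.2, test_frac=0.2, seed=31):
    """Hypothetical helper: split X and y into train, validation, and test.

    val_frac and test_frac are fractions of the ORIGINAL dataset.
    """
    # First split: hold out the test set.
    X_rest, X_test, y_rest, y_test = train_test_split(
        X, y, test_size=test_frac, random_state=seed
    )
    # Second split: the validation fraction is rescaled because it is now
    # taken from the remaining (1 - test_frac) share of the data.
    # With the defaults, 0.2 / 0.8 = 0.25, the value used in the recipe.
    rel_val = val_frac / (1.0 - test_frac)
    X_train, X_val, y_train, y_val = train_test_split(
        X_rest, y_rest, test_size=rel_val, random_state=seed
    )
    return X_train, X_val, X_test, y_train, y_val, y_test


# Synthetic stand-in for the recipe's CSV file (made-up columns, 100 rows).
df = pd.DataFrame({
    "feature_a": range(100),
    "feature_b": range(100, 200),
    "label": [i % 3 for i in range(100)],
})
y = df["label"]
X = df.drop("label", axis=1)

X_train, X_val, X_test, y_train, y_val, y_test = three_way_split(X, y)
sizes = (len(X_train), len(X_val), len(X_test))
print(sizes)  # (60, 20, 20)
```

Passing the fractions relative to the original dataset avoids having to work out the 0.25 by hand when a different ratio is wanted, for example val_frac=0.1 and test_frac=0.1 for an 80/10/10 split.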
The following screenshot shows the output:
