- Machine Learning Quick Reference
- Rahul Kumar
- 2021-08-20 10:05:07
K-fold cross-validation
Let's walk through the steps of k-fold cross-validation:
1. The data is divided into k subsets.
2. One subset is held out for testing/development, and the model is built on the remaining k-1 subsets, which form the training data.
3. Step 2 is repeated k times. That is, once the preceding step has been performed, we move on to the next subset, which becomes the test set, and the remaining k-1 subsets are used to build the model.
4. The error is calculated in each trial, and the average is taken over all k trials.
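The four steps above can be sketched in a few lines of NumPy. This is only an illustrative sketch: the helper names `k_fold_error` and `one_nn`, the toy 1-nearest-neighbor predictor, and the synthetic two-cluster data are all assumptions made for this example and do not come from the book.

```python
import numpy as np

def k_fold_error(X, y, k, fit_predict):
    """Average misclassification error over k folds (hypothetical helper)."""
    indices = np.arange(len(X))
    folds = np.array_split(indices, k)            # step 1: divide into k subsets
    errors = []
    for i in range(k):                            # step 3: repeat k times
        test_idx = folds[i]                       # step 2: hold one subset out
        train_idx = np.concatenate([folds[j] for j in range(k) if j != i])
        y_pred = fit_predict(X[train_idx], y[train_idx], X[test_idx])
        errors.append(np.mean(y_pred != y[test_idx]))
    return float(np.mean(errors))                 # step 4: average the errors

def one_nn(X_train, y_train, X_test):
    """Toy 1-nearest-neighbor predictor, used only for illustration."""
    dists = np.linalg.norm(X_test[:, None, :] - X_train[None, :, :], axis=2)
    return y_train[dists.argmin(axis=1)]

# Made-up data: two well-separated clusters, shuffled so folds mix both classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0, 0.1, (10, 2)), rng.normal(5, 0.1, (10, 2))])
y = np.array([0] * 10 + [1] * 10)
perm = rng.permutation(len(X))
X, y = X[perm], y[perm]

avg_error = k_fold_error(X, y, k=5, fit_predict=one_nn)
print(avg_error)
```

Passing the model in as a `fit_predict` function keeps the fold logic separate from the learner, so the same loop could be reused with any other predictor.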
Because most of the data is used for training in each iteration, bias is reduced. At the same time, every subset gets exactly one turn as the validation/test set, so nearly all of the data is eventually used for validation, which reduces the variance of the error estimate.
In the diagram for this example, k = 5 has been selected, so the whole dataset is divided into five subsets. In the first iteration, subset 5 is the test data and the remaining four subsets are the training data. Likewise, in the second iteration, subset 4 is the test data and the rest are the training data. This continues for five iterations, until every subset has served as the test data once.
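The k = 5 rotation described above can be written out as a small schedule. The text only states the first two iterations (subset 5, then subset 4); continuing downward through subsets 3, 2, and 1 is an assumption made here for illustration.

```python
k = 5
subsets = list(range(1, k + 1))  # subsets numbered 1..5

schedule = []
# Assumed order: the test subset counts down from 5 to 1
for iteration, test_subset in enumerate(reversed(subsets), start=1):
    train_subsets = [s for s in subsets if s != test_subset]
    schedule.append((iteration, test_subset, train_subsets))
    print(f"Iteration {iteration}: test = subset {test_subset}, "
          f"train = subsets {train_subsets}")
```

Whatever the order, each subset appears as the test data exactly once and as training data in the other four iterations.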
Now, let's try this in Python. We first evaluate a K-nearest neighbors classifier on a single train/test split, and then evaluate it with 10-fold cross-validation:
# Load the dataset, the splitting/cross-validation helpers, and the classifier
from sklearn.datasets import load_breast_cancer
# sklearn.cross_validation was removed in scikit-learn 0.20; use model_selection
from sklearn.model_selection import train_test_split, cross_val_score
from sklearn.neighbors import KNeighborsClassifier
from sklearn import metrics

BC = load_breast_cancer()
X = BC.data
y = BC.target

# Single hold-out split: train on one part, evaluate on the other
X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=4)
knn = KNeighborsClassifier(n_neighbors=5)
knn.fit(X_train, y_train)
y_pred = knn.predict(X_test)
print(metrics.accuracy_score(y_test, y_pred))

# 10-fold cross-validation on the full dataset
knn = KNeighborsClassifier(n_neighbors=5)
scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')
print(scores)         # accuracy for each of the 10 folds
print(scores.mean())  # average accuracy over all folds
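To connect the library call back to the steps listed earlier, the following sketch spells out the loop that `cross_val_score` runs internally. It assumes that, for a classifier and an integer `cv`, scikit-learn splits with an unshuffled `StratifiedKFold`; the variable names `manual_scores` and `library_scores` are introduced here only for illustration.

```python
import numpy as np
from sklearn.base import clone
from sklearn.datasets import load_breast_cancer
from sklearn.model_selection import StratifiedKFold, cross_val_score
from sklearn.neighbors import KNeighborsClassifier

X, y = load_breast_cancer(return_X_y=True)
knn = KNeighborsClassifier(n_neighbors=5)

# Assumption: cv=10 on a classifier corresponds to StratifiedKFold(n_splits=10)
splitter = StratifiedKFold(n_splits=10)
manual_scores = []
for train_idx, test_idx in splitter.split(X, y):
    model = clone(knn)                       # fresh, unfitted copy for each fold
    model.fit(X[train_idx], y[train_idx])    # build the model on k-1 subsets
    manual_scores.append(model.score(X[test_idx], y[test_idx]))  # fold accuracy

manual_scores = np.array(manual_scores)
library_scores = cross_val_score(knn, X, y, cv=10, scoring='accuracy')
print(manual_scores.mean(), library_scores.mean())
```

If the assumption about the splitter holds, the two sets of fold scores should match, which makes the explicit loop a convenient place to add per-fold diagnostics.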