- Statistics for Machine Learning
- Pratap Dangeti
- 2021-07-02 19:05:55
Train and test data
In statistical modeling, data is usually split randomly into training and test datasets in a 70-30 or 80-20 ratio. The model is built on the training data, and its effectiveness is then checked on the test data, which the model has not seen during fitting.

In the following code, we split the original data into train and test data in a 70 percent - 30 percent ratio. An important point to consider here is that we set a seed value for the random number generator, so that the random sampling selects the same observations for the training and testing data every time the code is run. This repeatability is needed in order to reproduce the results:
# Train & Test split
>>> import pandas as pd
>>> from sklearn.model_selection import train_test_split
>>> original_data = pd.read_csv("mtcars.csv")
In the following code, train_size is 0.7, which means that 70 percent of the data goes into the training dataset and the remaining 30 percent goes into the testing dataset. The random_state argument is the seed for the pseudo-random number generator; fixing it makes the results reproducible, because exactly the same observations are assigned to each dataset on every run:
>>> train_data, test_data = train_test_split(original_data, train_size=0.7, random_state=42)
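The effect of random_state can be checked directly. The following is a minimal sketch, assuming pandas and scikit-learn are installed; the small DataFrame is a hypothetical stand-in for mtcars.csv (same number of rows, made-up columns), so the file itself is not needed:

```python
import pandas as pd
from sklearn.model_selection import train_test_split

# Hypothetical stand-in for mtcars.csv: 32 rows with made-up columns
data = pd.DataFrame({"id": range(32), "value": [i * 1.5 for i in range(32)]})

# Two independent calls with the same seed
first_train, first_test = train_test_split(data, train_size=0.7, random_state=42)
second_train, second_test = train_test_split(data, train_size=0.7, random_state=42)

# True if both calls picked the same training rows in the same order
same_rows = first_train.index.equals(second_train.index)

# Rows that appear in both the training and the test data (should be none)
overlap = set(first_train.index) & set(first_test.index)

print(len(first_train), len(first_test), same_rows, len(overlap))
```

Changing random_state to another integer would generally select different rows, while the sizes of the two datasets stay the same.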
The R code for the train and test split for statistical modeling is as follows:
full_data = read.csv("mtcars.csv", header=TRUE)
set.seed(123)
numrow = nrow(full_data)
trnind = sample(1:numrow, size = as.integer(0.7*numrow))
train_data = full_data[trnind,]
test_data = full_data[-trnind,]
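The index-sampling logic of the R code can also be sketched in plain Python, which makes the mechanics of the split explicit. This is an illustration only, not part of the original code: the function name split_indices is hypothetical, and a seed of 123 in Python's random module does not select the same rows as set.seed(123) in R.

```python
import random

def split_indices(num_rows, train_fraction=0.7, seed=123):
    """Return (train, test) row indices, mirroring the R sample() approach."""
    rng = random.Random(seed)                       # seeded generator for repeatability
    train_size = int(train_fraction * num_rows)     # like as.integer(0.7 * numrow)
    train_idx = sorted(rng.sample(range(num_rows), train_size))
    chosen = set(train_idx)
    # Everything not sampled for training goes to the test set
    test_idx = [i for i in range(num_rows) if i not in chosen]
    return train_idx, test_idx

# 32 rows, the size of the mtcars dataset
train_idx, test_idx = split_indices(32)
print(len(train_idx), len(test_idx))
```

Calling split_indices(32) again with the same seed returns the same two index lists, which is the repeatability that the seed is meant to provide.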