官术网_书友最值得收藏!

How it works...

We start by reading in our dataset, consisting of historical and continuing missile experiments in North Korea. We aim to predict the type of missile based on remaining features, such as facility and time of launch. This concludes step 1. In step 2, we apply scikit-learn's train_test_split method to subdivide X and y into a training set, X_train and y_train, and also a testing set, X_test and y_test. The test_size = 0.2 parameter means that the testing set consists of 20% of the original data, while the remainder is placed in the training set. The random_state parameter allows us to reproduce the same randomly generated split. Next, concerning step 3, it is important to note that, in applications, we often want to compare several different models. The danger of using the testing set to select the best model is that we may end up overfitting the testing set. This is similar to the statistical sin of data fishing. In order to combat this danger, we create an additional dataset, called the validation set. We train our models on the training set, use the validation set to compare them, and finally use the testing set to obtain an accurate indicator of the performance of the model we have chosen. So, in step 3, we choose our parameters so that, mathematically speaking, the end result consists of a training set of 60% of the original dataset, a validation set of 20%, and a testing set of 20%. Finally, we double-check our assumptions by employing the len function to compute the length of the arrays (step 4).

主站蜘蛛池模板: 巨鹿县| 长子县| 汶上县| 格尔木市| 林甸县| 巴林右旗| 沽源县| 肃南| 美姑县| 兴业县| 泰宁县| 徐汇区| 大田县| 镇沅| 罗平县| 从化市| 顺义区| 富民县| 肇州县| 双牌县| 武义县| 宁强县| 长岭县| 石景山区| 建德市| 福贡县| 三河市| 平舆县| 黔江区| 横山县| 民勤县| 巴楚县| 大名县| 新泰市| 张家港市| 辽源市| 山西省| 海晏县| 大同市| 墨玉县| 财经|