How it works...

We start by reading in our dataset, which consists of historical and ongoing missile tests in North Korea. We aim to predict the type of missile from the remaining features, such as the facility and the time of launch. This concludes step 1. In step 2, we apply scikit-learn's train_test_split function to divide X and y into a training set, X_train and y_train, and a testing set, X_test and y_test. The test_size = 0.2 parameter means that the testing set consists of 20% of the original data, while the remainder is placed in the training set. The random_state parameter fixes the seed of the random shuffle, which allows us to reproduce the same split on every run. Next, concerning step 3, it is important to note that, in applications, we often want to compare several different models. The danger of using the testing set to select the best model is that we may end up overfitting to the testing set. This is similar to the statistical sin of data fishing (also known as data dredging). To combat this danger, we create an additional dataset, called the validation set. We train our models on the training set, use the validation set to compare them, and finally use the testing set to obtain an unbiased estimate of the performance of the model we have chosen. So, in step 3, we split the training set a second time, choosing the parameters so that the end result consists of a training set of 60% of the original dataset, a validation set of 20%, and a testing set of 20%. Note that the second split is applied to the remaining 80% of the data, not to the full dataset; for instance, holding out 25% of that 80% yields 0.25 × 0.8 = 20% of the original. Finally, we double-check our assumptions by employing the len function to compute the length of the arrays (step 4).
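The two-stage split described above can be sketched as follows. This is a minimal illustration, not the recipe's own code: it assumes synthetic stand-in data in place of the missile dataset, and the second test_size of 0.25 and the random_state value of 31 are assumptions chosen to produce the 60/20/20 proportions.

```python
import numpy as np
from sklearn.model_selection import train_test_split

# Synthetic stand-in data (an assumption): 100 samples, 2 features, 3 classes.
X = np.arange(200).reshape(100, 2)
y = np.arange(100) % 3

# First split: hold out 20% of the data as the testing set.
# random_state is an arbitrary seed that makes the split reproducible.
X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=31
)

# Second split: hold out 25% of the remaining 80% as the validation set,
# which amounts to 0.25 * 0.8 = 20% of the original data.
X_train, X_val, y_train, y_val = train_test_split(
    X_train, y_train, test_size=0.25, random_state=31
)

# Check the resulting sizes: 60 training, 20 validation, 20 testing samples.
print(len(X_train), len(X_val), len(X_test))
```

With 100 samples, the lengths come out to 60, 20, and 20, matching the intended proportions. On a dataset whose size is not a multiple of these fractions, the counts are rounded, so the proportions are only approximate.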
