
Assessment

When a data scientist evaluates the performance of a model or data science process, this is referred to as assessment. Performance can be defined in several ways, including the accuracy of the model's results or its growth of learning, that is, its ability to obtain a better score with additional experience (for example, more rounds of training with additional samples of data).

One popular method of assessing the performance of a model or process is called bootstrap sampling. This method examines performance on certain subsets of the data, repeatedly generating results that can be used to calculate an estimate of accuracy (performance). In the strict statistical sense, these subsets are drawn by sampling from the data with replacement.
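To make the idea of repeatedly generating results concrete, here is a minimal sketch in Python. It assumes we already have a list of per-record outcomes (1 for a correct result, 0 for an incorrect one) and resamples that list with replacement, which is the usual statistical meaning of the bootstrap. The function name `bootstrap_accuracy`, its parameters, and the toy outcomes are hypothetical and chosen only for illustration.

```python
import random
import statistics


def bootstrap_accuracy(outcomes, n_rounds=1000, seed=0):
    """Estimate accuracy by resampling outcomes with replacement.

    outcomes: list of 1 (correct) or 0 (incorrect), one per record.
    Returns the mean of the resampled accuracies and a rough
    95% percentile interval (lower, upper).
    """
    if not outcomes:
        raise ValueError("outcomes must not be empty")

    rng = random.Random(seed)  # fixed seed so the run is repeatable
    n = len(outcomes)
    estimates = []
    for _ in range(n_rounds):
        # Draw n records with replacement and score this resample.
        resample = rng.choices(outcomes, k=n)
        estimates.append(sum(resample) / n)

    estimates.sort()
    lower = estimates[int(0.025 * (n_rounds - 1))]
    upper = estimates[int(0.975 * (n_rounds - 1))]
    return statistics.mean(estimates), (lower, upper)


# Hypothetical toy data: 80 correct results and 20 incorrect ones.
outcomes = [1] * 80 + [0] * 20
mean_acc, (low, high) = bootstrap_accuracy(outcomes)
print(f"estimated accuracy: {mean_acc:.3f} (95% interval {low:.2f} to {high:.2f})")
```

The spread of the interval gives a sense of how much the accuracy estimate would move if a different sample of data had been collected.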

The bootstrap sampling method takes a random sample of data and splits it into three files: a training file, a testing file, and a validation file. The model or process logic is developed based on the data in the training file and then evaluated (or tested) using the testing file. This tune-and-then-test process is repeated until the data scientist is comfortable with the results of the tests. At that point, the model or process is tested once more, this time using the validation file, which it has never seen; the results should therefore provide a realistic indication of how it will perform on new data.
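The three-file workflow just described can be sketched as follows. This is only an illustration under stated assumptions: the 60/20/20 split, the helper names `split_three_way` and `accuracy`, the threshold "model", the target score, and the toy records are all hypothetical and do not come from any particular library.

```python
import random


def split_three_way(records, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle records and split them into training, testing, and validation lists."""
    if not 0 < train_frac + test_frac < 1:
        raise ValueError("train_frac + test_frac must be between 0 and 1")
    shuffled = list(records)
    random.Random(seed).shuffle(shuffled)  # random order, repeatable via seed
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]  # whatever remains
    return train, test, validation


def accuracy(threshold, rows):
    """Share of rows where the rule 'value >= threshold' matches the label."""
    if not rows:
        raise ValueError("rows must not be empty")
    hits = sum((value >= threshold) == label for value, label in rows)
    return hits / len(rows)


# Hypothetical toy data: the label is True when the value is at least 50.
records = [(value, value >= 50) for value in range(100)]
train, test, validation = split_three_way(records)

# Tune on the training file, then test on the testing file, and repeat
# with a finer grid of candidate thresholds until the test score is acceptable.
target = 0.95  # assumed "comfortable" score
for step in (25, 10, 5, 1):
    candidates = range(0, 101, step)
    threshold = max(candidates, key=lambda t: accuracy(t, train))
    test_score = accuracy(threshold, test)
    if test_score >= target:
        break

# One final check on the validation file, which was never used above.
final_score = accuracy(threshold, validation)
print(f"threshold={threshold}, test={test_score:.2f}, validation={final_score:.2f}")
```

Keeping the validation file out of the tuning loop is the point of the design: because no decision was based on it, its score is not inflated by repeated adjustment.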

You can imagine using the bootstrap sampling method to develop program logic: you analyze test data to determine the logic flows and then run (or test) your logic against the test data file. Once you are satisfied that your logic handles all of the conditions and exceptions found in your testing data, you can run a final validation test on a new, never-before-seen data file.