
Assessment

When a data scientist evaluates the performance of a model or data science process, this is referred to as assessment. Performance can be defined in several ways, including the model's growth of learning, that is, its ability to improve (obtain a better score) with additional experience (for example, more rounds of training with additional samples of data), or the accuracy of its results.

One popular method of assessing the performance of a model or process is called bootstrap sampling. This method examines performance on certain subsets of data, repeatedly generating results that can be used to calculate an estimate of accuracy (performance).
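The idea of repeatedly scoring subsets to estimate accuracy can be illustrated with a minimal sketch. It assumes the common statistical form of the bootstrap, in which records are resampled with replacement; the function name `bootstrap_accuracy`, the parameter `n_rounds`, and the sample outcomes are all hypothetical and chosen only for illustration.

```python
import random
import statistics


def bootstrap_accuracy(outcomes, n_rounds=1000, seed=0):
    """Estimate accuracy and its spread by resampling per-record outcomes.

    outcomes: list of 1 (correct prediction) or 0 (incorrect prediction).
    Assumption: resampling is done with replacement, as in the usual bootstrap.
    """
    rng = random.Random(seed)
    n = len(outcomes)
    estimates = []
    for _ in range(n_rounds):
        # Draw n records with replacement and score that resample.
        resample = [outcomes[rng.randrange(n)] for _ in range(n)]
        estimates.append(sum(resample) / n)
    # The mean is the accuracy estimate; the standard deviation shows how
    # much that estimate varies from one resample to the next.
    return statistics.mean(estimates), statistics.stdev(estimates)


# Hypothetical results: 80 correct and 20 incorrect predictions.
outcomes = [1] * 80 + [0] * 20
mean_acc, spread = bootstrap_accuracy(outcomes)
print(f"estimated accuracy: {mean_acc:.3f} (+/- {spread:.3f})")
```

The spread across rounds is what makes the repetition useful: a single score gives no sense of how much the estimate depends on which records happened to be sampled.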

The bootstrap sampling method takes a random sample of data and splits it into three files: a training file, a testing file, and a validation file. The model or process logic is developed based on the data in the training file and then evaluated (or tested) using the testing file. This tune-and-then-test process is repeated until the data scientist is comfortable with the results of the tests. At that point, the model or process is tested again, this time using the validation file, which it has not seen before, and the results should provide a realistic indication of how it will perform.
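The three-file split described above can be sketched as follows. This is only an illustration under stated assumptions: the 60/20/20 proportions, the function name `three_way_split`, and the integer records are hypothetical, and the split is done in memory rather than written to files.

```python
import random


def three_way_split(records, train_frac=0.6, test_frac=0.2, seed=0):
    """Shuffle records and split them into training, testing, and validation sets.

    The fractions are assumptions for illustration; whatever remains after
    the training and testing portions becomes the validation set.
    """
    rng = random.Random(seed)
    shuffled = list(records)  # copy so the caller's data is left untouched
    rng.shuffle(shuffled)
    n_train = round(len(shuffled) * train_frac)
    n_test = round(len(shuffled) * test_frac)
    train = shuffled[:n_train]
    test = shuffled[n_train:n_train + n_test]
    validation = shuffled[n_train + n_test:]
    return train, test, validation


# Hypothetical data set of 100 records.
records = list(range(100))
train, test, validation = three_way_split(records)
print(len(train), len(test), len(validation))
```

Because the three slices come from a single shuffled copy, no record appears in more than one set, which is what keeps the final check on the validation set independent of the tuning done against the testing set.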

You can imagine using the bootstrap sampling method to develop program logic by analyzing test data to determine logic flows and then running (or testing) your logic against the test data file. Once you are satisfied that your logic handles all of the conditions and exceptions found in your testing data, you can run a final validation test on a new, never-before-seen data file.