
Validating/testing

Software engineers are familiar with testing and debugging software source code, but how should ML models be tested? Pieces of algorithms and data input/output routines can be unit tested, but often it is unclear how to ensure that the ML model itself, which presents as a black box, is correct.

The first step to ensuring correctness and sufficient accuracy of an ML model is validation. This means applying the model to predict or classify the validation data subset, and measuring the resulting accuracy against project objectives. Because the training data subset was already seen by the algorithm, it cannot be used to validate correctness, as the model could suffer from poor generalizability (also known as overfitting). To take a nonsensical example, imagine an ML model that consists of a hash map that memorizes each input sample and maps it to the corresponding training output sample. The model would have 100% accuracy on the training data subset, which it has memorized, but very low accuracy on any other data subset, and therefore it would not solve the problem it was intended for. Validation guards against this phenomenon by measuring accuracy on data the model has not seen.
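The hash map example can be made concrete with a short sketch. Everything here is hypothetical and chosen only for illustration: the `MemorizingModel` class, the toy labelling rule `x % 10`, the fallback label `0`, and the 800/200 split are assumptions, not part of any real project.

```python
class MemorizingModel:
    """Hypothetical 'model' that stores every training pair in a dict."""

    def __init__(self, default_label=0):
        self.table = {}
        # Assumption: unseen inputs fall back to a fixed default label.
        self.default_label = default_label

    def fit(self, inputs, labels):
        # "Training" is pure memorization of input -> output pairs.
        for x, y in zip(inputs, labels):
            self.table[x] = y

    def predict(self, x):
        return self.table.get(x, self.default_label)


def accuracy(model, inputs, labels):
    """Fraction of samples for which the prediction equals the label."""
    correct = sum(model.predict(x) == y for x, y in zip(inputs, labels))
    return correct / len(labels)


def true_label(x):
    # Toy ground truth (assumed): ten classes derived from the input.
    return x % 10


# Disjoint training and validation subsets.
train_x = list(range(0, 800))
val_x = list(range(800, 1000))
train_y = [true_label(x) for x in train_x]
val_y = [true_label(x) for x in val_x]

model = MemorizingModel()
model.fit(train_x, train_y)

train_acc = accuracy(model, train_x, train_y)
val_acc = accuracy(model, val_x, val_y)
print(f"training accuracy:   {train_acc:.2f}")
print(f"validation accuracy: {val_acc:.2f}")
```

Under these assumptions the training accuracy is perfect, while the validation accuracy is no better than always guessing the default label, which is exactly the gap that validation is meant to expose.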

In addition, it is a good idea to validate model outputs against user acceptance criteria. For example, if you are building a recommender system for TV series, you may wish to ensure that the recommendations made to children are never rated PG-13 or higher. Rather than trying to encode this rule into the model, which will always have a non-zero failure rate, it is better to enforce the constraint in the application itself, because the cost of violating it would be too high. Such constraints and business rules should be captured at the start of the project.
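One way to enforce such a rule in the application, rather than in the model, is a filter applied to the model's output before anything is shown to the user. The following is a minimal sketch only: the `Series` type, the `RATING_ORDER` list, and the function names are hypothetical, and a real system would use whatever rating scheme and viewer profile data it actually has.

```python
from dataclasses import dataclass

# Assumed ordering of ratings, from least to most restrictive.
RATING_ORDER = ["G", "PG", "PG-13", "R", "NC-17"]


@dataclass(frozen=True)
class Series:
    title: str
    rating: str


def is_allowed_for_children(series):
    """Return True only for ratings strictly below PG-13."""
    # Fail closed: an unknown rating is treated as not allowed.
    if series.rating not in RATING_ORDER:
        return False
    return RATING_ORDER.index(series.rating) < RATING_ORDER.index("PG-13")


def filter_recommendations(recommendations, viewer_is_child):
    """Apply the business rule to the model's output, whatever it produced."""
    if not viewer_is_child:
        return list(recommendations)
    return [s for s in recommendations if is_allowed_for_children(s)]


# Stand-in for the recommender model's raw output (made-up titles).
model_output = [
    Series("Cartoon Friends", "G"),
    Series("Space Detectives", "PG-13"),
    Series("Family Quiz Night", "PG"),
    Series("Unrated Import", "unknown"),
]

safe = filter_recommendations(model_output, viewer_is_child=True)
print([s.title for s in safe])
```

Because the filter runs after the model, the constraint holds regardless of how the model behaves, and it can be unit tested like any other application code.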
