
Model monitoring and feedback

It is critically important to monitor the performance of our machine learning system in production. Once we deploy our trained, tuned model, we want to understand how it is doing in the "wild". Is it performing as we expect on new, unseen data? Is its accuracy good enough? The reality is that, regardless of how much model selection and tuning we do in the earlier phases, the only way to measure true performance is to observe what happens in our production system.
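One simple way to observe this is to keep a running measure of accuracy over the most recent predictions as their true outcomes arrive, and raise a flag when it drifts below an acceptable level. The following is a minimal sketch of such a monitor; the class name, window size, and alert threshold are illustrative choices, not part of any particular framework:

```python
from collections import deque

class AccuracyMonitor:
    """Tracks accuracy over a sliding window of recent predictions
    and flags when it drops below a threshold (illustrative sketch)."""

    def __init__(self, window_size=1000, alert_threshold=0.75):
        # Each entry is 1 for a correct prediction, 0 for an incorrect one;
        # the deque discards the oldest entry once the window is full.
        self.window = deque(maxlen=window_size)
        self.alert_threshold = alert_threshold

    def record(self, prediction, actual):
        self.window.append(1 if prediction == actual else 0)

    def accuracy(self):
        # None until at least one outcome has been observed
        return sum(self.window) / len(self.window) if self.window else None

    def needs_attention(self):
        acc = self.accuracy()
        return acc is not None and acc < self.alert_threshold
```

In practice, we would feed `record` from the stream of production predictions joined with the eventual ground-truth outcomes, and route `needs_attention` into whatever alerting system we use.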

In addition to models created in batch mode, we can also build models with Spark Streaming, which are trained and updated in real time.

Also, bear in mind that model accuracy and predictive performance are only one aspect of a real-world system. Usually, we are also concerned with metrics related to business performance (for example, revenue and profitability) or user experience (such as the time spent on our site and how active our users are overall). In most cases, we cannot easily map model-predictive performance to these business metrics. The accuracy of a recommendation or targeting system might be important, but it relates only indirectly to the true metrics we care about, namely, whether we are improving user experience, activity, and, ultimately, revenue.

So, in real-world systems, we should monitor both model-accuracy metrics and business metrics. If possible, we should be able to experiment with different models running in production, which allows us to optimize against these business metrics by making changes to the models. This is often done using live split tests. However, doing this correctly is not an easy task, and live testing and experimentation is expensive in the sense that mistakes, poor performance, and serving baseline models (which provide a control against which we test our production models) can negatively impact user experience and revenue.
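The mechanics of a split test can be sketched as two pieces: deterministic assignment of each user to a control or treatment bucket (so a given user always sees the same model), and a comparison of the business metric between the buckets. The function names, salt, and revenue-per-user metric below are hypothetical illustrations:

```python
import hashlib

def assign_bucket(user_id, treatment_fraction=0.5, salt="experiment-1"):
    """Deterministically map a user to 'control' or 'treatment' by hashing
    the user ID with an experiment-specific salt (illustrative sketch)."""
    digest = hashlib.sha256(f"{salt}:{user_id}".encode()).hexdigest()
    return "treatment" if int(digest, 16) % 100 < treatment_fraction * 100 else "control"

def summarize(metrics_by_bucket):
    """Compare the mean of a business metric (for example, revenue per user)
    across buckets and report the relative lift of treatment over control."""
    means = {bucket: sum(values) / len(values)
             for bucket, values in metrics_by_bucket.items()}
    lift = (means["treatment"] - means["control"]) / means["control"]
    return means, lift
```

A real system would also need a significance test on the observed lift before acting on it; the salted hash matters because re-randomizing users between sessions would contaminate both buckets.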

Another important aspect of this phase is model feedback. This is the process where the predictions of our model feed through into user behavior; this, in turn, feeds through into our model. In a real-world system, our models are essentially influencing their own future training data by impacting decision-making and potential user behavior.

For example, if we have deployed a recommendation system, then, by making recommendations, we might be influencing user behavior, because we are only allowing users a limited selection of choices. We hope that this selection is relevant to our users; however, this feedback loop can, in turn, influence our model's training data, which then feeds back into real-world performance. It is possible to get into an ever-narrowing feedback loop; ultimately, this can negatively affect both model accuracy and our important business metrics.

Fortunately, there are mechanisms by which we can try to limit the potential negative impact of this feedback loop. These include providing some unbiased training data by having a small portion of data coming from users who are not exposed to our models or by being principled in the way we balance exploration, to learn more about our data, and exploitation, to use what we have learned to improve our system's performance.
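A classic way to make this exploration-exploitation balance principled is an epsilon-greedy strategy: with a small probability, serve a randomly chosen model variant (or recommendation) to gather unbiased feedback, and otherwise serve the variant that currently looks best. A minimal sketch, with an illustrative 10% exploration rate:

```python
import random

def epsilon_greedy(estimated_rewards, epsilon=0.1, rng=random):
    """Choose an index into estimated_rewards: with probability epsilon,
    explore by picking uniformly at random; otherwise exploit by picking
    the option with the highest current reward estimate."""
    if rng.random() < epsilon:
        return rng.randrange(len(estimated_rewards))  # explore
    # exploit: index of the maximum estimated reward
    return max(range(len(estimated_rewards)),
               key=estimated_rewards.__getitem__)
```

The exploration traffic plays the same role as the unbiased holdout described above: it keeps generating training data that is not filtered through the model's own previous choices, at a small, controlled cost to short-term performance.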

We will briefly cover these topics in Chapter 11, Real-time Machine Learning with Spark Streaming.
