Bucketization

Bucketing input data is an important concept to understand in ML. The job-level parameter bucket_span controls how input data from the datafeed (described next) is collected into mini batches for processing. Think of the bucket span as a pre-analysis aggregation interval: the window of time over which a portion of the data is aggregated for the purposes of analysis. The shorter the bucket_span, the more granular the analysis, but also the higher the potential for noisy artifacts in the data.
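To make the placement of the parameter concrete, here is a minimal sketch of a job definition written as a Python dictionary. The key names analysis_config, bucket_span, detectors, and data_description follow the Elastic ML job configuration format; the field names responsetime and timestamp, the description, and the 15m value are assumptions chosen only for illustration.

```python
import json

# Illustrative job body. The data field names ("responsetime", "timestamp")
# and the 15-minute span are hypothetical, not taken from the text.
job_config = {
    "description": "Illustrative job with a 15-minute bucket span",
    "analysis_config": {
        # Job-level aggregation interval applied before analysis.
        "bucket_span": "15m",
        "detectors": [
            {"function": "mean", "field_name": "responsetime"}
        ],
    },
    "data_description": {"time_field": "timestamp"},
}

# Serialize to JSON, the form in which a job body would be submitted.
print(json.dumps(job_config, indent=2))
```

Note that bucket_span sits inside analysis_config and so applies to the job as a whole rather than to a single detector.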

The following graph shows the same dataset aggregated over three different intervals:

Aggregations of the same data over three different time intervals

Notice that the prominent anomalous spike seen in the version aggregated over the 5-minute interval is all but lost when the data is aggregated over a 60-minute interval, because the spike is so short (<2 minutes). In fact, at the 60-minute interval, the spike doesn't even seem that anomalous anymore.
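The dilution effect described above can be reproduced with a small sketch. It assumes a synthetic per-minute series (a flat baseline of 10 with a single one-minute spike to 100) and uses the mean as the aggregation function; both are assumptions for illustration and are not the data or function behind the graph. The 15-minute middle interval is likewise an arbitrary choice.

```python
from statistics import mean


def bucketize(values, span):
    """Average consecutive samples into buckets of `span` samples each."""
    return [mean(values[i:i + span]) for i in range(0, len(values), span)]


# Hypothetical per-minute series: two hours at a baseline of 10,
# with a one-minute spike to 100 at minute 62.
series = [10.0] * 120
series[62] = 100.0

for span in (5, 15, 60):
    peak = max(bucketize(series, span))
    print(f"{span:>2}-minute buckets: peak mean = {peak:.1f}")
```

With these assumed values the peak bucket mean falls from 28.0 at 5 minutes to 16.0 at 15 minutes and 11.5 at 60 minutes, so the spike moves steadily closer to the baseline of 10 as the bucket grows.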

This is a practical consideration when choosing bucket_span. On one hand, a shorter aggregation period is helpful because it increases the frequency of the analysis (and thus shortens the delay before you are notified of something anomalous), but making it too short may highlight features in the data that you don't really care about. If the brief spike shown in the preceding graph is a meaningful anomaly for you, then the 5-minute view of the data is appropriate. If, however, a very brief perturbation of the data seems like an unnecessary distraction, then avoid a low value of bucket_span.

Some additional practical considerations can be found on Elastic's blog: https://www.elastic.co/blog/explaining-the-bucket-span-in-machine-learning-for-elasticsearch.