
  • Mastering MongoDB 3.x
  • Alex Giamas
  • 2021-08-20 10:10:56

Troubleshooting MapReduce

Over the years, one of the major shortcomings of MapReduce frameworks has been how difficult they are to troubleshoot compared with simpler, non-distributed patterns. Most of the time, the most effective tool is debugging with log statements to verify that output values match the values we expect. Because the mongo shell is a JavaScript shell, this is as simple as printing values with the print() or printjson() functions.

Diving deeper into MapReduce in MongoDB, we can debug both the map and the reduce phases by intercepting and inspecting their output values.

To debug the mapper phase, we can replace the emit() function with our own version to see which key-value pairs the mapper outputs:

> var emit = function(key, value) {
    print("debugging mapper's emit");
    print("key: " + key + " value: " + tojson(value));
}

We can then call it manually on a single document to verify that we get back the key-value pair that we would expect:

> var myDoc = db.orders.findOne( { _id: ObjectId("50a8240b927d5d8b5891743c") } );
> mapper.apply(myDoc);
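
As a self-contained illustration of the steps above, the following sketch records the emitted pairs in an array instead of printing them, which makes the mapper's output easy to inspect afterwards. The mapper function, the emitted array, and the sample document with its cust_id and price fields are hypothetical stand-ins, not taken from the orders collection used above. The code is plain JavaScript, so it should also run outside the mongo shell.

```javascript
// Hypothetical mapper: emits the customer id as key and the order price as value.
// During MapReduce, `this` is bound to the document being processed.
var mapper = function () {
  emit(this.cust_id, this.price);
};

// Replacement emit() that records every key-value pair instead of
// handing it to the reduce phase.
var emitted = [];
var emit = function (key, value) {
  emitted.push({ key: key, value: value });
};

// Assumed sample document; in the shell this would come from findOne().
var sampleDoc = { _id: 1, cust_id: "abc123", price: 25 };

// apply() binds `this` to the document, as MapReduce does.
mapper.apply(sampleDoc);
```

After the call, emitted holds one entry per emit() invocation, so a mapper that emits several pairs per document can be checked in the same way.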

The reducer function is somewhat more complicated. A MapReduce reducer function must meet the following criteria:

  • It must be idempotent
  • It must be associative
  • The order of values coming from the mapper function should not matter for the reducer's result
  • The reduce function must return the same type of result as the mapper function

Let's dissect these requirements to understand what they really mean:

  • It must be idempotent: MapReduce by design may call the reducer multiple times for the same key, feeding the result of an earlier call back in together with further values from the mapper phase. MongoDB also does not call the reducer for a key that has only a single value; that value is passed through to the output as is. Re-reducing an already reduced value must therefore leave it unchanged. This can be verified by writing our own "verifier" function that forces the reducer to re-reduce, or by executing the reducer many times:
reduce( key, [ reduce(key, valuesArray) ] ) == reduce( key, valuesArray )
  • It must be associative: Again, because the reducer may be invoked multiple times for a key that has multiple values, combining a partial result with new values must give the same result as reducing all of the values at once:
reduce(key, [ C, reduce(key, [ A, B ]) ] ) == reduce( key, [ C, A, B ] )
  • The order of values coming from the mapper function should not matter for the reducer's result (it must be commutative): We can test this by passing the same values to the reducer in a different order and verifying that we get the same result:
reduce( key, [ A, B ] ) == reduce( key, [ B, A ] )
  • The reduce function must return the same type of result as the mapper function: Hand in hand with the first requirement, the type of object that the reduce function returns must be the same as the type of the value emitted by the mapper function, because the reducer's output may be fed back to it as an input.
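
The checks above can be sketched as small helper functions. This is a minimal sketch, assuming a hypothetical reducer named sumReducer that sums numeric values. The helper names (sameResult, isIdempotent, isAssociative, isCommutative, returnsSameType) are illustrative and not part of MongoDB. Results are compared with JSON.stringify, which is adequate for the simple values used here but is sensitive to the order of object keys.

```javascript
// Hypothetical reducer: sums the numeric values emitted for a key.
function sumReducer(key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
}

// Structural comparison; sufficient for numbers and simple objects.
function sameResult(a, b) {
  return JSON.stringify(a) === JSON.stringify(b);
}

// Idempotence: re-reducing an already reduced value must not change it.
function isIdempotent(reduce, key, values) {
  return sameResult(reduce(key, [reduce(key, values)]), reduce(key, values));
}

// Associativity: a partial result may be combined with raw values.
function isAssociative(reduce, key, a, b, c) {
  return sameResult(reduce(key, [c, reduce(key, [a, b])]), reduce(key, [c, a, b]));
}

// Commutativity: the order of the values must not matter.
function isCommutative(reduce, key, a, b) {
  return sameResult(reduce(key, [a, b]), reduce(key, [b, a]));
}

// Type consistency: the result has the same type as an emitted value.
function returnsSameType(reduce, key, values) {
  return typeof reduce(key, values) === typeof values[0];
}

// Run every check against the hypothetical reducer with sample values.
var checks = {
  idempotent: isIdempotent(sumReducer, "abc123", [25, 70, 5]),
  associative: isAssociative(sumReducer, "abc123", 25, 70, 5),
  commutative: isCommutative(sumReducer, "abc123", 25, 70),
  sameType: returnsSameType(sumReducer, "abc123", [25, 70, 5])
};
```

A reducer that returns the plain average of its inputs would fail the associativity check, because averaging a partial average with a raw value weights the values unevenly. Such a reducer usually needs to carry a running sum and count instead.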