- Mastering MongoDB 3.x
- Alex Giamas
- 468 words
- 2021-08-20 10:10:56
Troubleshooting MapReduce
Over the years, one of the major shortcomings of MapReduce frameworks has been that they are inherently harder to troubleshoot than simpler, non-distributed patterns. Most of the time, the most effective tool is debugging with log statements to verify that output values match the values we expect. Because the mongo shell is a JavaScript shell, this is as simple as printing with the print() or printjson() functions.
Diving deeper into MapReduce in MongoDB, we can debug both the map and the reduce phases by intercepting their output values.
To debug the mapper phase, we can override the emit() function to print the key-value pairs it is called with:
> var emit = function(key, value) {
    print("debugging mapper's emit");
    print("key: " + key + " value: " + tojson(value));
}
We can then call the mapper manually on a single document to verify that it emits the key-value pair we expect:
> var myDoc = db.orders.findOne( { _id: ObjectId("50a8240b927d5d8b5891743c") } );
> mapper.apply(myDoc);
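The same idea can be tried without a database connection. The following is a minimal sketch in plain JavaScript, not the book's own code: the mapper body, the field names cust_id and price, and the sample document are all assumptions made for illustration. Instead of printing, the overridden emit() collects the pairs in an array so they can be inspected afterwards.

```javascript
// Hypothetical mapper: emits the customer id and the order price.
// The field names cust_id and price are assumptions for illustration.
var mapper = function () {
  emit(this.cust_id, this.price);
};

// Override emit() to record each key-value pair instead of printing it.
var emitted = [];
var emit = function (key, value) {
  emitted.push({ key: key, value: value });
};

// Stand-in for a document that db.orders.findOne(...) might return.
var myDoc = { _id: 1, cust_id: "abc123", price: 25 };

// apply() binds `this` to the document, which is how the mapper
// sees each input document during a MapReduce run.
mapper.apply(myDoc);
```

After the call, emitted holds one entry with key "abc123" and value 25. Collecting pairs rather than printing them makes it easier to compare the mapper's output against expected values in a script.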
The reducer function is somewhat more complicated. A MapReduce reducer function must meet the following criteria:
- It must be idempotent
- It must be associative
- The order of values coming from the mapper function should not matter for the reducer's result (it must be commutative)
- The reduce function must return the same type of result as the mapper function
We will dissect each of these requirements to understand what it really means:
- It must be idempotent: MapReduce by design may call the reducer multiple times for the same key, feeding the output of one call back in as an input value to a later call. It also doesn't call the reducer for a key that has only a single value, since there is nothing to combine. Re-reducing an already reduced value must therefore leave it unchanged, no matter how many times this happens. This can be verified by writing our own "verifier" function that forces the reducer to re-reduce, or by executing the reducer many times:
reduce( key, [ reduce(key, valuesArray) ] ) == reduce( key, valuesArray )
- It must be associative: Again, because the reducer may be invoked multiple times for the same key when that key has multiple values, combining a partial result with further values must give the same result as reducing all of the values at once:
reduce(key, [ C, reduce(key, [ A, B ]) ] ) == reduce( key, [ C, A, B ] )
- The order of values coming from the mapper function should not matter for the reducer's result (it must be commutative): We can test that the order of values from the mapper doesn't change the output of the reducer by passing the values in a different order and verifying that we get the same result:
reduce( key, [ A, B ] ) == reduce( key, [ B, A ] )
- The reduce function must return the same type of result as the mapper function: Hand in hand with the first requirement, because the reducer's output may be fed back to it as an input value, the type of object that the reduce function returns should be the same as the type of the value emitted by the mapper function.
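The three equalities above, plus the type requirement, can be bundled into a small verifier. The sketch below is plain JavaScript under stated assumptions: reducer, badReducer, checkReducer, and the sample values are hypothetical names and data, not part of MongoDB or of the original text. The check labelled associative is the equality that mixes a partial result with a further value, and the one labelled commutative is the equality that swaps the order of values.

```javascript
// Hypothetical reducer: sums the numeric values emitted for a key.
var reducer = function (key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total;
};

// Runs each check against sample values A, B, C and reports which hold.
// Uses ===, so it only suits reducers that return primitives; reducers
// that return objects would need a deep comparison instead.
var checkReducer = function (reduce, key, A, B, C) {
  return {
    idempotent:
      reduce(key, [reduce(key, [A, B, C])]) === reduce(key, [A, B, C]),
    associative:
      reduce(key, [C, reduce(key, [A, B])]) === reduce(key, [C, A, B]),
    commutative: reduce(key, [A, B]) === reduce(key, [B, A]),
    sameType: typeof reduce(key, [A, B]) === typeof A
  };
};

var sumResult = checkReducer(reducer, "abc123", 25, 70, 5);

// Counter-example: a naive average gives a different answer when a
// partial result is reduced again together with further values.
var badReducer = function (key, values) {
  var total = 0;
  for (var i = 0; i < values.length; i++) {
    total += values[i];
  }
  return total / values.length;
};

var avgResult = checkReducer(badReducer, "abc123", 25, 70, 5);
```

With these sample values, every field of sumResult is true, while avgResult.associative is false: averaging 5 with the partial average of 25 and 70 is not the same as averaging all three values. A common way around this is to have the reducer carry a sum and a count and compute the average only at the end.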