- Hadoop Beginner's Guide
- Garry Turkington
Time for action – fixing WordCount to work with a combiner
Let's make the necessary modifications to WordCount to correctly use a combiner.
Copy WordCount2.java to a new file called WordCount3.java and change the reduce method as follows:
    public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException {
        int total = 0;
        for (IntWritable val : values) {
            total += val.get();
        }
        context.write(key, new IntWritable(total));
    }
Remember to also change the class name to WordCount3
and then compile, create the JAR file, and run the job as before.
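For reference, the driver must register the combiner on the job for it to run at all. The following is a sketch of that wiring, not the book's exact listing; the class names WordCount3Map and WordCount3Reduce are assumptions, so adapt them to whatever your mapper and reducer classes are called.

```java
// Hypothetical driver sketch; WordCount3Map and WordCount3Reduce are
// assumed class names standing in for your own mapper and reducer.
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.IntWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class WordCount3 {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "WordCount3");
        job.setJarByClass(WordCount3.class);
        job.setMapperClass(WordCount3Map.class);
        // The reducer doubles as the combiner. This is safe only because
        // its summing logic gives the same result no matter how the
        // values are grouped into partial batches.
        job.setCombinerClass(WordCount3Reduce.class);
        job.setReducerClass(WordCount3Reduce.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(IntWritable.class);
        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```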
What just happened?
The output is now as expected. Any map-side invocations of the combiner complete successfully, and the reducer correctly produces the overall output value.
Tip
Would this have worked if the original reducer had been used as the combiner and the new reduce implementation as the reducer? The answer is no, though our test example would not have demonstrated it. Because the combiner may be invoked multiple times on the map output data, the same errors would have arisen on a large enough dataset; they simply did not occur here because the input was so small. Fundamentally, the original reducer was incorrect, but this wasn't immediately obvious; watch out for such subtle logic flaws. This sort of issue can be really hard to debug, as the code will reliably work on a development box with a subset of the dataset and then fail on the much larger operational cluster. Carefully craft your combiner classes and never rely on testing that processes only a small sample of the data.
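The requirement behind this tip is that a combiner-safe operation must give the same answer however the values are split into partial batches. The following small plain-Java simulation (not Hadoop code, and not from the book) contrasts a sum, which is safe, with a mean, which is not:

```java
import java.util.Arrays;

// Simulates a combiner being applied to partial groups of one key's
// values: summing survives regrouping, but a mean does not.
public class CombinerSafety {
    public static double sum(double[] v) {
        return Arrays.stream(v).sum();
    }

    public static double mean(double[] v) {
        return Arrays.stream(v).average().orElse(0);
    }

    public static void main(String[] args) {
        double[] map1 = {1, 2, 3};      // values for a key from map task 1
        double[] map2 = {10};           // values for the same key from map task 2
        double[] all  = {1, 2, 3, 10};  // what the reducer would see with no combiner

        // Sum is safe: summing the partial sums matches a single pass.
        System.out.println(sum(all));                                  // 16.0
        System.out.println(sum(new double[]{sum(map1), sum(map2)}));   // 16.0

        // Mean is not: the mean of the partial means is wrong.
        System.out.println(mean(all));                                   // 4.0
        System.out.println(mean(new double[]{mean(map1), mean(map2)}));  // 6.0
    }
}
```

This is exactly why Hadoop is free to run the combiner zero, one, or many times: only operations whose results are unchanged by such regrouping may be used.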
Reuse is your friend
In the previous section we took the existing job class file and made changes to it. This is a small example of a very common Hadoop development workflow: use an existing job file as the starting point for a new one. Even if the actual mapper and reducer logic is very different, it's often a timesaver to start from an existing working job, as this helps you remember all the required elements of the mapper, reducer, and driver implementations.
Pop quiz – MapReduce mechanics
Q1. What do you always have to specify for a MapReduce job?
- The classes for the mapper and reducer.
- The classes for the mapper, reducer, and combiner.
- The classes for the mapper, reducer, partitioner, and combiner.
- None; all classes have default implementations.
Q2. How many times will a combiner be executed?
- At least once.
- Zero or one times.
- Zero, one, or many times.
- It's configurable.
Q3. You have a mapper that for each key produces an integer value and the following set of reduce operations:
- Reducer A: outputs the sum of the set of integer values.
- Reducer B: outputs the maximum of the set of values.
- Reducer C: outputs the mean of the set of values.
- Reducer D: outputs the difference between the largest and smallest values in the set.
Which of these reduce operations could safely be used as a combiner?
- All of them.
- A and B.
- A, B, and D.
- C and D.
- None of them.