- Hadoop Beginner's Guide
- Garry Turkington
Time for action – WordCount the easy way
Let's revisit WordCount, but this time use some of these predefined map and reduce implementations:
- Create a new WordCountPredefined.java file containing the following code:

    import org.apache.hadoop.conf.Configuration;
    import org.apache.hadoop.fs.Path;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Job;
    import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
    import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;
    import org.apache.hadoop.mapreduce.lib.map.TokenCounterMapper;
    import org.apache.hadoop.mapreduce.lib.reduce.IntSumReducer;

    public class WordCountPredefined
    {
        public static void main(String[] args) throws Exception
        {
            Configuration conf = new Configuration();
            Job job = new Job(conf, "word count1");
            job.setJarByClass(WordCountPredefined.class);
            job.setMapperClass(TokenCounterMapper.class);
            job.setReducerClass(IntSumReducer.class);
            job.setOutputKeyClass(Text.class);
            job.setOutputValueClass(IntWritable.class);
            FileInputFormat.addInputPath(job, new Path(args[0]));
            FileOutputFormat.setOutputPath(job, new Path(args[1]));
            System.exit(job.waitForCompletion(true) ? 0 : 1);
        }
    }
- Now compile, create the JAR file, and run it as before.
- Don't forget to delete the output directory before running the job if you want to reuse the same location; use hadoop fs -rmr output, for example. The sketch after this list shows the full command sequence.
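The exact commands depend on your installation, but the end-to-end sequence looks roughly like the following sketch; the JAR name (wcp.jar) and the input and output directory names are placeholders, and older installs may need to put the Hadoop core JAR on the classpath directly instead of using hadoop classpath:

    # Compile against the Hadoop libraries
    javac -classpath $(hadoop classpath) WordCountPredefined.java

    # Package the compiled classes into a JAR
    jar cvf wcp.jar WordCountPredefined*.class

    # Remove any previous output directory, then submit the job
    hadoop fs -rmr output
    hadoop jar wcp.jar WordCountPredefined input output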
What just happened?
Given the ubiquity of WordCount as an example in the MapReduce world, it's perhaps not entirely surprising that there are predefined Mapper and Reducer implementations that together realize the entire WordCount solution. The TokenCounterMapper class simply breaks each input line into a series of (token, 1) pairs, and the IntSumReducer class provides a final count by summing the number of values for each key.
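For intuition, here is a minimal hand-written equivalent of what these two classes do. This is a sketch rather than the library source; the class names EquivalentTokenMapper and EquivalentSumReducer are invented for illustration:

    import java.io.IOException;
    import java.util.StringTokenizer;
    import org.apache.hadoop.io.IntWritable;
    import org.apache.hadoop.io.Text;
    import org.apache.hadoop.mapreduce.Mapper;
    import org.apache.hadoop.mapreduce.Reducer;

    // Roughly what TokenCounterMapper does: split each input line into
    // whitespace-delimited tokens and emit a (token, 1) pair for each one
    class EquivalentTokenMapper extends Mapper<Object, Text, Text, IntWritable>
    {
        private final static IntWritable one = new IntWritable(1);
        private Text word = new Text();

        public void map(Object key, Text value, Context context)
            throws IOException, InterruptedException
        {
            StringTokenizer itr = new StringTokenizer(value.toString());
            while (itr.hasMoreTokens())
            {
                word.set(itr.nextToken());
                context.write(word, one);
            }
        }
    }

    // Roughly what IntSumReducer does: sum all the values seen for a key
    // and emit the key with its total
    class EquivalentSumReducer extends Reducer<Text, IntWritable, Text, IntWritable>
    {
        private IntWritable result = new IntWritable();

        public void reduce(Text key, Iterable<IntWritable> values, Context context)
            throws IOException, InterruptedException
        {
            int sum = 0;
            for (IntWritable val : values)
            {
                sum += val.get();
            }
            result.set(sum);
            context.write(key, result);
        }
    }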
There are two important things to appreciate here:
- Though WordCount was doubtless an inspiration for these implementations, they are in no way specific to it and can be widely applicable
- This model of reusable mapper and reducer implementations is worth remembering, especially in combination with the fact that the best starting point for a new MapReduce job implementation is often an existing one; the snippet after this list shows one simple way of building on the classes used here
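As a small illustration of that reuse, the same predefined classes can be combined in different ways in the driver. The following sketch (an addition to the driver shown earlier, not part of the chapter's own code) also registers IntSumReducer as a combiner, so partial counts are summed on the map side before being sent over the network; because its input and output types are both (Text, IntWritable), it can be plugged in at either stage unchanged:

    // Reusing the predefined classes, with IntSumReducer doing double duty
    // as both combiner and reducer
    job.setMapperClass(TokenCounterMapper.class);
    job.setCombinerClass(IntSumReducer.class);
    job.setReducerClass(IntSumReducer.class);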