- Mastering Hadoop
- Sandeep Karanth
- 475字
- 2021-08-06 19:53:01
MapReduce job counters
Counters are entities that can collect statistics at a job level. They can help in quality control, performance monitoring, and problem identification in Hadoop MapReduce jobs. Since they are global in nature, unlike logs, they need not be aggregated to be analyzed. Counters are grouped into logical groups using the CounterGroup
class. There are sets of built-in counters for each MapReduce job.
The following example illustrates the creation of simple custom counters to categorize lines into lines having zero words, lines with less than or equal to five words, and lines with more than five words. The program when run on the grant proposal subset files gives the following output:
14/04/13 23:27:00 INFO mapreduce.Job: Counters: 23 File System Counters FILE: Number of bytes read=446021466 FILE: Number of bytes written=114627807 FILE: Number of read operations=0 FILE: Number of large read operations=0 FILE: Number of write operations=0 HDFS: Number of bytes read=535015319 HDFS: Number of bytes written=52267476 HDFS: Number of read operations=391608 HDFS: Number of large read operations=0 HDFS: Number of write operations=195363 Map-Reduce Framework Map input records=27862 Map output records=27862 Input split bytes=56007 Spilled Records=0 Failed Shuffles=0 Merged Map outputs=0 GC time elapsed (ms)=66 Total committed heap usage (bytes)=62037426176 MasteringHadoop.MasteringHadoopCounters$WORDS_IN_LINE_COUNTER LESS_THAN_FIVE_WORDS=8449 MORE_THAN_FIVE_WORDS=19413 ZERO_WORDS=6766 File Input Format Counters Bytes Read=1817707 File Output Format Counters Bytes Written=189102
The first step in creating a counter is to define a Java enum with the names of the counters. The enum
type name is the counter group as shown in the following snippet:
public static enum WORDS_IN_LINE_COUNTER{ ZERO_WORDS, LESS_THAN_FIVE_WORDS, MORE_THAN_FIVE_WORDS };
When a condition is encountered to increment the counter, it can be retrieved by passing the name of the counter to the getCounter()
call in the context object of the task. Counters support an increment()
method call to globally increment the value of the counter.
The getCounter()
method in the context has a couple of other overloads. It can be used to create a dynamic counter by specifying a group and counter name at runtime.
The Mapper
class, as given in the following code snippet, illustrates incrementing the WORDS_IN_LINE_COUNTER
group counters based on the number of words in each sentence of a grant proposal:
public static class MasteringHadoopCountersMap extends Mapper<LongWritable, Text, LongWritable, IntWritable> { private IntWritable countOfWords = new IntWritable(0); @Override protected void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException { StringTokenizer tokenizer = new StringTokenizer(value.toString()); int words = tokenizer.countTokens(); if(words == 0) context.getCounter(WORDS_IN_LINE_COUNTER.ZERO_WORDS).increment(1); if(words > 0 && words <= 5) context.getCounter(WORDS_IN_LINE_COUNTER.LESS_THAN_FIVE_WORDS).increment(1); else context.getCounter(WORDS_IN_LINE_COUNTER.MORE_THAN_FIVE_WORDS).increment(1); countOfWords.set(words); context.write(key, countOfWords); } }
Counters are global variables in a distributed setting and have to be used prudently. The higher the number of counters, the more are the overheads on the framework that keeps track of them. Counters should not be used to aggregate very fine-grained statistics of an application.
- 樂高機器人:WeDo編程與搭建指南
- Dreamweaver CS3+Flash CS3+Fireworks CS3創意網站構建實例詳解
- PostgreSQL 11 Server Side Programming Quick Start Guide
- SCRATCH與機器人
- Linux Mint System Administrator’s Beginner's Guide
- 運動控制器與交流伺服系統的調試和應用
- 21天學通C語言
- 高維聚類知識發現關鍵技術研究及應用
- Grome Terrain Modeling with Ogre3D,UDK,and Unity3D
- 從零開始學SQL Server
- PowerMill 2020五軸數控加工編程應用實例
- 計算智能算法及其生產調度應用
- Hands-On Artificial Intelligence for Beginners
- Apache Spark Machine Learning Blueprints
- 數據庫技術:Access 2003計算機網絡技術