官术网_书友最值得收藏!

Hadoop-specific data types

Up to this point we've glossed over the actual data types used as the input and output of the map and reduce classes. Let's take a look at them now.

The Writable and WritableComparable interfaces

If you browse the Hadoop API for the org.apache.hadoop.io package, you'll see some familiar classes such as Text and IntWritable along with others with the Writable suffix.

This package also contains the Writable interface specified as follows:

import java.io.DataInput ;
import java.io.DataOutput ;
import java.io.IOException ;

public interface Writable
{
void write(DataOutput out) throws IOException ;
void readFields(DataInput in) throws IOException ;
}

The main purpose of this interface is to provide mechanisms for the serialization and deserialization of data as it is passed across the network or read and written from the disk. Every data type to be used as a value input or output from a mapper or reducer (that is, V1, V2, or V3) must implement this interface.

Data to be used as keys (K1, K2, K3) has a stricter requirement: in addition to Writable,it must also provide an implementation of the standard Java Comparable interface. This has the following specifications:

public interface Comparable
{
public int compareTO( Object obj) ;
}

The compare method returns -1, 0, or 1 depending on whether the compared object is less than, equal to, or greater than the current object.

As a convenience interface, Hadoop provides the WritableComparable interface in the org.apache.hadoop.io package.

public interface WritableComparable extends Writable, Comparable
{}

Introducing the wrapper classes

Fortunately, you don't have to start from scratch; as you've already seen, Hadoop provides classes that wrap the Java primitive types and implement WritableComparable. They are provided in the org.apache.hadoop.io package.

Primitive wrapper classes

These classes are conceptually similar to the primitive wrapper classes, such as Integer and Long found in java.lang. They hold a single primitive value that can be set either at construction or via a setter method.

  • BooleanWritable
  • ByteWritable
  • DoubleWritable
  • FloatWritable
  • IntWritable
  • LongWritable
  • VIntWritable – a variable length integer type
  • VLongWritable – a variable length long type

Array wrapper classes

These classes provide writable wrappers for arrays of other Writable objects. For example, an instance of either could hold an array of IntWritable or DoubleWritable but not arrays of the raw int or float types. A specific subclass for the required Writable class will be required. They are as follows:

  • ArrayWritable
  • TwoDArrayWritable

Map wrapper classes

These classes allow implementations of the java.util.Map interface to be used as keys or values. Note that they are defined as Map<Writable, Writable> and effectively manage a degree of internal-runtime-type checking. This does mean that compile type checking is weakened, so be careful.

  • AbstractMapWritable: This is a base class for other concrete Writable map implementations
  • MapWritable: This is a general purpose map mapping Writable keys to Writable values
  • SortedMapWritable: This is a specialization of the MapWritable class that also implements the SortedMap interface
主站蜘蛛池模板: 南宫市| 夏邑县| 西宁市| 乐山市| 大足县| 新丰县| 商丘市| 连江县| 蓬莱市| 体育| 盐津县| 读书| 中牟县| 新密市| 临漳县| 登封市| 南郑县| 石狮市| 洞口县| 卢龙县| 北碚区| 巩留县| 鹤岗市| 池州市| 桐梓县| 栾城县| 桂平市| 聂拉木县| 阜平县| 锦州市| 永修县| 罗平县| 临潭县| 庆安县| 石狮市| 临江市| 高邮市| 禹城市| 五华县| 古丈县| 十堰市|