Chapter 3. Advanced Pig

Running Java MapReduce jobs on Hadoop provides the most flexibility with the least abstraction. However, abstractions are necessary to infer patterns, accomplish common data manipulation tasks, reduce complexity, and flatten the learning curve. Pig is a platform that provides a framework and high-level abstractions to build MapReduce programs for Hadoop. It has a scripting language called Pig Latin. Pig Latin can be compared to SQL in terms of operator capabilities.

Developed at Yahoo! around 2006, Pig was used as a framework to specify ad hoc MapReduce workflows. In the following year, it was moved to the Apache Software Foundation. At the time of writing, the latest release of Pig is 0.12.1.

Tip

The official release of Pig is currently incompatible with Hadoop 2.2.0 because it expects libraries from Hadoop 1.2.1. Running any Pig script fails with the following exception:

Unexpected System Error Occured: java.lang.IncompatibleClassChangeError: Found interface org.apache.hadoop.mapreduce.JobContext, but class was expected.

Fixing this requires recompiling the Pig binaries. Run the following command, and then replace the existing pig.jar and pig-withouthadoop.jar files with the newly generated ones:

ant clean jar-all -Dhadoopversion=23
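The replacement step can be scripted. The following is a minimal sketch, not part of the Pig distribution: the function name replace_pig_jars and both directory arguments are hypothetical, and it assumes the rebuilt jars are written to the top level of the Pig source tree where ant was run. It backs up each existing jar before overwriting it.

```shell
# Sketch only: replace_pig_jars and its directory arguments are hypothetical.
# Copies the rebuilt jars from a Pig source tree into a Pig installation,
# keeping a .bak copy of each jar that gets overwritten.
replace_pig_jars() {
  src_dir="$1"    # assumed: Pig source tree where the ant build was run
  dest_dir="$2"   # assumed: Pig installation directory holding the old jars
  for jar in pig.jar pig-withouthadoop.jar; do
    if [ ! -f "$src_dir/$jar" ]; then
      echo "missing $src_dir/$jar; did the build succeed?" >&2
      return 1
    fi
    # Back up the existing jar, if there is one, before replacing it.
    if [ -f "$dest_dir/$jar" ]; then
      cp "$dest_dir/$jar" "$dest_dir/$jar.bak" || return 1
    fi
    cp "$src_dir/$jar" "$dest_dir/$jar" || return 1
  done
}

# Example with hypothetical paths, run after the build:
#   (cd "$HOME/pig-src" && ant clean jar-all -Dhadoopversion=23)
#   replace_pig_jars "$HOME/pig-src" "$HOME/pig"
```

Keeping the .bak copies makes it easy to restore the original jars if the rebuilt ones need to be rolled back.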

In this chapter, we will look at the advanced features of Pig by:

  • Looking at how Pig is different when compared to SQL
  • Analyzing how Pig Latin scripts are translated to MapReduce programs
  • Delving into the advanced relational operators that Pig supports and looking at their applications with examples
  • Studying ways to extend Pig beyond its off-the-shelf capabilities using user-defined functions (UDFs), which can implement a variety of interfaces; we will examine some of these interfaces