Saving compressed data in HDFS

In this recipe, we are going to take a look at how to store and process compressed data in HDFS.

Getting ready

To perform this recipe, you should already have a running Hadoop cluster.

How to do it...

It's always good to use compression while storing data in HDFS. HDFS supports various compression algorithms, such as LZO, BZIP2, Snappy, and GZIP. Each algorithm has its own pros and cons when you consider the time taken to compress and decompress versus the space saved. These days, many people prefer Snappy, as it aims for very high speed with a reasonable compression ratio.
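You can get a feel for this space/time trade-off on your local machine with the standard `gzip` and `bzip2` command-line tools (a sketch assuming both tools are installed; Snappy has no standard CLI, so it is not shown here):

```shell
# Create a sample text file with repetitive content, similar to log data
yes "the quick brown fox jumps over the lazy dog" | head -n 1000 > sample.txt

# Compress with GZIP and BZIP2, keeping the original for comparison
gzip -c sample.txt > sample.txt.gz
bzip2 -c sample.txt > sample.txt.bz2

# Compare the sizes: both are far smaller than the original
ls -l sample.txt sample.txt.gz sample.txt.bz2
```

BZIP2 typically compresses better but runs slower than GZIP; Snappy is faster than both at a lower compression ratio.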

We can store and process any number of compressed files in HDFS without making any changes to the cluster configuration. You simply copy the compressed data into HDFS the same way you would copy any other file. Here is an example:

hadoop fs -mkdir /compressed
hadoop fs -put file.bz2 /compressed

Now, we'll run a sample program to see how Hadoop automatically decompresses the file and processes it:

hadoop jar /usr/local/hadoop/share/hadoop/mapreduce/hadoop-mapreduce-examples-2.7.0.jar wordcount /compressed /compressed_out

Once the job is complete, you can verify the output.
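For instance, you can list the output directory and print the first few word counts (a sketch assuming the wordcount job above has finished writing to `/compressed_out`; the check is guarded so it is a no-op on machines without Hadoop on the PATH):

```shell
# Inspect the job output, only if the hadoop command is available
if command -v hadoop >/dev/null 2>&1; then
  # An empty _SUCCESS file in the directory marks a completed job
  hadoop fs -ls /compressed_out
  # Print the first few lines of the word counts
  hadoop fs -cat /compressed_out/part-r-00000 | head
  checked=hdfs
else
  echo "hadoop not found on PATH; skipping"
  checked=skipped
fi
```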

How it works...

Hadoop uses native libraries to provide support for the various compression codecs and their implementations. Native libraries are specific to the platform you run Hadoop on. You don't need to make any configuration changes to read compressed input: Hadoop selects the codec based on the file extension (for example, .bz2 or .gz) and decompresses the data transparently. Based on your needs and requirements (more space or more speed), you can choose your compression algorithm.
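Compressing MapReduce output, on the other hand, is opt-in and driven by configuration. A minimal sketch for mapred-site.xml (property names are from Hadoop 2.x; using SnappyCodec assumes the Snappy native library is available, which you can confirm with `hadoop checknative -a`):

```xml
<!-- mapred-site.xml: compress intermediate map output and final job output -->
<property>
  <name>mapreduce.map.output.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.map.output.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress</name>
  <value>true</value>
</property>
<property>
  <name>mapreduce.output.fileoutputformat.compress.codec</name>
  <value>org.apache.hadoop.io.compress.SnappyCodec</value>
</property>
```

Compressing map output reduces the data shuffled between nodes, which is often where compression pays off the most.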

Take a look at http://comphadoop.weebly.com/ for more information on this.

Tip

Downloading the example code

You can download the example code files for this book from your account at http://www.packtpub.com. If you purchased this book elsewhere, you can visit http://www.packtpub.com/support and register to have the files e-mailed directly to you.

You can download the code files by following these steps:

  • Log in or register to our website using your e-mail address and password.
  • Hover the mouse pointer on the SUPPORT tab at the top.
  • Click on Code Downloads & Errata.
  • Enter the name of the book in the Search box.
  • Select the book for which you're looking to download the code files.
  • Choose from the drop-down menu where you purchased this book from.
  • Click on Code Download.

Once the file is downloaded, please make sure that you unzip or extract the folder using the latest version of:

  • WinRAR / 7-Zip for Windows
  • Zipeg / iZip / UnRarX for Mac
  • 7-Zip / PeaZip for Linux