
Time for action – checking the prerequisites

Hadoop is written in Java, so you will need a recent Java Development Kit (JDK) installed on the Ubuntu host. Perform the following steps to check the prerequisites:

  1. First, check what's already available by opening up a terminal and typing the following:
    $ javac
    $ java -version
    
  2. If either of these commands gives a "no such file or directory" or similar error, or if the latter mentions OpenJDK, it's likely that you need to download the full JDK. Grab this from the Oracle download page at http://www.oracle.com/technetwork/java/javase/downloads/index.html; you should get the latest release.
  3. Once Java is installed, add the JDK's bin directory to your path and set the JAVA_HOME environment variable with commands such as the following, modified for your specific Java version (a quick check to verify the result follows these steps):
    $ export JAVA_HOME=/opt/jdk1.6.0_24
    $ export PATH=$JAVA_HOME/bin:${PATH}
    
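A quick way to verify the result is to check which java binary the shell now resolves and the version it reports. Assuming the example path used above, the output should look something like the following (your version details will differ):

$ echo $JAVA_HOME
/opt/jdk1.6.0_24
$ which java
/opt/jdk1.6.0_24/bin/java
$ java -version
java version "1.6.0_24"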

What just happened?

These steps ensure the right version of Java is installed and available from the command line without having to use lengthy pathnames to refer to the install location.

Remember that the preceding commands only affect the currently running shell and the settings will be lost after you log out, close the shell, or reboot. To ensure the same setup is always available, you can add these to the startup files for your shell of choice, within the .bash_profile file for the BASH shell or the .cshrc file for TCSH, for example.
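For example, a minimal way to make the settings permanent under BASH (again assuming the example JDK path used earlier) is to append the same two export lines to .bash_profile; the quoted EOF marker ensures the lines are written to the file literally, without being expanded first:

$ cat >> ~/.bash_profile << 'EOF'
# Java environment required by Hadoop
export JAVA_HOME=/opt/jdk1.6.0_24
export PATH=$JAVA_HOME/bin:${PATH}
EOF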

An alternative favored by me is to put all required configuration settings into a standalone file and then explicitly call this from the command line; for example:

$ source Hadoop_config.sh

This technique allows you to keep multiple setup files in the same account without making the shell startup overly complex; not to mention, the required configurations for several applications may actually be incompatible. Just remember to begin by loading the file at the start of each session!
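As a minimal sketch, such a file (here using the Hadoop_config.sh name from the example above, with the JDK path again being an assumption to adjust for your system) need contain nothing more than the environment settings already discussed:

$ cat Hadoop_config.sh
# Java environment required by Hadoop; adjust the path for your installation
export JAVA_HOME=/opt/jdk1.6.0_24
export PATH=$JAVA_HOME/bin:${PATH}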

Setting up Hadoop

One of the most confusing aspects of Hadoop to a newcomer is its various components, projects, sub-projects, and their interrelationships. The fact that these have evolved over time hasn't made the task of understanding it all any easier. For now, though, go to http://hadoop.apache.org and you'll see that there are three prominent projects mentioned:

  • Common
  • HDFS
  • MapReduce

The last two of these should be familiar from the explanation in Chapter 1, What It's All About, while Common comprises a set of libraries and tools that help the Hadoop product work in the real world. For now, the important thing is that the standard Hadoop distribution bundles the latest versions of all three of these projects, and the combination is what you need to get going.

A note on versions

Hadoop underwent a major change in the transition from the 0.19 to the 0.20 versions, most notably with a migration to a set of new APIs used to develop MapReduce applications. We will primarily use the new APIs in this book, though we do include a few examples of the older API in later chapters, as not all of the existing features have been ported to the new API.

Hadoop versioning also became complicated when the 0.20 branch was renamed to 1.0. The 0.22 and 0.23 branches remained, and in fact included features not present in the 1.0 branch. At the time of this writing, things were becoming clearer, with the 1.1 and 2.0 branches being used for future development releases. As most existing systems and third-party tools are built against the 0.20 branch, we will use Hadoop 1.0 for the examples in this book.
