Planning and Setting Up Hadoop Clusters

In the last chapter, we looked at big data problems and the history of Hadoop, along with an overview of big data, the Hadoop architecture, and commercial offerings. This chapter focuses on hands-on, practical knowledge of how to set up Hadoop in different configurations. Apache Hadoop can be set up in the following three configurations:

Developer mode: Developer mode is used to run programs in a standalone manner. This arrangement does not require any Hadoop daemon processes, and JAR files can be run directly. This mode is useful when developers wish to debug their MapReduce code.
Pseudo cluster (single node Hadoop): A pseudo cluster is a single-node cluster with capabilities similar to those of a standard cluster; it is used for developing and testing programs before they are deployed on a production cluster. Pseudo clusters give each developer an independent environment for coding and testing.
Cluster mode: This mode is the real Hadoop cluster, in which you set up multiple Hadoop nodes across your production environment. This is the mode to use for solving your big data problems at production scale.
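To make the difference between the three modes more concrete, the following is a minimal sketch of the kind of configuration overrides each one typically involves. It is not taken from this chapter: the function names, the mode keys, and the NameNode address namenode.example.com:8020 are hypothetical, and the pseudo cluster values (hdfs://localhost:9000 and a replication factor of 1) are assumed from the commonly documented single-node setup. Developer mode is shown with no overrides because, by default, Hadoop works against the local file system.

```python
import xml.etree.ElementTree as ET

# Hypothetical NameNode address for the multi-node case; replace with your own.
CLUSTER_NAMENODE = "hdfs://namenode.example.com:8020"


def site_properties(mode):
    """Return {config file name: {property: value}} for a given mode."""
    if mode == "developer":
        # Assumption: the defaults point at the local file system,
        # so no daemons and no overrides are needed.
        return {"core-site.xml": {}, "hdfs-site.xml": {}}
    if mode == "pseudo":
        # Assumed values from the usual single-node setup: one local
        # NameNode and a single copy of each block.
        return {
            "core-site.xml": {"fs.defaultFS": "hdfs://localhost:9000"},
            "hdfs-site.xml": {"dfs.replication": "1"},
        }
    if mode == "cluster":
        # Illustrative values only: a remote NameNode and three replicas.
        return {
            "core-site.xml": {"fs.defaultFS": CLUSTER_NAMENODE},
            "hdfs-site.xml": {"dfs.replication": "3"},
        }
    raise ValueError(f"unknown mode: {mode}")


def render_site_xml(properties):
    """Render properties in the <configuration>/<property> layout."""
    root = ET.Element("configuration")
    for name, value in properties.items():
        prop = ET.SubElement(root, "property")
        ET.SubElement(prop, "name").text = name
        ET.SubElement(prop, "value").text = value
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    for mode in ("developer", "pseudo", "cluster"):
        for filename, props in site_properties(mode).items():
            print(mode, filename, render_site_xml(props))
```

Running the script prints one XML fragment per mode and file, which makes it easy to see that the pseudo cluster and cluster mode differ mainly in where the file system lives and how many replicas are kept.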

This chapter will focus on setting up a new Hadoop cluster. The standard cluster is the one used in production and staging environments. In many cases, it can also be scaled down and used for development to ensure that programs can run across clusters, handle failover, and so on. In this chapter, we will cover the following topics:

Prerequisites for Hadoop
Running Hadoop in developer mode
Setting up a pseudo Hadoop cluster
Sizing the cluster
Setting up Hadoop in cluster mode
Diagnosing the Hadoop cluster


Apache Hadoop 3 Quick Start Guide
