
Introduction

Hadoop has been the primary platform for many people who deal with big data problems; it sits at the heart of big data. Hadoop's origins date back to 2003 and 2004, when Google published its research papers on the Google File System (GFS) and MapReduce. Hadoop was designed around the core ideas in these papers, and they shaped its architecture. With the growth of the Internet and social media, people gradually realized the power that Hadoop offered, and it soon became the leading platform for handling big data. Thanks to a great deal of hard work from dedicated contributors and open source groups, Hadoop 1.0 was released, and the IT industry welcomed it with open arms.

A lot of companies started using Hadoop as the primary platform for their data warehousing and Extract-Transform-Load (ETL) needs. They deployed thousands of nodes in their Hadoop clusters and ran into scalability limits once clusters grew beyond roughly 4,000 nodes. This was because the JobTracker could not handle that many TaskTrackers, and there was also a need for high availability to ensure that clusters were reliable to use. These limitations gave birth to Hadoop 2.0.

In this introductory chapter, we are going to cover interesting recipes such as installing a single-node or multi-node Hadoop 2.0 cluster, benchmarking it, adding new nodes to an existing cluster, and so on. So, let's get started.
