- Hadoop Real-World Solutions Cookbook(Second Edition)
- Tanmay Deshpande
Introduction
Hadoop has been the primary platform for many people dealing with big data problems; it sits at the heart of the big data ecosystem. Hadoop originated in 2003–2004, when Google published its research papers on the Google File System (GFS) and MapReduce. Hadoop was structured around the core ideas of these papers and derived its shape from them. With the growth of the Internet and social media, people gradually came to realize the power of Hadoop, and it soon became the leading platform for handling big data. Thanks to the hard work of dedicated contributors and open source groups on the project, Hadoop 1.0 was released, and the IT industry welcomed it with open arms.
Many companies adopted Hadoop as the primary platform for their data warehousing and Extract-Transform-Load (ETL) needs. As they deployed thousands of nodes in a cluster, they ran into scalability limits beyond roughly 4,000 nodes: the single JobTracker could not manage that many TaskTrackers. There was also a need for high availability to make clusters reliable to use. These limitations gave birth to Hadoop 2.0.
In this introductory chapter, we are going to work through recipes such as installing single-node and multi-node Hadoop 2.0 clusters, benchmarking them, adding new nodes to an existing cluster, and so on. So, let's get started.