
Summary

In this chapter, we traced the evolution of Hadoop through some of its milestones and releases, and looked in depth at Hadoop 2.X and the changes it brings. The key takeaways from this chapter are:

  • MapReduce was born out of the necessity to gather, process, and index data at web scale. Apache Hadoop is an open source implementation of the MapReduce computational model.
  • In its more than six years of existence, Hadoop has become the number one choice as a framework for massively parallel and distributed computing. The community has been shaping Hadoop to gear it up for enterprise use. In the 1.X releases, HDFS append and security were the key features that made Hadoop enterprise-friendly.
  • MapReduce supports a limited set of use cases. Onboarding other paradigms into Hadoop enables support for a wider range of analytics and can also increase cluster resource utilization. In Hadoop 2.X, the JobTracker's functions are split up, and YARN handles cluster resource management and scheduling. MapReduce is one of the applications that can run on YARN.
  • Hadoop's storage layer was enhanced in 2.X to separate the filesystem from the block storage service. This enables features such as support for multiple namespaces and integration with other filesystems. Hadoop 2.X also improves storage availability and snapshotting.
  • Distributions of Hadoop provide enterprise-grade management software, tools, support, training, and services. Most distributions closely track Apache Hadoop in their capabilities.

MapReduce is still an integral part of Hadoop's DNA. In the next chapter, we will explore MapReduce optimizations and best practices.
