
  • Mastering Hadoop
  • Sandeep Karanth
  • 2021-08-06 19:52:59

Summary

In this chapter, we traced the evolution of Hadoop through some of its milestones and releases. We went into depth on Hadoop 2.X and the changes it brings to Hadoop. The key takeaways from this chapter are:

  • MapReduce was born out of the need to gather, process, and index data at web scale. Apache Hadoop is an open-source implementation of the MapReduce computational model.
  • In over six years of existence, Hadoop has become the number one choice of framework for massively parallel and distributed computing. The community has been shaping Hadoop for enterprise use. In the 1.X releases, HDFS append and security were the key features that made Hadoop enterprise-friendly.
  • MapReduce supports only a limited set of use cases. Bringing other processing paradigms onto Hadoop enables a wider range of analytics and can also increase cluster resource utilization. In Hadoop 2.X, the JobTracker's responsibilities are split up, and YARN takes over cluster resource management and scheduling. MapReduce becomes just one of the applications that can run on YARN.
  • Hadoop's storage layer was enhanced in 2.X to separate the filesystem namespace from the block storage service. This enables features such as multiple namespaces and integration with other filesystems. 2.X also improves storage availability and adds support for HDFS snapshots.
  • Distributions of Hadoop provide enterprise-grade management software, tools, support, training, and services. Most distributions closely track Apache Hadoop in their capabilities.
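The MapReduce computational model mentioned in the first takeaway is easiest to see in miniature. The following is a minimal, single-process Python sketch of the map, shuffle, and reduce phases applied to word counting. It does not use any Hadoop API, and the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, chosen only for illustration.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Emit a (word, 1) pair for every whitespace-separated token."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Group intermediate values by key, as the framework does between map and reduce."""
    grouped: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Combine all values for one key into a single result."""
    return key, sum(values)


def word_count(documents: Iterable[str]) -> Dict[str, int]:
    # On a real cluster, each document (input split) would go to its own map task.
    intermediate = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(intermediate)
    # On a real cluster, keys would be partitioned across reduce tasks; here we reduce sequentially.
    return dict(reduce_phase(k, v) for k, v in sorted(grouped.items()))


if __name__ == "__main__":
    docs = ["Hadoop runs MapReduce", "YARN runs MapReduce and more"]
    print(word_count(docs))
    # {'and': 1, 'hadoop': 1, 'mapreduce': 2, 'more': 1, 'runs': 2, 'yarn': 1}
```

The sketch only shows the data flow. In Hadoop, map and reduce tasks run in parallel across many nodes, and the shuffle moves intermediate pairs over the network between them.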

MapReduce is still an integral part of Hadoop's DNA. In the next chapter, we will explore MapReduce optimizations and best practices.
