
Components of Apache Hadoop

Apache Hadoop is composed of two core components. They are:

  • HDFS: HDFS is the storage component of Apache Hadoop, designed and developed to handle large files efficiently. It is a distributed filesystem that works across a cluster: it stores large files by splitting them into blocks and distributing those blocks redundantly across multiple nodes. Users of HDFS need not worry about the underlying networking details, as HDFS takes care of them. HDFS is written in Java and runs in user space.
  • MapReduce: MapReduce is a programming model that draws on ideas from functional programming and distributed computing. A MapReduce job is broken down into two phases: map and reduce. All data in MapReduce flows as key-value pairs, <key, value>. Mappers emit key-value pairs; reducers receive them, process them, and produce the final result. This model was built specifically to query and process the large volumes of data stored in HDFS.
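To make the block-splitting idea concrete, here is a small Python sketch (illustrative only, not part of the HDFS API) that computes how a file would be divided, assuming HDFS's default block size of 128 MB and default replication factor of 3:

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # HDFS default block size (128 MB)
REPLICATION = 3                 # HDFS default replication factor

def hdfs_block_layout(file_size_bytes):
    """Return (number of blocks, total block replicas) HDFS would store."""
    blocks = math.ceil(file_size_bytes / BLOCK_SIZE)
    return blocks, blocks * REPLICATION

# A 1 GB file splits into 8 blocks; with 3-way replication,
# 24 block replicas are spread across the cluster's nodes.
blocks, replicas = hdfs_block_layout(1024 * 1024 * 1024)
print(blocks, replicas)  # 8 24
```

Because each block is replicated on several nodes, the loss of any single node does not lose data, which is how HDFS achieves fault tolerance on commodity hardware.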
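The key-value flow described above can be sketched in plain Python (simulating the model in a single process, not using the Hadoop API) with word count, the canonical MapReduce example:

```python
from collections import defaultdict

def mapper(line):
    """Map phase: emit a <word, 1> pair for every word in the line."""
    for word in line.split():
        yield (word.lower(), 1)

def reducer(word, counts):
    """Reduce phase: sum all the counts emitted for one word."""
    return (word, sum(counts))

def map_reduce(lines):
    """Simulate the shuffle: group mapper output by key, then reduce."""
    grouped = defaultdict(list)
    for line in lines:
        for key, value in mapper(line):
            grouped[key].append(value)
    return dict(reducer(k, v) for k, v in sorted(grouped.items()))

print(map_reduce(["the quick fox", "the lazy dog"]))
# {'dog': 1, 'fox': 1, 'lazy': 1, 'quick': 1, 'the': 2}
```

In a real Hadoop job the mappers and reducers run in parallel on different nodes, and the framework performs the grouping (shuffle and sort) between the two phases; the data flow, however, is exactly the one shown here.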

We will cover HDFS and MapReduce in depth in the next chapter.
