- Cloudera Administration Handbook
- Rohit Menon
- 2021-12-08 12:36:34
Components of Apache Hadoop
Apache Hadoop is composed of two core components. They are:
- HDFS: HDFS (the Hadoop Distributed File System) is the storage component of Apache Hadoop, designed and developed to handle large files efficiently. It is a distributed filesystem that runs on a cluster: it splits each file into blocks and stores redundant copies of those blocks across multiple nodes. Users of HDFS need not worry about the underlying networking, because HDFS handles it for them. HDFS is written in Java and runs in user space.
- MapReduce: MapReduce is a programming model that draws on ideas from functional programming and distributed computing. In MapReduce, a task is broken down into two parts: map and reduce. All data in MapReduce flows as key-value pairs, <key, value>. Mappers emit key-value pairs; reducers receive them, process them, and produce the final result. This model was built specifically to query and process the large volumes of data stored in HDFS.
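The map and reduce flow described above can be sketched with a small word count. This is a single-process illustration in plain Python, not the Hadoop API: the function names (map_words, shuffle, reduce_counts, run_job) are hypothetical, and the shuffle step stands in for the grouping by key that a MapReduce framework performs between the two phases.

```python
from collections import defaultdict


def map_words(line):
    """Mapper: emit a <word, 1> pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle(pairs):
    """Group values by key (assumed stand-in for the framework's shuffle)."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_counts(key, values):
    """Reducer: collapse all values for one key into a single <key, total> pair."""
    return key, sum(values)


def run_job(lines):
    """Run map, shuffle, and reduce in sequence over an in-memory list of lines."""
    pairs = [pair for line in lines for pair in map_words(line)]
    grouped = shuffle(pairs)
    return dict(reduce_counts(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    print(run_job(["big data", "big cluster"]))
```

On a real cluster the mappers and reducers would run in parallel on different nodes; the sketch only shows how <key, value> pairs move from one phase to the next.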
We will be going through HDFS and MapReduce in depth in the next chapter.
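Until then, the block splitting and redundant placement described for HDFS above can be pictured with a toy model. This is a minimal sketch under stated assumptions: the function names, the tiny block size, and the round-robin placement are all illustrative choices and do not reflect the actual HDFS placement policy or its API.

```python
def split_into_blocks(data: bytes, block_size: int):
    """Cut a file's bytes into fixed-size blocks; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    return [data[i:i + block_size] for i in range(0, len(data), block_size)]


def place_replicas(num_blocks: int, nodes, replication: int):
    """Assign each block to `replication` distinct nodes.

    Round-robin placement is an assumption made for illustration only.
    """
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    placement = {}
    for block_id in range(num_blocks):
        placement[block_id] = [
            nodes[(block_id + r) % len(nodes)] for r in range(replication)
        ]
    return placement


if __name__ == "__main__":
    blocks = split_into_blocks(b"abcdefghij", block_size=4)
    layout = place_replicas(len(blocks), ["n1", "n2", "n3", "n4"], replication=2)
    for block_id, holders in layout.items():
        print(block_id, blocks[block_id], holders)
```

Because every block lives on more than one node, losing a single node in this model never loses a block, which is the point of the redundancy the text describes.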