The Hadoop platform

Hadoop serves many purposes, but when you break it down to its core parts, it has two primary components: the Hadoop Distributed File System (HDFS) and MapReduce.

HDFS stores files that are effectively read-only once written (write once, read many) by splitting them into large blocks and distributing and replicating those blocks across a Hadoop cluster. Two services are involved with the filesystem. The first service, the NameNode, acts as a master: it keeps the directory tree of all files in the filesystem and tracks where the blocks of each file are kept across the cluster. The second service, the DataNode, runs on multiple nodes and stores the actual file data.
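The block splitting and replication described above can be illustrated with a small sketch. This is a toy model, not HDFS code: the function name `plan_blocks`, the DataNode names, the 128 MB block size, and the replication factor of 3 are all assumptions chosen for illustration, and the round-robin placement stands in for whatever placement policy a real cluster uses.

```python
import math

def plan_blocks(file_size, block_size, datanodes, replication):
    """Return a NameNode-style map: block index -> DataNodes holding a replica.

    Hypothetical helper for illustration only; real HDFS placement
    follows its own policy and is not round-robin.
    """
    if replication > len(datanodes):
        raise ValueError("replication factor exceeds number of DataNodes")
    # A file is cut into fixed-size blocks; the last block may be smaller.
    num_blocks = math.ceil(file_size / block_size)
    placement = {}
    for block in range(num_blocks):
        # Pick `replication` distinct nodes, starting at a rotating offset.
        placement[block] = [
            datanodes[(block + r) % len(datanodes)] for r in range(replication)
        ]
    return placement

MB = 1024 * 1024
layout = plan_blocks(300 * MB, 128 * MB, ["dn1", "dn2", "dn3", "dn4"], 3)
for block, nodes in layout.items():
    print(block, nodes)
```

In this model, the dictionary plays the role of the metadata the NameNode keeps, while the listed nodes are where the DataNode service would hold each replica. Losing any single node still leaves two copies of every block.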

MapReduce is a programming model for processing large datasets with a parallel, distributed algorithm on a cluster. The most prominent trait of Hadoop is that it brings processing to the data: MapReduce executes tasks as close to the data as possible, rather than moving the data to where the processing is performed. Two services are involved in job execution. A job is submitted to the JobTracker service, which first discovers the location of the data and then orchestrates the execution of the map and reduce tasks. The tasks themselves are executed on multiple TaskTracker nodes.
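To make the programming model concrete, here is a minimal single-process sketch of a word count expressed as map, shuffle, and reduce steps. It does not use the Hadoop API and runs nothing in parallel; the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) and the sample input are assumptions made for illustration.

```python
from collections import defaultdict

def map_phase(line):
    """Emit a (word, 1) pair for every word in one input line."""
    return [(word.lower(), 1) for word in line.split()]

def shuffle(pairs):
    """Group values by key, as the framework does between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped

def reduce_phase(key, values):
    """Combine all values seen for one key into a single result."""
    return key, sum(values)

def word_count(lines):
    # On a cluster, each line (or block) would be mapped by a separate task.
    mapped = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(k, v) for k, v in shuffle(mapped).items())

counts = word_count(["Hadoop stores data", "Hadoop processes data"])
print(counts)
```

Because each call to `map_phase` depends only on its own input line, and each call to `reduce_phase` only on the values for one key, the work can be spread across many nodes, which is what allows tasks to be scheduled near the data they read.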

Hadoop automatically handles infrastructure failures such as network issues and node or disk failures. Overall, it provides a framework for distributed storage, through its distributed file system, and for distributed execution of jobs. Moreover, its ecosystem provides the ZooKeeper service to maintain configuration and distributed synchronization.

Many projects surround Hadoop and complete the ecosystem of available Big Data processing tools, such as utilities to import and export data, NoSQL databases, and event/real-time processing systems. The technologies that move Hadoop beyond batch processing focus on in-memory execution models. Overall, multiple projects exist, covering batch, hybrid, and real-time execution.