- Hadoop 2.x Administration Cookbook
- Gurmukh Singh
- 2021-07-09 20:10:27
Introduction
In this chapter, we look at the storage layer, HDFS, and how to configure it for storing data. It is important to keep this distributed filesystem healthy and to make sure that the data it contains remains available, even when failures occur. We cover replication, quota setup, and the balanced distribution of data across nodes, as well as recipes on rack awareness and the heartbeat used for communication with the master.
The recipes in this chapter assume that you already have a running cluster and have completed the steps given in Chapter 1, Hadoop Architecture and Deployment.
Note
While the recipes in this chapter give you an overview of a typical configuration, we encourage you to adapt them to your needs. The block size plays an important role in both performance and the amount of data that each mapper processes. It is good practice to set up passphraseless SSH access between nodes, so that the user does not need to enter a password when performing operations across nodes.
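To make the link between block size and mapper workload concrete, here is a minimal sketch in Python. It assumes one input split (and so one map task) per HDFS block, which is a simplification: the real split calculation allows some slack on the last split and can be tuned. The function name and the file and block sizes are hypothetical, chosen only for illustration.

```python
MIB = 1024 * 1024  # bytes in one mebibyte


def estimated_map_tasks(file_size_bytes: int, block_size_bytes: int) -> int:
    """Rough number of map tasks, ASSUMING one input split per block."""
    if block_size_bytes <= 0:
        raise ValueError("block size must be positive")
    # Ceiling division: a partially filled last block still needs a mapper.
    return -(-file_size_bytes // block_size_bytes)


if __name__ == "__main__":
    file_size = 1024 * MIB  # a hypothetical 1 GiB input file
    for block_mib in (64, 128, 256):
        tasks = estimated_map_tasks(file_size, block_mib * MIB)
        print(f"{block_mib} MiB blocks -> about {tasks} map tasks")
```

Under this assumption, a larger block size means fewer mappers, each working on more data, while a smaller block size means more mappers with less data each.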
Overview of HDFS
The Hadoop Distributed File System (HDFS) is inspired by the Google File System (GFS). The fundamental idea is to split files into smaller chunks called blocks and distribute them across the nodes in the cluster. HDFS is not the only filesystem that can be used with Hadoop; alternatives include MapR-FS and Isilon, among others.
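The idea of splitting a file into blocks and spreading them across nodes can be sketched in a few lines of Python. This is only an illustration, not how HDFS is implemented: the function names, node names, and the round-robin placement are assumptions made for the example, and the actual HDFS placement policy is more involved (for instance, it takes racks into account).

```python
from typing import Dict, List, Tuple


def split_into_blocks(file_size: int, block_size: int) -> List[Tuple[int, int]]:
    """Return (offset, length) for each block; the last block may be shorter."""
    if block_size <= 0:
        raise ValueError("block size must be positive")
    blocks = []
    offset = 0
    while offset < file_size:
        length = min(block_size, file_size - offset)
        blocks.append((offset, length))
        offset += length
    return blocks


def place_blocks(num_blocks: int, nodes: List[str],
                 replication: int) -> Dict[int, List[str]]:
    """Assign each block to `replication` distinct nodes, round-robin.

    Hypothetical placement for illustration only; NOT the HDFS policy.
    """
    if replication > len(nodes):
        raise ValueError("not enough nodes for the requested replication")
    return {
        block: [nodes[(block + r) % len(nodes)] for r in range(replication)]
        for block in range(num_blocks)
    }


if __name__ == "__main__":
    blocks = split_into_blocks(file_size=300, block_size=128)
    print(blocks)
    print(place_blocks(len(blocks), ["dn1", "dn2", "dn3", "dn4"], replication=2))
```

With a 300-byte file and a 128-byte block size (toy numbers), the file becomes two full blocks and one 44-byte block, and each block is stored on two different nodes.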
HDFS is a pseudo filesystem that is created on top of native filesystems, such as ext3, ext4, or xfs. An important thing to keep in mind is that, to store data in Hadoop, we cannot write directly to these native filesystems; data must be written through HDFS. In this chapter, we will cover recipes to configure the properties of HDFS.