- Hadoop 2.x Administration Cookbook
- Gurmukh Singh
- 2021-07-09 20:10:28
Configuring HDFS replication
For redundancy, it is important to have multiple copies of data. In HDFS, this is achieved by placing copies of blocks on different nodes. By default, the replication factor is 3, which means that for each block written to HDFS, there will be three copies in total on the nodes in the cluster.
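As a rough illustration of what a replication factor of 3 means for storage, the sketch below counts the blocks and block replicas for a single file. The 128 MB block size is the Hadoop 2.x default; it, the 1 GB file size, and the function name replica_summary are assumptions made for this example, not values taken from the recipe.

```shell
#!/usr/bin/env bash
# Sketch: storage cost of one file under a given replication factor.
# replica_summary is a hypothetical helper, not part of Hadoop.
replica_summary() {
  local file_mb=$1      # file size in MB
  local block_mb=$2     # HDFS block size in MB (assumed 128, the 2.x default)
  local replication=$3  # value of dfs.replication

  # Number of blocks, rounding up for a partially filled last block.
  local blocks=$(( (file_mb + block_mb - 1) / block_mb ))
  # Every block is stored 'replication' times across the cluster.
  local replicas=$(( blocks * replication ))
  # Raw disk space consumed by all copies of the file.
  local raw_mb=$(( file_mb * replication ))

  echo "blocks=$blocks replicas=$replicas raw_mb=$raw_mb"
}

# A 1 GB file with the default replication factor of 3.
replica_summary 1024 128 3
# prints: blocks=8 replicas=24 raw_mb=3072
```

With replication set to 2 instead, the same file would occupy 2048 MB of raw disk space, at the cost of tolerating one fewer lost copy per block.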
Before changing the replication factor, make sure that the cluster is healthy and that the user can perform file operations on it.
Getting ready
Log in to any of the nodes in the cluster. It is best to use the edge node, as stated in Chapter 1, and switch to the user hadoop.
Create a simple text file named file1.txt using any text editor, and write some content in it.
How to do it...
- ssh to the Namenode, which in this case is nn1.cluster1.com, and switch to the user hadoop.
- Navigate to the /opt/cluster/hadoop/etc/hadoop directory. This is the configuration directory of the Hadoop installation from Chapter 1, Hadoop Architecture and Deployment. If you installed Hadoop at a different location, navigate to the corresponding directory there instead.
- Configure the dfs.replication parameter in the hdfs-site.xml file. The original recipe shows this configuration in a screenshot.
- Once the changes are made, save the file and make the same changes on all nodes in the cluster.
- Restart the Namenode and Datanode daemons across the cluster. The easiest way of doing this is to use the stop-dfs.sh and start-dfs.sh commands. The original recipe shows the restart in a screenshot.
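The configuration step can be sketched as follows. This is a minimal example, not the author's exact file: it writes a sample hdfs-site.xml containing only the dfs.replication property into a scratch directory (so it does not touch /opt/cluster/hadoop/etc/hadoop) and reads the value back. The value 3 and the helper get_property are assumptions for illustration.

```shell
#!/usr/bin/env bash
# Sketch only: works on a throwaway copy, not on the live configuration.
conf_dir=$(mktemp -d)                 # scratch directory for the sample file
trap 'rm -rf "$conf_dir"' EXIT        # clean up when the script exits
conf_file="$conf_dir/hdfs-site.xml"

# A minimal hdfs-site.xml that sets the replication factor.
cat > "$conf_file" <<'EOF'
<?xml version="1.0"?>
<configuration>
  <property>
    <name>dfs.replication</name>
    <value>3</value>
  </property>
</configuration>
EOF

# Hypothetical helper: print the <value> that follows a given <name>.
# Assumes <name> and <value> sit on separate lines, as in the file above.
get_property() {
  awk -v name="$2" '
    index($0, "<name>" name "</name>") { found = 1; next }
    found && /<value>/ {
      gsub(/.*<value>|<\/value>.*/, "")
      print
      exit
    }
  ' "$1"
}

echo "dfs.replication=$(get_property "$conf_file" dfs.replication)"
# prints: dfs.replication=3

# On the real cluster, after copying the edited file to every node,
# the HDFS daemons would be restarted from the Namenode with:
#   stop-dfs.sh
#   start-dfs.sh
# and the effective value could be checked with:
#   hdfs getconf -confKey dfs.replication
```

A real hdfs-site.xml usually carries other properties as well; only the dfs.replication property block needs to be added or edited.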
How it works...
The dfs.replication parameter is usually the same across the cluster, but it can be set to a different value on individual nodes. The replication factor of a file is defined by the node from which the file is written, that is, by the client's own configuration. For example, if an edge node has replication set to 2, then files copied from that node will have two copies of each block, irrespective of the value on the Namenode.
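The client-side behavior described above can be tried out with the standard hdfs dfs shell. The sketch below is hedged in two ways: the HDFS target path /tmp/file1.txt is an assumption, and the run helper is a hypothetical wrapper that only prints each command unless DRY_RUN=0 is set, so the script is safe to run on a machine without a Hadoop client.

```shell
#!/usr/bin/env bash
# DRY_RUN=1 (the default here) prints each command instead of running it.
# Set DRY_RUN=0 on a node with the Hadoop client to execute them for real.
DRY_RUN=${DRY_RUN:-1}

# Hypothetical wrapper: echo the command in dry-run mode, else execute it.
run() {
  if [ "$DRY_RUN" = "1" ]; then
    echo "+ $*"
  else
    "$@"
  fi
}

# Upload file1.txt with a client-side replication factor of 2. The -D
# option overrides dfs.replication for this command only.
run hdfs dfs -D dfs.replication=2 -put file1.txt /tmp/file1.txt

# Print the replication factor recorded for the file (%r).
run hdfs dfs -stat %r /tmp/file1.txt

# Change the replication factor of the existing file to 3 and wait (-w)
# until the extra replicas have been created.
run hdfs dfs -setrep -w 3 /tmp/file1.txt
```

Changing dfs.replication in hdfs-site.xml only affects files written afterwards; existing files keep the factor they were written with until it is changed with -setrep.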
See also
- The Configuring HDFS block size recipe