Configuring rack awareness

There will always be failures in clusters, such as hardware issues with servers, racks, switches, power supplies, and so on.

To make sure that there is no single point of failure across the entire Hadoop infrastructure, and to ensure that load is distributed rather than concentrated on a single failure domain, rack awareness plays an important role. Rack awareness is a concept in which the Namenode is made aware of the layout of the servers in a cluster, so that it can make intelligent decisions about block placement.

Getting ready

For the following steps, we assume that the cluster is up and running and that the Datanodes are in a healthy state. We will log in to the Namenode and make the changes there.

How to do it...

  1. ssh to Namenode and edit the hdfs-site.xml file to add the following property to it:
    <property>
      <name>topology.script.file.name</name>
      <value>/opt/cluster/topology.sh</value>
    </property>
  2. Make sure that the topology.sh file is readable and executable by the user hadoop.
  3. Create two files, topology.sh and topology.data; a minimal sketch of their contents is shown after this list.
  4. Restart the namenode daemon for the property to take effect:
    $ hadoop-daemon.sh stop namenode
    $ hadoop-daemon.sh start namenode
    
  5. Once the changes are made, the user will start seeing the Rack field in the output of the hdfs dfsadmin -report command; an illustrative fragment of this output is shown after this list.
  6. We can have multiple levels in the topology by specifying hierarchical paths in topology.data:
    $ cat topology.data
    10.0.0.37 /sw1/rack1
    10.0.0.38 /sw1/rack2
    10.0.0.39 /sw2/rack3
    
  7. sw1 and sw2 are rack switches, so the failure of sw1 will cause an outage of both rack1 and rack2. The Namenode will make sure that not all copies of a block are placed only on rack1 and rack2:
    $ hdfs dfsadmin -refreshNodes
    
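A minimal sketch of what topology.data and topology.sh could contain is shown below; the IP addresses, rack names, and the awk-based lookup are illustrative rather than the exact contents of the original files, so adjust them to match your own cluster:

    $ cat /opt/cluster/topology.data
    10.0.0.37 /rack1
    10.0.0.38 /rack1
    10.0.0.39 /rack2

    $ cat /opt/cluster/topology.sh
    #!/bin/bash
    # Resolve each IP address or hostname passed in by the Namenode to a
    # rack path using the lookup table in topology.data; hosts that are
    # not listed fall back to /default-rack.
    DATA_FILE=/opt/cluster/topology.data
    while [ $# -gt 0 ]; do
      node=$1
      shift
      rack=$(awk -v host="$node" '$1 == host {print $2}' "$DATA_FILE")
      if [ -z "$rack" ]; then
        echo -n "/default-rack "
      else
        echo -n "$rack "
      fi
    done
    echo

The script receives one or more addresses as arguments and must print one rack path per address, separated by whitespace, on standard output.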

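The Rack field appears in the per-Datanode section of the report. The fragment below is only an illustration of what that looks like with the sample topology sketched above; the hostname, port, and other values are hypothetical and will differ on a real cluster:

    $ hdfs dfsadmin -report
    ...
    Name: 10.0.0.37:50010 (dn1.cluster.com)
    Hostname: dn1.cluster.com
    Rack: /rack1
    Decommission Status : Normal
    ...
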
How it works...

Let's have a look at what we did throughout this recipe.

In steps 1 through 4, we added the new property to the hdfs-site.xml file, created the topology files, and then restarted the Namenode to make it aware of the changes. Once the property is in place, the Namenode knows about the topology.sh file and will execute it to find the layout of the Datanodes in the cluster.

When the Datanodes register with the Namenode, the Namenode resolves their IP addresses or hostnames and places each node in the rack map accordingly. This mapping is dynamic in nature and is never persisted to disk.
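
As a quick sanity check, the script can be run by hand in the same way the Namenode invokes it, passing one or more addresses and reading back one rack path per address; the addresses below are the illustrative ones from the sketch earlier:

    $ /opt/cluster/topology.sh 10.0.0.37 10.0.0.39
    /rack1 /rack2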

In Hadoop 2, there are multiple implementations of the host-to-rack mapping, such as a simple DNS-based mapping and a table-based mapping, that can be used to resolve hosts in the rack awareness algorithm. The user can implement this with any scripting language or in Java. We do not need to do anything more if we are using a script, as shown in the preceding method, but for Java implementations and tabular formats, we need to modify the topology.node.switch.mapping.impl property.
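
For example, a minimal sketch of a table-based setup on the Namenode would look something like the following; it assumes the Hadoop 2 property names net.topology.node.switch.mapping.impl and net.topology.table.file.name, the built-in org.apache.hadoop.net.TableMapping class, and an illustrative file path:

    <property>
      <name>net.topology.node.switch.mapping.impl</name>
      <value>org.apache.hadoop.net.TableMapping</value>
    </property>
    <property>
      <name>net.topology.table.file.name</name>
      <value>/opt/cluster/topology.data</value>
    </property>

The table file uses the same two-column layout as topology.data shown earlier: a hostname or IP address followed by its rack path, with unlisted hosts mapped to /default-rack.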

To troubleshoot this, there are some common things to look at, such as the permissions on the topology script and the path configured for it; any errors will show up in the Namenode logs.

See also

  • The Adding nodes to the cluster recipe in Chapter 1, Hadoop Architecture and Deployment
  • Chapter 6, Backup and Recovery, on cluster planning