
Adding nodes to the cluster

Over a period of time, the data in our cluster will grow, and there will be a need to increase the capacity of the cluster by adding more nodes.

We can add a Datanode to the cluster in the same way that we configured the first Datanode: install Hadoop on the node and start the Datanode daemon on it. The important thing to keep in mind is that we must control which nodes can become part of the cluster. It should not be the case that anyone can just start a Datanode daemon on their laptop and join the cluster, as that would be disastrous. By default, there is nothing preventing any node from becoming a Datanode; the user just has to untar the Hadoop package, point the core-site.xml file to the Namenode, and start the Datanode daemon.
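To make this concrete, the following is a minimal sketch of what such a node would need; the Namenode hostname nn1.cluster1.com and port 9000 are assumed here purely for illustration (in older Hadoop releases the property is named fs.default.name rather than fs.defaultFS):

    $ cat core-site.xml
    <configuration>
        <property>
            <name>fs.defaultFS</name>
            <value>hdfs://nn1.cluster1.com:9000</value>
        </property>
    </configuration>
    $ hadoop-daemon.sh start datanode

The dfs.hosts property used in this recipe closes this gap by whitelisting exactly which hosts the Namenode will accept.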

Getting ready

For the following steps, we assume that the cluster is up and running with Datanodes in a healthy state, and that we need to add a new Datanode to the cluster. We will log in to the Namenode and make the changes there.

How to do it...

  1. SSH to the Namenode and edit the hdfs-site.xml file to add the following property:
    <property>
        <name>dfs.hosts</name>
        <value>/home/hadoop/includes</value>
        <final>true</final>
    </property>
  2. Make sure the includes file is readable by the user hadoop (see the permissions sketch after these steps).
  3. Restart the Namenode daemon for the property to take effect:
    $ hadoop-daemon.sh stop namenode
    $ hadoop-daemon.sh start namenode
    
  4. A restart of the Namenode is required only when a property is changed in this file. Once the property is in place, the Namenode can pick up changes to the contents of the includes file by simply refreshing the nodes.
  5. Add the dn1.cluster1.com node to the includes file:
    $ cat includes
    dn1.cluster1.com
    
  6. The includes or excludes file can contain a list of multiple nodes, one node per line.
  7. After adding the node to the file, we just need to reload the file by entering the following:
    $ hdfs dfsadmin -refreshNodes
    
  8. After some time, the node will be available in the cluster and will show up in the report:
    $ hdfs dfsadmin -report
    
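The permissions check mentioned in step 2 can be done as follows; this is just a sketch assuming the Namenode runs as the hadoop user and the /home/hadoop/includes path from step 1:

    # Make the includes file owned by and readable to the hadoop user
    $ chown hadoop:hadoop /home/hadoop/includes
    $ chmod 644 /home/hadoop/includes
    # Verify ownership and permissions
    $ ls -l /home/hadoop/includes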

How it works...

The file /home/hadoop/includes contains the list of all the Datanodes that are allowed to join the cluster. If the includes file is blank, then all Datanodes are allowed to join the cluster. If there are both an includes and an excludes file, the lists of nodes in the two files must be mutually exclusive. So, to decommission the node dn1.cluster1.com from the cluster, it must be removed from the includes file and added to the excludes file.
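For example, to decommission dn1.cluster1.com, the sequence would look roughly like the following. This assumes that the dfs.hosts.exclude property has already been set to point to /home/hadoop/excludes; configuring that property is not covered in this recipe:

    # Remove the node from the includes file and add it to the excludes file
    $ sed -i '/dn1.cluster1.com/d' /home/hadoop/includes
    $ echo "dn1.cluster1.com" >> /home/hadoop/excludes
    # Ask the Namenode to re-read both files
    $ hdfs dfsadmin -refreshNodes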

There's more...

In addition to controlling which nodes can join the cluster as described here, production clusters typically have firewall rules in place and separate VLANs to keep Hadoop traffic and data isolated.
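As an illustration only, a firewall rule on the Namenode might restrict its RPC port to the cluster subnet; the port 9000 and subnet 10.0.1.0/24 below are assumed values, not part of this recipe:

    # Allow Namenode RPC only from the cluster subnet, drop everything else
    $ iptables -A INPUT -p tcp --dport 9000 -s 10.0.1.0/24 -j ACCEPT
    $ iptables -A INPUT -p tcp --dport 9000 -j DROP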
