
Adding new nodes to existing Hadoop clusters

Sometimes, an existing Hadoop cluster's capacity is not adequate to handle all the data you want to process. In this case, you can add new nodes to the existing cluster without any downtime. Hadoop supports horizontal scalability.

Getting ready

To perform this recipe, you should have a Hadoop cluster running. You will also need one more machine. If you are using AWS EC2, you can launch an EC2 instance the same way we did in the previous recipes. You will also need the same security group configuration in order to make the installation process smooth.

How to do it...

To add a new node to an existing cluster, simply install and configure Hadoop on it the way we did in the previous recipe. Make sure that you put the same configurations in core-site.xml and yarn-site.xml so that they point to the correct master node, as in the sketch below.
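
As a minimal sketch, assuming the master's hostname is master and HDFS listens on port 9000 (substitute whatever values your master node actually uses), the two properties that point a slave at the master are:

<!-- core-site.xml on the new node -->
<configuration>
  <property>
    <!-- NameNode address; must match the master's core-site.xml -->
    <name>fs.defaultFS</name>
    <value>hdfs://master:9000</value>
  </property>
</configuration>

<!-- yarn-site.xml on the new node -->
<configuration>
  <property>
    <!-- ResourceManager host the NodeManager will report to -->
    <name>yarn.resourcemanager.hostname</name>
    <value>master</value>
  </property>
</configuration>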

Once all the configurations are done, simply execute the following commands to start the newly added DataNode and NodeManager:

/usr/local/hadoop/sbin/hadoop-daemon.sh start datanode
/usr/local/hadoop/sbin/yarn-daemon.sh start nodemanager
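
If your cluster runs a newer Hadoop release (3.x), the per-daemon scripts above are deprecated; the equivalent commands are:

hdfs --daemon start datanode
yarn --daemon start nodemanager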

If you take a look at the cluster again, you will find that the new node is registered. You can use the dfsadmin command to take a look at the number of nodes and the amount of capacity being used:

hdfs dfsadmin -report

Here is an abbreviated, illustrative sample of the output for the preceding command (the hostnames and capacities below are placeholders; your values will differ):
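
Configured Capacity: 42949672960 (40.00 GB)
Present Capacity: 42949672960 (40.00 GB)
DFS Remaining: 41875931136 (39.00 GB)
DFS Used: 1073741824 (1.00 GB)
DFS Used%: 2.50%
Under replicated blocks: 0
Blocks with corrupt replicas: 0
Missing blocks: 0

-------------------------------------------------
Live datanodes (2):

Name: 172.31.1.11:50010 (slave1)
Hostname: slave1
Decommission Status : Normal
Configured Capacity: 21474836480 (20.00 GB)
...

Once the new node has registered, the Live datanodes count increases and the cluster's configured capacity grows accordingly.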


How it works...

Hadoop supports horizontal scalability. If the resources in use are not enough, we can always go ahead and add new nodes to the existing cluster without any hiccups. In Hadoop, it is always the slave that reports to the master; so, while making configurations, we only configure the details of the master and do nothing about the slaves. This architecture helps achieve horizontal scalability: at any point in time, we can add a new node by providing only the master's configuration details, and everything else is taken care of by the Hadoop cluster. As soon as the daemons start, the master node realizes that a new node has been added, and it becomes part of the cluster.
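
We already verified the HDFS side of this handshake with dfsadmin; to verify the YARN side in the same way, you can list the NodeManagers currently registered with the ResourceManager. The newly added node should appear with a RUNNING state:

yarn node -list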
