- Hadoop Real-World Solutions Cookbook (Second Edition)
- Tanmay Deshpande
Adding new nodes to existing Hadoop clusters
Sometimes an existing Hadoop cluster's capacity is not adequate to handle all the data you want to process. In this case, you can add new nodes to the existing cluster without any downtime; Hadoop supports horizontal scalability.
Getting ready
To perform this recipe, you should have a Hadoop cluster running, plus one more machine to add to it. If you are using AWS EC2, launch an instance similar to the ones in the previous recipes, with the same security group configuration so that the installation goes smoothly.
How to do it...
To add a new instance to an existing cluster, install and configure Hadoop on it the way we did in the previous recipe. Make sure that you put the same configurations in core-site.xml and yarn-site.xml so that they point to the correct master node.
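For reference, here is a minimal sketch of the two properties on the new node that point it at the master. The hostname master and the port 9000 are placeholders; they must match what your master node's configuration actually declares:
<!-- core-site.xml: where the namenode runs -->
<property>
  <name>fs.defaultFS</name>
  <value>hdfs://master:9000</value>
</property>
<!-- yarn-site.xml: where the resourcemanager runs -->
<property>
  <name>yarn.resourcemanager.hostname</name>
  <value>master</value>
</property>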
Once all the configurations are done, execute the following commands on the new node to start the newly added datanode and nodemanager:
/usr/local/hadoop/sbin/hadoop-daemon.sh start datanode
/usr/local/hadoop/sbin/yarn-daemon.sh start nodemanager
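A quick way to check that both daemons came up is jps, which lists the running Java processes on the new node; you should see DataNode and NodeManager among them:
jps
If either process is missing, look at the corresponding log file under /usr/local/hadoop/logs/ on the new node.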
If you take a look at the cluster again, you will find that the new node is registered. You can use the dfsadmin command to check the number of nodes and the amount of capacity that's been used:
hdfs dfsadmin -report
The report lists the cluster's configured capacity, the DFS space used and remaining, and an entry for each live datanode, so the newly added node should now appear in the list.
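On a large cluster the full report can get long. Recent Hadoop releases (2.7 onwards) let you filter the report to only live, dead, or decommissioning nodes; if your version supports these flags, the following shows just the live datanodes:
hdfs dfsadmin -report -live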
How it works...
Hadoop supports horizontal scalability. If the existing resources are not enough, we can always go ahead and add new nodes to the cluster without hiccups. In Hadoop, it's always the slave that reports to the master, so while making configurations, we only configure the details of the master and do nothing about the slaves. This architecture helps achieve horizontal scalability: at any point in time, we can add a new node by providing it with nothing more than the master's configuration details, and everything else is taken care of by the Hadoop cluster. As soon as the daemons start, the master node realizes that a new node has been added, and it becomes part of the cluster.
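One practical note, assuming you also use the cluster-wide start-dfs.sh and start-yarn.sh scripts: those scripts only reach the hosts listed in the slaves file on the master (etc/hadoop/slaves in Hadoop 2.x, renamed to workers in Hadoop 3), so add the new node's hostname there as well. Here, newnode is a placeholder for the actual hostname:
echo 'newnode' >> /usr/local/hadoop/etc/hadoop/slaves
Without this entry, the daemons you started by hand keep running, but the node will be skipped the next time the cluster is started from the master.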