- Big Data Analytics
- Venkat Ankam
- 524 words
- 2021-08-20 10:32:24
Starting Spark daemons
If you are planning to use the standalone cluster manager, you need to start the Spark master and worker daemons, which are the core components in Spark's architecture. Starting and stopping daemons varies slightly from distribution to distribution. Hadoop distributions such as Cloudera, Hortonworks, and MapR provide Spark as a service with YARN as the default resource manager, so all Spark applications run on the YARN framework by default. To use Spark's standalone resource manager instead, you need to start the Spark master and worker roles. If you are planning to use the YARN resource manager, you don't need to start these daemons. Follow the procedure below that matches the distribution you are using. Download and installation instructions for all of these distributions can be found in Chapter 2, Getting Started with Apache Hadoop and Apache Spark.
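To illustrate how this choice shows up when submitting an application, here is a minimal shell sketch that prints the value one might pass to the `--master` option of `spark-submit`. The helper name `master_url` is hypothetical, and the default port 7077 is an assumption taken from the `spark://masterhostname:7077` example used later in this section; adjust both to your environment.

```shell
# Hypothetical helper: print the value for spark-submit's --master option.
#   master_url yarn                      prints: yarn
#   master_url standalone <host> [port]  prints: spark://<host>:<port>
master_url() {
  case "$1" in
    yarn)
      # No daemons to start by hand; YARN manages the resources.
      echo "yarn"
      ;;
    standalone)
      if [ -z "${2:-}" ]; then
        echo "usage: master_url standalone <host> [port]" >&2
        return 1
      fi
      # Assumed default: 7077, as in this section's start-slave.sh example.
      echo "spark://$2:${3:-7077}"
      ;;
    *)
      echo "unknown cluster manager: ${1:-}" >&2
      return 1
      ;;
  esac
}

# Example (not run here; app.py is a hypothetical application):
#   spark-submit --master "$(master_url standalone masterhostname)" app.py
```

With `yarn`, nothing else needs to be running beyond the distribution's Spark service. With `standalone`, the master and worker daemons described in this section must already be started.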
Working with CDH
Cloudera Distribution for Hadoop (CDH) is an open source distribution including Hadoop, Spark, and many other projects needed for Big Data Analytics. Cloudera Manager is used for installing and managing the CDH platform. If you are planning to use the YARN resource manager, start the Spark service in Cloudera Manager. To start Spark daemons for Spark's standalone resource manager, use the following procedure:
- Spark on the CDH platform is configured to work with YARN. Moreover, Spark 2.0 is not available on CDH yet. So, download the latest pre-built Spark 2.0 package for Hadoop as explained in Chapter 2, Getting Started with Apache Hadoop and Apache Spark. If you would like to use Spark version 1.6 instead, run the /usr/lib/spark/start-all.sh command.
- Start the service with the following commands:
  cd /home/cloudera/spark-2.0.0-bin-hadoop2.7/sbin
  sudo ./start-all.sh
- Check the Spark UI at http://quickstart.cloudera:8080/.
Working with HDP, MapR, and Spark pre-built packages
Hortonworks Data Platform (HDP) and MapR Converged Data Platform distributions also include Hadoop, Spark, and many other projects needed for Big Data Analytics. While HDP uses Apache Ambari for deploying and managing the cluster, MapR uses the MapR Control System (MCS). Spark's pre-built package has no specific manager component for managing Spark. If you are planning to use the YARN resource manager, start the Spark service in Ambari or MCS. To start Spark daemons for Spark's standalone resource manager, use the following procedure:
- Start services with the following commands:
  - HDP:
    /usr/hdp/current/spark-client/sbin/start-all.sh
  - MapR:
    /opt/mapr/spark/spark-*/sbin/start-all.sh
  - Spark package pre-built for Hadoop:
    ./sbin/start-all.sh
  For a multi-node cluster, start the Spark worker roles on all machines with the following command:
    ./sbin/start-slave.sh spark://masterhostname:7077
  Another option is to list the hostnames of the workers in the /conf/slaves file and then use the ./start-all.sh command to start the worker roles on all machines automatically.
- Check the logs located in the logs directory. Look at the master web UI at http://masterhostname:8080. If this port is already taken by another service, the next available port will be used. For example, in HDP, port 8080 is taken by Ambari, so the standalone master will bind to 8081. To find the correct port number, check the logs.

Note
All programs in this chapter are executed on the CDH 5.8 VM. For other environments, the file paths might change, but the concepts are the same.
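Two manual steps in the procedure above, listing the worker hostnames and finding the master web UI port in the logs, can be sketched in shell as follows. This is only an illustration under stated assumptions: the function names `write_slaves_file` and `master_ui_port` are hypothetical, the worker hostnames in the usage comments are placeholders, and the log parsing assumes the master log contains a line of the form `Successfully started service 'MasterUI' on port 8081.`, which may differ between Spark versions.

```shell
# Hypothetical helper: write one worker hostname per line to a slaves file.
# Usage: write_slaves_file <path-to-slaves-file> <host> [<host> ...]
write_slaves_file() {
  slaves_file="$1"
  shift
  if [ "$#" -eq 0 ]; then
    echo "at least one worker hostname is required" >&2
    return 1
  fi
  # printf repeats the format once per remaining argument.
  printf '%s\n' "$@" > "$slaves_file"
}

# Hypothetical helper: print the web UI port recorded in a master log file.
# Assumes a log line like:
#   ... Successfully started service 'MasterUI' on port 8081.
# If the master was restarted, the most recent matching line wins.
master_ui_port() {
  grep "Successfully started service 'MasterUI'" "$1" \
    | sed -n 's/.*on port \([0-9][0-9]*\).*/\1/p' \
    | tail -n 1
}

# Example (not run here; hostnames and the log file name are placeholders):
#   write_slaves_file conf/slaves worker1.example.com worker2.example.com
#   ./sbin/start-all.sh
#   master_ui_port logs/<master log file>
```

Reading the port from the log rather than hard-coding 8080 matters on HDP, where, as described above, Ambari already occupies 8080 and the standalone master falls back to 8081.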