- Fast Data Processing with Spark 2 (Third Edition)
- Krishna Sankar
Deploying Spark with Chef (Opscode)
Chef is an open source automation platform that has become increasingly popular for deploying and managing both small and large clusters of machines. Chef can be used to control a traditional static fleet of machines, and it also works with EC2 and other cloud providers. Chef uses cookbooks as the basic building blocks of configuration; cookbooks can be either generic or site-specific. If you have not used Chef before, a good tutorial for getting started can be found at https://learnchef.opscode.com/. You can use a generic Spark cookbook as the basis for setting up your cluster.
To get Spark working, you need to create a role for both the master and the workers, as well as configure the workers to connect to the master. Start by getting the cookbook from https://github.com/holdenk/chef-cookbook-spark. The bare minimum requirements are to set the master hostname (as master) so that the worker nodes can connect, and the username so that Chef installs Spark in the correct place. You will also need to either accept Sun's Java license or switch to an alternative JDK. Most of the settings that are available in spark-env.sh are also exposed through the cookbook's settings. You can see an explanation of these settings in the Configuring multiple hosts over SSH section. The settings can be set per role, or you can modify the global defaults.
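As a quick sketch of the initial setup, assuming a standard chef-repo layout and that the cookbook directory is named spark (the clone path here is illustrative; adjust it to your repository):
# Fetch the cookbook into the chef-repo and upload it to the Chef server
git clone https://github.com/holdenk/chef-cookbook-spark cookbooks/spark
knife cookbook upload spark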
Create a role for the master with knife:
knife role create spark_master_role -e [editor]
This will bring up a template role file that you can edit. For a simple master, set it to this code:
{ "name": "spark_master_role", "description": "", "json_class": "Chef::Role", "default_attributes": { }, "override_attributes": { "username":"spark", "group":"spark", "home":"/home/spark/sparkhome", "master_ip":"10.0.2.15", }, "chef_type": "role", "run_list": [ "recipe[spark::server]", "recipe[chef-client]", ], "env_run_lists": { } }
Then, create a role for the workers in the same manner, except that instead of spark::server, you need to use the spark::client recipe, as in the following sketch.
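A minimal sketch of the worker role, mirroring the master role above with the server recipe swapped for the client recipe (the attribute values simply reuse the illustrative ones from the master role):
{
  "name": "spark_worker_role",
  "description": "",
  "json_class": "Chef::Role",
  "default_attributes": {},
  "override_attributes": {
    "username": "spark",
    "group": "spark",
    "home": "/home/spark/sparkhome",
    "master_ip": "10.0.2.15"
  },
  "chef_type": "role",
  "run_list": [
    "recipe[spark::client]",
    "recipe[chef-client]"
  ],
  "env_run_lists": {}
}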
Deploy the roles to different hosts:
knife node run_list add master role[spark_master_role]
knife node run_list add worker role[spark_worker_role]
Then, run chef-client on your nodes to update. Congrats, you now have a Spark cluster running!
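As an optional sanity check (assuming the master hostname is master, as configured above), you can hit the standalone master's web UI and master URL on Spark's default ports:
# The master web UI should list the connected workers
curl http://master:8080
# Or start a shell against the cluster
spark-shell --master spark://master:7077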