We have discussed enough of the history and theory behind Storm's abstractions; it's now time to dive in, see the framework in execution, and get hands-on with real code to see Storm in action. We are just one step away from that. Before we get there, let's understand the various components that come into play in Storm and their contribution to the building and orchestration of this framework.
Storm can be executed in two flavors:
Local mode: This is a single-node, non-distributed setup that is generally used for demos and testing. Here, the entire topology is executed in a single worker and thus a single JVM.
Distributed mode: This is a multi-node setup that is fully or partially distributed; this is the recommended mode for real-time application development and deployment.
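The two modes differ mainly in how a finished topology is submitted. The following sketch illustrates both, assuming the `storm-core` dependency is on the classpath; `DemoSpout` and `DemoBolt` are hypothetical placeholders for your own spout and bolt classes, and older Storm releases use the `backtype.storm.*` package names instead of `org.apache.storm.*`:

```java
import org.apache.storm.Config;
import org.apache.storm.LocalCluster;
import org.apache.storm.StormSubmitter;
import org.apache.storm.topology.TopologyBuilder;

public class ModeDemo {
    public static void main(String[] args) throws Exception {
        TopologyBuilder builder = new TopologyBuilder();
        builder.setSpout("demo-spout", new DemoSpout());       // hypothetical spout
        builder.setBolt("demo-bolt", new DemoBolt())           // hypothetical bolt
               .shuffleGrouping("demo-spout");
        Config conf = new Config();

        if (args.length == 0) {
            // Local mode: the entire topology runs inside this single JVM
            LocalCluster cluster = new LocalCluster();
            cluster.submitTopology("demo-topology", conf, builder.createTopology());
        } else {
            // Distributed mode: submit to Nimbus, which distributes the
            // code bundle to the supervisor nodes of the cluster
            StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
        }
    }
}
```

In distributed mode, this class would be packaged into a JAR and submitted with the `storm jar` command rather than run directly.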
Zookeeper is the orchestration engine and bookkeeper for Storm. It handles all the coordination, such as submission of topologies, creation of workers, keeping track of dead nodes and processes, and restarting execution on alternate nodes in the cluster when supervisors or worker processes fail.
A Storm cluster
A Storm cluster is generally set up on multiple nodes and comprises the following processes:
Nimbus: This is the master process of the Storm framework and can be considered analogous to the JobTracker of Hadoop. This is the process that owns the task of topology submission and distributes the entire code bundle to all the supervisor nodes of the cluster. It also distributes the workers across the various supervisors in the cluster. A Storm cluster has one Nimbus daemon process.
Supervisors: These are the daemons on the nodes where the actual processing happens. A Storm cluster generally has multiple supervisors. Once the topology is submitted to Nimbus and the worker assignment is done, worker processes within the supervisor nodes do all the processing; these worker processes are started by the Supervisor daemon.
UI: The Storm framework provides a browser-based interface to monitor the cluster and the various topologies. The UI process has to be started on any one node in the cluster, and the web application is accessible at http://ui-node-ip:8080.
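These processes are wired together through the `storm.yaml` configuration file on each node. A minimal sketch is shown below; the hostnames are placeholders, and key names vary slightly across Storm versions (for example, older releases use `nimbus.host` instead of `nimbus.seeds`):

```yaml
# Zookeeper ensemble used for cluster coordination
storm.zookeeper.servers:
  - "zk1.example.com"
  - "zk2.example.com"

# Node(s) running the Nimbus daemon
nimbus.seeds: ["nimbus.example.com"]

# Each port is one worker slot this supervisor can run
supervisor.slots.ports:
  - 6700
  - 6701
  - 6702
  - 6703

# Port for the Storm UI web application
ui.port: 8080
```

The number of entries under `supervisor.slots.ports` determines how many worker processes that supervisor can host.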
Now that we have an idea about the various processes and components of the Storm cluster, let's get further acquainted with how the various moving parts function together and what happens when a topology is submitted to the cluster.
Have a look at the following diagram:
This diagram depicts the following:
Nimbus: It is the master Storm process of the cluster, essentially a Thrift server.
This is the Storm daemon to which the topology is submitted by the Storm submitter. The code distribution (the JAR file and all its dependencies) is done from this node to all the other nodes of the cluster. This node sets up all the static information about the cluster and topology; it assigns all the workers and starts the topology.
This process also monitors for failure situations across the cluster. If a supervisor node goes down, Nimbus reassigns the tasks executing on that node to other nodes in the cluster.
Zookeeper: Once the topology is submitted (submission is done to the Nimbus Thrift server, with the topology serialized using Thrift and the configuration as JSON), Zookeeper captures all the worker assignments and thus keeps track of all the processing components in the cluster.
Cluster synchronization and heartbeat tracking for all worker processes are done through Zookeeper.
This mechanism ensures that after the topology has been submitted, even if Nimbus goes down, the running topology continues to function normally, because the coordination and heartbeats are tracked by Zookeeper; only new submissions and reassignments wait until Nimbus is back up.
Supervisors: These are the nodes that take up the processing work from Nimbus and run the workers assigned to them.
Workers: These are the processes that execute within the supervisor nodes; the job assigned to them is to execute and complete a portion of the topology.
Executors: These are Java threads that are spawned by worker processes within their JVM to process the topology.
Tasks: These are the instances of the processing components, running within an executor, where a piece of work actually happens and the processing is done.
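The worker/executor/task hierarchy can be illustrated with plain Java; this is an analogy rather than the Storm API. Each executor is a thread inside the worker JVM, and it iterates over the task instances assigned to it (the round-robin assignment here is an illustrative assumption):

```java
import java.util.ArrayList;
import java.util.List;

public class ParallelismDemo {

    /** Distribute task ids 0..numTasks-1 across executors round-robin,
     *  mirroring how multiple task instances share one executor thread. */
    public static List<List<Integer>> assignTasks(int numTasks, int numExecutors) {
        List<List<Integer>> executors = new ArrayList<>();
        for (int e = 0; e < numExecutors; e++) {
            executors.add(new ArrayList<>());
        }
        for (int t = 0; t < numTasks; t++) {
            executors.get(t % numExecutors).add(t);
        }
        return executors;
    }

    public static void main(String[] args) throws InterruptedException {
        // 4 task instances spread over 2 executor threads, as inside one worker JVM
        List<List<Integer>> assignment = assignTasks(4, 2);
        List<Thread> executorThreads = new ArrayList<>();
        for (List<Integer> tasks : assignment) {
            Thread executor = new Thread(() -> {
                for (int task : tasks) {
                    System.out.println(Thread.currentThread().getName()
                            + " running task " + task);
                }
            });
            executorThreads.add(executor);
            executor.start();
        }
        for (Thread t : executorThreads) {
            t.join();   // wait for all executor threads to finish
        }
    }
}
```

With 4 tasks and 2 executors, each executor thread serially processes 2 task instances, which is exactly the relationship the list above describes.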