
Storm architecture and its components

We have discussed enough about the history and theory behind Storm's abstractions; it's now time to dive in, see the framework in execution, and get hands-on with real code to see Storm in action. We are just one step away from that. Before we get there, let's understand the various components at play in Storm and their contribution to the building and orchestration of this framework.

Storm execution can be done in two flavors:

  • Local mode: This is a single-node, nondistributed setup that is generally used for demos and testing. Here, the entire topology is executed in a single worker and thus a single JVM.
  • Distributed mode: This is a multinode setup that is fully or partially distributed, and it is the recommended mode for real-time application development and deployment. A minimal submission sketch for both modes follows this list.
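
To make the difference between the two modes concrete, the following is a minimal sketch, assuming Storm 1.x package names (older releases use the backtype.storm packages) and a hypothetical ModeExample class: the same topology is either run inside an in-process LocalCluster or handed over to Nimbus through StormSubmitter.

    import org.apache.storm.Config;
    import org.apache.storm.LocalCluster;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.testing.TestWordSpout;
    import org.apache.storm.topology.TopologyBuilder;

    public class ModeExample {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // A trivial topology: TestWordSpout ships with Storm's testing utilities;
            // a real topology would also add bolts with builder.setBolt(...).
            builder.setSpout("words", new TestWordSpout(), 1);
            Config conf = new Config();

            if (args.length > 0) {
                // Distributed mode: the topology is handed to Nimbus, which distributes
                // the code and assigns workers across the supervisors in the cluster.
                StormSubmitter.submitTopology(args[0], conf, builder.createTopology());
            } else {
                // Local mode: the entire topology runs inside a single worker (one JVM),
                // which is convenient for demos and testing.
                LocalCluster cluster = new LocalCluster();
                cluster.submitTopology("demo-topology", conf, builder.createTopology());
                Thread.sleep(10000);   // let the local topology run briefly
                cluster.shutdown();
            }
        }
    }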

The setup instructions can be found on the Apache Storm site at https://storm.apache.org/documentation/Setting-up-a-Storm-cluster.html. A typical Storm setup has the following components.

A Zookeeper cluster

Zookeeper is the orchestration engine and bookkeeper for Storm. It handles all the coordination, such as submission of topologies, creation of workers, keeping track of dead nodes and processes, and restarting execution on alternate nodes in the cluster when supervisors or worker processes fail.

A Storm cluster

A Storm cluster is generally set up on multiple nodes and comprises the following processes:

  • Nimbus: This is the master process of the Storm framework and can be considered analogous to Hadoop's JobTracker. It owns the task of topology submission and distributes the entire code bundle to all the supervisor nodes of the cluster. It also distributes the workers across the various supervisors in the cluster. A Storm cluster has a single Nimbus daemon process.
  • Supervisors: These are the processes that actually do the processing. A Storm cluster generally has multiple supervisors. Once the topology is submitted to Nimbus and the worker assignment is done, the worker processes within the supervisor nodes do all the processing; these worker processes are started by the supervisor daemon.
  • UI: The Storm framework provides a browser-based interface to monitor the cluster and the various topologies. The UI process has to be started on any one node in the cluster, and the web application is accessible at http://ui-node-ip:8080.

Now that we have an idea about the various processes and components of the Storm cluster, let's get further acquainted with how the various moving parts function together and what happens when a topology is submitted to the cluster.

Have a look at the following diagram:

This diagram depicts the following:

  • Nimbus: It is the master Storm process of the cluster, essentially a Thrift server.

    This is the Storm daemon to which the topology is submitted by the Storm submitter. The code (the JAR file and all its dependencies) is distributed from this node to all other nodes of the cluster. This node sets up all the static information about the cluster and the topology; it assigns all the workers and starts the topology.

    This process also monitors failure situations across the cluster. If a supervisor node goes down, Nimbus reassigns the tasks executing on that node to other nodes in the cluster.

  • Zookeeper: Once the topology is submitted (the submission is made to the Nimbus Thrift server as JSON and Thrift), Zookeeper captures all the worker assignments and thus keeps track of all the processing components in the cluster.

    Cluster synchronization and heartbeat tracking for all worker processes are handled by Zookeeper.

    This mechanism ensures that if Nimbus goes down after the topology has been submitted, topology operations continue to function normally, because coordination and heartbeats are tracked by Zookeeper.

  • Supervisors: These are the nodes that take up processing work from Nimbus and launch the workers assigned to them to execute it.
  • Workers: These are the processes that run on the supervisor nodes; the job assigned to them is to execute and complete a portion of the topology.
  • Executors: These are the Java threads spawned by the worker processes within their JVM to process the topology.
  • Tasks: These are the component instances within an executor where the actual piece of work happens, that is, where the processing is done. The sketch after this list shows how workers, executors, and tasks are configured.
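
To tie workers, executors, and tasks together, here is a minimal, illustrative sketch, assuming Storm 1.x package names and two hypothetical user-defined components (LineSpout and CountBolt): setNumWorkers sets the number of worker processes for the topology, the parallelism hint on a component sets its number of executors, and setNumTasks sets the number of task instances spread over those executors.

    import org.apache.storm.Config;
    import org.apache.storm.StormSubmitter;
    import org.apache.storm.topology.TopologyBuilder;

    public class ParallelismExample {
        public static void main(String[] args) throws Exception {
            TopologyBuilder builder = new TopologyBuilder();
            // LineSpout and CountBolt are hypothetical user-defined components.
            builder.setSpout("lines", new LineSpout(), 2);       // 2 executors for the spout
            builder.setBolt("counter", new CountBolt(), 4)       // 4 executors (threads)...
                   .setNumTasks(8)                                // ...running 8 task instances
                   .shuffleGrouping("lines");

            Config conf = new Config();
            conf.setNumWorkers(3);   // 3 worker processes spread across the supervisor nodes

            StormSubmitter.submitTopology("parallelism-demo", conf, builder.createTopology());
        }
    }

With these illustrative numbers, Nimbus would distribute three worker processes across the supervisors, and the four counter executors would share the eight counter tasks, two tasks per executor.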