
How and when to use Storm

I am a firm believer that the quickest way to get acquainted with a tool or technology is to actually use it, and so far we have been doing a lot of theoretical talking rather than doing, so let's begin the fun. We will start with the basic word count topology. I have a lot of experience using Storm on Linux, and there is plenty of online material available for that platform, so I have used a Windows VM for the execution of the word count topology. Here are the prerequisites:

  • apache-storm-0.9.5-src.
  • JDK 1.6+.
  • Python 2.x. (I figured this out by a little trial and error. My Ubuntu machine always had Python available and it never gave any trouble. The word count topology uses a Python script for splitting sentences, so I initially set up Python 3, the latest version, but later figured out that the compatible one is 2.x.)
  • Maven.
  • Eclipse.

Here we go.

Set up the following environment variables accurately:

  • JAVA_HOME
  • MVN_HOME
  • PATH

The PATH variable should also include the path to the Python installation on your machine.
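On the Windows VM, for example, you could set all of these at the command prompt. The install paths below are hypothetical placeholders; substitute the ones from your machine:

    rem hypothetical install locations; substitute your own paths
    set JAVA_HOME=C:\java\jdk1.7.0_75
    set MVN_HOME=C:\tools\apache-maven-3.2.5
    set PATH=%PATH%;%JAVA_HOME%\bin;%MVN_HOME%\bin;C:\Python27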

Just to crosscheck that everything is set up accurately, issue the following commands on your command prompt and check their output:
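The checks are simply the standard version queries:

    java -version
    mvn -v
    python -V

The exact version strings will vary with your installation; what matters is that all three commands resolve, and that python -V reports a 2.x release.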

Now, we are all set to import the storm-starter project into Eclipse and actually see it executing. Import it as an existing Maven project and navigate to the word count topology inside the Storm source bundle.
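If you would rather sanity-check the build from the command prompt first, one option (assuming the storm-starter module lives under examples\storm-starter in the source bundle, as it does in the 0.9.x line) is to let Maven build it:

    cd apache-storm-0.9.5\examples\storm-starter
    mvn clean package -DskipTests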


Now, we have the project ready to be built and executed. I will explain all the moving components of the topology in detail, but for now let's see what the output would be like.

We have seen the topology executing, so there are a couple of important observations I want you to make from the Eclipse console output. Note that even though it's a single-node Storm execution, you can still see log entries pertaining to the creation of the following:

  • Zookeeper connection
  • Supervisor process startup
  • Worker process creation
  • Tuple emission after actual word count
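These log entries appear because, when launched from Eclipse, the topology is submitted to a LocalCluster: an in-process simulation of a full Storm cluster that starts its own Zookeeper, supervisor, and worker threads inside a single JVM. As a minimal sketch, the local-mode submission in the storm-starter main method looks roughly like this (the topology name and the sleep duration are illustrative; the classes come from the backtype.storm package in the 0.9.x line):

    Config conf = new Config();
    conf.setDebug(true); // log each emitted tuple to the console

    // submit to an in-process cluster, let it run briefly, then shut down
    LocalCluster cluster = new LocalCluster();
    cluster.submitTopology("word-count", conf, builder.createTopology());
    Utils.sleep(10000);
    cluster.shutdown();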

Now let's take a closer look at the wiring of the word count topology. The following diagram captures the essence of the flow along with the relevant code snippets. The word count topology is basically a network of RandomSentenceSpout, SplitSentenceBolt, and WordCountBolt.

Though this diagram is self-explanatory in terms of the flow and action, I'll take up a few lines to elaborate on the code snippets:

  • WordCountTopology: Here, we have actually done the networking or wiring of the various streams and processing components of the application:
    // we are creating the topology builder template class
    TopologyBuilder builder = new TopologyBuilder();
    // we are setting the spout in here: an instance of RandomSentenceSpout
    builder.setSpout("spout", new RandomSentenceSpout(), 5);
    // here the first bolt, SplitSentence, is wired into the builder template
    builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");
    // here the second bolt, WordCount, is wired into the builder template
    builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

    One item worth noting is how the various groupings, fieldsGrouping and shuffleGrouping, are used in the preceding example for wiring and subscribing to the streams: shuffleGrouping distributes tuples randomly and evenly across the tasks of a bolt, while fieldsGrouping guarantees that all tuples carrying the same value of the grouping field (here, the same word) arrive at the same task, which is what keeps the counts consistent. Another interesting point is the parallelism hint defined against each component while it's being wired into the topology:

    builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));

    For example, in the preceding snippet, the WordCount bolt is defined with a parallelism hint of 12. That means 12 executors (threads) will be spawned to handle the parallel processing for this bolt; unless configured otherwise, each executor runs a single task.

  • RandomSentenceSpout: This is the spout, or the feeder component, of the topology. It picks a random sentence from a set of sentences hardcoded in the spout itself and emits it into the topology for processing using the collector.emit() method. Here's an excerpt of the code for the same:
    @Override
    public void nextTuple() {
      // throttle the spout to roughly ten sentences per second
      Utils.sleep(100);
      String[] sentences = new String[]{
          "the cow jumped over the moon",
          "an apple a day keeps the doctor away",
          "four score and seven years ago",
          "snow white and the seven dwarfs",
          "i am at two with nature" };
      // pick a random sentence and push it into the topology
      String sentence = sentences[_rand.nextInt(sentences.length)];
      _collector.emit(new Values(sentence));
    }
    …
    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
      declarer.declare(new Fields("word"));
    }

The nextTuple() method is executed repeatedly by Storm; each invocation reads or generates one event, or tuple, and pushes it into the topology. I have also attached the snippet of the declareOutputFields() method, which declares the output field of the stream emerging from this spout so that downstream bolts can subscribe to it by name. Sketches of the two bolts follow below.
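For completeness, here is how the other two components look in the storm-starter sources (quoted from memory, so treat the snippets as a close approximation rather than verbatim code; imports from backtype.storm.* and java.util are omitted for brevity, matching the excerpts above). SplitSentence is a ShellBolt that delegates the actual splitting to the splitsentence.py script, which is exactly why Python 2.x appears in the prerequisites, while WordCount keeps a simple in-memory map from each word to its running count:

    public static class SplitSentence extends ShellBolt implements IRichBolt {
      public SplitSentence() {
        // run the bundled Python script as this bolt's logic
        super("python", "splitsentence.py");
      }

      @Override
      public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word"));
      }

      @Override
      public Map<String, Object> getComponentConfiguration() {
        return null;
      }
    }

    public static class WordCount extends BaseBasicBolt {
      Map<String, Integer> counts = new HashMap<String, Integer>();

      @Override
      public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null) {
          count = 0;
        }
        count++;
        counts.put(word, count);
        // emit the updated running count for this word
        collector.emit(new Values(word, count));
      }

      @Override
      public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
      }
    }

Because WordCount is wired with fieldsGrouping on "word", every occurrence of a given word is routed to the same task, so each task's local map holds the authoritative count for its share of the vocabulary.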

Now that we have seen the word count topology executing on our Storm setup, it's time to delve into Storm's internals, understand the nitty-gritty of its architecture, and dissect some of its key capabilities to discover how they are implemented.
