- Real-Time Big Data Analytics
- Sumit Gupta, Shilpi Saxena
How and when to use Storm
I am a believer in the idea that the quickest way to get acquainted with a tool or technology is to use it, and we have been doing a lot of theoretical talking so far rather than actually doing, so let's begin the fun. We will start with the basic word count topology. I have a lot of experience using Storm on Linux, and there is plenty of online material available for that setup, so here I have used a Windows VM for the execution of the word count topology instead. Here are a couple of prerequisites:
- apache-storm-0.9.5-src
- JDK 1.6+
- Python 2.x (I figured this out by a little trial and error. My Ubuntu box always had Python and it never gave any trouble; the word count topology uses a Python script for splitting sentences, as the sketch after this list shows. I initially set up Python 3, the latest version, but later figured out that the compatible one is 2.x.)
- Maven
- Eclipse
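To see where the Python dependency comes from: in storm-starter, the sentence-splitting bolt is a multilang ShellBolt that shells out to a Python script over Storm's multilang protocol. The following is a sketch of that pattern rather than a verbatim copy of the bundled class, though the class and script names follow storm-starter:

```
import java.util.Map;

import backtype.storm.task.ShellBolt;
import backtype.storm.topology.IRichBolt;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.tuple.Fields;

// Delegates the actual splitting to splitsentence.py; Storm launches
// "python splitsentence.py" and exchanges tuples with it over stdin/stdout,
// which is why a compatible Python (2.x here) must be on the PATH.
public class SplitSentence extends ShellBolt implements IRichBolt {

    public SplitSentence() {
        super("python", "splitsentence.py");
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word")); // one word per emitted tuple
    }

    @Override
    public Map<String, Object> getComponentConfiguration() {
        return null; // no component-specific configuration
    }
}
```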
Here we go.
Set up the following environment variables accurately:

- JAVA_HOME
- MVN_HOME
- PATH

The PATH variable should have the path to the Python installation on your machine.
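If you want to double-check these variables from inside a JVM rather than from the shell, a throwaway sketch like the following works; EnvCheck is a hypothetical helper for this chapter, not part of Storm:

```
// Prints the environment variables that the Storm tooling relies on.
public class EnvCheck {
    public static void main(String[] args) {
        for (String var : new String[]{"JAVA_HOME", "MVN_HOME", "PATH"}) {
            System.out.println(var + " = " + System.getenv(var));
        }
    }
}
```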

Just to crosscheck that everything is set up correctly, issue the usual version commands on your command prompt (for example, java -version, mvn -version, and python --version) and make sure each one reports the expected version.

Now, we are all set to import the Storm starter project into Eclipse and actually see it executing. The following screenshot depicts the steps for importing the project and reaching the word count topology in the Storm source bundle:

Now, we have the project ready to be built and executed. I will explain all the moving components of the topology in detail, but for now let's see what the output would be like.

We have seen the topology executing, and there are a couple of important observations I want you to draw from the previous screenshot of the Eclipse console. Note that even though it's a single-node Storm execution, you can still see loggers pertaining to the creation of the following (the sketch after this list shows where they come from):
- Zookeeper connection
- Supervisor process startup
- Worker process creation
- Tuple emission after actual word count
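These loggers appear because storm-starter runs the topology in Storm's local mode: an in-process LocalCluster starts an embedded ZooKeeper and spawns supervisor and worker threads inside the same JVM. Here is a rough sketch of that submission code, assuming the 0.9.x backtype.storm API; LocalRunSketch is a made-up name, and TestWordSpout is just a placeholder spout from Storm's testing utilities:

```
import backtype.storm.Config;
import backtype.storm.LocalCluster;
import backtype.storm.testing.TestWordSpout;
import backtype.storm.topology.TopologyBuilder;
import backtype.storm.utils.Utils;

public class LocalRunSketch {
    public static void main(String[] args) {
        TopologyBuilder builder = new TopologyBuilder();
        // placeholder spout; the real wiring is shown later in this section
        builder.setSpout("spout", new TestWordSpout(), 2);

        Config conf = new Config();
        conf.setDebug(true); // logs every emitted tuple, as seen in the Eclipse console

        // LocalCluster is what triggers the Zookeeper, supervisor, and
        // worker startup loggers noted above.
        LocalCluster cluster = new LocalCluster();
        cluster.submitTopology("word-count", conf, builder.createTopology());

        Utils.sleep(10000); // let the topology run for ten seconds
        cluster.killTopology("word-count");
        cluster.shutdown();
    }
}
```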
Now let's take a closer look at the wiring of the word count topology. The following diagram captures the essence of the flow and the relevant code snippets. The word count topology is basically a network of RandomSentenceSpout, SplitSentenceBolt, and WordCountBolt.

Though this diagram is self-explanatory in terms of the flow and actions, I'll take up a few lines to elaborate on the code snippets:

WordCountTopology: Here, we have actually done the networking, or wiring, of the various streams and processing components of the application:

```
// we are creating the topology builder template class
TopologyBuilder builder = new TopologyBuilder();

// we are setting the spout in here, an instance of RandomSentenceSpout
builder.setSpout("spout", new RandomSentenceSpout(), 5);

// here the first bolt, SplitSentence, is being wired into the builder template
builder.setBolt("split", new SplitSentence(), 8).shuffleGrouping("spout");

// here the second bolt, WordCount, is being wired into the builder template
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
```
One item worth noting is how the various groupings, fieldsGrouping and shuffleGrouping, are being used in the preceding example for wiring and subscribing to the streams. Another interesting point is the parallelism hint defined against each component while it's being wired into the topology:

```
builder.setBolt("count", new WordCount(), 12).fieldsGrouping("split", new Fields("word"));
```
For example, in the preceding snippet, a new bolt, WordCount, is defined with a parallelism hint of 12. That means 12 executors would be spawned to handle the parallel processing for this bolt.
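Executors are threads, while tasks are the instances of the bolt multiplexed onto them; by default one executor runs one task, but the declarer returned by setBolt also lets you fix the task count explicitly. A sketch, assuming the same topology wiring as above:

```
// 12 executors (threads) via the parallelism hint, but 24 WordCount tasks,
// i.e., two task instances multiplexed onto each executor.
builder.setBolt("count", new WordCount(), 12)
       .setNumTasks(24)
       .fieldsGrouping("split", new Fields("word"));
```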
RandomSentenceSpout: This is the spout, or the feeder component, of the topology. It picks up a random sentence from the group of sentences hardcoded in the code itself and emits it onto the topology for processing using the collector.emit() method. Here's an excerpt of the code for the same:

```
public void nextTuple() {
    Utils.sleep(100);
    String[] sentences = new String[]{
        "the cow jumped over the moon",
        "an apple a day keeps the doctor away",
        "four score and seven years ago",
        "snow white and the seven dwarfs",
        "i am at two with nature"
    };
    String sentence = sentences[_rand.nextInt(sentences.length)];
    _collector.emit(new Values(sentence));
}
...
public void declareOutputFields(OutputFieldsDeclarer declarer) {
    declarer.declare(new Fields("word"));
}
```
The nextTuple() method is executed for every event or tuple that is read by the spout and pushed into the topology. I have also attached the snippet of the declareOutputFields method, wherein the new stream emerging from this spout is bound to a named output field.
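For completeness, the counting bolt at the end of the chain keeps a running tally in an in-memory map; the storm-starter version looks roughly like this:

```
import java.util.HashMap;
import java.util.Map;

import backtype.storm.topology.BasicOutputCollector;
import backtype.storm.topology.OutputFieldsDeclarer;
import backtype.storm.topology.base.BaseBasicBolt;
import backtype.storm.tuple.Fields;
import backtype.storm.tuple.Tuple;
import backtype.storm.tuple.Values;

// fieldsGrouping("split", new Fields("word")) guarantees that every
// occurrence of a given word lands on the same task, so this per-task
// in-memory map yields consistent counts.
public class WordCount extends BaseBasicBolt {
    Map<String, Integer> counts = new HashMap<String, Integer>();

    @Override
    public void execute(Tuple tuple, BasicOutputCollector collector) {
        String word = tuple.getString(0);
        Integer count = counts.get(word);
        if (count == null) {
            count = 0;
        }
        count++;
        counts.put(word, count);
        collector.emit(new Values(word, count)); // emit the running count downstream
    }

    @Override
    public void declareOutputFields(OutputFieldsDeclarer declarer) {
        declarer.declare(new Fields("word", "count"));
    }
}
```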

Now that we have seen the word count topology executing on our Storm setup, it's time to delve into Storm's internals, understand the nitty-gritty of its architecture, and dissect some of its key capabilities to discover how they are implemented.