Overview of Storm

Storm is an open source, distributed, resilient, real-time processing engine. Nathan Marz started it in late 2010 while working at BackType. On his blog he describes the challenges he faced while building Storm; the post is well worth reading: http://nathanmarz.com/blog/history-of-apache-storm-and-lessons-learned.html.

The crux of the post is this: initially, real-time processing was implemented by pushing messages into a queue and having workers, written in Python or any other language, read the messages from it and process them one by one. The challenges with this approach are:

  • If the processing of any message fails, the message has to be put back into the queue for reprocessing
  • The queues and the workers (processing units) have to be kept up and running all the time
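
The queue-and-worker approach described above can be sketched roughly as follows. This is only an illustration using Python's standard `queue` module in a single process; the `process` function, the `max_attempts` limit, and the `dead` list are hypothetical names introduced here, not part of Storm or of the systems Nathan describes.

```python
import queue


def process(message):
    # Hypothetical business logic: only digit strings can be processed.
    if not message.isdigit():
        raise ValueError(f"cannot process {message!r}")
    return int(message) * 2


def run_worker(q, max_attempts=3):
    """Drain the queue, putting failed messages back until max_attempts is hit."""
    results, dead = [], []
    while True:
        try:
            message, attempts = q.get_nowait()
        except queue.Empty:
            break  # nothing left to read
        try:
            results.append(process(message))
        except ValueError:
            if attempts + 1 < max_attempts:
                # The worker itself must put the message back for reprocessing.
                q.put((message, attempts + 1))
            else:
                dead.append(message)  # give up after repeated failures
    return results, dead


if __name__ == "__main__":
    q = queue.Queue()
    for m in ["1", "2", "oops", "4"]:
        q.put((m, 0))  # each entry carries its own attempt counter
    print(run_worker(q))
```

Even in this toy version, the retry bookkeeping lives in the worker code, and nothing restarts the worker or the queue if either one dies. Those are the two challenges listed above.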

Two key ideas from Nathan make Storm a highly reliable, real-time engine:

  • Abstraction: Storm provides a distributed abstraction in the form of streams. Streams can be produced and processed in parallel. A spout produces new streams, and a bolt is a small unit of processing applied to a stream. A topology is the top-level abstraction that ties spouts and bolts together. The advantage of this abstraction is that nobody needs to worry about what is going on internally, such as serialization/deserialization, sending/receiving messages between different processes, and so on. The user can focus on writing the business logic.
  • Guaranteed message processing: Nathan developed an algorithm based on random numbers and XORs that requires only about 20 bytes to track each spout tuple, regardless of how much processing is triggered downstream.
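
The XOR idea behind guaranteed message processing can be illustrated with a small model. This is a simplified sketch and not Storm's actual acker code: the `Acker` class, its method names, and `new_id` are hypothetical. It assumes that every tuple gets a random 64-bit ID and that each ID is XORed into a per-spout-tuple checksum twice, once when the tuple is created and once when it is acknowledged, so the checksum returns to zero only when the whole tuple tree has been processed.

```python
import secrets


def new_id():
    # Random 64-bit tuple ID; zero is avoided so an ID always changes the checksum.
    return secrets.randbits(64) or 1


class Acker:
    """Keeps one running XOR checksum per spout (root) tuple."""

    def __init__(self):
        self.pending = {}  # root ID -> XOR of tuple IDs seen so far

    def _xor(self, root_id, value):
        checksum = self.pending.get(root_id, 0) ^ value
        if checksum == 0:
            # Every created ID has also been acked: the tree is complete.
            self.pending.pop(root_id, None)
            return True
        self.pending[root_id] = checksum
        return False

    def emit_root(self, root_id, tuple_id):
        """A spout emitted a new tuple."""
        return self._xor(root_id, tuple_id)

    def ack(self, root_id, acked_id, child_ids=()):
        """A bolt acked one tuple and reports the child tuples it emitted."""
        value = acked_id
        for child in child_ids:
            value ^= child
        return self._xor(root_id, value)


if __name__ == "__main__":
    acker = Acker()
    root, a, b = new_id(), new_id(), new_id()
    acker.emit_root("r", root)
    acker.ack("r", root, [a, b])  # first bolt emits two child tuples
    acker.ack("r", b)             # acks may arrive in any order
    print(acker.ack("r", a))      # reports whether the tree is complete
```

However many tuples the tree contains, the model stores a single integer per spout tuple, which is why the tracking cost does not grow with downstream processing. Because the IDs are random, there is a very small chance that the checksum reaches zero before the tree is actually complete.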

In May 2011, BackType entered acquisition talks with Twitter, and the acquisition was announced in July 2011. After becoming popular in public forums, Storm started to be called "real-time Hadoop". In September 2011, Nathan officially released Storm. In September 2013, he officially proposed Storm to the Apache Incubator. In September 2014, Storm became a top-level Apache project.