- Practical Real-time Data Processing and Analytics
- Shilpi Saxena Saurabh Gupta
- 306字
- 2021-07-08 10:23:15
Overview of Storm
Storm is an open source, distributed, resilient, real-time processing engine. It was started by Nathan Marz in late 2010. He was working at BackType. On his blog, he mentioned the challenges he faced while building Storm. It is a must read: http://nathanmarz.com/blog/history-of-apache-storm-and-lessons-learned.html.
Here is the crux of the whole blog: initially, real-time processing was implemented like pushing messages into a queue and then reading the messages from it using Python or any other language and processing them one by one. The challenges with this approach are:
- In case of failure of the processing of any message, it has to be put back into the queue for reprocessing
- Keeping queues and the worker (processing unit) up and running all the time
What follows are two sparking ideas by Nathan that make Storm capable of being a highly reliable and real-time engine:
- Abstraction: Storm is a distributed abstraction in the form of streams. Streams can be produced and processed in parallel. Spouts can produce new streams and a bolt is a small unit of processing in a stream. Topology is top-level abstraction. The advantage of abstraction here is that nobody need worry about what is going on internally, such as serialization/deserialization, sending/receiving messages between different processes, and so on. The user can focus on writing the business logic.
- A guaranteed message processing algorithm is the second idea. Nathan developed an algorithm based on random numbers and XORs that would only require about 20 bytes to track each spout tuple, regardless of how much processing was triggered downstream.
In May 2011 BackType was acquired by Twitter. After becoming popular in public forums, Storm started to be called "real-time Hadoop". In September, 2011, Nathan officially released Storm. In September, 2013, he officially proposed Storm in Apache Incubator. In September, 2014, Storm became a top-level project in Apache.
- 數據科學實戰手冊(R+Python)
- C程序設計簡明教程(第二版)
- Interactive Data Visualization with Python
- Vue.js 3.0源碼解析(微課視頻版)
- 名師講壇:Java微服務架構實戰(SpringBoot+SpringCloud+Docker+RabbitMQ)
- FLL+WRO樂高機器人競賽教程:機械、巡線與PID
- Mastering Android Game Development
- Building Machine Learning Systems with Python(Second Edition)
- 智能手機APP UI設計與應用任務教程
- Vue.js 3應用開發與核心源碼解析
- jQuery技術內幕:深入解析jQuery架構設計與實現原理
- RESTful Web API Design with Node.js
- Learning ROS for Robotics Programming
- Oracle Database 12c DBA官方手冊(第8版)
- PHP典型模塊與項目實戰大全