Logical flow of processing
In the current era of data explosion and an always-connected paradigm, organizations collect colossal volumes of data continuously, in real or near-real time. The value of this data surge depends on the ability to extract actionable, contextual insights in a timely fashion. Streaming applications have a strong mandate to derive real-time, actionable insights from massive data ingestion pipelines, and they have to react to data in real time: as a data stream arrives, it should trigger a multitude of dependent actions and capture the reactions. The most critical part of building streaming solutions is understanding the interplay between input, output, and query processing at scale. Also note that streaming applications never exist in a silo; they are part of a larger ecosystem of applications.
The following illustration provides a high-level conceptual view of how the different components interact. Starting with a stream of data, reference data is included to enrich the arriving streaming data, queries are executed and responses are pushed out, followed by notifications to end users and storage of the final results in a data store for future reference:

Logical view of streaming flow processing
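As a minimal, purely illustrative sketch of this flow (the event fields, reference data, and threshold here are hypothetical, not taken from the product), the following Python snippet enriches arriving events with reference data, runs a simple query over them, and then pushes notifications and stores the results:

```python
# Hypothetical in-memory stand-ins for the stream input, reference data, and outputs.
REFERENCE_DATA = {"device-1": "Building A", "device-2": "Building B"}  # static lookup

def enrich(event):
    """Join the arriving event with reference data."""
    event["location"] = REFERENCE_DATA.get(event["device_id"], "unknown")
    return event

def query(event):
    """A simple stateless query: flag readings above a threshold."""
    return event if event["temperature"] > 75 else None

def notify(result):
    """Push a notification to end users (here, just print it)."""
    print(f"ALERT: {result['device_id']} at {result['location']} reported {result['temperature']}")

def process(stream):
    sink = []  # stand-in for the data store holding final results
    for event in stream:
        result = query(enrich(event))
        if result is not None:
            notify(result)        # notify end users
            sink.append(result)   # persist the result for future reference
    return sink

# Example run over a small in-memory "stream"
process([
    {"device_id": "device-1", "temperature": 80},
    {"device_id": "device-2", "temperature": 70},
])
```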
In a traditional transactional data processing workload, all the data is collected before processing starts. In stream processing, queries are run against the data while it is in flight, as illustrated in the following figure:
Queries executed on streaming data
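To make the contrast concrete, here is a small sketch (in Python, not Azure Stream Analytics query syntax) in which a tumbling-window count is computed as each event arrives rather than after all the data has been collected; the event shape and window length are assumed for illustration:

```python
from collections import defaultdict

WINDOW_SECONDS = 10  # assumed tumbling-window length

def tumbling_window_counts(stream):
    """Emit a count per device each time a window closes, while data is still arriving."""
    current_window = None
    counts = defaultdict(int)
    for event in stream:  # events are assumed to arrive in timestamp order
        window = event["timestamp"] // WINDOW_SECONDS
        if current_window is not None and window != current_window:
            yield current_window, dict(counts)  # window closed: emit results immediately
            counts.clear()
        current_window = window
        counts[event["device_id"]] += 1
    if current_window is not None:
        yield current_window, dict(counts)

events = [
    {"device_id": "device-1", "timestamp": 1},
    {"device_id": "device-1", "timestamp": 4},
    {"device_id": "device-2", "timestamp": 12},
]
for window, counts in tumbling_window_counts(events):
    print(f"window {window}: {counts}")
```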
When data is continually in motion, keeping track of its state is challenging: the state is held in memory (working memory), which is limited. Additionally, networking issues creep in, resulting in late-arriving data or missing data sets. Patterns such as Command Query Responsibility Segregation are used to scale out reads and writes separately.
Command Query Responsibility Segregation (CQRS) is an architectural pattern for separating concerns: reads and writes are handled as separate streams, which allows each to scale and perform faster independently. Each event stored in the event store is immutable and carries a timestamp.
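The following Python sketch illustrates the general CQRS idea under simplified assumptions (the account/amount event shape and in-memory stores are hypothetical): commands append immutable, timestamped events on the write side, while queries are served from a separate read model:

```python
import time

event_store = []   # write side: append-only, immutable, timestamped events
read_model = {}    # read side: current balance per account, maintained separately

def handle_command(account_id, amount):
    """Write path: record the change as an event; never mutate existing events."""
    event = {"account_id": account_id, "amount": amount, "timestamp": time.time()}
    event_store.append(event)
    apply_event(event)

def apply_event(event):
    """Update the read model so queries never touch the write path."""
    read_model[event["account_id"]] = read_model.get(event["account_id"], 0) + event["amount"]

def query_balance(account_id):
    """Read path: served entirely from the read model."""
    return read_model.get(account_id, 0)

handle_command("acct-1", 100)
handle_command("acct-1", -30)
print(query_balance("acct-1"))  # 70
```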

In the preceding architecture, immutable, timestamped events are sent through the event pipe and split between immediate event actions and long-term data retention. Because every event is stored with a timestamp, the state of the system at any previous point in time can be determined by querying the events. Splitting the data stream into multiple channels also achieves higher throughput.
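As a small illustration of that point (again with a hypothetical event shape), replaying the timestamped events up to a chosen moment reconstructs the state of the system at that point in time:

```python
events = [
    {"timestamp": 100, "account_id": "acct-1", "amount": 50},
    {"timestamp": 200, "account_id": "acct-1", "amount": 25},
    {"timestamp": 300, "account_id": "acct-1", "amount": -40},
]

def state_as_of(events, as_of_timestamp):
    """Fold all events up to the given timestamp into the state at that moment."""
    state = {}
    for event in events:
        if event["timestamp"] <= as_of_timestamp:
            state[event["account_id"]] = state.get(event["account_id"], 0) + event["amount"]
    return state

print(state_as_of(events, 250))  # {'acct-1': 75}  -> state before the third event
print(state_as_of(events, 300))  # {'acct-1': 35}  -> state after all three events
```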