
Real-time processing

While batch processing frameworks serve most data warehousing use cases well, some applications need to process data and generate actionable insight as soon as the data is available. For example, a credit card fraud detection system should raise an alert as soon as the first instance of malicious activity is logged. The actionable insight (denying the transaction) has no value if it only becomes available as the result of an end-of-month batch process. The idea of a real-time processing framework is to reduce the latency between event time (when the event occurs) and processing time (when the system processes it). In an ideal system, the difference between the two would be zero. In practice, the difference is a function of the data source input, execution engine, network bandwidth, and hardware. Real-time processing frameworks achieve low latency by minimizing I/O and relying on distributed in-memory computing. Some of the most popular real-time processing frameworks are:

  • Apache Spark: This is a distributed execution engine that relies on in-memory processing based on fault-tolerant data abstractions called RDDs (Resilient Distributed Datasets).
  • Apache Storm: This is a framework for distributed real-time computation. Storm applications are designed to easily process unbounded streams, which generate event data at a very high velocity.
  • Apache Flink: This is a framework for efficient, distributed, high-volume data processing. The key feature of Flink is automatic program optimization. Flink provides native support for massively iterative, compute-intensive algorithms.
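
The gap between event time and processing time described earlier can be illustrated with a small sketch. It uses none of the frameworks listed above. It is plain Python, and every name in it (Transaction, is_suspicious, stream_alerts, batch_alerts, the amount threshold, and the 50 ms per-event delay) is a hypothetical chosen for illustration. The fraud rule is a toy stand-in, not a real detection method.

```python
from dataclasses import dataclass
from typing import List, Tuple


@dataclass
class Transaction:
    card_id: str
    amount: float
    event_time_ms: int  # when the transaction occurred (hypothetical clock, in ms)


def latency_ms(event_time_ms: int, processing_time_ms: int) -> int:
    """Latency is the difference between processing time and event time."""
    return processing_time_ms - event_time_ms


def is_suspicious(tx: Transaction, threshold: float = 1000.0) -> bool:
    """Toy rule (an assumption): flag any amount above a fixed threshold."""
    return tx.amount > threshold


def stream_alerts(txs: List[Transaction], per_event_delay_ms: int = 50) -> List[Tuple[str, int]]:
    """Handle each event shortly after it occurs (assumed fixed delay)."""
    alerts = []
    for tx in txs:
        processing_time_ms = tx.event_time_ms + per_event_delay_ms
        if is_suspicious(tx):
            alerts.append((tx.card_id, latency_ms(tx.event_time_ms, processing_time_ms)))
    return alerts


def batch_alerts(txs: List[Transaction], batch_run_time_ms: int) -> List[Tuple[str, int]]:
    """Handle all events together at one later point in time."""
    return [
        (tx.card_id, latency_ms(tx.event_time_ms, batch_run_time_ms))
        for tx in txs
        if is_suspicious(tx)
    ]


transactions = [
    Transaction("card-A", 20.0, 0),
    Transaction("card-B", 5000.0, 10_000),
]

one_day_ms = 24 * 60 * 60 * 1000
print(stream_alerts(transactions))             # alert latency per suspicious event
print(batch_alerts(transactions, one_day_ms))  # same alert, seen at the batch run
```

Both functions flag the same transaction. The only difference is how long after the event the alert becomes available, which is the quantity a real-time framework tries to minimize.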

As the ecosystem evolves, many more frameworks are becoming available for batch and real-time processing. Going back to the machine intelligence evolution cycle (Perceive, Process, Persist, Perform), we are going to leverage these frameworks to create programs that work on Big Data, take an algorithmic approach to filtering relevant data, generate models based on the patterns within the data, and derive actionable insights and predictions that ultimately extract value from the data assets.
