
Spark Streaming

Spark Streaming is a Spark package for processing streams of data in real time. Real-time data streams come in many forms: for example, page visits recorded by an e-commerce website, credit card transactions, or a taxi provider's app sending trip details and the locations of drivers and passengers. What these applications typically have in common is that they run on multiple web servers that generate event logs in real time.

Spark Streaming is built on RDDs and adds stream-specific APIs on top of them to process data streams in real time. Because it reuses RDDs and their operations, developers can implement streaming use cases without learning a whole new technology stack.
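To make the RDD connection concrete: Spark Streaming represents a live stream as a DStream (discretized stream), which is a sequence of RDDs, one per batch interval, and a DStream transformation applies ordinary RDD logic to each batch. The plain-Python sketch below is a toy model of that idea, not Spark code. The event data and the functions `discretize`, `word_count`, and `process_stream` are hypothetical. In PySpark the same pipeline would be written with a `StreamingContext` and the DStream methods `flatMap`, `map`, and `reduceByKey`.

```python
from collections import defaultdict

# Hypothetical input stream: (arrival time in seconds, line of text) events.
EVENTS = [
    (0.2, "spark streaming"),
    (0.7, "spark rdd"),
    (1.1, "streaming data"),
    (2.5, "spark"),
]


def discretize(events, batch_interval):
    """Split timestamped events into consecutive batches of batch_interval seconds.

    Each inner list stands in for one RDD of the DStream.
    """
    batches = defaultdict(list)
    for timestamp, line in events:
        batches[int(timestamp // batch_interval)].append(line)
    last = max(batches) if batches else -1
    # Emit empty batches too, so every interval produces a (possibly empty) result.
    return [batches.get(i, []) for i in range(last + 1)]


def word_count(batch):
    """Ordinary batch logic, applied to one micro-batch at a time."""
    # flatMap step: split every line into words
    words = [word for line in batch for word in line.split()]
    # map + reduceByKey step: add 1 per occurrence of each word
    counts = {}
    for word in words:
        counts[word] = counts.get(word, 0) + 1
    return counts


def process_stream(events, batch_interval=1.0):
    """Run the same batch transformation on every micro-batch, in order."""
    return [word_count(batch) for batch in discretize(events, batch_interval)]


if __name__ == "__main__":
    for i, counts in enumerate(process_stream(EVENTS)):
        print(f"batch {i}: {counts}")
```

Note that `word_count` knows nothing about time: the batch logic is reused unchanged for every interval, which is why familiarity with RDD operations carries over to Spark Streaming.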

Spark 2.x introduced Structured Streaming, which processes data streams using DataFrames rather than RDDs. Using DataFrames as the computation abstraction brings the benefits of the DataFrame API to stream processing. We shall discuss the benefits of DataFrames over RDDs in the coming chapters.
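Structured Streaming treats a stream as an input table that grows without bound: each trigger appends newly arrived rows, and the result of a DataFrame query over that table is updated incrementally. The plain-Python sketch below is a toy model of that idea for a per-page visit count, not Spark code. The row layout and the `RunningPageCounts` class are hypothetical. In PySpark the equivalent query would read a streaming DataFrame with `spark.readStream`, aggregate it with `groupBy("page").count()`, and write it with `writeStream.outputMode("complete")` or `outputMode("update")`.

```python
from typing import Dict, Iterable, Tuple

# Hypothetical input row: (user_id, page) for one page visit.
Row = Tuple[str, str]


class RunningPageCounts:
    """Toy model of a streaming groupBy("page").count() query.

    Each trigger appends new rows to a conceptually unbounded input table;
    the result table is updated incrementally rather than recomputed.
    """

    def __init__(self) -> None:
        # State carried between triggers: page -> visit count so far.
        self.result: Dict[str, int] = {}

    def trigger(self, new_rows: Iterable[Row], mode: str = "complete") -> Dict[str, int]:
        """Process one micro-batch and return the output for the given mode."""
        if mode not in ("complete", "update"):
            raise ValueError(f"unsupported output mode: {mode}")
        changed: Dict[str, int] = {}
        for _user_id, page in new_rows:
            self.result[page] = self.result.get(page, 0) + 1
            changed[page] = self.result[page]
        if mode == "complete":
            # Complete mode: emit the entire updated result table.
            return dict(self.result)
        # Update mode: emit only the rows whose counts changed in this trigger.
        return changed


if __name__ == "__main__":
    query = RunningPageCounts()
    print(query.trigger([("u1", "/home"), ("u2", "/cart")]))
    print(query.trigger([("u3", "/home")], mode="update"))
```

Only the complete and update output modes are modeled here. The point of the design is that the query itself stays a plain table aggregation while the engine carries the running state from one trigger to the next.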

Spark Streaming provides integrations with popular data ingestion and messaging systems such as Apache Flume and Apache Kafka. It can be plugged into these systems to handle massive volumes of streaming data.
