
How it works...

The most important characteristic of a data lake is that it stores data in perpetuity. The only practical way to meet this requirement is to use object storage, such as AWS S3. S3 is designed for 11 nines of durability; in other words, 99.999999999% durability of objects over a given year. It is also fully managed and provides lifecycle management features that age objects into cold storage. Note that the bucket is defined with its DeletionPolicy set to Retain. This ensures that even if the stack is deleted, the bucket and this valuable data are not deleted along with it.
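To make the 11 nines figure more concrete, the following back-of-the-envelope sketch converts a number of nines into an expected annual loss. It assumes, as a simplification, that each object is lost independently with the same yearly probability; the function names are illustrative only. Exact fractions are used because 1 - 0.99999999999 is not representable exactly in floating point.

```python
from fractions import Fraction


def annual_loss_probability(nines: int) -> Fraction:
    """Per-object probability of loss in one year for `nines` nines of durability.

    For example, 11 nines (99.999999999%) leaves a loss probability of 10**-11.
    """
    return Fraction(1, 10**nines)


def expected_objects_lost_per_year(object_count: int, nines: int = 11) -> Fraction:
    # Simplifying assumption: objects fail independently with equal probability.
    return object_count * annual_loss_probability(nines)


def years_per_lost_object(object_count: int, nines: int = 11) -> Fraction:
    # The average time between losses is the reciprocal of the expected annual loss.
    return 1 / expected_objects_lost_per_year(object_count, nines)


if __name__ == "__main__":
    print(years_per_lost_object(10_000_000))  # prints 10000
```

In other words, with ten million objects stored, you would expect to lose one object roughly every 10,000 years on average, which is why object storage is a good fit for data that must be kept indefinitely.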

We use Kinesis Firehose because it does the heavy lifting of writing the events to the bucket. It buffers events by time and size and also provides compression, encryption, and error handling. To keep this recipe simple, I did not enable compression or encryption, but using these features is recommended.
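The time-and-size buffering that Firehose performs can be pictured with the small in-memory model below. This is a hypothetical illustration of the concept, not the Firehose API: in a real delivery stream you configure the thresholds through the BufferingHints properties (SizeInMBs and IntervalInSeconds) and compression through CompressionFormat, and Firehose delivers a batch as soon as whichever threshold is reached first. The class name, the injected `now` timestamps, and the use of gzip are assumptions made to keep the sketch self-contained and testable.

```python
import gzip
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class BufferSketch:
    """Hypothetical model of Firehose-style buffering (illustration only).

    A batch is emitted when the buffered bytes reach `size_limit_bytes` or when
    `interval_seconds` have elapsed since the first buffered record, whichever
    happens first.
    """

    size_limit_bytes: int
    interval_seconds: float
    compress: bool = False
    _records: List[bytes] = field(default_factory=list, init=False)
    _size: int = field(default=0, init=False)
    _first_at: Optional[float] = field(default=None, init=False)

    def add(self, record: bytes, now: float) -> Optional[bytes]:
        # Start the interval clock when the first record of a batch arrives.
        if self._first_at is None:
            self._first_at = now
        self._records.append(record)
        self._size += len(record)
        if self._size >= self.size_limit_bytes or now - self._first_at >= self.interval_seconds:
            return self.flush()
        return None

    def tick(self, now: float) -> Optional[bytes]:
        # A timer would call this periodically so that idle buffers are still delivered.
        if self._first_at is not None and now - self._first_at >= self.interval_seconds:
            return self.flush()
        return None

    def flush(self) -> Optional[bytes]:
        # Emit the whole batch as one object body, optionally gzip-compressed.
        if not self._records:
            return None
        batch = b"".join(self._records)
        self._records, self._size, self._first_at = [], 0, None
        return gzip.compress(batch) if self.compress else batch
```

For example, with a 10-byte limit and a 60-second interval, two 5-byte records are delivered together as soon as the second one arrives, while a lone small record is delivered only once the interval expires.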

This recipe defines a single delivery stream because, in this cookbook, our stream topology consists of only one stream, referenced by ${cf:cncb-event-stream-${opt:stage}.streamArn}. In practice, your topology will consist of multiple streams, and you will define one Firehose delivery stream per Kinesis stream to ensure that the data lake captures all events. We set the prefix to ${cf:cncb-event-stream-${opt:stage}.streamName}/ so that events in the data lake can easily be distinguished by their stream.
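When a topology grows to several streams, the one-delivery-stream-per-Kinesis-stream rule lends itself to generation. The sketch below builds one AWS::KinesisFirehose::DeliveryStream resource per stream, using the CloudFormation property names for a Kinesis-sourced stream with an extended S3 destination, and sets each Prefix to the stream name followed by a slash. The stream names, ARNs, IAM role, logical-ID scheme, and buffering values are hypothetical placeholders; GZIP reflects the compression recommendation above, whereas the recipe itself leaves compression off.

```python
from typing import Dict, List


def delivery_streams(streams: List[Dict[str, str]], bucket_arn: str, role_arn: str) -> Dict[str, dict]:
    """Build one Firehose delivery stream resource per Kinesis stream.

    Each item in `streams` is assumed to look like {"name": ..., "arn": ...};
    all names and ARNs passed in here are hypothetical.
    """
    resources: Dict[str, dict] = {}
    for stream in streams:
        # CloudFormation logical IDs must be alphanumeric, so drop other characters.
        logical_id = "".join(ch for ch in stream["name"].title() if ch.isalnum()) + "DeliveryStream"
        resources[logical_id] = {
            "Type": "AWS::KinesisFirehose::DeliveryStream",
            "Properties": {
                "DeliveryStreamType": "KinesisStreamAsSource",
                "KinesisStreamSourceConfiguration": {
                    "KinesisStreamARN": stream["arn"],
                    "RoleARN": role_arn,
                },
                "ExtendedS3DestinationConfiguration": {
                    "BucketARN": bucket_arn,
                    "RoleARN": role_arn,
                    # A per-stream prefix keeps each stream's events apart in the lake.
                    "Prefix": f"{stream['name']}/",
                    # Illustrative values only.
                    "BufferingHints": {"IntervalInSeconds": 60, "SizeInMBs": 50},
                    "CompressionFormat": "GZIP",
                },
            },
        }
    return resources
```

Because every resource is derived from the same function, adding a stream to the list is enough to make sure the data lake keeps capturing all events.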

Another important characteristic of a data lake is that the data is stored in its raw format, without modification. To this end, the transformer function adorns each event with all the available metadata about the specific Kinesis stream and Firehose delivery stream, so that all available information is collected. In the Replaying events recipe, we will see how this metadata can be leveraged. Also note that the transformer appends an end-of-line character (\n) to each record to facilitate future processing of the data.
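A transformer of this kind can be sketched as a Firehose data-transformation handler. The input shape (records with recordId, base64 data, and kinesisRecordMetadata, plus invocation-level fields such as deliveryStreamArn and region) and the output contract (recordId, result of "Ok", and base64 data) follow the Firehose transformation interface; the wrapper field names ("event", "kinesisRecordMetadata", "firehoseRecordMetadata") are assumptions for illustration, not necessarily the recipe's exact schema. Note that the event is parsed and re-serialized, so its content is preserved but insignificant whitespace may change.

```python
import base64
import json


def transform(event, context=None):
    """Hypothetical Firehose transformation handler.

    Wraps each raw event with its Kinesis and Firehose metadata and appends a
    newline so that each S3 object becomes newline-delimited JSON.
    """
    output = []
    for record in event["records"]:
        # Firehose hands over the original Kinesis payload base64-encoded.
        raw = json.loads(base64.b64decode(record["data"]))
        wrapped = {
            "event": raw,  # the domain event itself, content unchanged
            "kinesisRecordMetadata": record.get("kinesisRecordMetadata"),
            "firehoseRecordMetadata": {
                "deliveryStreamArn": event.get("deliveryStreamArn"),
                "sourceKinesisStreamArn": event.get("sourceKinesisStreamArn"),
                "region": event.get("region"),
                "invocationId": event.get("invocationId"),
                "recordId": record["recordId"],
                "approximateArrivalTimestamp": record.get("approximateArrivalTimestamp"),
            },
        }
        # One JSON document per line makes the files easy to split and process later.
        line = json.dumps(wrapped) + "\n"
        output.append({
            "recordId": record["recordId"],
            "result": "Ok",
            "data": base64.b64encode(line.encode("utf-8")).decode("ascii"),
        })
    return {"records": output}
```

Returning the same recordId with a result of "Ok" tells Firehose that the record was transformed successfully and should be delivered to the bucket.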
