
How it works...

In this recipe, we implement a Command-Line Interface (CLI) program that reads events from the data lake S3 bucket and sends them to a specific AWS Lambda function. When replaying events, we do not re-publish them to the stream, because doing so would broadcast the events to all subscribers. Instead, we replay events to a specific function, either to repair that service or to seed a new one.

When executing the program, we provide the name of the data lake bucket and the specific path prefix as arguments. The prefix allows us to replay only a portion of the events, such as a specific month, day, or hour. The program uses functional reactive programming with the Highland.js library. We use a generator function to page through the objects in the bucket and push each object down the stream. Backpressure is a major advantage of this programming approach, as we will discuss in Chapter 8, Designing for Failure. If we retrieved all the data from the bucket in a loop, as we would in the imperative programming style, then we would likely run out of memory and/or overwhelm the Lambda function and receive throttling errors.

Instead, we pull data through the stream. When downstream steps are ready for more work, they pull the next piece of data, which triggers the generator function to paginate more data from S3 only when the program is ready for it.
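The pull-based paging described above can be sketched as follows. This is a minimal illustration, not the recipe's actual code: the hypothetical `listPage` function stands in for S3's `listObjectsV2` continuation-token loop (which the real program drives from inside a Highland.js generator), and the page data is faked in memory. The key property is that a page is only fetched when the consumer pulls for more data, so memory stays bounded.

```javascript
// Fake S3 listing, three pages; the real program would call S3 here.
const PAGES = [
  { keys: ['p1/a.gz', 'p1/b.gz'], next: 1 },
  { keys: ['p1/c.gz', 'p1/d.gz'], next: 2 },
  { keys: ['p1/e.gz'], next: null },
];

let pagesFetched = 0;

// Hypothetical stand-in for listObjectsV2 with a continuation token.
function listPage(token) {
  pagesFetched += 1; // count fetches to demonstrate laziness
  return PAGES[token];
}

// Generator yields one object key at a time; the next page is
// listed only when the consumer pulls past the current one.
function* objectKeys() {
  let token = 0;
  while (token !== null) {
    const page = listPage(token);
    yield* page.keys;
    token = page.next;
  }
}

const iter = objectKeys();
console.log(iter.next().value); // → 'p1/a.gz'
console.log(pagesFetched);      // → 1 (later pages not yet fetched)
```

Pulling a single key fetches only the first page; an imperative loop would have listed the entire bucket up front.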

When storing events in the data lake bucket, Kinesis Firehose buffers the events until either a maximum time or a maximum file size is reached. This buffering maximizes write performance when saving the events. When transforming the data for these files, we delimited the events with an EOL character; therefore, when we retrieve a specific file, we leverage the Highland.js split function to stream the rows in the file one at a time. The split function also supports backpressure.
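Each retrieved file therefore holds many events, one JSON object per line. In the real program, Highland's `split` streams those rows with backpressure; the plain function below is only a sketch of the parsing step itself, with a fabricated two-event file body for illustration.

```javascript
// A fake Firehose file body: one JSON event per line, EOL-delimited.
const body = [
  '{"id":"e1","type":"order-submitted"}',
  '{"id":"e2","type":"order-shipped"}',
].join('\n');

// Split the file body into individual event objects, as the
// streaming split step would, one row at a time.
function toEvents(fileBody) {
  return fileBody
    .split('\n')                     // one row per event
    .filter((row) => row.length > 0) // ignore a trailing empty line
    .map((row) => JSON.parse(row));
}

console.log(toEvents(body).length); // → 2
```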

For each event, we invoke the function specified in the command-line arguments. These functions are designed to listen for events from a Kinesis stream; therefore, we must wrap each event in the Kinesis input format that these functions expect. This is one reason why we included the Kinesis metadata when saving the events to the data lake in the Creating a data lake recipe. To maximize throughput, we invoke the Lambda function asynchronously with the Event InvocationType, provided that the payload size is within the limit. Otherwise, we invoke it synchronously with the RequestResponse InvocationType. We also leverage the Lambda DryRun feature so that we can see which events would be replayed before actually effecting the change.
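The wrapping and invocation-type selection might look like the following sketch. The `Records` envelope follows the standard Kinesis event format that Lambda delivers to stream listeners; the function name is hypothetical, and the 256 KB threshold reflects Lambda's payload limit for asynchronous (Event) invocations at the time of writing.

```javascript
const ASYNC_LIMIT = 256 * 1024; // Lambda async (Event) payload limit

// Wrap a stored record (which retains its original kinesis metadata
// from the data lake) back into the batch shape listeners expect.
function wrap(record) {
  return { Records: [record] };
}

// Choose the InvocationType: DryRun to preview, Event for throughput,
// RequestResponse (synchronous) when the payload exceeds the async limit.
function invocationType(payload, dryRun) {
  if (dryRun) return 'DryRun';
  return Buffer.byteLength(JSON.stringify(payload)) <= ASYNC_LIMIT
    ? 'Event'
    : 'RequestResponse';
}

const record = {
  eventSource: 'aws:kinesis', // metadata saved with the event
  kinesis: {
    data: Buffer.from('{"type":"order-submitted"}').toString('base64'),
  },
};

// Parameters for lambda.invoke(); FunctionName is a placeholder.
const params = {
  FunctionName: 'my-listener',
  InvocationType: invocationType(wrap(record), false),
  Payload: JSON.stringify(wrap(record)),
};

console.log(params.InvocationType); // → 'Event'
```

Running with DryRun first lets us inspect which events would be replayed; flipping the flag off performs the actual invocations.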
