
How it works...

In this recipe, we implement a Command-Line Interface (CLI) program that reads events from the data lake S3 bucket and sends them to a specific AWS Lambda function. When replaying events, we do not re-publish them to the event stream, because that would broadcast the events to all subscribers. Instead, we replay events to a specific function, either to repair that service or to seed a new service.

When executing the program, we provide the name of the data lake bucket and a specific path prefix as arguments. The prefix allows us to replay only a portion of the events, such as a specific month, day, or hour. The program uses functional reactive programming with the Highland.js library. We use a generator function to page through the objects in the bucket and push each object down the stream. Backpressure is a major advantage of this programming approach, as we will discuss in Chapter 8, Designing for Failure. If we retrieved all the data from the bucket in a loop, as we would in an imperative programming style, then we would likely run out of memory or overwhelm the Lambda function and receive throttling errors.

Instead, we pull data through the stream. When the downstream steps are ready for more work, they pull the next piece of data, which triggers the generator function to retrieve the next page of objects from S3.
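The following is a minimal sketch of this pull-based approach, assuming the aws-sdk v2 S3 client and Highland.js; the paginate helper name and argument handling are illustrative, not the recipe's exact code:

const _ = require('highland');
const aws = require('aws-sdk');

const s3 = new aws.S3();

// Generator function: Highland only calls it again when a downstream step
// pulls more data, so we list one page of S3 objects at a time (backpressure).
const paginate = (Bucket, Prefix) => {
  let token;
  return _((push, next) => {
    s3.listObjectsV2({ Bucket, Prefix, ContinuationToken: token }).promise()
      .then((data) => {
        data.Contents.forEach(obj => push(null, obj)); // push each object downstream
        if (data.IsTruncated) {
          token = data.NextContinuationToken;
          next(); // more pages remain; fetch them only when demanded
        } else {
          push(null, _.nil); // end of the stream
        }
      })
      .catch(err => push(err));
  });
};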

When storing events in the data lake bucket, Kinesis Firehose buffers the events until either a maximum time interval or a maximum buffer size is reached. This buffering maximizes write performance when saving the events. When transforming the data for these files, we delimited the events with an end-of-line (EOL) character. Therefore, when we retrieve a specific file, we leverage the Highland.js split function to stream the rows in the file one at a time. The split function also supports backpressure.
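A sketch of this step might look like the following, again assuming aws-sdk v2 and EOL-delimited JSON rows; getRows is a hypothetical helper name:

// Download one data lake file and emit its rows one at a time.
const getRows = (Bucket) => (obj) =>
  _(s3.getObject({ Bucket, Key: obj.Key }).promise())
    .map(data => data.Body.toString())
    .split()                       // split the file contents on EOL characters
    .filter(row => row.length > 0) // drop any trailing empty line
    .map(row => JSON.parse(row));  // each row is one event with its Kinesis metadata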

For each event, we invoke the function specified in the command-line arguments. These functions are designed to listen for events from a Kinesis stream, so we must wrap each event in the Kinesis input format that they expect. This is one reason why we included the Kinesis metadata when saving the events to the data lake in the Creating a data lake recipe. To maximize throughput, we invoke the Lambda function asynchronously with the Event InvocationType, provided that the payload size is within the limits. Otherwise, we invoke the function synchronously with the RequestResponse InvocationType. We also leverage the Lambda DryRun feature so that we can see which events would be replayed before actually effecting the change.
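The invocation step might look roughly like the following sketch; the replayEvent helper name and the 200 KB asynchronous-payload threshold are illustrative assumptions rather than the recipe's exact values:

const lambda = new aws.Lambda();

// Wrap a stored event in the Kinesis input format and invoke the target function.
const replayEvent = (FunctionName, dryRun) => (event) => {
  const Payload = JSON.stringify({ Records: [event] }); // the Kinesis event structure the functions expect

  return _(lambda.invoke({
    FunctionName,
    // DryRun validates the request without invoking the function; otherwise
    // prefer asynchronous invocation while the payload fits within the limit.
    InvocationType: dryRun ? 'DryRun' :
      Buffer.byteLength(Payload) <= 200 * 1024 ? 'Event' : 'RequestResponse',
    Payload,
  }).promise());
};

Composed together, the pipeline would read something like paginate(bucket, prefix).flatMap(getRows(bucket)).flatMap(replayEvent(functionName, dryRun)).done(...), with each stage pulling work only as fast as the invocations complete.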
