- Serverless Design Patterns and Best Practices
- Brian Zambrano
- 258字
- 2021-08-27 19:12:02
Big data patterns
It may seem confusing that lambda can refer to both AWS Lambda functions as well as a pattern in and of itself. The lambda pattern was born from the need to analyze large amounts of data in real time. Before this, the big data movement, where large batch jobs would run to calculate and recalculate things, was in full swing. The problem faced by this movement was that these batch jobs, in order to get the latest results, would need to spend the majority of their computing resources recalculating metrics on data that hadn't changed.
The lambda pattern, which we will discuss in Chapter 7, Data Processing Using the Lambda Pattern, creates two parallel planes of computation, a batch layer, and a speed layer. The naming of these layers should give you an idea of what they're responsible for.
MapReduce is another well-known and tested paradigm that has been popular in the software world for some time now. Hadoop, arguably the most famous framework for MapReduce, helped to bring this pattern front and center after Google's original MapReduce paper in 2004.
As amazing as Hadoop is as a software system, there are substantial hurdles to overcome in running a production Hadoop cluster of your own. Due to this, systems such as Amazon's Elastic MapReduce (EMR) were developed, which provide on-demand Hadoop jobs to the developer. Still, authoring Hadoop jobs and managing the underlying computing resources can be non-trivial. We'll walk through writing your serverless MapReduce system in Chapter 8, The MapReduce Pattern.