官术网_书友最值得收藏!

Big data patterns

It may seem confusing that lambda can refer to both AWS Lambda functions as well as a pattern in and of itself. The lambda pattern was born from the need to analyze large amounts of data in real time. Before this, the big data movement, where large batch jobs would run to calculate and recalculate things, was in full swing. The problem faced by this movement was that these batch jobs, in order to get the latest results, would need to spend the majority of their computing resources recalculating metrics on data that hadn't changed.

The lambda pattern, which we will discuss in Chapter 7, Data Processing Using the Lambda Pattern, creates two parallel planes of computation, a batch layer, and a speed layer. The naming of these layers should give you an idea of what they're responsible for.

MapReduce is another well-known and tested paradigm that has been popular in the software world for some time now. Hadoop, arguably the most famous framework for MapReduce, helped to bring this pattern front and center after Google's original MapReduce paper in 2004.

As amazing as Hadoop is as a software system, there are substantial hurdles to overcome in running a production Hadoop cluster of your own. Due to this, systems such as Amazon's Elastic MapReduce (EMR) were developed, which provide on-demand Hadoop jobs to the developer. Still, authoring Hadoop jobs and managing the underlying computing resources can be non-trivial. We'll walk through writing your serverless MapReduce system in Chapter 8, The MapReduce Pattern.

主站蜘蛛池模板: 丹凤县| 西盟| 宁海县| 九龙城区| 溧水县| 桐庐县| 湘潭市| 马龙县| 曲水县| 图木舒克市| 黑河市| 鄂托克旗| 马尔康县| 黄平县| 西充县| 景洪市| 都江堰市| 铁力市| 武安市| 腾冲县| 青阳县| 鄂尔多斯市| 务川| 太仆寺旗| 黎平县| 高平市| 洪雅县| 馆陶县| 岳西县| 道孚县| 广饶县| 山丹县| 桐庐县| 南丹县| 灵石县| 即墨市| 凤翔县| 大城县| 金门县| 仁寿县| 边坝县|