官术网_书友最值得收藏!

Big data patterns

It may seem confusing that lambda can refer to both AWS Lambda functions as well as a pattern in and of itself. The lambda pattern was born from the need to analyze large amounts of data in real time. Before this, the big data movement, where large batch jobs would run to calculate and recalculate things, was in full swing. The problem faced by this movement was that these batch jobs, in order to get the latest results, would need to spend the majority of their computing resources recalculating metrics on data that hadn't changed.

The lambda pattern, which we will discuss in Chapter 7, Data Processing Using the Lambda Pattern, creates two parallel planes of computation, a batch layer, and a speed layer. The naming of these layers should give you an idea of what they're responsible for.

MapReduce is another well-known and tested paradigm that has been popular in the software world for some time now. Hadoop, arguably the most famous framework for MapReduce, helped to bring this pattern front and center after Google's original MapReduce paper in 2004.

As amazing as Hadoop is as a software system, there are substantial hurdles to overcome in running a production Hadoop cluster of your own. Due to this, systems such as Amazon's Elastic MapReduce (EMR) were developed, which provide on-demand Hadoop jobs to the developer. Still, authoring Hadoop jobs and managing the underlying computing resources can be non-trivial. We'll walk through writing your serverless MapReduce system in Chapter 8, The MapReduce Pattern.

主站蜘蛛池模板: 商南县| 凤凰县| 子洲县| 兴文县| 五指山市| 合阳县| 噶尔县| 镇雄县| 凤凰县| 塔河县| 达尔| 徐闻县| 涿州市| 波密县| 莱西市| 铜山县| 罗城| 祁东县| 五家渠市| 阳信县| 潜山县| 油尖旺区| 延津县| 泸水县| 饶河县| 西林县| 漳平市| 沐川县| 和林格尔县| 宾川县| 定襄县| 定西市| 望奎县| 呼和浩特市| 志丹县| 祁连县| 秦皇岛市| 安徽省| 隆尧县| 合江县| 桐梓县|