
  • Mastering MongoDB 3.x
  • Alex Giamas
  • 325 words
  • 2021-08-20 10:10:57

Aggregation versus MapReduce

In MongoDB, we can get data out of our database using three methods: querying, the aggregation framework, and MapReduce. All three can be chained to one another, and it is often useful to do so; however, it's important to understand when we should use aggregation and when MapReduce may be the better alternative.

We can use both aggregation and MapReduce with sharded databases.

Aggregation is based on the concept of a pipeline. As such, it's important to be able to model our data from input to final output as a series of transformations and processing steps. It is most useful when our intermediate results can be used on their own or can feed parallel pipelines. Our operations are limited to the operators that MongoDB provides, so it's important to make sure that we can calculate all the results we need using the available operators.
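To make the pipeline idea concrete, here is a minimal sketch in plain Python that does not talk to MongoDB at all. The `orders` documents, the field names, and the helper functions are hypothetical; each helper only imitates the spirit of a stage such as `$match`, `$group`, or `$sort`, and each intermediate result is usable on its own.

```python
from collections import defaultdict

# Hypothetical sample documents standing in for an "orders" collection.
orders = [
    {"customer": "alice", "status": "shipped", "amount": 30},
    {"customer": "bob", "status": "shipped", "amount": 20},
    {"customer": "alice", "status": "pending", "amount": 50},
    {"customer": "bob", "status": "shipped", "amount": 15},
    {"customer": "carol", "status": "shipped", "amount": 40},
]


def match(docs, field, value):
    """Keep documents whose field equals value (in the spirit of $match)."""
    return [d for d in docs if d.get(field) == value]


def group_sum(docs, key_field, sum_field):
    """Sum sum_field per distinct key_field (in the spirit of $group with $sum)."""
    totals = defaultdict(int)
    for d in docs:
        totals[d[key_field]] += d[sum_field]
    return [{"_id": key, "total": total} for key, total in totals.items()]


def sort_desc(docs, field):
    """Order documents by field, largest first (in the spirit of $sort with -1)."""
    return sorted(docs, key=lambda d: d[field], reverse=True)


# Each stage consumes the previous stage's output.
shipped = match(orders, "status", "shipped")
totals = group_sum(shipped, "customer", "amount")
result = sort_desc(totals, "total")
print(result)

# For comparison, the same three stages written as an aggregation pipeline
# document. This is only data here; a driver would send it to the server.
pipeline = [
    {"$match": {"status": "shipped"}},
    {"$group": {"_id": "$customer", "total": {"$sum": "$amount"}}},
    {"$sort": {"total": -1}},
]
```

If a required transformation cannot be expressed with the available stage operators, that is a sign the step may belong outside the aggregation framework.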

MapReduce, on the other hand, can be used to construct pipelines by chaining the output of one MapReduce job to the input of the next one via an intermediate collection, but this is not its primary purpose.
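The chaining described above can be sketched in plain Python, with an in-memory list playing the role of the intermediate collection. The `map_reduce` helper, the sample `orders`, and the mapper and reducer names are all hypothetical and only illustrate the data flow, not MongoDB's actual API.

```python
from collections import defaultdict


def map_reduce(docs, mapper, reducer):
    """Run mapper over docs, group emitted values by key, then reduce each group."""
    grouped = defaultdict(list)
    for doc in docs:
        for key, value in mapper(doc):
            grouped[key].append(value)
    return [{"_id": key, "value": reducer(key, values)}
            for key, values in grouped.items()]


# Hypothetical input documents.
orders = [
    {"customer": "alice"}, {"customer": "bob"}, {"customer": "alice"},
    {"customer": "carol"}, {"customer": "bob"}, {"customer": "alice"},
    {"customer": "dave"},
]


def map_orders(doc):
    # Job 1 mapper: emit one count per order, keyed by customer.
    yield doc["customer"], 1


def map_counts(doc):
    # Job 2 mapper: emit one count per customer, keyed by their order count.
    yield doc["value"], 1


def sum_values(key, values):
    # Shared reducer: add up the emitted counts.
    return sum(values)


# Job 1 writes its output to the intermediate "collection".
orders_per_customer = map_reduce(orders, map_orders, sum_values)

# Job 2 reads the intermediate collection as its input.
histogram = map_reduce(orders_per_customer, map_counts, sum_values)
print(histogram)
```

The second job knows nothing about orders; it only sees the `_id` and `value` fields that the first job produced, which is what makes the intermediate collection the contract between the two.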

MapReduce's most common use case is to periodically calculate aggregations for large datasets. With MongoDB's querying in place, we can calculate these aggregations incrementally, without needing to scan the whole input collection every time. In addition, its power comes from its flexibility: we can define mappers and reducers in JavaScript, with the full expressiveness of the language available when calculating intermediate results. The trade-off is that, without the operators that the aggregation framework provides, we have to implement the equivalent logic on our own.
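As a rough illustration of the incremental approach, the following plain-Python sketch folds only new documents into previously stored results. The `ts` timestamp field, the `events` data, and the `incremental_map_reduce` helper are assumptions made for this example; in MongoDB the filtering would be done by a query on the input collection rather than by the loop shown here.

```python
from collections import defaultdict


def map_event(doc):
    # Mapper: emit the view count keyed by page.
    yield doc["page"], doc["views"]


def reduce_views(key, values):
    # Reducer: must give the same answer when fed its own earlier output,
    # because stored results are re-reduced together with new values.
    return sum(values)


def incremental_map_reduce(events, results, last_run_ts):
    """Fold events newer than last_run_ts into results; return the newest ts seen."""
    grouped = defaultdict(list)
    newest = last_run_ts
    for doc in events:
        if doc["ts"] <= last_run_ts:
            continue  # already processed in an earlier run
        newest = max(newest, doc["ts"])
        for key, value in map_event(doc):
            grouped[key].append(value)
    for key, values in grouped.items():
        if key in results:
            values = [results[key]] + values  # re-reduce with the stored value
        results[key] = reduce_views(key, values)
    return newest


# Hypothetical event documents with a monotonically increasing timestamp.
events = [
    {"ts": 1, "page": "/home", "views": 3},
    {"ts": 2, "page": "/docs", "views": 5},
]
results = {}
last_ts = incremental_map_reduce(events, results, 0)

# Later, new events arrive; only those with ts > last_ts are mapped.
events += [
    {"ts": 3, "page": "/home", "views": 4},
    {"ts": 4, "page": "/blog", "views": 1},
]
last_ts = incremental_map_reduce(events, results, last_ts)
print(results, last_ts)
```

The design hinges on the reducer being safe to apply to its own output, which is why a sum works here while something like a plain average would need to carry a count alongside the total.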

In many cases, the answer is not either/or. We can (and should) use the aggregation framework to construct our ETL pipeline and resort to MapReduce for the parts that it does not yet support sufficiently.

A complete use case with aggregation and MapReduce is provided in Chapter 5, Aggregation.
