官术网_书友最值得收藏!

Incremental MapReduce

Incremental MapReduce is a pattern where we use MapReduce to aggregate to previously calculated values. An example would be counting non-distinct users in a collection for different reporting periods (that is, hour, day, month) without the need to recalculate the result every hour.

To set up our data for incremental MapReduce we need to do the following:

  • Output our reduce data to a different collection
  • At the end of every hour, query only for the data that got into the collection in the last hour
  • With the output of our reduce data, merge our results with the calculated results from the previous hour

Following up on the previous example, let's assume that we have a published field in each of the documents, with our input dataset being:

> db.books.find()
{ "_id" : ObjectId("592149c4aabac953a3a1e31e"), "isbn" : "101", "name" : "Mastering MongoDB", "price" : 30, "published" : ISODate("2017-06-25T00:00:00Z") }
{ "_id" : ObjectId("59214bc1aabac954263b24e0"), "isbn" : "102", "name" : "MongoDB in 7 years", "price" : 50, "published" : ISODate("2017-06-26T00:00:00Z") }

Using our previous example of counting books we would get the following:

var mapper = function() {
emit(this.id, 1);
};
var reducer = function(id, count) {
return Array.sum(count);
};
> db.books.mapReduce(mapper, reducer, { out: "books_count" })
{
"result" : "books_count",
"timeMillis" : 16700,
"counts" : {
"input" : 2,
"emit" : 2,
"reduce" : 1,
"output" : 1
},
"ok" : 1
}
> db.books_count.find()
{ "_id" : null, "value" : 2 }

Now we get a third book in our mongo_books collection with a document:

{ "_id" : ObjectId("59214bc1aabac954263b24e1"), "isbn" : "103", "name" : "MongoDB for experts", "price" : 40, "published" : ISODate("2017-07-01T00:00:00Z") }
> db.books.mapReduce( mapper, reducer, { query: { published: { $gte: ISODate('2017-07-01 00:00:00') } }, out: { reduce: "books_count" } } )
> db.books_count.find()
{ "_id" : null, "value" : 3 }

What happened here, is that by querying for documents in July 2017 we only got the new document out of the query and then used its value to reduce the value with the already calculated value of 2 in our books_count document, adding 1 to the final sum of three documents.

This example, as contrived as it is, shows a powerful attribute of MapReduce: the ability to re-reduce results to incrementally calculate aggregations over time.

主站蜘蛛池模板: 荣成市| 镇巴县| 涪陵区| 滕州市| 内乡县| 阿拉善左旗| 千阳县| 迁安市| 安西县| 治县。| 海兴县| 金寨县| 临夏县| 高碑店市| 惠来县| 京山县| 枣阳市| 墨竹工卡县| 银川市| 忻州市| 泰来县| 涞源县| 静海县| 湘潭市| 彰武县| 鹤庆县| 扶余县| 嘉善县| 晋中市| 巴里| 玉山县| 宝应县| 茶陵县| 巴马| 昌宁县| 沛县| 建德市| 米脂县| 辽宁省| 根河市| 石城县|