Incremental MapReduce

Incremental MapReduce is a pattern where we use MapReduce to aggregate new data into previously calculated values. An example would be counting non-distinct users in a collection for different reporting periods (for example, hour, day, or month) without having to recalculate the whole result every hour.

To set up our data for incremental MapReduce we need to do the following:

  • Output our reduce data to a different collection
  • At the end of every hour, query only for the data that was added to the collection in the last hour
  • Merge the output of that reduce with the results calculated in the previous hours
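The steps above can be sketched as a small helper that builds the options for one hourly run. This is only an illustration: `buildIncrementalOptions` and its parameters are hypothetical names, not part of MongoDB, and the `$lt` upper bound is an assumption added so that consecutive windows do not overlap. The `published` field and the `books_count` collection are the ones used in this section's example.

```javascript
// Hypothetical helper: builds the options object for one incremental run.
// timeField is the date field used to select the new documents.
function buildIncrementalOptions(timeField, windowStart, windowEnd, outCollection) {
  const query = {};
  // Half-open window [windowStart, windowEnd) so that consecutive runs
  // never pick up the same document twice (the $lt bound is an assumption).
  query[timeField] = { $gte: windowStart, $lt: windowEnd };
  return {
    query: query,
    // The "reduce" action merges new results into the existing output collection.
    out: { reduce: outCollection }
  };
}

const windowEnd = new Date("2017-07-01T01:00:00Z");
const windowStart = new Date(windowEnd.getTime() - 60 * 60 * 1000); // one hour earlier
const options = buildIncrementalOptions("published", windowStart, windowEnd, "books_count");

// In the mongo shell this object would be passed as the third argument:
// db.books.mapReduce(mapper, reducer, options)
console.log(JSON.stringify(options));
```

Keeping the window boundaries as explicit arguments makes it straightforward to rerun a missed hour, provided the same window is never applied twice.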

Following up on the previous example, let's assume that we have a published field in each of the documents, with our input dataset being:

> db.books.find()
{ "_id" : ObjectId("592149c4aabac953a3a1e31e"), "isbn" : "101", "name" : "Mastering MongoDB", "price" : 30, "published" : ISODate("2017-06-25T00:00:00Z") }
{ "_id" : ObjectId("59214bc1aabac954263b24e0"), "isbn" : "102", "name" : "MongoDB in 7 years", "price" : 50, "published" : ISODate("2017-06-26T00:00:00Z") }

Using our previous example of counting books, we would get the following:

var mapper = function() {
    // The documents have no "id" field, so every document is emitted
    // under one shared key (shown as null in the output below)
    emit(this.id, 1);
};
var reducer = function(id, counts) {
    // counts is the array of values emitted for this key
    return Array.sum(counts);
};
> db.books.mapReduce(mapper, reducer, { out: "books_count" })
{
    "result" : "books_count",
    "timeMillis" : 16700,
    "counts" : {
        "input" : 2,
        "emit" : 2,
        "reduce" : 1,
        "output" : 1
    },
    "ok" : 1
}
> db.books_count.find()
{ "_id" : null, "value" : 2 }

Now we get a third book in our books collection, with the following document:

{ "_id" : ObjectId("59214bc1aabac954263b24e1"), "isbn" : "103", "name" : "MongoDB for experts", "price" : 40, "published" : ISODate("2017-07-01T00:00:00Z") }
> db.books.mapReduce(mapper, reducer, {
    query: { published: { $gte: ISODate('2017-07-01 00:00:00') } },
    out: { reduce: "books_count" }
})
> db.books_count.find()
{ "_id" : null, "value" : 3 }

By querying only for documents published from July 2017 onward, we got just the new document out of the query. Its count of 1 was then reduced with the already calculated value of 2 in our books_count collection, giving a final sum of three documents.

This example, as contrived as it is, shows a powerful attribute of MapReduce: the ability to re-reduce results to incrementally calculate aggregations over time.
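To make the re-reduce step concrete, here is a minimal in-memory sketch of what the reduce output action does with the numbers from this example. It is a simplified model, not MongoDB code: `mergeWithReduce` is a hypothetical name, the collections are modeled as plain Map objects, and the reducer uses the standard Array reduce method in place of the shell's Array.sum.

```javascript
// Same logic as the reducer in the example: sums the counts for one key.
function reducer(key, values) {
  return values.reduce(function (sum, v) { return sum + v; }, 0);
}

// Simplified model of the "reduce" output action (hypothetical helper).
// existing and fresh are Maps of key -> value.
function mergeWithReduce(existing, fresh, reduceFn) {
  const merged = new Map(existing);
  fresh.forEach(function (value, key) {
    if (merged.has(key)) {
      // Key already stored: re-reduce the stored value with the new one.
      merged.set(key, reduceFn(key, [merged.get(key), value]));
    } else {
      // New key: store the value as it is.
      merged.set(key, value);
    }
  });
  return merged;
}

const previous = new Map([[null, 2]]); // books_count after the first run
const julyOnly = new Map([[null, 1]]); // result for the July query alone
const updated = mergeWithReduce(previous, julyOnly, reducer);
console.log(updated.get(null)); // 3
```

This only works because the reducer returns a value of the same kind as the values it receives, and because summing gives the same result however the inputs are grouped. A reducer without those properties could not safely be applied to its own earlier output.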
