November 22, 2009
This is the first in a series of articles detailing the syntax, patterns, and use cases for MongoDB’s aggregation functions:
Here, as an introduction, we cover the basics of count(), distinct(), and group().
count() simply returns the number of documents in a collection.
Suppose we have a collection where each document represents a pageview. We can get the total number of pageviews like so:
// Javascript
db.pageviews.count()
# Ruby @db['pageviews'].count
Admittedly more useful would be the number of pageviews from a given month:
// Javascript db.pageviews.count({date: {'$gte': new Date("Nov 1, 2009"), '$lt': new Date("Dec 1, 2009")}})
# Ruby @db['pageviews'].find({:date => {'$gte' => Time.utc(2009, 11), '$lt' => Time.utc(2009, 12)}}).count
Notice that we’ve passed a document selector to the count() method. This ensures that only documents matching the selector are included in the total.
Another simple but useful aggregator is distinct(), which returns an array of unique values for a given key in a collection. We can specify a root-level key or a nested key. Here, we request unique ip-addresses and user agents:
// Javascript db.pageviews.distinct('ip-address') db.pageviews.distinct({'user.agent'})
# Ruby @db['pageviews'].distinct('ip-address') @db['pageviews'].distinct('user.agent')
MongoDB’s group() command provides some of the same functionality as SQL’s GROUP BY, although the syntax is rather different. Like most database commands, group() takes a document whose keys designate the parameters of the operation:
// Group by user-agent, returning a sum for each user-agent. db.pageviews.group( { key: {'user.agent': true}, initial: {sum: 0}, reduce: function(doc, prev) { prev.sum += 1} }); // Returns (Note: simplified) [ { 'user.agent': 'Mozilla 5.0 / Gecko' 'sum': 241 }, { 'user.agent': 'Mozilla 5.0 / Webkit' 'sum': 79 } ]
Looking at the command itself, the first parameter is clear enough: key specifies which key or keys to group by.
The next two parameters, initial and reduce, may be less familiar. Their use is analgous to most programming language implementations of inject.
With initial, we provide a base, aggregator document for each grouped result. Here, we’re just saying that the base document should contain one key, sum, having a value of 0.
// Initial, aggregator document {sum: 0}
The reduce function take two parameters: 1) the current document being processed, and 2) the document we’re aggregating over (which starts out as the initial document described above).
// Reduce function function(doc, prev) { prev.sum += 1; }
In this simple case, we don’t even use the doc parameter; we just increment our initial document’s sum key. Since a new document is used for each grouping, the group() function returns all those documents as an array, adding any keys we’re grouping by, so that we’re given a result like this:
// Result (Note: simplified) [ { 'user.agent': 'Mozilla 5.0 / Gecko' 'sum': 241 }, { 'user.agent': 'Mozilla 5.0 / Webkit' 'sum': 79 } ]
The command we’ve been describing could be run from Ruby as follows:
# Group from Ruby @db['pageviews'].group(['user.agent'], nil, {'sum' => 0}, "function(doc, prev) { prev.sum += 1}")
With the second parameter, here nil, we can provide a query selector, so that group() will only operate over a certain subset of the collection. In the next installment, we explore that along with some of group()’s more advanced features.
In the meantime, fire the up the MongoDB shell, experiment with some of these functions, and don’t hesitate to consult the docs.