MongoDB in Three Minutes

November 05, 2009

A couple of nights back, I was given three minutes to demonstrate MongoDB before a somewhat large group of people who’d never heard of it. Source code is on GitHub. At one minute each, the highlights are these:

1. Schema-free documents.

MongoDB is schema-free; this means that the structure of MongoDB data need not be defined up front. MongoDB stores data as collections of documents. A document can be thought of as a JSON object, Python dictionary, or Ruby hash (among other things). Documents are a natural, elegant way of representing data, and they are at the heart of MongoDB.
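
To make "schema-free" concrete, here is a minimal plain-Ruby sketch. No database is involved: the array merely stands in for a collection, and the field names and values are made up for illustration. The point is only that two documents with different shapes can sit side by side without anything being declared first.

```ruby
require 'json'

# Two hypothetical documents headed for the same collection.
# The values are invented for illustration, not real Twitter data.
tweet = {
  'from_user'  => 'hwaet',
  'text'       => 'Walking the High Line in nyc',
  'created_at' => '2009-11-03T20:15:00Z'
}

# A second document can carry nested structure the first one lacks;
# no table definition or migration is needed to allow it.
tweet_with_user = {
  'from_user' => 'example_user',
  'text'      => 'nyc pizza',
  'user'      => { 'followers' => ['234352343', '99'] }
}

# Plain array used as a stand-in for a MongoDB collection.
collection = [tweet, tweet_with_user]

# Each document serializes to a JSON object, whatever its shape.
collection.each { |doc| puts JSON.generate(doc) }
```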

Suppose we want to download a few tweets.

require 'mongo'
require 'twitter'

# DB connection and collection
@db  = Mongo::Connection.new.db(DATABASE_NAME)
@nyc = @db.collection('nyc')
 
(1..5).each do |page|
  Twitter::Search.new('nyc').page(page).each do |tweet|
    @nyc.save(tweet)
  end
end

That gets us the first five pages of ‘nyc’-related tweets and saves them to our ‘nyc’ collection in the db. For each tweet, the Twitter gem returns a Ruby hash, which saves naturally to the database.

2. Dynamic queries.

MongoDB speaks the language of documents, enabling expressive queries. To take a few examples:

We can query for a specific key:

  @nyc = DB.collection('nyc')
  @nyc.find(:username => "hwaet")  

Or on a nested key pointing to an array:

  @nyc.find('user.followers' => '234352343')

We can search a field using a regular expression:

  @nyc.find('text' => /z.+/)

Or query across a date range:

  @nyc.find('created_at' => {'$lte' => Time.now - (60*60*24)})

And all of this can be efficient because each collection can define up to 40 secondary indexes:

  @nyc.create_index([['text', 1]])
  @nyc.create_index([['user.followers', 1]])
  @nyc.create_index([['from_user', 1], ['created_at', -1]])
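
For readers who want to see what those four query styles mean, here is a plain-Ruby sketch of the matching rules: exact match, dotted paths into nested documents, array membership, regular expressions, and `$lte`. This is an illustration only, not how MongoDB is implemented. The helper names (`dig_path`, `value_matches?`, `matches?`, `find`) and the sample tweets are hypothetical, and the sketch scans every document, which is exactly the work an index lets the real database avoid.

```ruby
# Walk a dotted path such as 'user.followers' through nested hashes.
def dig_path(doc, path)
  path.split('.').reduce(doc) do |value, key|
    value.is_a?(Hash) ? value[key] : nil
  end
end

# Decide whether one value satisfies one condition.
def value_matches?(value, condition)
  case condition
  when Regexp
    value.is_a?(String) && !(value =~ condition).nil?
  when Hash
    # Only the '$lte' operator is sketched here; anything else fails.
    condition.all? do |op, bound|
      op == '$lte' && value.is_a?(bound.class) && value <= bound
    end
  else
    # An array field matches when any element equals the condition.
    value.is_a?(Array) ? value.include?(condition) : value == condition
  end
end

# A document matches when every key in the query matches.
def matches?(doc, query)
  query.all? { |path, cond| value_matches?(dig_path(doc, path.to_s), cond) }
end

# Linear scan over an array standing in for a collection.
def find(collection, query)
  collection.select { |doc| matches?(doc, query) }
end

now = Time.now
day = 60 * 60 * 24

# Hypothetical sample data.
tweets = [
  { 'username' => 'hwaet', 'text' => 'zoo day in nyc',
    'created_at' => now - 2 * day,
    'user' => { 'followers' => ['234352343', '17'] } },
  { 'username' => 'other', 'text' => 'nyc bagels',
    'created_at' => now,
    'user' => { 'followers' => ['42'] } }
]

by_name     = find(tweets, :username => 'hwaet')
by_follower = find(tweets, 'user.followers' => '234352343')
by_regex    = find(tweets, 'text' => /z.+/)
by_date     = find(tweets, 'created_at' => { '$lte' => now - day })
```

With this sample data, each of the four queries selects only the first tweet.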

3. Binary storage.

What if we want to store images, videos, or music? MongoDB stores arbitrary binary data, too.

This code goes through our collection of tweets, fetches each user’s profile image, and saves it to the database using GridFS (i.e., MongoDB’s specification for storing larger binary objects).

require 'open-uri'  # lets open() read from a URL

@nyc.find.each do |tweet|
  filename = tweet['from_user'].downcase + ".jpg"
  next if GridStore.exist?(@db, filename)
 
  GridStore.open @db, filename, 'w+' do |file|
    data = open(tweet['profile_image_url']).read
    file.content_type = 'image/jpeg'
    file.write data  # write raw bytes; puts could append a newline
  end
end

If we want to serve those images with Sinatra:

get '/images/:id' do
  content_type "image/jpeg"
  filename = params[:id].downcase + ".jpg"
  GridStore.read(DB, filename)
end
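
For the curious, the idea behind GridFS can be sketched in plain Ruby: a file becomes one metadata document plus a series of numbered chunk documents, and reading it back means joining the chunks in order. The field names below (`length`, `chunkSize`, `md5`, `files_id`, `n`, `data`) follow the GridFS layout, but everything else is an assumption made for illustration: the helper names are hypothetical, the filename doubles as the `_id`, and the chunk size is deliberately tiny.

```ruby
require 'digest/md5'

# Tiny on purpose so the example produces several chunks;
# real chunk sizes are far larger.
CHUNK_SIZE = 4 # bytes

# Split binary data into a metadata document and ordered chunk documents.
def to_gridfs_docs(filename, data, chunk_size = CHUNK_SIZE)
  bytes = data.dup.force_encoding(Encoding::BINARY)
  file_doc = {
    '_id'       => filename, # hypothetical; a driver would generate an id
    'filename'  => filename,
    'length'    => bytes.bytesize,
    'chunkSize' => chunk_size,
    'md5'       => Digest::MD5.hexdigest(bytes)
  }
  chunk_count = (bytes.bytesize.to_f / chunk_size).ceil
  chunks = (0...chunk_count).map do |n|
    { 'files_id' => file_doc['_id'],
      'n'        => n,
      'data'     => bytes.byteslice(n * chunk_size, chunk_size) }
  end
  [file_doc, chunks]
end

# Reassemble the original bytes from chunks, whatever order they arrive in.
def from_gridfs_docs(chunks)
  chunks.sort_by { |chunk| chunk['n'] }.map { |chunk| chunk['data'] }.join
end

# Stand-in for image bytes; not a real JPEG.
original = "\xFF\xD8\xFF\xE0fake-jpeg".b

file_doc, chunks = to_gridfs_docs('hwaet.jpg', original)
restored = from_gridfs_docs(chunks.shuffle)
```

Because each chunk records its position in `n`, the chunks can be stored and fetched independently and still be put back together correctly.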

Speed, scalability, ease of use…

All this is to say nothing of production case studies or paths to scalability. Interested readers are encouraged to download a binary and start experimenting.
