MongoDB has good support for indexing and search, including prefix matching for AJAX completion lists

I have been spoiled by great support for indexing and search for relational databases (e.g., Sphinx and the native search in PostgreSQL and MySQL).

I was pleased to discover, after a little bit of hacking this morning, how easy it is to do indexing and search using the MongoDB document-centered database. I have two common use cases for search, and MongoDB seems to handle both of them fairly well:
  • Search for words inside of text fields
  • Efficient word prefix search to support AJAX "suggest" style lists
My approach does require combining search results for multiple search terms in application code, but that is OK. Assuming the use of MongoRecord, here is a code snippet:
class Recipe < MongoRecord::Base
  collection_name :recipes
  fields :name, :directions, :words

  def to_s
    "recipe: #{name} directions: #{directions[0..20]}..."
  end

  # Insert a recipe, storing the unique words of the name and
  # directions in the :words array so they can be indexed and searched.
  def Recipe.make collection, name, directions
    collection.insert({:_id => Mongo::ObjectID.new, :name => name,
                       :directions => directions,
                       :words => (name + ' ' + directions).split.uniq})
  end
end

host = 'localhost'
port = Mongo::Connection::DEFAULT_PORT
MongoRecord::Base.connection = Mongo::Connection.new(host,port).db('mongorecord-test')

db = MongoRecord::Base.connection

coll = db.collection('recipes')
coll.remove({})

coll.create_index(:words, Mongo::ASCENDING)

Recipe.make coll, 'Rice Soup', 'Cook the rice, then add extra water to thin it out.'
Recipe.make coll, 'Cheese and Rice Crackers', 'Slice the cheese and layer on top of crackers.'

puts "\nSimple find"
puts Recipe.find_by_name('Rice Soup').to_s

puts "\nFind recipes by prefix regular expression (ignoring case) in array of words /^water/i"
Recipe.find(:all, :conditions => {:words => /^water/i}).each { |row| puts row.to_s }
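According to the MongoDB documentation, a rooted regular expression match like /^water/ will use an index, just as a relational database match of the form LIKE 'water%' does. This holds only for case-sensitive expressions, though: with the /i flag used above, MongoDB cannot use the index efficiently, so for indexed prefix search it is better to store the words in lowercase and drop the /i.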
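Here is a minimal sketch of that index-friendly variant in plain Ruby, with no database calls: the words are lowercased and stripped of punctuation before they are stored, and the user's prefix becomes a rooted, case-sensitive regular expression. The helper names tokenize and prefix_regex are hypothetical; they are not part of MongoRecord or the Mongo driver.

```ruby
# Hypothetical helpers; neither is part of MongoRecord or the Ruby driver.

# Split text into unique lowercase words, dropping punctuation,
# so that "rice," and "Rice" are both stored as "rice".
def tokenize(text)
  text.downcase.scan(/[a-z0-9']+/).uniq
end

# Build a rooted (prefix) regex without the /i flag.
# Regexp.escape keeps user input such as "c++" from being read as regex syntax.
def prefix_regex(prefix)
  /^#{Regexp.escape(prefix.downcase)}/
end

# Simulate the lookup against an in-memory word list.
words = tokenize('Rice Soup Cook the rice, then add extra water to thin it out.')
matches = words.grep(prefix_regex('Wat'))
puts matches.inspect
```

With helpers like these, the stored :words array would presumably be tokenize(name + ' ' + directions), and the query condition would be {:words => prefix_regex(user_input)}.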

I am still in learning mode with MongoDB, so I would appreciate any comments on improving this approach.

Comments

  1. You could lowercase the words and not use the /i in the regex. I'm not sure if this saves much, but it prevents the database from doing extra work.

    Also, for multiple words that require an AND search, you can loop through the words, and each subsequent word search can use $in to search within the list returned by the previous search.

  2. 7/12/2010: I wrote up an alternative on my Ruby blog: http://www.rubyplanet.net/2010/07/another-way-to-do-text-search-on.html

  3. Please let me point you to an alternative that MongoDB might serve as well. Since you restrict completion to prefixes of words, you could keep all prefixes and their definite completions as a hash. Using definite completions allows AJAX completion to start efficiently from the first character, with fast response times and a minimum of data to transfer. By the way, you can get rid of the input field, as in http://bu4.taipudex.com/pinyin.htm , which implements a dictionary as a dynamic menu; more at http://taipudex.com

  4. Hello Zawuni. I downloaded the Trie type TCL software in your links. It looks interesting enough but solves a very different problem than a MongoDB data store. Your Trie based system seems to be hand crafted, for a topic domain, for completion suggestions.

    BTW, I usually delete user comments that mostly serve as an advertising link to the user's business. I am going to leave your comment in place right now, but if at some time in the future I delete your comment, you will know why.

  5. For multiple words that require an AND search, Adnan's suggestion is not the most performant. To do it in a single query, use $all:

    db.things.find( { a: { $all: [ 2, 3 ] } } );

    http://www.mongodb.org/display/DOCS/Advanced+Queries#AdvancedQueries-%24all

    (That's the raw Mongo API; I don't know about MongoRecord.)

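The two AND-search ideas from the comments can be sketched in plain Ruby. The example below builds a $all selector for a single-query search and, for comparison, does the same filtering in application code against an in-memory stand-in for the collection. The names RECIPES, all_words_selector, and match_all are hypothetical, and whether MongoRecord passes such a selector through :conditions unchanged is an assumption not confirmed here.

```ruby
# In-memory stand-in for the recipes collection; the data is illustrative only.
RECIPES = [
  { :name => 'Rice Soup',
    :words => %w[rice soup cook the then add extra water to thin it out] },
  { :name => 'Cheese and Rice Crackers',
    :words => %w[cheese and rice crackers slice the layer on top of] }
]

# Build the selector for a single-query AND search, mirroring
# db.things.find({ a: { $all: [...] } }) from the comment above.
def all_words_selector(terms)
  { :words => { '$all' => terms.map(&:downcase).uniq } }
end

# Application-side equivalent: keep documents whose word list contains every term.
def match_all(docs, terms)
  wanted = terms.map(&:downcase).uniq
  docs.select { |doc| (wanted - doc[:words]).empty? }
end

selector = all_words_selector(%w[Rice cheese])
names = match_all(RECIPES, %w[Rice cheese]).map { |doc| doc[:name] }
puts selector.inspect
puts names.inspect
```

Note that $all matches whole words exactly, so this covers the "search for words" use case rather than prefix completion.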
