Using Lucene with JRuby

I use the Ruby Ferret indexing and search library a lot. Ferret is a port (some Ruby, mostly C) of Lucene. I have recently been getting into JRuby. A few days ago, I discovered that it was reasonably easy to run a simple Rails web application on the Java application server JBoss using JRuby (this took me an hour - next time will be easy). Today, I spent a short while getting Lucene and JRuby working together:

require "java"
require "lib/lucene-core-2.1.0.jar"

class Lucene
  # Thin wrapper around a Lucene index stored in a directory on disk.
  def initialize(an_index_path = "data/")
    @index_path = an_index_path
  end
  def add_documents id_text_pair_array # e.g., [[1,"test1"],[2,'test2']]
    index_available = org.apache.lucene.index.IndexReader.index_exists(@index_path)
    index_writer = org.apache.lucene.index.IndexWriter.new(
          @index_path,
          org.apache.lucene.analysis.standard.StandardAnalyzer.new,
          !index_available)
    id_text_pair_array.each {|id_text_pair|
      term_to_delete = org.apache.lucene.index.Term.new("id", id_text_pair[0].to_s) # if it exists
      a_document = org.apache.lucene.document.Document.new
      a_document.add(org.apache.lucene.document.Field.new('text', id_text_pair[1],
                       org.apache.lucene.document.Field::Store::YES,
                       org.apache.lucene.document.Field::Index::TOKENIZED))
      # Index the id as a single untokenized term so Term.new("id", ...) matches it exactly.
      a_document.add(org.apache.lucene.document.Field.new('id', id_text_pair[0].to_s,
                       org.apache.lucene.document.Field::Store::YES,
                       org.apache.lucene.document.Field::Index::UN_TOKENIZED))
      index_writer.updateDocument(term_to_delete, a_document) # delete any old docs with same id
    }
    index_writer.close
  end
  def search(query)
    parse_query = org.apache.lucene.queryParser.QueryParser.new(
         'text',
         org.apache.lucene.analysis.standard.StandardAnalyzer.new)
    query = parse_query.parse(query)
    engine = org.apache.lucene.search.IndexSearcher.new(@index_path)
    hits = engine.search(query).iterator
    results = []
    while (hits.hasNext && hit = hits.next)
      id = hit.getDocument.getField("id").stringValue.to_i
      text = hit.getDocument.getField("text").stringValue
      results << [hit.getScore, id, text]
    end
    engine.close
    results
  end
  def delete_documents id_array # e.g., [1,5,88]
    index_available = org.apache.lucene.index.IndexReader.index_exists(@index_path)
    index_writer = org.apache.lucene.index.IndexWriter.new(
          @index_path,
          org.apache.lucene.analysis.standard.StandardAnalyzer.new,
          !index_available)
    id_array.each {|id|
      index_writer.deleteDocuments(org.apache.lucene.index.Term.new("id", id.to_s))
    }
    index_writer.close
  end
end
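
The add_documents method leans on updateDocument, which removes any existing document containing the given term and then adds the new one, while delete_documents removes by the same "id" term. To make that replace-by-id behavior concrete, here is a rough plain-Ruby sketch. TinyIndex and its methods are hypothetical stand-ins that I made up for illustration; they are not part of Lucene, and the word matching is far cruder than what StandardAnalyzer and Lucene's scoring do.

```ruby
# Hypothetical in-memory stand-in for a Lucene index (illustration only, not Lucene).
class TinyIndex
  def initialize
    @docs = [] # each entry is a hash such as { id: "1", text: "..." }
  end

  # Mirrors the idea of updateDocument: drop any doc with this id, then add the new one.
  def update_document(id, text)
    delete_documents(id)
    @docs << { id: id.to_s, text: text }
  end

  # Mirrors the idea of deleteDocuments(Term.new("id", ...)): exact match on the id string.
  def delete_documents(id)
    @docs.reject! { |doc| doc[:id] == id.to_s }
  end

  # Naive whole-word match on lowercased text; returns [id, text] pairs.
  def search(word)
    @docs.select { |doc| doc[:text].downcase.split(/\W+/).include?(word.downcase) }
         .map { |doc| [doc[:id].to_i, doc[:text]] }
  end
end

index = TinyIndex.new
index.update_document(1, "test one two")
index.update_document(1, "test one two, revised") # replaces the first version
index.update_document(2, "testing 1 2 3")
p index.search("test")
```

Only the revised version of document 1 comes back: the earlier version was replaced, and "testing" is a different word from "test" because nothing here does stemming.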

This code assumes that the Java Lucene JAR file lucene-core-2.1.0.jar is in the subdirectory lib. A short test program (which expects the class above to be saved as lucene.rb) is:

require "lucene"
require 'pp'

ls = Lucene.new
ls.add_documents([[1,"test one two"],[2,'testing 1 2 3'], [3,'this is a longer test string']])
ls.delete_documents([1])  # optional: test document delete from index
pp ls.search("test")
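
The search method returns an array of [score, id, text] triples, so ordinary Ruby is enough to post-process the hits. As a small sketch, the helper below sorts the triples by descending score and formats them for printing. The name format_results is my own invention, and the sample scores are made up rather than taken from a real Lucene run.

```ruby
# Hypothetical helper (not part of the Lucene class above): sort hits by
# descending score and turn each [score, id, text] triple into a display line.
def format_results(results)
  results.sort_by { |score, _id, _text| -score }.map do |score, id, text|
    format("%.3f  id=%d  %s", score, id, text)
  end
end

# Made-up scores for illustration; real values come from ls.search("test").
sample = [
  [0.5, 3, "this is a longer test string"],
  [0.8, 1, "test one two"]
]
puts format_results(sample)
```

In the test program this would be called as puts format_results(ls.search("test")).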

I had some hesitations about JRuby: I was concerned that using JRuby would lack the lightweight feel of hacking in native Ruby. No worries though: JRuby is easy and quick to work with.
