Using Lucene with JRuby

I use the Ruby Ferret indexing and search library a lot. Ferret is a port (some Ruby, mostly C) of Lucene. I have recently been getting into JRuby. A few days ago, I discovered that it was reasonably easy to run a simple Rails web application on the JBoss Java application server using JRuby (this took me an hour; next time will be easy). Today, I spent a short while getting Lucene and JRuby working together:

require "java"
require "lib/lucene-core-2.1.0.jar"

class Lucene
  # Thin wrapper around a Lucene index stored on disk at the given path.
  def initialize(an_index_path = "data/")
    @index_path = an_index_path
  end
  def add_documents id_text_pair_array # e.g., [[1,"test1"],[2,'test2']]
    index_available = org.apache.lucene.index.IndexReader.index_exists(@index_path)
    index_writer = org.apache.lucene.index.IndexWriter.new(
          @index_path,
          org.apache.lucene.analysis.standard.StandardAnalyzer.new,
          !index_available)
    id_text_pair_array.each {|id_text_pair|
      term_to_delete = org.apache.lucene.index.Term.new("id", id_text_pair[0].to_s) # if it exists
      a_document = org.apache.lucene.document.Document.new
      a_document.add(org.apache.lucene.document.Field.new('text', id_text_pair[1],
                       org.apache.lucene.document.Field::Store::YES,
                       org.apache.lucene.document.Field::Index::TOKENIZED))
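      # Index the id as a single untokenized term so that updateDocument and
      # deleteDocuments can match it exactly.
      a_document.add(org.apache.lucene.document.Field.new('id', id_text_pair[0].to_s,
                       org.apache.lucene.document.Field::Store::YES,
                       org.apache.lucene.document.Field::Index::UN_TOKENIZED))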
      index_writer.updateDocument(term_to_delete, a_document) # delete any old docs with same id
    }
    index_writer.close
  end
  def search(query) # returns an array of [score, id, text] triples
    parser = org.apache.lucene.queryParser.QueryParser.new(
         'text',
         org.apache.lucene.analysis.standard.StandardAnalyzer.new)
    parsed_query = parser.parse(query)
    engine = org.apache.lucene.search.IndexSearcher.new(@index_path)
    results = []
    begin
      hits = engine.search(parsed_query).iterator
      while hits.hasNext
        hit = hits.next
        id = hit.getDocument.getField("id").stringValue.to_i
        text = hit.getDocument.getField("text").stringValue
        results << [hit.getScore, id, text]
      end
    ensure
      engine.close # release the index even if the search raises
    end
    results
  end
  def delete_documents id_array # e.g., [1,5,88]
    index_available = org.apache.lucene.index.IndexReader.index_exists(@index_path)
    index_writer = org.apache.lucene.index.IndexWriter.new(
          @index_path,
          org.apache.lucene.analysis.standard.StandardAnalyzer.new,
          !index_available)
    id_array.each {|id|
      index_writer.deleteDocuments(org.apache.lucene.index.Term.new("id", id.to_s))
    }
    index_writer.close
  end
end

This code assumes that the Java Lucene JAR file lucene-core-2.1.0.jar is in the subdirectory lib. A short test program is:

require "lucene"
require 'pp'

ls = Lucene.new
ls.add_documents([[1,"test one two"],[2,'testing 1 2 3'], [3,'this is a longer test string']])
ls.delete_documents([1])  # optional: test document delete from index
pp ls.search("test")
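Because the listing requires the JAR by a relative path, it probably only loads when the script is started from the right directory. Here is a minimal sketch of failing early with a clearer message. The helper names lucene_jar_path and ensure_lucene_jar! are my own for illustration and are not part of the code above:

```ruby
# Hypothetical helpers (not part of the original listing): build the path to
# the Lucene JAR under a base directory and fail early if it is missing.
LUCENE_JAR_NAME = "lucene-core-2.1.0.jar"

# Path where the listing expects the JAR: <base_dir>/lib/<jar name>.
def lucene_jar_path(base_dir)
  File.join(base_dir, "lib", LUCENE_JAR_NAME)
end

# Returns the JAR path, or raises LoadError if no file exists there.
def ensure_lucene_jar!(base_dir)
  path = lucene_jar_path(base_dir)
  raise LoadError, "Lucene JAR not found at #{path}" unless File.file?(path)
  path
end

# Under JRuby, this could stand in for the relative require (an assumption,
# not something the original code does):
#   require "java"
#   require ensure_lucene_jar!(File.dirname(File.expand_path(__FILE__)))
```

The check itself is plain Ruby, so it can be tried outside JRuby before anything touches Lucene.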

I had some hesitations about JRuby: I was concerned that it would lack the lightweight feel of hacking in native Ruby. No worries though: JRuby is easy and quick to work with.
