Good PDF book: "Ferret" by David Balmain

Slashdot had a discussion yesterday on indexing and searching documents, a subject of particular interest to me. After reading the comments, I revisited the indexing and search tools that I have used over the years. Ferret (a Lucene clone) is my favorite, for several reasons: it closely follows the Lucene API (which I have used for years), it is very fast, and coding in Ruby is faster for me than coding in Java (Lucene) or Common Lisp (Montezuma). I bought Dave's book on Ferret yesterday, and it is a good reference with many useful examples.

I have a "semi-alive" open source project (KBSPortal) written in Java that uses Lucene along with my own clustering and analysis libraries. I have been mulling over switching it to Ruby and Ruby on Rails for three reasons: developing the web interface would be easier, I prefer coding in Ruby to coding in Java, and there are some very nice text analysis Ruby Gems that could replace some of my own Java analysis code (in the spirit of building on other people's libraries when possible, to take advantage of shared work). I get consulting work setting up custom document management systems, and I would like to have a complete stack that I could set up and customize in less than a day.
