I open sourced my Java KBtextmaster project

KBtextmaster reads a variety of document formats (Word, Powerpoint, PDF, OpenOffice.org, AbiWord) and performs categorization, summarization, part of speech tagging, document clustering, and indexing/search using Lucene.

You can get it here. It is released under the GPL, with alternative licenses available if the GPL does not work for your project.

Comments

Popular posts from this blog

Ruby Sinatra web apps with background work threads

My Dad's work with Robert Oppenheimer and Edward Teller

Time and Attention Fragmentation in Our Digital Lives