I open-sourced my Java KBtextmaster project

KBtextmaster reads a variety of document formats (Word, PowerPoint, PDF, OpenOffice.org, AbiWord) and performs categorization, summarization, part-of-speech tagging, document clustering, and indexing and search using Lucene.

You can get it here. It is released under the GPL, with alternative licenses available if the GPL does not work for your project.
