I open-sourced my Java KBtextmaster project

KBtextmaster reads a variety of document formats (Word, PowerPoint, PDF, OpenOffice.org, AbiWord) and performs categorization, summarization, part-of-speech tagging, document clustering, and indexing/search using Lucene.

You can get it here. It is released under the GPL, with alternative licenses available if the GPL does not work for your project.

