Extracting text from documents

I am happy to see that the Apache POI project's new POI 3.5.1 beta 1 release supports some OpenOffice.org document formats. I have been using POI for years to access the contents of Microsoft Office documents from Java applications, and it is great to have one library that supports most of the document types I need to work with. POI is also usable from JRuby, or from Ruby via the POI-Ruby sub-project (which requires compiling POI with gcj and then using SWIG). BTW, if you want something simple and hackable, my Open Source web page has a Ruby library that I wrote about 4 years ago for working with OpenOffice.org, Word, and AbiWord documents.

