Complexity of Java code for reading documents vs. Microsoft documents

I have spent more time than I would like to admit writing Java code to pull plain text from Microsoft Word, PowerPoint, etc. files. This morning, I added support for reading documents to my Knowledge Management system: easy!

It took about 15 minutes of coding: used the ZipFile API to read the top level document file, and found the ZIP entry labeled "content.xml", got an input stream for this ZIP entry, fed it to a custom SAX parser class that simply aggregated character data inside <text:p> tags.


Popular posts from this blog

Custom built SBCL and using spaCy and TensorFlow in Common Lisp

I have tried to take advantage of extra time during the COVID-19 pandemic

GANs and other deep learning models for cooking recipes