Complexity of Java code for reading OpenOffice.org (OpenDocument) documents vs. Microsoft documents

I have spent more time than I would like to admit writing Java code to pull plain text from Microsoft Word, PowerPoint, and other Office files. This morning, I added support for reading OpenOffice.org (OpenDocument) documents to my Knowledge Management system: easy!

It took about 15 minutes of coding. I used the ZipFile API to open the top-level document file, found the ZIP entry named "content.xml", got an input stream for that entry, and fed it to a custom SAX parser class that simply aggregates the character data inside <text:p> tags.
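
For anyone who wants to try the same thing, here is a minimal sketch of the steps described above. It is not the code from my system: the class and method names (OdfTextExtractor, ParagraphHandler, extractText, parse) are made up for illustration, and it assumes the document uses the conventional "text" prefix for paragraph elements and that a newline after each paragraph is good enough.

```java
import java.io.InputStream;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.SAXParser;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class; names are illustrative only.
class OdfTextExtractor {

    /** SAX handler that collects character data found inside <text:p> elements. */
    static class ParagraphHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private int depth = 0; // greater than 0 while inside at least one <text:p>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes attrs) {
            // The default SAXParserFactory is not namespace aware, so qName is the raw "text:p".
            if ("text:p".equals(qName)) depth++;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("text:p".equals(qName)) {
                depth--;
                text.append('\n'); // assumption: one line per paragraph
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (depth > 0) text.append(ch, start, length);
        }

        String getText() {
            return text.toString();
        }
    }

    /** Opens the document as a ZIP file and parses its "content.xml" entry. */
    static String extractText(String path) throws Exception {
        try (ZipFile zip = new ZipFile(path)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IllegalArgumentException("No content.xml entry in " + path);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return parse(in);
            }
        }
    }

    /** Runs the SAX parser over an XML stream and returns the aggregated paragraph text. */
    static String parse(InputStream in) throws Exception {
        SAXParser parser = SAXParserFactory.newInstance().newSAXParser();
        ParagraphHandler handler = new ParagraphHandler();
        parser.parse(in, handler);
        return handler.getText();
    }
}
```

Calling OdfTextExtractor.extractText with the path to a document file should return the paragraph text, one paragraph per line. Text inside elements nested in a paragraph (spans, links) is included because the handler only tracks whether it is somewhere inside a <text:p>.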

