Complexity of Java code for reading OpenOffice.org documents vs. Microsoft documents

I have spent more time than I would like to admit writing Java code to pull plain text from Microsoft Word, PowerPoint, etc. files. This morning, I added support for reading OpenOffice.org documents to my Knowledge Management system: easy!

It took about 15 minutes of coding: used the ZipFile API to read the top level document file, and found the ZIP entry labeled "content.xml", got an input stream for this ZIP entry, fed it to a custom SAX parser class that simply aggregated character data inside <text:p> tags.

Comments

Popular posts from this blog

Ruby Sinatra web apps with background work threads

Time and Attention Fragmentation in Our Digital Lives

My Dad's work with Robert Oppenheimer and Edward Teller