Complexity of Java code for reading OpenOffice.org documents vs. Microsoft documents

June 05, 2004

I have spent more time than I would like to admit writing Java code to pull plain text out of Microsoft Word, PowerPoint, and other Office files. This morning I added support for reading OpenOffice.org documents to my Knowledge Management system, and it was easy!

It took about 15 minutes of coding. I used the ZipFile API to open the document file, found the ZIP entry named "content.xml", got an input stream for that entry, and fed it to a custom SAX parser class that simply aggregates the character data inside <text:p> tags.
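
For the curious, here is a minimal sketch of that approach. It is not the exact code from my system: the class and method names (OpenDocumentTextExtractor, ParagraphHandler, extractText) are made up for this illustration, and it assumes the document uses the usual "text" namespace prefix, since it matches on the qualified name "text:p" with a parser that is not namespace-aware. It also answers any DTD lookup with an empty source, on the assumption that content.xml may reference a DTD file that is not available locally.

```java
import java.io.File;
import java.io.IOException;
import java.io.InputStream;
import java.io.StringReader;
import java.util.zip.ZipEntry;
import java.util.zip.ZipFile;
import javax.xml.parsers.ParserConfigurationException;
import javax.xml.parsers.SAXParserFactory;
import org.xml.sax.Attributes;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;
import org.xml.sax.helpers.DefaultHandler;

// Hypothetical helper class, named for this example only.
class OpenDocumentTextExtractor {

    /** Collects the character data that appears inside <text:p> elements. */
    static class ParagraphHandler extends DefaultHandler {
        private final StringBuilder text = new StringBuilder();
        private int depth = 0; // greater than 0 while inside at least one <text:p>

        @Override
        public void startElement(String uri, String localName, String qName, Attributes atts) {
            // The default parser is not namespace-aware, so qName carries the prefix.
            if ("text:p".equals(qName)) depth++;
        }

        @Override
        public void endElement(String uri, String localName, String qName) {
            if ("text:p".equals(qName)) {
                depth--;
                text.append('\n'); // one line per paragraph
            }
        }

        @Override
        public void characters(char[] ch, int start, int length) {
            if (depth > 0) text.append(ch, start, length);
        }

        @Override
        public InputSource resolveEntity(String publicId, String systemId) {
            // Assumption: skip external DTDs by handing the parser an empty source.
            return new InputSource(new StringReader(""));
        }

        String getText() {
            return text.toString();
        }
    }

    /** Parses an already opened content.xml stream and returns the paragraph text. */
    static String extractText(InputStream contentXml)
            throws IOException, SAXException, ParserConfigurationException {
        ParagraphHandler handler = new ParagraphHandler();
        SAXParserFactory.newInstance().newSAXParser().parse(contentXml, handler);
        return handler.getText();
    }

    /** Opens the document as a ZIP file and extracts text from its content.xml entry. */
    static String extractText(File document)
            throws IOException, SAXException, ParserConfigurationException {
        try (ZipFile zip = new ZipFile(document)) {
            ZipEntry entry = zip.getEntry("content.xml");
            if (entry == null) {
                throw new IOException("No content.xml entry in " + document);
            }
            try (InputStream in = zip.getInputStream(entry)) {
                return extractText(in);
            }
        }
    }
}
```

Calling OpenDocumentTextExtractor.extractText(new File(path)) on a document returns its paragraph text, one paragraph per line. Text outside <text:p> elements, such as headings in <text:h>, is ignored in this sketch.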
