Archiving data (semantic web, business, etc.) in XML

The other night I needed some data that I had processed a few years ago - no problem; I have been archiving data in ad hoc XML documents for years. I say ad hoc because I usually don't use a DTD or Schema to define the structure or to validate the XML. Instead, I write a program that collects and/or processes data and writes directly to well-formed XML files, with the format determined by the application.
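
As a rough illustration of writing data straight to a well-formed XML file with no DTD or Schema, here is a minimal Python sketch. The element names (countries, country, capital), the function name write_archive, and the sample records are assumptions made up for this example; they are not the format I actually used in my archives.

```python
import xml.etree.ElementTree as ET


def write_archive(path, records):
    """Write a list of dicts to an ad hoc XML file (hypothetical layout)."""
    # The root element name is an assumption chosen for this sketch.
    root = ET.Element("countries")
    for rec in records:
        # One <country name="..."> element per record.
        country = ET.SubElement(root, "country", name=rec["name"])
        # Child elements hold the remaining fields as text.
        ET.SubElement(country, "capital").text = rec["capital"]
    # ElementTree escapes special characters, so the output stays well formed.
    ET.ElementTree(root).write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    sample = [
        {"name": "France", "capital": "Paris"},
        {"name": "Japan", "capital": "Tokyo"},
    ]
    write_archive("countries.xml", sample)
```

The structure lives only in the code that writes the file, which is the trade-off of the ad hoc approach: nothing validates the file, but the layout is obvious when you open it years later.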

The important thing is that I can look at an old XML data file, see the format that I used, and in a minute or two have a little code that uses a SAX-style parser to get out what I need. I have used XML files for:
  • Data scraped from the web matching board of directors members with companies (used for an experiment to detect interlocking board members)
  • Data from the CIA World Factbook for countries
  • US State and city names
  • Categorization data from training on the 2 gigabyte Reuters news story corpus
  • etc.
I used to keep data in a relational database, which is handy for ad hoc queries, but now I favor simply archiving interesting data in XML files.
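
To show what "a little code that uses a SAX-style parser" might look like, here is a minimal Python sketch using the standard xml.sax module. The sample document, its element names, and the names CapitalHandler and read_capitals are hypothetical, invented for this example rather than taken from one of my real data files.

```python
import xml.sax

# Hypothetical archive file contents; the layout is an assumption.
SAMPLE_XML = """<?xml version="1.0" encoding="utf-8"?>
<countries>
  <country name="France"><capital>Paris</capital></country>
  <country name="Japan"><capital>Tokyo</capital></country>
</countries>
"""


class CapitalHandler(xml.sax.ContentHandler):
    """Collect a {country name: capital} mapping while the parser streams."""

    def __init__(self):
        super().__init__()
        self.capitals = {}
        self._country = None      # name attribute of the current <country>
        self._in_capital = False  # True while inside a <capital> element
        self._buf = []            # text chunks for the current <capital>

    def startElement(self, name, attrs):
        if name == "country":
            self._country = attrs.get("name")
        elif name == "capital":
            self._in_capital = True
            self._buf = []

    def characters(self, content):
        # SAX may deliver element text in several chunks, so accumulate them.
        if self._in_capital:
            self._buf.append(content)

    def endElement(self, name):
        if name == "capital":
            self.capitals[self._country] = "".join(self._buf).strip()
            self._in_capital = False
        elif name == "country":
            self._country = None


def read_capitals(xml_text):
    """Parse an XML string and return the collected mapping."""
    handler = CapitalHandler()
    xml.sax.parseString(xml_text.encode("utf-8"), handler)
    return handler.capitals


if __name__ == "__main__":
    print(read_capitals(SAMPLE_XML))
```

Because the handler only reacts to the element names it cares about, the same pattern works on large files without loading the whole document into memory.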

I have thought about setting up a repository of free, interesting data in XML - hopefully, if I share with others, I will get some interesting stuff back in return. That is on my to-do list :-)
