Archiving data (semantic web, business, etc.) in XML
The other night I needed some data that I had processed a few years ago - no problem; I have been archiving data in ad hoc XML documents for years. I say ad hoc because I usually don't use a DTD or Schema to define structure or to validate the XML. Instead, I write a program that collects and/or processes data and writes directly to well-formed XML files, with the format determined by the application.
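As a rough illustration of what "writing directly to well-formed XML" can look like, here is a minimal Python sketch using the standard library's xml.etree.ElementTree. The element names (companies, company, director), the record layout, and the function names are all hypothetical, made up for this example; they are not the format I actually used.

```python
import xml.etree.ElementTree as ET


def build_archive(records):
    """Build an ad hoc XML tree from a list of dicts (no DTD or schema).

    The tag names and record keys here are illustrative assumptions.
    """
    root = ET.Element("companies")
    for record in records:
        company = ET.SubElement(root, "company", name=record["name"])
        for director in record["directors"]:
            # ElementTree escapes characters such as & and < on output,
            # so the result stays well formed.
            ET.SubElement(company, "director").text = director
    return root


def save_archive(records, path):
    """Write the records to a UTF-8 XML file with an XML declaration."""
    tree = ET.ElementTree(build_archive(records))
    tree.write(path, encoding="utf-8", xml_declaration=True)


if __name__ == "__main__":
    sample = [{"name": "Acme & Co", "directors": ["A. Smith", "B. Jones"]}]
    print(ET.tostring(build_archive(sample), encoding="unicode"))
```

Because the library does the escaping, the application only has to decide on element names, which is what makes the format "determined by the application".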
The important thing is that I can look at an old XML data file, see the format that I used, and in a minute or two write a little code that uses a SAX-style parser to get out what I need. I have used XML files for:
- Data scraped from the web matching board of directors members with companies (used for an experiment to detect interlocking board members)
- Data from the CIA World Factbook for countries
- US State and city names
- Categorization data from training on the 2 gigabyte Reuters news story corpus
- etc.
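To give a feel for the "little code that uses a SAX-style parser", here is a minimal Python sketch using the standard library's xml.sax module, applied to the board-of-directors case. It assumes a hypothetical file layout of company elements with a name attribute, each containing director child elements; the tag names, class name, and function names are illustrative, not my real archive format.

```python
import xml.sax


class BoardHandler(xml.sax.ContentHandler):
    """Collect director names per company from a hypothetical layout:
    <company name="..."><director>...</director></company>
    """

    def __init__(self):
        super().__init__()
        self.boards = {}       # company name -> list of director names
        self._company = None   # company currently being read
        self._text = None      # text chunks of the current <director>

    def startElement(self, name, attrs):
        if name == "company":
            self._company = attrs.get("name")
            self.boards[self._company] = []
        elif name == "director" and self._company is not None:
            self._text = []

    def characters(self, content):
        # SAX may deliver one text node in several chunks, so accumulate.
        if self._text is not None:
            self._text.append(content)

    def endElement(self, name):
        if name == "director" and self._text is not None:
            self.boards[self._company].append("".join(self._text).strip())
            self._text = None
        elif name == "company":
            self._company = None


def load_boards(xml_bytes):
    """Parse XML bytes and return {company: [directors]}."""
    handler = BoardHandler()
    xml.sax.parseString(xml_bytes, handler)
    return handler.boards


def interlocking_directors(boards):
    """Return {director: sorted companies} for directors on 2+ boards."""
    seats = {}
    for company, directors in boards.items():
        for director in directors:
            seats.setdefault(director, set()).add(company)
    return {d: sorted(c) for d, c in seats.items() if len(c) > 1}
```

For a file on disk, xml.sax.parse(path, handler) can be used in place of parseString. The handler only reacts to the two tags it cares about, so extra elements in an old file are simply ignored.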
I have thought about setting up a repository of free, interesting data in XML. Hopefully, if I share with others, I will get some interesting stuff back in return. That is on my to-do list :-)