Globally unique identifiers

I really enjoyed listening to Tim Bray's talk on ITConversations about developing the Atom specification. He made a lot of interesting points, but the one that resonated most with me was Atom's requirement for a globally unique identifier for every feed and entry. As syndication becomes more widespread, we all see lots of duplicate material. Examples of duplication are easy to find on rojo.com (they used to be my customer, and I still enjoy their site) and technorati.com: we end up with many URIs that refer to the same material.

It is possible to write software that detects duplicate feeds, but comparing two articles is not a cheap operation, and when comparing a very large number of feeds, the O(N^2) pairwise comparisons become painful. I have experimented with a much less accurate approach: hash the n-grams of each article and check for duplication with a hash lookup. This gave poor results, at least in my experiments, and if you fall back to partial matching of n-grams, you are back to O(N^2). (If anyone knows a good way to handle this, let me know :-)
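
Here is a minimal Python sketch of that n-gram hashing idea. To be clear, this is an illustration, not the code from my experiments: the function names, the use of Python's built-in hash, and the Jaccard threshold are all choices made for this example.

    from collections import defaultdict
    from itertools import combinations

    def ngram_hashes(text, n=5):
        # Hash every run of n consecutive words in the article.
        words = text.lower().split()
        return {hash(" ".join(words[i:i + n])) for i in range(len(words) - n + 1)}

    def find_near_duplicates(articles, n=5, threshold=0.8):
        # articles: dict mapping article id -> article text.
        hashes = {aid: ngram_hashes(text, n) for aid, text in articles.items()}
        # Invert the mapping so we only compare articles that share at
        # least one exact n-gram hash, rather than every possible pair.
        by_hash = defaultdict(set)
        for aid, hs in hashes.items():
            for h in hs:
                by_hash[h].add(aid)
        candidates = set()
        for aids in by_hash.values():
            for a, b in combinations(sorted(aids), 2):
                candidates.add((a, b))
        duplicates = []
        for a, b in candidates:
            # Candidate pairs share at least one hash, so the union is nonempty.
            jaccard = len(hashes[a] & hashes[b]) / len(hashes[a] | hashes[b])
            if jaccard >= threshold:
                duplicates.append((a, b, jaccard))
        return duplicates

    articles = {
        "a": "the quick brown fox jumps over the lazy dog today",
        "b": "the quick brown fox jumps over the lazy dog yesterday",
    }
    print(find_near_duplicates(articles, n=3, threshold=0.5))

The inverted index limits full comparison to articles that share at least one exact n-gram hash, which is what makes the method fast but brittle: lightly edited copies of an article may share few exact n-grams, and loosening the match to partial n-grams brings back the O(N^2) cost.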

Globally unique identifiers help solve many duplication problems and make it easier to implement container relations; in general, Atom just seems to be a better and more scalable platform than RSS 2.0 for complex new applications.
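
To make the identifier point concrete, here is a small sketch (the tag: URIs are made up for the example) showing that once every entry carries a globally unique ID, deduplication drops to one set lookup per entry instead of pairwise content comparison:

    def dedupe_entries(entries):
        # entries: list of dicts, each with an 'id' key holding the
        # entry's globally unique identifier. Keeps the first copy seen.
        seen = set()
        unique = []
        for entry in entries:
            if entry["id"] not in seen:
                seen.add(entry["id"])
                unique.append(entry)
        return unique

    entries = [
        {"id": "tag:example.com,2005:post-1", "title": "First post"},
        {"id": "tag:example.com,2005:post-1", "title": "First post (syndicated copy)"},
    ]
    print(dedupe_entries(entries))  # one entry survives, O(N) overall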
