Nutch: a platform is born

I have used Nutch on two contracting jobs and Lucene on many more. Until today, I viewed Nutch simply as:
  • A spider that is quick to configure for target websites and easy to administer
  • A search web application that is trivial to run
  • A web service provider (OpenSearch API)
Today, however, I started looking more closely at the underlying Hadoop architecture (a distributed file system modeled on the Google File System, plus a MapReduce library) and at both the available plugins and the plugin architecture. My new opinion: Nutch is a platform for building more complex web applications and knowledge management applications.
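
To make the map reduce idea concrete, here is a minimal in-memory sketch of the programming model in plain Python. It does not use Hadoop or Nutch APIs; the function names (`map_page`, `reduce_counts`, `run_job`) and the sample pages are hypothetical, invented only for illustration. A real job would spread the map and reduce steps across many machines and read its input from the distributed file system.

```python
from collections import defaultdict

def map_page(url, text):
    """Map step: emit a (term, 1) pair for every word on one page."""
    for word in text.lower().split():
        yield word, 1

def reduce_counts(term, counts):
    """Reduce step: combine all values emitted for one term."""
    return term, sum(counts)

def run_job(pages):
    """Run map, group by key (the 'shuffle'), then reduce, all in memory."""
    grouped = defaultdict(list)
    for url, text in pages.items():
        for term, count in map_page(url, text):
            grouped[term].append(count)
    return dict(reduce_counts(term, counts) for term, counts in grouped.items())

# Hypothetical crawled pages, made up for this illustration.
pages = {
    "http://example.com/a": "nutch builds on lucene",
    "http://example.com/b": "nutch runs on hadoop",
}
term_counts = run_job(pages)
```

Because each page is mapped independently and each term is reduced independently, both steps can in principle be split across machines, which is what makes the model a reasonable foundation for a platform rather than just a crawler.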
