Nutch: a platform is born

I have used Nutch for two contracting jobs and Lucene for many jobs. Until today, I have viewed Nutch simply as:
  • Quick to configure for spidering target websites, and easy to administer while spidering
  • Trivial to run as a search web application
  • Web service provider (OpenSearch API)
Today, however, I started looking more closely at the underlying Hadoop architecture (an open-source distributed file system and MapReduce library modeled on Google's designs) and at both the available plugins and the plugin architecture. New opinion: Nutch is a platform for building more complex web applications and knowledge management applications.

