Reading two good books on using MapReduce algorithms for large scale text processing

July 04, 2010

I have a fair amount of experience with Hadoop, but little experience with associated tools like Pig and Mahout. I can spend more time with Pig in my local sandbox, but I wanted more formal help getting up to speed with Mahout and with general MapReduce application programming. I purchased the MEAP for Mahout in Action and have been reading new chapters as they become available. The authors (especially Robin Anil) have been very helpful on the online forum for the book, and I have found the material to be useful and interesting.

Another book I bought was just delivered yesterday morning: Data-Intensive Text Processing with MapReduce. I have only read the first few chapters but the book has been very interesting and informative.

I have done some work based on Hadoop for about half the customers I have had in the last year and a half, and I believe that knowing how to horizontally scale out machine learning and text analytics applications has become a must-have skill.

Comments

Alex Ott, 2:34 AM
The latest beta of Data-Intensive Text Processing with MapReduce is also available online at http://www.umiacs.umd.edu/%7Ejimmylin/book.html
Mark Watson, author and consultant, 3:25 AM
Thanks Alex! Good link.
Alex Ott, 8:50 AM
Mark, have you heard about Cascalog (http://nathanmarz.com/blog/introducing-cascalog/)? It is built with Clojure on top of Cascading and makes it easy to write queries against data in Hadoop.

P.S. I'm currently experimenting with the same technologies (Mahout, etc.) ;-)
Mark Watson, author and consultant, 8:12 AM
I have looked at Cascalog, but not actually tried it.
