great video talk: "Innovation in Search and Artificial Intelligence"
There were a lot of gems in this talk, but one that I may put to immediate use is using non-text data in map reduce, specifically using the protocol buffer tools. I have been using Hadoop more frequently and it is worth looking the effects of binary data for intermediate results. His comment that using map reduce is not necessarily incompatible with indexing data was also interesting. There is an overhead for creating indices, but it seems like there are opportunities to use indices for access to global information in a data set while making a complete sweep through the input data set during the map phase.