Great talk on Spark

January 26, 2016

I just listened to an ACM-sponsored talk, Making Big Data Processing Simple With Spark, by Matei Zaharia. You may need to be an ACM member to watch the webinar. I first joined ACM in the mid 1970s, and I recommend membership.

For handling huge datasets, Spark is either evolutionary or revolutionary, depending on your point of view. A bit of personal history before I talk specifically about Spark:

In the late 1980s I was an architect and developer on a multinational project that used seismic data from 38 data collection stations to detect atomic bomb tests. All of our data handling software was custom; if we had had Spark, or even Hadoop, we would have saved a ton of effort. Similarly, in the 1990s I was tech lead on a fraud detection system that used massive real-time telephone record datasets. Modern infrastructure would have saved a lot of time and money.

My first serious use of MapReduce was processing large Twitter datasets at Compass Labs. We used Hadoop on Amazon Elastic MapReduce. Later, when I worked as a contractor at Google, in addition to using MapReduce I was introduced to real-time interactive tools like Dremel that made it easy to work with large datasets interactively.

With Spark, everyone gets to work interactively with massive datasets! I think that Spark is evolutionary in that it builds on and plugs into existing work like the Hadoop Distributed File System (HDFS) and supports familiar MapReduce-style operations. I think that it is revolutionary in its memory-based distributed architecture and its application programming model. Spark was designed to address the limitations of MapReduce systems like Hadoop, which provide easy-to-use programming models but are inefficient at data access, especially for iterative and interactive jobs that reread the same data from disk on every pass. With Spark, you get an easy-to-use programming model, more efficiency, and built-in interactivity. I have examples of using Spark in my last book, Power Java. You can experiment with Spark on your laptop and only worry about accessing a cluster when you need to scale.
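To make "MapReduce-style operations" a bit more concrete, here is a minimal word-count sketch in plain Python. It is not Spark code and runs in a single process; the helper names `flat_map`, `reduce_by_key`, and `word_count` are hypothetical, chosen only to mirror the shape of Spark's RDD operations `flatMap`, `map`, and `reduceByKey`. The PySpark equivalent shown in the comment is a rough approximation, assuming a `SparkContext` named `sc` and a hypothetical input `path`.

```python
from collections import defaultdict
from functools import reduce

# Plain-Python sketch of the map/reduce word-count pattern (NOT Spark).
# In PySpark the same pipeline would look roughly like this, assuming a
# SparkContext `sc` and a hypothetical input file `path`:
#   sc.textFile(path).flatMap(lambda l: l.split()) \
#     .map(lambda w: (w, 1)).reduceByKey(lambda a, b: a + b)


def flat_map(func, items):
    """Apply func to each item and flatten the resulting lists into one list."""
    return [out for item in items for out in func(item)]


def reduce_by_key(func, pairs):
    """Group (key, value) pairs by key, then combine each group's values with func."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return {key: reduce(func, values) for key, values in grouped.items()}


def word_count(lines):
    """Count words across lines using map and reduce steps."""
    words = flat_map(lambda line: line.lower().split(), lines)  # map: split lines into words
    pairs = [(word, 1) for word in words]                      # map: emit (word, 1) pairs
    return reduce_by_key(lambda a, b: a + b, pairs)            # reduce: sum counts per word


if __name__ == "__main__":
    sample = ["Spark builds on Hadoop", "Spark keeps data in memory"]
    print(word_count(sample))
```

The point of the sketch is the programming model: you describe per-record transformations and a per-key combine step, and a framework like Spark is responsible for distributing that work across a cluster and keeping intermediate data in memory.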
