Friday, August 02, 2013

Semantic web and linked data are a form of agile development

I am working on a book on building intelligent systems in JavaScript (1) and I just wrote part of the introduction to the section on semantic web technologies:

In order to understand and process data we must understand the context in which it was created and used. We looked at document oriented data storage in the last chapter. To a large degree documents are an easier source of knowledge to use because they contain some of their own context, especially if they use a specific data schema. In general data items are smaller than documents and some external context is required to make sense of different data sources. The semantic web and linked data technologies provide a way to specify a context for understanding what is in data sources and the associations between data in different sources.

The thing that I like best about semantic web technologies is the support for exploiting data that was originally developed by small teams for specific projects. I view the process as a bottom up process with no need to plan complex schemas and plans for future use of data. Projects can start by solving a specific problem and then the usefulness of data can increase with future reuse. Good software developers learn early in their careers that design and implementation need to be done incrementally. As we develop systems, we get a better understanding of how our systems will be used and how to build them. I believe that this agile software development philosophy can be extended to data science: semantic web and linked data technologies facilitate agile development, allowing us to learn and modify our plans when building data sets and systems that use them.

(1) I prefer other languages like Clojure, Ruby, and Common Lisp, but JavaScript is a practical language for both server and client side development. I use a small subset of JavaScript and have JSLint running in the background while I work: “good enough” with the advantages of ubiquity and good run time performance.

3rd edition of my book just released: “Loving Common Lisp, or the Savvy Programmer’s Secret Weapon”

"Loving Common Lisp, or the Savvy Programmer’s Secret Weapon"

The github repo for the code is here.


Easy setup for A/B Testing with nginx, Clojure + Compojure

Actually, I figured out the following directions for my Clojure + Compojure web apps, but as long as you are using nginx, this would work for Node.js, Rails, Sinatra, etc.

The first thing you need to do is to make two copies of whatever web app you want to perform A/B Testing on, and get two Google Analytics user account tokens _uacct (i.e., the string beginning with “UA-”) tokens, one for each version. I usually use Hiccup, but for adding the Google Analytics Javascript code, I just add it as a string to the common layout file header like (reformatted to fit this page width by adding line breaks):
    [:head [:title "..."]
     (include-css "/css/bootstrap.css")
     (include-css "/css/mark.css")
     (include-css "/css/bootstrap-responsive.css")
The next step is to configure nginx to split requests (hopefully equally!) between both instances of you web app. In the following example, I am assuming that I am running the A test on port 6070 and the B test on port 6072. I modified my nginx.conf file to look like:
upstream backend {

  server {
     listen 80;
     location / {
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://backend;
    error_page 500 502 503 504  /error.html;
    location = /error.html {
        root  /etc/nginx;
The ip_hash directive is supposed to evenly split requests by requesting IP address. This means that if a user hits your web app from their home and then again at their local coffee shop that the might see both A and B versions of your web app. Other options would be to use a per user device cookie, etc., but I think that randomly assigning version A or B based on a hash of the requesting IP address is sufficient for my needs.

I am just starting to use this scheme for A/B Testing, but it seems to work as expected. I do suggest that when your clone you web app that you keep versions A and B identical for a few days and check the Google Analytics for both account tokens to make sure the statistics for page views, times on pages, etc. are close to being the same for A and B.

After more testing, Google Analytics shows that the nginx ip_hash directive seems to split traffic near perfectly 50% to each A and B versions of my web site.

The 4th edition of my book “Practical Artificial Intelligence Programming with Java” is now available

Buy a copy at Leanpub!

The recommended price is $6 and the minimum price is $3. This includes PDF, Kindle, and iPad/iPhone formats and free updates as I fix any errors and update the book with new material. You may want to look at the github repository for the book example code before purchasing this book to make sure that the material covered in my book will be of interest to you. I will probably update the book in a few weeks after getting feedback from early readers. I am also still working on Clojure and JRuby wrapper for the Java code examples and as I update the code I will frequently push changes to the github repository for the example code.

My version of Ember.js ‘Get Excited Video’ code with Sinatra based service

The Ember.js Get Excited Video has source code for the example in the video but the test data is local on the client side. I forked this project and added a Ruby Sinatra REST service. My version is here. This uses Ember.js version 1.0 RC4. It took me some time get the example code working with a REST service so I hope my forked example will save you some time and effort.

Rest service and client in DART

The DART language looks like a promising way to write rich clients in a high level language. I have been looking at DART and the Ruby to JavaScript compiler Opal as possible (but not likely) substitutes for Clojurescript with Clojure back ends. It took me a little while to get a simple REST service and client working in development mode inside the DART IDE. The following code snippets might save you some time. Here is a simple service that returns some JSON data:
import 'dart:io';
import 'dart:json' as JSON;

main() {
  var port = 8080;
  HttpServer.bind('localhost', port).then((HttpServer server) {
    print('Server started on port: ${port}');
    server.listen((HttpRequest request) {
      var resp = JSON.stringify({
        'name': 'Mark',
        'hobby': 'hiking'}
It is required to set Access-Control-Allow-Origin. Here is the client code (I am not showing the HTML stub that loads the client):
import 'dart:html';
import 'dart:json';

void main() {
  // call the web server asynchronously
  var request = HttpRequest.getString("http://localhost:8080/")

void onDataLoaded(String responseText) {
  var jsonString = responseText;
  Map map = parse(jsonString);
  var name = map["name"];
  var hobby = map["hobby"];
  query("#sample_text_id").text =
      "My name is $name and my hobby is $hobby";
The call to query(…) is similar to a jQuery call. As you might guess, “#sample_text_id” refers to a DOM element from the HTML page with this ID. DART on the client side seems to be very well supported both with components and tooling. I think that DART on the server side is still a work in progress but looks very promising.

Thursday, March 14, 2013

Small example app using Ember.js and Node.js

I have been playing with Ember.js, and generally trying to get a little at better programming in Javascript, a language I have used for years, but for which I am still a novice. I wrote the other day about Small example app using Ember.js and Clojure + Compojure + Noir and I thought I would try replacing the simple Clojure REST backend with an equally simple Node.js backend. The results of this simple exercise are in the github repo emberjs-nodejs. I leave it to you to take a look if you are interested.

I will say that development with Javascript, Ember.js, and Node.js seems very light weight and agile, even though I use IntelliJ for editing and project management. Starting an app takes maybe a second. Compared to, for example, Java + GWT, or even Clojure + Compojure, I find Javascript, Ember.js, and Node.js to be really a light and fun combination. It would be even more fun if I were a better Javascript programmer :-)

Tuesday, March 12, 2013

More Clojure deployment options

I have been very happy running multiple Clojure web apps using the embedded Jetty server using lein trampoline on a large VPS. I start each app on a different port and use nginx to map to each app to its own domain name. Easy and this lets me also adjust the JVM memory individually for each application. This works so well for me that I almost feel guilty trying alternatives :-)

I don't know if I will permanently change my deployment strategy but I am experimenting using Immutant which is a Clojure deployment platform built on top of JBoss AS 7. After installing the lein Immutant plugin and the latest version of Immutant, then you can run the JBoss AS/Immutant using lein, and separately deploy and un-deploy web applications using lein. Pretty slick, but I am still trying to get a grip on interactive development by connecting nREPL (documentation: Interactive development). My usual style of interactive development is pretty nice (I use IntelliJ to edit, keep the web app running with lein run with live loading of changes to Clojure code (including Hiccup, which is my favorite ""markup""), CSS, etc. I am going to give Immutant with nREPL interactive development a long and patient try - not sure what I will be using a month from now: probably not because I prefer stacks that are less complicated. My days of loving heavy weight J2EE apps are over :-)

BTW, I would bet that many of us have suffered a little using a do-it-yourself custom cartridge on Red Hat's OpenShift PaaS. It works, but at least for me, it was painful even with a lot of useful material on the web describing how to do it. This Immuntant based example looks more promising.

I have used Heroku in the past for Clojure web apps but since I usually have 3 or 4 distinct web apps deployed the cost of about $35 each is much more than putting them all on a large memory multiple core VPS. I very much wish that Heroku had a slightly less expensive paid 1 dyno plan that never gets swapped out when idle, causing a loading request delay.

I haven't yet tried Jelastic hosting. Their minimum paid plan with 128MB RAM (that I assume would always be active, so no loading request delays) is about $15/month which sounds fair enough. They deserve a try in the near future :-)

Another option that I have used for both one of my Clojure web apps and a low traffic web app for a customer is to pre-pay for a micro EC2 instance for a monthly cost of about $5. My only problem with this is that EC2 instances are not so reliable, and I feel like I am better off with a quality VPS hosting company. BTW, if you run a Clojure web app using the embedded Jetty server on a micro EC2, be sure you run it using "nice" to lower its priority and avoid AWS drastically reducing your resources because you use too much CPU time for too long of a period; I find much better continuous performance using "nice" - go figure that one out!

One AWS option I haven't tried yet is using lein-beanstalk which looks good from the documentation. Elastic Load Balancing on AWS costs about $18/month which drives up the cost of a Elastic Beanstalk deployment of a low traffic web app, but I think it does offer resilience to failing EC2s. You are limited to one web app per Elastic Beanstalk deployment, so this is really only a good option for a high traffic app.

A few years ago I also used appengine-magic for hosting on GAE but it bothers me that apps are not easily portable to other deployment platforms, especially converting datastore code. This is too bad because when Google raised prices and brought AppEngine out of beta, that made it more attractive to me, even with some developers complaining of large costs increases. Still, for my desire for a robust and inexpensive hosting for low or medium traffic web sites, AppEngine is in the running by simply setting the minimum number of idle instances to 1 and the maximum number of instances to 1: that should handle modest traffic for about $10/month with (hopefully) no loading request delays.

Edit: someone at my VPS hosting company (RimuHosting) just told me that as long as I set up all my apps, nginx, etc. to automatically start when a system is rebooted, then I probably have no worries: on any hardware problems they restart from a backup on a new physical server. I do occasionally try a soft-reboot of my VPS just to make sure everything gets started properly. I thought that RimuHosting would do this, but I asked to make sure.

Edit: the New York Times has a good article on the big business of offering cloud services that is worth reading:

Edit: 2013-03-14: Jim Crossley set up a demo project that exercises using the common services for Immuntant/JBoss, and wraps project source and resource files for live loading of changes to Clojure code and resources: Immutant demo.

Monday, March 11, 2013

Small example app using Ember.js and Clojure + Compojure + Noir

I use Clojure with Compojure and Noir for (almost) all of my web apps and lately I have also been experimenting with Ember.js. After buying and reading Marc Bodmer's book Instant Ember.js Application Development How-to yesterday I decided to make a very small template application using Ember.js for the UI and a trivial back end REST service written in Clojure. I used Marc's Ember.js setup and it worked well for me.

The github repo for my small template project is emberjs-clj

Please note that this example is a trivial Ember.js application (about 50 lines of code) and is intended just to show how to make a REST call from an Ember.js front end app, how to implement the REST service in Clojure, and not much else. I wanted a copy and paste type template project to use for starting "real projects."

You can grab the repo from github, or if you just want to see the interface between the UI and back end service, here is the code run by the Javascript UI:

RecipeTracker.GetRecipeItems = function() {
    url: '/recipes/',
    dataType: 'json',
    success : function(data) {
      for (var i = 0, len = data.length; i < len; i++) {
            title: data[i]['title'],
            directions: data[i]['directions'],
            ingredients: data[i]['ingredients']
    } });
and here is the Clojure code for returning some canned data:
(def data [
       {"title" "Gazpacho",

(defn recipes-helper []
  (json/write-str data))

(defpage "/recipes/" [] (recipes-helper ))
Hopefully my demo project will save you some effort if you want to use Ember.js with a Clojure back end.

Edit 2013-03-13: updated the example on github to Ember.js 1.0 RC1, cirrectling some breaking API changes.

Saturday, March 09, 2013

Google Research's wiki-links data set

wiki-links was created using Google's web crawl and looking for back links to Wikipedia articles. The complete data set less than 2 gigabytes in size, so this playing with the data is "laptop friendly."

The data looks like:

MENTION vacuum tubes 10838
MENTION electron gun 598
MENTION oscilloscope 1307
MENTION radar        1657
One possible use for this data might be to compare two (possibly multiple word) terms by looking up their Wikipedia pages, remove the stop (noise words) from both pages, and calculate a similarity based on "bag of words", etc. Looks like a great resource!

Another great data set from Google for people interested in NLP (natural language processing) is the Google ngram data set that has ngram sets for "n" in the range [1,5]. This data set is huge and not "laptop friendly" so last year I leased very large memory server from for a few months while I used the ngram data sets. I wish that I still had this data online but the cost of the server eventually became greater than the value of ready access to the data. The next time I need it I am planning on configuring a large memory EC2 instance with enough EBS storage for the data, indices, and application specific stuff - then I can stop the large memory instance when I don't need the data online which is probably 99% of the time: most of the costs will just be for the EBS storage itself, and not the (approximately) $0.50/hour when I keep the instance running.

Edit: I just did the math: renting a Hetzner server turns out to be much less expensive than using an EC2 instance that is usually spun down because 1 terabyte of EBS storage is $100/month (almost double what a Hetzner server costs).

Monday, February 25, 2013

Building custom data stores

Creating a custom datastore may seem like a bad idea when such great tools like Postgres, MongoDB, CouchDB, etc. are available in their open source goodness as well as good commercial products such as Datomic, AllegroGraph, Stardog, etc. Still, frustration of not having just what I needed for a project (more on requirements later) convinced me to spend some time building my own datastore based on some available open source libraries.

Much of the motivation for my work developing is to make possible the development of a larger turnkey information appliance. I have been using MongoDB for this, but even with an application specific wrapper MongoDB has been a little awkward for my requirements, which are:

  • I want a reasonably efficient document store that supports the usual CRUD operations on arbitrary Clojure maps (which can be nested to any depth). Clojure maps are basically what I use to contain and use data so I wanted a datastore that supports this, simply.
  • I want all text in documents (embedded at any depth in the document) to be searchable.
  • I need to be able to annotate data stored documents and sometimes relationships between documents.
  • My preferred notation for annotating data is RDF
  • I need to be able to efficiently perform SPARQL queries on the RDF annotations.
  • Coupling between documents and RDF: auto delete of any triples referencing a document ID, if the referenced document is deleted.

Initially I was going to write a wrapper library using two datastores as SaaS products: Cloudant (for CouchDB with Lucene indexing) and (for a RDF datastore, with extras). A small wrapper API would have made this all work but since a lot of what I am doing is in the experimenting phase I decided that I didn't want to use remote web services for coding experiments. Using these services, with a wrapper would be nice for production, but not for hacking.

Anyway, I have built a small project that uses HSQLDB (relational database) and Sesame (RDF :

EDIT: Patrick Logan asked about my use of HSQLDB; not specific to HSQLDB really, but here is the important code (hand edited to try to get it to fit on this web page) for adding documents that are nested maps, indexing them, and searching (note: I usually use Clucy/Lucene for search in Clojure code, but for what I am doing right now, this suffices):

(defn index-if-str [x id]
  (if (= (class x) java.lang.String)
    (sql/with-connection hsql-db
      (doseq [token (map (fn [s] (.toLowerCase s))
                     (clojure.string/split x #"[ ;.,]()"))]
        (if token
          (sql/insert-record "search" {:doc_id id :word token}))))))

(defn insert-doc [map]
  (let [id
        (:id (sql/with-connection hsql-db
                 "docs" {:json (json/write-str map)})))]
    (postwalk (fn [x] (index-if-str x id)) map)

;; (insert-doc {:foo "bar" :i 101 :name "sue jones"})

(defn search [s]
    (let [indices
            (let [tokens
                  (apply str (interpose ", "
                     (map (fn [s] (str "'" (.toLowerCase s) "'"))
                       (clojure.string/split s #"[ ;.,]()"))))]
              (sql/with-connection hsql-db
                 (sql/with-query-results results
                    [(str "select * from search where word in (" tokens ")")]
                    (into [] results)))))]
      (sort (fn [a b] (compare (second b) (second a))) (into [] (frequencies indices))))))

Friday, February 15, 2013

Using the Microsoft Translation APIs from Java, Clojure, and JRuby

I wrote last July about my small bit of code on github that wrapped the Microsoft Bing Search APIs. I recently extended this to also wrap the Translation APIs using the open source project microsoft-translator-java-api project on Google Code. I just provide a little wrapper for the microsoft-translator-java-api project and if you are working in Java you should just use their library directly.

Hopefully this will save you some time if you need to use the translation services. The free tier for the translation services is currently 2 million characters translated per month.

Saturday, February 02, 2013

Goodness of micro frameworks and libraries

I spent 10+ years using large frameworks, mainly J2EE and Ruby on Rails. A large framework is a community and set of tools that really frames our working lives. I have received lots of value and success from J2EE and Rails but in the last few years I have grown to prefer micro frameworks line Sinatra (Ruby) and Compojure + Noir + Hiccup (Clojure).

Practitioners who have mastered one of the larger frameworks like Rails, J2EE, Spring, etc. can sometimes impressively and quickly prototype and then build large functioning systems. I had an odd thought this morning, and the more I mull it over, the more it makes sense to me: large frameworks seem to be optimized for consultants and consulting companies for the quick kill: get in, build the most impressive system possible with the minimum resources, and leave after finishing a successful project. This is an oversimplification, but seems to be true in many cases.

The flip side to the initial productivity of large frameworks is a very real "tax" in long term maintenance because there are so many interrelated components in a system - components that might not be used or weakly used.

Micro frameworks are designed to do one or just a few things well and other third party libraires and plugins need to be chosen and integrated. This does take extra time but then the overall codebase with dependencies is smaller and focussed (mostly) on just what is required. I view using micro frameworks and libraries only, and composing systems into distinct services as a longer term strategy to reduce resource costs for systems long term.

I still invest a fair amount of time tracking larger frameworks, recently being especially interested in what is available in Rails 4 and the latest SmartGWT (a nice framework for writing both web client and server side code in Java - lots of functionality, but not as great for quick agile development in my opinion).

Monday, January 28, 2013

A little weird, but interesting: using IntelliJ LeClojure's debugger with Clojure code

I have been working on a web app using Compojure, Noir, and Hiccup. I usually run both lein repl and lein run concurrently and use either IntelliJ or Aquamacs (Mac Emacs) as a code editor. If I am working on a small bit of new code and experimenting a lot, then Aquamacs + nrepl works well for me. If I have my MacBook Air plugged into a huge monitor sometimes I run lein run and a LeClojure repl inside IntelliJ and detach a few edit panes as separate windows - so many nice choices for development!

I don't really like debuggers because getting into the habit of using them can end up wasting a lot of time that could be spent writing unit tests, etc. That said, I was looking at some Ruby code I wrote for a customer but have not deployed and it occurred to me to try the RubyMine debugger which worked very well and generally didn't get in my way or waste too much time manually stepping through some code.

So, I decided to spend a little time trying IntelliJ LeClojure's debugger on my Clojure project:

I wrote a little stub file to run as the debug target "script":

(ns cookingspace)
(use 'cookingspace)
I placed this file in the top level project directory so the relative path for loading required data files is set up OK. Also, I had to change the default run/debug option to not make the project before running - this is necessary since I have top level expressions that initialize data from reading data files, and the Clojure compiler is not run from the project directory so I would get build errors. You probably will not have that issue with your projects.

One thing that is cool about running the app in the debugger is that the LeClojure plugin also starts a repl (see the second tab at the bottom to the screenshot) that shows all debug println print outs and provides a repl without having to start a separate process.

I don't think that I will use the debugger very often (again, using a debugger to understand and debug code is usually not the best use of time) but it is cool to have another tool. That said, I might find that always running in the debugger and only setting breakpoints on the rare occasions that I need to better understand what some code is doing might work well. I will try this mode of development for a few hours a week as an experiment.

Friday, January 18, 2013

Cooking functionally

I am not actually using functional programming in the kitchen :-)

I am re-writing my old web app in Clojure with a new twist: the new web app will be an AI agent for planning recipes based on the food you have on-hand and based on a user's history, preferences, and the general mood they are in (food-wise). I have written some tools to convert my old relational database code for (more or less) static data (USDA nutrition information, recipes) to Clojure literal data.

Based on the type of food a user feels like cooking and what food they have on-hand, I need to transform recipes in an intelligent way to (hopefully!) also taste great after ingredients have substituted or morphed because a user would prefer a different type of sauce over a base recipe, etc., etc.

Anyway, working in Lisp and experimenting in a repl is making the entire process easier.

My wife and I are both very good cooks and we can make up super-tasty recipes on the fly with whatever ingredients we have on hand. I am betting on two things: I can automate our expertise and that people will enjoy using the system when it is finished.

Wednesday, January 16, 2013

Faceted search: one take on Facebook's new Graph Search vs. other search services

Faceted search is search where the domain being search is filtered by categories or some taxonomy. Individuals become first class objects in Facebook's new Graph Search and (apparently) search is relative to a node in their social graph that represents the Facebook user, other users they are connected with, and data for connected users.

I don't yet have access to Facebook's new Graph Search but I have no reason to doubt that as it evolves both Facebook users and Facebook's customers (i.e., advertisers and other organizations that make money from user data) should be happy with the service.

Google's Knowledge Graph and their search in general are also personalized per user. Once again this is made possible by collecting data on users and monetizing by providing this information to their customers (i.e., once more, advertisers, etc.)

Pardon a plug for the Evernote service (I am a happy paying customer): Evernote serves as private search. Throw everything relavant to my digital life into Evernote, and I can later search "just my stuff." I don't doubt that Evernote somehow also makes money by aggregating user information.

I assume that any 3rd party web service I use is somehow monetizing on data about me. I decide to use 3rd services more for the value they provide since my cynical self assumes the worse about privacy.

Dealing with faceted search and graph databases at Facebook and Google scale is an engineering art form. Fortunately for the rest of us, frameworks/libraries like Solr (faceted search) and Neo4J (a very easy to use graph database) make it straight forward to experiment with and use the same technologies, but admittedly without the advantage of very large data stores.

Sunday, January 13, 2013

The Three Things

Since I (mostly) shut down my consulting business (which I have enjoyed for 15 years) in favor of developing my own business ideas I find that I am working more hours a week now. When consulting the breakdown of my time spent per week was roughly 15 hours consulting, 8 hours writing, 5 hours reading technical material (blogs, books, and papers), 5 hours programming on side projects, and many hours of "me time" (hiking, cooking, reading fiction, meditation, and hanging out with my wife and our friends).

Wondering what "The Three Things" are? Hang on, I will get to that.

I now find myself spending 40+ hours a week on business development. Funny thing is, I still am spending a lot of time on the same activities like hiking, hanging out, reading, and side projects. I feel like a magician who has created extra time. Now for The Three Things:

I have always known this but things have become more clear to me:

  1. Spend time on what energizes you. If work drags you down and robs you of energy then you are probably doing the wrong thing. Whenever you can, engage in activities you love doing rather than maximizing how much money you make.
  2. Continually try to improve yourself by choosing activities for both your personal growth and that have a positive effect on other people.
  3. Even in the face of disappointments in life try to live in a constant state of gratitude for loved ones, skills, opportunities, and the adventure of not really knowing what comes next in life.

Friday, January 11, 2013

Ray Kurzweil On Future AI Project At Google

Here is a good 11 minute interview with Ray Kurzweil.

In the past Google has been fairly open with publishing details of how their infrastructure works (e.g., map reduce, google file system, etc.) so I am hopeful that the work of Ray Kurzweil, Peter Norvig, and their colleagues will be published, sooner rather than later.

Kurzweil talks in the video about how the neocortex builds hierarchical models of the world through experience and he pioneered the use of Hierarchical hidden Markov Models. It is beyond my own ability to judge if HHMMs are better than the type of hierarchical models formed in deep neural networks, as discussed a lot by Geoffrey Hinton in his class "Neural Networks for Machine Learning." In this video and in Kurzweil's recent Authors at Google talk he also discusses IBM's Watson project and how it is capable of capturing semantic information from articles it reads; humans do a better job at getting information from a single article, but as Kurzweil says, IBM Watson can read every Wikipedia article - something that we can not do.

As an old Lisp Hacker it fascinates me that Google does not use Lisp languages for AI since languages like Common Lisp and Clojure are my go-to languages for coding "difficult problems" (otherwise just use Java <grin>). I first met Peter Norvig at the Lisp Users & Vendors Conference (LUV) in San Diego in 1992. His fantastic book "Paradigms of Artificial Intelligence Programming: Case Studies in Common Lisp" had just been published in 1991, as had my much less good Springer-Verlag book "Common LISP Modules: Artificial Intelligence in the Era of Neural Networks and Chaos Theory." Anyway, it is not for me to tell companies what programming languages to use :-)

I thought that one of the most interesting parts of the linked video was Kurzweil's mention of how he sees real AI (i.e., being able to understand natural language) will fit into Google's products.

While individuals like myself and small companies don't have the infrastructure and data resources that Google has, if you are interested in "real AI", deep neural networks, etc. I believe that it is still possible to perform useful (or at least interesting) experiments with smaller data sets. I usually use a dump of all Wikipedia articles, without the comments and edit history. Last year I processed Wikipedia with both my KBSPortal NLP software (which, incidentally, I am hoping to ship a major new release in about two months) and the excellent OpenCalais web services. These experiments only took some patience and a leased Hetzner quad i7 32GB server. As I have time, I would also like to experiment with a deep neural network language model as discussed by Geoffrey Hinton in his class.