Friday, August 02, 2013

Semantic web and linked data are a form of agile development

I am working on a book on building intelligent systems in JavaScript (1) and I just wrote part of the introduction to the section on semantic web technologies:

In order to understand and process data we must understand the context in which it was created and used. We looked at document oriented data storage in the last chapter. To a large degree documents are an easier source of knowledge to use because they contain some of their own context, especially if they use a specific data schema. In general data items are smaller than documents and some external context is required to make sense of different data sources. The semantic web and linked data technologies provide a way to specify a context for understanding what is in data sources and the associations between data in different sources.

The thing that I like best about semantic web technologies is the support for exploiting data that was originally developed by small teams for specific projects. I view the process as a bottom up process with no need to plan complex schemas and plans for future use of data. Projects can start by solving a specific problem and then the usefulness of data can increase with future reuse. Good software developers learn early in their careers that design and implementation need to be done incrementally. As we develop systems, we get a better understanding of how our systems will be used and how to build them. I believe that this agile software development philosophy can be extended to data science: semantic web and linked data technologies facilitate agile development, allowing us to learn and modify our plans when building data sets and systems that use them.

(1) I prefer other languages like Clojure, Ruby, and Common Lisp, but JavaScript is a practical language for both server and client side development. I use a small subset of JavaScript and have JSLint running in the background while I work: “good enough” with the advantages of ubiquity and good run time performance.

3rd edition of my book just released: “Loving Common Lisp, or the Savvy Programmer’s Secret Weapon”

"Loving Common Lisp, or the Savvy Programmer’s Secret Weapon"

The github repo for the code is here.


Easy setup for A/B Testing with nginx, Clojure + Compojure

Actually, I figured out the following directions for my Clojure + Compojure web apps, but as long as you are using nginx, this would work for Node.js, Rails, Sinatra, etc.

The first thing you need to do is to make two copies of whatever web app you want to perform A/B Testing on, and get two Google Analytics user account tokens _uacct (i.e., the string beginning with “UA-”) tokens, one for each version. I usually use Hiccup, but for adding the Google Analytics Javascript code, I just add it as a string to the common layout file header like (reformatted to fit this page width by adding line breaks):
    [:head [:title "..."]
     (include-css "/css/bootstrap.css")
     (include-css "/css/mark.css")
     (include-css "/css/bootstrap-responsive.css")
The next step is to configure nginx to split requests (hopefully equally!) between both instances of you web app. In the following example, I am assuming that I am running the A test on port 6070 and the B test on port 6072. I modified my nginx.conf file to look like:
upstream backend {

  server {
     listen 80;
     location / {
        proxy_redirect off;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        proxy_pass http://backend;
    error_page 500 502 503 504  /error.html;
    location = /error.html {
        root  /etc/nginx;
The ip_hash directive is supposed to evenly split requests by requesting IP address. This means that if a user hits your web app from their home and then again at their local coffee shop that the might see both A and B versions of your web app. Other options would be to use a per user device cookie, etc., but I think that randomly assigning version A or B based on a hash of the requesting IP address is sufficient for my needs.

I am just starting to use this scheme for A/B Testing, but it seems to work as expected. I do suggest that when your clone you web app that you keep versions A and B identical for a few days and check the Google Analytics for both account tokens to make sure the statistics for page views, times on pages, etc. are close to being the same for A and B.

After more testing, Google Analytics shows that the nginx ip_hash directive seems to split traffic near perfectly 50% to each A and B versions of my web site.

The 4th edition of my book “Practical Artificial Intelligence Programming with Java” is now available

Buy a copy at Leanpub!

The recommended price is $6 and the minimum price is $3. This includes PDF, Kindle, and iPad/iPhone formats and free updates as I fix any errors and update the book with new material. You may want to look at the github repository for the book example code before purchasing this book to make sure that the material covered in my book will be of interest to you. I will probably update the book in a few weeks after getting feedback from early readers. I am also still working on Clojure and JRuby wrapper for the Java code examples and as I update the code I will frequently push changes to the github repository for the example code.

My version of Ember.js ‘Get Excited Video’ code with Sinatra based service

The Ember.js Get Excited Video has source code for the example in the video but the test data is local on the client side. I forked this project and added a Ruby Sinatra REST service. My version is here. This uses Ember.js version 1.0 RC4. It took me some time get the example code working with a REST service so I hope my forked example will save you some time and effort.

Rest service and client in DART

The DART language looks like a promising way to write rich clients in a high level language. I have been looking at DART and the Ruby to JavaScript compiler Opal as possible (but not likely) substitutes for Clojurescript with Clojure back ends. It took me a little while to get a simple REST service and client working in development mode inside the DART IDE. The following code snippets might save you some time. Here is a simple service that returns some JSON data:
import 'dart:io';
import 'dart:json' as JSON;

main() {
  var port = 8080;
  HttpServer.bind('localhost', port).then((HttpServer server) {
    print('Server started on port: ${port}');
    server.listen((HttpRequest request) {
      var resp = JSON.stringify({
        'name': 'Mark',
        'hobby': 'hiking'}
It is required to set Access-Control-Allow-Origin. Here is the client code (I am not showing the HTML stub that loads the client):
import 'dart:html';
import 'dart:json';

void main() {
  // call the web server asynchronously
  var request = HttpRequest.getString("http://localhost:8080/")

void onDataLoaded(String responseText) {
  var jsonString = responseText;
  Map map = parse(jsonString);
  var name = map["name"];
  var hobby = map["hobby"];
  query("#sample_text_id").text =
      "My name is $name and my hobby is $hobby";
The call to query(…) is similar to a jQuery call. As you might guess, “#sample_text_id” refers to a DOM element from the HTML page with this ID. DART on the client side seems to be very well supported both with components and tooling. I think that DART on the server side is still a work in progress but looks very promising.