Showing posts from 2010

Nice: Neo4j version 1.2 final released

Neo4j is a high performance graph database that I usually use with the JRuby neo4j gem and sometimes with Clojure ( documentation here ). Neo4j is open source (AGPL v3) and is alternatively available for a reasonable fee with a commercial license (where you don't need to AGPL your project). I took advantage of the free offer to get your first Commercial Basic Server license, even though I am likely to open source my project anyway.

Good SimpleDB performance tips

I don't usually write blogs that just reference other people's material, but this three part article by Siddharth Anand ( first installment , then follow links to other articles) is really worth reading if you use SimpleDB. BTW, while the local SimpleDB simulator simpledb-dev works OK, for development I usually just access SimpleDB remotely from my laptop. One word of warning: while properly constructed SimpleDB queries can run very quickly from EC2 hosted applications, remote access tends to be 5 to 10 times slower.

WebServius wrapper for web services and data services

WebServius is a wrapper that provides for your web service APIs and data sources. WebServius handles billing, tools for your customers to use to monitor their use of your services, logging and use statistics, etc. They charge you 10% of what is billed through their service. When I have a chance I will try the free version of this service (limited number of API calls per day and access to end users must be free) and write up the experience. I noticed WebServius on Amazon's blog but WebServius is an independent company that uses the Amazon AWS platform.

Control of news media == ability to set public opinion

In the late 1800s Western Union was able to alter the results of a presidential election (reference: The Master Switch: The Rise and Fall of Information Empires - recommended!). A similar situation exists today in the USA: all major news sources provide a very slanted pro-corporate agenda that results in a large percentage of the population simply not understanding things like the benefits of a social safety net (a trade of tax money to keep society civil and safer), how we incarcerate a much larger percentage of our population than other countries, that Wikileaks has released a very small percentage of the diplomatic cables and those cables that have been released have been redacted to minimize the chance of putting people in danger, that we spend as much on our military as the rest of the world combined, how military spending enriches only a very small number of powerful people, the level of corruption in Congress (both Democrats and Republicans), etc. It is sad that better sour

Christmas came early: my Google TV arrived at 7pm tonight

Setting it up took 45 minutes because it immediately downloaded a new version of the OS when I set up the wireless Internet connection. I use Directv and it synced up with my DVR and Sony TV with no problems. The keyboard is nice, with a trackpad and mouse button in the top right corner. It is much nicer using the keyboard rather than the remote for Directv guide and DVR control. I have experimented with writing Android cellphone apps with the SDK and now I want to try some HTML 5 apps for both the Android cellphones and Google TV. Fun! I had a consulting job two years ago writing some Java blu-ray example apps and the development environment for that was, frankly, painful. Both Android and Google TV seem much more developer-friendly. You personalize Google TV by logging into your Google/GMail/Apps account. I had to enter my account information multiple times for and Picasa photo albums. There are some wrinkles to iron out but the platform has a lot of promise. We watched p

Excellent product: RubyMine 3.0

I bought RubyMine when it was first released and recently paid for upgrading to version 3.0. I switch between using TextMate (or GEdit on Linux) and RubyMine for both Ruby and for Rails development but since getting version 3.0 I spend almost all of my time using RubyMine. Rails 3 support is very good and working with RVM is a nice new feature. I never used to use the Ruby debugger (at all!) but I have used it twice briefly in version 3.0. At least for Rails 3 development I still separately run both rails server and rails console outside of the IDE - a matter of personal preference. I was using PyCharm (also by JetBrains) a lot for 2 weeks and the autocompletion hints and instant syntax warnings really helped me because my Python and Django skills are light-weight. I don't much need autocompletion hints and instant syntax warnings for Ruby and Rails development but it is unobtrusive and often useful. The HTML, JavaScript, Erb, and Haml support seems better also. One feature

Getting the Dojo 1.5 rich text editor dijit.Editor working nicely with Rails 3

I could not find much on the web and it took me a little time to get everything working (especially saving edited content) so I'll share some code, hopefully to save you some time. In app/views/layouts/application.html.erb I added some boilerplate (with non-editor stuff not shown) to be included with each rendered page: <head> <link rel="stylesheet" type="text/css" href=""> <link rel="stylesheet" type="text/css" href= ""> <script type="text/javascript" src="" djConfig="parseOnLoad: true"> </script> <script type="text/javascript"> dojo.require("dijit.layout.BorderContainer");

Reading "The Rails 3 Way"

Obie had his publisher send me a review copy of his book The Rails 3 Way, which was just printed. It is a very good reference for Rails 3, and it really is a reference book meant to be consulted for specific problems. That said, I am reading it straight through because although I have a lot of Rails development experience, I would like to understand Rails at a deeper level. While Rails itself is "opinionated", Obie's book is even more so: he bases his business on Rails and the book reflects the way things are done in his company. The book uses Haml and RSpec exclusively. I still mostly use Test::Unit and Erb, and the book will probably not change that (but it might!). I found Paolo Perrotta's "Metaprogramming Ruby" to be a great resource for getting more into the low level details of Ruby and I expect that "The Rails 3 Way" will serve the same purpose for Rails.

Using cloud services when services like and Wave get cancelled

You can still access my public bookmarks if you want them. I will miss the service, as I miss Google Wave. I tried several cloud based to-do and getting things done style web sites but ended up writing my own web app, and I left the door open to other people by releasing the app as open source. I ran the online word processor for a few years but made sure to provide data export options. People and companies who provide free services have no real obligation to continue the services forever but they have a responsibility to give users a good exit strategy. I exported my bookmarks and I suggest you do the same.

Suggestions for Python SDK and AppEngine

Partly because I don't often code in Python, I have been feeling some pain using the Python AppEngine SDK. I thought that I would pass on a few things that have made the process easier for me: Use an IDE. I have been using PyCharm but other people have mentioned liking Eclipse with the Python and AppEngine plugins. Live hints when I get a local variable name wrong, suggestions to automatically import one of my own modules, and autocompletion have really helped. If I were more familiar with Python and the Python AppEngine SDK then this would not be as beneficial. Unit tests: I have been using to support unit testing and that has been helpful. I get all of the model code and controller helper functions working and tested before doing the UI. I find the default Python SDK template engine (from Django) to be adequate and fairly easy to use. I have found Mark C. Chu-Carroll's book Code in the Cloud, Programming Google AppEngine to be a useful tutorial.
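The habit of testing helper functions before touching the UI can be sketched with nothing more than the standard library. This is only an illustration: `normalize_tags` is a hypothetical helper invented for the example (it is not from my project), and the code uses current Python syntax rather than whatever the AppEngine runtime of the day supported.

```python
import unittest


def normalize_tags(raw):
    """Hypothetical controller helper: turn a comma-separated tag string
    into a sorted list of unique, lower-cased tags."""
    tags = {t.strip().lower() for t in raw.split(",") if t.strip()}
    return sorted(tags)


class NormalizeTagsTest(unittest.TestCase):
    def test_strips_and_lowercases(self):
        self.assertEqual(normalize_tags(" Lisp, ruby "), ["lisp", "ruby"])

    def test_drops_duplicates_and_blanks(self):
        self.assertEqual(normalize_tags("a,,A, a"), ["a"])


def run_tests():
    """Run the test case explicitly and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(NormalizeTagsTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)


if __name__ == "__main__":
    run_tests()
```

Because the helper takes and returns plain Python values, it can be tested without starting the development server or touching the datastore.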

Platforms and Infrastructure as service

Good news for the people at Heroku (Salesforce just paid 200+ million for the company). This is definitely a kick up for platforms as a service. I have long considered Heroku to be the best platform as a service offering because it is so developer friendly - very different from AppEngine which I would label as "scalability friendly." I had to recently make a decision between developing for Heroku or AppEngine for my new business ideas ( and ). Heroku and Ruby on Rails, along with all of the useful plugins and auxiliary services offered by Heroku, make the most agile web application development and deployment story right now. In contrast, developing for AppEngine is a pain in many ways. That said, I think that the new AppEngine SDK and services are a solid improvement and since I hope for more than small scale success the relatively inexpensive hosting costs and automatic scalability of AppEngine won me over (at least for these project

AppEngine SDK 1.4 release is likely a game changer

I use AppEngine to host my own projects but not for my customers (everyone who has hired me for a few years has wanted Amazon AWS deployments - no exceptions). I think that the story for using AppEngine is definitely better with the 1.4 release. A charge of $9/month for keeping 3 compute instances always spun up for an application seems like a modest cost to work around slow loading request times. (I have written twice about this: 1 and 2 .) Ten minute CPU limits for background processes are a nice increase over the old 30 second limits. I have recently been using the Python AppEngine SDK for my latest side project (something I need for my own use and I am also planning on offering it to other consultants for a small fee). Platform as a service: I have had the opportunity to help many individuals and companies in 14 years of running my own consulting business and I feel like I have a fairly clear understanding of the opportunity costs of manually managing servers. Small teams e

Python is not such a bad language

I have used Python off and on for about 10 years and have never particularly liked it. This is funny since Ruby is my favorite programming language and Ruby and Python are very similar both in features and in the types of applications that they are best used for. For me the largest shortcoming of Python is the lack of blocks. That said, I have two small side projects ( and ) that I wanted to host on AppEngine. I have a fair amount of experience with the Java AppEngine SDK but my gut instinct was to go with the Python SDK using the default webapp library. After spending about 5 hours writing code, I find that Python is fairly comfortable. 7/9/2011 edit: I enjoyed getting a basic system working with Python (a great learning experience) but ended up writing a feature complete version in Rails and using that for several months. In the last week or so I have re-implemented it in Java + GWT for the Appengine.

Must-have tool for understanding your web site and blog: Google Analytics

I use Google Analytics as a feedback mechanism for what people actually read on my blog and to know which pages on my web site people read (which I equate to which pages people find most useful). As an example, 50% of visitors go directly to my Open Content web page so this gives me a strong incentive to write more free web books and post the PDFs. As a consultant I interact with many developers and companies and it always surprises me when people don't bother to measure performance, in this case understanding what content people find interesting and/or useful. Speaking of measuring performance: Another must have tool if you do deployments: get a free 1-server account at . (Disclosure: if you end up eventually signing up for a non-free multiple server account then I get a small perk.) If one of your web applications starts to slow down it can save a lot of time being able to tell at a glance what the I/O, CPU, network, etc. activity has been in the last few da

Wonderful book: "Land of Lisp" - Conrad Barski is a great author and communicator

I have been enjoying Conrad Barski's web based tutorials for years and I recently received his new book Land of Lisp: Learn to Program in Lisp, One Game at a Time! in the mail. I have been writing Lisp code since the late 1970s and in ancient times I wrote two Lisp books for Springer Verlag. To be honest, I just bought the book for enjoyment but I find myself getting a new perspective and learning more about Common Lisp. Recommended!

Distributed NoSQL datastores: Cassandra and Cloudant's BigCouch

In my work for customers in recent years almost everything that I do uses PostgreSQL (sometimes PostGIS) and/or MongoDB. (I write a lot about the Semantic Web but so far no one has paid me to work on a project with an RDF data store like Sesame or AllegroGraph.) While I think PostgreSQL and MongoDB are great, their replication stories have not been great. MongoDB's master/slave and replica pairs work OK, and replica sets (MongoDB 1.6 and above) look to be a big improvement (it only takes a few minutes to try the MongoDB 1.6.x replica set tutorial example; follow the instructions .) I have not tried replica sets yet in a production environment but I am looking forward to it! I find MongoDB to be extremely developer friendly with convenient client libraries in Clojure and Ruby (I don't like dealing with JSON data and hashes in Java). PostgreSQL 9 replication is easier to set up and administer than Slony but I have not had to use it in production. The replication supports m

Good to be a programmer. Or: custom solutions are sometimes better

I signed up for a free Evernote account a year ago last April but never really used it. After reading an article in the New York Times this morning about sharing data across computers and handheld devices I decided to install the Evernote app on my Android cellphone. I also installed the web clipper Evernote Chrome browser extension. After setting up notebooks for major interests (writing tools, writing ideas, different technology areas), I spent some time using Evernote (OS X app, browser interface, and Droid app). Really nice. I thought of using Evernote for my to-do tasks but my own custom web app works better for me. This reminded me how great it is to be a programmer and make things just the way you want. Now, I must admit that is very simple (it has just the functionality that I want and nothing else). This project is open source and took me perhaps 6 hours to write and deploy on Heroku. It is really a great feeling to make something just for yourself (al

Is it better to spend time learning new programming languages or study languages and tools you already use?

I have been thinking about a discussion on Hacker News yesterday about which new programming language people want (or need) to learn. While I definitely enjoy learning new programming languages by writing small applications, I think that I personally get a better productivity boost by reviewing languages and tools I already use. In the last year I have mostly used Ruby and Clojure for my work. In the last few months I have read two books on Clojure and one on Ruby. Sort of: the more you know, the more you can learn and more deeply understand something. Recently I reviewed the commands and read another tutorial for the screen utility that I have been using almost every day for three years. Well worth the time. I do a lot of work on remote servers and emacs is not always installed so I have also used vi for years. This morning, I saw a reference to learning vim by using vimtutor and spent 20 minutes working through the complete tutor program (learn by doing: you edit the tutorial a

New AppEngine 1.4.0 features a game changer for JVM languages

As I wrote last April you can use Objectify in Java apps to reduce loading request times (avoiding JDO overhead) but running JRuby and Clojure applications on AppEngine has been hindered by still long loading times. The new feature forthcoming to AppEngine (SDK available, but server side support is not in place yet) of allowing paid for apps to keep three compute instances always running will open up the AppEngine platform for other languages and frameworks. On the development side I have had good JRuby and Sinatra experiences with the AppEngine SDK but have never wanted to deploy anything other than experiments - that will change now.

Publishing to the Kindle, Android, iPhone, and iPad

I just bought a Kindle and I like it more than I thought that I would. The screen is easy to read and it is much lighter than an iPad. I have a plan to add support for Kindle, Android, iPhone, and iPad file formats in my writing/publishing pipeline. Currently, my "development system" for writing is LaTeX with some custom Ruby scripts and a Makefile for each writing project. From LaTeX source files, I can currently generate: Lulu print books; PDFs for laptop viewing; HTML pages with automatically inserted Google AdSense ads; and HTML pages. I have been generating fairly good revenue from publishing my Open Content books and selling print copies (some generous people also pay for the PDF instead of using the free downloads). Because I am earning more money from my Open Content writing, I plan on devoting much more time to writing next year. I plan on always making the large format PDFs free for download, and selling print books and versions for hand-held devices.

Benchmarks: memory use can be as important as runtime for some applications

I often look to the benchmark game results. While I am pleased that Clojure is now included, I find the memory use (at least for these benchmark programs) to be disturbing (e.g., Clojure vs. Ruby 1.9 and Clojure vs. Java ). When deploying to small servers, VPSs, small EC2 instances, etc. memory use can be critical.

Clozure Common Lisp 1.6 has been released

Although I have used SBCL for more consulting work than Clozure CL, I have started using Clozure for more of my own projects. One thing I prefer is that standalone applications built with Clozure are about 10 megabytes smaller than apps built with SBCL. Release notes

Thanks to Tom Munnecke for portrait photographs

My friend Tom Munnecke recently took some casual portrait pictures of me, and I am now using one as my Twitter picture. Sometime I will also replace some of the pictures on my main web site.

Contact your Congressional Representatives and Senators and ask them to have their families publicly go through the TSA grope experience

Use this link to contact them. Also, how about President Obama letting his wife and daughters get groped by TSA? Our Senators and Representatives let their wives and kids get groped? Hate to break it to you, but the government officials who we elect and who get paid off by corporate lobbyists don't live in the same world we do. People get the government that they deserve. Are you going to do anything about this? An email is good, but also consider taking the extra time to insist on talking to your elected officials. BTW, I don't think that TSA has ever caught a terrorist, or done any good at all. (Not like the FBI and other intelligence services who do worthwhile work.)

I am softening my position on Oracle's stewardship of Java

I have to admit that I got a little carried away with my criticism, fueled largely by my agreeing in substance with the position of the Apache Foundation. However, Apple's announcement and IBM's earlier announcements about working with Oracle and OpenJDK have caused me to soften my position. To be fair to Oracle, my complaints about lack of tolerance for Apache Harmony and the FSF Gnu Java implementation apply equally to the now defunct Sun Microsystems. The GPL license for OpenJDK is not all that comforting given the patents wrapped up in Java implementations: only passing the Java Compatibility Kit tests gets a patent waiver. Still, for Java developers, I think that all is right in the world, on a practical level.

My nephew died this morning: rest in peace Anthony

My nephew Anthony was hit by a car last night and died early this morning. The following picture shows Anthony, my sister in law Anita, and me in front of my house in Sedona. Anthony was about 17 years old in this picture. Here is another picture of Anthony and my Dad when they were visiting Sedona. Anthony was 20 years old in this picture. Anthony loved talking about politics, his family, and music.

Question: what would be the legal ramifications of forking Java, but not calling it Java?

I understand that Oracle owns the trademark and there are patent issues. Still, an alternative pure open source platform named wombat or whatever might end up being necessary. I understand that Oracle would like to monetize Java but there is that "killing the goose that laid the golden egg" metaphor becoming real life :-) Ideas?

Downloading all of your data from Facebook

Nice: I just tried this: go to your Facebook Account menu on the upper right corner of the home web page and select Account Settings . Near the bottom of the page, click the learn more link for Download Your Information . Follow the instructions and you will get a ZIP file with your home page, friends list, all messages, wall, and photographs.

Free PDF for the Common Lisp edition of my "Practical Semantic Web and Linked Data Applications" book

I have been working on this book on and off for two years and finally finished it recently while on a long vacation. You can get a free PDF on my Open Content web page or if you want to encourage me to write more material on niche topics, you can buy a print copy. Although this book covers a small market, I believe that Common Lisp and the AllegroGraph RDF data store are a great combination for developing knowledge intensive software.

Wow: I think QuickLisp will change my Common Lisp development setup

I have been experimenting with the beta for and it looks very good. I never disliked the ASDF package management system, but QuickLisp looks to be easier to use and with a few hundred common Common Lisp libraries already in the repository, getting most dependencies installed is quick and easy. Good job Zach! For many years I have been using a brute force approach to package management: in every project I work on, I have a utils directory where I un-tar the source code to all dependencies, and their dependencies, etc. I then locally (push "utils/PACKAGE_DIR/" asdf:*central-registry*) for all dependencies. This has always worked really well for me because I can ZIP up a project directory, rsync it to a server, and I am good to go with all dependencies. However, it is a pain to keep multiple copies of libraries in multiple projects. For reasons I don't even remember anymore, I never liked to use a global ASDF cache. It will be a long weekend day project, but if I d

Convergence on HTML5 for user facing software development

News this morning on ZDNet about Microsoft moving away from Silverlight towards using HTML5 is more good news. I expect (hope!) to see more iPhone and Android development to be based on HTML5 rather than native apps. As we see improvements in device independence (i.e., getting to read email, tweet, watch video, video teleconference, etc. on our smart phones, netbooks, laptops, Google TV, Wii, XBox, etc.) I hope to see real improvements in development platforms for universal applications.

Java support on the Mac

Long term, this is a big issue for me: I spend a lot of my time developing Java, Clojure, and Scala code using IntelliJ so I need support for desktop Java apps like IntelliJ. For now, my customer (CompassLabs) is a Clojure shop (at least the work I do) and Emacs+swank+Clojure is a sweet combination, so Clojure development will not really be impacted because this does not require Swing desktop app support. But I am pretty much addicted to IntelliJ for Java and Scala coding. I would be interested in an official statement from JetBrains how they will handle this issue, long term. An easy solution is to just use my i5 Windows+Ubuntu laptop in the future.

review of new Hulu Plus service

I got a beta preview invite today, and so far it looks really good: HD, no opening commercials (at least for what I chose to watch during my lunch break), and additional material. Hulu Plus is $10/month so I may not use it permanently but I will give it a try for several months. Just my opinion, but Hulu Plus at a reduced rate of $5 or $6/month would make it more compelling since for $15/month I get 2 Netflix blu-ray movie checkouts (at the same time) and their very good streaming service. My wife and I are also Directv customers and although we really like the service, it is expensive. Directv must be worried about competition from direct Internet viewing options because they called up several nights ago and offered a small reduction in my monthly rate and a free 6 month promotion for all their channels (except for pay per view). Directv is a good user experience. If you are like me, you use movies and a few TV shows for something to do while either bored or too tired at the end o

Future society: an optimum strategy for flourishing

I just watched this interview with Tim Wu . He nails it re: the tendency of information industries to move from open to closed systems. I just pre-ordered his new book: The Master Switch: The Rise and Fall of Information Empires There is little doubt in my mind how our society is going to evolve in an era of consolidated corporate power and ubiquitous information systems. Although I don't subscribe to the idea that history repeats itself, I do believe that history does inform us about human nature and how powerful people fight to consolidate power and influence. This tendency is firmly stapled into our DNA. There will almost certainly be strife between what used to be the middle class and financial and political elites. I read yesterday that one of the international rating agencies predicted a loss of "social cohesion" in the USA. Right now, there are large strikes in France over raising the retirement age from 60 to 62. It is interesting that here in the USA, the wa

Social network based authentication done right

Although there are valid privacy concerns using social networks like Facebook, and to a lesser degree Twitter (because almost all tweets are intended to be public), for most of us the value proposition of shared user identity between web sites provides advantages of consistent login/authentication without multiple accounts and also enabling web sites to potentially show you more things that are interesting based on your online behavior. I have been an occasional user since they went beta. Today I was looking at their login/authentication scheme that uses either Twitter or Facebook authentication. I tried using both Twitter and Facebook for authentication and liked that Hunch recognized that my previous Hunch account, my Facebook account, and my Twitter account belonged to the same person and immediately offered to merge the accounts. Giving a site like Hunch the ability to access some Twitter and Facebook data on users opens up even more opportunities for using machine l

Nice: Clojure results now in Computer Language Benchmarks Game

Examples: Clojure vs. Ruby 1.9: median results 8 times faster; Clojure vs. Java 6 server: median results 4 times slower; Clojure vs. Python 3: median results 10 times faster.

My travel journal notes for my Siberia, Japan, and China trip

Here are my rough notes that I was emailing to my family and friends. As I edit them I will post a few of my best pictures here on my public Picasa web album - just look for recent photo albums with "2010" in the title. I did not make notes for the first week as we were going northwest through the Aleutian Islands and into the Bering Sea. Fun excursion: 1.25 hour drive Petropavlovsk to Indigenous village We went way off the beaten track today, but had a lot of fun. Except for a lot of driving on very bad roads, most of the day was spent inside an Indian style lodge similar to what a large family would live in during the long winter. Entrance was a long very low tunnel. The lodge had a small hole in the ceiling directly above the fire pit. Three women sang and danced inside the lodge, one spoke a little English and told stories and legends, etc. They also cooked us a meal that was pretty good. I have some fantastic video of the singing and dancing (similar to Southw

I am back home after a 4 week vacation: blog comments are now enabled

I did not want to deal with blog comment SPAM while travelling so I had temporarily turned off comments. I will post my travel log when I get a chance.

Big productivity gain: not having an Internet connection in the middle of the Pacific Ocean

Carol and I are on a long cruise, and because of the high cost of Internet connectivity, I am only getting on the web for about 5 minutes every other day. I have been spending about 2 hours each day working on the Lisp edition of my Semantic Web book, and I must say that my productivity seems a lot better when I am not distracted with an Internet connection. So far, we have been very good about not overeating on this trip - enjoying the food but eating small portions. Except for some complimentary Champagne the first night we have avoided alcohol, making it easier to not overeat! We will be onboard for 25 days so we don't feel pressured to engage in all activities that might be fun. So far, we have been enjoying a series of onboard lectures, the movie theater, and lots of walking on deck.

I am going to be on travel for 4 weeks: temporarily turning off blog comments

Carol and I are leaving on a long trip. Unfortunately, I get SPAM comments on my blog which are easy enough to remove, but I will be off of the Internet for long periods of time. I'll turn comments back on when I get home. I have my laptop setup to work on the Common Lisp edition of my Semantic Web book so that will probably be available in final form in about 6 weeks.

Rich client web apps: playing with SproutCore, jQuery, and HTML5

In the last 14 years I have worked on two very different types of tasks: AI and text mining, and on (mostly server side) web applications. Putting aside the AI stuff (not the topic for today), I know that I need to make a transition to developing rich client applications. This is not such an easy transition for me because I feel much more comfortable with server side development using Java, Ruby on Rails, Sinatra, Merb, etc. On the client side, I just use simple Javascript for AJAX support, HTML and CSS. As background learning activities I have been working through Bear Bibeault's and Yehuda Katz's jQuery in Action and Mark Pilgrim's HTML5 books. Good learning material. When I read that Yehuda Katz is leaving Engine Yard to work on the SproutCore framework I took another good look at SproutCore last night, worked through parts of the tutorial with Ruby + Sinatra, and Clojure + Compojure server backends. I find Javascript development to be awkward, but OK. I need to sp

MongoDB "good enough practices"

I have been using MongoDB for about a year for customer jobs and my own work and I have a few practices that are worth sharing: I use two levels of backup and vary the details according to how important or replaceable the data is: I like to perform rolling backups to S3 periodically. This is easy enough to do using cron, putting something like this in crontab: 5 16 * * 2 (cd /mnt/temp; rm -f -r *.dump*; /usr/local/mongodb/bin/mongodump -o myproject_tuesday.dump > /mnt/temp/mongodump.log; /usr/bin/zip -9 -r myproject_tuesday.dump > /mnt/temp/zip.log; /usr/bin/s3cmd put s3://mymongodbbackups) The other level of backup is to always run at least one master and one read-only slave. By design, the preferred method for robustness is replicating mongod processes on multiple physical servers. Choose master/slave or replica set installations, but don't run just a single mongod. I often need to do a lot of read operations for
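The rolling backup idea in the crontab entry above (one dump per weekday, each overwriting the one from the week before) can also be sketched as a small script. This is a hedged illustration, not the setup I actually run: the function name, the `.zip` archive name, and the S3 key layout are assumptions, since the crontab line as shown does not include them. The script only builds the command lines; running them is left to cron or `subprocess`.

```python
import datetime
import shlex

WEEKDAYS = ["monday", "tuesday", "wednesday", "thursday",
            "friday", "saturday", "sunday"]


def backup_commands(project, bucket, day=None,
                    mongodump="/usr/local/mongodb/bin/mongodump"):
    """Return the commands for one rolling backup as argument lists.

    The dump name contains the weekday, so a given weekday's backup
    replaces the one taken on the same weekday a week earlier.
    """
    day = day or datetime.date.today()
    dump = "%s_%s.dump" % (project, WEEKDAYS[day.weekday()])
    archive = dump + ".zip"  # assumed archive name
    return [
        [mongodump, "-o", dump],                      # dump all databases
        ["zip", "-9", "-r", archive, dump],           # compress the dump dir
        ["s3cmd", "put", archive,
         "s3://%s/%s" % (bucket, archive)],           # assumed S3 key layout
    ]


if __name__ == "__main__":
    # Print the commands instead of running them, so nothing is touched.
    for cmd in backup_commands("myproject", "mymongodbbackups"):
        print(" ".join(shlex.quote(arg) for arg in cmd))
```

Keeping the command construction separate from execution makes it easy to check the names that will be used before pointing the script at a real server.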

Very interesting technology behind Google's new Instant Search

Anyone who uses Google search and is paying attention has noticed the very different end-user experience. Showing search results while typing queries now requires that Google generate at least 5 times the number of results pages, use new JavaScript support for fast rendering of instant search results, and, most interesting to me, take a new approach to their backend processing. It has been about 7 years since I read the original papers on Google's Big Table and map reduce, so it is not at all surprising to me that Google re-worked their web indexing and search. The new approach using Caffeine forgoes the old approach of batch map reduce processing and maintains a large database that I think is based on Big Table and now performs continuous incremental updates. I am sure that Google will release technical papers on Caffeine - I can't wait!

Using Hadoop for analyzing social network data

At CompassLabs my colleague Vivek and I are using Hadoop and Amazon's Elastic MapReduce to process social network data. I can't talk about what we are doing except to say that it is cool. I blogged last week about taking the time to create a one-page diagram showing all map-reduce steps and data flow (with examples showing data snippets): this really helps manage complexity. I have a few other techniques that I have found useful enough to share: Take the time to set up a good development environment. Almost all of my map-reduce applications are written in either Ruby or Java (with a few experiments in Clojure and Python). I like to create Makefiles to quickly run multiple map-reduce jobs in a workflow on my laptop. For small development data sets, after editing source code, I can run a workflow and be looking at output in about 10 seconds for Ruby, a little longer for Java apps. Complex workflows are difficult to write and debug so get comfortable with your development en
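As a rough illustration of running one map-reduce step on a laptop against a tiny hand-made data set, here is a minimal Ruby sketch that imitates the map, sort/group, and reduce phases in a single process. It is not the code we use at work: the function names (map_line, reduce_key, run_job), the word-count task, and the sample lines are all invented for this example.

```ruby
# Mapper: emit one [key, value] pair per word in a line of input.
def map_line(line)
  line.downcase.scan(/[a-z']+/).map { |word| [word, 1] }
end

# Reducer: combine all of the values collected for one key.
def reduce_key(key, values)
  [key, values.reduce(0, :+)]
end

# Run the mapper over every line, group the pairs by key (standing in for
# Hadoop's shuffle and sort), then call the reducer once per key in key order.
def run_job(lines)
  pairs   = lines.flat_map { |line| map_line(line) }
  grouped = pairs.group_by(&:first)
  grouped.sort.map { |key, kvs| reduce_key(key, kvs.map(&:last)) }
end

sample = ["the cat sat", "The dog sat down"]  # small hand-crafted input
counts = run_job(sample)
counts.each { |word, n| puts "#{word}\t#{n}" }
```

Because a job here is just a function from input lines to output pairs, a second job can be chained by feeding it the first job's output, which is a quick way to check a multi-step workflow before running it on a real cluster.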

why doesn't iTunes support Ogg sound files 'out of the box'?

You know why: Apple does not mind inconveniencing users in order to keep their little walled garden the way they want it. I have been a long time Apple supporter (I wrote the chess game they gave away with the early Apple IIs, and wrote a commercial Mac app in 1984) but sometimes they do aggravate me.

Two new books today

I just got my delivery from Amazon: "Linear Algebra" (George Shilov) and "Metaprogramming Ruby" (Paolo Perrotta). I have a degree in Physics but I find my linear algebra to be a little rusty so I bought Shilov's book to brush up. I bought Perrotta's book because while reading over some of the Rails 3 codebase, too often I find bits of code that I don't quite understand, at least without some effort.

I've improved my Hadoop map reduce development process

I had to design a fairly complicated work flow in the last several days, and I hit upon a development approach that worked really well for me to get things written and debugged on my laptop: I started by hand-crafting small input data sets for all input sources. I then created a quick and dirty diagram using OmniGraffle (any other diagramming tool would do) showing how I thought my multiple map reduce jobs would play together. I marked up the diagram with job names and input/output directories for each job that included sample data. Each time new output appeared, I added sample output to the diagram. I had a complicated work flow so it was tricky to keep everything on one page for reference, but the advantage of having this overview diagram is that it made it much easier to keep track of what each map reduce job in the workflow needed to do and made it easier to hand-check each job. As I refactored my workflow by adding or deleting jobs and changing code, I took a few minutes to k

Efficient: just signed up to write an article on Rails 3 after spending weeks spinning up on Rails 3

I was just asked to write an article on my first impressions of Rails 3. This is very convenient because I have been burning a lot of off-work cycles spinning up on Rails 3 (I have done no work using Rails in 5 months because I have been 100% booked doing text/data mining). Architecturally and implementation-wise, Rails 3 rocks: I will have fun writing about it.

Very cool: a tutorial on using the MongoDB sniff tool

No original material here, I just wanted to link to someone else's cool article on using mongosniff to watch all network traffic going into and out of a mongod process. The output format is easy to read and useful.

Very good news that Google will be providing a "Wave in a Box" open source package

Early this year I played around with the open source code on the Wave protocol site, but "play" is the active word here: I did nothing practical with it. Although I never used Wave's web UI very much, I did find writing Wave robots interesting and potentially very useful. I invested a fair amount of time in learning the technology. I was disappointed when Google recently announced that they were phasing out support for Wave, but today's announcement that they are completing the open source project to the point of its being a complete system is very good news.

I finished reviewing a book proposal tonight for an AI text book

Based on the number of books I have written, it is obvious that I love writing. I also enjoy reviewing book proposals and serving as a tech editor, as long as I am fascinated by the subject matter! The proposal that I just reviewed for Elsevier was very interesting. I believe that the world (some parts faster than others) is transitioning to a post industrial age where the effective use of information might start to approach the importance of raw labor, physical resources, and capital (and who knows how the world's money systems will transition). When I was reading this book proposal, and also when I read books and material on the web in general, one litmus test I have for "being interesting" is how forward thinking the technical material is, that is, how well it will help people both cope with and take advantage of new world economic systems.

GMail Priority Inbox

Finally, I got an invitation and I am trying it. One problem that I have is feeling that I have to read email as it arrives, so I find myself not running an email client if I am really concentrating on work or writing. With the new display, I will only see emails at the top of GMail's form if they are deemed important because they are from people I always respond to, etc. It is also convenient being able to switch back and forth between the old style inbox and priority inbox.

Command line tips for OS X and Linux

I wrote last year about keeping .ssh, .gpg, and other sensitive information on an encrypted disk and creating soft links so that when the disk is mounted, the sensitive information is available. I have a few command line tricks that save me a lot of time that are worth sharing: Use a pattern like history | grep rsync to quickly find recent commands. Much better than wading through your history. Make aliases for accessing services on specific servers, for example alias kb2_mongo='mongo'. By having consistently named aliases for your servers and for running specific services like the mongo console, it is easy to both remember your aliases and use them. Create aliases with consistent naming conventions to ssh to all of your servers. I use different prefixes for my servers and for each of my customers. Create an alias like alias lh='ls -lth | head' to quickly see just the most recently modified files in a directory, most recent first. For your working develop

Consistent APIs for collections

I have been using Clojure a lot for work this year and the consistent API for anything that is a seq (lists, vectors, maps, trees, etc.) is probably my favorite language feature. Scala 2.8 collections offer the same uniform API. For me, Clojure and Scala, with a fairly small number of operations to remember across most collections, represent a new paradigm for programming compared to some older languages like Java, Scheme, and Common Lisp that force you to remember too many different operation names. The Ruby Enumerable module also provides a nice consistent API over collections. Most Ruby collection classes mix in Enumerable, but the API consistency is not as good as in Scala and Clojure. That said, even though Enumerable only requires a class to implement each (and then supplies map, find, etc.), the ability to combine these methods with blocks is very flexible.
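To make the Enumerable point concrete, here is a small Ruby sketch. The class WordBag and its contents are made up for this example; the only thing it defines is each, and everything else comes from mixing in Enumerable. The same method names then work on a Range and a Hash as well.

```ruby
# Hypothetical collection class: it implements only `each`.
class WordBag
  include Enumerable

  def initialize(*words)
    @words = words
  end

  # The one method Enumerable needs from the including class.
  def each(&block)
    return enum_for(:each) unless block
    @words.each(&block)
    self
  end
end

bag = WordBag.new("graph", "seq", "block", "map")

# These methods all come from Enumerable, not from WordBag itself.
long_words = bag.select { |w| w.length > 3 }
upcased    = bag.map(&:upcase)
first_s    = bag.find { |w| w.start_with?("s") }
sorted     = bag.sort

# The same operation names apply to built-in collections.
squares   = (1..4).map { |n| n * n }
big_pairs = { a: 1, b: 20 }.select { |_key, value| value > 10 }
```

One place where the consistency is weaker than in Clojure or Scala: select on a Hash returns a Hash, while select on the custom class returns an Array.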

Nice, just installed Rubinius-1.0.1

I first tried seriously using the "Ruby implemented in Ruby" Rubinius last spring and really liked it. If you have not done so already, install it ( rvm install rbx ) and give it a try. Rubinius does not support 1.9.x syntax yet, but that is coming. Great work by the developers of Ruby 1.9.* but I still like the idea of Rubinius, long term. Good show by Engine Yard for supporting Rubinius development.

Using cljr for Clojure development

At work I now use the Clojure setup that everyone else uses, emacs+swank-clojure, with our custom repositories. For my own Clojure hacking (my own projects) I have just about settled on using cljr for convenience and agility. For me, the big win is being able to access Clojure libraries, Java libraries, and JAR files containing data sets I use often for NLP work from any directory. I don't need a heavyweight project setup such as, for example, Leiningen with all dependencies loaded locally. cljr uses Leiningen to manage the packages in the single ~/.cljr repository. When you start up cljr, everything in ~/.cljr is on your JVM classpath: this may seem a little heavy, but it is very convenient. As an example, this morning I noticed an old Twitter direct message from the author of the Nozzle library asking me if I had a chance to try it. Instead of setting up a separate Leiningen project directory, I just did a cljr install com.ashafa/nozzle 0.2.1 , went to my catch-all directory where I k

My light weight Clojure wrapper for the PowerLoom knowledge representation and reasoning system

A ZIP file with everything you need to try it is on my open source web page. PowerLoom has been in development for many years and is available in Common Lisp, C++, and Java editions. I wrapped the Java edition for this project. This is just a first cut at a wrapper because assertions and queries must be encoded as strings.

Ruby happiness: first Ruby 1.9.2 released, now Rails 3.0

Assuming you have RVM installed, don't wait: run rvm install 1.9.2, then rvm 1.9.2, then gem install rails, and you will be up to date. I wrote a small utility app to browse MongoDB data we use for text mining for my customer this morning and I used Rails 2.3.8; hopefully that will be the last time I start a new project on a version older than 3.0. My excuse for not using version 3.0 was that I wrote and deployed the app in less than an hour, and I am just not up to speed on Rails 3 yet. That will change!

I am merging my other three blogs into this (my main) blog

I had what I thought was a good idea in the last year: split out special interests into four blogs: this blog - general technology, and Java; my Clojure blog - my work and play with Clojure; my Ruby blog - everything I do with Ruby; and my Artificial Intelligence blog. I am going to leave my other three blogs intact, as-is, but I am going to start doing two things: all of my non-book writing will go into this single blog, and I am going to copy a few of my recent articles in the other three blogs to this one. Having four distinct blogs has been a nuisance.

Moving MySQL to a large EBS volume

I had to move a very large customer MySQL database used for data mining to a large EBS RAID. Since following the usual documentation did not work for me, here are my notes: after following the standard instructions for setting up RAID 0 (EBS is robust enough that I see little reason to use error correcting RAID, but do so if you wish), I followed some of the instructions here for mapping the usual installation location for MySQL data to the RAID EBS volume and using some fstab trickery (copied from the linked article): sudo mkdir /vol/etc /vol/lib /vol/log sudo mv /etc/mysql /vol/etc/ sudo mv /var/lib/mysql /vol/lib/ sudo mv /var/log/mysql /vol/log/ sudo mkdir /etc/mysql sudo mkdir /var/lib/mysql sudo mkdir /var/log/mysql echo "/vol/etc/mysql /etc/mysql none bind" | sudo tee -a /etc/fstab sudo mount /etc/mysql echo "/vol/lib/mysql /var/lib/mysql none bind" | sudo tee -a /etc/fstab sudo mount /var/lib/mysql echo "/vol/log/mysql /var/log/mysql none bind&

Good resources for learning HTML5

I recommend starting with for a good overview, get really excited by playing with some demos at , and then using as a tutorial/reference.

filling a tech knowledge hole: I bought "HTML5 Up and Running"

This has been a very busy year for me, and because of that I have been ignoring the tidal wave known as HTML5. The book looks very good so far.

How programming languages affect thinking; Clojure at work; my Clojure wrapper for PowerLoom

The Sapir-Whorf hypothesis is that the human language that we think in and communicate with affects our thought processes: the way we think. Because my current job mostly uses the Clojure programming language, I have been thinking in Clojure idioms lately - a big change from Java and a smaller but still significant change from using Ruby. (BTW, you may have noticed that I don't blog here anymore about Ruby; this is because I have a dedicated Ruby blog now.) At work Clojure has been a good choice because it is concise, has well designed APIs (for example, most built in data structures support the uniform seq APIs: everything mostly works the same for lists, sequences, binary trees, maps, etc.), and can take advantage of available Java libraries. As a personal project, I finished wrapping the PowerLoom knowledge representation and reasoning system in a thin Clojure library this morning. (See my Clojure blog for more information.)

Big Data

Since I have been working for CompassLabs I have been getting even more appreciation for just how much value there is in data. This article in the New York Times also makes the business case for data mining. My first real taste for the power of data came about 10 years ago when I worked for WebMind. We looked at data from online financial discussion groups, SEC Edgar data, etc. to try to value stocks based on sentiment analysis (text mining) and raw data mining.

Haskell is much easier after coding in Clojure and Scala for a while

I got very excited by Haskell about 18 months ago and spent a fair amount of time kicking the tires and reading the Haskell book. That said, my interest waned and I moved on to other things (mostly Ruby development with some Clojure, less Scala). When I noticed the recent new Haskell release for July 2010 I installed it and started working through Miran Lipovača's online book Learn You a Haskell for Great Good!. This time, things seem to "just click" and Haskell's functional style seems very natural. I have real regrets that I probably won't be using Haskell much because I mostly code in what people pay me to use, which in the last 5 years has been Lisp, Ruby, Java, and Clojure.

Interesting new Google Buzz API: PubSubHubbub firehose

I spent some time experimenting with the Buzz APIs this morning - well documented and simple to use. The firehose data will be useful for tracking public social media posts. I set up Google's example app on my AppEngine account and had fun playing with it. Unfortunately, because of the amount of incoming data, it would only run each day for about 4 or 5 hours before hitting the free resource quota limits. Since this was just for fun, I didn't feel like paying for additional resources.

Good news: Google buying Freebase

That is very cool, I think. I have lately been waist deep in using Freebase for customer work. While there is a lot of cruft data in Freebase, with some manual effort and some automation, it is a good source of a wide variety of information. Depending on application, DBpedia and GeoNames are other good resources for structured data. I have a fair amount of example code for Freebase, DBpedia, and GeoNames in my latest book (there is a free PDF download on my open content web page, or you can buy a copy at

Scala 2.8 final released. I updated my latest book's Scala examples

Good news, Scala 2.8 has been released. I updated the github repo for the code examples for my book Practical Semantic Web and Linked Data Applications. Java, Scala, Clojure, and JRuby Edition (links for free PDF download and print book purchase). I haven't had the opportunity to do very much coding in Scala for several months because the company I have been working for (CompassLabs) is mostly a Clojure and Java shop. That said, Scala is a great language, and it is good to see the final release of 2.8 with the new collections library and other changes.

Good job: CouchDB version 1.0

I usually use PostgreSQL and MongoDB (and sometimes RDF data stores) for my data store needs, but I have spent a lot of time in the last couple of years experimenting with CouchDB and always keep it handy on my laptop and one of my servers. I was happy to upgrade to version 1.0 today!

Monetizing social graphs

Interesting news this morning of Google's investment in Zynga, the 800 pound gorilla of online games, in order to have access to social graph data from people logging into Google accounts to play games. There has been a lot of buzz about Facebook's effective social graph data, and games like those provided by Zynga have helped them. That said, I would still bet on Google having a better chance of making the most money off of social graphs because they get to effectively combine data from at least five sources to build accurate user profiles: statistical NLP analysis of GMail, search terms used by people who are logged in to any Google services, friends and business connections from GMail address books, social connections from Google Buzz (which often includes data from other social graphs like Twitter), and in the near future online multi-player gaming. There is another issue: infrastructure. While I am willing to roughly equate the capabilities for non-realtime analytics of very large