Thursday, December 31, 2009

My tech industry predictions for 2010

As both an author and as a technical consultant, I am fairly opinionated in terms of what I expect for next year (tomorrow!). Here are my predictions:
  • There will be pressure to reduce IT expenditures. Increasing trend to favor outsourcing deployment platform (e.g., Google AppEngine and Heroku), infrastructure (e.g., Amazon AWS, RackSpace, both relational and NoSLQ data stores), and software as a service (e.g., CRM, etc.). Cloud computing gets more real.
  • Research will be concentrated on shorter term profits rather than long strategy. China might be an exception to this: a friend reports that he has seen willingness in China to fund very long term Artificial Intelligence research.
  • More people will spend more time using web based information and recreational resources using portable devices. There will be a shortage of wireless bandwidth in some areas.
  • Use of the Java platform will stay strong, but with more emphasis on alternative languages like JRuby, Scala, and Clojure.
  • Skills and education in data analysis, all types of modeling, text mining, and analytics will continue to be in increasing demand.
  • There will be well publicized sporadic failures by cloud service providers but customers will be tolerant because of cost savings.
  • The importance of the hugely increasing availability of all types of data will encourage development of new business areas like knowledge processing as a service.
  • Although the Semantic Web has seen slow adoption in the last decade, Linked Data, a new and more popular subset of the Semantic Web, will see wide adoption in the next few years.
  • The wide support for Linked Data will help make computer to computer interactions easier and more profitable.
  • Geo location will become increasing important. We are likely to see many interesting smaller scale systems using technology like PostGIS and Spatial Lucene that will be able to compete in local markets with Google, Microsoft, and Yahoo.
  • I think that the next decade will be a time of micro web businesses: a huge number of small scale web portals that offer very specific services to a relatively small number of customers who are willing to pay a use fee. Most of these will be deployed on very low cost cloud hosting. Many of these will be tightly integrated with the Facebook platform, the Wave platform, etc.
  • In the next 10 years: the USA and Europe will become much less important in technical inovation because their large debt liabilities will create business and research unfriendly environments.
  • In the next 20 years: educated, creative, and/or generally productive people will identify more with similar people around the world than with people in their own countries who may get bogged down in what looks to be a crushing world wide economic collapse (but not quite as bad in developing countries). There will be more of a feeling of world wide citizenship, the world will evolve first into a safer place (a trend over the the last 20 years has seen more safety, less civilian deaths - something that most people don't realize) and then eventually into greater prosperity.
Next year looks like an exciting time in our industry!

Best Wishes for a Happy New Year

Thursday, December 24, 2009

$1.5 trillion a year for "defense" spending, little money left for local governments; living locally; Happy Holidays

If you factor in the cost of the debt to pay for our military spending then I think that a reasonable estimate of our yearly defense spending is about $1.5 trillion.

The amount we spend on defense swamps other US government spending, including social programs. This is just my opinion, but I believe that we could keep our country relatively safe (compared to other countries) and spend far less money. The problem is basically banana republic style corruption: too much money is made by special interests for there to be any meaningful reform of military spending. The same comment is also true of other corporate interests like Wall Street, industrialized food production, pharmaceutical companies, insurance, etc. This is all enabled by total corporate control of the news media and shameful corruption in the lobbying industry and our federal government.

Fortunately, for most of us, life is still very good despite corruption of the world's "elite." Again, this is just my personal opinion: it is best to act in ways that help yourself, your family, and local community. Avoid doing business with large banks, preferring credit unions. Prefer buying food from local producers. Form or join local organizations for food distribution, computer clubs (promote open source solutions to help people use old PCs), and help with local school fund raising activities. Prefer to buy goods that are manufactured as close to where you live as possible.

Joy in life comes from our family and social interactions and in acting in service to others. A materialistic lifestyle is something that you have been conditioned to accept by lifetime exposure to media - try to resist the temptation to acquire material goods that you really don't need.

And, most of all:

Have a Merry Christmas and a Happy Hanukkah!

Friday, December 18, 2009

Building the EtherPad system and perusing the source code

The EtherPad collaboration-enabled online system is now open source. Cool. Google bought the company, the developers are joining the Wave team, and their product is now released under the Apache 2 license.

Be sure to follow the instructions (failing to set the environment variable for the path to a MySQL client JAR file produces a strange "cp -f" error that has hung up a few people trying to build the system, as reported on Hacker News). It only took about 15 minutes to download the source and build the system - simple enough.

After running the system and trying it, I used IntelliJ 9 to set up a project (choose project from existing source, main directory etherpad/trunk) and I am spending some time perusing the Scala and JavaScript code. Really nice looking code base, and reading through the Scala code will be an education. My JavaScript skills are a little weak, but I might still take a careful look at the JavaScript code to understand how they used Comet.

Monday, December 14, 2009

Amazon Elastic Load Balancing (ELB) is pretty cool

Using this service costs $0.025/hour so it may make sense to just run HAProxy yourself on a EC2 instance, but then you have to worry about fault tolerance/recovery if that instance fails. The ELB cost is small in comparison to running a cluster of EC2 instances and "outsourcing" as much of your system as possible to AWS (e.b., SimpleDB, Elastic Load Balancing, Relational Database Server, EBS, etc.) can certainly reduce both the complexity of your application architecture and also your implementation costs.

Here are my notes for a simple ELB setup for an AMI that contains a Rails web application:
export EC2_PRIVATE_KEY=pk-....pem  # different on your system
export EC2_CERT=/Users/markw/.ec2/cert-...pem # different on your system

ec2run -k gsg-keypair ami-e767ds71
ec2run -k gsg-keypair ami-e767ds71

Note: specifying "gsg-keypair" matches later doing ssh -i ~/.ssh/id_rsa-gsg-keypair ...

elb-create-lb MarkTest123 --headers --listener "lb-port=80,instance-port=4242,protocol=HTTP" --availability-zones us-east-1b

elb-configure-healthcheck MarkTest123 --headers --target "HTTP:4242/ping" --interval 30 --timeout 3 --unhealthy-threshold 2 --healthy-threshold 2

elb-register-instances-with-lb MarkTest123 --headers --instances i-X,i-Y

* substitute real EC2 IDs for i-X and i-Y, and this works great!

ssh -i ~/.ssh/id_rsa-gsg-keypair
create a file public/ping - put a line of text in it
start Rails on port 4242

* note: repeat for other EC2 instance. In production, obviously you would want to start your Rails apps using /etc/init.d/...

Then, hitting this URL works:


elb-delete-lb MarkTest123
and delete EC2 instances
It took me a little while to get ELB working, so I hope that my notes save you some time.

The cost of commoditization of computing: infrastructure and software

Discipline, a new view of system architecture, and rigorous automation procedures are required to take advantage of Amazon EC2, Google AppEngine, and other infrastructure as a service providers. Last week a customer commented on the rapid pace of Amazon's innovation. Yesterday they announced a new way to generate revenue from unused server instances by letting users bid on a spot market for unused EC2 instances.

Discipline When you own your own server farm, even if it is just a few back room servers, you can spread out applications over your servers in a haphazard way and usually get away with some sloppiness in your process architecture. When you are dealing with someone else's infrastructure a more disciplined approach is just about mandatory.

New view of system architecture Both Google and Amazon have published papers on dealing with very large scale geographically disperse systems comprised of many components, some of which are guaranteed to fail. These companies have also shared how for some applications they trade off immediate data consistency (between distributed data writers and readers) for scalability.

Rigorous automation procedures Large distributed fault tolerant systems involve two types of software development: core application development and writing automation code to manage managing servers (starting/stopping, DNS management) and partitioning an application over available servers.

Software The open source and free software movements have hugely contributed to the commoditization of computing. Even not counting operating system and networking infrastructure, almost all of the code in the systems that we create is written by other people and it is usually open source software. I would argue that in addition to classical skills like algorithms and raw coding skill, a modern developer must be well versed in available frameworks and libraries, their strengths and weaknesses, and knowledge of modifying and using them.

Commoditization comes with cost This is something that I am in the process of learning about. For most of my career I have had either dedicated hardware for development and deployment (starting in the 1980s, developing world wide distributed systems, I had expensive dedicated servers), or in the last 10 years, my "world" consisted of spending a lot of effort building systems on top of open source frameworks (J2EE, Rails, etc.) and deploying to a leased server, or a VPS for projects with few users. In this old scenario, deployment and admin would be a very small part of the effort of creating a new system. Now, designing, implementing, and deploying to Amazon AWS, Google AppEngine, etc. has changed that because of the effort it takes to properly "live" in a commodity computing world.

Sunday, December 13, 2009

IntelliJ version 9.0

This week JetBrains gave me an upgrade license for version 9 of IntelliJ. I don't do too much Java development anymore - mostly maintenance on some of my old projects and new AppEngine and Google Wave development.

Overall, version 9 is a nice upgrade: nicer source code repository integration, built in task management, the IDE seems faster, etc.

I use the JetBrains RubyMine product almost every day. Long term, my use of IntelliJ 9 will depend on how good the AppEngine support is. As a test, I generated a new AppEngine project, added some home page material, edited the appengine-web.xml file to specify a registered app name, set the version, then deployed to Google's servers with no problems. I did have a problem importing an AppEngine project from Eclipse but eventally realized that I needed to go to Module Settings -> Artifacts, and drag "Available elements" from the right window pane to the <output root> tree display in the left pane.

I tried writing a simple GWT 2.0 (new version) web app. When I created a new project, no template GWT client and server code was generated - a small disappointment. I deleted that project and used a command line tool to generate a project:
/Users/markw/Documents/WORK/gwt-2.0.0/webAppCreator -out mytest com.mytest
I then created a new IntelliJ project "from existing source", and had no problems using the generated template project, running the local development server, and using the web app in Firefox (after installing the GWT FireFox plugin).

I am really busy with client work right now but I hope to get some time next year to do an AppEngine based project of my own, and I am almost sure that I will use IntelliJ 9 for that because general Java development is a bit better, I think, than using Eclipse, and it looks like the AppEngine support is OK.

Monday, December 07, 2009

Balkanisation of Ruby?

When I first started using Ruby, Matz's C-Ruby was mostly the only game in town.

I am also an enthusiastic user Ruby 1.9.x, JRuby, and MacRuby. Seeing the Ruby spec being developed in Japan under government funding, and that it is for Ruby 1.8.7, I get a feeling of déjà vu as a long time Lisp user.

The balkanisation of Lisp has been more than a small nuisance for me in the last 25 years. I would prefer that all Ruby implementations eventually implement a common specification, but I would rather it be for something that looks like 1.9.x.

Sunday, December 06, 2009

Privacy and Security in the Internet Age

Just some advice that I give friends and family:
  • Delete all cookies in your browser every week - it is easy enough to sign in again to web sites that require authentication. People who do not delete their cookies never see what sites are tracking them. It is easiest to do a 'delete all cookies' operation and not to try to save the 5 or 10 cookies out of thousands that are stored in your local browser data.
  • Keep a text file with all passwords in encrypted form - and, do not use the same password for different purposes.
  • Every time you use your super market's discount card (or possibly pay with a credit card), your purchases are permanently associated with you - do you care? maybe or maybe not.
I do use a lot of web services that track what I do (GMail, for example) but I make the decision to give up privacy vs. benefits on a service by service basis.

Coolness! good instructions for trying Rails 3.0pre

This is worth sharing: easy to follow instructions for installing Rails 3.0pre.

Thanks to Oscar Del Ben!

Yesterday I spent 90 minutes trying a fresh install of Ruby 1.9.1 and Rails 3.0pre with no luck, so thanks to Oscar for his writeup. BTW, I Used Ruby 1.8.7 when following Oscar's directions.

Thursday, December 03, 2009

New Amazon Web Services feature: boot from EBS

Awesome - in the past I have had to write my own code/scripts to manage attaching and EBS file volume to a new EC2 instance. This new feature will make it a lot easier to manage your own EC2 based services. If you would prefer a PDF with the complete documentation, then use this link.

This new feature will make using EC2 even easier for some applications - a welcome change.

That said, I have my own scheme for automatically mounting EBS volumes, assigning ElasticIP addresses, etc. and for some deployments I will continue to use temporary boot volumes and pull AMIs from S3.

Wednesday, November 25, 2009

Playing with Chrome OS

I have to say that even in its alpha (or beta?) version, Chrome OS looks good. I like the home "apps page" and except for not having an application that is a terminal emulator, it is fairly complete: my calendar, google docs, etc., are all available, and web browsing is fast.

Again, given a good terminal program for use cases where I need to quickly SSH to a server, I can see a light weight and low power netbook nicely augmenting my laptop + external monitor setup.

Thursday, November 19, 2009

I am watching the live Chrome OS Webcast

For end users, Chrome OS is a great idea - I would argue that most of my friends and relatives would be better off not running Windows, OS X, or (full) Linux.

What about software developers: still useful, but not a replacement for a laptop. In a pinch, assuming a terminal window to run remote bash shells, Emacs, etc., I could still get work done while travelling. Still, for me, a MacBook with Ubuntu and OS X with RubyMine, IntelliJ, Eclipse, OmniGraffle, etc. is just about perfect for my workflow.

That said, I will buy a Chrome OS netbook when they are available.

Tuesday, November 17, 2009

Hosted MongoDB and CouchDB

After I finish up some client work this morning, I am planning on finishing a DevX article on using Heroku as a deployment platform. Since deploying to Heroku is so simple and so well documented, you might think that I would have a difficult time writing new material :-)

After a short tutorial on getting started, I am writing mostly about using both CouchDB and MongoDB as data store, either hosted yourself on EC2 (or another server external to Heroku, which is itself hosted on EC2) or commercial managed solutions like Cloudant for CouchDB and MongoHQ for a managed MongoDB service.

I like to manage my own and customer deployments on EC2 - frankly, it is fun :-)

That said, I think that there are sometimes business reasons for using hosted solutions like Heroku, Cloudant, and MongoHQ. It is a balance between development and admin costs and paying for managed platform as a service offerings.

nice: Rubymine 2.0 released

I use Rubymine for most of my Ruby/Rails/Sinatra development on Ubuntu, and use it in conjunction with TextMate on OS X. I find it convenient enough to alternate between TextMate when I don't need IDE features, and Rubymine when I do.

One of the biggest improvements is that indexing now occurs in the background and auto-complete and other features become available that depend on knowledge of an application and the gems that it uses.

This is subjective, but once Rubymine 2.0 loads up and is done with any background indexing then the CPU use is minimal, and I think improved from earlier versions (nice to not have the fan kick in on my laptop when the CPU cores heat up). For the Rails application that I am coding on right now, Rubymine is using about 360MB of resident memory - this is OK with me.

Wednesday, November 11, 2009

MongoDB has good support for indexing and search, including prefix matching for AJAX completion lists

I have been spoiled by great support for indexing and search in relational databases (e.g., Sphinx, native search in PostgreSQL and MySQL, etc.)

I was pleased to discover, after a little bit of hacking this morning, how easy it is to do indexing and search using the MongoDB document-centered database. I have two common use cases for search, and MongoDB seems to handle both of them fairly well:
  • Search for words inside of text fields
  • Efficient word prefix search to support AJAX "suggest" style lists
My approach does require combining search results for multiple search terms in application code, but that is OK. Assuming the use of MongoRecord, here is a code snippet:
class Recipe < MongoRecord::Base
collection_name :recipes
fields :name, :directions, :words
def to_s
"recipe: #{name} directions: #{directions[0..20]}..."
def Recipe.make collection, name, directions
collection.insert({:_id =>, :name => name,
:directions => directions,
:words => (name + ' ' + directions).split.uniq})

host = 'localhost'
port = Mongo::Connection::DEFAULT_PORT
MongoRecord::Base.connection =,port).db('mongorecord-test')

db = MongoRecord::Base.connection

coll = db.collection('recipes')

coll.create_index(:words, Mongo::ASCENDING)

Recipe.make coll, 'Rice Soup', 'Cook the rice, then add extra water to thin it out.'
Recipe.make coll, 'Cheese and Rice Crackers', 'Slice the cheese and layer on top of crackers.'

puts "\nSimple find"
puts Recipe.find_by_name(:name => 'Rice Soup').to_s

puts "\nFind recipe by regular expression (ignoring case) in array of words /water/i"
Recipe.find(:all, :conditions => {:words => /^water/i}).each { |row| puts row.to_s }
According to the MongoDB documentation, a regular expression match like /^water/i will use an index just as a relational database match in the form like 'water%' does.

I am still in a learning mode with MongoDB, so I would appreciate any comments on improving this aproach.

Tuesday, November 10, 2009

How to install CouchDB + nginx + basic authentication on EC2, including a Ruby client

Please note that if want to more secure installation, SSL should also be installed following these instructions (I used these instructions and another web blog to create the following abbreviated instructions). For my purposes, basic HTTP authentication is good enough. I assume that you are used to using nginx and CouchDB and either installed them from source or using apt-get. I am using Ubuntu, so you might have to modify these instructions slightly. On my laptop, I created a simple crypt program because OS X does not include one:
print crypt($ARGV[0],$ARGV[0])."\n";
After giving this script execute permissions, I created an encrypted password:
crypt my12398pass61
You should save the output because on your EC2 instance you need to, as root or sudo, edit the file /etc/nginx/htpasswd adding a line:
where myEKNgP2ivVVo was the output from crypt for the plain text password my12398pass61. Then edit nginx.conf file adding something like:
    server {
listen 9001;
server_name; # not a real domain name
location / {
auth_basic "Please login to use CouchDB";
auth_basic_user_file /etc/nginx/htpasswd;
proxy_pass http://localhost:5984;
proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
If you restart nginx, then you should be able to access
You will have to enter couchclient as the user name and my12398pass61 as the password. I allowed my browser to set an authentication cookie so I would not have to keep logging in. (Obviously, you should use a different user name and password with crypt and setting up your /etc/nginx/htpasswd file.)

For a ruby client, do a "gem install couchrest" and try this:
require 'rubygems'
require 'couchrest'

db = CouchRest.database!("")
response = db.save_doc({:key => 'value', 'another key' => 'another value'})
doc = db.get(response['id'])
puts doc.inspect
You should be good to go writing Ruby applications that use your remote CouchDB service.

This installation is not very secure and should probably not be used on a production server containing sensitive data. I am not a security expert; if you are then I would appreciate your comments on this blog entry.

- - -

PS. an hour after writing this blog, I found a simpler solution of using a SSH tunnel. Check this out on the Disco Blog. You set a tunnel like:
ssh -i ~/.ssh/id_rsa-gsg-keypair  -L 5984:localhost:5984
If you use an Elastic IP address so your server always has the same IP address, then this ssh command can be aliased, for fast temporary connections to CouchDB and other services that are confgured for only localhost client connections.

Sunday, November 08, 2009

"always on" MongoDB installation on my laptop

I spend a lot of time experimenting with infrastructure software, sometimes for customer jobs and sometimes just because it is fun to learn new things. For non-SQL data stores, I have spent a lot of time in the last year experimenting with and using CouchDB, AppEngine datastore, Tokyo Cabinet, MongoDB, Cassandra, and SimpleDB. Tokyo Cabinet and SimpleDB store hash values as strings, and don't have the great client APIs that the others have because limitations in string-only hash values. That said, for an Amazon hosted application SimpleDB can be a good choice and Tokyo Cabinet is light weight and easy to install and use. Casandra looks great, and as I have written about here before, Cassandra is easy to use from ruby and has great features.

MongoDB has great performance and similar capabilities as Casandra. Chris Kampmeier has a great writeup that covers installing MongoDB on OS X, including setting it up as a system service. I followed Chris's directions. A pleasant surprise is that MongoDB has a light footprint, and leaving it running as a service like I do with PostgreSQL and MySQL is reasonable.

Along with the mongo and mongo_record gems, MongoDB is an awesome tool, and always keeping it running makes it easy to experiment with. BTW, I also keep Sesame and CouchDB running on my local network for much the same reasons.

Tuesday, October 27, 2009

Using nailgun for faster JRuby startup

I finally got around to trying nailgun tonight. On OS X with JRuby 1.4.0RC2, I built nailgun using:
cd JRUBY_HOME/tool/nailgun
make # I ignored the warning "no debug symbols in executable (-arch x86_64)"
In one terminal window just leave a nailgun server running:
$ jruby --ng-server
NGServer started on all interfaces, port 2113.
When you want to run JRuby as a railgun client, try something like:
jruby --ng text-resource.rb
On my MacBook, this cuts about 5 seconds of JRuby startup time off of running this test program.

Sweet. For small programs, using ruby is still faster than jruby but this makes developing with JRuby faster.

I just tried Amazon's new Relational Database Service (RDS)

Amazon just released a beta of their Relational Database Service (RDS). You pay by the EC2 instance hour, about the same cost as a plain EC2, but about $0.01/hour more for a small instance, plus some storage costs, and bandwidth costs if you access the database outside of an Amazon availability zone.

RDS MyQL compatible (version 5.1) and is automatically monitored, restarted, and backed up.

Currently, there is no master slave replication, but this is being worked on (RDS beta just started today).

Here are my notes on my first use of RDS:
  • Install the RDS command line tools
  • rds-create-db-instance --db-instance-identifier marktesting123 --allocated-storage 5 --db-instance-class db.m1.small --engine MySQL5.1 --master-username marktesting123 --master-user-password markpasstesting123
  • Wait a few minutes and see if the RDS instance is ready: rds-describe-db-instances
  • Open up ports for external access, if required (note, here I am opening up for world wide access just for this test): rds-authorize-db-security-group-ingress default --cidr-ip
  • Use a mysql client to connect: mysql -h -u marktesting123 -p
  • create database recipes;
  • in another bash shell: cat recipes.sql | mysql -h recipes -u marktesting123 -p
  • In the mysql client: use the remote RDS hosted database and be happy :-)
  • delete RDS instance (to stop paying for it): rds-delete-db-instance marktestng123 --skip-final-snapshot
Any mysql client libraries should work fine.

Securing your Mac laptop

Laptops get lost and stolen a lot. I am extra careful with my laptop because I keep so much of my and my customer's private data on it. I take a few steps to protect this information that I want to share with you (Mac OS X specific):

I keep a small encrypted disk image that contains all my passwords and other sensitive information. It also contains my .ec2, .s3cfg, .profile, .ssh, .gnupg, and .heroku files. Then in my home directory I make soft links ln -s ... to these files.

I do not keep the password for this disk image in my OS X keychain!

It is a very small hassle: each time I boot up, I mount this image so my .ssh, etc. files are available. This adds 10 seconds of "overhead" to each time I boot my laptop.

Whenever I start working for a new customer, I ask them if they would like me to also keep their working materials encrypted (some overhead involed, so I like to ask them if I should spend the time doing this).

Update: a reader pointed out that this is only a partial solution, and I agree with that. Using full volume encryption is a much more secure system. With my easy scheme, for example, browser session cookies are vulnerable so someone might be able to access your gmail account and any services that use email password resets.

Saturday, October 24, 2009

More getting stuff done by doing what I most want to do experiments

I read an interesting article a few weeks ago (sorry, no attribution - can't find the article again) about trying to always do what you want to be doing. I used to do "round robin" style scheduling of my time: keeping a single to-do list and cycling through it (and sometimes just finishing small tasks outright).

I have always thought that I needed to apply some meta-level discipline to get tasks that I don't enjoy as much done in a timely way. Scheduling work is not so difficult because I usually have just 3 or 4 active customers, and I enjoy most of my work. Other things like yard work (I prefer new projects over maintenance) got the round-robin treatment, and even recreation (I like to hike, cook/eat, read, and watch movies) activities used to be scheduled round-robin style to a (very) small degree.

Lately, I have been experimenting with not doing any meta-level scheduling. Now when I finish an activity I start the new activity that is what I most want to do. The reason that this works is that even activities that I don't enjoy as much (e.g., I just did some maintenance work on our deck) actually get done because it feels so good to clear them from my to-do list!

Work-wise, this new "non scheduling" approach seems to also be working fine. I very much enjoy working, but some work is more fun :-) Still, I think that I am getting a small up-tick in productivity, and I find that I still get around to all tasks. One reason that this works is that some work tasks really are very difficult technically for me, and if I hit them right when I feel like a harder problem, things that I thought would be very difficult turn out to be simpler.

I was motivated to write up my experiments a few nights ago when I re-watched the documentary "The Hero's Journey" on Joseph Campbell's life and work. One of Campbell's teachings is to "follow your bliss," that is, to do in life what you really want to do. Listening to Joseph Campbell is like getting a tune up :-)

Friday, October 23, 2009

RDF datastores are noSQL also - always keep an RDF data store service running

We tend not to use things that are not "ready at hand."

RDF datastores are noSQL also :-)

I always keep Sesame running as a service just as I run PostgreSQL and MySQL services.

Some things are better stored, queried, and maintained in a graph database.

If you always have something like Sesame (or the free edition of AllegroGraph) running as a service, and if you have client libraries installed for your favorite programming languages then it is easier to quickly choose the best data store for any given task.

BTW, I also always keep a CouchDB service running.

Sunday, October 18, 2009

Cloud computing options and portability

I listened to Paul Miller's podcast with Rackspace's president of their Cloud Division Lew Moorman this morning. I mostly agree with his comments on easy portability between Rackspace cloud services and Amazon's EC2.

I have not yet used Rackspace's cloud offerings, so my comments here are based on their documentation and a conversation I had with one of their support engineers (for one of my steady customers: I declined some work tasks to move to Rackspace because I don't like to spread myself too thin: I spend a lot of effort staying up to speed on Amazon and AppEngine, so I prefer to specialize on those two deployment platforms). The advantage of Rackspace is the binding of a persistent disk volume with their virtualized server instances (really, they offer a standard sort of VPS hosting service) where with Amazon it takes a little extra work to manage EBS volumes separately. For me, I like the benefit of Amazon's SQS, S3, and Elastic MapReduce - that said, I would make a small bet that Rackspace will provide similar services, otherwise they will just be competing in the VPS business space, although with good support (that said, I have always received very good support from smaller hosting companies like RimuHosting).

Lew Moorman was critical of Google's AppEngine because it depends on very proprietary software (mostly their highly scalable non-relational datastore). A fair criticism, but if you really wanted to move off of AppEngine, there are migration paths like using DataNucleus JDO support for a relational database server or something massively scalable like HBase (from the Hadoop project). One big win for using AppEngine is that if you decide to build on top of the Wave platform, software agents at least for now need to be hosted on AppEngine, and this process is fairly simple.

Saturday, October 17, 2009

Switching an AppEngine project from JRuby+Sinatra to Java+JSP

It is a bit of a pain to take several hours to convert a working codebase in one language/platform to another. I kept having small problems with JRuby and Sinatra that were just AppEngine specific (Ruby (or JRuby) and Sinatra are awesome).

I am only about 20% into development, and I decided that I wanted really solid tools/platform. Also, converting working code in one language to another is simple.

What convinced me to make the switch is that Java + Eclipse plugins support is just so good for AppEngine development, that for now the change seems like a good decision. For my next AppEngine project, I'll probably go back to JRuby + Sinatra since the support is getting better.

I built the open source IDEA 9.0 git snapshot - works fine

Something to do while watching TV :-)

With the Apache 2.0 license, it will be interesting to see how it is used. It is a large git clone, but built easily using ant.

I get a free commercial license for IntelliJ IDEA (as I used to get free Enterprise JBuilder licenses from Borland) but I still plan on following the open source IDEA project - hopefully interesting things will happen! I use Eclipse a lot just because the Java AppEngine support is so very good, but for plain old Java coding, I like IDEA.

The open source edition of IDEA is great for plain old Java coding, BTW, but is missing JSP + Tomcat development support (but NetBeans does a good job for J2EE-- development, and who does J2EE development anymore :-)

It takes a while to do a git clone and build the IDE (builds versions for OS X, Windows, and Linux as the default ant build target) and since the build process is so easy, it was not much fun, so you might as well just download a built version for your OS platform if you don't want to peruse the source code.

Wednesday, October 14, 2009

Nice tool for writing and maintaining documentation: YMUL web service and yumlcmd Ruby gem

Although some customers request using a Word Processor for producing documentation, if it is my choice I like Latex and OmniGraffle for producing diagrams. Latex is the fastest tool (that I use) for producing great looking print or PDF documents.

I am experimenting with something else this morning: the YUML web app for creating UML diagrams and the yumlcmd Ruby gem (add to your gem source and then gem install yumlcmd). Thanks to Under the Hat for pointing these tools out - check their blog for directions.

Although YUML hardly replaces OmniGraffle, it is cool to have documentation text based (Latex files and YUML files): faster, and less work.

Sunday, October 11, 2009

Some frustration with JRuby + Rails on Google AppEngine

A few engineers at Google and other developers are doing some good work towards getting Rails running on AppEngine both robustly and in a way that provides a good local development environment. One problem is simply that if your web app is not active, initializing JRuby + Rails + and all required gems can time out (30 second window for handling requests).

The Java and Python support for AppEngine is fantastic, but for two projects I want to do (my own projects, but may be revenue generating :-) I want a more agile programming language that Java and while my Python skills are sort-of OK, my knowledge of Django is very light.

I should probably just bite the bullet and spin up on Django, but I would strongly prefer working in Ruby. I have been experimenting with the JRuby + Sinatra + ERB + datamapper combination and at least an inactive web application spins up well within the 30 second request timeout window. I very much like datamapper (object identity issues) and it should not be too difficult to be completely portable on two platforms (given data import/export utilities):
  • JRuby + AppEngine
  • Ruby (1.8.x or 1.9.x) on any server
I like Sinatra as a light weight framework, and this technology choice is OK with me, except for one worry: my reason for wanting to use AppEngine (rather that Amazon EC2+S3+EBS, which is what I have been using for most customer projects and my own stuff) is to minimize my hosting costs if one or both of my ideas works out - I worry that using JRuby on Java AppEngine will not provide the same high performance as Java or Python web apps. I did compare (using the Apache benchmark tool) request times for JRuby + Sinatra + AppEngine (about 400 milliseconds/request) with Java + JSP + AppEngine (about 600 milliseconds per request). The rendered Java page was more complex so the benchmark between JRuby + Sinatra vs. Java + JSP looks like a wash.

For future projects if I need lots of back end processing (map reduce, spidering sites, etc.) then I will stick with Amazon EC2. If I can get by with just a web interface and a data store, then I would prefer AppEngine (to save a little money). Two great platforms!

Saturday, October 10, 2009

Designing for scalability and platform portability

Once an application is designed and at least partially implemented, options for scalability and portability are reduced. If a system's usage profile can not be predicted, then deploying to physical servers is a real problem because you have to pay for support for peak usage periods - however, relying on cloud infrastructure can very much limit platform portability.

It helps to consider scalability up front! Relying on scalable data store infrastructure like Googles AppEngine datastore or Amazon's SimpleDB can make life easier. For server side Java, coding to JPA makes it possible with some work to be portable between AppEngine datastore, SimpleDB, or using a traditional database on your own server. Some care needs to be taken to code to a subset of JPA (e.g., no cross domain queries in SimpleDB) if portability is important. In the Rails world, using Datamapper provides similar flexibility for portability between AppEngine datastore, SimpleDB, or a conventional database. And, taking this approach fails when you need a relational database.

Using asynchronous messaging is anther good strategy for scalability (and for decoupling). With some care, applications can be somewhat portable between AppEngine's Task Queue API, Amazon's SQS, or running something like RabbitMQ on your own servers.

Monday, October 05, 2009

Storing Lucene indices in Cassandra; cloud versus running your own server farm

The Lucandra project looks very interesting, but is incomplete at this time (see the "to be dones" at the bottom of the linked page).

Cassandra is a great project. I almost incorporated it into the design of a customer project recently, but we decided to host on Amazon so using their EC2, S3, SQS, and Electric Map Reduce services won out over rolling a custom stack.

I think that this must be a start up dilemma: long term, it is probably least expensive running one's own small server farm, but when you are just getting started a "pay as you go" cloud approach using very solid infrastructure tools like EC2, S3, SQS, SimpleDB, etc. makes sense.

I can't say this from personal experience, but my gut feeling is that if you can live within the constraints of Google's AppEngine, then it is probably less expensive using AppEngine than running your own server farm - even long term. BTW, if you have not read my DevX article on implementing search on the Java version of AppEngine, please check it out.

Thursday, October 01, 2009

Interesting new book: "Networks, Crowds, and Markets: Reasoning about a Highly Connected World"

This book will be published in 2010 but a complete pre-publication draft is available here. There is a PDF download link for the entire book near the top of the page.

If you enjoyed reading Albert-László Barabási's classic book "Linked: The New Science of Networks" then this new book looks like a great followup. I have not got too far into David Easley's and Jon Kleinberg's new book yet, but the range of topics in this 800 page book looks like it will make a good read.

Using Facebook Connect just got a lot easier

A new set of tools makes it much easier to integrate Facebook Connect in your web sites. Good job Facebook. Their old APIs and support were somewhat difficult to work with.

I would have added another example to my last book (about Web 3.0 stuff) if this had been available a few months ago.

Tuesday, September 29, 2009

Nice: RubyMine 2.0 will be a free upgrade

In a world filled with great free IDEs, JetBrains keeps being competitive with commercial IDE offerings.

I think that their RubyMine product is hands down the best Ruby development environment (although I do sometimes use GEdit on Linux and TextMate on OS X). Offering the 2.0 upgrade for free (to be released in a few weeks) is a nice way to say thanks to their customers. A beta 2.0 download is available here. I'm running the beta right now, and it looks like a good upgrade.

Saturday, September 12, 2009

My DevX article "Using Gambit-C Scheme to Create Small, Efficient Native Applications" is now online

My article is a quick introduction to Scheme, and then some examples building small compiled applications in Scheme. Gambit-C Scheme compiles to C, and the generated C code is then compiled and linked. When I need to use Lisp, I tend to use Common Lisp for large applications and Gambit-C Scheme for small utilities. For me, being able to use a high level and expressive language like Scheme to build efficient and compact applications is a big win :-)

I find the development environment of Gambit-C Scheme with Gambit-C's Emacs support to be very productive. Marc Feeley, the developer of Gambit-C, mentioned to me that several companies are doing product develop in Gambit-C Scheme. I have a NLP toolkit that I have ported to Gambit-C and I hope to get the time to finish and "ship" it sometime this year.

Thursday, September 10, 2009

Adventures of living in the mountains: heavy monsoon rains and flooding

We live in the mountains of Central Arizona (Sedona) - great area with mountains, trees, water for kayaking, etc. We do have our problems though.

I was hiking with friends this morning, beautiful day. Fast forward to this afternoon: very heavy monsoon rains with lots of flooding: wash below our house overflowed into yards below us. We had water flowing through our yard, but it did not make it into our house house. We also ended up with several inches of accumulated hail on our deck and parts of our yard: the white ice looks like snow if you don't look too carefully.

Monday, September 07, 2009

I just read the text for Obama's speech to school kids: it is non-political and strong on American values

This is the text of the speech he is scheduled to give in a few days.

Well worth reading since some crazy right wingers have been telling lies and sowing so much disinformation. I have very much appreciated some conservative pundits who have publicly called this disinformation "stupid." Not all conservatives put the well being of their political party above the well being of our country. Good for them for speaking up!

When I was in grade school, a friend's father, who was conservative, helped arrange for our whole class to get to see President John F. Kennedy speak. I think that my friend's Dad disagreed with President Kennedy on things political, but he wanted his son and his son's classmates to hear a President speak.

There is a lot of anti-American rhetoric coming from some conservatives - it is up to the rest us, the majority I think, to speak up and point out stupidity when we hear it.

Sunday, September 06, 2009

Very much liking Amazon EC2

I remain very enthusiastic about Google's AppEngine (and also I am very much enjoying my developer's Wave account). That said, Amazon's AWS services are having a much larger effect on my work for customers and my own work and research. AppEngine is great for some types of projects, but EC2 can be used for anything.

I have a text mining experiment that have been planning for a while, and today I have some free time to start setting it up. I have 3 old desktop computers (with a reasonable amount of memory and disk) that I usually haul out of my closet, run "headless," and set up for text mining and machine learning projects. Although I own these boxes, there is a drawback to leaving them running for several weeks in my home office: noise, heat generation, messing up my work environment, etc. I did a quick calculation and estimated that if I instead use one EC2 instance, a reasonably large ESB disk volume, and Elastic MapReduce when I need it to make Hadoop Map Reduce runs, the cost over a few week period is a small business expense. Anyway, I am setting up a work environment on an EC2 instance "as we speak."

My wife and I often evaluate what we really need in our home (we like to keep our small house tidy and elegant) and I would like to get rid of old computer hardware that I can do without - and, in the USA, our schools are poorly funded so contributing old hardware to our local high school with the latest Ubuntu installed could be a good thing.

Note: I would like to thank Amazon for providing a grant to me to cover my EC2, S3, and Electric MapReduce expenses for writing my last Ruby book's examples and permanently providing a MCI with all of the book's examples.

Giving something back

Kai-Fu Lee is leaving Google (he managed their China operations) to form an angel investing fund for young Chinese entrepreneurs.

I have had some interest in Kai-Fu Lee's career since purchasing a copy of his doctoral thesis in the late 1980s on the Sphinx real time speech recognition system. I was looking at using time delayed neural networks for speech recognition, and Kai-Fu Lee's thesis was both interesting and inspiring.

Saturday, September 05, 2009

great video talk: "Innovation in Search and Artificial Intelligence"

Peter Norvig's recent talk at UC Berkeley discussed how the effects of large data sets and increasing computer resources make it possible to achieve increasingly better modeling and predictive results. Well worth an hour to listen to.

There were a lot of gems in this talk, but one that I may put to immediate use is using non-text data in map reduce, specifically using the protocol buffer tools. I have been using Hadoop more frequently and it is worth looking the effects of binary data for intermediate results. His comment that using map reduce is not necessarily incompatible with indexing data was also interesting. There is an overhead for creating indices, but it seems like there are opportunities to use indices for access to global information in a data set while making a complete sweep through the input data set during the map phase.

Monday, August 31, 2009

Easy installation is a form of elegance

Often my work tasks are relatively easy: read requirements, use my previous experience and perhaps some new research to identify the best tools/frameworks to use, do a quick design, and get the job done. In order for this to be a quick and efficient process, it is important to be able to install software tools and keep them up to date - while taking a minimum amount of my own time.

I am currently running Ubuntu on all of my servers and customer servers, and it makes it faster to just remember how to do things for one distro. Using apt-get is fine for stable software installs, but when evaluating new tools it is usually best to get the most recent stable releases.

I am evaluating both Tokyo Cabinet and Cassandra for a task right now, and needed to install Cassandra. Evan Weaver, who works for Twitter, has written a Ruby gem that downloads the Java code for Cassandra the first time you try to start Cassandra:
gem install cassandra --no-ri --no-rdoc
cassandra_helper cassandra
Love it - a major time saver, and elegant :-)

Kirk Haines has a good writeup on using Cassandra with Ruby.

BTW, another time saver is to build your own Amazon Machine Images (AMIs) that have all of the tools that you usually use pre-installed. Starting a new project takes virtually no setup time, and after modifying an AMI for a customer, let them copy it in addiiton to delivering docs and source code.

Notes on using PowerLoom with SBCL Common Lisp

A while ago, I wrote Java wrappers for easily using PowerLoom from Java (see my Java AI book (free PDF download)).

I am evaluating the use of PowerLoom on a customer project and spent a while this morning experimenting with PowerLoom (version powerloom-3.2.50) using SBCL Common Lisp. Since it took me a while to find how to do the things in Lisp that I am used to doing in Java, I thought that I would make some notes on what I did:

Download and unpack the PowerLoom distribution. We will be using the example knowledge base file kbs/business.plm so you might want that open in a text editor to read through it. Start by running SBCL (lots of output removed for brevity):
$ cd powerloom-3.2.50
$ sbcl
This is SBCL 1.0.29, an implementation of ANSI Common Lisp.
* (load "load-powerloom.lisp")
* (STELLA::LOAD "kbs/business.plm")
* (PLI:S-ASSERT-PROPOSITION "(and (company c1) (company-name c1 \"Moms Grocery\"))" "BUSINESS" nil)

* (PLI:S-ASSERT-PROPOSITION "(and (company c2) (company-name c1 \"Dads Grocery\"))" "BUSINESS" nil)

* (let ((iter (pli:s-retrieve "all (company-name ?x ?y)" "BUSINESS" nil)))
(loop while (stella::next? iter)
do (print (pli::%pl-iterator.value iter))))

(|i|/PL-KERNEL-KB/PL-USER/BUSINESS/C1 |L|"Dads Grocery")
(|i|/PL-KERNEL-KB/PL-USER/BUSINESS/C1 |L|"Moms Grocery")
The argument "BUSINESS" is the knowledge base module name defined in kbs/business.plm. The last nil argument specifies a PowerLoom environment.

The example file kbs/business.plm defined the concept company and defined instances. I asserted two additional companies and listed them.

It is easy enough to use PowerLoom using the interactive shell but it is more difficult embedding PowerLoom in Java and Common Lisp applications. The above example will at least get you started interfacing between your Lisp runtime environment and the embedded PowerLoom environment. API functions that start with "s-" are the most convenient because they take string input arguments.

Sunday, August 30, 2009

I have been very busy

Since my last book was published early this summer, I have had a full consulting work load - something that I very much appreciate, given the economic situation in the USA. I am helping two people with their start ups, building a modeling language in Common Lisp, and will soon start helping another company with semantic indexing tasks. I like to work between 25 and 30 hours a week, and I am in that sweet spot.

On the educational front, I have just bought two books on subjects that I already know a lot about (Hadoop and Natural Language Processing (NLP)) but that I want to learn even more. Hadoop rocks, and I have been interested and working with NLP since 1984. I am enjoying working through both books.

Carol and I have been enjoying our summer in Sedona: hiking, kayaking, gardening, and cooking.

Wednesday, August 05, 2009

Yes! Google Java AppEngine Plugin is now available for Eclipse 3.5

This is good news! The lack of this plugin has been holding me back from upgrading from Eclipse 3.4 to 3.5. For me, working with the AppEngine platform has really put the fun back into doing Java development.

Monday, August 03, 2009

Good advice: enjoy whatever you are doing at the moment

I was working on a customer's Rails application this morning, and I was totally enjoying myself (Ruby + Rails + TextMate + Solr), and I was perfectly happy with my work and tools.

This afternoon, I was coding in Emacs + Slime + Gambit-C Scheme, finishing up the examples for a DevX article - once again perfectly happy with the tools that I was using.

A long time ago, my Dad (a great guy, great scientist, and member of the national academy of science) gave me some very good advice: enjoy whatever you are working on. It was a very long time ago, but I remember the context clearly: I was in high school, complaining about too much calculus homework, and my Dad gave me "the advice." By the way, I am taking a trip to see my parents next week :-)

Sunday, August 02, 2009

Google AppEngine and OpenID

This is good news. Existing support for Google universal login for AppEngine web applications is great, and new support for OpenID is better.

I do have real security and privacy concerns when one account will log you into many services. I keep an encrypted text file on my laptop with all my web and server logins and use different random passwords for different services. Having many different logins does make online work more secure but this has to be balanced against saved time and greater convenience. At the end of the day, I feel more comfortable keeping logins for online banking, health record access, etc. separate, while enjoying single login experiences with services that are less sensitive.

Friday, July 31, 2009

Tools: experiment with many, master a few of them

I am admittedly a tinkerer: I really enjoy reading other people's code, experimenting with new languages and technologies. That said, I make the effort to really only master a few technologies (e.g., Ruby, Java, and Lisp for programming languages (my paid for work is split fairly evenly between these languages), and specialize in AI, cloud deployments, Rails and Java-based web apps).

I may be transitioning to adopting two new core technologies. I have been using relational databases since 1986 and have a long term liking for PostgreSQL (less so, MySQL). The "non-SQL" meme has become popular with a lot of justification: for many applications you can get easier scalability and/or better performance on a single server using other types of data stores. Google's AppEngine datastore (built on their big table infrastructure) is clearly less convenient to develop with than a relational database but it may be well worth the extra effort to get scalability and very low hosting fees.

I have been spending a fair amount of time using the AppEngine data store this year, and more recently I am learning to effectively use Tokyo Cabinet with Ruby and Java. Different non-relational data stores have different strengths and weaknesses and in an ideal world I would learn several. In a practical world, this is not possible, so I am looking at Tokyo Cabinet as my "new PostgreSQL" - very much good enough for a wide range of projects where a relational database is not a good fit.

I was talking with a new customer yesterday about Ruby deployments, and efficiency issues in general. Ruby is a "slow" language, but Rails apps do not need to run slowly since so much of the heavy lifting is done with highly efficient external processes like Sphinx, PostgreSQL, Tokyo Cabinet, nginx, etc.

Monday, July 27, 2009

If you have a Google Wave account, then try my Robot extension

I wrote a Robot extension that reads text on a wave (and any child blips added to it), and adds its own child blips with some NLP analysis (entity extraction and auto-tagging). No big deal, but fun to write.

Give it a try:

Saturday, July 25, 2009

Have fun with the AMI containing the examples from my latest Ruby book

I have prepared an Amazon Machine Image (AMI) with most of the examples in my Ruby book Scripting Intelligence: Web 3.0 Information, Gathering and Processing. Because I will be periodically updating the AMI, you should search for the latest version. This is simple to do: after you log in to the Amazon Web Services (AWS) Management Console, select “Start an AMI,” then choose the “Community AMIs” tab and enter markbookimage in the AMI ID search field. Choose the AMI with the largest index.

I have Ruby, Rails, Sesame, Redland, AllegroGraph, D2R, Hadoop, Solr, PostgreSQL, Tomcat, Nutch, etc. pre-installed and configured. I use this AMI for new projects and for new experiments because it contains most of the tools and frameworks that I use. If you know how to use Amazon AWS, it is easy to clone your own copy with whatever additional software you need, hook up a persistent disk volume, etc. If you have not yet learned how to effectively use AWS, this might be a good time to do so. I like AWS because I can start a server, use it for development all day, and have it cost less than a dollar.

There are README files for the examples. For more details on how the examples work, please consider buying my book if you have fun with the AMI.

Writing Wave robots that use blip titles and text

If you follow the Java Wave robot tutorial it is reasonably easy getting started. It took me a short while to get access to the titles and text of both new root blips (i.e., the start of a new Wave object) and child blips (i.e., new blips added to a root blip). Here is some code where I re-worked some of the example code (this is in the servlet that handles incoming JSON encoded messages from the Wave platform):
  public void processEvents(RobotMessageBundle events) {
Wavelet wavelet = events.getWavelet();

if (events.wasSelfAdded()) {
Blip blip = wavelet.appendBlip();
TextView textView = blip.getDocument();
textView.append("I'm alive and ready for testing");

for (Event event : events.getBlipSubmittedEvents()) {
// some of my tests:
Blip blip = event.getBlip();
if (!blip.getBlipId().equals(wavelet.getRootBlipId())) {
String text = blip.getDocument().getText();
makeDebugBlip(wavelet, "blip submitted events: child blip: " + text);

for (Event event: events.getEvents()) {
Blip eventBlip = event.getBlip();
// from original example:
if (event.getType() == EventType.WAVELET_PARTICIPANTS_CHANGED) {
Blip blip = wavelet.appendBlip();
TextView textView = blip.getDocument();
String s3 = textView.getText();
textView.append("Hello, everybody - a test... " + s3);
if (eventBlip.getBlipId().equals(wavelet.getRootBlipId())) {
String title = wavelet.getTitle();
String text = eventBlip.getDocument().getText();
makeDebugBlip(wavelet, "blip submitted events: root blip: title: " + title+" text: " + text);

public void makeDebugBlip(Wavelet wavelet, String text) {
Blip blip = wavelet.appendBlip();
TextView textView = blip.getDocument();
There are APIs for changing the data in blips. This example simply adds new child blips to a wave. Note that child blips do not have their own title. Here, I am dealing with blips that are just text but blips can also contain images, video, sounds, etc.

One bit of advice: the Wave platform is definitely cool (in many ways) but it is being actively developed and modified. Sometimes things just stop working for a while, so I have adopted the practice of walking away from my Wave development experiments for a few hours if my robots stop working. Twice, things simply started working again with no changes. For debugging robot code (Java or Python AppEngne web apps), make sure to enable DEBUG output in logging: then the AppEngine Logs page is your new friend.

Thursday, July 23, 2009

Wave may end up being the new Internet coolness

I continue having fun "kicking the tires." I do wish that I had a completely local Wave robot development environment, but I expect that will be forthcoming. The edit, compile, run cycle takes a while because I need to:
  • Modify robot code
  • Build and upload the code to Java AppEngine
  • Create new test waves, invite the robot, etc.
The development cycle for Gadgets is quicker if you can simply remotely edit a Gadget XML file on whatever server you use to publish it.

I am having a bit of an AppEngine performance issue. I am used to being able to cache (reasonably) static data in memory (loaded from JAR files in WEB-INF/lib). With AppEngine your web app can run on any server and web app startup time should be very quick (and doing on-startup data loading into memory from JAR files is not quick). I am not so happy doing this, but I may keep frequently used static data in the data store. I don't think that using JCache + memcached is an option because if I look up a key and it is not in memcached I don't know if the key is not defined or if the key has expired from memcached.

Tuesday, July 21, 2009

Google Wave gadgets

The gadget tutorial was easy to follow. I am starting with the state-full counter example and experimenting with that. The makeRequest API can be used to call remote web services inside gadgets. Other APIs let you process events inside a wave (from user actions, new or changed content, etc.) Cool stuff. There are many gadget containers but I was never interested in writing them myself until I started experimenting with the Wave platform.

Cool: just wrote my first Google Wave "robot" JSON web service

It is a placeholder, for now, but it will eventually use my KBtextmaster code to perform natural language processing on new replies to a wave that has my robot added as a participant. By following these instructions it only took about 30 minutes to get this going (would have been 20 minutes, but I compiled the Java AppEngine web JSON web service with JDK 1.5 - a re-build with JDK 1.6, and everything worked as advertised).

I have been working on the Common Lisp version of KBtextmaster in the last week, and the Java version badly needs a code cleanup also (both versions contain some of my code going back over ten years). I'll post the public URL for my robot in a week or so when I get a new version of KBtextmaster plugged in.

Monday, July 20, 2009

Book project, Google Wave, and a kayaking video

Except for some consulting work, my big project is a new book on using AllegroGraph for writing Semantic Web applications. Lots of work, but also a lot of fun.

I received a Google Wave Sandbox invitation today. I am going to try to spend an hour or two a day with Wave to get up to speed. Fortunately, I am 100% up to speed using the Java AppEngine (initially, Wave Robots, etc. get hosted on AppEngine, either Java or Python versions) and I have some experience with GWT - so I should already be in good shape -- but I need to write some code :-)

My wife took a short video of me kayaking yesterday.

Sunday, July 19, 2009

Gambit-C Scheme has become my new C

I might be writing an article about this soon: Scheme is a high level language - great for all around development, and Gambit-C can (once an application is developed in a very productive Emacs + Slime + Gambit-C environment) be used to create small and very efficient native applications. BTW, if you use an OS X or Windows installer, also get the source distribution for the examples directory.

In Unix tradition, I like to build a set of tools as command line applications, and Gambit-C is very nice for this.

Saturday, July 18, 2009

Common Lisp RDFa parser and work on my new AllegroGraph book

I am working on a 'three purpose' task this morning: writing an RDFa parser in Common Lisp. I need this for my new book project (semantic web application programming with AllegroGraph), I need this for one of my own (possibly commercial) projects, and to release as an open source project. I am building this on top of Gary King's CL-HTML-Parser, so Gary did the heavy lifting, and I am just adding the bits that I need.

Thursday, July 09, 2009

Measurement promotes success

Computer science involves effort measuring things: profiling code, tracking memory use, looking for inefficiencies in network connections, determining the number of database queries are required for rendering a typical web page in an application, etc.

I have started also measuring something else: how I spend my time. I used to just track billable time and leave time spent learning new languages, new frameworks, writing experimental code, etc. as unmeasured time. I now use a time tracking application on my Mac Book to track 16 different categories (billable, and learning/research - I also track time on Reddit, Slashdot, etc.) The overhead for these measurements is probably about 2 or 3 minutes a day, plus a few minutes to look at time spent at the end of a day, end of a week, etc. For me, this is useful information.

Wednesday, July 08, 2009

Continuing to work on my AllegroGraph book

I started this book late last year, but set it aside to write my Apress Ruby book Scripting Intelligence: Web 3.0 Information, Gathering and Processing.

I don't think that the market will be large for an AllegroGraph (AG) book, but after using AG on one customer project and experimenting (off and on) with it for several years, I decided that it was Semantic Web technology worth mastering. AG is a commercial product, but a free server version (supports Lisp, Ruby, Java, and Python clients) is available that is limited to 50 million RDF triples (a large limit, so many projects can simply use the free version).

AG supports the Sesame (an open source Java RDF data store) REST style APIs so if you stick with SPARQL and only RDFS reasoning, you get portability to also use a BSD licensed alternative. That said, my reason for using AG is all of the proprietary extra goodies!

In addition to a few Lisp, Python, Ruby, and Java client examples, I am going to incorporate a lot of useful Common Lisp utilities for information processing that I have been working on for many years: this will motivate me to package up a great deal of my Common Lisp code and release it with an open source license. I plan on releasing the book for free as a PDF file and as a physical book for people who want to purchase it. The book and the open source examples should be available before the end of this year.

Tuesday, July 07, 2009

W3C killing off XHTML2 in favor of HTML5: bad for the Semantic Web?

As a practical matter, HTML5 looks good for writing human facing next generation web applications with multimedia support and more intuitive elements like <header>, <nav>, <section>, <footer>, etc.

The problem that I have with the W3C's decision (assuming that I understand it correctly) is that at least in my opinion the value of the web goes way beyond supporting manual web browsing and enjoying digital media assets. I think that the web should evolve into a ubiquitous decision support system - this needs software agents that can help you no matter who's computer you may be using, what type of small device (phone, web pad) you may be using, etc. In this context, decision support means help in making dozens of decisions each day. User specific information filters, search agents, and personalized information repositories will require machine readable data with well defined semantics.

One approach is to have content management systems like Drupal and Plone publish information in parallel, both:
  • HTML5 web pages for human consumption
  • RDF/RDFS/RDFS+/OWL for consumption by software agents
It is very easy (like a few lines of Ruby) to convert either entire databases or subsets from SQL queries to RDF and since many web pages are created from information in relational databases, it might be OK to use this "dual publishing" scheme. Similarly, data in RDF repositories can be used, instead of relational databases, to publish web pages. However, I would prefer generating one format of web page with semantic information embedded as, for example, RDFa and micro formats.

My vision of the web is an increasing amount of data and information that we need customizable software agents or interfaces to cull out just what an individual user needs.

HTML5 needs a well designed notation for embedding extensible semantic information that does not rely on XML's extensibility.

Wednesday, July 01, 2009

PragPub - free monthly magazine for developers

While I love writing, I also like to read other people's efforts. I find that I learn a lot reading code that other people write. I started seriously reading other people's code in the 1970s - a habit I never outgrew. When I read what other people write, in addition to the content, I also pay attention to their writing technique: how they introduce a topic, make points, provide examples, the level of detail they use, etc. Check out the new PragMag - good reading.

Tuesday, June 30, 2009

My new APress book was released today: "Scripting Intelligence: Web 3.0 Information, Gathering and Processing"

APress web page for my book.

And, an Amazon link..

There is a lot of challenging material in this book but I am hoping to save my readers some setup effort because they can use a pre-configured Amazon machine image (AMI) that I created with the book examples and required infrastructure software. Of course, you can also build the examples on your laptop or servers.

I am setting up a book support web page with errata information. I plan on making entries for most questions from my readers that I answer via email.

Monday, June 29, 2009

USA: return to 'robustness'

With all of the problems that my country is facing, I am still optimistic, if:

Parents do their job and turn off the TV after dinner. When I was in high school I did 2 to 3 hours a night of homework on week nights - that should, I think, be the norm for the new young generation.

Young people do their job and squeeze every bit of value from the educational opportunities that they have at their disposal.

Adults do their job and realize that education and job skills are something that they need to develop continually throughout their working lives. Be productive and prosper.

Congress and our president: suck it up, stop being bought off, and do what is right. Look out for your karma, try not not be total assholes.

Financial elite: realize that no matter how much money you accrue, your children and grandchildren need to live in this world so you should not ruin the world that they will need to live in. Suck it up and try doing the right thing for change. Look out for your karma, try not not be total assholes.

As a country, we need to work together and everyone do their part.

Tough choice in the USA

Here in the USA, we face a tough choice: to survive with any kind of lifestyle and robustness, we need to defund government spending on:
  • Huge government subsidies to the Insurance companies, Beef Industry, etc. Subsidies may also take the form of not collecting a fair tax burden and for "under-regulating" corporations that strongly act against the public interest.
  • Drastic curtailment of defense spending
  • Elimination of the large amounts of money we give/loan to other countries to buy weapon systems from companies in the USA
Why are these tough choices? Well, because Congress (and the executive branch under Bush and Obama) mostly look out for corporate interests and not citizens' interests - that is just the way it is. Also a tough choice because almost everyone is simply too lazy to spend the effort to personally lobby their elected representatives.

Perhaps people get the government that they deserve.

Saturday, June 27, 2009

My Java AppEngine article published; my wife's video; more good experiences with Heroku

I wrote an article for DevX on Java AppEngine that was just published. The example application implements simple document storage and search. I still very much like AppEngine, but most of my work right now involves Rails development. I am still waiting for a customer who needs AppEngine development :-)

My wife has been helping a local non-profit organization (Connections) that uses Equine (horse) therapy to help both children and adults with disabilities - good work. I put her latest video on Youtube if you are interested.

I continue to be very happy with Heroku - for a relatively small cost, my customer gets a good deployment platform that takes almost none of my time to use, thus saving them money. When it comes to deploying software, I do like control -- so, it seems strange that I am so happy with Heroku and AppEngine where you have to give up control in return for saving time and money.

Friday, June 19, 2009

ClioPatria semantic search web-server

Between 2003-2005 I often used Swi-Prolog for semantic web experiments before I more or less settled on using Sesame (and occasionally AllegroGraph and Redland).

I just saw a link to the ClioPatria semantic search web-server project. Assuming that you have a fairly recent copy of Swi-Prolog installed, trying ClioPatria only takes a few minutes:
git clone git://
cd ClioPatria/project-iswc
Then, point your browser to http://localhost:3020/, load, for example, a OWL (XML format) data file, and try some queries. The default query language is SeRQL which I don't use, so I set the query language to SPARQL and all seems to work fine.

One good thing about Swi-Prolog and the bundled SemWeb library is that loading RDF data and performing queries seems very quick compared to Sesame which is what I usually use. As a result, the ClioPatria web application is also very quick.

Thursday, June 18, 2009

Opera Unite is an interesting idea

Unite is an interesting idea, letting non-techies easily share materials with friends and relatives. Opera provides an intermediary service: they map your account to whatever temporary IP address you have, deals with NATs, acting as a proxy between the web server running on your laptop and friends' and family members' browsers.

While I have been getting very interested in cloud computing in the last 6 months (using Amazon's and Google's offerings, and I spent an hour with someone at Sun yesterday giving them usability feedback on their soon to be released cloud services web interface), I am also interested in peer to peer systems. With some inter mediation, Unite is sort-of peer to peer (except for the reliance on an intermediate proxy service, although in some circumstances UPnP is used and no proxy is required). It is fine that Opera provides the back end services for Unite inter mediation. I would also like to see open source implementations (could be as simple as a little proxy server written in a scripting language and a Firefox plugin - if there is not already something like this, I expect to see some projects soon). For example, I might want to run my own intermediary service for friends and family. Anyway, ignoring the hype, I think that Unite is a good and also interesting idea.

Saturday, June 13, 2009

Heroku: Rails hosting done right

I just did a test deployment for a customer on this morning. Lots of non-standard things in the web app, but it still deployed nicely after a short remote SMTP setup. I have been reading about Heroku's architecture, implementation on EC2, etc. for a long while, so getting to use Heroku was fun!

For general development and flexibility I still like a semi-managed VPS ( is the best I have found so far) because I can run a mixture of Java, Rails, Squeak+Seaside, etc. and have a home for master git and svn repositories. That said, for deployment, a custom deployment architecture on EC2, or an abstract scalable platform like AppEngine or Heroku really does make a lot of sense. I enjoyed talking to two principles of Engine Yard (Ezra Zygmuntowicz and Yehuda Katz) at Merb Camp last year but I have not yet had an opportunity to use their platform on a customer job.

Tuesday, June 09, 2009

Google Translator Toolkit: wow!

My good friend Tom Munnecke (he took the great picture on my web site of my wife and I in front of the Taj Mahal) recorded an interesting interview last month with Peter Norvig who talked a lot about the Google's translation services and how they work (Tom has not posted his video interview yet - I will add a link when one is available).

Anyway, it is a real pleasure this morning to actually get to experiment with the Translator Toolkit - very impressive. I have fairly good reading knowledge of French so I am experimenting with translating English to French. My Spanish is very rusty (and my FORTRAN is rustier still :-) but I will try that also.

Monday, June 08, 2009

Ruby client for search and spelling correction using Bing's APIs

I noticed that Microsoft allows free use of their search and spelling correction APIs. I just played with the APIs for a few minutes. Here is a Ruby code snippet that I just wrote:

require 'rubygems' # needed for Ruby 1.8.x
require 'simple_http'
require 'json'

def search query
uri = "{API_KEY}&Market=en-US&Query=#{CGI.escape(query)}&Sources=web+spell&Web.Count=4"

def correct_spelling text
uri = "{API_KEY}&Market=en-US&Query=#{CGI.escape(text)}&Sources=web+spell&Web.Count=1"
You need a free Bing API key - notice that I set the key value in my environment. If you get a key, then try:
search "semantic web java ruby lisp"
correct_spelling "semaantic web jaava ruby lisp"
The first method does spelling correction before search.

Sunday, June 07, 2009

I avoid installing software with sudo

As a Linux user since the early 1990s (and a longtime OS X user), it was easy for me to get in the "./configure; make; sudo make install" habit, but I don't think that this is such a good idea for two reasons:
  • Security: have you really read the source code to see what might be executed during "sudo make install"? I am constantly installing Ruby gems, infrastructure software, etc. and I often read code as an educational experience, but not for security. It is best to not run other peoples code as root.
  • It is much easier for me to rebuild systems from backups when I "./configure --prefix=/home/mark/bin" (or wherever, but in my home directory).
I used to like to keep my home directory fairly small so backups take up less space but now costs of external disks, remote storage like S3, etc. are so small, that it makes more sense to have my home directory to be more self contained.

I also like to develop customer projects under a single master directory. It is nice to have everything in one place: my application code, nginx, PostgreSQL (with data), Ruby, gems, Java, Tomcat, Sesame, Erlang, CouchDB, etc. - whatever a project requires to run. A top level shell script can set up the environment for each different project. This also makes cloning a customer's system to one of their alternative servers just a quick rsync away...

Friday, June 05, 2009

Chrome browser betas for OS X and Linux - very fast!

Very nice - even the betas that were just released are very fast. Interesting to see how fast the final releases will be...

I had not tried Chrome before because of WAS (Windows Avoidance Syndrome). I like the minimalist Chrome UI - very nice. When I boot OS X, I also like to use the new Safari 4 beta.

I have been very busy this year - not too much free time to try new things. Now that I am done working on my new book, I hope to have time to experiment with some technologies that I have not tried before like writing a web application that optionally uses Google Gears for local storage.

Tuesday, June 02, 2009

Open source, the gift economy, and the new world order

I just made a small donation to Canonical (good shepard for Ubuntu Linux) while I was installing some security updates. A good investment.

As a few very large corporations continue to control resources and major infrastructure, I expect to see a trend towards small agile enterprises covering rapidly changing technology and business niches. I expect to see a three-way synergy between mega-size corporations, small agile businesses, and a mobile highly educated work force: all three sides win big. The losers in this new world order are the poor and the poorly educated workers who can not adapt to changing situations.

I think that open source software, other key infrastructure supported by the users of the infrastructure, and a general gift economy will continue to reduce down to a minimum the cost of doing business. Again, the winners are both people who are well educated and prepared on a global scale to move quickly to take advantage of new situations, or people who prepare themselves for work in high-value local jobs like health care, critical government services (fire control, police, etc.), and support of local physical infrastructure.

I believe that both my country (USA) and most of the world are going through a historic transformation created mostly by a new higher level of transparency. Throughout history, the ultra rich and powerful have worked behind the scenes to amass more power and wealth by starting wars, etc. The same things still happen, but now society better understands what is really happening: corporate ownership of governments, who benefits financially from planned wars, strife, and the manipulation of the world's financial infrastructure.

I believe that the "information genie" is out of the bottle, and is not going back in.

Predicting the future is tricky, and I will not try. Still, it will be interesting to see how the general quality of world-scale governance improves or degrades in a future that blends meritocracy (those who get great educations and are major producers will compete with conventional multi-generational dynasties of controllers), greater transparency mostly due to the Internet, and near absolute corporate control and manipulation of governments and news media.

Ideally, in the new world order governments become "infrastructure" in the sense that governments compete to provide:
  • Highest quality of physical infrastructure services for the lowest tax base
  • Highest quality of resources for education (which needs to cover people throughout their entire lives)
  • Physical security for people living in and businesses in their jurisdictions, at the lowest cost
  • etc.
We all live in interesting times :-)

Sunday, May 31, 2009

Scala really is a "better Java"

I have been so busy this year that I have slacked off on fully getting up to speed on Scala (and Haskell, for that mater). A few people have been working on a Sinatra clone in Scala. (Sinatra is a very nice minimalist Ruby web framework that I like a lot.) I grabbed a version of the Step project off the web today and I just had some time to play with it.

You could not really write something like Step in Java without writing an interpreter. With Scala, you can add syntactic sugar similarly to Ruby. Step nicely emulates parts of Sinatra; here is a code block wrapped to handle an HTTP GET request mapped to the route "/form":
  get("/form") {"Step: Form Post Example",
<form action='/post' method='POST'>
Your name: <input name='name' type='text'/>
<input type='submit'/>
<pre>Route: /form</pre>)
With Scala's ability to define operators and deal with code blocks, and native XML support, Step was defined in less than 100 lines of Scala code. Very nice.

Friday, May 29, 2009

Google's Wave platform

Yes, I should have gone to the Google I/O conference: I would have a sandbox Wave developers account right now. Oh well. It almost gives me a headache thinking about the server resources that will be required to support a world wide deployment and large scale adoption of applications built on Wave, end users of the basic Wave platform, etc. That said, I don't have to worry about how Google implements AppEngine, Wave, etc., and the details of how Amazon implements AWS. That is the point:

Cloud resources place an abstraction barrier between developers and deployment concerns.

As someone who actually enjoys dealing with server deployment issues, this is still a very good thing. Anything that lowers cost and makes development faster is a good thing, even when we have to leave doing some fun work behind us.

I like that there are already two other companies besides Google that are implementing Wave protocols and services (I want access to that Emacs Wave client :-) Open source implementations and multiple service vendors make the platform a safer technology choice. I liked the bit in the demo video on privacy and security (related: see Amazon's white paper on dealing with HIPAA privacy laws in EC2 hosted systems).

I'll obviously wait until I get a Wave developer's account and spend a 100 hours (or so) to kick the tires (some good documentation is already becoming available), but I might decide to write a short book on the Wave platform. I usually just write about things that I already use in my work, but since using resources like Amazon EC2, AppEngine, and Wave will be so important to my work and businesses in the future, time spent on Wave should be a good investment. BTW, I am working on a DevX article right now on the Google Java AppEngine. My APress book Scripting Intelligence: Web 3.0 Information, Gathering and Processing (should be in print in about 4 weeks) has some material on Amazon's EC2, S3, and Elastic MapReduce.

Saturday, May 16, 2009

Google's (eventual) support for RDFa

I am glad to see this because it will encourage more web developers to add semantic markup. The last appendix in my new book (APress: "Intelligent Scripting for Web 3.0) briefly covers RDFa. In principle I prefer publishing separate RDF feeds but with support in Drupal for RDFa (and, I hope, other CMS systems like Plone) RDFa may become commonly used - a good thing if it happens.

Sunday, May 03, 2009

My Ubuntu/OS X MacBook

I continue to be happy with my decision to set up a "work only" Ubuntu Linux partition on my MacBook. I have been using Linux since I downloaded a mini-Slackware distro in 1992 over a 2400 baud modem. Desktop Linux has come a long way! I did a few things to make my working environment nicer:
  • Set the CAPS LOCK key to another control key
  • Installed Google Desktop and set the instant search popup hot keys to "Hyper space" (command space, same as popup search on OS X)
  • Set up IntelliJ/RubyMine/Eclipse for Ruby, Java, Scala, and App Engine development
I did not even install the developer tools on my new OS X partition. I am just using OS X for video editing, photos, and video conferencing with my family.

Friday, May 01, 2009

I just switched my MacBook over to Ubuntu

I made three image backups of the hard disk on my MacBook (yeah, I am careful like that) and just used BootCamp to set up a small OS X partition and an Ubuntu 9.04 partition. This worked for me, but I make no guarantees for you (<grin>):
  • Boot with OS X install DVD, wipe the disk, and installed to the entire disk
  • Without doing anything else, I ran BootCamp and grabbed about half the disk for "Windows"
  • I inserted a Ubuntu install CDR, and chose "advanced" partitioning. I deleted the fat partition on /dev/sd3 and made it a bootable ext3 partition with "/" as the mount point
  • When I boot my Mac, I hold down the option key to switch between OS X and "Windows" (ha!).
  • This is really non-optimal, but I had a false start: the first time I ran the Ubuntu installer, it recognized OS X and offered an express install. Anyway, I ended up with a small 2 gig partition /dev/sda4 that BootCamp boots the first Ubuntu install. I simply edited the /boot/grub/menu.lst file to set /dev/sd3 as the boot disk.
  • Now I can "option key choose my OS
  • Now that both operating systems work fine, I took the time to install all security updates for both operating systems
I would have liked to go through these steps again, leaving out the first false-step Ubuntu install, but I need to install my dev tools on Ubuntu and get back to work.

Thursday, April 30, 2009

I switched to using RubyMine for all of my Rails development

I bought version 1.0 when it was released this week and since then I have stopped using TextMate (or occasionally NetBeans) for Rails development. The things that won me over: very fast interface, auto-complete, command-B to jump to method declarations, command-F12 for a popup mehtod list for the current file, pretty much full refactoring support, jumping between view and controller action code, etc. I still think that TextMate is much better for browsing large projects, but if you are coding, I think that RubyMine is much more productive.

Tuesday, April 28, 2009

Good article on the economies of scale

Although this article on Google and cloud computing sounds like a bit of an advertisement (it is!) it is also a good read. There is no way that individuals and all but a few companies can compete with Google on cost per computing unit (CPU, memory, data storage).

I am a huge fan of Amazon's EC2 services: easy to use and very flexible. I am not sure how well this will work, but I want to try using Amazon's Elastic MapReduce to produce data for a Java of App Engine application that I hope to have time to write in the next month or two (, that I mentioned in my last blog). Amazon charges for bandwidth reading from S3 and there will be a cost pushing data into Google's App Engine data store. It may well not make sense, cost wise, to split a system between two competing platforms.

My book is almost done

I just sent in Chapter 15 for "Scripting Intelligence for Web 3.0" to Apress yesterday. Now I just have to review some final edits. This has been a very fun book to write, and I must admit some sadness over finishing this project and having to move on. I cover a wide range of topics: text processing (NLP), Semantic Web and linked data, strategies for deploying Semantic Web applications, several strategies for implementing search, "scaling up", use of Hadoop for distributed data crunching (with material on using Amazon's Elastic MapReduce), etc.

One of the examples in my book is a rewrite of something that I have been playing around with for years using Common Lisp with WebActions and Portable AllegroServe: a personal system for keeping track of things of interest. I took years of sporadic Lisp hacking, took some of the best ideas, and ended up with a concise Rails application. I am thinking of writing a third iteration of this and making it public: I have a placeholder web app running on the Java edition of App Engine. I need to spend a few weeks clearing a backlog of customer work and then I want to take a crack at version 3. In the last couple of years, I have been doing a lot of Rails web UI programming using the Rails AJAX helpers and some custom Javascript. I don't have very much experience with GWT so working on will be a good excuse to study GWT and compare it with using Rails. The App Engine is an interesting platform and I think it is a fun challenge working within a software stack that limits developer options in return for high efficiency and very low hosting costs. The first thing I need to work out is implementing local search on top of JDO and the non-relational data store. Using Lucene is not a possibility, but it should be fairly easy to support full word and prefix match search on top of JDO.

Saturday, April 25, 2009

'Getting Stuff Done', new Ubuntu 9.04, OS X

I was an early Mac enthusiast (I wrote a successful Mac application in 1984) and long before that I bought a very early Apple II (serial number 71) and I wrote the simple little chess program that Apple gave away on the demo cassette tape for the Apple II. Anyway, I am pretty much into Apple products. During the later "dark ages" before Apple released OS X, I did use Windows NT (and later Windows 2000) and Linux for work and play. During this time, I developed a great 'getting stuff done' strategy: I booted NT for a few customer projects that needed Windows and when I wanted to play - otherwise I booted into a very stripped Linux install that only had what I need installed for work spurts.

After I finish work on my new book for Apress (soon!, probably in the next 2 weeks :-) except for ongoing work for two customers, I want to concentrate on a new business venture that only requires a development setup for Ruby, Rails, and Java. I work almost exclusively on my Mac laptop, using my desktop Mac only video editing (huge amount of disk space and memory) and a local Linux box when I need to test networked applications. BTW, I am writing this on a new Ubuntu 9.04 installation - a nice 6 month upgrade from the last release.

This experiment may not last more than a few months, but I want to have a only small OS X partition on my laptop with fun stuff, and a larger Ubuntu partition with Java, Ruby, IntelliJ, RubyMine, and a minimal set of tools that I need. I tend to work in 2 to 3 hour spurts on both customer projects and my own stuff. I don't like to check email and I have my wife screen my telephone calls during the work spurts. You can quote me on this: multitasking is overrated!

My Mac laptop is a great do-everything system, but I think that having different "fun" and "work" environments helps get more productive work done in less time. As long as I am sharing some personal philosophy on work and life, I find that minimizing the time watching TV also helps make more time for friends and family, sports, enjoying nature, gardening, cooking, reading, etc. As a computer scientist, I am into performance analysis, and as a person, the same kind of performance analysis is good to evaluate the benefit of time spent on various activities.

Thursday, April 23, 2009

Apache Mahout Scalable Machine Learning first public release

The Mahoot project has just made their first public release of scalable machine learning tools for the Hadoop platform. With Amazon's Elastic MapReduce, it is possible (for example) to make an 8 server instance 1 hour run for about a dollar - combined with Mahoot, I think that this is really going to open the door for individuals and small organizations to more effectively use machine learning. Good stuff! I have started to take a quick look at the code but I won't have time to try it out on Elastic MapReduce for a few weeks (I am finishing the last Chapter of my Intelligent Scripting for Web 3.0 book and then I have some production work to do - so no free time for a while!)

It is interesting in life how things often come together just when you need them. I have a business idea that I want to pursue using EC2 and Mahout will probably help with a small part of the system.

Tuesday, April 21, 2009

Configuring nginx for both Rails and Tomcat

This is easy to do but I had a few missteps in setting up nginx with both Rails and Tomcat so it is worthwhile documenting what I did. I have two cooking and recipe web portals, an older one written in Java ( and a newer one that uses the USDA nutritional database that runs on Rails ( These are low volume sites so I wanted to run both on a single inexpensive VPS.

Usually I run a few virtual domains on one Tomcat instance (requires a simple mapping in conf/server.xml between virtual domain names and subdirectories of $TOMCAT/webapps) but in this case, I only need to add a single virtual domain for my Java web app on a system that is already configured for Rails. So, no changes to server.xml are required and I deployed to $TOMCAT/webapps/ROOT. Then, the only thing left to do was add a section in my nginx.conf file for Tomcat running on port 8080:
    server {
listen 80;
root /usr/local/tomcat/webapps/ROOT;

access_log off;
rewrite_log on;

location / {
proxy_redirect off;
proxy_set_header Host $host;
proxy_set_header X-Real-IP $remote_addr;
proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
Both my wife and I are very good cooks, and writing two cooking/recipe web apps has been a fun hobby.