Thursday, April 30, 2009

I switched to using RubyMine for all of my Rails development

I bought version 1.0 when it was released this week and since then I have stopped using TextMate (or occasionally NetBeans) for Rails development. The things that won me over: very fast interface, auto-complete, command-B to jump to method declarations, command-F12 for a popup method list for the current file, pretty much full refactoring support, jumping between view and controller action code, etc. I still think that TextMate is much better for browsing large projects, but if you are coding, I think that RubyMine is much more productive.

Tuesday, April 28, 2009

Good article on the economies of scale

Although this article on Google and cloud computing sounds like a bit of an advertisement (it is!) it is also a good read. There is no way that individuals and all but a few companies can compete with Google on cost per computing unit (CPU, memory, data storage).

I am a huge fan of Amazon's EC2 services: easy to use and very flexible. I am not sure how well this will work, but I want to try using Amazon's Elastic MapReduce to produce data for a Java App Engine application that I hope to have time to write in the next month or two (I mentioned it in my last blog post). Amazon charges for bandwidth reading from S3 and there will be a cost pushing data into Google's App Engine data store. It may well not make sense, cost-wise, to split a system between two competing platforms.

My book is almost done

I just sent in Chapter 15 for "Scripting Intelligence for Web 3.0" to Apress yesterday. Now I just have to review some final edits. This has been a very fun book to write, and I must admit some sadness over finishing this project and having to move on. I cover a wide range of topics: text processing (NLP), Semantic Web and linked data, strategies for deploying Semantic Web applications, several strategies for implementing search, "scaling up", use of Hadoop for distributed data crunching (with material on using Amazon's Elastic MapReduce), etc.

One of the examples in my book is a rewrite of something that I have been playing around with for years using Common Lisp with WebActions and Portable AllegroServe: a personal system for keeping track of things of interest. I took years of sporadic Lisp hacking, took some of the best ideas, and ended up with a concise Rails application. I am thinking of writing a third iteration of this and making it public: I have a placeholder web app running on the Java edition of App Engine. I need to spend a few weeks clearing a backlog of customer work and then I want to take a crack at version 3. In the last couple of years, I have been doing a lot of Rails web UI programming using the Rails AJAX helpers and some custom JavaScript. I don't have very much experience with GWT so working on this will be a good excuse to study GWT and compare it with using Rails. The App Engine is an interesting platform and I think it is a fun challenge working within a software stack that limits developer options in return for high efficiency and very low hosting costs. The first thing I need to work out is implementing local search on top of JDO and the non-relational data store. Using Lucene is not a possibility, but it should be fairly easy to support full word and prefix match search on top of JDO.
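
To make the last point concrete, here is a minimal sketch of full word and prefix matching. It is written in plain Ruby with an in-memory Hash standing in for the data store, so it shows only the indexing idea and nothing specific to JDO or App Engine. The PrefixIndex class, its method names, and the two character minimum prefix length are all assumptions made up for this illustration.

```ruby
# Toy full word and prefix search index. A plain Hash stands in for the
# data store; the class and method names are hypothetical.
class PrefixIndex
  def initialize(min_prefix = 2)
    @min_prefix = min_prefix
    @index = Hash.new { |hash, key| hash[key] = [] }
  end

  # Index a document under every prefix (of at least @min_prefix
  # characters) of every word it contains, so that "sear" and "search"
  # both find a document containing "search".
  def add(doc_id, text)
    text.downcase.scan(/[a-z0-9]+/).uniq.each do |word|
      first = [@min_prefix, word.length].min
      (first..word.length).each do |len|
        ids = @index[word[0, len]]
        ids << doc_id unless ids.include?(doc_id)
      end
    end
  end

  # Return the ids of documents containing a word that starts with term.
  def search(term)
    @index.fetch(term.downcase, [])
  end
end

index = PrefixIndex.new
index.add(1, "Notes on Semantic Web search")
index.add(2, "Searching with Hadoop")
p index.search("sear")    # prefix match, prints [1, 2]
p index.search("hadoop")  # full word match, prints [2]
```

With a real data store, each prefix would presumably become a stored, indexed value rather than a Hash key, trading extra storage for simple equality lookups at query time.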

Saturday, April 25, 2009

'Getting Stuff Done', new Ubuntu 9.04, OS X

I was an early Mac enthusiast (I wrote a successful Mac application in 1984) and long before that I bought a very early Apple II (serial number 71) and I wrote the simple little chess program that Apple gave away on the demo cassette tape for the Apple II. Anyway, I am pretty much into Apple products. During the later "dark ages" before Apple released OS X, I did use Windows NT (and later Windows 2000) and Linux for work and play. During this time, I developed a great 'getting stuff done' strategy: I booted NT for a few customer projects that needed Windows and when I wanted to play - otherwise I booted into a very stripped Linux install that only had what I needed installed for work spurts.

After I finish work on my new book for Apress (soon, probably in the next 2 weeks :-), except for ongoing work for two customers, I want to concentrate on a new business venture that only requires a development setup for Ruby, Rails, and Java. I work almost exclusively on my Mac laptop, using my desktop Mac only for video editing (huge amount of disk space and memory) and a local Linux box when I need to test networked applications. BTW, I am writing this on a new Ubuntu 9.04 installation - a nice 6 month upgrade from the last release.

This experiment may not last more than a few months, but I want to have only a small OS X partition on my laptop with fun stuff, and a larger Ubuntu partition with Java, Ruby, IntelliJ, RubyMine, and a minimal set of tools that I need. I tend to work in 2 to 3 hour spurts on both customer projects and my own stuff. I don't like to check email and I have my wife screen my telephone calls during the work spurts. You can quote me on this: multitasking is overrated!

My Mac laptop is a great do-everything system, but I think that having different "fun" and "work" environments helps get more productive work done in less time. As long as I am sharing some personal philosophy on work and life, I find that minimizing the time watching TV also helps make more time for friends and family, sports, enjoying nature, gardening, cooking, reading, etc. As a computer scientist, I am into performance analysis, and as a person, the same kind of performance analysis is good to evaluate the benefit of time spent on various activities.

Thursday, April 23, 2009

Apache Mahout Scalable Machine Learning first public release

The Mahout project has just made their first public release of scalable machine learning tools for the Hadoop platform. With Amazon's Elastic MapReduce, it is possible (for example) to make an 8 server instance 1 hour run for about a dollar - combined with Mahout, I think that this is really going to open the door for individuals and small organizations to more effectively use machine learning. Good stuff! I have started to take a quick look at the code but I won't have time to try it out on Elastic MapReduce for a few weeks (I am finishing the last chapter of my Intelligent Scripting for Web 3.0 book and then I have some production work to do - so no free time for a while!)

It is interesting in life how things often come together just when you need them. I have a business idea that I want to pursue using EC2 and Mahout will probably help with a small part of the system.

Tuesday, April 21, 2009

Configuring nginx for both Rails and Tomcat

This is easy to do but I had a few missteps in setting up nginx with both Rails and Tomcat so it is worthwhile documenting what I did. I have two cooking and recipe web portals: an older one written in Java and a newer one, running on Rails, that uses the USDA nutritional database. These are low volume sites so I wanted to run both on a single inexpensive VPS.

Usually I run a few virtual domains on one Tomcat instance (requires a simple mapping in conf/server.xml between virtual domain names and subdirectories of $TOMCAT/webapps) but in this case, I only need to add a single virtual domain for my Java web app on a system that is already configured for Rails. So, no changes to server.xml are required and I deployed to $TOMCAT/webapps/ROOT. Then, the only thing left to do was add a section in my nginx.conf file for Tomcat running on port 8080:
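    server {
        listen 80;
        # server_name goes here: the virtual domain of the Java web app
        root /usr/local/tomcat/webapps/ROOT;

        access_log off;
        rewrite_log on;

        location / {
            # Forward requests to Tomcat on port 8080 (assumed to be on this same host)
            proxy_pass http://127.0.0.1:8080;
            proxy_redirect off;
            proxy_set_header Host $host;
            proxy_set_header X-Real-IP $remote_addr;
            proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
        }
    }
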
Both my wife and I are very good cooks, and writing two cooking/recipe web apps has been a fun hobby.

Sunday, April 19, 2009

IntelliJ does a fairly good job on supporting Scala Lift projects

The first few times that I tried experimenting with Lift I was disappointed with the development environment options. I ended up just running Maven from a command line with the tutorial projects open in TextMate.

I had a much better experience this morning when I decided to check out the recently released version 1.0 of Lift. This worked fairly well for me:
  • Create a demo project as per the getting started instructions
  • Build and run the project using mvn jetty:run from the command line, then stop jetty
  • Create a new IntelliJ project from existing sources (I had problems creating a project from a Maven POM file, but it was probably something that I was doing wrong)
  • Open the IntelliJ Maven Projects tab and click the "circular arrows" menu bar icon to look for POM files in the current project
  • I was then good to go: editing Scala code, editing Lift HTML template files, and running the mvn jetty:run target in the IntelliJ Maven Projects tab
To be honest, Rails is by far my favorite web app framework but I do find myself going back to server side Java for much better runtime performance for some projects. Since I have been working through the excellent Programming in Scala book, and generally enjoying Scala, Lift is perhaps looking like a good option for higher performance web applications where performance is more important than minimizing development costs. One thing that may convince me to stay with plain old server side Java (POSSJ) would be any issues with running under the Java version of App Engine (which I am starting to like). Deployment platform selection (choosing between managed servers, App Engine, or Amazon EC2) needs to be done early because App Engine does have limitations that prevent its use for some projects, no matter how cost effective it is. POSSJ web applications that use JDO (or JPA) can be sort-of portable on and off App Engine, but Lift's use of Scala actors (usually implemented with threads) may not play well with App Engine. On the other hand, Amazon has been consistently lowering costs for using EC2, etc. so having greater flexibility may win out over some cost savings with App Engine. (By the way, if you write and use Hadoop map reduce data crunching applications, Amazon Elastic MapReduce really is incredible - the more I use it, the better I like it!)

Monday, April 13, 2009

Lots of choices: managed hosting, EC2, App Engine; also unfair Sun criticism of App Engine's Java restrictions

I use a (semi) managed hosting company (RimuHosting) for customer and my own work: so much more convenient than running a co-located server! If a RAID disk fails, I don't have to deal with it. I am also finding Amazon's EC2 to be a very low hassle hosting option (Elastic MapReduce has given me some grief today, but that was my own fault - once I learned how to use it, Elastic MapReduce is great). I have tried deploying one Java JSP-based web app to Google's Java version of App Engine - very straightforward and a fun experience.

One thing I read today that rubbed me the wrong way: Simon Phipps (Sun) was criticizing Google for restricting the JVM by not allowing some libraries. There are other issues like no threading and no outgoing sockets that he did not mention. I totally understand why Google needs to protect their infrastructure by sand-boxing the Java runtime environment. (They had to do the same for Python.) Phipps must know this. Sun recently announced their own "cloud platform" and it will be interesting to see if they restrict the Java runtime - if they don't, I don't see how they can compete on price/performance. Can't live with Google's Java runtime restrictions? Then, use something more expensive :-)

Thursday, April 09, 2009

Good reading: Insoshi Rails source code

I saw today that the software for the Insoshi social networking web app was re-licensed from AGPL to MIT because the company decided to stop trying to monetize this system. I have read the source code to many Rails applications (customer work and sometimes just to learn new techniques). Anyway, this source code base looks clean and well written and makes a good "read" if you are into studying other people's code.

Wednesday, April 08, 2009

I just tried the Java version of Google App Engine

Very nice. I just installed the Eclipse plugin for the Java version of Google App Engine and created a new web app using JSPs and some static content. I have not yet tried using the JDO and JPA based Java persistence libraries (with BigTable being the underlying data store). The deployment was very easy. The application dashboard and administration web applications are well done and the people who wrote the Eclipse plugin did a very good job - really slick.

With free pricing for low volume web applications and moderate pricing once you go over the free quotas, I would bet that a lot of Java developers will jump off of higher priced Java web app hosting services. For my own use, it is an open question how much I will use this service. I really enjoy configuring Linux servers, installing just what I need. I lease two Linux servers for running Rails and Java web apps for customers, my own stuff, and general experimenting and fun. App Engine may be too "abstracted" of an environment for me - I may change my mind if:
  • It is easy to use BigTable via JDO or JPA
  • It is easy to assign my own domains to App Engine web apps (I am sure that it will be)
  • Response times are good (the admin web application provides statistics to measure this)
  • It is easy to upload data into BigTable, and export it
Although I base a lot of my daily work flow around various Google services, I always do like to feel that I am not locked in. For example, I like that it is very easy to back up my GMail and Google Documents to my laptop. So far, I don't see why Java web applications cannot be written in a portable way, making it easy to run on either App Engine or my own servers.

I don't even know where to begin comparing App Engine with Amazon's EC2 because I see using them for rather different purposes. I have a lot of learning curve time invested in EC2, S3, etc. and no time at all using the Java version of App Engine but hopefully in a month or two I can make an informed comparison. Using EC2 seems closer to the metal in the sense that you need to design an architecture that deals with failing server resources. My first impression is that App Engine abstracts away handling failures at the application level - a benefit, but with a loss of flexibility. So EC2, for me, has more of a "build it myself" feeling. If you enjoy hacking/configuring Linux then EC2 would seem to be your best bet.

Bummer, I missed out on the Java for Google App Engine beta

I guess that I need to be patient :-)

I have been pretty much into Amazon's Web Services (EC2, S3, etc.) but I have also been eagerly looking forward to trying Java for Google App Engine on a Java project.

I remember it taking a few weeks to get an original App Engine invite, so I may not have to wait too long.

Update: I received an invitation to the beta program 5 hours after I wrote this original blog entry :-)

Sunday, April 05, 2009

You never really know what technologies will win in the market place

Unless you have hindsight :-)

For a few decades, I have used Lisp map and reduce functions (map*** and reduce functions in Common Lisp and Scheme) and more recently the equivalents in Ruby (with the niceties of using code blocks).

Who would have predicted how important this pattern would be for scaling data crunching? Recently I have been getting (back) into Hadoop (a very high quality open source implementation of Google's file system and parallel map/reduce framework that lets you add your own map and reduce functions and not worry much about the scaling infrastructure) and also CouchDB, which implements structured data views of partially structured data using map/reduce.
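
For readers who have not used the pattern, here is a minimal sketch of it in Ruby using code blocks: a word count written as a separate map step and reduce step. The sample documents are made up, and this runs in a single process, so it only illustrates the shape of the computation and not how Hadoop or CouchDB distribute it.

```ruby
# Word count written as separate map and reduce steps.
documents = [
  "the cat sat",
  "the dog sat down"
]

# Map step: emit a [word, 1] pair for every word in every document.
pairs = documents.flat_map { |doc| doc.split.map { |word| [word, 1] } }

# Reduce step: fold the pairs into a Hash of per-word totals.
counts = pairs.reduce(Hash.new(0)) do |totals, (word, n)|
  totals[word] += n
  totals
end

p counts
```

The map step looks at each document independently, which is what lets a framework spread that work over many machines; only the reduce step needs to see all of the pairs for a given word.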

At the time (many years ago), I thought that Connection Machine style parallel data crunching would take over the world (*Lisp was very cool) but I was wrong about that: the Connection Machine relied on expensive proprietary hardware, and as often happens, technologies or markets don't develop until the price gets squeezed down.

Anyway, Alan Kay famously said that the best way to predict the future is to invent it, but for most of us there is always hindsight :-)

Friday, April 03, 2009

RubyMine 1.0 beta

I have been using the pre-release versions of RubyMine occasionally, just to try them out. The new 1.0 beta is a large improvement, mostly because of much faster responses while editing, running tests, etc.

I have been doing a lot of Ruby development lately (a customer job and my book project) but I have mostly been using TextMate. In the past, I used NetBeans+Ruby a lot, but it was just not responsive enough (but a great IDE, nonetheless). I am too busy right now to change anything in my workflow, but when I get a chance I will switch over to using RubyMine.

Code completion works great for built in classes. It also seems to catch just about everything that it can from local code context. For example:
    class String
      include Stemmable
    end
Now code completion on a string or variable set to a string picks up the Stemmable mix-ins. Yeah!
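
For anyone unfamiliar with mix-ins, here is a small self-contained sketch of what reopening String this way does at runtime. The Stemmable module defined here is a toy stand-in invented for this example, with a crude suffix stripping rule; it is not the stemming code referred to above.

```ruby
# Toy stand-in for a stemming mix-in; this module and its crude
# suffix stripping rule are made up for illustration only.
module Stemmable
  def stem
    sub(/(ing|ed|s)\z/, "")
  end
end

# Reopen the built-in String class and mix the module in.
class String
  include Stemmable
end

word = "jumping"
p word.stem                   # every String now responds to stem
p String.include?(Stemmable)  # the module is now one of String's ancestors
```

Because the method arrives through the class's ancestors rather than being defined directly on String, an IDE has to follow the include to offer it in code completion.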

I also like that RubyMine comes pre-configured with cvs, svn, and git.

Thursday, April 02, 2009

Amazon Elastic MapReduce

This is a good idea: Amazon has integrated Hadoop with an S3 data store back end. I think that this will be great for companies that only need to occasionally perform large scale parallel data processing. Server instances are billed in one hour increments so, with some experience, it may be possible to estimate how many server instances to rent to get large jobs done in just under an hour.
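
The arithmetic behind that estimate can be sketched in a few lines of Ruby. This is a rough illustration under loud assumptions: the rate of ten cents per instance hour is the small instance price quoted in this post, any additional service, storage, or bandwidth charges are ignored, the job is assumed to split evenly across instances with no overhead, and the 7.5 hour job size and the function names are hypothetical.

```ruby
# Rough cost estimate for a job billed in whole instance hours.
# Assumptions: placeholder hourly rate, perfectly even scaling across
# instances, and no other fees. Function names are hypothetical.
HOURLY_RATE = 0.10

# Wall clock time is rounded up to the next whole hour for billing.
def job_cost(total_work_hours, instances, rate = HOURLY_RATE)
  wall_clock_hours = total_work_hours.to_f / instances
  billed_hours = wall_clock_hours.ceil
  (instances * billed_hours * rate).round(2)
end

# Smallest number of instances that finishes within one billed hour.
def instances_for_one_hour(total_work_hours)
  total_work_hours.ceil
end

work = 7.5  # hypothetical job size, in single instance hours
[1, 3, 8].each do |n|
  puts "#{n} instances: $#{'%.2f' % job_cost(work, n)}"
end
puts "instances needed for a one hour run: #{instances_for_one_hour(work)}"
```

Under these assumptions, one instance for eight hours and eight instances for one hour cost the same, while three instances cost slightly more because each one is billed for a partly used third hour.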

I have been enjoying using Amazon EC2, first on a customer job, and more recently preparing a custom Amazon Machine Image (AMI) with all of the example programs and systems pre-installed and configured for my new book "Intelligent Scripting for Web 3.0" for Apress. It looks like everything will run fine in a small server instance so readers of my book can experiment with the example programs for ten cents an hour. (Or, spend the time to install everything on their own development system.) I am working on Chapter 13, so just two more chapters to finish :-)