Showing posts from 2009

My tech industry predictions for 2010

As both an author and a technical consultant, I am fairly opinionated about what I expect for next year (tomorrow!). Here are my predictions: There will be pressure to reduce IT expenditures. An increasing trend to favor outsourcing deployment platforms (e.g., Google AppEngine and Heroku), infrastructure (e.g., Amazon AWS, Rackspace, both relational and NoSQL data stores), and software as a service (e.g., CRM, etc.). Cloud computing gets more real. Research will be concentrated on shorter-term profits rather than long-term strategy. China might be an exception to this: a friend reports that he has seen willingness in China to fund very long term Artificial Intelligence research. More people will spend more time using web-based information and recreational resources on portable devices. There will be a shortage of wireless bandwidth in some areas. Use of the Java platform will stay strong, but with more emphasis on alternative languages like JRuby, Scala, and Clojure. Skills and educ

$1.5 trillion a year for "defense" spending, little money left for local governments; living locally; Happy Holidays

If you factor in the cost of the debt to pay for our military spending then I think that a reasonable estimate of our yearly defense spending is about $1.5 trillion. The amount we spend on defense swamps other US government spending, including social programs. This is just my opinion, but I believe that we could keep our country relatively safe (compared to other countries) and spend far less money. The problem is basically banana republic style corruption: too much money is made by special interests for there to be any meaningful reform of military spending. The same comment is also true of other corporate interests like Wall Street, industrialized food production, pharmaceutical companies, insurance, etc. This is all enabled by total corporate control of the news media and shameful corruption in the lobbying industry and our federal government. Fortunately, for most of us, life is still very good despite corruption of the world's "elite." Again, this is just my personal

Building the EtherPad system and perusing the source code

The EtherPad online collaboration system is now open source. Cool. Google bought the company, the developers are joining the Wave team, and their product is now released under the Apache 2 license. Be sure to follow the instructions (failing to set the environment variable for the path to a MySQL client JAR file produces a strange "cp -f" error that has hung up a few people trying to build the system, as reported on Hacker News). It only took about 15 minutes to download the source and build the system - simple enough. After running and trying the system, I used IntelliJ 9 to set up a project (choose project from existing source, main directory etherpad/trunk) and I am spending some time perusing the Scala and JavaScript code. It is a really nice-looking code base, and reading through the Scala code will be an education. My JavaScript skills are a little weak, but I might still take a careful look at the JavaScript code to understand how they used Comet.

Amazon Elastic Load Balancing (ELB) is pretty cool

Using this service costs $0.025/hour, so it may make sense to just run HAProxy yourself on an EC2 instance, but then you have to worry about fault tolerance/recovery if that instance fails. The ELB cost is small compared to running a cluster of EC2 instances, and "outsourcing" as much of your system as possible to AWS (e.g., SimpleDB, Elastic Load Balancing, Relational Database Service, EBS, etc.) can certainly reduce both the complexity of your application architecture and your implementation costs. Here are my notes for a simple ELB setup for an AMI that contains a Rails web application: export EC2_PRIVATE_KEY=pk-....pem # different on your system export EC2_CERT=/Users/markw/.ec2/cert-...pem # different on your system ec2run -k gsg-keypair ami-e767ds71 ec2run -k gsg-keypair ami-e767ds71 Note: specifying "gsg-keypair" matches later doing ssh -i ~/.ssh/id_rsa-gsg-keypair ... elb-create-lb MarkTest123 --headers --listener "lb-port=80,instance-port=42

The cost of commoditization of computing: infrastructure and software

Discipline, a new view of system architecture, and rigorous automation procedures are required to take advantage of Amazon EC2, Google AppEngine, and other infrastructure as a service providers. Last week a customer commented on the rapid pace of Amazon's innovation. Yesterday they announced a new way to generate revenue from unused capacity by letting users bid on a spot market for EC2 instances. Discipline When you own your own server farm, even if it is just a few back room servers, you can spread out applications over your servers in a haphazard way and usually get away with some sloppiness in your process architecture. When you are dealing with someone else's infrastructure, a more disciplined approach is just about mandatory. New view of system architecture Both Google and Amazon have published papers on dealing with very large scale geographically dispersed systems composed of many components, some of which are guaranteed to fail. These companies have al

IntelliJ version 9.0

This week JetBrains gave me an upgrade license for version 9 of IntelliJ. I don't do too much Java development anymore - mostly maintenance on some of my old projects and new AppEngine and Google Wave development. Overall, version 9 is a nice upgrade: nicer source code repository integration, built-in task management, the IDE seems faster, etc. I use the JetBrains RubyMine product almost every day. Long term, my use of IntelliJ 9 will depend on how good the AppEngine support is. As a test, I generated a new AppEngine project, added some home page material, edited the appengine-web.xml file to specify a registered app name, set the version, then deployed to Google's servers with no problems. I did have a problem importing an AppEngine project from Eclipse but eventually realized that I needed to go to Module Settings -> Artifacts, and drag "Available elements" from the right window pane to the <output root> tree display in the left pane. I tried writing a simpl

Balkanisation of Ruby?

When I first started using Ruby, Matz's C-Ruby was mostly the only game in town. I am also an enthusiastic user of Ruby 1.9.x, JRuby, and MacRuby. Seeing the Ruby spec being developed in Japan under government funding, and that it is for Ruby 1.8.7, I get a feeling of déjà vu as a long-time Lisp user. The balkanisation of Lisp has been more than a small nuisance for me over the last 25 years. I would prefer that all Ruby implementations eventually implement a common specification, but I would rather it be for something that looks like 1.9.x.

Privacy and Security in the Internet Age

Just some advice that I give friends and family: Delete all cookies in your browser every week - it is easy enough to sign in again to web sites that require authentication. People who do not delete their cookies never see which sites are tracking them. It is easiest to do a 'delete all cookies' operation and not try to save the 5 or 10 cookies, out of the thousands stored in your local browser data, that you actually want. Keep a text file with all passwords in encrypted form - and do not use the same password for different purposes. Every time you use your supermarket's discount card (or possibly pay with a credit card), your purchases are permanently associated with you - do you care? Maybe, maybe not. I do use a lot of web services that track what I do (GMail, for example), but I make the decision to trade privacy for benefits on a service-by-service basis.
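A minimal sketch of the "text file with all passwords in encrypted form" idea, using Ruby's standard OpenSSL bindings. The passphrase and plaintext here are made-up examples; this is an illustration of the approach, not a vetted security tool:

```ruby
require 'openssl'
require 'base64'

# Derive an AES key from a master passphrase with PBKDF2.
def derive_key(passphrase, salt)
  OpenSSL::PKCS5.pbkdf2_hmac_sha1(passphrase, salt, 10_000, 32)
end

# Encrypt, prepending the salt and IV so decryption is self-contained.
def encrypt(plaintext, passphrase)
  cipher = OpenSSL::Cipher.new('aes-256-cbc').encrypt
  salt = OpenSSL::Random.random_bytes(8)
  cipher.key = derive_key(passphrase, salt)
  iv = cipher.random_iv
  Base64.strict_encode64(salt + iv + cipher.update(plaintext) + cipher.final)
end

def decrypt(blob, passphrase)
  raw  = Base64.strict_decode64(blob)
  salt, iv, data = raw[0, 8], raw[8, 16], raw[24..-1]
  cipher = OpenSSL::Cipher.new('aes-256-cbc').decrypt
  cipher.key = derive_key(passphrase, salt)
  cipher.iv  = iv
  cipher.update(data) + cipher.final
end

# Hypothetical contents of the encrypted password file:
secret = encrypt("site1: hunter2\nsite2: s3cret", "my master passphrase")
puts decrypt(secret, "my master passphrase")
```

The encrypted blob is what you would keep on disk; only the master passphrase stays in your head.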

Coolness! Good instructions for trying Rails 3.0pre

This is worth sharing: easy-to-follow instructions for installing Rails 3.0pre. Thanks to Oscar Del Ben! Yesterday I spent 90 minutes trying a fresh install of Ruby 1.9.1 and Rails 3.0pre with no luck, so thanks to Oscar for his writeup. BTW, I used Ruby 1.8.7 when following Oscar's directions.

New Amazon Web Services feature: boot from EBS

Awesome - in the past I have had to write my own code/scripts to manage attaching an EBS volume to a new EC2 instance. This new feature will make it a lot easier to manage your own EC2-based services - a welcome change. If you would prefer a PDF with the complete documentation, then use this link. That said, I have my own scheme for automatically mounting EBS volumes, assigning Elastic IP addresses, etc., and for some deployments I will continue to use temporary boot volumes and pull AMIs from S3.

Playing with Chrome OS

I have to say that even in its alpha (or beta?) version, Chrome OS looks good. I like the home "apps page" and, except for the lack of a terminal emulator application, it is fairly complete: my calendar, Google Docs, etc. are all available, and web browsing is fast. Again, given a good terminal program for use cases where I need to quickly SSH to a server, I can see a lightweight and low-power netbook nicely augmenting my laptop + external monitor setup.

I am watching the live Chrome OS Webcast

For end users, Chrome OS is a great idea - I would argue that most of my friends and relatives would be better off not running Windows, OS X, or (full) Linux. What about software developers? It is still useful, but not a replacement for a laptop. In a pinch, assuming a terminal window to run remote bash shells, Emacs, etc., I could still get work done while travelling. Still, for me, a MacBook with Ubuntu and OS X running RubyMine, IntelliJ, Eclipse, OmniGraffle, etc. is just about perfect for my workflow. That said, I will buy a Chrome OS netbook when they are available.

Hosted MongoDB and CouchDB

After I finish up some client work this morning, I am planning on finishing a DevX article on using Heroku as a deployment platform. Since deploying to Heroku is so simple and so well documented, you might think that I would have a difficult time writing new material :-) After a short tutorial on getting started, I am writing mostly about using both CouchDB and MongoDB as a data store, either hosted yourself on EC2 (or another server external to Heroku, which is itself hosted on EC2) or via commercial managed solutions like Cloudant for CouchDB and MongoHQ for a managed MongoDB service. I like to manage my own and customer deployments on EC2 - frankly, it is fun :-) That said, I think that there are sometimes business reasons for using hosted solutions like Heroku, Cloudant, and MongoHQ. It is a balance between development and admin costs and paying for managed platform as a service offerings.

nice: RubyMine 2.0 released

I use RubyMine for most of my Ruby/Rails/Sinatra development on Ubuntu, and use it in conjunction with TextMate on OS X. I find it convenient enough to alternate between TextMate when I don't need IDE features and RubyMine when I do. One of the biggest improvements is that indexing now occurs in the background, so auto-complete and the other features that depend on knowledge of an application and the gems it uses become available sooner. This is subjective, but once RubyMine 2.0 loads up and is done with any background indexing, its CPU use is minimal, and I think improved from earlier versions (nice to not have the fan kick in on my laptop when the CPU cores heat up). For the Rails application that I am coding right now, RubyMine is using about 360MB of resident memory - that is OK with me.

MongoDB has good support for indexing and search, including prefix matching for AJAX completion lists

I have been spoiled by great support for indexing and search in relational databases (e.g., Sphinx, native search in PostgreSQL and MySQL, etc.). I was pleased to discover, after a little bit of hacking this morning, how easy it is to do indexing and search using the MongoDB document-centered database. I have two common use cases for search, and MongoDB seems to handle both of them fairly well: search for words inside text fields, and efficient word-prefix search to support AJAX "suggest" style lists. My approach does require combining search results for multiple search terms in application code, but that is OK. Assuming the use of MongoRecord, here is a code snippet:

class Recipe < MongoRecord::Base
  collection_name :recipes
  fields :name, :directions, :words

  def to_s
    "recipe: #{name} directions: #{directions[0..20]}..."
  end

  def Recipe.make collection, name, directions
    collection.insert({:_id =>, :name => name,
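The word-array indexing idea is independent of any particular driver; here is a minimal plain-Ruby sketch of building the :words field and doing the prefix match. In MongoDB itself the prefix match would be done server side with an anchored regular expression query against an indexed :words array (the recipe text below is a made-up example):

```ruby
# Build the :words index field from a recipe's text fields,
# storing downcased unique words.
def index_words(*texts)
  texts.join(' ').downcase.scan(/[a-z]+/).uniq
end

recipe = { :name  => "Apple Pie",
           :words => index_words("Apple Pie", "Peel apples, add sugar...") }

# Prefix match for an AJAX "suggest" list. Against MongoDB this would
# be a find with an anchored regex (e.g. :words => /^app/), which can
# use the index on :words.
prefix = /\Aapp/
suggestions = recipe[:words].select { |w| w =~ prefix }
puts suggestions.inspect   # ["apple", "apples"]
```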

How to install CouchDB + nginx + basic authentication on EC2, including a Ruby client

Please note that if you want a more secure installation, SSL should also be installed following these instructions (I used these instructions and another web blog to create the following abbreviated instructions). For my purposes, basic HTTP authentication is good enough. I assume that you are used to using nginx and CouchDB and have installed them either from source or using apt-get. I am using Ubuntu, so you might have to modify these instructions slightly. On my laptop, I created a simple crypt program because OS X does not include one:

#!/usr/bin/perl
print crypt($ARGV[0],$ARGV[0])."\n";

After giving this script execute permissions, I created an encrypted password: crypt my12398pass61 You should save the output because on your EC2 instance you need to, as root or with sudo, edit the file /etc/nginx/htpasswd, adding a line: couchclient:myEKNgP2ivVVo where myEKNgP2ivVVo was the output from crypt for the plain text password my12398pass61. Then edit the nginx.conf file, adding something lik
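For the "Ruby client" part of the title, here is a minimal sketch using Net::HTTP with the basic-auth credentials above. The host name is a placeholder; you would point the request at your nginx front end:

```ruby
require 'net/http'

# Host of the nginx proxy in front of CouchDB (placeholder).
host = 'your-ec2-host.example.com'

req = Net::HTTP::Get.new('/_all_dbs')          # CouchDB: list databases
req.basic_auth('couchclient', 'my12398pass61') # credentials from above

# Uncomment to actually send the request through nginx on port 80:
# res = Net::HTTP.start(host, 80) { |http| http.request(req) }
# puts res.body

puts req['authorization']   # the Basic auth header nginx will check
```

The same basic_auth call works for PUT/POST requests when creating databases and documents.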

"always on" MongoDB installation on my laptop

I spend a lot of time experimenting with infrastructure software, sometimes for customer jobs and sometimes just because it is fun to learn new things. For non-SQL data stores, I have spent a lot of time in the last year experimenting with and using CouchDB, AppEngine datastore, Tokyo Cabinet, MongoDB, Cassandra, and SimpleDB. Tokyo Cabinet and SimpleDB store hash values as strings, and don't have the great client APIs that the others have because of the limitations of string-only hash values. That said, for an Amazon-hosted application SimpleDB can be a good choice, and Tokyo Cabinet is lightweight and easy to install and use. Cassandra looks great, and as I have written about here before, Cassandra is easy to use from Ruby and has great features. MongoDB has great performance and capabilities similar to Cassandra's. Chris Kampmeier has a great writeup that covers installing MongoDB on OS X, including setting it up as a system service. I followed Chris's directions. A pleasant surprise

Using nailgun for faster JRuby startup

I finally got around to trying nailgun tonight. On OS X with JRuby 1.4.0RC2, I built nailgun using:

cd JRUBY_HOME/tool/nailgun
./configure
make  # I ignored the warning "no debug symbols in executable (-arch x86_64)"

In one terminal window just leave a nailgun server running:

$ jruby --ng-server
NGServer started on all interfaces, port 2113.

When you want to run JRuby as a nailgun client, try something like:

jruby --ng text-resource.rb

On my MacBook, this cuts about 5 seconds off the JRuby startup time for this test program. Sweet. For small programs, plain Ruby is still faster to start than JRuby, but this makes developing with JRuby faster.

I just tried Amazon's new Relational Database Service (RDS)

Amazon just released a beta of their Relational Database Service (RDS). You pay by the instance hour, about the same cost as a plain EC2 instance but about $0.01/hour more for a small instance, plus some storage costs, and bandwidth costs if you access the database from outside an Amazon availability zone. RDS is MySQL compatible (version 5.1) and is automatically monitored, restarted, and backed up. Currently there is no master-slave replication, but this is being worked on (the RDS beta just started today). Here are my notes on my first use of RDS: Install the RDS command line tools.

rds-create-db-instance --db-instance-identifier marktesting123 --allocated-storage 5 --db-instance-class db.m1.small --engine MySQL5.1 --master-username marktesting123 --master-user-password markpasstesting123

Wait a few minutes and see if the RDS instance is ready:

rds-describe-db-instances

Open up ports for external access, if required (note, here I am opening up for world-wide access just for this test): rds-autho

Securing your Mac laptop

Laptops get lost and stolen a lot. I am extra careful with my laptop because I keep so much of my own and my customers' private data on it. I take a few steps to protect this information that I want to share with you (Mac OS X specific): I keep a small encrypted disk image that contains all my passwords and other sensitive information. It also contains my .ec2, .s3cfg, .profile, .ssh, .gnupg, and .heroku files. Then in my home directory I make soft links (ln -s ...) to these files. I do not keep the password for this disk image in my OS X keychain! It is a very small hassle: each time I boot up, I mount this image so my .ssh, etc. files are available. This adds 10 seconds of "overhead" each time I boot my laptop. Whenever I start working for a new customer, I ask them if they would like me to also keep their working materials encrypted (some overhead is involved, so I like to ask them if I should spend the time doing this). Update: a reader pointed out that this is

More getting stuff done by doing what I most want to do experiments

I read an interesting article a few weeks ago (sorry, no attribution - can't find the article again) about trying to always do what you want to be doing. I used to do "round robin" style scheduling of my time: keeping a single to-do list and cycling through it (and sometimes just finishing small tasks outright). I have always thought that I needed to apply some meta-level discipline to get tasks that I don't enjoy as much done in a timely way. Scheduling work is not so difficult because I usually have just 3 or 4 active customers, and I enjoy most of my work. Other things like yard work (I prefer new projects over maintenance) got the round-robin treatment, and even recreation (I like to hike, cook/eat, read, and watch movies) activities used to be scheduled round-robin style to a (very) small degree. Lately, I have been experimenting with not doing any meta-level scheduling. Now when I finish an activity I start the new activity that is what I most want to do. The r

RDF datastores are NoSQL also - always keep an RDF data store service running

We tend not to use things that are not "ready at hand." RDF datastores are NoSQL also :-) I always keep Sesame running as a service, just as I run PostgreSQL and MySQL services. Some things are better stored, queried, and maintained in a graph database. If you always have something like Sesame (or the free edition of AllegroGraph) running as a service, and if you have client libraries installed for your favorite programming languages, then it is easier to quickly choose the best data store for any given task. BTW, I also always keep a CouchDB service running.
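A sketch of why "ready at hand" matters: with Sesame running locally, a SPARQL query is just an HTTP GET against its REST interface. The repository id and port below are assumptions based on a default Sesame web-app deployment:

```ruby
require 'net/http'
require 'uri'

# Sesame's REST protocol: GET /openrdf-sesame/repositories/{id}?query=...
sparql = 'SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10'
query  = URI.encode_www_form('query' => sparql)

# 'mytest' is a made-up repository id for illustration.
path = "/openrdf-sesame/repositories/mytest?#{query}"

# Uncomment when a Sesame server is running on localhost:8080:
# res = Net::HTTP.get_response(URI.parse("http://localhost:8080#{path}"))
# puts res.body   # SPARQL query results (XML by default)

puts path
```

Because the service is always running, trying a graph-database approach on a new problem costs one HTTP request, not an installation session.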

Cloud computing options and portability

I listened to Paul Miller's podcast with Lew Moorman, president of Rackspace's cloud division, this morning. I mostly agree with his comments on easy portability between Rackspace's cloud services and Amazon's EC2. I have not yet used Rackspace's cloud offerings, so my comments here are based on their documentation and a conversation I had with one of their support engineers (for one of my steady customers; I declined some work tasks to move to Rackspace because I don't like to spread myself too thin: I spend a lot of effort staying up to speed on Amazon and AppEngine, so I prefer to specialize in those two deployment platforms). The advantage of Rackspace is the binding of a persistent disk volume to their virtualized server instances (really, they offer a standard sort of VPS hosting service), whereas with Amazon it takes a little extra work to manage EBS volumes separately. For me, I like the benefit of Amazon's SQS, S3, and Elastic MapReduce - that said, I

Switching an AppEngine project from JRuby+Sinatra to Java+JSP

It is a bit of a pain to take several hours to convert a working codebase from one language/platform to another. I kept having small problems with JRuby and Sinatra that were AppEngine specific (Ruby (or JRuby) and Sinatra are awesome). I am only about 20% into development, and I decided that I wanted a really solid tools/platform combination. Also, converting working code from one language to another is simple. What convinced me to make the switch is that the Java + Eclipse plugin support for AppEngine development is just so good that, for now, the change seems like a good decision. For my next AppEngine project, I'll probably go back to JRuby + Sinatra since the support is getting better.

I built the open source IDEA 9.0 git snapshot - works fine

Something to do while watching TV :-) With the Apache 2.0 license, it will be interesting to see how it is used. It is a large git clone, but it built easily using ant. I get a free commercial license for IntelliJ IDEA (as I used to get free Enterprise JBuilder licenses from Borland) but I still plan on following the open source IDEA project - hopefully interesting things will happen! I use Eclipse a lot just because the Java AppEngine support is so very good, but for plain old Java coding I like IDEA. The open source edition of IDEA is great for plain old Java coding, BTW, but is missing JSP + Tomcat development support (NetBeans does a good job for J2EE-- development, and who does J2EE development anymore :-). It takes a while to do a git clone and build the IDE (the default ant build target builds versions for OS X, Windows, and Linux), so you might as well just download a built version for your OS platform if you don&

Nice tool for writing and maintaining documentation: YUML web service and yumlcmd Ruby gem

Although some customers request using a word processor for producing documentation, when it is my choice I like LaTeX, with OmniGraffle for producing diagrams. LaTeX is the fastest tool (that I use) for producing great-looking print or PDF documents. I am experimenting with something else this morning: the YUML web app for creating UML diagrams and the yumlcmd Ruby gem (add to your gem sources and then gem install yumlcmd). Thanks to Under the Hat for pointing these tools out - check their blog for directions. Although YUML hardly replaces OmniGraffle, it is cool to have documentation be text based (LaTeX files and YUML files): faster, and less work.
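The yuml.me service renders a diagram from a small DSL embedded in a URL, which is what makes text-based diagrams possible. A sketch of building such a URL by hand; the /diagram/class/ URL form follows yuml.me's class-diagram scheme as I understand it, so treat it as an assumption:

```ruby
require 'uri'

# A tiny YUML class-diagram DSL string: two classes and an association.
dsl = '[Customer]->[Order]'

# yuml.me renders class diagrams at /diagram/class/<dsl>.png (assumed form).
url = "http://yuml.me/diagram/class/#{URI.encode_www_form_component(dsl)}.png"
puts url
```

Since the diagram source is just a string, it can live in version control next to the LaTeX files.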

Some frustration with JRuby + Rails on Google AppEngine

A few engineers at Google and other developers are doing some good work towards getting Rails running on AppEngine both robustly and in a way that provides a good local development environment. One problem is simply that if your web app is not active, initializing JRuby + Rails + all required gems can time out (there is a 30-second window for handling requests). The Java and Python support for AppEngine is fantastic, but for two projects I want to do (my own projects, but they may be revenue generating :-) I want a more agile programming language than Java, and while my Python skills are sort-of OK, my knowledge of Django is very light. I should probably just bite the bullet and spin up on Django, but I would strongly prefer working in Ruby. I have been experimenting with the JRuby + Sinatra + ERB + DataMapper combination, and at least an inactive web application spins up well within the 30-second request timeout window. I very much like DataMapper (object identity issues) and it should not be too d

Designing for scalability and platform portability

Once an application is designed and at least partially implemented, options for scalability and portability are reduced. If a system's usage profile cannot be predicted, then deploying to physical servers is a real problem because you have to pay to support peak usage periods - however, relying on cloud infrastructure can very much limit platform portability. It helps to consider scalability up front! Relying on scalable data store infrastructure like Google's AppEngine datastore or Amazon's SimpleDB can make life easier. For server-side Java, coding to JPA makes it possible, with some work, to be portable between the AppEngine datastore, SimpleDB, or a traditional database on your own server. Some care needs to be taken to code to a subset of JPA (e.g., no cross-domain queries in SimpleDB) if portability is important. In the Rails world, using DataMapper provides similar flexibility for portability between the AppEngine datastore, SimpleDB, or a conventional database. And, ta
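The "code to a subset" idea can be sketched as keeping a thin data-access seam so the backing store can be swapped later. This is a generic plain-Ruby illustration of the pattern, not DataMapper's or JPA's actual API:

```ruby
# A minimal "repository" seam: application code talks only to this
# interface, and each backend (in-memory here; SimpleDB or the AppEngine
# datastore in practice) implements just the subset of operations that
# every backend can support.
class MemoryStore
  def initialize
    @rows = {}
  end

  def put(key, attrs)
    @rows[key] = attrs
  end

  def get(key)
    @rows[key]
  end

  # Single-field equality queries only - no joins or cross-domain
  # queries, mirroring the "subset of JPA" advice above.
  def query(field, value)
    @rows.values.select { |attrs| attrs[field] == value }
  end
end

store = MemoryStore.new
store.put('u1', :name => 'Mark', :role => 'consultant')
store.put('u2', :name => 'Ana',  :role => 'developer')
puts store.query(:role, 'consultant').inspect
```

Swapping the backend then means writing another class with the same four methods, with no changes to application code.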

Storing Lucene indices in Cassandra; cloud versus running your own server farm

The Lucandra project looks very interesting, but is incomplete at this time (see the "to be dones" at the bottom of the linked page). Cassandra is a great project. I almost incorporated it into the design of a customer project recently, but we decided to host on Amazon, so using their EC2, S3, SQS, and Elastic MapReduce services won out over rolling a custom stack. I think that this must be a startup dilemma: long term, it is probably least expensive to run one's own small server farm, but when you are just getting started a "pay as you go" cloud approach using very solid infrastructure tools like EC2, S3, SQS, SimpleDB, etc. makes sense. I can't say this from personal experience, but my gut feeling is that if you can live within the constraints of Google's AppEngine, then it is probably less expensive to use AppEngine than to run your own server farm - even long term. BTW, if you have not read my DevX article on implementing search on the Java ve

Interesting new book: "Networks, Crowds, and Markets: Reasoning about a Highly Connected World"

This book will be published in 2010, but a complete pre-publication draft is available here. There is a PDF download link for the entire book near the top of the page. If you enjoyed reading Albert-László Barabási's classic book "Linked: The New Science of Networks" then this new book looks like a great followup. I have not gotten too far into David Easley's and Jon Kleinberg's new book yet, but the range of topics in this 800-page book suggests it will make a good read.

Using Facebook Connect just got a lot easier

A new set of tools makes it much easier to integrate Facebook Connect in your web sites. Good job Facebook. Their old APIs and support were somewhat difficult to work with. I would have added another example to my last book (about Web 3.0 stuff) if this had been available a few months ago.

Nice: RubyMine 2.0 will be a free upgrade

In a world filled with great free IDEs, JetBrains keeps its commercial IDE offerings competitive. I think that their RubyMine product is hands down the best Ruby development environment (although I do sometimes use GEdit on Linux and TextMate on OS X). Offering the 2.0 upgrade for free (to be released in a few weeks) is a nice way to say thanks to their customers. A beta 2.0 download is available here. I'm running the beta right now, and it looks like a good upgrade.

My DevX article "Using Gambit-C Scheme to Create Small, Efficient Native Applications" is now online

My article is a quick introduction to Scheme, followed by some examples of building small compiled applications in Scheme. Gambit-C Scheme compiles to C, and the generated C code is then compiled and linked. When I need to use Lisp, I tend to use Common Lisp for large applications and Gambit-C Scheme for small utilities. For me, being able to use a high-level and expressive language like Scheme to build efficient and compact applications is a big win :-) I find the development environment of Gambit-C Scheme with Gambit-C's Emacs support to be very productive. Marc Feeley, the developer of Gambit-C, mentioned to me that several companies are doing product development in Gambit-C Scheme. I have an NLP toolkit that I have ported to Gambit-C, and I hope to get the time to finish and "ship" it sometime this year.

Adventures of living in the mountains: heavy monsoon rains and flooding

We live in the mountains of Central Arizona (Sedona) - a great area with mountains, trees, and water for kayaking. We do have our problems, though. I was hiking with friends this morning - a beautiful day. Fast forward to this afternoon: very heavy monsoon rains with lots of flooding: the wash below our house overflowed into the yards below us. We had water flowing through our yard, but it did not make it into our house. We also ended up with several inches of accumulated hail on our deck and parts of our yard: the white ice looks like snow if you don't look too carefully.

I just read the text for Obama's speech to school kids: it is non-political and strong on American values

This is the text of the speech he is scheduled to give in a few days. Well worth reading, since some crazy right wingers have been telling lies and sowing so much disinformation. I have very much appreciated some conservative pundits who have publicly called this disinformation "stupid." Not all conservatives put the well-being of their political party above the well-being of our country. Good for them for speaking up! When I was in grade school, a friend's father, who was conservative, helped arrange for our whole class to see President John F. Kennedy speak. I think that my friend's Dad disagreed with President Kennedy on things political, but he wanted his son and his son's classmates to hear a President speak. There is a lot of anti-American rhetoric coming from some conservatives - it is up to the rest of us, the majority I think, to speak up and point out stupidity when we hear it.

Very much liking Amazon EC2

I remain very enthusiastic about Google's AppEngine (and I am also very much enjoying my developer's Wave account). That said, Amazon's AWS services are having a much larger effect on my work for customers and my own work and research. AppEngine is great for some types of projects, but EC2 can be used for anything. I have a text mining experiment that I have been planning for a while, and today I have some free time to start setting it up. I have 3 old desktop computers (with a reasonable amount of memory and disk) that I usually haul out of my closet, run "headless," and set up for text mining and machine learning projects. Although I own these boxes, there is a drawback to leaving them running for several weeks in my home office: noise, heat generation, messing up my work environment, etc. I did a quick calculation and estimated that if I instead use one EC2 instance, a reasonably large EBS disk volume, and Elastic MapReduce when I need it to make Hadoop Map Reduc
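A back-of-the-envelope version of that kind of calculation; all prices here are made-up example rates for illustration, not actual AWS pricing:

```ruby
# Assumed example rates (check the AWS pricing pages for real numbers):
instance_per_hour = 0.10   # one small EC2 instance, $/hour
ebs_per_gb_month  = 0.10   # EBS storage, $/GB-month

hours  = 24 * 7 * 3        # run the experiment around the clock for 3 weeks
ebs_gb = 100               # working data set on an EBS volume

instance_cost = hours * instance_per_hour
ebs_cost      = ebs_gb * ebs_per_gb_month   # roughly one month of storage
total         = instance_cost + ebs_cost

puts "instance: $%.2f, EBS: $%.2f, total: $%.2f" % [instance_cost, ebs_cost, total]
```

The point of the exercise: a few weeks of on-demand capacity can come in well under the hassle (and electricity) of running old desktops at home.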

Giving something back

Kai-Fu Lee is leaving Google (he managed their China operations) to form an angel investing fund for young Chinese entrepreneurs. I have had some interest in Kai-Fu Lee's career since purchasing a copy of his doctoral thesis in the late 1980s on the Sphinx real time speech recognition system. I was looking at using time delayed neural networks for speech recognition, and Kai-Fu Lee's thesis was both interesting and inspiring.

great video talk: "Innovation in Search and Artificial Intelligence"

Peter Norvig's recent talk at UC Berkeley discussed how large data sets and increasing computer resources make it possible to achieve increasingly better modeling and predictive results. Well worth an hour to listen to. There were a lot of gems in this talk, but one that I may put to immediate use is using non-text data in map reduce, specifically using the protocol buffer tools. I have been using Hadoop more frequently, and it is worth looking at the effects of using binary data for intermediate results. His comment that using map reduce is not necessarily incompatible with indexing data was also interesting. There is an overhead for creating indices, but it seems like there are opportunities to use indices for access to global information in a data set while making a complete sweep through the input data set during the map phase.

Easy installation is a form of elegance

Often my work tasks are relatively easy: read requirements, use my previous experience and perhaps some new research to identify the best tools/frameworks to use, do a quick design, and get the job done. In order for this to be a quick and efficient process, it is important to be able to install software tools and keep them up to date - while taking a minimum amount of my own time. I am currently running Ubuntu on all of my servers and customer servers, and it makes things faster to just remember how to do things for one distro. Using apt-get is fine for stable software installs, but when evaluating new tools it is usually best to get the most recent stable releases. I am evaluating both Tokyo Cabinet and Cassandra for a task right now, and needed to install Cassandra. Evan Weaver, who works for Twitter, has written a Ruby gem that downloads the Java code for Cassandra the first time you try to start it:

  gem install cassandra --no-ri --no-rdoc
  cassandra_helper cassandra

Love it.

Notes on using PowerLoom with SBCL Common Lisp

A while ago, I wrote Java wrappers for easily using PowerLoom from Java (see my Java AI book (free PDF download)). I am evaluating the use of PowerLoom on a customer project and spent a while this morning experimenting with PowerLoom (version powerloom-3.2.50) using SBCL Common Lisp. Since it took me a while to find out how to do the things in Lisp that I am used to doing in Java, I thought that I would make some notes on what I did. Download and unpack the PowerLoom distribution. We will be using the example knowledge base file kbs/business.plm, so you might want that open in a text editor to read through it. Start by running SBCL (lots of output removed for brevity):

  $ cd powerloom-3.2.50
  $ sbcl
  This is SBCL 1.0.29, an implementation of ANSI Common Lisp.
  * (load "load-powerloom.lisp")
  * (STELLA::LOAD "kbs/business.plm")
  NIL
  * (PLI:S-ASSERT-PROPOSITION "(and (company c1) (company-name c1 \"Moms Grocery\"))" "BUSINESS" nil)
  |i|/PLI/@PL-IT

I have been very busy

Since my last book was published early this summer, I have had a full consulting workload - something that I very much appreciate, given the economic situation in the USA. I am helping two people with their startups, building a modeling language in Common Lisp, and will soon start helping another company with semantic indexing tasks. I like to work between 25 and 30 hours a week, and I am in that sweet spot. On the educational front, I have just bought two books on subjects that I already know a lot about (Hadoop and Natural Language Processing (NLP)) but that I want to learn even more about. Hadoop rocks, and I have been interested in and working with NLP since 1984. I am enjoying working through both books. Carol and I have been enjoying our summer in Sedona: hiking, kayaking, gardening, and cooking.

Yes! Google Java AppEngine Plugin is now available for Eclipse 3.5

This is good news! The lack of this plugin has been holding me back from upgrading from Eclipse 3.4 to 3.5. For me, working with the AppEngine platform has really put the fun back into doing Java development.

Good advice: enjoy whatever you are doing at the moment

I was working on a customer's Rails application this morning, and I was totally enjoying myself (Ruby + Rails + TextMate + Solr), and I was perfectly happy with my work and tools. This afternoon, I was coding in Emacs + Slime + Gambit-C Scheme, finishing up the examples for a DevX article - once again perfectly happy with the tools that I was using. A long time ago, my Dad (a great guy, a great scientist, and a member of the National Academy of Sciences) gave me some very good advice: enjoy whatever you are working on. It was a very long time ago, but I remember the context clearly: I was in high school, complaining about too much calculus homework, and my Dad gave me "the advice." By the way, I am taking a trip to see my parents next week :-)

Google AppEngine and OpenID

This is good news. Existing support for Google universal login for AppEngine web applications is great, and new support for OpenID is better. I do have real security and privacy concerns when one account will log you into many services. I keep an encrypted text file on my laptop with all my web and server logins and use different random passwords for different services. Having many different logins does make online work more secure, but this has to be balanced against saved time and greater convenience. At the end of the day, I feel more comfortable keeping logins for online banking, health record access, etc. separate, while enjoying single login experiences with services that are less sensitive.

Tools: experiment with many, master a few of them

I am admittedly a tinkerer: I really enjoy reading other people's code and experimenting with new languages and technologies. That said, I make the effort to really master only a few technologies (e.g., Ruby, Java, and Lisp for programming languages - my paid work is split fairly evenly between them - and I specialize in AI, cloud deployments, and Rails and Java-based web apps). I may be transitioning to adopting two new core technologies. I have been using relational databases since 1986 and have a long-term liking for PostgreSQL (less so, MySQL). The "NoSQL" meme has become popular with a lot of justification: for many applications you can get easier scalability and/or better performance on a single server using other types of data stores. Google's AppEngine datastore (built on their Bigtable infrastructure) is clearly less convenient to develop with than a relational database, but it may be well worth the extra effort to get scalability and very low hosting
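To make the contrast concrete, here is a toy sketch of the key-value access pattern that these non-relational stores share, using PStore from Ruby's standard library as a stand-in. This is not Tokyo Cabinet, Cassandra, or the AppEngine datastore - just an illustration of programming against get/put by key instead of ad hoc SQL:

```ruby
require 'pstore'
require 'tmpdir'

# A key-value store offers lookups by key -- no joins, no ad hoc SQL --
# which is part of what makes this style of store easy to partition.
store = PStore.new(File.join(Dir.tmpdir, 'kv_demo.pstore'))

store.transaction do
  store['user:1'] = { 'name' => 'Alice', 'lang' => 'Ruby' }
  store['user:2'] = { 'name' => 'Bob',   'lang' => 'Lisp' }
end

store.transaction(true) do      # true => read-only transaction
  puts store['user:1']['name']  # prints Alice
end
```

Anything relational (e.g., "all users who prefer Lisp") has to be modeled explicitly with extra keys or indexes - that is the development inconvenience you trade for scalability.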

If you have a Google Wave account, then try my Robot extension

I wrote a Robot extension that reads text on a wave (and any child blips added to it), and adds its own child blips with some NLP analysis (entity extraction and auto-tagging). No big deal, but fun to write. Give it a try:

Have fun with the AMI containing the examples from my latest Ruby book

I have prepared an Amazon Machine Image (AMI) with most of the examples in my Ruby book Scripting Intelligence: Web 3.0 Information Gathering and Processing . Because I will be periodically updating the AMI, you should search for the latest version. This is simple to do: after you log in to the Amazon Web Services (AWS) Management Console, select “Start an AMI,” then choose the “Community AMIs” tab and enter markbookimage in the AMI ID search field. Choose the AMI with the largest index. I have Ruby, Rails, Sesame, Redland, AllegroGraph, D2R, Hadoop, Solr, PostgreSQL, Tomcat, Nutch, etc. pre-installed and configured. I use this AMI for new projects and for new experiments because it contains most of the tools and frameworks that I use. If you know how to use Amazon AWS, it is easy to clone your own copy with whatever additional software you need, hook up a persistent disk volume, etc. If you have not yet learned how to effectively use AWS, this might be a good time to do so. I like

Writing Wave robots that use blip titles and text

If you follow the Java Wave robot tutorial, it is reasonably easy to get started. It took me a short while to get access to the titles and text of both new root blips (i.e., the start of a new Wave object) and child blips (i.e., new blips added to a root blip). Here is some code where I re-worked some of the example code (this is in the servlet that handles incoming JSON encoded messages from the Wave platform):

  public void processEvents(RobotMessageBundle events) {
    Wavelet wavelet = events.getWavelet();
    if (events.wasSelfAdded()) {
      Blip blip = wavelet.appendBlip();
      TextView textView = blip.getDocument();
      textView.append("I'm alive and ready for testing");
    }
    for (Event event : events.getBlipSubmittedEvents()) {
      // some of my tests:
      Blip blip = event.getBlip();
      if (!blip.getBlipId().equals(wavelet.getRootBlipId())) {
        String text = blip.getDocument().getText();
        makeDebugBlip(wavelet, "blip subm

Wave may end up being the new Internet coolness

I continue having fun "kicking the tires." I do wish that I had a completely local Wave robot development environment, but I expect that will be forthcoming. The edit, compile, run cycle takes a while because I need to:

- Modify robot code
- Build and upload the code to Java AppEngine
- Create new test waves, invite the robot, etc.

The development cycle for Gadgets is quicker if you can simply remotely edit a Gadget XML file on whatever server you use to publish it. I am having a bit of an AppEngine performance issue. I am used to being able to cache (reasonably) static data in memory (loaded from JAR files in WEB-INF/lib). With AppEngine your web app can run on any server, and web app startup time should be very quick (and doing on-startup data loading into memory from JAR files is not quick). I am not so happy doing this, but I may keep frequently used static data in the data store. I don't think that using JCache + memcached is an option because if I look up a key and it is n
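The memcached question at the end is the classic read-through cache problem: on a miss you have to fall back to an authoritative store and repopulate the cache. Here is a minimal sketch of that pattern in Ruby - a plain Hash stands in for memcached and a block stands in for the datastore lookup; neither is an AppEngine API:

```ruby
# Read-through cache: on a miss, load from the slow store and remember it.
class ReadThroughCache
  def initialize(&loader)
    @cache  = {}       # stand-in for memcached
    @loader = loader   # stand-in for a datastore lookup
    @misses = 0
  end

  attr_reader :misses

  def fetch(key)
    return @cache[key] if @cache.key?(key)
    @misses += 1
    @cache[key] = @loader.call(key)
  end
end

cache = ReadThroughCache.new { |key| "value-for-#{key}" }
cache.fetch('a')   # miss: loads from the backing store
cache.fetch('a')   # hit: served from the cache
puts cache.misses  # prints 1
```

The catch on AppEngine is exactly the one hinted at above: the fallback load has to be cheap enough to run inside a request, which slow JAR-file loading is not.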

Google Wave gadgets

The gadget tutorial was easy to follow. I am starting with the stateful counter example and experimenting with that. The makeRequest API can be used to call remote web services inside gadgets. Other APIs let you process events inside a wave (from user actions, new or changed content, etc.). Cool stuff. There are many gadget containers, but I was never interested in writing them myself until I started experimenting with the Wave platform.

Cool: just wrote my first Google Wave "robot" JSON web service

It is a placeholder, for now, but it will eventually use my KBtextmaster code to perform natural language processing on new replies to a wave that has my robot added as a participant. By following these instructions it only took about 30 minutes to get this going (it would have been 20 minutes, but I first compiled the Java AppEngine JSON web service with JDK 1.5 - after a re-build with JDK 1.6, everything worked as advertised). I have been working on the Common Lisp version of KBtextmaster in the last week, and the Java version badly needs a code cleanup also (both versions contain some of my code going back over ten years). I'll post the public URL for my robot in a week or so when I get a new version of KBtextmaster plugged in.

Book project, Google Wave, and a kayaking video

Except for some consulting work, my big project is a new book on using AllegroGraph for writing Semantic Web applications. Lots of work, but also a lot of fun. I received a Google Wave Sandbox invitation today. I am going to try to spend an hour or two a day with Wave to get up to speed. Fortunately, I am 100% up to speed using the Java AppEngine (initially, Wave Robots, etc. get hosted on AppEngine, either Java or Python versions) and I have some experience with GWT - so I should already be in good shape -- but I need to write some code :-) My wife took a short video of me kayaking yesterday .

Gambit-C Scheme has become my new C

I might be writing an article about this soon: Scheme is a high-level language - great for all-around development - and Gambit-C can (once an application is developed in a very productive Emacs + Slime + Gambit-C environment) be used to create small and very efficient native applications. BTW, if you use an OS X or Windows installer, also get the source distribution for the examples directory. In the Unix tradition, I like to build a set of tools as command line applications, and Gambit-C is very nice for this.

Common Lisp RDFa parser and work on my new AllegroGraph book

I am working on a 'three purpose' task this morning: writing an RDFa parser in Common Lisp. I need this for my new book project (Semantic Web application programming with AllegroGraph), I need it for one of my own (possibly commercial) projects, and I want to release it as an open source project. I am building this on top of Gary King's CL-HTML-Parser, so Gary did the heavy lifting and I am just adding the bits that I need.

Measurement promotes success

Computer science involves a lot of effort spent measuring things: profiling code, tracking memory use, looking for inefficiencies in network connections, determining how many database queries are required for rendering a typical web page in an application, etc. I have started also measuring something else: how I spend my time. I used to just track billable time and leave time spent learning new languages, new frameworks, writing experimental code, etc. as unmeasured time. I now use a time tracking application on my MacBook to track 16 different categories (billable, and learning/research - I also track time on Reddit, Slashdot, etc.). The overhead for these measurements is probably about 2 or 3 minutes a day, plus a few minutes to look at time spent at the end of a day, end of a week, etc. For me, this is useful information.
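The end-of-day review can be as simple as summing minutes per category. A quick sketch - the category names and minutes below are invented for illustration, and this is not the Mac time-tracking application mentioned above:

```ruby
# Hypothetical minutes logged in one day, keyed by category.
day = {
  'billable'          => 300,
  'learning/research' => 90,
  'experimental code' => 60,
  'reddit+slashdot'   => 30
}

total = day.values.reduce(:+)   # total minutes logged for the day
day.each do |category, minutes|
  percent = 100.0 * minutes / total
  puts format('%-18s %4d min  %5.1f%%', category, minutes, percent)
end
```

Even a crude breakdown like this makes it obvious when "unmeasured time" is quietly eating a third of the work day.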

Continuing to work on my AllegroGraph book

I started this book late last year, but set it aside to write my Apress Ruby book Scripting Intelligence: Web 3.0 Information Gathering and Processing . I don't think that the market will be large for an AllegroGraph (AG) book, but after using AG on one customer project and experimenting (off and on) with it for several years, I decided that it was Semantic Web technology worth mastering. AG is a commercial product, but a free server version (supporting Lisp, Ruby, Java, and Python clients) is available that is limited to 50 million RDF triples (a large limit, so many projects can simply use the free version). AG supports the Sesame (an open source Java RDF data store) REST style APIs, so if you stick with SPARQL and only RDFS reasoning, you get the portability to also use a BSD licensed alternative. That said, my reason for using AG is all of the proprietary extra goodies! In addition to a few Lisp, Python, Ruby, and Java client examples, I am going to incorporate a lot of useful Common

W3C killing off XHTML2 in favor of HTML5: bad for the Semantic Web?

As a practical matter, HTML5 looks good for writing human facing next generation web applications with multimedia support and more intuitive elements like <header>, <nav>, <section>, <footer>, etc. The problem that I have with the W3C's decision (assuming that I understand it correctly) is that, at least in my opinion, the value of the web goes way beyond supporting manual web browsing and enjoying digital media assets. I think that the web should evolve into a ubiquitous decision support system - this needs software agents that can help you no matter whose computer you may be using, what type of small device (phone, web pad) you may be using, etc. In this context, decision support means help in making dozens of decisions each day. User specific information filters, search agents, and personalized information repositories will require machine readable data with well defined semantics. One approach is to have content management systems like Drupal and Plone

PragPub - free monthly magazine for developers

While I love writing, I also like to read other people's efforts. I find that I learn a lot reading code that other people write. I started seriously reading other people's code in the 1970s - a habit I never outgrew. When I read what other people write, in addition to the content, I also pay attention to their writing technique: how they introduce a topic, make points, provide examples, the level of detail they use, etc. Check out the new PragPub - good reading.

My new Apress book was released today: "Scripting Intelligence: Web 3.0 Information Gathering and Processing"

The Apress web page for my book, and an Amazon link. There is a lot of challenging material in this book, but I am hoping to save my readers some setup effort because they can use a pre-configured Amazon machine image (AMI) that I created with the book examples and required infrastructure software. Of course, you can also build the examples on your laptop or servers. I am setting up a book support web page with errata information. I plan on making entries for most questions from my readers that I answer via email.

USA: return to 'robustness'

With all of the problems that my country is facing, I am still optimistic, if:

- Parents do their job and turn off the TV after dinner. When I was in high school I did 2 to 3 hours a night of homework on week nights - that should, I think, be the norm for the new young generation.
- Young people do their job and squeeze every bit of value from the educational opportunities that they have at their disposal.
- Adults do their job and realize that education and job skills are something that they need to develop continually throughout their working lives. Be productive and prosper.
- Congress and our president: suck it up, stop being bought off, and do what is right. Look out for your karma, try not to be total assholes.
- Financial elite: realize that no matter how much money you accrue, your children and grandchildren need to live in this world, so you should not ruin the world that they will need to live in. Suck it up and try doing the right thing for a change. Look out for your karma, try not no

Tough choice in the USA

Here in the USA, we face a tough choice: to survive with any kind of lifestyle and robustness, we need:

- Elimination of huge government subsidies to the insurance companies, beef industry, etc. Subsidies may also take the form of not collecting a fair tax burden from, and "under-regulating," corporations that strongly act against the public interest.
- Drastic curtailment of defense spending.
- Elimination of the large amounts of money we give/loan to other countries to buy weapon systems from companies in the USA.

Why are these tough choices? Well, because Congress (and the executive branch under Bush and Obama) mostly looks out for corporate interests and not citizens' interests - that is just the way it is. They are also tough choices because almost everyone is simply too lazy to spend the effort to personally lobby their elected representatives. Perhaps people get the government that they deserve.

My Java AppEngine article published; my wife's video; more good experiences with Heroku

I wrote an article for DevX on Java AppEngine that was just published. The example application implements simple document storage and search. I still very much like AppEngine, but most of my work right now involves Rails development. I am still waiting for a customer who needs AppEngine development :-) My wife has been helping a local non-profit organization ( Connections ) that uses Equine (horse) therapy to help both children and adults with disabilities - good work. I put her latest video on Youtube if you are interested. I continue to be very happy with Heroku - for a relatively small cost, my customer gets a good deployment platform that takes almost none of my time to use, thus saving them money. When it comes to deploying software, I do like control -- so, it seems strange that I am so happy with Heroku and AppEngine where you have to give up control in return for saving time and money.

ClioPatria semantic search web-server

Between 2003 and 2005 I often used SWI-Prolog for Semantic Web experiments before I more or less settled on using Sesame (and occasionally AllegroGraph and Redland). I just saw a link to the ClioPatria semantic search web-server project. Assuming that you have a fairly recent copy of SWI-Prolog installed, trying ClioPatria only takes a few minutes:

  git clone git://
  cd ClioPatria/project-iswc
  ./configure

Then point your browser to http://localhost:3020/, load, for example, an OWL (XML format) data file, and try some queries. The default query language is SeRQL, which I don't use, so I set the query language to SPARQL and all seems to work fine. One good thing about SWI-Prolog and the bundled SemWeb library is that loading RDF data and performing queries seems very quick compared to Sesame, which is what I usually use. As a result, the ClioPatria web application is also very quick.

Opera Unite is an interesting idea

Unite is an interesting idea, letting non-techies easily share materials with friends and relatives. Opera provides an intermediary service: they map your account to whatever temporary IP address you have, deal with NATs, and act as a proxy between the web server running on your laptop and friends' and family members' browsers. While I have been getting very interested in cloud computing in the last 6 months (using Amazon's and Google's offerings; I also spent an hour with someone at Sun yesterday giving them usability feedback on their soon to be released cloud services web interface), I am also interested in peer-to-peer systems. With some intermediation, Unite is sort-of peer-to-peer (except for the reliance on an intermediate proxy service, although in some circumstances UPnP is used and no proxy is required). It is fine that Opera provides the back end services for Unite intermediation. I would also like to see open source implementations (could be as simple as a

Heroku: Rails hosting done right

I just did a test deployment for a customer on Heroku this morning. Lots of non-standard things in the web app, but it still deployed nicely after a short remote SMTP setup. I have been reading about Heroku's architecture, implementation on EC2, etc. for a long while, so getting to use Heroku was fun! For general development and flexibility I still like a semi-managed VPS (my current provider is the best I have found so far) because I can run a mixture of Java, Rails, Squeak+Seaside, etc. and have a home for master git and svn repositories. That said, for deployment, a custom deployment architecture on EC2, or an abstract scalable platform like AppEngine or Heroku, really does make a lot of sense. I enjoyed talking to two principals of Engine Yard (Ezra Zygmuntowicz and Yehuda Katz) at Merb Camp last year, but I have not yet had an opportunity to use their platform on a customer job.

Google Translator Toolkit: wow!

My good friend Tom Munnecke (he took the great picture on my web site of my wife and me in front of the Taj Mahal) recorded an interesting interview last month with Peter Norvig, who talked a lot about Google's translation services and how they work (Tom has not posted his video interview yet - I will add a link when one is available). Anyway, it is a real pleasure this morning to actually get to experiment with the Translator Toolkit - very impressive. I have fairly good reading knowledge of French, so I am experimenting with translating English to French. My Spanish is very rusty (and my FORTRAN is rustier still :-) but I will try that also.

Ruby client for search and spelling correction using Bing's APIs

I noticed that Microsoft allows free use of their search and spelling correction APIs. I just played with the APIs for a few minutes. Here is a Ruby code snippet that I just wrote:

  API_KEY = ENV['BING_API_KEY']
  require 'rubygems' # needed for Ruby 1.8.x
  require 'cgi'
  require 'simple_http'
  require 'json'

  def search(query)
    uri = "http://api.bing.net/json.aspx?AppId=#{API_KEY}&Market=en-US&Query=#{CGI.escape(query)}&Sources=web+spell&Web.Count=4"
    JSON.parse(SimpleHttp.get(uri))["SearchResponse"]["Web"]["Results"]
  end

  def correct_spelling(text)
    uri = "http://api.bing.net/json.aspx?AppId=#{API_KEY}&Market=en-US&Query=#{CGI.escape(text)}&Sources=web+spell&Web.Count=1"
    JSON.parse(SimpleHttp.get(uri))["SearchResponse"]["Spell"]["Results"][0]["Value"]
  end

You need a free Bing API key - notice that I set the key value in my environment.

I avoid installing software with sudo

As a Linux user since the early 1990s (and a longtime OS X user), it was easy for me to get into the "./configure; make; sudo make install" habit, but I don't think that this is such a good idea, for two reasons:

- Security: have you really read the source code to see what might be executed during "sudo make install"? I am constantly installing Ruby gems, infrastructure software, etc., and I often read code as an educational experience, but not for security. It is best not to run other people's code as root.
- It is much easier for me to rebuild systems from backups when I "./configure --prefix=/home/mark/bin" (or wherever, but in my home directory).

I used to like to keep my home directory fairly small so backups take up less space, but now the costs of external disks, remote storage like S3, etc. are so small that it makes more sense for my home directory to be more self contained. I also like to develop customer projects under a single master directory. It i

Chrome browser betas for OS X and Linux - very fast!

Very nice - even the betas that were just released are very fast. Interesting to see how fast the final releases will be... I had not tried Chrome before because of WAS (Windows Avoidance Syndrome). I like the minimalist Chrome UI - very nice. When I boot OS X, I also like to use the new Safari 4 beta. I have been very busy this year - not too much free time to try new things. Now that I am done working on my new book, I hope to have time to experiment with some technologies that I have not tried before, like writing a web application that optionally uses Google Gears for local storage.

Open source, the gift economy, and the new world order

I just made a small donation to Canonical (good shepherd for Ubuntu Linux) while I was installing some security updates. A good investment. As a few very large corporations continue to control resources and major infrastructure, I expect to see a trend towards small agile enterprises covering rapidly changing technology and business niches. I expect to see a three-way synergy between mega-size corporations, small agile businesses, and a mobile highly educated work force: all three sides win big. The losers in this new world order are the poor and the poorly educated workers who cannot adapt to changing situations. I think that open source software, other key infrastructure supported by the users of the infrastructure, and a general gift economy will continue to reduce the cost of doing business down to a minimum. Again, the winners are both people who are well educated and prepared on a global scale to move quickly to take advantage of new situations, or people who prepare themselves

Scala really is a "better Java"

I have been so busy this year that I have slacked off on fully getting up to speed on Scala (and Haskell, for that matter). A few people have been working on a Sinatra clone in Scala. (Sinatra is a very nice minimalist Ruby web framework that I like a lot.) I grabbed a version of the Step project off the web today and I just had some time to play with it. You could not really write something like Step in Java without writing an interpreter. With Scala, you can add syntactic sugar similarly to Ruby. Step nicely emulates parts of Sinatra; here is a code block wrapped to handle an HTTP GET request mapped to the route "/form":

  get("/form") {
    "Step: Form Post Example",
    <form action='/post' method='POST'>
      Your name: <input name='name' type='text'/>
      <input type='submit'/>
    </form>
    <pre>Route: /form</pre>)
  }

With Scala's ability to define op

Google's Wave platform

Yes, I should have gone to the Google I/O conference: I would have a sandbox Wave developers account right now. Oh well. It almost gives me a headache thinking about the server resources that will be required to support a worldwide deployment and large scale adoption of applications built on Wave, end users of the basic Wave platform, etc. That said, I don't have to worry about how Google implements AppEngine, Wave, etc., or the details of how Amazon implements AWS. That is the point: cloud resources place an abstraction barrier between developers and deployment concerns. As someone who actually enjoys dealing with server deployment issues, this is still a very good thing. Anything that lowers cost and makes development faster is a good thing, even when we have to leave doing some fun work behind us. I like that there are already two other companies besides Google that are implementing Wave protocols and services (I want access to that Emacs Wave client :-) Open source implementa

Google's (eventual) support for RDFa

I am glad to see this because it will encourage more web developers to add semantic markup. The last appendix in my new book (Apress: "Scripting Intelligence: Web 3.0 Information Gathering and Processing") briefly covers RDFa. In principle I prefer publishing separate RDF feeds, but with support in Drupal for RDFa (and, I hope, other CMSs like Plone) RDFa may become commonly used - a good thing if it happens.

My Ubuntu/OS X MacBook

I continue to be happy with my decision to set up a "work only" Ubuntu Linux partition on my MacBook. I have been using Linux since I downloaded a mini-Slackware distro in 1992 over a 2400 baud modem. Desktop Linux has come a long way! I did a few things to make my working environment nicer:

- Set the CAPS LOCK key to another control key
- Installed Google Desktop and set the instant search popup hot keys to "Hyper space" (command space, same as popup search on OS X)
- Set up IntelliJ/RubyMine/Eclipse for Ruby, Java, Scala, and App Engine development

I did not even install the developer tools on my new OS X partition. I am just using OS X for video editing, photos, and video conferencing with my family.

I just switched my MacBook over to Ubuntu

I made three image backups of the hard disk on my MacBook (yeah, I am careful like that) and just used BootCamp to set up a small OS X partition and an Ubuntu 9.04 partition. This worked for me, but I make no guarantees for you (<grin>):

- I booted with the OS X install DVD, wiped the disk, and installed to the entire disk
- Without doing anything else, I ran BootCamp and grabbed about half the disk for "Windows"
- I inserted an Ubuntu install CDR and chose "advanced" partitioning. I deleted the FAT partition on /dev/sda3 and made it a bootable ext3 partition with "/" as the mount point

When I boot my Mac, I hold down the option key to switch between OS X and "Windows" (ha!). This is really non-optimal, but I had a false start: the first time I ran the Ubuntu installer, it recognized OS X and offered an express install. Anyway, I ended up with a small 2 gig partition /dev/sda4 that BootCamp boots for the first Ubuntu install. I simply edited the /boot/grub/men

I switched to using RubyMine for all of my Rails development

I bought version 1.0 when it was released this week and since then I have stopped using TextMate (or occasionally NetBeans) for Rails development. The things that won me over: a very fast interface, auto-complete, command-B to jump to method declarations, command-F12 for a popup method list for the current file, pretty much full refactoring support, jumping between view and controller action code, etc. I still think that TextMate is much better for browsing large projects, but if you are coding, I think that RubyMine is much more productive.

Good article on the economies of scale

Although this article on Google and cloud computing sounds like a bit of an advertisement (it is!), it is also a good read. There is no way that individuals and all but a few companies can compete with Google on cost per computing unit (CPU, memory, data storage). I am a huge fan of Amazon's EC2 services: easy to use and very flexible. I am not sure how well this will work, but I want to try using Amazon's Elastic MapReduce to produce data for a Java App Engine application that I hope to have time to write in the next month or two (which I mentioned in my last blog post). Amazon charges for bandwidth reading from S3, and there will be a cost pushing data into Google's App Engine data store. It may well not make sense, cost wise, to split a system between two competing platforms.

My book is almost done

I just sent in Chapter 15 for "Scripting Intelligence for Web 3.0" to Apress yesterday. Now I just have to review some final edits. This has been a very fun book to write, and I must admit some sadness over finishing this project and having to move on. I cover a wide range of topics: text processing (NLP), Semantic Web and linked data, strategies for deploying Semantic Web applications, several strategies for implementing search, "scaling up", use of Hadoop for distributed data crunching (with material on using Amazon's Elastic MapReduce), etc. One of the examples in my book is a rewrite of something that I have been playing around with for years using Common Lisp with WebActions and Portable AllegroServe: a personal system for keeping track of things of interest. I took years of sporadic Lisp hacking, kept some of the best ideas, and ended up with a concise Rails application. I am thinking of writing a third iteration of this and making it public: I have a plac

'Getting Stuff Done', new Ubuntu 9.04, OS X

I was an early Mac enthusiast (I wrote a successful Mac application in 1984), and long before that I bought a very early Apple II (serial number 71) and wrote the simple little chess program that Apple gave away on the demo cassette tape for the Apple II. Anyway, I am pretty much into Apple products. During the later "dark ages" before Apple released OS X, I used Windows NT (and later Windows 2000) and Linux for work and play. During this time, I developed a great 'getting stuff done' strategy: I booted NT for a few customer projects that needed Windows and when I wanted to play; otherwise I booted into a very stripped Linux install that had only what I needed for work spurts. After I finish my new book for Apress (soon! probably in the next two weeks :-), and aside from ongoing work for two customers, I want to concentrate on a new business venture that only requires a development setup for Ruby, Rails, and Java. I work almost exclusively on my Mac laptop.

Apache Mahout Scalable Machine Learning first public release

The Mahout project has just made its first public release of scalable machine learning tools for the Hadoop platform. With Amazon's Elastic MapReduce, it is possible (for example) to run eight server instances for an hour for about a dollar; combined with Mahout, I think that this is really going to open the door for individuals and small organizations to more effectively use machine learning. Good stuff! I have started to take a quick look at the code, but I won't have time to try it out on Elastic MapReduce for a few weeks (I am finishing the last chapter of my Scripting Intelligence for Web 3.0 book and then I have some production work to do, so no free time for a while!). It is interesting in life how things often come together just when you need them. I have a business idea that I want to pursue using EC2, and Mahout will probably help with a small part of the system.

Configuring nginx for both Rails and Tomcat

This is easy to do, but I had a few missteps in setting up nginx with both Rails and Tomcat, so it is worthwhile documenting what I did. I have two cooking and recipe web portals: an older one written in Java and a newer one, using the USDA nutritional database, that runs on Rails. These are low volume sites, so I wanted to run both on a single inexpensive VPS. Usually I run a few virtual domains on one Tomcat instance (this requires a simple mapping in conf/server.xml between virtual domain names and subdirectories of $TOMCAT/webapps), but in this case I only need to add a single virtual domain for my Java web app on a system that is already configured for Rails. So, no changes to server.xml are required, and I deployed to $TOMCAT/webapps/ROOT. Then, the only thing left to do was add a section to my nginx.conf file for Tomcat running on port 8080.
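A minimal sketch of that kind of server block, proxying one virtual domain through to Tomcat (the domain name here is a placeholder, not my actual site):

```nginx
# Hypothetical virtual host for the Java/Tomcat site.
server {
    listen 80;
    server_name www.example-java-site.com;  # placeholder domain

    location / {
        # Hand every request to Tomcat listening on localhost:8080
        proxy_pass http://127.0.0.1:8080;
        proxy_set_header Host $host;
        proxy_set_header X-Real-IP $remote_addr;
        proxy_set_header X-Forwarded-For $proxy_add_x_forwarded_for;
    }
}
```

The existing Rails server blocks in nginx.conf stay as they are; nginx picks the right block for each request by matching the Host header against each server_name.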