Friday, February 29, 2008

Importance of understanding business and sales issues; dynamic languages

I tend to view work from a consultant's point of view, but this probably also applies to you if you work for a company. Staying on top of a few technologies is obviously important (I find that the combination of Ruby, Ruby on Rails, Java, and Common Lisp covers most of what I need to get just about any job done using tools that are at least reasonably appropriate), but success in any information processing career requires more than just technical savvy.

What is much more difficult, but in some ways more fun and rewarding, is the effort to learn as much as possible about the business and sales processes relevant to each project. Ultimately, most software is written and maintained to meet business and sales goals, so it pays to understand non-IT related issues as well as technical IT issues.

One reason that I like to use Ruby is that the language is so concise that I can spend more time thinking about the larger issues of the problems that I am trying to solve; the technical aspects of writing code are diminished.

Saturday, February 23, 2008

Ruby client code for accessing OpenCalais and Metaweb/Freebase web services

I wrote a Ruby API for accessing OpenCalais this morning. OpenCalais processes text and extracts semantic information.

On a slightly related subject: I have enjoyed experimenting with the Python Metaweb APIs in the last year, and I just wrote in my AI blog about Christopher Eppstein's new ActiveRecord-like API for accessing structured data in Freebase.

Thursday, February 21, 2008

Heavy weight Javascript client applications vs. lighter weight AJAX

I experimented with Mjt last year. Mjt is a client side template system: Javascript is used to merge data from JSON web service calls with templates to generate HTML, all in the browser (except for data fetched from a server). Mjt looks solid and has been fairly widely used; an alternative client side framework is Sun's experimental (not for production use!) Lively Kernel project. If you have not played with Lively Kernel, give it at least a one minute try - it uses the Morphic GUI framework, so if you have used Squeak, it will seem familiar.
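
The core idea, merging JSON data into a template to produce HTML, can be sketched in a few lines. This is only an illustration: it does not use Mjt's template syntax, and it runs in Ruby with the standard library's ERB rather than in the browser. The JSON payload, the template, and the name render_people are all made up for this sketch.

```ruby
require 'json'
require 'erb'

# Hypothetical JSON payload, standing in for the response of a web service call.
PEOPLE_JSON = '[{"name": "Mark Watson"}, {"name": "Carol Watson"}]'

# A template with embedded Ruby; Mjt uses its own markup, ERB is only a stand-in.
PEOPLE_TEMPLATE = ERB.new(<<~HTML, trim_mode: '-')
  <ul>
  <%- people.each do |person| -%>
    <li><%= ERB::Util.html_escape(person['name']) %></li>
  <%- end -%>
  </ul>
HTML

# Parse the JSON text and merge the resulting data into the template.
def render_people(json_text)
  people = JSON.parse(json_text)
  PEOPLE_TEMPLATE.result_with_hash(people: people)
end

puts render_people(PEOPLE_JSON)
```

The server's only job here is to produce the JSON; everything about presentation lives in the template.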

The big problem with client side Javascript frameworks, as I see it, is maintainability. I have worked with Javascript heavy web applications that other people originally wrote, and they are definitely much more difficult to jump into, understand, and modify than, for example, AJAX heavy Rails applications or GWT web applications.

That said, there is something tidy about the idea of writing web applications in two intertwined but separate tasks:
  • Writing JSON web services and separately unit testing them
  • Interactively developing the client side with a framework like Mjt
I like to recognize technologies as early as possible that I might use in the future. Although I don't (yet) feel really comfortable working with frameworks like Mjt, my gut feeling is that this is the future because it makes it easier to work with multiple languages and platforms for implementing web services and makes it easier to mix data from multiple sources.
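
The first of the two tasks listed above, writing a JSON web service and testing it separately from any client, might look roughly like the following sketch. The data, the parameter name 'q', and the function name people_service are hypothetical; a real service would sit behind a web framework and would be checked with a test framework such as Test::Unit rather than bare raise statements.

```ruby
require 'json'

# Hypothetical in-memory data standing in for a real data source.
PEOPLE = [
  { 'id' => 1, 'name' => 'Mark Watson' },
  { 'id' => 2, 'name' => 'Carol Watson' }
].freeze

# The "service": takes request parameters and returns a JSON string.
# Keeping it a plain function makes it testable without a web server or browser.
def people_service(params)
  query = params.fetch('q', '').downcase
  matches = PEOPLE.select { |person| person['name'].downcase.include?(query) }
  JSON.generate('count' => matches.size, 'results' => matches)
end

# A client (browser side Javascript in the real case) only ever sees the JSON.
response = JSON.parse(people_service({ 'q' => 'mark' }))

# Minimal checks of the service on its own, independent of any client code.
raise 'expected exactly one match' unless response['count'] == 1
raise 'unexpected result' unless response['results'][0]['name'] == 'Mark Watson'
puts response['count']
```

Because the client depends only on the shape of the JSON, the service could be reimplemented in another language without touching the client side templates.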

Microsoft, Yahoo attempted buyout

I have been following the attempted Yahoo buyout with great interest because I buy into the idea of universal access to online information using many types of devices: PCs, Macs, iPhones, Nokia N800s, secret decoder rings, etc.

In the future that I predict and look forward to, following and exploiting standards will be absolutely required for success. As part of my own research (and fun), I just about continuously try and evaluate every type of online information service (Amazon's web services, Google gdata, etc.)

Microsoft seems to be getting better as far as supporting Mac, Linux, Firefox, etc. The question to me is: how open is Microsoft willing to become?

If I were to sit down and enjoy a beer with Bill Gates and Steve Ballmer (unlikely unless they are vacationing in Sedona, Arizona) I would have some good advice for them: make a sea change and embrace open standards, stop selling new versions of Windows and instead sell yearly subscriptions to Windows and Office (slow improvements, no more big "XP", "Vista", etc. releases), and use their resources to make their software and infrastructure flexible, standard, and valuable to users.

If Microsoft does buy Yahoo, it will be interesting to see if they try to force changing to Microsoft infrastructure: they certainly had problems after buying Hotmail and doing a major conversion to Microsoft server side infrastructure. Yahoo is doing some great things with Open Source (Hadoop, Javascript libraries, etc.) and it will be interesting to see if Microsoft will permit using competing infrastructure software for internal systems.

Friday, February 15, 2008

My DevX article "Real-Life Rails: Develop with NetBeans, Deploy on Linux"

My most recent DevX article has just been published. This was fun material to write about because after some experimentation I feel like I have my Ruby on Rails development environment and server deployment strategy just right, at least for my needs. I should mention that although I have been professionally writing Ruby on Rails applications for a few years, I have not yet written an application that will not run nicely on a single server using nginx, memcache, and a few mongrels. I set my development.rb environment for my MacBook and my production.rb environment for the Linux server I am deploying to, and svn is the glue that holds everything together. If you are interested in deploying very large scale applications, my article will not be very useful to you.

IBM's Project Zero

IBM has an interesting idea with Project Zero, which borrows a lot from ideas behind frameworks like Ruby on Rails: use of a dynamic scripting language (Groovy or PHP), use of a "script aware" HTML template language, and built in support for REST and AJAX.

I worked through the tutorial that uses Groovy (instead of the other supported scripting language PHP), and my first impression is that the Eclipse plugin support is well done (although color and syntax support for editing templates would be good) and the framework meets its goals: support building interactive web applications with little required knowledge of the underlying technologies.

I would be more enthusiastic about Project Zero if I were a Groovy enthusiast. For Groovy loving developers, Project Zero looks to be very useful.

Friday, February 08, 2008

NetBeans 6.1 development build: almost there for my work

I just tried the daily dev build (NetBeans 6.1 Dev 200802080008) for OS X. It is almost there for my daily work - my current Java development project (a commercial version of my old NLBean open source project with a new AI NLP module), Scala coding experiments, and new Rails projects all work great. The one problem: I get errors when using existing Rails NetBeans projects (actually, I get the same errors when trying to modify project properties in new Rails projects but new projects can be created with the desired properties). Close, but not quite there. BTW, the Scala NetBeans plugins, which are very new, are looking very good.

Tuesday, February 05, 2008

PostgreSQL 8.3 on OS X: I like the full text indexing/search features

I built the latest version from source, with one problem: I was only able to install readline from source using "--disable-shared", so I ended up also building PostgreSQL statically linked. Oh well, so much for being in a hurry; I have 2 gigs of RAM on my MacBook, so what is a little memory between friends :-)

I have been waiting for version 8.3 because of the full text indexing/search features. Here is the Text Search documentation - enjoy! Here is a little sample of the SQL extensions to support indexing and search:
test=# create table test (id integer, name varchar(30), email varchar(30));

test=# create index test_name_idx on test using gin(to_tsvector('english', name));
test=# insert into test values (1, 'Mark Watson', '');
test=# insert into test values (2, 'Carol Watson', '');
test=# select * from test where to_tsvector(name) @@ to_tsquery('mark');
id | name | email
1 | Mark Watson |
(1 row)

test=# select * from test where to_tsvector(name) @@ to_tsquery('watsons');
id | name | email
1 | Mark Watson |
2 | Carol Watson |
(2 rows)

test=# select * from test where to_tsvector(name) @@ to_tsquery('mark & watson');
id | name | email
1 | Mark Watson |
(1 row)

test=# select * from test where to_tsvector(name) @@ to_tsquery('mark | watson');
id | name | email
1 | Mark Watson |
2 | Carol Watson |
(2 rows)

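Obviously, if you were creating a new table with many rows, you would add the index after the data is loaded into the table. "gin" selects a GIN (Generalized Inverted Index) index, which is an inverted word index. Specifying 'english' ensures that a word stemmer is used that understands English language conventions. Note that a search for 'watsons' matches because the search terms are stemmed before the search. One caveat: because the index was created with the two-argument form to_tsvector('english', name), only queries that use the same two-argument form with the same configuration can use the index; the one-argument calls above return the same rows when 'english' is the default configuration, but they do not use the index.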
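
To make the ideas of an inverted index and stemming concrete, here is a toy version in Ruby. It is only a sketch of the general technique, not how PostgreSQL implements GIN or its 'english' configuration: the stemmer just strips a trailing "s", and the names toy_stem, build_index, search_all, and search_any are invented for this example.

```ruby
require 'set'

# Deliberately naive stemmer: lower-cases and strips a trailing "s".
# PostgreSQL's 'english' configuration uses a real stemmer instead.
def toy_stem(word)
  word.downcase.sub(/s\z/, '')
end

# Build an inverted index: stemmed word => set of ids of rows containing it.
def build_index(rows)
  index = Hash.new { |hash, key| hash[key] = Set.new }
  rows.each do |id, name|
    name.scan(/\w+/).each { |word| index[toy_stem(word)] << id }
  end
  index
end

# Ids of rows that contain every term (comparable to '&' in to_tsquery).
def search_all(index, terms)
  terms.map { |term| index.fetch(toy_stem(term), Set.new) }.reduce(:&).to_a.sort
end

# Ids of rows that contain at least one term (comparable to '|').
def search_any(index, terms)
  terms.map { |term| index.fetch(toy_stem(term), Set.new) }.reduce(:|).to_a.sort
end

rows = { 1 => 'Mark Watson', 2 => 'Carol Watson' }
index = build_index(rows)
p search_all(index, %w[watsons])      # both rows: the query term is stemmed too
p search_all(index, %w[mark watson])  # row 1 only
p search_any(index, %w[mark watson])  # both rows
```

The search never scans the rows themselves; it only looks up stems in the index and combines the resulting sets of ids.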

The search syntax looks odd, but I expect to get used to it quickly. For Rails: I use "acts_as_ferret" a lot; I'll wait a month to see if any handy plugin is written for PostgreSQL specific search - I would rather that someone else write it. I need to check out acts_as_tsearch, but I don't think that it has been updated yet to work with the final 8.3 release.

Monday, February 04, 2008

Snowing in Sedona Arizona

My wife and I took a drive this morning after it stopped snowing. Some nice pictures taken near our home in Sedona:
near Boynton Canyon
Dry Creek Road - looking west

XMPP (Jabber)

Experimenting with XMPP has been on my long term list of things to do. I took a 90 minute break from work this afternoon to set up a playground: the OpenFire XMPP server and the Ruby XMPP4r client library. Setting up the OpenFire service on one of my leased servers was easy: it has a very good administration web application and is in general an easy install.

I had more problems with XMPP4r but setting Jabber::debug = true helped. I installed the easier to use wrapper library xmpp4r-simple but decided that its API was probably too limited (long term), so I might as well get used to XMPP4r.

I also grabbed the Common Lisp XMPP client cl-xmpp but experimenting with Ruby clients is probably easier. The OpenFire developers also supply a Java client library (Smack) that is on my list of things to try.

I think that XMPP may be a good "push" technology for distributed knowledge sharing systems (an interest of mine). XMPP has a lot going for it: a good security model, straightforward bi-directional communication between any two connected clients, and a publish/subscribe capability like the Java Message Service (JMS). The Comet architecture (uses HTTP and JSON, instead of socket connections and XML) looks interesting but XMPP seems to have a head start and I don't think that I need to learn both technologies (yet).
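
For readers who have not used publish/subscribe, the pattern itself can be sketched in a few lines of Ruby. This toy broker runs in a single process and does no networking, so it is not XMPP and does not use XMPP4r; the class name ToyBroker, its methods, and the topic names are hypothetical.

```ruby
# Toy in-process broker illustrating publish/subscribe ("push") delivery.
class ToyBroker
  def initialize
    @subscribers = Hash.new { |hash, topic| hash[topic] = [] }
  end

  # Register a block to be called for every item published to the topic.
  def subscribe(topic, &handler)
    @subscribers[topic] << handler
  end

  # Push the item to every current subscriber; returns how many were notified.
  def publish(topic, item)
    handlers = @subscribers.fetch(topic, [])
    handlers.each { |handler| handler.call(item) }
    handlers.size
  end
end

broker = ToyBroker.new
received = []
broker.subscribe('knowledge/updates') { |item| received << item }
broker.publish('knowledge/updates', 'new fact about Sedona')
broker.publish('unrelated/topic', 'nobody is listening')
p received
```

The subscriber never polls: it registers interest once and is handed each new item as it is published, which is the property that makes the pattern attractive for knowledge sharing systems.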

Getting Things Done: a perspective from a work at home programmer

While I like to automate repetitive tasks (server deployments, builds, tests, etc.), I also enjoy "tuning up" my personal work habits, tweaking them to get things just right. Hopefully, you will find something useful here (and please add comments on how you "tune up" your own work flow):
  • I keep three lists of things to do: tasks for today (I include errands to run in the same list as work tasks), things to get done in the next week, and long term things that I would like to do, but might not ever get to. I work on a MacBook and use the "Stickies" Dashboard widget to keep these lists.
  • I schedule a break every 20 minutes using the "3-2-1" Dashboard widget. These breaks last a few minutes and give me an opportunity to walk around, get a glass of water or a coffee, step outside, etc.
  • Control interruptions: my wife does not work and is at home with me for most of my work day. Whenever my "3-2-1" 20 minute alarm goes off for a short break, my wife knows that she can talk with me without interrupting my work flow. Also, my parrot has become accustomed to the sound of my 20 minute alarm and gets excited when it goes off: he often gets his head scratched when I get up to walk around. I also like to avoid reading email more often than a couple of times an hour: unless I am on one of my short every 20 minute breaks, I prefer to not interrupt my train of thought. I also have my wife screen my telephone calls for the same reason.
  • Use a pad of paper and a pen/pencil: for me, this is a great way to think, work on algorithms, etc. Computer science does not always have to involve using a computer :-) A pad of paper can save time, delaying coding until I have really thought about the best way to solve a problem.
  • Keep a detailed work log for each project or customer: it may seem counterintuitive, but I find that the 10 minutes a day that I spend maintaining detailed work logs make me much more productive, long term. Having notes as text files that you can quickly search saves a lot of time the "second time" that you need to do something. I have work logs for a few projects that have been actively used for years, and no matter how large these work logs are, I can quickly find information about why decisions were made, how a particular server was set up, etc. This saves a lot of time!
  • Start work early in the day: I know that this does not work for everyone, but my best work time is early in the morning. I believe that one of the best strategies for getting things done efficiently is taking sufficient breaks. However, as a consultant, I only get paid for the time I spend working with my pad of paper and laptop. By starting work early in the day, I can afford to take longer breaks during the day that keep me efficient. My favorite things to do for long breaks: take a long walk on the hiking trails behind my house, take my wife to a matinee movie, and have a picnic lunch down by Oak Creek (short drive from our house).
  • Don't drink wine every night: I find that I do some of my most creative work later in the evening, an hour or two after eating dinner. For me, working a few evenings a week, sometimes on fun educational projects instead of paid for work, gives me a different perspective. Adding a few evenings a week to my available work time obviously lets me clear more "to do" tasks and allows me to take longer breaks during the day to stay efficient.
  • I vary my work location: this goes against advice from other people to have one "work room", but I find that by rotating between three locations inside my house and my deck, I stay fresher mentally, and this helps me relax while I work. When I used to work in an office, I found that I could increase my productivity by having "meetings" while walking around the block; this obviously works best for only 2 or 3 people. Breaking out of your normal working environment is stimulating and short "walking around" meetings can be very productive.
  • Appreciate the value of your work: I find that if I momentarily reflect on how my work helps people, then I can stay focused on tasks that might not be intellectually interesting to me. In a similar way, I find that taking a few minutes, once or twice a day, to reflect on the blessings in my life helps keep me in a grateful frame of mind, and makes me much happier and more productive.
  • Tailor eating habits to your work schedule: small meals and healthy snacks are better than eating just a couple of very large meals each day. You will get tired after a very large meal, reducing productivity. Small and healthy meals and snacks keep your energy level up and keep you mentally alert. Avoid eating sugar; having a dessert a few times a week is fine, but eating sweets every day is unhealthy, reducing efficiency. Like drinking wine, eating a good dessert is more enjoyable if done only occasionally.
  • Try to eliminate things that cause you to worry. Worrying can be a real productivity killer. Two common things that people worry about are health and finances. We cannot (yet) control our genetic makeup, but a healthy lifestyle yields a healthier life. Many financial problems can be eased by simply spending less than you earn and having a savings/investment plan. Time not spent worrying can be spent earning money or enjoying life with family and friends.
  • And, saving the most important for last: have a career that you love. It is much more important to work on things that you enjoy than to maximize the amount of money that you earn. You will get more things done if you (mostly) enjoy your work. If you enjoy your work then you may need to spend less money on material things to cheer you up and temporarily make you happy. Remember: there are two kinds of people in the world: consumers and investors.