Sunday, January 24, 2010

A follow-up on using Windows 7 for Ruby and Java development

I wrote two weeks ago about adapting to my new Windows 7 laptop (inexpensive, well constructed, and 4 cores). This is a follow-up with some miscellaneous advice.

I was having some problems writing with LaTeX so I re-installed MiKTeX, this time doing a complete installation (well over 1 GB). Brute force solved my LaTeX problems. Still, when I know that I will be mostly writing, I boot my MacBook to use TeXShop, etc.

With a much faster laptop (4 cores!), I am very much enjoying using the new IntelliJ 9 for Java, Scala, and Clojure development. I am currently writing code that wraps the Sesame RDF data store, my own geolocation code, and Lucene. (I am writing a book on the AllegroGraph product and I want all of the examples to also run using my wrapper for Sesame, my geolocation stuff, and Lucene.) I am also writing idiomatic wrappers (to my wrapper) in Clojure and Scala and it is great to have everything in one large IntelliJ project.

I continue to rely on E TextEditor (a Windows TextMate clone) for most Ruby development and miscellaneous text editing - highly recommended.

I find that once I get things set up on my Windows 7 laptop, I have an enjoyable developer experience. However, the setup itself has been a terrific pain in some cases. For example, installing PostgreSQL and PostGIS was a real nuisance.

I found the text in the Command Prompt windows where I run bash shells to be a little difficult to read, so I increased the default window size, switched to a larger font size, switched the default font to Lucida Console, and reduced the contrast by making the background a very light blue (almost white) and the text a medium blue.

I also find it useful to remap the Caps Lock key to act as a third control key - this reduces strain on my left hand when hitting control characters on the left side of the keyboard while using Emacs and other software that makes good use of the control key. This saves wear and tear on the tendons because I don't have to twist my left hand or use the control key on the right side of the keyboard.

On dual booting Ubuntu: as I mentioned a few weeks ago, I have both a bootable Ubuntu installation and a separate one that runs inside of Windows using VirtualBox. I am just about ready to figure out how to remove GRUB and reclaim the separate partition. VirtualBox is very convenient, so I think I am going to just use it when Linux is best for a work task.

One big advantage of using Windows is the utilities TortoiseSVN and TortoiseGit. Both are highly recommended for integrating svn and git support into the file explorer.

One huge disadvantage of Windows is that most new computers do not come with Windows install disks. This is awful - shameful behavior, really, on both Microsoft's and computer manufacturers' parts. This is simply a lack of trust in their customers. I have burned a few "bootable repair/recover" disks and I am using the built-in system backup software (it backs up to an external disk), but I won't know if this works until I need it.

The bottom line is: when I am working in a bash shell, SSH'ing to remote servers, or using a heavyweight IDE like IntelliJ, I find Windows 7 to be as pleasant for development as OS X or Linux. For some things, I still like my MacBook. I paid $800 for my new Toshiba laptop, and a comparable MacBook Pro for my workflow would have been about $2000. I am not yet sure if the savings have been worth it because of the time required to set up my new laptop. One advantage, though, is the flexibility of also having a Windows box handy for testing, etc.

Thursday, January 21, 2010

The beauty of LaTeX: my AllegroGraph book becomes two books, one for JVM languages and one for Lisp

I have been working on and off for 16 months on a book about Semantic Web (or Linked Data) application programming using the AllegroGraph product. I have decided to substantially increase the scope of this applications/tutorial style book to also include support for Sesame. The figure on the left shows the software architecture road map for the book using JVM languages.

I am splitting the book into two volumes, and LaTeX makes it really easy to share small amounts of common material so that both books stand on their own. LaTeX also makes it easy to combine both books into one all-inclusive book, eliminating the duplicated parts. The two volumes are:
  • Volume I: will cover the use of both AllegroGraph and Sesame using JVM languages: Java, Scala, JRuby, and Clojure. I am working on a common wrapper written in Java that supplies my own (rather simple) API to both AllegroGraph and Sesame. My wrapper implements Sesame support for geolocation and free text indexing and search so the wrapper is adequate to run all of the book examples using either AllegroGraph or Sesame "back ends."
  • Volume II: will cover only AllegroGraph using both the embedded and client Lisp APIs.

Both AllegroGraph and Sesame are great development tools, but they fill different needs. On projects that can support a license fee of several thousand dollars a year per server, I would choose Common Lisp + AllegroGraph for development. AllegroGraph is very scalable and the Lisp APIs are really nice to work with. For Java (or other JVM language) applications, I would still choose AllegroGraph for the scalability and support if a project can support the license costs. The good thing is that for most small to medium size projects, the free version of AllegroGraph and the open source Sesame project are both good choices, so as a developer you have some real flexibility. There are also other good RDF data store platforms like Jena, Joseki, Kowari, Redland, 4store, the SWI-Prolog Semantic Web library, Talis, Virtuoso, etc., but I have relatively little (or in some cases no) experience with these. I use AllegroGraph and Sesame, so that is what I write about.

Wednesday, January 13, 2010

Looking towards a universal wrapper/proxy for knowledge and data stores

I often start out by writing code specific to a single project and then refactor it to make working code more generally useful. I am working on an applications book for the AllegroGraph RDF data store services. Since most people (probably) use Java clients with AllegroGraph, the first step is to wrap Franz's APIs with my own interfaces so that for my own work (that is, beyond the scope of writing this book) I can write implementations for other back end RDF data stores as I need them.
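A minimal sketch of what "wrapping a back end behind my own interfaces" could look like in Java. Every name here (Triple, RdfStore, InMemoryRdfStore) is hypothetical and made up for illustration; none of it is Franz's or Sesame's API, and the in-memory class only stands in for a real back end that would delegate to a vendor client library.

```java
import java.util.ArrayList;
import java.util.List;

// All names are hypothetical; none come from the AllegroGraph or Sesame client libraries.

/** One RDF statement, held as three plain strings. */
final class Triple {
    final String subject;
    final String predicate;
    final String object;

    Triple(String subject, String predicate, String object) {
        this.subject = subject;
        this.predicate = predicate;
        this.object = object;
    }

    @Override
    public String toString() {
        return subject + " " + predicate + " " + object;
    }
}

/** The small interface that application code is written against. */
interface RdfStore {
    void addTriple(String subject, String predicate, String object);

    /** Returns stored triples matching the pattern; a null argument matches anything. */
    List<Triple> match(String subject, String predicate, String object);
}

/** Stand-in back end that keeps triples in a list; a real one would delegate to a vendor client. */
final class InMemoryRdfStore implements RdfStore {
    private final List<Triple> triples = new ArrayList<>();

    @Override
    public void addTriple(String subject, String predicate, String object) {
        triples.add(new Triple(subject, predicate, object));
    }

    @Override
    public List<Triple> match(String subject, String predicate, String object) {
        List<Triple> result = new ArrayList<>();
        for (Triple t : triples) {
            if (matches(subject, t.subject)
                    && matches(predicate, t.predicate)
                    && matches(object, t.object)) {
                result.add(t);
            }
        }
        return result;
    }

    // A null pattern element is treated as a wildcard.
    private static boolean matches(String pattern, String value) {
        return pattern == null || pattern.equals(value);
    }
}

class RdfStoreDemo {
    public static void main(String[] args) {
        // Client code only sees RdfStore, so another implementation can be swapped in here.
        RdfStore store = new InMemoryRdfStore();
        store.addTriple("ex:book", "ex:covers", "ex:AllegroGraph");
        store.addTriple("ex:book", "ex:covers", "ex:Sesame");
        for (Triple t : store.match("ex:book", "ex:covers", null)) {
            System.out.println(t);
        }
    }
}
```

The point of the design is that example programs depend only on the interface, so adding another back end means writing one more implementing class rather than touching the examples.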

I also plan on writing "thin" Scala, JRuby, and Clojure friendly interfaces to my Java library. For the purposes of the book, I'll use this library to support example client applications written in Java, JRuby, Scala, and Clojure. I have a lot of material already written with Lisp examples but I think that I am going to set that all aside for a future writing project (that I may, quite honestly, never get back to). I also decided to not support Python in my book: Franz has a good Python interface library and examples and in any case, I am not a Python developer.

So far, I have a fairly clear road map of what I need for this specific book project. Long term, after this book is done, I am aiming to also wrap other knowledge sources like OpenCyc, Freebase, etc. While it is tempting to view most knowledge sources as graph data (RDF), it seems like a poor idea to give up the inferencing available in OpenCyc, all the features of the Freebase MQL query language, etc.

Since I often find myself reusing my own small code examples to access multiple knowledge sources, it may be time soon to step back and decide what can be placed behind common interfaces.
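One possible way to put knowledge sources behind a common interface without giving up source-specific features is to keep the shared interface small and expose extras as optional capability interfaces. The sketch below is only an assumption about how that might be structured: KnowledgeSource, Inferencing, FactOnlySource, and ReasoningSource are hypothetical names, the data is a toy, and nothing here is the OpenCyc or Freebase API.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical names throughout; the two sources below are toys, not real clients.

/** Operations every knowledge source is assumed to share. */
interface KnowledgeSource {
    String name();

    /** Returns the facts stored directly for a term (empty if none). */
    List<String> facts(String term);
}

/** Optional capability for sources that can reason beyond stored facts. */
interface Inferencing {
    boolean isA(String term, String category);
}

/** A source that only looks up stored facts. */
final class FactOnlySource implements KnowledgeSource {
    private final Map<String, List<String>> facts = new HashMap<>();

    void put(String term, List<String> termFacts) {
        facts.put(term, termFacts);
    }

    @Override
    public String name() {
        return "fact-only";
    }

    @Override
    public List<String> facts(String term) {
        return facts.getOrDefault(term, Collections.<String>emptyList());
    }
}

/** A source that also follows subclass links transitively. */
final class ReasoningSource implements KnowledgeSource, Inferencing {
    private final Map<String, String> parentOf = new HashMap<>();

    void addSubclass(String child, String parent) {
        parentOf.put(child, parent);
    }

    @Override
    public String name() {
        return "reasoning";
    }

    @Override
    public List<String> facts(String term) {
        String parent = parentOf.get(term);
        return parent == null
                ? Collections.<String>emptyList()
                : Collections.singletonList(term + " isA " + parent);
    }

    @Override
    public boolean isA(String term, String category) {
        String current = parentOf.get(term);
        int steps = 0;
        // Walk up the parent chain; the step bound guards against cyclic data.
        while (current != null && steps++ <= parentOf.size()) {
            if (current.equals(category)) {
                return true;
            }
            current = parentOf.get(current);
        }
        return false;
    }
}

class KnowledgeDemo {
    /** Uses inference when the source offers it, otherwise reports that it cannot. */
    static String classify(KnowledgeSource source, String term, String category) {
        if (source instanceof Inferencing) {
            return ((Inferencing) source).isA(term, category) ? "yes" : "no";
        }
        return "not supported";
    }

    public static void main(String[] args) {
        ReasoningSource reasoning = new ReasoningSource();
        reasoning.addSubclass("Dog", "Mammal");
        reasoning.addSubclass("Mammal", "Animal");
        System.out.println(classify(reasoning, "Dog", "Animal"));
        System.out.println(classify(new FactOnlySource(), "Dog", "Animal"));
    }
}
```

Client code written against KnowledgeSource works with every source, while code that needs inference can ask for it explicitly instead of flattening everything to graph lookups.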

Sunday, January 10, 2010

Using Windows 7 for Ruby and Java development

As I mentioned in my last blog, I surprised my friends and family by buying a Windows 7 laptop. The combination of Windows 7 and Ubuntu is not quite as good as OS X and Ubuntu dual boot, but try buying a Mac laptop with 4 CPU cores.

Of course, the first thing to do to a Windows 7 system is to install Cygwin. I installed just about everything available. To avoid confusion, I always run bash in command windows and I set up my .bashrc file for Cygwin to mimic my .profile file for OS X and my .bashrc file for Ubuntu. With Cygwin installed, life is good.

For basic Ruby and Java development, RubyMine and IntelliJ work identically under OS X, Windows 7, and Ubuntu Linux. I needed a plain text editor: I use TextMate on OS X and gedit on Ubuntu. I tried, then bought, a copy of E TextEditor, which works with TextMate plugins. Recommended! I also installed XEmacs.

For writing I installed LaTeX (I used the MiKTeX distribution). Both E TextEditor and XEmacs are fine for editing LaTeX "source code."

This was some trouble to set up, but my Toshiba laptop is very well constructed, has 4 CPU cores, 4GB of memory, and 500GB of disk. And, it was very inexpensive.

I have not totally made up my mind how I will use Ubuntu on my new laptop. Windows 7 has a nice utility to split the C: disk partition, so I did this when I first started the laptop. I have Ubuntu installed twice: on a new partition and also on the Windows 7 file system using Sun's very good VirtualBox. Ubuntu is a little more responsive when I use GRUB to boot it directly, but running it under VirtualBox is very convenient because of file system sharing and because I do not have to install everything - just what Linux is best for. Since I use a remote git repository for virtually everything that I work on, it is easy dealing with two Ubuntu installations - I think that I will continue to use both for a few months, then decide which to keep and which to delete. Also: VirtualBox supports all 4 cores with 64-bit Ubuntu running as a guest operating system, so the performance hit for using VirtualBox is small.

Tuesday, January 05, 2010

New laptop: Toshiba Satellite U505

I very much like the MacBook that I bought almost three years ago, but it has its limitations (mainly not enough disk space).

I decided to buy a "Windows" laptop because for $800 I could get a laptop with 4 CPU cores (Intel Core i3), a 1/2 terabyte disk, and 4 GB of RAM. Ubuntu Linux uses all the cores :-)

Saturday, January 02, 2010

Running OpenCyc 2.0 on OS X

The latest release of OpenCyc uses a Java runtime so it is portable. I often keep OpenCyc running on one of my servers, but for convenience I also wanted to be able to run it on my MacBook. My MacBook only has 2GB of RAM but this seems to be adequate, especially because I don't run the JVM in server mode on my laptop. Start by downloading the Linux OpenCyc 2.0 distribution.

Make a copy of the file opencyc-2.0/server/cyc/run/bin/ and replace the original file contents with:


java -Xms$MIN_HEAP -Xmx$MAX_HEAP -XX:MaxPermSize=$PERM_SIZE \
  -cp lib/cyc.jar:lib/subl.jar:lib/junit.jar:resource:lib/ext:plugins \
  com.cyc.tool.subl.jrtl.nativeCode.subLisp.SubLMain \
  -f "(progn (load \"init/jrtl-release-init.lisp\"))" "$@"

With these settings, OpenCyc 2.0 starts up quickly and has fairly good runtime performance on my MacBook: fine for experimenting with OpenCyc. Run the system using the top-level run script (which you do not need to modify):

cd opencyc-2.0

OpenCyc 2.0 consists of an AI reasoning runtime, a "real world" ontology, hundreds of thousands of terms, and millions of relationships.