Posts

Showing posts from July, 2009

Tools: experiment with many, master a few of them

I am admittedly a tinkerer: I really enjoy reading other people's code and experimenting with new languages and technologies. That said, I make the effort to master only a few technologies: for programming languages, Ruby, Java, and Lisp (my paid work is split fairly evenly among them), and I specialize in AI, cloud deployments, and Rails and Java-based web apps. I may be transitioning to adopting two new core technologies. I have been using relational databases since 1986 and have a long-term liking for PostgreSQL (less so for MySQL). The "non-SQL" meme has become popular with a lot of justification: for many applications you can get easier scalability and/or better performance on a single server using other types of data stores. Google's AppEngine datastore (built on their Bigtable infrastructure) is clearly less convenient to develop with than a relational database, but it may be well worth the extra effort to get scalability and very low hosting

If you have a Google Wave account, then try my Robot extension

I wrote a Robot extension that reads text on a wave (and any child blips added to it), and adds its own child blips with some NLP analysis (entity extraction and auto-tagging). No big deal, but fun to write. Give it a try: knowledge-books@appspot.com

Have fun with the AMI containing the examples from my latest Ruby book

I have prepared an Amazon Machine Image (AMI) with most of the examples in my Ruby book Scripting Intelligence: Web 3.0 Information Gathering and Processing. Because I will be periodically updating the AMI, you should search for the latest version. This is simple to do: after you log in to the Amazon Web Services (AWS) Management Console, select “Start an AMI,” then choose the “Community AMIs” tab and enter markbookimage in the AMI ID search field. Choose the AMI with the largest index. I have Ruby, Rails, Sesame, Redland, AllegroGraph, D2R, Hadoop, Solr, PostgreSQL, Tomcat, Nutch, etc. pre-installed and configured. I use this AMI for new projects and new experiments because it contains most of the tools and frameworks that I use. If you know how to use Amazon AWS, it is easy to clone your own copy with whatever additional software you need, hook up a persistent disk volume, etc. If you have not yet learned how to use AWS effectively, this might be a good time to do so. I like

Writing Wave robots that use blip titles and text

If you follow the Java Wave robot tutorial, getting started is reasonably easy. It took me a short while to learn how to access the titles and text of both new root blips (i.e., the start of a new Wave) and child blips (i.e., new blips added to a root blip). Here is some code where I reworked some of the example code (this is in the servlet that handles incoming JSON-encoded messages from the Wave platform):

```java
public void processEvents(RobotMessageBundle events) {
    Wavelet wavelet = events.getWavelet();
    if (events.wasSelfAdded()) {
        Blip blip = wavelet.appendBlip();
        TextView textView = blip.getDocument();
        textView.append("I'm alive and ready for testing");
    }
    for (Event event : events.getBlipSubmittedEvents()) {
        // some of my tests:
        Blip blip = event.getBlip();
        if (!blip.getBlipId().equals(wavelet.getRootBlipId())) {
            String text = blip.getDocument().getText();
            makeDebugBlip(wavelet, "blip s
```

Wave may end up being the new Internet coolness

I continue having fun "kicking the tires." I do wish that I had a completely local Wave robot development environment, but I expect that will be forthcoming. The edit-compile-run cycle takes a while because I need to: modify the robot code, build and upload it to the Java AppEngine, and create new test waves, invite the robot, etc. The development cycle for gadgets is quicker if you can simply edit a gadget XML file remotely on whatever server you use to publish it. I am having a bit of an AppEngine performance issue. I am used to being able to cache (reasonably) static data in memory (loaded from JAR files in WEB-INF/lib). With AppEngine your web app can run on any server, and web app startup time should be very quick (and on-startup loading of data into memory from JAR files is not quick). I am not so happy doing this, but I may keep frequently used static data in the data store. I don't think that using JCache + memcached is an option because if I look up a key and it is
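The fallback pattern I am describing - loading static data lazily, once per server instance, instead of at startup - can be sketched in a few lines. This is a minimal illustration only: the class name and the stand-in loader are hypothetical, and a real version would read from a resource file in WEB-INF/lib rather than synthesize values.

```java
import java.util.Map;
import java.util.concurrent.ConcurrentHashMap;

// Minimal sketch of a lazy, per-instance cache of static lookup data.
// Each AppEngine instance pays the load cost once, on first access,
// instead of during its (time-limited) startup.
public class StaticDataCache {
    private static final Map<String, String> cache = new ConcurrentHashMap<>();

    public static String lookup(String key) {
        // computeIfAbsent loads a missing entry on first access only
        return cache.computeIfAbsent(key, StaticDataCache::loadEntry);
    }

    // Stand-in for reading an entry out of a JAR resource file
    private static String loadEntry(String key) {
        return "value-for-" + key;
    }

    public static void main(String[] args) {
        System.out.println(StaticDataCache.lookup("noun-phrases"));
    }
}
```

The point of the sketch is the trade-off: first requests to a fresh instance are slow, but subsequent lookups hit memory, with no dependency on the data store or on memcached eviction behavior.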

Google Wave gadgets

The gadget tutorial was easy to follow. I am starting with the stateful counter example and experimenting with it. The makeRequest API can be used to call remote web services from inside gadgets. Other APIs let you process events inside a wave (from user actions, new or changed content, etc.). Cool stuff. There are many gadget containers, but I was never interested in writing gadgets myself until I started experimenting with the Wave platform.

Cool: just wrote my first Google Wave "robot" JSON web service

It is a placeholder for now, but it will eventually use my KBtextmaster code to perform natural language processing on new replies to a wave that has my robot added as a participant. By following these instructions, it only took about 30 minutes to get this going (it would have been 20 minutes, but I first compiled the Java AppEngine JSON web service with JDK 1.5 - after a rebuild with JDK 1.6, everything worked as advertised). I have been working on the Common Lisp version of KBtextmaster in the last week, and the Java version badly needs a code cleanup as well (both versions contain some of my code going back over ten years). I'll post the public URL for my robot in a week or so, when I get a new version of KBtextmaster plugged in.

Book project, Google Wave, and a kayaking video

Except for some consulting work, my big project is a new book on using AllegroGraph for writing Semantic Web applications. Lots of work, but also a lot of fun. I received a Google Wave Sandbox invitation today. I am going to try to spend an hour or two a day with Wave to get up to speed. Fortunately, I am 100% up to speed using the Java AppEngine (initially, Wave robots, etc. get hosted on AppEngine, either the Java or Python version) and I have some experience with GWT - so I should already be in good shape, but I need to write some code :-) My wife took a short video of me kayaking yesterday.

Gambit-C Scheme has become my new C

I might be writing an article about this soon: Scheme is a high-level language - great for all-around development - and Gambit-C can (once an application is developed in a very productive Emacs + Slime + Gambit-C environment) be used to create small and very efficient native applications. BTW, if you use an OS X or Windows installer, also get the source distribution for its examples directory. In the Unix tradition, I like to build a set of tools as command line applications, and Gambit-C is very nice for this.

Common Lisp RDFa parser and work on my new AllegroGraph book

I am working on a three-purpose task this morning: writing an RDFa parser in Common Lisp. I need it for my new book project (Semantic Web application programming with AllegroGraph), I need it for one of my own (possibly commercial) projects, and I want to release it as an open source project. I am building this on top of Gary King's CL-HTML-Parser, so Gary did the heavy lifting and I am just adding the bits that I need.
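For flavor, the core idea of RDFa extraction - scanning markup for property/content attribute pairs and collecting them - can be sketched as follows. This is in Java rather than Lisp, uses a naive regular expression in place of a real HTML parser (my actual code walks the DOM that CL-HTML-Parser produces), and all of the names here are illustrative, not from my project.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Naive sketch of RDFa attribute extraction: find elements carrying a
// property attribute and pair the property name with the element's
// content attribute value.
public class RdfaSketch {
    private static final Pattern PROP = Pattern.compile(
        "property=\"([^\"]+)\"[^>]*content=\"([^\"]+)\"");

    public static List<String[]> extract(String html) {
        List<String[]> pairs = new ArrayList<>();
        Matcher m = PROP.matcher(html);
        while (m.find()) {
            pairs.add(new String[] { m.group(1), m.group(2) });
        }
        return pairs;
    }

    public static void main(String[] args) {
        String html = "<meta property=\"dc:title\" content=\"RDFa Parsing\"/>";
        for (String[] p : extract(html)) {
            System.out.println(p[0] + " -> " + p[1]);
        }
    }
}
```

A real parser also has to handle about and typeof attributes, inherited subjects, and element text when content is absent, which is exactly why building on an existing HTML parser is the right move.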

Measurement promotes success

Computer science involves a lot of measuring: profiling code, tracking memory use, looking for inefficiencies in network connections, determining how many database queries are required to render a typical web page in an application, etc. I have also started measuring something else: how I spend my time. I used to track just billable time and leave time spent learning new languages and frameworks, writing experimental code, etc. as unmeasured time. I now use a time tracking application on my MacBook to track 16 different categories (billable work and learning/research; I also track time on Reddit, Slashdot, etc.). The overhead for these measurements is probably 2 or 3 minutes a day, plus a few minutes to review time spent at the end of a day, end of a week, etc. For me, this is useful information.
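The bookkeeping behind any such tool is trivial - accumulate elapsed intervals into named category buckets - and can be sketched in a few lines (the categories and durations here are made up for illustration; I use an off-the-shelf application, not this code).

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

// Minimal sketch of per-category time accumulation, the core of a
// time-tracking tool: add each logged interval into its category total.
public class TimeTracker {
    private final Map<String, Duration> totals = new LinkedHashMap<>();

    public void log(String category, Duration elapsed) {
        // merge sums the new interval into any existing total
        totals.merge(category, elapsed, Duration::plus);
    }

    public Duration total(String category) {
        return totals.getOrDefault(category, Duration.ZERO);
    }

    public static void main(String[] args) {
        TimeTracker t = new TimeTracker();
        t.log("billable", Duration.ofMinutes(90));
        t.log("research", Duration.ofMinutes(45));
        t.log("billable", Duration.ofMinutes(30));
        System.out.println("billable: " + t.total("billable").toMinutes() + " minutes");
    }
}
```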

Continuing to work on my AllegroGraph book

I started this book late last year but set it aside to write my Apress Ruby book Scripting Intelligence: Web 3.0 Information Gathering and Processing. I don't think that the market for an AllegroGraph (AG) book will be large, but after using AG on one customer project and experimenting (off and on) with it for several years, I decided that it was Semantic Web technology worth mastering. AG is a commercial product, but a free server version (supporting Lisp, Ruby, Java, and Python clients) is available that is limited to 50 million RDF triples (a large limit, so many projects can simply use the free version). AG supports the Sesame (an open source Java RDF data store) REST-style APIs, so if you stick with SPARQL and only RDFS reasoning, you retain the portability to use a BSD-licensed alternative. That said, my reason for using AG is all of the proprietary extra goodies! In addition to a few Lisp, Python, Ruby, and Java client examples, I am going to incorporate a lot of useful Com

W3C killing off XHTML2 in favor of HTML5: bad for the Semantic Web?

As a practical matter, HTML5 looks good for writing human-facing next generation web applications with multimedia support and more intuitive elements like <header>, <nav>, <section>, <footer>, etc. The problem that I have with the W3C's decision (assuming that I understand it correctly) is that, at least in my opinion, the value of the web goes way beyond supporting manual web browsing and enjoying digital media assets. I think that the web should evolve into a ubiquitous decision support system - this needs software agents that can help you no matter whose computer you may be using, what type of small device (phone, web pad) you may be using, etc. In this context, decision support means help in making dozens of decisions each day. User-specific information filters, search agents, and personalized information repositories will require machine readable data with well defined semantics. One approach is to have content management systems like Drupal and Plo

PragPub - free monthly magazine for developers

While I love writing, I also like to read other people's efforts. I find that I learn a lot from reading code that other people write. I started seriously reading other people's code in the 1970s - a habit I never outgrew. When I read what other people write, in addition to the content I also pay attention to their writing technique: how they introduce a topic, make points, provide examples, the level of detail they use, etc. Check out the new PragPub - good reading.