Mark Watson's artificial intelligence and Lisp hacking blog

Posts

Showing posts from July, 2008

New cuil.com search site and other alternative search engines

July 29, 2008

As someone who has spent a lot of my own time experimenting with Nutch, I have long wanted to create my own "niche" search site that indexes only technology sites and shows clustered result categories. So, I am a little envious of the ex-Google employees and their family/friends who (reportedly) had $30 million of venture capital to start the cuil.com search site. Although a lot of the images don't seem to match the search results, cuil.com looks pretty good - I especially like the "Explore by Category" tab that works similarly to another favorite search site, clusty.com. "Explore by Category" is both cool and useful! It is interesting that new search engines can attract a lot of venture capital: competing against the very large investments made by Google, Microsoft, and Yahoo must make investors nervous, but there is the upside of large financial gains if a search startup captures a good fraction of the market.

Open data sources like Metaweb, Wikipedia, and SEC Edgar database

July 28, 2008

I just read a blog post from a few months ago by Toby Segaran (author of the very useful book Programming Collective Intelligence) on link information for board of directors members shared between large corporations. Many years ago I did something similar by combining CIA Factbook and SEC Edgar data, and I still have a SQL dump file on my Open Source web page. Since Toby works at Metaweb he fetched the corporate director link data from Metaweb (Freebase). Freebase sets a high standard for the ease of finding and extracting information. Other sources like Wikipedia (via custom web scraping or fetching their entire database) or the RDF extraction of Wikipedia (DBpedia) are not as simple to use, but still useful. I have a long history of organizing and cataloging information, starting in the 1980s at SAIC. Back in the pre-gopher days, I used to maintain lists (as plain text files) of where to find useful tools and information on FTP sites on the Internet and when someone would ask me where to fin...
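The shared-director links mentioned at the start of this post can be sketched in a few lines of Java. This is only an illustration of the idea, not Toby's code or my old SQL-based version: the class name, the method name, and the company and director names are all made up, and the board data is hard-coded rather than fetched from Freebase or SEC Edgar.

```java
import java.util.ArrayList;
import java.util.Arrays;
import java.util.HashMap;
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;

public class SharedDirectors {

    /**
     * For every pair of companies, collect the directors who sit on both boards.
     * Pairs with no director in common are left out of the result.
     */
    public static Map<String, Set<String>> sharedDirectors(Map<String, Set<String>> boards) {
        Map<String, Set<String>> links = new TreeMap<>();
        // Sort company names so that each pair is visited once, in a stable order.
        List<String> companies = new ArrayList<>(new TreeSet<>(boards.keySet()));
        for (int i = 0; i < companies.size(); i++) {
            for (int j = i + 1; j < companies.size(); j++) {
                Set<String> common = new TreeSet<>(boards.get(companies.get(i)));
                common.retainAll(boards.get(companies.get(j))); // set intersection
                if (!common.isEmpty()) {
                    links.put(companies.get(i) + " <-> " + companies.get(j), common);
                }
            }
        }
        return links;
    }

    public static void main(String[] args) {
        // Hypothetical sample data, not real companies or directors.
        Map<String, Set<String>> boards = new HashMap<>();
        boards.put("Acme Corp", new HashSet<>(Arrays.asList("A. Smith", "B. Jones")));
        boards.put("Globex", new HashSet<>(Arrays.asList("B. Jones", "C. Lee")));
        boards.put("Initech", new HashSet<>(Arrays.asList("D. Patel")));
        System.out.println(sharedDirectors(boards));
    }
}
```

The pairwise loop is quadratic in the number of companies, which is fine for experimenting; for a large data set, an index from director to companies would likely be a better starting point.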

Dynamic language 'goodness': comparing JRuby and Java Semantic Web example programs

July 24, 2008

Although there are several Semantic Web libraries or frameworks that I like to use, I had to choose just one for a DevX article that I am finishing up. I chose to use Sesame. After covering what I think are some "big wins" of using RDF/RDFS/OWL (for some applications) I present some example programs that I hope readers will have lots of fun with. The "wrapper" library that I wrote for Sesame works fine for both Java (which Sesame is written in) and JRuby. I must say that for experimenting with Sesame, JRuby is a lot nicer because the example programs are much shorter and with Ruby duck typing it is easier to write callback handlers, etc. for my wrapper library. Being able to work interactively in a JRuby jirb shell is also a big win for experimenting with code, different SPARQL queries, etc.

Programming for small devices

July 17, 2008

Several years ago I did a few projects for the "Java cell phone" (J2ME) platform, and had a lot of fun with that. After recently setting up NetBeans with the Java ME CDC tools and Eclipse with the most recent Android platform tools, late last night and early this morning I installed Apple's latest developer's tools that include the iPhone SDK and Dashcode. Since I very much like my Nokia N800, I am also interested in medium resolution devices (the N800 has a good 800x480 screen). My interest is in writing web portals that support both browsers and small devices. One option is just creating special CSS for different web browser screen sizes, and another option is rendering page view data as XML or JSON and letting rich clients provide the display and handling of forms, etc. (an option I used several years ago on a customer project). Ideally, I would like to be able to support a wide variety of small devices without a very large investment in my time getting (back) up t...

I am evaluating Google's Protocol Buffers for my knowledgebooks.com KB_bundle product

July 13, 2008

I am working on a new Java version of my knowledgebooks.com KB_bundle product (see home page for an overview) that implements an all-in-one toolbox for Natural Language Processing (NLP), entity extraction from text, text summarizing, text clustering, knowledge extraction to RDF/RDFS, support for document management (file management, index/search), and SPARQL queries of either embedded or external RDF data stores. KB_bundle will be free for non-commercial use and evaluation, and available for a fee for commercial use. While I designed KB_bundle as an embedded Java library, I have always planned for both RESTful and SOAP web service support. I have been looking at Google's Protocol Buffer documentation and examples this weekend and I think that I will also supply a third wrapper for Protocol Buffer RPC support. Earlier this year, a project that I was working on had performance problems due to the overhead of serializing data to XML and then parsing it in a REST-based system. The p...
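To give a rough feel for the XML serialization overhead mentioned above, here is a minimal sketch that encodes the same small record as XML text and as a compact binary layout, using only JDK classes. This is not Protocol Buffers and does not use its wire format; the record (an id, a name, and a score) and the class and method names are made up for illustration, and the XML is built naively without escaping.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;

public class EncodingSizeSketch {

    /** Encode a hypothetical entity record as XML text (no escaping, illustration only). */
    public static byte[] toXml(int id, String name, double score) {
        String xml = "<entity><id>" + id + "</id><name>" + name
                + "</name><score>" + score + "</score></entity>";
        return xml.getBytes(StandardCharsets.UTF_8);
    }

    /** Encode the same record as a fixed binary layout: int, length-prefixed string, double. */
    public static byte[] toBinary(int id, String name, double score) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(buffer)) {
            out.writeInt(id);       // 4 bytes
            out.writeUTF(name);     // 2-byte length prefix plus modified UTF-8 bytes
            out.writeDouble(score); // 8 bytes
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    public static void main(String[] args) {
        byte[] xml = toXml(42, "Semantic Web", 0.87);
        byte[] binary = toBinary(42, "Semantic Web", 0.87);
        System.out.println("XML bytes:    " + xml.length);
        System.out.println("binary bytes: " + binary.length);
    }
}
```

Size is only part of the story: the binary form also avoids running an XML parser on the receiving side, which is where much of the cost tends to show up.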

OpenDS 1.0 LDAPv3 server

July 12, 2008

OpenDS 1.0 LDAP server has just been released and was easy to install, configure, and run. One thing that I especially like is that it is set up by default to run nicely in a development environment (including test data to play with) with directions for reconfiguring for production use with replication. I used the JNLP setup file, hitting this link and accepted the standard install options (installed in my home directory in ~/OpenDS). There are test command line clients to test the installation and configuration; for example:

    markw$ bin/ldapsearch --hostname localhost --port 1389 --baseDN "dc=example,dc=com" --searchScope base "(objectClass=*)"
    dn: dc=example,dc=com
    objectClass: domain
    objectClass: top
    dc: example

and then you can use JNDI APIs for Java client LDAP enabled applications. I think that Sun is going to offer good support for Glassfish + OpenDS (if they don't already). BTW, I have many years of good experiences developing on the Tomcat platform (a...
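For the JNDI side, a minimal sketch of the same base-scope search as the ldapsearch command in this post might look like the following. It assumes the default development install is listening on localhost:1389 and allows anonymous reads of dc=example,dc=com; the class name and the buildEnvironment helper are made up for this example.

```java
import java.util.Hashtable;
import javax.naming.Context;
import javax.naming.NamingEnumeration;
import javax.naming.NamingException;
import javax.naming.directory.DirContext;
import javax.naming.directory.InitialDirContext;
import javax.naming.directory.SearchControls;
import javax.naming.directory.SearchResult;

public class LdapSearchSketch {

    /** Build the JNDI environment for an anonymous connection to an LDAP server. */
    public static Hashtable<String, String> buildEnvironment(String host, int port) {
        Hashtable<String, String> env = new Hashtable<>();
        // The LDAP provider that ships with the JDK.
        env.put(Context.INITIAL_CONTEXT_FACTORY, "com.sun.jndi.ldap.LdapCtxFactory");
        env.put(Context.PROVIDER_URL, "ldap://" + host + ":" + port);
        return env;
    }

    public static void main(String[] args) throws NamingException {
        // Assumes a server is running locally on port 1389.
        DirContext ctx = new InitialDirContext(buildEnvironment("localhost", 1389));
        try {
            SearchControls controls = new SearchControls();
            // OBJECT_SCOPE corresponds to "--searchScope base" on the command line.
            controls.setSearchScope(SearchControls.OBJECT_SCOPE);
            NamingEnumeration<SearchResult> results =
                    ctx.search("dc=example,dc=com", "(objectClass=*)", controls);
            while (results.hasMore()) {
                SearchResult result = results.next();
                System.out.println("dn: " + result.getNameInNamespace());
                System.out.println(result.getAttributes());
            }
        } finally {
            ctx.close();
        }
    }
}
```

Against a production setup you would normally also set Context.SECURITY_PRINCIPAL and Context.SECURITY_CREDENTIALS in the environment instead of binding anonymously.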
