Building custom data stores

Creating a custom datastore may seem like a bad idea when great open source tools like Postgres, MongoDB, and CouchDB are available, along with good commercial products such as Datomic, AllegroGraph, and Stardog. Still, the frustration of not having just what I needed for a project (more on requirements later) convinced me to spend some time building my own datastore on top of some available open source libraries.

Much of the motivation for my work on kbsportal.com is to make it possible to develop a larger turnkey information appliance. I have been using MongoDB for this, but even with an application-specific wrapper, MongoDB has been a little awkward for my requirements, which are:

I want a reasonably efficient document store that supports the usual CRUD operations on arbitrary Clojure maps (which can be nested to any depth). Clojure maps are basically what I use to contain and use data so I wanted a datastore that supports this, simply.
I want all text in documents (embedded at any depth in the document) to be searchable.
I need to be able to annotate data stored in documents, and sometimes relationships between documents.
My preferred notation for annotating data is RDF.
I need to be able to efficiently perform SPARQL queries on the RDF annotations.
Coupling between documents and RDF: auto delete of any triples referencing a document ID, if the referenced document is deleted.
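
To make the last requirement concrete, here is a minimal in-memory sketch of the document/triple coupling. It is not the HSQLDB and Sesame code from this project: the names empty-store, add-doc, add-triple, and delete-doc, and the "doc:N" ID format, are all made up for illustration, and the triples are plain vectors rather than real RDF statements.

```clojure
;; Hypothetical in-memory model, only to illustrate the coupling rule.
(def empty-store {:docs {} :triples #{} :next-id 1})

(defn add-doc
  "Stores a (possibly nested) map and returns [new-store doc-id]."
  [store doc]
  (let [id (str "doc:" (:next-id store))]
    [(-> store
         (assoc-in [:docs id] doc)
         (update :next-id inc))
     id]))

(defn add-triple
  "Adds a [subject predicate object] triple; subject or object may be a doc ID."
  [store s p o]
  (update store :triples conj [s p o]))

(defn delete-doc
  "Removes the document and every triple whose subject or object is its ID."
  [store id]
  (-> store
      (update :docs dissoc id)
      (update :triples
              (fn [ts]
                (set (remove (fn [[s _ o]] (or (= s id) (= o id))) ts))))))

;; Usage: deleting the first document also drops the "knows" triple.
(let [[s1 a] (add-doc empty-store {:name "sue jones"})
      [s2 b] (add-doc s1 {:name "bob smith"})
      s3     (-> s2
                 (add-triple a "knows" b)
                 (add-triple b "rdfs:label" "Bob"))]
  (:triples (delete-doc s3 a)))
;; => #{["doc:2" "rdfs:label" "Bob"]}
```

In a real version the delete would have to touch both the document table and the RDF repository, so the two steps would ideally run as one unit of work.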

Initially I was going to write a wrapper library using two datastores as SaaS products: Cloudant (for CouchDB with Lucene indexing) and Dydra.com (for an RDF datastore, with extras). A small wrapper API would have made this all work, but since a lot of what I am doing is in the experimenting phase, I decided that I didn't want to use remote web services for coding experiments. Using these services with a wrapper would be nice for production, but not for hacking.

Anyway, I have built a small project that uses HSQLDB (relational database) and Sesame (RDF :

EDIT: Patrick Logan asked about my use of HSQLDB; not specific to HSQLDB really, but here is the important code (hand edited to try to get it to fit on this web page) for adding documents that are nested maps, indexing them, and searching (note: I usually use Clucy/Lucene for search in Clojure code, but for what I am doing right now, this suffices):

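;; Assumed requires (not shown in the original post):
;;   [clojure.java.jdbc :as sql], [clojure.data.json :as json],
;;   [clojure.walk :refer [postwalk]]
;; hsql-db is the connection spec, defined elsewhere.

(defn index-if-str [x id]
  ;; Index only string values; everything else is ignored.
  (when (string? x)
    (sql/with-connection hsql-db
      (doseq [token (map (fn [s] (.toLowerCase s))
                     (clojure.string/split x #"[ ;.,()]+"))]
        ;; Empty strings are truthy in Clojure, so test with seq.
        (when (seq token)
          (sql/insert-record "search" {:doc_id id :word token}))))))

(defn insert-doc [doc]
  (let [id
        (:id (sql/with-connection hsql-db
               (sql/insert-record
                 "docs" {:json (json/write-str doc)})))]
    ;; postwalk visits every nested value, so strings at any depth get indexed.
    (postwalk (fn [x] (index-if-str x id) x) doc)
    id))

;; (insert-doc {:foo "bar" :i 101 :name "sue jones"})

(defn search [s]
  (let [words (remove empty?
                (clojure.string/split (.toLowerCase s) #"[ ;.,()]+"))
        ;; Quote each token for the IN list, doubling single quotes
        ;; so that a token cannot end the SQL literal early.
        tokens (apply str (interpose ", "
                 (map (fn [w] (str "'" (clojure.string/replace w "'" "''") "'"))
                   words)))]
    (if (empty? words)
      ()
      (let [indices
            (map
              :doc_id
              (sql/with-connection hsql-db
                 (sql/with-query-results results
                    [(str "select * from search where word in (" tokens ")")]
                    (into [] results))))]
        ;; Rank doc IDs by number of matching index rows, highest first.
        (map key (sort-by val > (frequencies indices)))))))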
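
For experimenting at the REPL without a database, the same tokenize, index, and rank idea can be sketched with plain Clojure maps. This is an illustration only, not the code used in the project: tokenize, doc-tokens, build-index, and search-index are made-up names, the index is a map from word to doc IDs instead of the "search" table, and parentheses are treated as token separators, which is my assumption about what the split pattern is meant to do.

```clojure
(require '[clojure.string :as str])

(defn tokenize
  "Lower-cases s and splits on spaces and basic punctuation, dropping empties."
  [s]
  (remove str/blank? (str/split (str/lower-case s) #"[ ;.,()]+")))

(defn doc-tokens
  "Collects tokens from every string nested anywhere in doc."
  [doc]
  (mapcat tokenize (filter string? (tree-seq coll? seq doc))))

(defn build-index
  "Maps each word to a vector of doc IDs, one entry per occurrence.
  docs is a map of id -> document."
  [docs]
  (reduce (fn [idx [id doc]]
            (reduce (fn [i tok] (update i tok (fnil conj []) id))
                    idx
                    (doc-tokens doc)))
          {}
          docs))

(defn search-index
  "Returns doc IDs ordered by number of matching index entries, highest first."
  [idx query]
  (->> (distinct (tokenize query))
       (mapcat (fn [tok] (get idx tok [])))
       frequencies
       (sort-by val >)
       (map key)))

;; Usage:
(def idx
  (build-index {1 {:foo "bar" :i 101 :name "sue jones"}
                2 {:name "sue smith" :note {:text "Sue likes jones, sue."}}}))

(search-index idx "sue jones")
;; => (2 1)
```

Document 2 ranks first because it has four matching entries (three for "sue", one for "jones") against two for document 1. As in the SQL version, the score is a raw count of matching index rows, so long documents that repeat a word are favored.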
