Some Haskell hacks: SPARQL queries to DBPedia and using the OpenCalais web service
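For various personal (and a few consulting) projects I need to access DBPedia and other SPARQL endpoints. I use the hsparql Haskell library written by Jeff Wheeler and maintained by Rob Stewart. The following code snippet runs a DESCRIBE query against the DBPedia endpoint for a single resource and flattens the returned triples into tuples of text: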
    {-# LANGUAGE ScopedTypeVariables,OverloadedStrings #-}

    module Sparql2 where

    import Database.HSparql.Connection
    import Database.HSparql.QueryGenerator

    import Data.RDF hiding (triple)
    import Data.RDF.TriplesGraph

    simpleDescribe :: Query DescribeQuery
    simpleDescribe = do
        resource <- prefix "dbpedia" (iriRef "http://dbpedia.org/resource/")
        uri <- describeIRI (resource .:. "Sedona_Arizona")
        return DescribeQuery { queryDescribe = uri }

    doit = do
      (rdfGraph:: TriplesGraph) <- describeQuery "http://dbpedia.org/sparql" simpleDescribe
      --mapM_ print (triplesOf rdfGraph)
      --print "\n\n\n"
      --print rdfGraph
      mapM (\(Triple s p o) ->
              case [s,p,o] of
                [UNode(s), UNode(p), UNode(o)] -> return (s,p,o)
                [UNode(s), UNode(p), LNode(PlainLL o2 l)] -> return (s,p,o2)
                [UNode(s), UNode(p), LNode(TypedL o2 l)] -> return (s,p,o2)
                _ -> return ("no match","no match","no match"))
           (triplesOf rdfGraph)

    main = do
      results <- doit
      print $ results !! 0
      mapM_ print results
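One possible refinement of the pattern match above, offered as a sketch rather than a correction: return a `Maybe` instead of the "no match" sentinel triple, so that unmatched triples can simply be dropped with `mapMaybe`. To stay self-contained, the example uses small stand-in types: the `Node`, `Literal`, and `Triple` definitions here are hypothetical simplifications, not the real rdf4h types, and they use `String` where the library uses `Text`. The names `flatten` and `flattenAll` are also made up for illustration.

```haskell
import Data.Maybe (mapMaybe)

-- Hypothetical stand-ins for the rdf4h node types used above;
-- the real library has its own definitions and uses Text.
data Node
  = UNode String                 -- URI node
  | BNode String                 -- blank node
  | LNode Literal                -- literal node
  deriving (Show, Eq)

data Literal
  = PlainL String                -- plain literal
  | PlainLL String String        -- literal with a language tag
  | TypedL String String         -- literal with a datatype URI
  deriving (Show, Eq)

data Triple = Triple Node Node Node
  deriving (Show, Eq)

-- Flatten a triple to (subject, predicate, object) strings.
-- Returns Nothing for the shapes the original code mapped to "no match".
flatten :: Triple -> Maybe (String, String, String)
flatten (Triple (UNode s) (UNode p) o) =
  case o of
    UNode u             -> Just (s, p, u)
    LNode (PlainLL v _) -> Just (s, p, v)
    LNode (TypedL v _)  -> Just (s, p, v)
    _                   -> Nothing
flatten _ = Nothing

-- Keep only the triples that flatten successfully.
flattenAll :: [Triple] -> [(String, String, String)]
flattenAll = mapMaybe flatten
```

With this shape the caller never has to filter out placeholder rows, and the type makes it explicit that some triples are skipped.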
I find the OpenCalais web service very useful for finding entities in text and for categorizing text. This code snippet uses the same hacks for processing the RDF returned by OpenCalais that I used in my last semantic web book:
NOTE: August 9, 2016: the following example no longer works because of API changes:
    module OpenCalais (calaisResults) where

    import Network.HTTP
    import Network.HTTP.Base (urlEncode)

    import qualified Data.Map as M
    import qualified Data.Set as S

    import Control.Monad.Trans.Class (lift)

    import Data.String.Utils (replace)
    import Data.List (lines, isInfixOf)
    import Data.List.Split (splitOn)
    import Data.Maybe (maybe)

    import System.Environment (getEnv)

    calaisKey = getEnv "OPEN_CALAIS_KEY"

    escape s = urlEncode s

    baseParams = "<c:params xmlns:c=\"http://s.opencalais.com/1/pred/\" xmlns:rdf=\"http://www.w3.org/1999/02/22-rdf-syntax-ns#\"><c:processingDirectives c:contentType=\"text/txt\" c:outputFormat=\"xml/rdf\"></c:processingDirectives><c:userDirectives c:allowDistribution=\"true\" c:allowSearch=\"true\" c:externalID=\"17cabs901\" c:submitter=\"ABC\"></c:userDirectives><c:externalMetadata></c:externalMetadata></c:params>"

    calaisResults s = do
      key <- calaisKey
      let baseUrl = "http://api.opencalais.com/enlighten/calais.asmx/Enlighten?licenseID="
                    ++ key ++ "&content=" ++ (escape s)
                    ++ "&paramsXML=" ++ (escape baseParams)
      ret <- simpleHTTP (getRequest baseUrl) >>= fmap (take 10000) . getResponseBody
      return $ map (\z -> splitOn ": " z) $
        filter (\x -> isInfixOf ": " x && length x < 40)
          (lines (replace "\r" "" ret))

    main = do
      r <- calaisResults "Berlin Germany visited by George W. Bush to see IBM plant. Bush met with President Clinton. Bush said “felt it important to step it up”"
      print r
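The last step of `calaisResults`, which keeps short lines containing ": " and splits them, can be tried without calling the web service. The sketch below reimplements that step using only `base`, so it does not need the MissingH or split packages. The names `splitOnce` and `extractPairs` are hypothetical, and there is one deliberate difference from the original: it splits each line at the first ": " only and returns pairs, whereas `splitOn` splits at every occurrence and returns lists.

```haskell
import Data.List (isPrefixOf)

-- Split a line at the first occurrence of ": ".
-- Returns Nothing when the separator is absent.
splitOnce :: String -> Maybe (String, String)
splitOnce = go ""
  where
    go _ [] = Nothing
    go acc rest@(c:cs)
      | ": " `isPrefixOf` rest = Just (reverse acc, drop 2 rest)
      | otherwise              = go (c : acc) cs

-- Strip carriage returns, keep lines shorter than 40 characters,
-- and split the ones that contain ": " into (key, value) pairs.
extractPairs :: String -> [(String, String)]
extractPairs body =
  [ kv
  | l <- lines (filter (/= '\r') body)
  , length l < 40
  , Just kv <- [splitOnce l]
  ]
```

For a made-up response fragment such as `"Person: George W. Bush\r\nCompany: IBM\r\n"` (not real OpenCalais output), `extractPairs` evaluates to `[("Person","George W. Bush"),("Company","IBM")]`.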
You need to set your OpenCalais developer key in the environment variable OPEN_CALAIS_KEY. The key is free and allows you to make 50K API calls a day (throttled to four per second).
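Because `getEnv` throws an exception when the variable is unset, a gentler way to read the key might use `lookupEnv` from `System.Environment` instead. This is only a sketch: `requireKey` and `loadCalaisKey` are hypothetical helpers, not part of the code above, and treating an empty value as missing is an assumption.

```haskell
import System.Environment (lookupEnv)

-- Turn the result of an environment lookup into either the key
-- or a readable error message. An empty value counts as missing.
requireKey :: Maybe String -> Either String String
requireKey (Just k) | not (null k) = Right k
requireKey _ = Left "OPEN_CALAIS_KEY is not set"

-- Read the key from the environment without throwing.
loadCalaisKey :: IO (Either String String)
loadCalaisKey = requireKey <$> lookupEnv "OPEN_CALAIS_KEY"
```

A caller can then pattern match on `Left` to print the message and exit, rather than crashing with an uncaught `IOError`.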
I have been trying to learn Haskell for about four years, so if anyone has any useful critiques of these code examples, please speak up :-)