Ivan’s private site

July 29, 2007

From Wikipedia URI-s to DBpedia URI…

Filed under: Semantic Web, Work Related — Ivan Herman @ 10:49

I hit a small issue the other day, which the DBpedia folks clarified for me. I guess others have hit, or will hit, a similar issue, so it may be worth documenting.

The question I had: what URI should I use for J.S. Bach’s Well-Tempered Clavier? This is a non-trivial question if one wants to use, say, the Music Ontology in cataloging one’s music: indeed there is nothing like, say, ISBN-s in classical music. I am not sure who suggested in the past that using the corresponding Wikipedia URI might be the best option we have. Well, now that we have DBpedia, I thought that the corresponding DBpedia URI (the non-informational resource one, that is) should be a much better choice. But what is the URI?

I typed in “Well tempered clavier” in my Wikipedia search box, and this led directly to

http://en.wikipedia.org/wiki/Well_Tempered_Clavier

which displayed the corresponding article. Fine, so I thought that a simple replacement, yielding

http://dbpedia.org/resource/Well_Tempered_Clavier

would give me the URI. Wrong, 404…

The issue (and that is where I was wrong) is that Wikipedia uses a funny, non-HTTP based redirection. It displays the right content for the search result and keeps a kind of phony URI in the browser’s address bar; it just puts a small note into the article saying “(Redirected from XYZ)”. DBpedia does not keep track of all possible search terms, so a URI minted from the redirected page (“redirected” in the Wikipedia sense) is not the right answer.

What to do? Here is what Richard Cyganiak proposed as a general approach:

  1. Get your search term, have your article displayed
  2. If the Wikipedia in use is not the English one, click on the English link in the left sidebar (as Richard put it: if there is no English link, you are out of luck… :-( )
  3. If the page is redirected (look for the note like the one I referred to), hit the “Article” button on the top of the page. This will redisplay the same article content, but with the “canonical” URI in the browser’s address bar.
  4. Take that URI, and replace the leading http://en.wikipedia.org/wiki by http://dbpedia.org/resource

Voilà! You then get to

http://dbpedia.org/resource/Well-Tempered_Clavier

which can be used as a non-informational resource for this piece of music (note that the only difference between the two URIs is a hyphen!).
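
Step 4 of the recipe above is a plain prefix replacement, so it can be sketched in a few lines of Python. This is only an illustration: the function name is made up for this example, and it assumes the URI passed in is already the canonical one from the “Article” button (the code does not follow Wikipedia redirects).

```python
WIKIPEDIA_PREFIX = "http://en.wikipedia.org/wiki/"
DBPEDIA_PREFIX = "http://dbpedia.org/resource/"


def wikipedia_to_dbpedia(uri):
    """Swap the English Wikipedia prefix for the DBpedia resource prefix.

    Hypothetical helper: it assumes `uri` is the canonical article URI,
    not one that Wikipedia silently redirects.
    """
    if not uri.startswith(WIKIPEDIA_PREFIX):
        raise ValueError("not an English Wikipedia article URI: " + uri)
    return DBPEDIA_PREFIX + uri[len(WIKIPEDIA_PREFIX):]


print(wikipedia_to_dbpedia("http://en.wikipedia.org/wiki/Well-Tempered_Clavier"))
```

Feeding it the redirected form (with the underscore instead of the hyphen) would still produce a syntactically valid URI, just not one that DBpedia knows about, which is exactly the trap described above.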

Thanks to Richard, Sören, and Georgi for enlightening me…

July 24, 2007

OpenDocument Format and RDF

Filed under: Semantic Web, Work Related — Ivan Herman @ 14:04

I must admit this news escaped my attention. I also have not seen it on, say, the blogs aggregated on PlanetRDF (which does not necessarily mean it was not there, just that I was careless!). Anyway, it is worth repeating: the OASIS OpenDocument Technical Committee approved its enhanced metadata support format [pdf] for inclusion in ODF 1.2. What this means for the Semantic Web community is that OpenDocument will rely on RDF for its metadata support. This means, for example, that information on documents in ODF 1.2 can be easily integrated with a bunch of other data. Yay!

Luckily, I read Bruce D’Arcus’ blog this morning. It is worth reading: it contains much more information…

July 22, 2007

Yet another RDFa converter

Filed under: Code, Semantic Web, Work Related — Ivan Herman @ 9:31

I realized a week ago that Dave Beckett’s triplr tool (“Stuff in, triples out”) also includes an RDFa converter now, see his news item of 2007-07-17. Ie, I can now use the URI http://triplr.org/rdfa-rdf/http://rdfa.info/ to extract or refer to the RDF content from the RDFa info page’s RDFa statements. Of course (after all, this is Dave’s tool!) I could also put “turtle” in the URI instead of “rdf” to yield, well, turtle.
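
The URI pattern is simple enough to build programmatically. The sketch below only mirrors the two forms mentioned above (“rdf” and “turtle”); the function name is made up for this example, and I am not assuming anything about other output formats or about the service’s current availability.

```python
def triplr_rdfa_url(source_uri, output="rdf"):
    """Build a triplr URI that extracts the RDFa content of `source_uri`.

    Hypothetical helper; only the two output names mentioned in the
    post are accepted.
    """
    if output not in ("rdf", "turtle"):
        raise ValueError("unsupported output format: " + output)
    return "http://triplr.org/rdfa-" + output + "/" + source_uri


print(triplr_rdfa_url("http://rdfa.info/"))
print(triplr_rdfa_url("http://rdfa.info/", output="turtle"))
```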

The converter, of course, is still based on the latest public release of the RDFa syntax, and many things will change as a result of the current work in the RDFa group (which has become really active in the last few months, so I think a new and significantly better release of the spec will come soon!). But I am sure an update of triplr will follow soon afterwards…

July 17, 2007

Over 1 billion interlinked triples…

Filed under: Semantic Web, Work Related — Ivan Herman @ 11:30

Chris Bizer has just published, on the home page of the “Linking Open Data” project, a new image showing the various interlinked databases:

[Image: Interlinked data]

Over one billion RDF triples, which are interlinked by 180,000 RDF links. Really impressive!

July 15, 2007

PURL to be renewed

Filed under: Semantic Web, Work Related — Ivan Herman @ 15:55

Unless you are a reporter, you rarely read press releases… however, the latest press release of OCLC is worth noting for the Semantic Web community: “OCLC to work with Zepheira to redesign OCLC’s PURL service”.

Two aspects of the announcement really caught my eye:

  1. The way I understand it, the service will provide an implementation of what became known as HttpRange-14, ie, how to define URI-s for informational and non-informational resources. And this is really great: indeed, the theory of HttpRange-14 is one thing, its practical deployment is another. Unless one has access to the configuration of one’s server (eg, to an .htaccess file for an Apache server), it is not that easy to adopt it in practice. With the renewed PURL service this should become a breeze…
  2. The code of PURL will be released as open source. Ie, other services can be set up using the same software and providing similar services. I could see a number of, say, specialized communities making use of that feature in the future. I think this will play a very important role. (For example, look at the way UniProt defines its URI-s these days: the URI scheme used in the announcement, ie, http://purl.uniprot.org/{db}/{id}, suggests that this community could very well make use of the renewed PURL software if they wish.)
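
To make the first point a bit more concrete: the HttpRange-14 resolution says that a URI for a non-informational resource should answer with a “303 See Other” pointing to a document about it. Here is a minimal sketch of that behaviour with Python’s standard http.server module. The /id/ and /doc/ path layout and all names are assumptions made up for this example; it is not how PURL is or will be implemented.

```python
from http.server import BaseHTTPRequestHandler

# Hypothetical URI layout: /id/<name> names the thing itself,
# /doc/<name> is the document that describes it.
THING_PREFIX = "/id/"
DOC_PREFIX = "/doc/"


def see_other_target(path):
    """Return the document path to redirect to, or None if `path` is not a thing URI."""
    if path.startswith(THING_PREFIX) and len(path) > len(THING_PREFIX):
        return DOC_PREFIX + path[len(THING_PREFIX):]
    return None


class RangeHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        target = see_other_target(self.path)
        if target is not None:
            # 303 See Other: the thing is not a document, but one describes it
            self.send_response(303)
            self.send_header("Location", target)
            self.end_headers()
        elif self.path.startswith(DOC_PREFIX):
            name = self.path[len(DOC_PREFIX):]
            body = ("Description of " + name + "\n").encode("utf-8")
            self.send_response(200)
            self.send_header("Content-Type", "text/plain; charset=utf-8")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# To try it locally:
#   from http.server import HTTPServer
#   HTTPServer(("localhost", 8000), RangeHandler).serve_forever()
```

Running a handler like this requires control over the server, which is precisely the hurdle a hosted PURL service would remove.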

By the way, if you wonder who “Zepheira” is, look at their team page. Some familiar faces and names there…

July 6, 2007

SPARQL Endpoint interface to Python

Filed under: Code, Python, Semantic Web, Work Related — Ivan Herman @ 12:43

I played with SPARQL on my local machine, and I also got inspired by Lee’s SPARQL library for Javascript. But, well, I prefer Python… So I made a set of utility classes first for myself, but then I decided to package it more properly. Maybe others can find it useful, too.

The goal is to give some help in turning a SPARQL query into the corresponding HTTP GET request of the SPARQL Protocol, sending it to a SPARQL endpoint somewhere on the Web, and doing something with the results. The simplest usage is something like:

from SPARQL import SPARQLWrapper
queryString = "SELECT * WHERE { ?s ?p ?o. }"
sparql = SPARQLWrapper("http://localhost:2020/sparql")
# add a default graph, though that can also be done in the query string
sparql.addDefaultGraph("http://www.example.com/data.rdf")
sparql.setQuery(queryString)
try:
    ret = sparql.query()  # ret is a file-like object streaming the results in XML
except Exception:
    deal_with_the_exception()  # eg, syntax error in the query
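
For the curious, the GET request behind such a call can be sketched with the standard library alone. The parameter names “query” and “default-graph-uri” come from the SPARQL Protocol; the function name is made up for this example, and the sketch is not necessarily what the wrapper does internally.

```python
from urllib.parse import urlencode


def build_sparql_get_url(endpoint, query, default_graphs=()):
    """Encode a SPARQL query as a SPARQL Protocol GET URL.

    Hypothetical helper: `default_graphs` is any iterable of graph URIs,
    each sent as a separate default-graph-uri parameter.
    """
    params = [("query", query)]
    params.extend(("default-graph-uri", graph) for graph in default_graphs)
    return endpoint + "?" + urlencode(params)


url = build_sparql_get_url(
    "http://localhost:2020/sparql",
    "SELECT * WHERE { ?s ?p ?o. }",
    ["http://www.example.com/data.rdf"],
)
print(url)
```

The resulting URL can then be fetched with any HTTP client; the endpoint answers with the results in whatever format it defaults to.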

To make it even easier to use, conversions to more Python-friendly formats can also be done on the results: eg, turn it into a proper DOM tree if the result is XML, use Bob Ippolito’s simplejson module to convert a return format in JSON into a Python dictionary, or parse it with RDFLib and return an RDFLib Graph in case the return is in RDF/XML. Ie, one could have done:

import SPARQL  # needed so that SPARQL.JSON resolves
try:
    sparql.setReturnFormat(SPARQL.JSON)
    ret = sparql.query()
    dict = ret.convert()
except Exception:
    deal_with_the_exception()

where “dict” is a Python dictionary. There are some more tricks in the library, but that is essentially it…
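
What can one do with such a dictionary? Assuming the endpoint follows the SPARQL JSON results layout (a “head” listing the variables and “results”/“bindings” holding the solutions), a sketch like the following flattens it into plain rows. The sample data is hand-written for this example, not real endpoint output, and the helper name is made up.

```python
import json

# Hand-written sample in the SPARQL JSON results layout (not real endpoint output)
sample = """
{
  "head": {"vars": ["s", "o"]},
  "results": {
    "bindings": [
      {"s": {"type": "uri", "value": "http://www.example.com/a"},
       "o": {"type": "literal", "value": "hello"}}
    ]
  }
}
"""


def rows(result):
    """Yield one {variable: value} mapping per solution."""
    variables = result["head"]["vars"]
    for binding in result["results"]["bindings"]:
        # a variable may be unbound in a given solution, so skip missing ones
        yield {v: binding[v]["value"] for v in variables if v in binding}


table = list(rows(json.loads(sample)))
print(table)
```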

The code is available from my site; the API documentation is included in the distribution (and is also available online).

It is an early release. There are some problems, and I expect some more. I have primarily tested it with two different SPARQL endpoints running on my local machine (joseki3 and virtuoso) and also with some public SPARQL endpoints. There are some differences in the return media type for, eg, JSON or N3; the non-standard arguments (eg, setting the return format) still diverge a bit, etc. But I would expect these to converge over time. However, I am sure that my code will have problems with some of the endpoints, at least on those grounds (or others)…
