Ivan’s private site

November 19, 2010

My first mapping from RDB to RDF using a direct mapping

A few weeks ago I wrote a blog on my first RDB to RDF mapping using R2RML; the W3C RDB2RDF Working Group had just published a first public Working Draft for R2RML. That mapping was based on a specific mapping language (i.e., R2RML). R2RML relies on an R2RML processing done by, for example, the database system, interpreting the language, using some SQL constructions, etc. The R2RML processing depends on the specific schema of the database which guides the mapping.

As I already mentioned in that blog, a “direct” mapping was also in preparation by the Working Group; well, the first public Working Draft of that mapping has just been published. That mapping does not depend on the schema of the database: it defines a general mapping of any relational database structure into RDF; only a base URI has to be specified for the database, everything else is generated automatically. The resulting RDF graph is of course much more coarse than the one generated by R2RML; whereas the result of an R2RML mapping may be a graph using well specified vocabularies, for example, this is not the case for the output of the direct mapping. But that is not really a problem: after all, we have SPARQL or RIF to make transformation on graphs! Ie, the two approaches are really complementary.

What I will do in this blog is to show how the very same example as in my previous blog can be handled by a direct mapping. As a reminder: the toy example I use comes from my  generic Semantic Web tutorial. Here is the (toy) table:

which is then converted into an RDF Graph:

(Just as in the previous case I will ignore the part of the graph dealing with the publisher, which has the same structure as the author part. I will also ignore the prefix definitions.)

The direct mapping of the first and second tables is pretty straightforward. The URI-s are a bit ugly but, well, this is what you get when you use a generic solution. So here it is:

@base <http://book.example/> .
<Book/ID=0006511409X#_> a <Book> ;
  <Book#ISBN> "0006511409X" ;
  <Book#Title> "The Glass Palace" ;
  <Book#Year>  "2000" ;
  <Book#Author> <Author/ID=id_xyz#_> .

<Author/ID=id_xyz#_> a <Author> ;
  <Author#ID> "id_xyz" ;
  <Author#Name> "Ghosh, Amitav" ;
  <Author#Homepage> "http://www.amitavghosh.com" .

Simple, isn’t it?

The result is fairly close to what we want, but not exactly. First of all, we want to use different vocabulary terms (like a:name). Also, note that the direct mapping produces literal objects most of the time, except when there is a “jump” from one table to another. Finally, the resulting graph should use a blank node for the author, which is not the case in the generated graph.

Fortunately, we have tools in the Semantic Web domain to transform RDF graphs. RIF is one possible solution; another is SPARQL, using the CONSTRUCT form. Using SPARQL is an attractive solution because, in practice, the output of the direct mapping may not even be materialized; instead, one would expect a SPARQL engine attached to a particular relational database, mapping the SPARQL queries to the table on the fly. I will use SPARQL 1.1 below because that gives nice facilities to generate RDF URI Resources from strings, i.e., to have “bridges” from literals to URI-s. Here is a possible SPARQL 1.1 query/construct that could be used to achieve what we want:

CONSTRUCT {
  ?id a:title ?title ;
    a:year  ?year ;
    a:author _:x .
  _:x a:name ?name ;
    a:homepage ?hp .
}
WHERE {
  SELECT (IRI(fn:concat("http://...",?isbn)) AS ?id)
          ?title ?year ?name
         (IRI(?homepage) AS ?hp)
  {
    ?book a <Book> ;
      <Book#ISBN> ?isbn ;
      <Book#Title> ?title ;
      <Book#Year>  ?year ;
      <Book#Author> ?author .
    ?author a <Author> ;
      <Author#Name> ?name ;
      <Author#Homepage ?homepage .
  }
}

Note the usage of a nested query; this is used to create new variables representing the URI references to be used by the outer query. The key is the IRI operator. (Both the nesting and the AS in the SELECT are SPARQL 1.1 features.)

That is it. Of course, the question does arise: which one would one use? The direct mapping or R2RML? Apart from the possible restriction that the local database system may implement the direct mapping only, it becomes also a question of taste. The heavy tool in R2RML is, in fact, the embedded SQL query; if one is comfortable with SQL than that is fine. But if the user is more comfortable with Semantic Web tools (e.g., SPARQL or RIF) then the direct mapping might be handier.

(Note that these are evolving documents still. I already know that my previous blog is wrong in the sense that it is not in line with the next version of R2RML. Oh well…)

About these ads

7 Comments

  1. > That mapping does not depend on the schema of the database: it defines a general mapping of any relational database structure into RDF

    It actually depends strongly on the schema, as the generated URIs reflects the Tables names and the Column names.

    More generally, the Direct Mapping is a function from *a* relational database to RDF, so it is defined for *any* relational database.

    Comment by Alexandre Bertails — November 19, 2010 @ 18:57

    • That is true. What I meant was that the “processing” depends on the schema but I, as a user, do not have to do anything for that. In the case of R2RML I have to, manually, create a mapping based on the details of the schema, whereas in the direct mapping this is done for you.
      I have the impression that we are in a violent agreement:-)

      Comment by Ivan Herman — November 19, 2010 @ 19:46

      • > I have the impression that we are in a violent agreement:-)

        Don’t worry, I knew that before commenting ;-) The whole post proves you understand it *very* well!

        Comment by Alexandre Bertails — November 19, 2010 @ 20:39

  2. With or without a schema, automated conversions of real databases tend to miss a lot because of semantic information that does not appear in the schema. In your example, how do you know that id_xyz in the first table indicates the same instance as in the second table? Without a foreign key, you can’t know that automatically. You can’t even be sure that you can get that information from the column names, because it’s common for column names for the same thing to differ across tables.

    Sometimes values are coded, thus adding semantic information to one who knows the code. Numeric primary keys may be unique to a specific table, not not be unique across all tables. Denormalized databases may contain contradictory information if the update procedures are buggy or not fully synchronized.

    So yes, direct or otherwise automated translation can be valuable, but for many (if not most) databases, it’s far from enough to get most of the value out of the RDB.

    Comment by Tom Passin — November 19, 2010 @ 22:00

    • I was not precise in my blog. The ‘link’ you refer to, ie, that the id_xyz refers to the same instance as in the second, ie, that object of the triple is a URI is automatically done for foreign keys and only for those. Ie, I should have indicated that those entries in my toy table are supposed to be foreign keys indeed. If they are not foreign keys then it is up to the SPARQL level to make the right connections.

      Comment by Ivan Herman — November 20, 2010 @ 13:37

  3. [...] My first mapping from RDB to RDF using a direct mapping [...]

    Pingback by Scott Banwart's Blog » Blog Archive » Distributed Weekly 78 — November 26, 2010 @ 15:31

  4. [...] My first mapping from RDB to RDF using a direct mapping [...]

    Pingback by Länksprutning – 28 November 2010 – Månhus — November 28, 2010 @ 2:05


RSS feed for comments on this post.

The Rubric Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,618 other followers

%d bloggers like this: