Ivan’s private site

November 2, 2010

My first mapping from RDB to RDF using R2RML

The W3C RDB2RDF Working Group has just published a first public Working Draft for the standardized RDB->RDF mapping language called R2RML. I decided that the only way to understand a specification like that is to try to use it for an example. Caveat: this is a “First Public Working Draft” for R2RML, so many things still have to happen and there will be changes.

For several years now I use a simple example in my generic Semantic Web tutorial (see, e.g., the one at SemTech). It is an artificial example referring to an imaginary bookshop’s table:

which is then converted into an RDF Graph:

(And the tutorial story is how this graph can be merged with a graph coming from another bookshop’s data.) Up until now I always glossed over how this mapping is done. Well, so how could that be done with R2RML?

R2RML defines mappings that describe how an RDB table is mapped on triples. (R2RML is in itself in RDF, b.t.w.) Simply put, in R2RML, each row of a table is mapped to an RDF subject; the individual cells, with the column names, provide the object and the predicates, respectively.

If we look at the middle table in the example, it corresponds to the lower right hand part of the graph. The R2RML mapping has to specify that the homepage column should actually produce an RDF Resource as a literal and not a string. Furthermore, the first column should become a blank node; that has to be specified, too. Here is the way this is all specified:

:Table2 rdf:type rr:TriplesMap ;
    rr:logicalTable "Select  ("_:" || ID) AS pid, Name, ("<" || Homepage || ">) AS Home from person_table";
    rr:subjectMap [ a rr:BlankNodeMap ; rr:column "pid" ; ] ;
    rr:propertyObjectMap [ rr:property a:name; rr:column "Name" ] ;
    rr:propertyObjectMap [ a rr:IRIMap ; rr:property a:homepage; rr:column "Home" ] .

What happens here is:

  1. a mapping is defined that turns the original table into a virtual, “logical” table using SQL. The goal here is to generate a blank node ID on the fly, and a URI in NTriple syntax (note, however, that I am not sure it is o.k. to use that approach in the spec!);
  2. the subject for the triples is chosen to be a cell in a specific column (“pid”, generated by the SQL transform of the previous point), and it is also specified that this is a blank node;
  3. the other two properties are specified (for the same subject); the one for the home page also specifies that the object must be a URI resource (as opposed to a Literal).

That is it. Mapping of the bottom table to the lower left hand corner of the graph is also quite similar, I will not go into this here.

But we still need the “root”, so to say, i.e., the node in the upper right hand corner, the top portion of the graph (with the title and the year) and, mainly, we also have to relate the root to the portion of the graph that is generated from the middle table.

First, the following R2RML part does the job of generating the top part of the graph:

:Table1 rdf:type rr:TriplesMap ;
    rr:logicalTable "Select ('<http:..isbn/' || ISBN || '>') AS isbn, 
                     Author, Title, Publisher, Year from book_table";
    rr:subjectMap [ rdf:type rr:IRIMap ; rr:column "isbn" ] ;
    rr:propertyObjectMap [ rr:property a:title ; rr:column "Title" ; ] ;
    rr:propertyObjectMap [ rr:property a:year ; rr:column "Year" ; ] ;

The only role of the mapping to a logical table is to generate a URI from the ISBN; all the other cells are, conceptually, simply copied on the logical table. The rest is fairly straightforward.

The missing trick is to combine, i.e., to “join”, the two tables on the graph. R2RML has a separate construction for that, referred to as “mapping” the foreign keys. The following additional statements should be added to :Table1:

    rr:foreignKeyMap [ 
       rr:key a:author ; 
       rr:parentTriplesMap :Table2 ; rr:joinCondition "{child}.Author = {parent}.pid"
    ] .

Which combines the nodes defined by :Table1 with those of :Table2. And voilà! We’re done: the R2RML document is ready, i.e., an R2RML engine would generate my example table into my example graph.

Of course, there are more complicated possibilities. Triples, or whole rows, can be explicitly stored in a specific named graph, for example. Or a column defining a predicate could, actually, use a cell in another column as an object. Etc. And, to be honest, I am not even 100% sure that above is correct, I may have misunderstood some details. But the “melody” is still clear.

Note the role the SQL based mapping of the original table to the logical table has. For SQL experts, most of the work can be done there, i.e., the resulting RDF graph can be ready for further usage by an application, to be linked into the LOD, to be used with the right attributes, namespaces, etc. Which is very powerful indeed, provided… the user has the necessary SQL expertise. And, while that is obviously true for database managers, it is not necessarily true for RDF experts. For those, a slightly different model seems to be more appropriate: they would prefer to get an RDF graph ASAP, so to say, without any fancy transformation, and would then use RIF, SWSRL, SPARQL’s CONSTRUCT, etc., to turn it into the RDF graph they eventually want to have. In other words, they may not need the concept of a logical table. That is what is referred to by the group as the “default” mapping. I.e., what graph does one get if nothing is specified? If that is properly defined then, say, RIF experts can use their expertise instead of SQL. This default mapping is not yet fully specified by the group, but it is on its way; it will be published shortly, and will complete the R2RML picture. So watch that space…

About these ads

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,616 other followers

%d bloggers like this: