Ivan’s private site

November 23, 2010

My first mapping from RDB to RDF using a direct mapping, cont.

A few days ago I posted a blog on how the RDB to RDF direct mapping could be used for a simple example. I do not want to repeat the whole blog: the essence of it was that database tables were mapped onto a simple RDF Graph (this is what the direct mapping does) and the resulting graph was transformed into the “target” graph using the following SPARQL 1.1 construct:

CONSTRUCT {
  ?id a:title ?title ;
    a:year  ?year ;
    a:author _:x .
  _:x a:name ?name ;
    a:homepage ?hp .
}
WHERE {
  SELECT (IRI(fn:concat("http://...",?isbn)) AS ?id)
          ?title ?year ?name
         (IRI(?homepage) AS ?hp)
  {
    ?book a  <Book>;
       ?isbn ;
       ?title ;
        ?year ;
       ?author .
    ?author a  <Author>;
       ?name ;
       ?homepage .
  }
}

where the trick was to use a nested SELECT whose main job was to create URI references from strings. I realized that if one uses the latest editors’ version of SPARQL 1.1 (i.e., that version that is much closer to what SPARQL 1.1 will be) then the solution is actually simpler due to the variable assigning possibility that makes the nested SELECT unnecessary:

CONSTRUCT {
  ?id a:title ?title ;
    a:year  ?year ;
    a:author _:x .
  _:x a:name ?name ;
    a:homepage ?hp .
}
WHERE {
  ?book a  <Book>;
     ?isbn ;
     ?title ;
      ?year ;
     ?author .
  ?author a  <Author>;
     ?name ;
     ?homepage .
  BIND (IRI(fn:concat("http://...",?isbn)) AS ?id)
  BIND (IRI(?homepage) AS ?hp)
}

which makes, at least in my view, the mapping even clearer.

But SPARQL is not the only way to transform the graph. Another possibility is to use RIF Core. Essentially the same transformation can indeed be expressed using the RIF Presentation syntax. Here it is (with a little help from Sandro Hawke and Harold Boley):

Forall ?book ?title ?author ?isbn ?year ?id (
  ?id[a:year->?year a:title->?title a:author->?author] :-
    And(
      ?book[rdf:type-> <Book>
             a:isbn->?isbn
             a:title->?title
             a:year->?year
             a:author->?author]
      External(pred:iri-string(?id External( func:concat("http://..." ?isbn ) )))
    )
)
Forall ?author ?name ?hp ?homepage (
 ?author[a:name->?name a:homepage->?hp] :-
   And(
        ?author[rdf:type-> <Author>
                a:name->?name
                a:homepage->?homepage]
        External(pred:iri-string(?hp ?homepage))
  )
)

(as I did in the earlier examples, I did not put the prefix declaration and other syntactic stuffs into the code above.)

The only difference between the two is that I retained the URI for the author, because generating a blank node on the fly in RIF Core does not seem to be possible. A better solution would be, probably, to mint a URI from the ?author variable just like I did for the ISBN value. Other than that, the two solutions are pretty much identical…

November 19, 2010

My first mapping from RDB to RDF using a direct mapping

A few weeks ago I wrote a blog on my first RDB to RDF mapping using R2RML; the W3C RDB2RDF Working Group had just published a first public Working Draft for R2RML. That mapping was based on a specific mapping language (i.e., R2RML). R2RML relies on an R2RML processing done by, for example, the database system, interpreting the language, using some SQL constructions, etc. The R2RML processing depends on the specific schema of the database which guides the mapping.

As I already mentioned in that blog, a “direct” mapping was also in preparation by the Working Group; well, the first public Working Draft of that mapping has just been published. That mapping does not depend on the schema of the database: it defines a general mapping of any relational database structure into RDF; only a base URI has to be specified for the database, everything else is generated automatically. The resulting RDF graph is of course much more coarse than the one generated by R2RML; whereas the result of an R2RML mapping may be a graph using well specified vocabularies, for example, this is not the case for the output of the direct mapping. But that is not really a problem: after all, we have SPARQL or RIF to make transformation on graphs! Ie, the two approaches are really complementary.

What I will do in this blog is to show how the very same example as in my previous blog can be handled by a direct mapping. As a reminder: the toy example I use comes from my  generic Semantic Web tutorial. Here is the (toy) table:

which is then converted into an RDF Graph:

(Just as in the previous case I will ignore the part of the graph dealing with the publisher, which has the same structure as the author part. I will also ignore the prefix definitions.)

The direct mapping of the first and second tables is pretty straightforward. The URI-s are a bit ugly but, well, this is what you get when you use a generic solution. So here it is:

@base <http://book.example/> .
<Book/ID=0006511409X#_> a <Book> ;
  <Book#ISBN> "0006511409X" ;
  <Book#Title> "The Glass Palace" ;
  <Book#Year>  "2000" ;
  <Book#Author> <Author/ID=id_xyz#_> .

<Author/ID=id_xyz#_> a <Author> ;
  <Author#ID> "id_xyz" ;
  <Author#Name> "Ghosh, Amitav" ;
  <Author#Homepage> "http://www.amitavghosh.com" .

Simple, isn’t it?

The result is fairly close to what we want, but not exactly. First of all, we want to use different vocabulary terms (like a:name). Also, note that the direct mapping produces literal objects most of the time, except when there is a “jump” from one table to another. Finally, the resulting graph should use a blank node for the author, which is not the case in the generated graph.

Fortunately, we have tools in the Semantic Web domain to transform RDF graphs. RIF is one possible solution; another is SPARQL, using the CONSTRUCT form. Using SPARQL is an attractive solution because, in practice, the output of the direct mapping may not even be materialized; instead, one would expect a SPARQL engine attached to a particular relational database, mapping the SPARQL queries to the table on the fly. I will use SPARQL 1.1 below because that gives nice facilities to generate RDF URI Resources from strings, i.e., to have “bridges” from literals to URI-s. Here is a possible SPARQL 1.1 query/construct that could be used to achieve what we want:

CONSTRUCT {
  ?id a:title ?title ;
    a:year  ?year ;
    a:author _:x .
  _:x a:name ?name ;
    a:homepage ?hp .
}
WHERE {
  SELECT (IRI(fn:concat("http://...",?isbn)) AS ?id)
          ?title ?year ?name
         (IRI(?homepage) AS ?hp)
  {
    ?book a <Book> ;
      <Book#ISBN> ?isbn ;
      <Book#Title> ?title ;
      <Book#Year>  ?year ;
      <Book#Author> ?author .
    ?author a <Author> ;
      <Author#Name> ?name ;
      <Author#Homepage ?homepage .
  }
}

Note the usage of a nested query; this is used to create new variables representing the URI references to be used by the outer query. The key is the IRI operator. (Both the nesting and the AS in the SELECT are SPARQL 1.1 features.)

That is it. Of course, the question does arise: which one would one use? The direct mapping or R2RML? Apart from the possible restriction that the local database system may implement the direct mapping only, it becomes also a question of taste. The heavy tool in R2RML is, in fact, the embedded SQL query; if one is comfortable with SQL than that is fine. But if the user is more comfortable with Semantic Web tools (e.g., SPARQL or RIF) then the direct mapping might be handier.

(Note that these are evolving documents still. I already know that my previous blog is wrong in the sense that it is not in line with the next version of R2RML. Oh well…)

November 2, 2010

My first mapping from RDB to RDF using R2RML

The W3C RDB2RDF Working Group has just published a first public Working Draft for the standardized RDB->RDF mapping language called R2RML. I decided that the only way to understand a specification like that is to try to use it for an example. Caveat: this is a “First Public Working Draft” for R2RML, so many things still have to happen and there will be changes.

For several years now I use a simple example in my generic Semantic Web tutorial (see, e.g., the one at SemTech). It is an artificial example referring to an imaginary bookshop’s table:

which is then converted into an RDF Graph:

(And the tutorial story is how this graph can be merged with a graph coming from another bookshop’s data.) Up until now I always glossed over how this mapping is done. Well, so how could that be done with R2RML?

R2RML defines mappings that describe how an RDB table is mapped on triples. (R2RML is in itself in RDF, b.t.w.) Simply put, in R2RML, each row of a table is mapped to an RDF subject; the individual cells, with the column names, provide the object and the predicates, respectively.

If we look at the middle table in the example, it corresponds to the lower right hand part of the graph. The R2RML mapping has to specify that the homepage column should actually produce an RDF Resource as a literal and not a string. Furthermore, the first column should become a blank node; that has to be specified, too. Here is the way this is all specified:

:Table2 rdf:type rr:TriplesMap ;
    rr:logicalTable "Select  ("_:" || ID) AS pid, Name, ("<" || Homepage || ">) AS Home from person_table";
    rr:subjectMap [ a rr:BlankNodeMap ; rr:column "pid" ; ] ;
    rr:propertyObjectMap [ rr:property a:name; rr:column "Name" ] ;
    rr:propertyObjectMap [ a rr:IRIMap ; rr:property a:homepage; rr:column "Home" ] .

What happens here is:

  1. a mapping is defined that turns the original table into a virtual, “logical” table using SQL. The goal here is to generate a blank node ID on the fly, and a URI in NTriple syntax (note, however, that I am not sure it is o.k. to use that approach in the spec!);
  2. the subject for the triples is chosen to be a cell in a specific column (“pid”, generated by the SQL transform of the previous point), and it is also specified that this is a blank node;
  3. the other two properties are specified (for the same subject); the one for the home page also specifies that the object must be a URI resource (as opposed to a Literal).

That is it. Mapping of the bottom table to the lower left hand corner of the graph is also quite similar, I will not go into this here.

But we still need the “root”, so to say, i.e., the node in the upper right hand corner, the top portion of the graph (with the title and the year) and, mainly, we also have to relate the root to the portion of the graph that is generated from the middle table.

First, the following R2RML part does the job of generating the top part of the graph:

:Table1 rdf:type rr:TriplesMap ;
    rr:logicalTable "Select ('<http:..isbn/' || ISBN || '>') AS isbn, 
                     Author, Title, Publisher, Year from book_table";
    rr:subjectMap [ rdf:type rr:IRIMap ; rr:column "isbn" ] ;
    rr:propertyObjectMap [ rr:property a:title ; rr:column "Title" ; ] ;
    rr:propertyObjectMap [ rr:property a:year ; rr:column "Year" ; ] ;

The only role of the mapping to a logical table is to generate a URI from the ISBN; all the other cells are, conceptually, simply copied on the logical table. The rest is fairly straightforward.

The missing trick is to combine, i.e., to “join”, the two tables on the graph. R2RML has a separate construction for that, referred to as “mapping” the foreign keys. The following additional statements should be added to :Table1:

    rr:foreignKeyMap [ 
       rr:key a:author ; 
       rr:parentTriplesMap :Table2 ; rr:joinCondition "{child}.Author = {parent}.pid"
    ] .

Which combines the nodes defined by :Table1 with those of :Table2. And voilà! We’re done: the R2RML document is ready, i.e., an R2RML engine would generate my example table into my example graph.

Of course, there are more complicated possibilities. Triples, or whole rows, can be explicitly stored in a specific named graph, for example. Or a column defining a predicate could, actually, use a cell in another column as an object. Etc. And, to be honest, I am not even 100% sure that above is correct, I may have misunderstood some details. But the “melody” is still clear.

Note the role the SQL based mapping of the original table to the logical table has. For SQL experts, most of the work can be done there, i.e., the resulting RDF graph can be ready for further usage by an application, to be linked into the LOD, to be used with the right attributes, namespaces, etc. Which is very powerful indeed, provided… the user has the necessary SQL expertise. And, while that is obviously true for database managers, it is not necessarily true for RDF experts. For those, a slightly different model seems to be more appropriate: they would prefer to get an RDF graph ASAP, so to say, without any fancy transformation, and would then use RIF, SWSRL, SPARQL’s CONSTRUCT, etc., to turn it into the RDF graph they eventually want to have. In other words, they may not need the concept of a logical table. That is what is referred to by the group as the “default” mapping. I.e., what graph does one get if nothing is specified? If that is properly defined then, say, RIF experts can use their expertise instead of SQL. This default mapping is not yet fully specified by the group, but it is on its way; it will be published shortly, and will complete the R2RML picture. So watch that space…

The Rubric Theme. Create a free website or blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,616 other followers