Ivan’s private site

June 8, 2010

RDFa API draft

A few weeks ago I have already blogged about the publication of the RDFa 1.1 drafts. An essential, additional, piece of the RDFa puzzle has now been published (as a First Public Working Draft): the RDFa API. (Note that there is no reference to “1.1” in the title: the API is equally valid for versions 1.0 and 1.1 of RDFa. More about this below.)

Defining an API for RDFa was a slightly complex exercise. Indeed, this API has very different constituencies, ie, possible user communities, and these communities have different needs and background knowledge. On the one hand, you have the (for the lack of a better word) “RDF” community, ie, people who are familiar and comfortable with RDF concepts, and are used to handle triples, triple stores, iterating through triples, etc. Essentially, people who either have already a background in using, say, Jena or RDFLib, or who can grasp these concepts easily due to their background. But, on the other hand, there are also people coming more from the traditional Web Application programmers’ community who may not be that familiar with RDF, and who are looking for an easy way to get to the additional data in the (X)HTML content that RDFa provides. Providing a suitable level of abstraction for both of these communities took quite a while, and this is the reason why the RDFa API FPWD could not be published together with RDFa 1.1.

But we have it now. I will give a few examples below on how this API can be used; look at the draft for more details (there more examples there; the examples in this blog use, actually, some of the examples from the document!). The usual caveat applies: this is a working draft with new releases in the future; comments are more than welcome! (Please, send them to public-rdfa-wg@w3.org, rather than answering to this blog.)

The “non-RDF” user

Let me use the same RDFa example as in the previous blog:

<div prefix="relation: http://vocab.org/relationship/ foaf: http://xmlns.com/foaf/0.1/">
   <p about="http://www.ivan-herman.net/foaf#me"
      rel="relation:spouse" resource="http://www.ivan-herman.net/Eva_Boka"
      property="foaf:name">Ivan Herman</p>

Encoding the (RDF) triples:

<http://www.ivan-herman.net/foaf#me> <http://vocab.org/relationship/spouse> <http://www.ivan-herman.net/Eva_Boka> .
<http://www.ivan-herman.net/foaf#me> <http://xmlns.com/foaf/0.1/name> "Ivan Herman" .

So how can one get to these using the API? The simplest approach is to get the collection of property-value pairs assigned to a specific subject. Using the API, one can say:

>> var ivan = document.getItemBySubject("http://www.ivan-herman.net/foaf#me")

yielding an instance of what the document calls a ”Property Group”. This object contains all the property-value pairs (well, in RDF terms, the predicate-object pairs) that are associated with http://www.ivan-herman.net/foaf#me. It is then possible to find out what properties are defined for a subject:

>> print(ivan.properties)

It is also possible relate back to the DOM node that holds the subject via ivan.origin, and use the get method to get to the values of a specific property, ie:

Ivan Herman

Note that, on the level, the user does not really have to understand the details of RDF, of predicates, etc; what one gets back is, essentially, a literal or a representation of an IRI, and that is about it. Simple, isn’t it?

Of course, there is slightly more to it. First of all, finding the property groups only through the subjects may be too restrictive. The API therefore includes similar methods to search through the content via properties (“return all the property groups whose subject has a specific property”) or via type (i.e., rdf:type in RDF terms, @typeof in RDFa terms). One can also search for DOM Nodes rather than for Property Groups. Eg, using

>> document.getElementsByType("http://http://xmlns.com/foaf/0.1/Person")

one can get hold of the elements that are used for persons (e.g., to highlight these nodes or their corresponding subtrees on the screen by manipulating their CSS style). I also used full URI-s everywhere in the examples; it is also possible to set CURIE like prefixes to make the coding a bit simpler and less error prone.

An that is really it for simple applications. Note that many RDFa content (eg, Facebook‘s Open Graph protocol, or Google‘s snippets) include only a few simple statements whose management is perfectly possible with these few methods.

The RDF user

Of course, RDF users may want more and, sometimes, the complexity of the RDF content does require more complex methods. The RDFa API spec does indeed provide a more complex set of possibilities.

The underlying infrastructure for the API is based on the abstract concept of a store. One can create such a store, or can get hold of the default one via:

>> var store = document.data.store

The default store contains the triples that are retrieved from the current page. It is then possible to iterate through the triples, possibly via an extra, user-specified filter. Furthermore, it is possible to  create (RDF) new triples and add them to the store. One can add converter methods that control how typed literals are mapped onto, say, Javascript objects (although an implementation will provide a default set for you). In some ways, fairly standard RDF programming stuff, yielding, for example, to the following code:

>> triples = document.data.store.filter(); // get all the triples
>> for( var i=0; i < triples.size; i++ ) {
>>   print(triples[i]);
>> }
<http://www.ivan-herman.net/foaf#me> <http://vocab.org/relationship/spouse> <http://www.ivan-herman.net/Eva_Boka>
<http://www.ivan-herman.net/foaf#me> <http://xmlns.com/foaf/0.1/name> "Ivan Herman"

There is one aspect that is very well worth emphasizing. The parser to the store is conceptually separated from the store (similarly to, say, RDFLib). What this means is that, though an implementation provides a default parser for the document, it is perfectly possible to add another parser parsing the same document and putting the triples into the same store. The RDFa API document does not require that the parser must be RDFa 1.1; one can create a separate store and use, for example, an RDFa 1.0 parser. But much more interesting is the fact that one can also add a, say, hCard microformat parser that produces RDF triples. The triples may be added to the same store, essentially merging the underlying RDF graphs. Eg, one can have the following:

>> store = document.data.store; // by default, this store has all the RDFa 1.1 content
>> document.data.createParser("hCard",store).parse();

merging the RDFa and the hCard terms in one triple store. I think the possibility to use the RDFa API to, at last, merge the different RDF interpretable syntaxes in this area in one place is very powerful. (It must be noted that the details of the parser interface is still changing; e.g., it is not yet clear how various parsers should be identified. This is for one of the next releases…)

As I said, comments are more than welcome, there is still work to do, but the first, extremely important step has been made!

April 22, 2010

RDFa 1.1 Drafts

Filed under: Semantic Web,Work Related — Ivan Herman @ 15:52
Tags: , , ,

W3C has just published two RDFa 1.1 documents. In the W3C jargon these are called “First Public Working Drafts”, which means that they are by no means complete and features may still change based on community feedback. However, they document the intentions of the Working Group as for the direction it thinks it should take. (Note that another FPWD, namely an RDFa API specification, should follow very soon; hopefully a new version of the HTML+RDFa will be issued soon, too, incorporating these changes.) It may be interesting to summarize some of the important changes compared to the previous version of RDFa; that is what I will attempt to do. By the way, if you want to comment on RDFa 1.1, I would prefer you commented on the Group’s dedicated mailing list rather than on this blog: public-rdfa-wg@w3.org (see also the archives).

So, what are the new things?

1. Separation of RDFa 1.1 Core and RDFa 1.1 XHTML. This is one of the differences visible immediately: instead of one specification (which was the case for RDFa 1.0) we have now two. This comes from the fact that the RDFa attributes may be, and have already been, used in XML languages other than XHTML (SVG or ODF are good examples). Separation of the Core and the XHTML specific features is just a to make such adaptations cleaner. (I will use XHTML examples in this blog, though.)

2. Default term URI (a.k.a. @vocab attribute). RDFa 1.0 has a number of terms that can be used with attributes like @rel or @rev without any further qualifications. Examples are ‘next’ or ‘license’. These values were “inherited” by RDFa 1.0 from HTML; the spec simply assigned a URI for each of those to use them as RDF properties (eg, http://www.w3.org/1999/xhtml/vocab#next or http://www.w3.org/1999/xhtml/vocab#license).

However, a more flexible way of defining such terms, ie, not sticking to a fixed set and URI-s, is very useful for HTML authors in general. That is achieved by the new @vocab attribute: it defines a ‘default term URI’ that is concatenated with any term to form a URI. The mechanism can be applied to most of the RDFa attributes, not only to @rel and @rev. For example, the following RDFa 1.1 code:

<div vocab="http://xmlns.com/foaf/0.1/">
   <p about="#me" typeof="Person" property="name">Ivan Herman</p>

will generate the familiar FOAF triples:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<#me> a foaf:Person;
      foaf:name "Ivan Herman" .

The effect of @vocab is of course valid for the whole subtree starting at <div> (unless another @vocab takes it over somewhere down the line). This makes simple annotations with RDFa very easy for authors who do not want to deal with the intricacies of URI-s and CURIE-s.

3. Profile documents for terms. The problem with the @vocab mechanism is that it works only with one vocabulary. However, in many cases, one want to mix vocabularies: after all, this is one of the main advantages of using RDF (and hence RDFa). Eg, one would like to encode something like:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://vocab.org/relationship/> .
<#me> a foaf:Person;
      foaf:name "Ivan Herman" ;
      rel:spouseOf <http://www.ivan-herman.net/Eva_Boka> .

A solution is to use a profile document. This is a simple RDF document that can be made available in various serializations (though only RFDa is mandatory for an RDFa processor) and describes the mapping of terms to URIs. Eg, one could define the http://example.org/simple-profile.ttl file (using here Turtle syntax for simplicity):

@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
[] rdfa:uri "http://vocab.org/relationship/spouseOf";
  rdfa:term "spouse" .
[] rdfa:uri "http://xmlns.com/foaf/0.1/name";
  rdfa:term "name" .
[] rdfa:uri "http://xmlns.com/foaf/0.1/Person";
  rdfa:term "Person" .

and use that document in an RDFa file as follows:

<div profile="http://example.org/simple-profile.ttl">
   <p about="#me" typeof="Person"
      rel="spouse" resource="http://www.ivan-herman.net/Eva_Boka" 
      property="name">Ivan Herman</p>

yielding the triples we wanted. Note that the @profile attribute allows for several URIs, ie, for several profile documents, and the corresponding term definitions are merged. Here again, a @profile attribute down the tree will supersede term definitions.

Of course, using the profile document is a heavier mechanism and requires, at least conceptually, an extra HTTP request. I am sure there will be community comments on that. However, I personally do not expect average authors to fiddle around much with those profile files. Instead, vocabulary publishers like Google, Yahoo, Facebook, Dublin Core, or Creative Commons may publish the terms they use and understand in the form of such profile documents, and authors could simply refer to those. RDFa tools could (and are encouraged to) cache the information stored in those widely used profile documents which alleviates the HTTP request issue in practice.

4. Profile documents for prefixes. Using profile documents for terms is great but it does not scale very well. If the vocabularies are large then publishers of those vocabularies would have to create and maintain fairly large files. In such cases the CURIE mechanism, already defined in RDFa 1.0, becomes a good alternative: instead of having a separate URI defined explicitly for each term, one can use an abbreviations for the “base” URIs of those vocabularies via prefixes.

In RDFa 1.0 this required an author to add a series of ‘xmlns:XXX’ attributes to the RDFa content. That mechanism is still valid in RDFa 1.1, but one can also define prefixes as part of a profile document. Eg, the previous turtle fragment could have been written as

@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
[] rdfa:uri "http://vocab.org/relationship/";
  rdfa:prefix "relation" .
[] rdfa:uri "http://xmlns.com/foaf/0.1/";
  rdfa:prefix "foaf" .

and the RDFa fragment would then look like:

<div profile="http://example.org/simple-profile.ttl">
   <p about="#me" typeof="foaf:Person"
      rel="relation:spouse" resource="http://www.ivan-herman.net/Eva_Boka" 
      property="foaf:name">Ivan Herman</p>

to generate the same triples as before. This mechanism may become important, as I said, when several large vocabularies (or simply a large number of vocabularies) are used.

5. Usage of @prefix to define CURIE prefixes. Actually, the usage of the xmlns:XXX syntax to set CURIE prefixes was (and is) controversial; there may be host languages that do not work with such attributes. RDFa 1.1 provides therefore an alternative which, though semantically identical, avoids the usage of a ‘:’ character in the attribute name. Using this @prefix attribute an alternative to the previous RDFa could be:

<div prefix="relation: http://vocab.org/relationship/ foaf: http://xmlns.com/foaf/0.1/">
   <p about="#me" typeof="foaf:Person"
      rel="relation:spouse" resource="http://www.ivan-herman.net/Eva_Boka" 
      property="foaf:name">Ivan Herman</p>

which looks very much like an RDFa 1.0 document but using the @prefix attributes instead of @xmlns:foaf and @xmlns:relation. The old @xmlns: approach remains valid, of course, but the new one is preferred from now on.

6. URIs everywhere. The final item I mention is the possibility to use URI-s everywhere, ie, to bypass the CURIE abbreviation mechanism if so desired. Whereas some of the RDFa 1.0 attributes required the usage of CURIE-s (eg, @rel or @property), this is no longer true in RDFa 1.1. The rules are fairly simple: if an attribute value is of the form ‘pp:xxx’ and the ‘pp’ string cannot be interpreted as a CURIE prefix then the string ‘pp:xxx’ is considered to be a URI and is treated as such in the generated RDF. That also means that CURIE-s can be used in @about and @resource without the slightly awkward ‘safe’ CURIE-s.

The development of RDFa 1.1 is obviously not done yet; lots of details are to be checked, and some additional minor features (eg, possible changes on the handling of XML Literals) are still to be worked out. And, first and foremost, community comments on the directions taken are important. But these First Public Working Drafts give the general direction…

Create a free website or blog at WordPress.com.