Ivan’s private site

June 7, 2012

The long journey to RDFa 1.1

RDFa 1.1 Core, RDFa 1.1 Lite, and XHTML+RDFa 1.1 have just been published as Web Standards, i.e., W3C Recommendations, accompanied by a new edition of the RDFa Primer. Although it is “merely” and update of the previous RDFa 1.0 standard (published in 2008), it is a significant milestone nevertheless. RDFa 1.1 has restructured RDFa 1.0 in terms of the host languages it can be used with, and has also added some important features.

It has been a long journey. The development of RDFa (and I include RDFa 1.0 in this) was slowed down more by “social” rather than technical issues. Indeed, RDFa is at the crossroad of two different communitites which, alas!, had very little interaction before. As its name suggests, RDFa is of course closely related to RDF, i.e., to the communites related to the Semantic Web, Linked Data, RDF, etc. On the other hand, the very goal of RDFa is to add structured data to markup languages (primarily the HTML family, of course, but also SVG, Atom, etc.). This means that RDFa is also relevant to all these communities, often loosely referred to as the “Web Application” community. The interaction between these communities was not always easy, and was often characterized by misunderstandings, different engineering patterns, different concerns. To make things even more difficult, RDFa was also caught in the middle of the XHTML2 vs. HTML5 controversy: after all, the first drafts of RDFa were developed alongside XTHML2 and, although the current RDFa has long moved away from this heritage, the image of being part of XHTML2 stayed.

But all this is behind us now, and should be relegated to history. In my view the result, RDFa 1.1, reflects a good balance between the concerns and usage patterns of these communities; and that is what really counts. RDFa 1.1 allows the usage of prefixed abbreviation for URIs (so called CURIEs) that the RDF community had been using and got used to for many years, but (in contrast to RDFa 1.0) its usage is now optional: authors may choose to use full URIs wherever and whenever they wish. By the way, prefixes for CURIEs are not defined through the @xmlns mechanism inherited from XML (this was probably the single biggest stumbling block around RDFa 1.0): instead, the usage of @xmlns is deprecated in favour of a dedicated @prefix attribute. Finally, a number of well-known vocabularies have predefined prefixes; authors are not required to define prefixes for, say, the Dublin Core, FOAF, Schema.org, or Facebook’s Open Graph Protocol terms; they are automatically recognized. Finally, beyond these facilities with prefixed terms, RDFa 1.1 authors also have the possibility to define a vocabulary for a markup fragment (via the @vocab attribute) and forget about URIs and prefixes altogether: simple terms in property names or types will authomatically be assigned URIs in that vocabulary. This is particularly important when RDFa is used with a single vocabulary (Schema.org or OGP usage comes to mind again).

The behaviour of @property has been made richer, which means that in many (most?) situations the structured data can be expressed with @property alone, without the usage of @rel or @rev (although the usage of these latter is still possible). This increased simplicity is important for authors who are new to this world and may not, initially, grasp the difference between the classical usage of @propery (i.e., literal objects) and @rel (i.e., URI References as objects). (Unfortunately, this change has created some corner-case backward incompatibilities with RDFa 1.0.)

There are also some other, though maybe less significant, improvements. For example, authors can also express (RDF) lists succintly; this means that RDFa 1.1 can be used to describe, e.g., author lists for an article (where order counts a lot) or an OWL vocabulary. Also, an awkwardness in RDFa 1.0, related to XML Literals, have been removed.

The structure of RDFa has also changed. Whereas the definition of RDFa 1.0 was closely intertwined with XHTML, RDFa 1.1 separates the core definition from what it calls “Host Languages”. This means that RDFa is defined in a way that it can be adapted to all types of XML languages as well as HTML5. There are separate specifications on how RDFa 1.1 applies to XHTML1 and for HTML5, as well as for XML in general; this means that RDFa 1.1 can also be used with SVG, Atom, or MathML, because those languages automatically inherit from the XML definitions.

Last but not least: the Working Group has also defined a separate “subset” language, called RDFa 1.1 Lite. This is not a separate RDFa 1.1 dialect, just an authoring subset of RDFa 1.1: an authoring subset that makes it easy for authors to step into this world easily, without being forced to use all the possibilities of RDFa 1.1 (i.e., RDF). It can be expected that a large percentage of RDFa usage can be covered by this subset, but it would also provide a good stepping stone when more complex structures (mixture of many different vocabularies, datatypes, more complex graph structures, etc) are required.

As I said, it has been a long journey. Many people were involved in the work, both in the Working Group but also through comments coming from the public and from major potential users. But now that the result is there, I can safely say: it was worth the effort. Recent figures on the adoption of structured data on the Web (see, for example the reports published at the LDOW 2012 Workshop recently by Peter Mika and Tim Potter, as well as by Hannes Mühleisen and Christian Bizer) can be summarized by a simple statement: structured data in Web pages is now mainstream, thanks to its adoption by search engines (i.e., Schema.org) or companies like Facebook. And RDFa 1.1 has a major role to play in this evolution.

If you are new to RDFa: the RDFa Primer is of course a good starting point, but it is well worth checking out (and possibly contribute to!) the rdfa.info web site which contains references to tools, documents; you can also try out small RDFa snippets. Enjoy!

December 16, 2011

Where we are with RDFa 1.1?

English: RDFa Content Editor

Image via Wikipedia

There has been a flurry of activities around RDFa 1.1 in the past few months. Although a number of blogs and news items have been published on the changes, all those have become “officialized” only the past few days with the publication of the latest drafts, as well as with the publication of RDFa 1.1 Lite. It may be worth looking back at the past few months to have a clearer idea on what happened. I make references to a number of other blogs that were published in the past few months; the interested readers should consult those for details.

The latest official drafts for RDFa 1.1 were published in Spring 2011. However, lot has happened since. First of all, the RDFWA Working Group, working on this specification, has received a significant amount of comments. Some of those were rooted in implementations and the difficulties encountered therein; some came from potential authors who asked for further simplifications. Also, the announcement of schema.org had an important effect: indeed, this initiative drew attention on the importance of structured data in Web pages, which also raised further questions on the usability of RDFa for that usage pattern This came to the fore even more forcefully at the workshop organized by the stakeholders of schema.org in Mountain View. A new task force on the relationships of RDFa and microdata has been set up at W3C; beyond looking at the relationship of these two syntaxes, that task force also raised a number of issues on RDFa 1.1. These issues have been, by and large, accepted and handled by the Working Group (and reflected in the new drafts).

What does this mean for the new drafts? The bottom line: there have been some fundamental changes in RDFa 1.1. For example, profiles, introduced in earlier releases of RDFa 1.1, have been removed due to implementation challenges; however, management of vocabularies have acquired an optional feature that helps vocabulary authors to “bind” their vocabularies to other vocabularies, without introducing an extra burden on authors (see another blog for more details). Another long-standing issue was whether RDFa should include a syntax for ordered lists; this has been done now (see the same blog for further details).

A more recent important change concerns the usage of @property and @rel. Although usage of these attributes for RDF savy authors was never a real problem (the former is for the creation of literal objects, whereas the latter is for URI references), they have proven to be a major obstacle for ‘lambda’ HTML authors. This issue came up quite forcefully at the schema.org workshop in Mountain View, too. After a long technical discussion in the group, the new version reduces the usage difference between the two significantly. Essentially, if, on the same element, @property is present together with, say, @href or @resource, and @rel or @rev is not present, a URI reference is generated as an object of the triple. I.e., when used on a, say, <link> or <a> element, @property  behaves exactly like @rel. It turns out that this usage pattern is so widespread that it covers most of the important use cases for authors. The new version of the RDFa 1.1 Primer (as well as the RDFa 1.1 Core, actually) has a number of examples that show these. There are also some other changes related to the behaviour of @typeof in relations to @property; please consult the specification for these.

The publication of RDFa 1.1 Lite was also a very important step. This defines a “sub-set” of the RDFa attributes that can serve as a guideline for HTML authors to express simple structured data in HTML without bothering about more complex features. This is the subset of RDFa that schema.org will “accept”,  as an alternative to the microdata, as a possible syntax for schema.org vocabularies. (There are some examples on how some schema.org example look like in RDFa 1.1 Lite on a different blog.) In some sense, RDFa 1.1 Lite can be considered like the equivalent of microdata, except that it leaves the door open for more complex vocabulary usage, mixture with different vocabularies, etc. (The HTML Task Force will publish soon a more detailed comparison of the different syntaxes.)

So here is, roughly, where we are today. The recent publications by the W3C RDFWA Working Group have, as I said, ”officialized” all the changes that were discussed since spring. The group decided not to publish a Last Call Working Draft, because the last few weeks’ of work on the HTML Task Force may reveal some new requirements; if not, the last round of publications will follow soon.

And what about implementations? Well, my “shadow” implementation of the RDFa distiller (which also includes a separate “validator” service) incorporates all the latest changes. I also added a new feature a few weeks ago, namely the possibility to serialize the output in JSON-LD (although this has become outdated a few days ago, due to some changes in JSON-LD…). I am not sure of the exact status of Gregg Kellogg’s RDF Distiller, but, knowing him, it is either already in line with the latest drafts or it is only a matter of a few days to be so. And there are surely more around that I do not know about.

This last series of publications have provided a nice closure for a busy RDFa year. I guess the only thing now is to wish everyone a Merry Christmas, a peaceful and happy Hanukkah, or other festivities you honor at this time of the year.  In any case, a very happy New Year!

Enhanced by Zemanta

The Rubric Theme. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 3,616 other followers