Ivan’s private site

May 12, 2010

RIF (Core) and LOD

Linked Data (Semantic Web) candies
Image by reedster via Flickr

W3C has just published a Proposed Recommendation for the Rule Interchange Format (RIF); this means, in the W3C jargon, that the technical work is done, and the W3C asks its members for a seal of approval to publish it as Recommendation.

Somehow the RIF development was not on the radar screen of the Semantic Web community. There may be many reasons for that, and I think we should just accept this as part of history. The future is much more important; we should indeed realize that RIF is an important piece of the Semantic Web technical architecture and let us do our best to get it embraced widely.

RIF Core is the simplest variant of RIF. It is not very complicated. It is a simple rule language; one can define a series of Horn rules, there are some safety features built in so that the rules can be executed, conceptually, by a forward chaining engine, it has the familiar XSD datatypes with the usual operations, it operates on URI-s, and it has a notion analogous to RDF blank nodes. There is a separate document that describes how RIF (Core) rules operate with RDF data and how the various semantics (RIF, RDF(S), OWL) work together. The details are not really important here, suffices it to say that it, essentially, works like one would expect as a layperson… The RIF syntax is a little bit convoluted for the moment, but there may be work coming up to improve that in form of alternative, more readable syntaxes.

So what can it be used for? At the W3C LOD Camp in Raleigh (held as part of the WWW2010 conference), Sandro Hawke already gave a simple example why RIF should be interesting for LOD applications. Let me add a few further examples that might be of interest.

Remember OWL-RL? The OWL Working Group has defined a subset of OWL that can be handled by rules. The rules themselves were also published by the OWL WG, albeit using an abstract notation. Those rules can be described in RIF Core as well; the RIF group has published this mapping in a separate document. Following those rules a RIF Core engine can handle OWL-RL directly.

Why is that interesting?—you might ask. Well, there has been quite some discussions when defining OWL RL on whether the features included in OWL RL represent the right set for users. Some claimed that there are other OWL features that could be added; others said that, on the contrary, the complexity of OWL RL is already too high and the features should be reduced to make them more palatable to users. In some ways, the usage of RIF Core may make this discussion moot. Indeed, users, or user communities, can define the rules they are interested in RIF by cherry picking the rules described by the RIF WG in the document cited above. They can send those rules to their RIF Core reasoner alongside their data, and get what they want. If that rule set consists only of 2-3 OWL rules, because that is all the application cares about, than all the better, the RIF inference engine will just do its job faster. If the user wants to add OWL features that are not in OWL RL, that may also be doable; the OWL 2 RDF-Based semantics specification is such that, in many cases, the extra rules can be extracted fairly easily from the OWL 2 Full semantics, using the patterns in the RIF/OWL RL document (although I have to emphasize that this does not work in all cases!). Note that this model of “sending” the RIF rule set alongside the RDF data to a reasoner is exactly the way RIF reasoning is being defined for SPARQL1.1 in the separate Entailment Regimes document (still in draft). Note also that I referred to OWL RL here, but the same approach could be used with RDFS with, obviously, a smaller RIF Rule set.

Another, albeit related application of RIF came to my mind reading an email discussion on whether inferences should be materialized for large LOD datasets or not and, if yes, which ones. As an answer to Vasiliy Faronov’s question, Leigh Dodds also proposed a text to be added to his Linked Data Patterns book. The resulting discussion thread was really about which inferences should really be materialized. Materializing them all may not be realistic; but if only a selection of the possible inferences is used (eg, subset of RDFS or OWL) how would consumers of the data know? It looks like RIF may come to rescue. Publishers could simply publish the rules they use for materializing their inferences in RIF. (Again, this is not always possible; RIF cannot cover the whole of OWL. But it does cover a very large percentage of the use cases.) Consumers may actually choose whether they want to download all the triples, including the inferenced triples, or whether they choose to download data from the “core” dataset only together with the RIF file, and materialize the inferences locally using a local RIF engine (or use the RIF file with an RIF Entailment aware SPARQL 1.1 engine).

RIF is and should be considered as integral and essential part of the Semantic Web Technology landscape. Let us hope many implementations of, at least, RIF Core will bloom to make this a reality! (There is a public list of existing implementations so far.)

December 3, 2008

Bridge between SW communities: OWL RL

The W3C OWL Working Group has just published a series of documents for the new version of OWL, most of them being so-called Last Call Working Drafts (which, in the W3C jargon, means that the design is done; after this, it will only change in response to new problems showing up).

There are many aspects of the new OWL 2 that are of a great interest; I would concentrate here on only one of those, namely the so-called OWL RL Profile. OWL 2 defines several “profiles”, which are subsets of the full OWL 2; subsets that have some good properties, e.g., in terms of implementability. OWL RL is one of those. “RL” stands for “Rule Language” and what this means is that OWL RL is simple enough to be implemented by a traditional (say, Prolog-like) rule engine or can be easily programmed directly in just about any programming language. There is of course a price: the possibilities offered by OWL RL are restricted in terms of building a vocabulary, so there is a delicate balance here. Such rule oriented versions of OWL have also precedences: Herman ter Horst published, some years ago, a profile calld pD*; a number of triple store vendors have a similar, restricted versions of OWL implemented in their systems already, referred to as RDFS++, OWLPrime, or OWLIM; and there has been some more theoretical work done by the research community in this direction, too, usually referred to by “DLP”. The goal was common to all of these: find a subset of OWL that is helpful to build simple vocabularies, and that can be implemented (relatively) easily. Such subsets are also widely seen as more easily understandable and usable by communities that work with RDF(S) and need only a “little bit of OWL” for their applications (instead of building more rigorous and complex ontologies which requires extra skills they may not have). Well, this is  the niche of OWL RL.

OWL RL is defined in terms of a functional, abstract syntax (defining a subset of DL) as well as a set of rules of the sort “if that and that triple pattern exists in the RDF Graph then add these and these triples”. The rule set itself is oblivious to the DL restrictions in the sense that it can be used on any RDF graphs, albeit with a possible loss of completeness. (There is a theorem in the document that describes the exact situation if you are interested.)

The number of rules is fairly high (74 in total), which seems to deceive the goal of simplicity. But this is misleading. Indeed, one has to realize that, for example, these rules subsume most of RDFS (e.g., what the meaning of domain, range, or subproperty is). Around 50 out of the 74 rules simply codify such RDFS definitions or their close equivalents in OWL (what it means to be “same as”, to have equivalent/disjoint properties or classes, that sort of things). All of these are simple, obvious, albeit necessary rules. There are only around 20 rules that bring real extra functionality compared to RDFS for building simple vocabularies. Some of these functionalities are:

  • Characterization of properties as being (a)symmetric, functional, inverse functional, inverse, transitive,…
  • Property chains, ie, defining the composition of two or more properties as being the sub property of another one. (Remember the classic “uncle” relationship that cannot be expressed in terms of OWL 1? Well, by chaining “brother” and “parent” one can say that the chain is a subproperty of “uncle” and that is it…)
  • Intersection and union of classes
  • Limited form of cardinality (only maximum cardinality and only with values 0 and 1) and  property restrictions
  • An “easy key” functionality, i.e., deducing the equivalence of two resources if a list of predefined properties have identical values for them (e.g., if two persons have the same name, same email address, and the same home page URI, then the two persons should be regarded as identical)

Some of these features are new in OWL 2 (property chaining, easy keys), others are already been present in OWL 1.

Quick and dirty implementations of OWL RL can be done fairly easily. Either one uses an existing rule engine (say, Jena rules) and lets the rule engine take its course or one encodes the rules directly on top of an RDF environment like Sesame, RDFLib, or Redland, and uses a simple forward chaining cycle. Of course, this is quick and dirty, i.e., not necessary efficient, because it will generate many extra triples. But if the rule engine can be combined with the query system (SPARQL or other), which is the case for most triple store implementations, the actual generation of some of those extra triples (e.g., <r owl:sameAs r> for all resources) may be avoided. Actually, some of the current triple stores already do such tricks with the OWL profiles they implement. (And, well, when I see the incredible evolution on the size and efficiency of triple stores these days, I wonder whether this is really an issue on long term for a large family of applications.) I actually did such quick and dirty implementation in Python; if you are curious what triples are generated via OWL RL for a specific graph, you can try out a small service I’ve set up. (Caveat: it has not been really thoroughly tested yet, i.e., there are bugs. Neither it is particularly efficient. Do not use it for anything remotely serious!).

So what is the possible role of OWL RL in developing SW applications? I think it will become very important. I usually look at OWL RL as some sort of a “bridge” that allows some RDF/SW applications to evolve in different directions. Such as:

  • Some applications may be perfectly happy with OWL RL as is (usually combined with a SPARQL engine to query the resulting, expanded graph), and they do not really need more in term of vocabulary expressiveness. I actually foresee a very large family of applications in this category.
  • Some applications may want to combine OWL RL with some extra, application specific rules. They can rely on a rule engine fed with the OWL RL rules plus the extra application rules. B.t.w., although the details are still to be fleshed out, the goal is that a RIF implementation would accept OWL RL rules and produce what has to be produced. I.e., RIF compatible implementation would provide a nice environment for these types of applications.
  • Some applications may hit, during their evolution, the limitations of OWL RL in terms of vocabulary building (e.g., they might need more precise cardinality restrictions or the full power of property restrictions). In which case they can try expand their vocabulary towards more complex and formal ontologies using, e.g., OWL DL. They may have to accept some more restrictions because they enter the world of DL, and they would require more complex reasoning engines, but that is the price they might be willing to pay. While developers of applications in the other categories would not necessarily care about that, the fact that the language is also defined in terms of a functional syntax makes (i.e., that that version of OWL RL is integral part of OWL 2) this evolution path easier.

Of course, at the moment, OWL RL is still a Draft, albeit in Last Call. Feedbacks and comments of the community as well as the experience of implementers is vital to finalize it. Comments to the Working Group can be sent to public-owl-comments@w3.org (with public archives).

October 15, 2008

Semantic Web and uncertainty

The issue of uncertainty on the Semantic Web has been around for a while now, although it is still largely a research issue (Though not only; C&P has an extension of their Pellet tool to handle a particular probabilistic extension of OWL; but I am not aware of any other commercial system of the kind.) Ken and Kathryn Laskey and Paulo Costa have been organizing a series of workshops (the URSW series) on the subject for several years now (there will be one on the coming ISWC2008 conference, too!), and there was also a W3C Incubator Group on the subject that issued a report not a long time ago. But still a lot to be done…

The reason I remembered all that is because I found a survey of Thomas Lukasiewicz and Umberto Straccia[1] that is worth reading if you are interested in the subject. The survey gives a separate description of probabilistic, possibilistic, and fuzzy extensions of the DL dialect that is at the basis of OWL DL, together with further references if one wants to dig deeper (156 of those!). It is not an easy read at all, and I couldn’t say I understood all the details described there, far from it… But as all good surveys do it gives you an idea or, or refreshes your memory on what is happening in the area. And that is always incredibly useful.

The approaches described in the paper are fairly high level in the sense that (as the authors emphasize, too) the extensions are all on top of SHOIN(D), ie, OWL DL. That makes the constructions sometimes quite complex (mainly in the probabilistic case) and they are probably difficult to use for a lambda user. (Although, who knows. They are certainly complex to implement, but maybe the usage is not that bad. I am not sure.) However, the authors themselves refer to alternative approaches on top of simpler DL dialects (without giving too much details). It would be nice to have a survey on the extensions of a level corresponding to simpler OWL profiles like OWL RL that the OWL WG at W3C is also working on now. Just as OWL RL might be a good “entry point” for a large family of users into the world of OWL, an uncertainty extension of that level might be of a great interest, too…

Reading this survey also reminded me of short paper by Fensel and van Harmelen[2], “Unifying Reasoning and Search to Web Scale”, on which I had a very short blog a while ago. I just wonder whether fuzzy or probabilistic reasoning may not be a good approach to the problems they describe there… Althought this is clearly still a long way off.

Anyway. I learned something today…

  1. Lukasiewicz, Thomas, and Umberto Straccia. “Managing uncertainty and vagueness in description logics for the Semantic Web.” Journal of Web Semantics: Science, Services and Agents on the World Wide Web 6, no. 4 (2008). Available on line as a pre-print.
  2. “Unifying Reasoning and Search to Web Scale”, by Dieter Fensel and Frank van Harmelen, IEEE Internet Computing, Volume 11, No. 2, March/April 2007.

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 2,317 other followers