Ivan’s private site

October 30, 2009

ISWC2009 4-5

Filed under: Semantic Web, Work Related — Ivan Herman @ 0:59

Fourth day

Shame on me, but I missed the morning keynote… I was a bit late arriving at the conference site and I got stuck in a conversation at breakfast. Things happen…

The most notable event in the morning, at least for me, was the SPARQL WG panel. All members of the Working Group (me included) were on the panel and the room was full. I mean, full: people were standing in the back. And I regard that as a success in itself; it shows not only the overall importance of SPARQL, but also the real interest in the new version, ie, SPARQL 1.1 (in case you have missed it, the first working draft was published just a few days ago). Lee Feigenbaum (co-chair of the group) gave a quick overview of the new features and then the questions came.

The difficulty of the SPARQL 1.1 work is that it has to find a balance between what is realistic to standardize in a relatively short time frame and what would be good to see in a new query language. As a consequence, there are features that the community has discussed but that have not made it into the document, or only in a simplified form. That came up during the discussion, but I had the impression that the audience, by and large, understood this balance. Actually, for some, the set of new features was even too much for an efficient implementation. I have the feeling that the WG will have to publish a separate conformance document (a bit like OWL 2 has), because there is a certain confusion about whether a conforming SPARQL implementation will have to implement, say, update or inference regimes or not. That clearly came up through the questions. Anyway, remember one email address (yes, it is a bit of a mouthful): public-rdf-dawg-comments@w3.org; this is where comments on SPARQL 1.1 have to be sent!

I chaired a session of the in-use track in the afternoon. The paper by Daniel Elenius et al on reasoning about resources (for military exercises) was interesting to me because it was based on reasoning with relatively large OWL ontologies plus rules. The OWL ‘side’ was not very complex (Daniel referred to DLP; today I would probably say OWL 2 RL) but was extended with extra rules. What this shows is that once RIF is finished and published, the combination of OWL with RIF may become very important for tons of practical applications. (As an aside, a nice little joke from Daniel: what is the system used by the military today when planning exercises? The system is called BOGSAT. It stands for ‘Bunch Of Guys Sitting Around a Table’…)

Roland Stuhmer gave a presentation in a very different style, on how user events (clicks, combinations of clicks, etc) can be collected, categorized, integrated into an application, and analyzed with some rules for, eg, targeted ads. The system is based on harvesting not only the structure of the Web page, but also the annotations appearing in the page via RDFa. The result is an RDF structure describing the events that can be sent to a server, analyzed locally, distributed, etc. A nice usage of RDFa, and it is also important to have a Javascript API that can retrieve the RDF triples from the RDFa structure attached to a specific node. (B.t.w., the old graphics standards of the ’80s and ’90s, GKS and PHIGS, had notions of combined event structures with different event types. I do not remember all the details any more, but might it be worth looking at those again in a modern setting?)

Personally, the highlight of the day was the presentation of the Semantic Web Challenge finalists. I was a member of the jury, which meant that I had to review the submissions in advance, and we had two very enjoyable discussions with the rest of the jury on the submissions. We had made the first selection the day before, and this time all finalists gave their presentations and demos. And it was a tough task to choose (that is why we had such long discussions:-) because, well, the submissions were great overall. I do not really want to analyze each of the entries; I do not think it would be appropriate for me in this position. But the winning entry of the challenge, namely TrialX, really made a great impression on me. In short, the application is a consumer-centric tool through which patients can find matching clinical trials in which they want to participate; it also helps those who organize those trials, etc. It is some sort of a matchmaking tool using all kinds of medical ontologies and vocabularies, public health record data, and the like. We should realize the importance of this: here is a great Semantic Web application, winner of the challenge, which is really an application, not only a demonstration, already deployed on the Web (soon as an iPhone app, too), and, to be a bit dramatic, may save lives (and possibly already has). What else do we want as proof that this technology is no longer just an academic exercise?

Fifth day

Only a partial day for me, as far as the conference goes, because I had to fly out before the end… But I could listen to the last keynote of the conference, ie, that of Nova Spivack.

Not surprisingly, Nova talked about Twine-2, a.k.a. T2. I did not really know what T2 was to be; I had only heard that Twine, ie, T1, was moribund. As Nova acknowledged, it is too complicated, too hard for users to really figure out; in fact, most users used it for search, which was not the strongest feature of T1 in the first place.

So T2 is (well, will be) all about semantically backed search. It semantically indexes the Web, with an attempt to extract semantic information from the pages. The user interface is then, essentially, a faceted interface that automatically classifies the search results into different tabs; the user can use these tabs, drill down along other categories, etc. So far nothing radically new, though the user interface Nova showed was indeed very clean and nice. All this is done, internally, via vocabularies/ontologies, using RDF, RDFS, or OWL.

The interesting aspect of T2 (at least as far as I am concerned) is the incorporation of collective knowledge. First of all, T2 will include a system whereby users can add vocabularies that T2 will use in categorization. Users can get those ontologies back in OWL/RDF, they can improve them, etc. The other tool they will provide is a means to help semantically index pages that are, by themselves, not semantically annotated. This can be done via a Firefox extension; users can identify parts of the web pages (I presume, essentially, the DOM nodes) and associate these with classes of specific ontologies. The extension produces an XSLT transformation that can be sent back to the T2 system. Some social mechanism should of course be set up (eg, webmasters annotating their own pages should get a higher priority than third party annotators) but, essentially, it is some sort of a GRDDL transformation by proxy: T2 will have information on how to find the transformation to semantically index specific pages without requiring the modification of the pages themselves (in contrast to GRDDL, where such a transformation is referred to from the page itself).

Of course, the system was a bit controversial in this community; indeed, it was not clear whether T2 would make use of the semantic information that does already exist in pages (microformats, RDFa, …), let alone the Linked Open Data that is already out there. When asked, Nova did not seem to give a clear answer, though, to be fair, he did not specifically say no, and he also said that the semantic index might be given back to the public in the form of linked data. To be decided. It is also not fully clear whether those proxy-GRDDL transformations will be available to the community at large (hopefully the answer is yes…). It will be interesting to see how it plays out (T2 comes out in beta sometime in early 2010). Certainly a project to keep an eye on.

From a slightly more general point of view it is also interesting to note that two out of the three Semantic Web Challenge winners are also semantic search engines with different user interfaces (though sig.ma and VisiNav definitely do use the LOD cloud, no question there…). Definitely an area on the move!

I had the time and, frankly, the energy to really listen to only one more paper in the regular track, namely the paper on functions of RDF language elements, by Bernhard Schandl. A nice idea: imagine a traditional spreadsheet, where each cell holds a collection of resources from an RDF graph, or a function that can manipulate those resources (extract information, produce a new set of resources, etc). Just like in a spreadsheet, if you modify the underlying graph, ie, the resources in a cell, everything is automatically recalculated. And because, just as in a spreadsheet, a function can refer to the result of another function in another cell, one can do fairly complicated transformations and information extraction quite easily. A neat idea, to be tried out from their site.
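
To make the idea a bit more concrete, here is a toy rendering of it in Python with RDFLib. Needless to say, this is my own illustration (with values recomputed on demand), not the authors’ actual system:

from rdflib import Graph, Namespace
from rdflib.namespace import RDF

FOAF = Namespace("http://xmlns.com/foaf/0.1/")

class RDFSheet:
    """Toy 'spreadsheet' over an RDF graph: each cell is a function of the
    graph (and possibly of other cells); values are recomputed on demand,
    so a change in the graph is automatically reflected everywhere."""
    def __init__(self, graph):
        self.graph = graph
        self.cells = {}                    # cell name -> function(sheet)

    def set_cell(self, name, func):
        self.cells[name] = func

    def value(self, name):
        return self.cells[name](self)      # a cell may call sheet.value(...)

g = Graph()
sheet = RDFSheet(g)
sheet.set_cell("A1", lambda s: set(s.graph.subjects(RDF.type, FOAF.Person)))
sheet.set_cell("A2", lambda s: len(s.value("A1")))   # A2 depends on A1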

That is it for ISWC2009. I obviously missed a lot of papers, partly because social life and hallway conversations sometimes had the upper hand, and sometimes simply because there were too many parallel sessions. But it was definitely an enriching week… See you all, hopefully, at ISWC2010, in Shanghai!

September 29, 2009

OWL 2 RL closure

OWL 2 has just been published as a Proposed Recommendation (yay!), which means, in layman’s terms, that the technical work is done, and it is up to the membership of W3C to accept it as a full-blown Recommendation.

As I have already blogged before, I did some implementation work on a specific piece of OWL 2, namely the OWL 2 RL Profile. (I have also blogged about OWL 2 RL and its importance before; nothing to repeat here.) The implementation itself is not really optimized, and it would probably not stand a chance in any large scale deployment (the reader may want to look at the OWL 2 implementation report for alternatives). But I hope that the resulting service can be useful in getting a feel for what OWL 2 RL can give you: by just adding a few triples into the text box you can see what OWL 2 RL means. This is, by the way, an implementation of the OWL 2 RL rule set, which means that it also accepts triples that are not mandated by the Direct Semantics of OWL 2 (a.k.a. OWL 2 DL). Put another way, it is an implementation of a small portion of OWL 2 Full.

The core of my implementation turned out to be really straightforward: a forward chaining structure directly encoded in Python. I use RDFLib to handle the RDF triples and the triple store. Each triple in the RDF Graph is considered and compared to the premises of the rules; if there is a match then new triples are added to the Graph. (Well, most of the rules contain several triples to match against, and the usual approach is to pick one and then explore the Graph further to check the additional matches. Which one to pick is important, though; it may affect the overall speed.) If, through such a cycle, no additional triples are added to the Graph then we are done: the “deductive closure” of the Graph has been calculated. The rules of OWL 2 RL have been carefully chosen so that no new resources are added to the Graph (only new triples), ie, this process eventually stops.
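
For illustration, here is a minimal sketch of such a forward chaining cycle. The names (RULES, closure) are mine and the rule set itself is elided; this shows the spirit of the code, not the actual code of the service:

from rdflib import Graph

RULES = []   # to be populated with rule functions, one per OWL 2 RL rule

def closure(graph):
    """Compute the deductive closure: apply all rules until no new triple appears."""
    while True:
        new_triples = set()
        for rule in RULES:               # each rule inspects the graph and
            new_triples |= rule(graph)   # returns the triples it deduces
        before = len(graph)
        for t in new_triples:
            graph.add(t)                 # Graph.add silently ignores duplicates
        if len(graph) == before:         # nothing new was added: fixpoint reached
            return graph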

The rules themselves are usually simple. Although it is possible, and probably more efficient, to encode the whole process using some sort of a rule engine (I know of implementations based on, eg, Jena’s rules or Jess), one can simply encode the rules using the usual conditional constructs of the programming language; a minimal sketch of one such rule follows below. The number of rules is relatively high, but nothing that a good screen editor could not manage with copy-paste. There were only a few rules that required somewhat more careful coding (usually to take care of lists) or many searches through the graph, like, for example, the rule for property chains (see rule prp-spo2 in the rule set). It is also important to note that the higher number of rules does not really affect the efficiency of the final system; if no triple matches a rule then, well, it just does not fire. The mere existence of an unused rule has no side effect.
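
To give the flavor, here is how one of the simpler rules, prp-trp (transitive properties), might be encoded directly on top of RDFLib. Again, a sketch in the spirit of the implementation, not its actual code:

from rdflib import Graph
from rdflib.namespace import RDF, OWL

def rule_prp_trp(graph):
    """prp-trp: if ?p is transitive, and (?x ?p ?y) and (?y ?p ?z) hold,
    deduce (?x ?p ?z)."""
    deduced = set()
    for p in graph.subjects(RDF.type, OWL.TransitiveProperty):
        for x, y in graph.subject_objects(p):
            for z in graph.objects(y, p):
                if (x, p, z) not in graph:
                    deduced.add((x, p, z))
    return deduced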

So is it all easy and rosy? Not quite. First of all, this implementation is of course simplistic insofar as it generates all possible deduced triples, which include a number of trivial ones (like ?x owl:sameAs ?x for all possible resources). That means that the resulting graph becomes fairly big even if the (optional) axiomatic triples are not added. If the OWL 2 RL process is bound to a query engine (eg, the new version of SPARQL will, hopefully, give a precise specification of what it means to have OWL 2 RL reasoning on the data set prior to a SPARQL query) then many of these trivial triples could be generated at query time only, thereby avoiding an extra load on the database. Well, that is one place where a proof-of-concept and simple implementation like mine loses against a more professional one:-)

The second issue was the contrast between RDF triples and “generalized” RDF triples, ie, triples where literals can appear in subject positions and bnodes can appear as properties. OWL 2 explicitly says that it works with generalized triples and the OWL 2 RL rule set also shows why that is necessary. Indeed, consider the following set of triples:

ex:X rdfs:subClassOf [
  a owl:Restriction;
  owl:onProperty [ owl:inverseOf ex:p ];
  owl:allValuesFrom ex:A
].

This is a fairly standard “idiom” even for simple ontologies; one wants to restrict, so to say, the subjects instead of the objects using an OWL property restriction. In other words, that restriction, combined with

ex:x rdf:type ex:X .
ex:y ex:p ex:x .

should yield

ex:y rdf:type ex:A .

Well, this deduction would not occur through the rule set if non-generalized RDF triples were used. Indeed, the inverse of ex:p is a blank node, ie, using it as a predicate in a triple is not legal; but using that blank node to denote a property is necessary for the full chain of deductions. In other words, to get that deduction to work properly using RDF and rules, the author of the vocabulary would have to give an explicit URI to the inverse of ex:p. Possible, but slightly unnatural. If generalized triples are used, then the OWL 2 RL rules yield the proper result.

It turns out that, in my case, having bnodes as properties was not really an issue, because RDFLib could handle that directly (is that a bug in RDFLib?). But similar, though slightly more complex or even pathological, examples can be constructed involving literals in subject positions, and that was a problem, because RDFLib refused to handle those triples. What I had to do was to exchange each literal in the graph for a new bnode, perform all the deductions using those, and exchange the bnodes “back” for their original literals at the end. (This mechanism is not my invention; it is actually described in the RDF Semantics document, in the section on datatype entailment rules.) B.t.w., the triples returned by the system are all “legal” triples; generalized triples play a role during the deduction only (and illegal triples are filtered out at output).
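
For the record, a minimal sketch of that exchange; the real code is, of course, a bit more involved:

from rdflib import BNode, Literal

def literals_to_bnodes(graph):
    """Exchange each literal for a bnode (the same literal always gets the
    same bnode); return the reverse mapping for later restoration."""
    lit_to_bnode = {}
    for s, p, o in list(graph):
        if isinstance(o, Literal):
            b = lit_to_bnode.setdefault(o, BNode())
            graph.remove((s, p, o))
            graph.add((s, p, b))
    return {b: lit for lit, b in lit_to_bnode.items()}

def bnodes_to_literals(graph, bnode_to_lit):
    """After the deductions: restore literals in object position, and filter
    out triples that would put a literal in subject or predicate position."""
    for s, p, o in list(graph):
        if s in bnode_to_lit or p in bnode_to_lit:
            graph.remove((s, p, o))              # would be an illegal RDF triple
        elif o in bnode_to_lit:
            graph.remove((s, p, o))
            graph.add((s, p, bnode_to_lit[o]))
    return graph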

Literals with datatypes were also a source of problems. This is probably where I spent most of my implementation time (I must thank Michael Schneider who, while developing the test cases for the OWL 2 RDF-Based Semantics, was constantly pushing me to handle those damn datatypes properly…). Indeed, the underlying RDFLib system is fairly lax in checking typed literals against their definition in the XSD specification (eg, issues like minimum or maximum values were not checked…). As a consequence, I had to re-implement the lexical-to-value conversion for all datatypes. Once I found out how to do that (I had to dive a bit into the internals of RDFLib but, luckily, Python is an interpreted language…) it became relatively straightforward, repetitive, and slightly time-consuming work. Actually, using bnodes instead of “real” literals made it easier to implement datatype subsumptions, too (eg, the fact that, say, an xsd:byte is also an xsd:integer). This became important so that the rules would work properly on property restrictions involving datatypes.
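
As an illustration of what that conversion work amounts to, here is the sort of converter and subsumption table involved. Hypothetical names, and only a tiny fragment of the real list:

# A lexical-to-value converter for one datatype; the real implementation
# needs one of these per XSD datatype.
def value_of_xsd_byte(lexical):
    """xsd:byte: an integer whose value space is [-128, 127]."""
    v = int(lexical)                       # raises ValueError on a bad lexical form
    if not -128 <= v <= 127:
        raise ValueError("'%s' is out of range for xsd:byte" % lexical)
    return v

# A (partial) subsumption table: a valid xsd:byte is also a valid xsd:short,
# xsd:int, etc., which matters for restrictions on those datatypes.
XSD_SUBSUMPTION = {
    "xsd:byte":  ("xsd:short", "xsd:int", "xsd:long", "xsd:integer", "xsd:decimal"),
    "xsd:short": ("xsd:int", "xsd:long", "xsd:integer", "xsd:decimal"),
}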

Bottom line: even for a simple implementation, literals, mainly literals with datatypes, are the biggest headache. The rest is really easy. (This is hardly the discovery of the year, but it is nevertheless good to remember…)

I was, actually, carried away a bit once I got a hold of how to handle datatypes, so I also implemented a small “extension” to OWL 2 RL by adding datatype restrictions (one of the really nice new features of OWL 2, though not mandated for OWL 2 RL). Imagine you have the following vocabulary item:

ex:RE a owl:Restriction ;
    owl:onProperty ex:p ;
    owl:someValuesFrom [
      a rdfs:Datatype ;
      owl:onDatatype xsd:integer ;
      owl:withRestrictions (
          [ xsd:minInclusive "1"^^xsd:integer ]
          [ xsd:maxInclusive "6"^^xsd:integer ]
      )
   ] .

which defines a restriction on the property ex:p so that some of its values should be integers in the [1,6] interval. This means that

ex:q ex:p "2"^^xsd:integer.

yields

ex:q rdf:type ex:RE .

And this could be done by a slight extension of OWL 2 RL; no new rules, just adding the datatype restrictions to the datatypes. Nifty…
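
The check behind that deduction is conceptually simple. A small, hypothetical sketch (the facet names and the list format are of my choosing):

def satisfies_facets(value, facets):
    """Check an (already converted) value against facet/limit pairs,
    e.g. [("minInclusive", 1), ("maxInclusive", 6)] for the restriction above."""
    for facet, limit in facets:
        if facet == "minInclusive" and value < limit:
            return False
        if facet == "maxInclusive" and value > limit:
            return False
        # further facets (minExclusive, maxExclusive, pattern, ...) go here
    return True

# with the ontology above: satisfies_facets(2, [("minInclusive", 1),
# ("maxInclusive", 6)]) is True, so ex:q gets rdf:type ex:RE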

That is it. I had fun, and maybe it will be useful to others. The package can also be downloaded and used with RDFLib, by the way…

April 27, 2009

Simple OWL 2 RL service

The W3C OWL Working Group published a number of OWL 2 documents last week, including an updated version of the OWL 2 RL profile. I have already blogged about this profile (“Bridge between SW communities: OWL RL”) when the previous release was published; there are no radical changes in this release, so there is no reason to repeat what was said there.

I have been playing with a simple and naive implementation of OWL 2 RL for a while; I have now decided to live dangerously;-) and release the software and the corresponding service. So… you can go to the OWL 2 RL generator service, give it an RDF graph, and see what RDF triples an OWL 2 RL system should generate. It should give you some idea of what OWL 2 RL is all about.

I cannot emphasize enough that this is not a production level tool. Beyond the bugs that I have not yet found, a proper implementation would, for example, optimize the owl:sameAs triples and, instead of storing them in the graph, would generate them on the fly when, say, a SPARQL request is issued. But my goal was not to produce something optimal; instead, I wanted to see whether OWL 2 RL can be implemented without any sophisticated tools or not. The answer is: yes, it can. This also means that if I could do it, anybody with a basic knowledge of the underlying RDF environment and programming language (RDFLib and Python in this case) can do it, too. No need to be familiar with complex algorithms, rule language implementation tricks, complicated external tools, description logic concepts, whatever…

December 3, 2008

Bridge between SW communities: OWL RL

The W3C OWL Working Group has just published a series of documents for the new version of OWL, most of them being so-called Last Call Working Drafts (which, in the W3C jargon, means that the design is done; after this, it will only change in response to new problems showing up).

There are many aspects of the new OWL 2 that are of great interest; I will concentrate here on only one of them, namely the so-called OWL RL Profile. OWL 2 defines several “profiles”, which are subsets of the full OWL 2; subsets that have some good properties, e.g., in terms of implementability. OWL RL is one of those. “RL” stands for “Rule Language”, and what this means is that OWL RL is simple enough to be implemented by a traditional (say, Prolog-like) rule engine, or can easily be programmed directly in just about any programming language. There is of course a price: the possibilities offered by OWL RL for building a vocabulary are restricted, so there is a delicate balance here. Such rule-oriented versions of OWL also have precedents: Herman ter Horst published, some years ago, a profile called pD*; a number of triple store vendors already have similar, restricted versions of OWL implemented in their systems, referred to as RDFS++, OWLPrime, or OWLIM; and there has been some more theoretical work done by the research community in this direction, too, usually referred to as “DLP”. The goal was common to all of these: find a subset of OWL that is helpful for building simple vocabularies, and that can be implemented (relatively) easily. Such subsets are also widely seen as more easily understandable and usable by communities that work with RDF(S) and need only a “little bit of OWL” for their applications (instead of building more rigorous and complex ontologies, which requires extra skills they may not have). Well, this is the niche of OWL RL.

OWL RL is defined in terms of a functional, abstract syntax (defining a subset of DL), as well as a set of rules of the sort “if these and these triple patterns exist in the RDF Graph then add these and these triples”. The rule set itself is oblivious to the DL restrictions in the sense that it can be used on any RDF graph, albeit with a possible loss of completeness. (There is a theorem in the document that describes the exact situation, if you are interested.)

The number of rules is fairly high (74 in total), which seems to defeat the goal of simplicity. But this is misleading. Indeed, one has to realize that, for example, these rules subsume most of RDFS (e.g., what the meaning of domain, range, or subproperty is). Around 50 of the 74 rules simply codify such RDFS definitions or their close equivalents in OWL (what it means to be “same as”, to have equivalent/disjoint properties or classes, that sort of thing). All of these are simple, obvious, albeit necessary rules. There are only around 20 rules that bring real extra functionality compared to RDFS for building simple vocabularies. Some of these functionalities are:

  • Characterization of properties as being (a)symmetric, functional, inverse functional, inverse, transitive,…
  • Property chains, ie, defining the composition of two or more properties as being a subproperty of another one. (Remember the classic “uncle” relationship that cannot be expressed in terms of OWL 1? Well, by chaining “brother” and “parent” one can say that the chain is a subproperty of “uncle” and that is it…)
  • Intersection and union of classes
  • Limited forms of cardinality (only maximum cardinality, and only with values 0 and 1) and property restrictions
  • An “easy key” functionality, i.e., deducing the equivalence of two resources if a list of predefined properties have identical values for them (e.g., if two persons have the same name, the same email address, and the same home page URI, then the two persons should be regarded as identical); a sketch of this rule follows below

Some of these features are new in OWL 2 (property chaining, easy keys); others were already present in OWL 1.
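
To show how simple such rules can be, here is a simplified sketch of the “easy key” rule in Python on top of RDFLib. In the real rule set the class and the key properties are read off the owl:hasKey triples; here they are simply passed in:

from rdflib import Graph
from rdflib.namespace import RDF, OWL

def easy_key(graph, cls, key_properties):
    """Simplified 'easy key': two instances of cls that share a value for
    every key property are deduced to be owl:sameAs each other."""
    deduced = set()
    instances = list(graph.subjects(RDF.type, cls))
    for x in instances:
        for y in instances:
            if x != y and key_properties and all(
                set(graph.objects(x, p)) & set(graph.objects(y, p))
                for p in key_properties
            ):
                deduced.add((x, OWL.sameAs, y))
    return deduced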

Quick and dirty implementations of OWL RL can be done fairly easily. Either one uses an existing rule engine (say, Jena rules) and lets the rule engine take its course, or one encodes the rules directly on top of an RDF environment like Sesame, RDFLib, or Redland, and uses a simple forward chaining cycle. Of course, this is quick and dirty, i.e., not necessarily efficient, because it will generate many extra triples. But if the rule engine can be combined with the query system (SPARQL or other), which is the case for most triple store implementations, the actual generation of some of those extra triples (e.g., <r owl:sameAs r> for all resources) may be avoided. Actually, some of the current triple stores already do such tricks with the OWL profiles they implement. (And, well, when I see the incredible evolution in the size and efficiency of triple stores these days, I wonder whether this is really an issue in the long term for a large family of applications.) I actually did such a quick and dirty implementation in Python; if you are curious what triples are generated via OWL RL for a specific graph, you can try out a small service I have set up. (Caveat: it has not really been thoroughly tested yet, i.e., there are bugs. Nor is it particularly efficient. Do not use it for anything remotely serious!)
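
As an example of the kind of triples one may not want to hand back, here is a small, hypothetical sketch (mine, not part of any of the systems above) that strips the trivial owl:sameAs triples at output time:

from rdflib import Graph
from rdflib.namespace import OWL

def strip_reflexive_sameas(graph):
    """Drop the trivial <r owl:sameAs r> triples before handing back the
    results; a query-aware implementation would simply never store them."""
    for s, p, o in list(graph.triples((None, OWL.sameAs, None))):
        if s == o:
            graph.remove((s, p, o))
    return graph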

So what is the possible role of OWL RL in developing SW applications? I think it will become very important. I usually look at OWL RL as some sort of a “bridge” that allows some RDF/SW applications to evolve in different directions. Such as:

  • Some applications may be perfectly happy with OWL RL as is (usually combined with a SPARQL engine to query the resulting, expanded graph), and they do not really need more in terms of vocabulary expressiveness. I actually foresee a very large family of applications in this category.
  • Some applications may want to combine OWL RL with some extra, application specific rules. They can rely on a rule engine fed with the OWL RL rules plus the extra application rules. B.t.w., although the details are still to be fleshed out, the goal is that a RIF implementation would accept the OWL RL rules and produce what has to be produced. I.e., a RIF-compatible implementation would provide a nice environment for these types of applications.
  • Some applications may hit, during their evolution, the limitations of OWL RL in terms of vocabulary building (e.g., they might need more precise cardinality restrictions or the full power of property restrictions). In that case they can try to expand their vocabulary towards more complex and formal ontologies using, e.g., OWL DL. They may have to accept some more restrictions because they enter the world of DL, and they would require more complex reasoning engines, but that is a price they might be willing to pay. While developers of applications in the other categories would not necessarily care about it, the fact that the language is also defined in terms of a functional syntax (i.e., that that version of OWL RL is an integral part of OWL 2) makes this evolution path easier.

Of course, at the moment, OWL RL is still a Draft, albeit in Last Call. Feedback and comments from the community, as well as the experience of implementers, are vital to finalize it. Comments to the Working Group can be sent to public-owl-comments@w3.org (with public archives).
