Ivan’s private site

June 8, 2010

RDFa API draft

A few weeks ago I have already blogged about the publication of the RDFa 1.1 drafts. An essential, additional, piece of the RDFa puzzle has now been published (as a First Public Working Draft): the RDFa API. (Note that there is no reference to “1.1” in the title: the API is equally valid for versions 1.0 and 1.1 of RDFa. More about this below.)

Defining an API for RDFa was a slightly complex exercise. Indeed, this API has very different constituencies, ie, possible user communities, and these communities have different needs and background knowledge. On the one hand, you have the (for the lack of a better word) “RDF” community, ie, people who are familiar and comfortable with RDF concepts, and are used to handle triples, triple stores, iterating through triples, etc. Essentially, people who either have already a background in using, say, Jena or RDFLib, or who can grasp these concepts easily due to their background. But, on the other hand, there are also people coming more from the traditional Web Application programmers’ community who may not be that familiar with RDF, and who are looking for an easy way to get to the additional data in the (X)HTML content that RDFa provides. Providing a suitable level of abstraction for both of these communities took quite a while, and this is the reason why the RDFa API FPWD could not be published together with RDFa 1.1.

But we have it now. I will give a few examples below on how this API can be used; look at the draft for more details (there more examples there; the examples in this blog use, actually, some of the examples from the document!). The usual caveat applies: this is a working draft with new releases in the future; comments are more than welcome! (Please, send them to public-rdfa-wg@w3.org, rather than answering to this blog.)

The “non-RDF” user

Let me use the same RDFa example as in the previous blog:

<div prefix="relation: http://vocab.org/relationship/ foaf: http://xmlns.com/foaf/0.1/">
   <p about="http://www.ivan-herman.net/foaf#me"
      rel="relation:spouse" resource="http://www.ivan-herman.net/Eva_Boka"
      property="foaf:name">Ivan Herman</p>
</div>

Encoding the (RDF) triples:

<http://www.ivan-herman.net/foaf#me> <http://vocab.org/relationship/spouse> <http://www.ivan-herman.net/Eva_Boka> .
<http://www.ivan-herman.net/foaf#me> <http://xmlns.com/foaf/0.1/name> "Ivan Herman" .

So how can one get to these using the API? The simplest approach is to get the collection of property-value pairs assigned to a specific subject. Using the API, one can say:

>> var ivan = document.getItemBySubject("http://www.ivan-herman.net/foaf#me")

yielding an instance of what the document calls a ”Property Group”. This object contains all the property-value pairs (well, in RDF terms, the predicate-object pairs) that are associated with http://www.ivan-herman.net/foaf#me. It is then possible to find out what properties are defined for a subject:

>> print(ivan.properties)
<http://vocab.org/relationship/spouse>
<http://xmlns.com/foaf/0.1/name>

It is also possible relate back to the DOM node that holds the subject via ivan.origin, and use the get method to get to the values of a specific property, ie:

>>>print(ivan.get("http://xmlns.com/foaf/0.1/name"))
Ivan Herman

Note that, on the level, the user does not really have to understand the details of RDF, of predicates, etc; what one gets back is, essentially, a literal or a representation of an IRI, and that is about it. Simple, isn’t it?

Of course, there is slightly more to it. First of all, finding the property groups only through the subjects may be too restrictive. The API therefore includes similar methods to search through the content via properties (“return all the property groups whose subject has a specific property”) or via type (i.e., rdf:type in RDF terms, @typeof in RDFa terms). One can also search for DOM Nodes rather than for Property Groups. Eg, using

>> document.getElementsByType("http://http://xmlns.com/foaf/0.1/Person")

one can get hold of the elements that are used for persons (e.g., to highlight these nodes or their corresponding subtrees on the screen by manipulating their CSS style). I also used full URI-s everywhere in the examples; it is also possible to set CURIE like prefixes to make the coding a bit simpler and less error prone.

An that is really it for simple applications. Note that many RDFa content (eg, Facebook‘s Open Graph protocol, or Google‘s snippets) include only a few simple statements whose management is perfectly possible with these few methods.

The RDF user

Of course, RDF users may want more and, sometimes, the complexity of the RDF content does require more complex methods. The RDFa API spec does indeed provide a more complex set of possibilities.

The underlying infrastructure for the API is based on the abstract concept of a store. One can create such a store, or can get hold of the default one via:

>> var store = document.data.store

The default store contains the triples that are retrieved from the current page. It is then possible to iterate through the triples, possibly via an extra, user-specified filter. Furthermore, it is possible to  create (RDF) new triples and add them to the store. One can add converter methods that control how typed literals are mapped onto, say, Javascript objects (although an implementation will provide a default set for you). In some ways, fairly standard RDF programming stuff, yielding, for example, to the following code:

>> triples = document.data.store.filter(); // get all the triples
>> for( var i=0; i < triples.size; i++ ) {
>>   print(triples[i]);
>> }
<http://www.ivan-herman.net/foaf#me> <http://vocab.org/relationship/spouse> <http://www.ivan-herman.net/Eva_Boka>
<http://www.ivan-herman.net/foaf#me> <http://xmlns.com/foaf/0.1/name> "Ivan Herman"

There is one aspect that is very well worth emphasizing. The parser to the store is conceptually separated from the store (similarly to, say, RDFLib). What this means is that, though an implementation provides a default parser for the document, it is perfectly possible to add another parser parsing the same document and putting the triples into the same store. The RDFa API document does not require that the parser must be RDFa 1.1; one can create a separate store and use, for example, an RDFa 1.0 parser. But much more interesting is the fact that one can also add a, say, hCard microformat parser that produces RDF triples. The triples may be added to the same store, essentially merging the underlying RDF graphs. Eg, one can have the following:

>> store = document.data.store; // by default, this store has all the RDFa 1.1 content
>> document.data.createParser("hCard",store).parse();

merging the RDFa and the hCard terms in one triple store. I think the possibility to use the RDFa API to, at last, merge the different RDF interpretable syntaxes in this area in one place is very powerful. (It must be noted that the details of the parser interface is still changing; e.g., it is not yet clear how various parsers should be identified. This is for one of the next releases…)

As I said, comments are more than welcome, there is still work to do, but the first, extremely important step has been made!

May 28, 2010

Self-documenting vocabularies using RDFa

Filed under: Semantic Web,Work Related — Ivan Herman @ 18:17
Tags: , , ,

This was one of the use cases some of us had in mind when RDFa was being developed, and it is nice to see that happening in practice… Olaf Hartig and Jun Zhao have recently published a provenance vocabulary. I am not knowledgeable enough to get into the detail of the provenance part. However, what also caught my attention is the way the vocabulary is defined: it is using XHTML+RDFa. So, while the URI above leads to a nice XHTML version of the vocabulary, readable by humans, the same source can also be used to get to the formal, RDF version of the vocabulary. Just use a distiller or extractor of any kind. I.e., do not repeat yourself, even when defining a formal vocabulary… I find this cool.

April 22, 2010

RDFa 1.1 Drafts

Filed under: Semantic Web,Work Related — Ivan Herman @ 15:52
Tags: , , ,

W3C has just published two RDFa 1.1 documents. In the W3C jargon these are called “First Public Working Drafts”, which means that they are by no means complete and features may still change based on community feedback. However, they document the intentions of the Working Group as for the direction it thinks it should take. (Note that another FPWD, namely an RDFa API specification, should follow very soon; hopefully a new version of the HTML+RDFa will be issued soon, too, incorporating these changes.) It may be interesting to summarize some of the important changes compared to the previous version of RDFa; that is what I will attempt to do. By the way, if you want to comment on RDFa 1.1, I would prefer you commented on the Group’s dedicated mailing list rather than on this blog: public-rdfa-wg@w3.org (see also the archives).

So, what are the new things?

1. Separation of RDFa 1.1 Core and RDFa 1.1 XHTML. This is one of the differences visible immediately: instead of one specification (which was the case for RDFa 1.0) we have now two. This comes from the fact that the RDFa attributes may be, and have already been, used in XML languages other than XHTML (SVG or ODF are good examples). Separation of the Core and the XHTML specific features is just a to make such adaptations cleaner. (I will use XHTML examples in this blog, though.)

2. Default term URI (a.k.a. @vocab attribute). RDFa 1.0 has a number of terms that can be used with attributes like @rel or @rev without any further qualifications. Examples are ‘next’ or ‘license’. These values were “inherited” by RDFa 1.0 from HTML; the spec simply assigned a URI for each of those to use them as RDF properties (eg, http://www.w3.org/1999/xhtml/vocab#next or http://www.w3.org/1999/xhtml/vocab#license).

However, a more flexible way of defining such terms, ie, not sticking to a fixed set and URI-s, is very useful for HTML authors in general. That is achieved by the new @vocab attribute: it defines a ‘default term URI’ that is concatenated with any term to form a URI. The mechanism can be applied to most of the RDFa attributes, not only to @rel and @rev. For example, the following RDFa 1.1 code:

<div vocab="http://xmlns.com/foaf/0.1/">
   <p about="#me" typeof="Person" property="name">Ivan Herman</p>
</div>

will generate the familiar FOAF triples:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
<#me> a foaf:Person;
      foaf:name "Ivan Herman" .

The effect of @vocab is of course valid for the whole subtree starting at <div> (unless another @vocab takes it over somewhere down the line). This makes simple annotations with RDFa very easy for authors who do not want to deal with the intricacies of URI-s and CURIE-s.

3. Profile documents for terms. The problem with the @vocab mechanism is that it works only with one vocabulary. However, in many cases, one want to mix vocabularies: after all, this is one of the main advantages of using RDF (and hence RDFa). Eg, one would like to encode something like:

@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix rel: <http://vocab.org/relationship/> .
<#me> a foaf:Person;
      foaf:name "Ivan Herman" ;
      rel:spouseOf <http://www.ivan-herman.net/Eva_Boka> .

A solution is to use a profile document. This is a simple RDF document that can be made available in various serializations (though only RFDa is mandatory for an RDFa processor) and describes the mapping of terms to URIs. Eg, one could define the http://example.org/simple-profile.ttl file (using here Turtle syntax for simplicity):

@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
[] rdfa:uri "http://vocab.org/relationship/spouseOf";
  rdfa:term "spouse" .
[] rdfa:uri "http://xmlns.com/foaf/0.1/name";
  rdfa:term "name" .
[] rdfa:uri "http://xmlns.com/foaf/0.1/Person";
  rdfa:term "Person" .

and use that document in an RDFa file as follows:

<div profile="http://example.org/simple-profile.ttl">
   <p about="#me" typeof="Person"
      rel="spouse" resource="http://www.ivan-herman.net/Eva_Boka" 
      property="name">Ivan Herman</p>
</div>

yielding the triples we wanted. Note that the @profile attribute allows for several URIs, ie, for several profile documents, and the corresponding term definitions are merged. Here again, a @profile attribute down the tree will supersede term definitions.

Of course, using the profile document is a heavier mechanism and requires, at least conceptually, an extra HTTP request. I am sure there will be community comments on that. However, I personally do not expect average authors to fiddle around much with those profile files. Instead, vocabulary publishers like Google, Yahoo, Facebook, Dublin Core, or Creative Commons may publish the terms they use and understand in the form of such profile documents, and authors could simply refer to those. RDFa tools could (and are encouraged to) cache the information stored in those widely used profile documents which alleviates the HTTP request issue in practice.

4. Profile documents for prefixes. Using profile documents for terms is great but it does not scale very well. If the vocabularies are large then publishers of those vocabularies would have to create and maintain fairly large files. In such cases the CURIE mechanism, already defined in RDFa 1.0, becomes a good alternative: instead of having a separate URI defined explicitly for each term, one can use an abbreviations for the “base” URIs of those vocabularies via prefixes.

In RDFa 1.0 this required an author to add a series of ‘xmlns:XXX’ attributes to the RDFa content. That mechanism is still valid in RDFa 1.1, but one can also define prefixes as part of a profile document. Eg, the previous turtle fragment could have been written as

@prefix rdfa: <http://www.w3.org/ns/rdfa#> .
[] rdfa:uri "http://vocab.org/relationship/";
  rdfa:prefix "relation" .
[] rdfa:uri "http://xmlns.com/foaf/0.1/";
  rdfa:prefix "foaf" .

and the RDFa fragment would then look like:

<div profile="http://example.org/simple-profile.ttl">
   <p about="#me" typeof="foaf:Person"
      rel="relation:spouse" resource="http://www.ivan-herman.net/Eva_Boka" 
      property="foaf:name">Ivan Herman</p>
</div>

to generate the same triples as before. This mechanism may become important, as I said, when several large vocabularies (or simply a large number of vocabularies) are used.

5. Usage of @prefix to define CURIE prefixes. Actually, the usage of the xmlns:XXX syntax to set CURIE prefixes was (and is) controversial; there may be host languages that do not work with such attributes. RDFa 1.1 provides therefore an alternative which, though semantically identical, avoids the usage of a ‘:’ character in the attribute name. Using this @prefix attribute an alternative to the previous RDFa could be:

<div prefix="relation: http://vocab.org/relationship/ foaf: http://xmlns.com/foaf/0.1/">
   <p about="#me" typeof="foaf:Person"
      rel="relation:spouse" resource="http://www.ivan-herman.net/Eva_Boka" 
      property="foaf:name">Ivan Herman</p>
</div>

which looks very much like an RDFa 1.0 document but using the @prefix attributes instead of @xmlns:foaf and @xmlns:relation. The old @xmlns: approach remains valid, of course, but the new one is preferred from now on.

6. URIs everywhere. The final item I mention is the possibility to use URI-s everywhere, ie, to bypass the CURIE abbreviation mechanism if so desired. Whereas some of the RDFa 1.0 attributes required the usage of CURIE-s (eg, @rel or @property), this is no longer true in RDFa 1.1. The rules are fairly simple: if an attribute value is of the form ‘pp:xxx’ and the ‘pp’ string cannot be interpreted as a CURIE prefix then the string ‘pp:xxx’ is considered to be a URI and is treated as such in the generated RDF. That also means that CURIE-s can be used in @about and @resource without the slightly awkward ‘safe’ CURIE-s.

The development of RDFa 1.1 is obviously not done yet; lots of details are to be checked, and some additional minor features (eg, possible changes on the handling of XML Literals) are still to be worked out. And, first and foremost, community comments on the directions taken are important. But these First Public Working Drafts give the general direction…

March 24, 2010

Twitter search results in RDF

Filed under: Semantic Web,Work Related — Ivan Herman @ 23:58
Tags: , , ,

This is cool: with the Shredded Tweet site one can search a twitter tag, get back the result in RDFa. Ie, the result can be ran through an RDFa tool to get the data in RDF, ready to be mashed up with other stuffs. For example, one can search the #w3c tag to get to the RDFa page; run it to the RDFa distiller, to yield an RDF data:

<http://twitter.com/Eyalsela/statuses/10981280232> a <sioc:Post> ;
     dcterms:created "2010-03-24T14:23:06+00:00" ;
     sioc:content "At our second meetup: Mobile applications and websites development
        #w3cdf #W3C [English:http://j.mp/bS8qLU Hebrew: http://j.mp/a3i8Tv"@en ;
     sioc:has_creator <http://semantictweet.com/Eyalsela#me> ;
     sioc:has_topic <http://search.twitter.com/search?q=%23W3C>,  ;
     sioc:links_to <http://j.mp/a3i8Tv> . 

<http://twitter.com/JimThijssen/statuses/10979370533> a <sioc:Post> ;
     dcterms:created "2010-03-24T13:40:04+00:00" ;
     sioc:content "Yeahh #www.stagematcher.nl is volledig #html #w3c #certified B-)"@en ;
     sioc:has_creator <http://semantictweet.com/JimThijssen#me> ;
     sioc:has_topic &t;http://search.twitter.com/search?q=%23certified>,
       <http://search.twitter.com/search?q=%23html>,
       <http://search.twitter.com/search?q=%23w3c> .
...

The RDF content also be directed used, eg, in a SPARQL call, but the combined URI-s of the distiller service and the original data; ugly URI, but can be minted programatically fairly easily. This is cool…

Reblog this post [with Zemanta]

December 12, 2009

RDFa usage spreading…

Filed under: Semantic Web,Work Related — Ivan Herman @ 14:53
Tags: ,

It may be that I was not attentive enough, ie, some of these may be old(er) news. But I did hit two interesting RDFa related news yesterday and today (both via twitter, b.t.w.) that I think are really noteworthy.

1. A blog from Priyank Mohan “Online retail : How is using Semantic Technology to define a new trend” reported about a talk given by Jay Myers from Best Buy. Best Buy started using RDFa a while ago already, but they recently added statements using the GoodRelations Ontology that Martin Hepp published. What Jay said (quoting from Priyank’s blog here):

  1. GoodRelations + RDFa improved the rank of the respective pages in Google tremendously… In fact, if you try the query “BestBuy Ferris Bueller” on Google, then the page comes on rank # 1 ahead of the much more established page . This indicates a strong effect of GoodRelations + RDFa on Google’s appreciation of a page.…
  2. Jay also reported a 30 % percent (!) increase in traffic on the BestBuy stores pages
  3. Yahoo observes a 15% increase in the Click-through-Rate (CTR). Nick Cox from Yahoo also recently reported that augmented search results, e.g. those with GoodRelations / RDFa in Yahoo get a 15 % higher Click-through-Rate (CTR).

There has been some discussions on twitter whether those numbers (eg, 30%) are really reliable, and maybe these statements are indeed too good to be fully true. But even if the 30% is only 15%, it is still quite an achievement!

2. This morning I found out that O’Reilly has begun to systemically add RDFa to their catalog pages. Eg, the page on the “Switching to the Mac” book can produce the RDF information using the RDFa distiller. Note the code uses well established vocabularies: Core FRBR, GoodRelations, Foaf, Dublin Core… ie, using this data with other mashup sites become much easier!

Great news. And, by the way, it worth noting that both also relate to Martin’s GoodRelations Ontology. That stuff is really coming to the fore, too…

October 30, 2009

ISWC2009 4-5

Filed under: Semantic Web,Work Related — Ivan Herman @ 0:59
Tags: , , , , , ,

Fourth day

Shame on me, but I missed the morning keynote… I was a bit late arriving to the conference site and I got stuck in a conversation at breakfast. Things happen…

The most notable event in the morning, at least for me, was the SPARQL WG panel. All members of the Working Group (me included) were on the panel and the room was full. I mean, full, people were standing in the back. And I regard that as a success by itself, it shows not only the overall importance of SPARQL, but the real interest around the new version, ie, SPARQL 1.1 (in case you have missed it, the first working draft has just been published a few days ago). Lee Feigenbaum (co-chair of the group) gave a quick overview of the new features and then questions came.

The difficulty of the SPARQL 1.1 work is that it has to find a balance between what is realistic to standardize in a relatively short time frame and what could be good to see in a new query language. As a consequence, there are features that the community has discussed but have not made it into the document, or only in a simple format. That came up during the discussion but I had the impression that the audience, by and large, understood this balance. Actually, for some, the set of new features were even too much for an efficient implementation. I have the feeling that  the WG will have to publish a separate conformance document (a bit like OWL 2 has), because there is a certain confusion on whether a conforming SPARQL implementation will have to implement, say, update or inference regimes or not. That clearly came up through the questions. Anyway, remember one email address (yes, it is a bit of a mouthful): public-rdf-dawg-comments@w3.org this is where comments have to be sent on SPARQL 1.1!

I chaired a session on the use track in the afternoon.  The paper of Daniel Elenius et al on reasoning about resources (for military exercises) was interesting to me because it was based on reasoning with relatively large OWL ontologies plus rules. The OWL ‘side’ was not very complex (Daniel referred to DLP, today I would say probably OWL 2 RL) but extended with extra rules. What this shows that when RIF will be finished and published, the combination of OWL with RIF may become very important for tons of practical applications. (As an aside, a nice little joke from Daniel: what is the system used by the military today when planning for exercises? The system is called BOGSAT. It stands for ‘Bunch Of Guys Sitting Around a Table’…)

Roland Stuhmer gave a very different style presentation on how user events (clicks, combination of clicks, etc) can be collected, categorized, and integrated into an application, analyzed with some rules for, eg, targeted ads. The system is based on harvesting not only the structure of the Web page, but annotations appearing in the Web page via RDFa. The result is an RDF structure describing the events that can be sent to a server, analyzed locally, distributed, etc. Nice usage of RDFa, but also important to have a Javascript API that can retrieve the RDF triplets from the RDFa structure attached to a specific node. (B.t.w., the old graphics standards of the 80′s and 90′s, called GKS or PHIGS, had notions of combined event structures with different event types. I do not remember all the details any more, but may it be worth looking at those again in a modern setting?)

Personally, the highlight of the day was the presentation of the semantic web challenge finalists. I was member of the jury, which meant that I had to review the submissions in advance and we had two very enjoyable discussions with the rest of the jury on the submissions. We had the first selection the day before, and this time all finalists gave their presentations and demos. And it was a tough task to choose (that is why we had such long discussions:-) because, well, the submissions were great overall. I do not really want to analyze each of the entries; I do not think it would be appropriate for me in this position. But the winner entry for the challenge, namely TrialX, really made a great impression on me. In short, the application is a consumer-centric tool through which patients can find matching clinical trials where they want to participate; it also helps those who organize those trials, etc. It is some sort of a matchmaking tool using all kinds of medical ontologies and vocabularies, public health record data and the like. We should realize the importance of this: here is a great Semantic Web application, winner of the challenge, which is really an application, not only demonstration, already deployed on the Web (soon as an iPhone app, too), and, to be a bit dramatic, may (and possibly has already) save lives. What else to we want as a proof that this technology is not only an academic exercise any more?

Fifth day

Only a partial day for me, as far as the conference goes, because I had to fly out before the end… But I could listen to the last keynote of the conference, ie, that of Nova Spivack.

Not surprisingly, Nova talked about Twine-2, a.k.a. T2. I did not really know what T2 was to be, I only heard that Twine, ie, T1, is moribund. As Nova acknowledged, it is too complicated, it is too hard for users to really figure it out; in fact, most of the users used it for search. Which is not the strongest feature of T1 in the first place.

So T2 is (well, will be) all about semantically backed search. It semantically indexes the Web, with an attempt to extract semantic information from the pages. The user interface would then be some sort of, essentially, faceted interface that would automatically classify the search hit results into different tabs; the user can use these tabs, drill down along other categories, etc. So far nothing radically new, though the user interface Nova showed was indeed very clean and nice. All this is done, internally, via vocabularies/ontologies, using RDF, RDFS, or OWL.

The interesting aspect of T2 (at least as far as I am concerned) is the incorporation of collective knowledge. First of all, T2 will include a system whereby users can add vocabularies that T2 will use in categorization. Users can get back those ontologies in OWL/RDF, they can improve them, etc. The other tool they will provide is a means to help semantically index pages that are, by themselves, not semantically annotated. This can be done via a Firefox extension; users can identify parts of the web pages (I presume, essentially, the DOM nodes) and associate these with classes of specific ontologies. The extension produces an XSLT transformation that can be sent back to the T2 system. Some social mechanism should of course be set up (eg, webmasters annotating their own pages should get a higher priority than third party annotators) but, essentially, it is some sort of a GRDDL transformation by proxy: T2 will have information on how to find transformation to semantically index specific pages without requiring the modification of the pages themselves (in contrast to GRDDL where such transformation is to be referred to from the page itself).

Of course, the system was a bit controversial in this community; indeed, it was not clear whether T2 would make use of the semantic information that do exist in pages already (microformats, RDFa, …) let alone the Linked Open Data information that is already out there. When asked, Nova did not seem to give a clear answer though, to be fair, he did not specifically say no and he also said that the semantic index might be put back to the public in the form of linked data. To be decided. It is also not fully clear whether those proxy-GRDDL transformations would be available for the community at large (hopefully the answer is yes…). It will be interesting to see how it plays out (T2 comes out in beta sometimes early 2010). Certainly a project to keep an eye on.

From a slightly more general point of view it is also interesting to note that two out of the three Semantic Challenge winners are also semantic search engines with different user interfaces (though sig.ma and VisiNav definitely do use the LOD cloud, no question there…). Definitely an area on the move!

I had the time and, frankly, the energy to really listen to only one more paper in the regular track, namely the paper on functions of RDF language elements, by Bernhard Schandl. A nice idea: imagine a traditional spreadsheet, where each cell is a collection of resources from an RDF Graph, or functions that can manipulate those resources (extract information, produce new set of resources, etc). Just like a spreadsheet, if you modify the underlying graph, ie, the resources in a cell, everything is automatically recalculated. Because, just like for a spreadsheet, a function can refer to the result of another function in another cell, one can do fairly complicated transformation and information extraction quite easily. Neat idea, to be tried out from their site.

That is it for ISWC2009. I obviously missed a lot of papers, partly because social life and hallway conversations sometimes had the upper hand, and sometimes simply because there were too many parallel sessions. But it was definitely an enriching week… See you all, hopefully, at ISWC2010, in Shanghai!

October 28, 2009

ISWC2009 2-3

Second day

In fact, there is much less to say… In the morning I was on two workshops; I was at the Uncertainty Reasoning on the SW one for a while, but then I was asked to participate at a panel at the Semantics for the Rest of Us one, so I had to switch. This was a bit unfortunate, because I could not really ‘dive in’ to any of the two. And my afternoon was taken up by ‘networking’, catching up with some people on many many issues that are not worth blogging (yet?).

I listened to Kathryn Laskey’s presentation on how to combine probability theory in the mathematical sense (the good old Kolmogorov axiomatic theory on probability that I learned at university in a distant past…) with first order logic. I cannot claim to have really understood all the details but it made me curious enough to put reading her paper on my to do list…

As for the panel “Little vs Large Semantics: What’s next for the Semantic Web languages?”, with Leigh, Kendall and Ora on the panel besides me… it was not that exciting, I must admit. Maybe the main message I take away from it was the passionate request of Chris Welty to re-open RDF (see also Pat’s keynote below!).

Third day (well, first real conference day)

Preamble: I would have wanted to add links to papers. And I couldn’t: I have not found the papers on the Web. Neither on Springer’s site nor elsewhere. I may have missed a reference somewhere, if somebody knows then tell me. But if the papers are not available, I think it is a shame…

The conference began with a keynote of Pat Hayes. Entertaining and also thought provoking; Pat is a great speaker. What really interested me is his talk on ‘RDF Redux’; I was actually anxious to listen to that one at SemTech last June but he had to call this off back then. So he repeated it here. This is typically the kind of talk that needs more thinking afterwards to understand it (and Pat has promised to write it down!), but he essentially proposed to re-think and re-do some of the fundamentals of RDF semantics. Instead of set-based model theory which we have today, and which makes the treatment of b-nodes, shall we say, a bit complicated (some would use harsher words:-) we should consider RDF graphs as ‘things’ on a ‘surface’ (think of it as a real surface on a sheet of paper) and b-nodes are just ‘scratches on that  surface’. (A bit like ‘context’ of a graph?) Because these surfaces are different from one graph to the other, when a merge occurs then in fact a new surface is created where the unified graph is put, and the issue of b-nodes becomes natural (instead of the ‘renaming’ procedure that the current semantics document describes). Pat claims that the whole semantics could be re-written that way and none of the current RDF implementations would change. But one can go one step further: there may be different kinds of surfaces (eg, negations) and surfaces can have a name (a bit like named graphs) and all can be put together to provide a powerful semantics for these entities. His further claim was that such an extended semantics of RDF could be powerful enough to describe, conceptually, RDFS or even OWL, ie, the semantics should not be layered any more.

No way I would accept all this argumentation on face value:-), so I have to think about this and, mainly, read whatever Pat may want to write down to understand it. In the meantime, I may have to look into the concepts of conceptual graphs, and the Peircian notation of logic that Pat referred to as inspiration…

A more general take away (see also Chris’ remark above): maybe it is time to look into RDF again? A scary thought. Touching to something that is fundamental on the SW has to be done with extreme care… We will see.

There were two papers in the same session that were very close in subject and topic: one of Jesse Weaver and Jim Hendler on the parallel materialization of RDFS graphs and the one of Jacopo Urbani et al on using MapReduce for RDFS reasoning. (Sigh…, this is where I would like to put a reference!) Both aimed at similar challenges, namely the materialization of RDFS inference results of a graph using parallel computing methods. And there was one more similarity: both had some sort of a classification of the rules in the rule set described in the RDF Semantics document to help improving the processing. (Eg, to analyze which rules should be duplicated among processing nodes and which one can be handled without, or which one need a special treatment for a map-reduce pair). It seems that it would be worthwhile to see if some of these classifications (‘ontology rules’ and the like) could be extended to OWL 2 RL (Jesse Weaver told me afterwards that they want to look into this).  But, to put things into perspective: we are the points when billions of triples can be expanded with relative ease. Who would have thought a few years ago? There was also a remark on one of Jesse’s slide (I do not remember the exact wording) which said that RDFS is insanely parallelizable:-)  It was a really interesting session.

The SW in use session included  a paper from Landong Zuo et al “Supporting multi-view network analysis to understand company value chains”.  Integrating a bunch of data in the UK on companies, integrating them in an RDF store, and let users get information on the ‘value chains’, ie, how companies relate to one another as producers/consumers. Technically, the interesting point was the fact that users had the possibility to interactively add new relationships, new classifications to the system, essentially new rules that could be evaluated. The whole system seemed to be a really cool, a well engineered and well functioning machinery. As the speaker put it, although all conclusions drawn from the system could be found by the users by analyzing databases, but it would take weeks to do what this system can give them in a few minutes. This is exactly the kind of message we need for the outside  world about the usefulness of Semantic Web technologies.

On another session Martin Szomszor presented an experiment they conducted at the ESWC conference, combining RFID-based personal badges with an underlying SW system. The resulting system could be used to show personal contacts among delegates, could help people find others with similar interest, could retrace later whom one met at what point (“I remember talking to that chap, but I do not remember his name!”), etc. Lots of privacy issues, for example, but I would have liked to see that in practice, that is for sure!

Stéphane Corlosquet’s presentation on SW and Drupal was really exciting. I already knew about the plans of Drupal 7 to incorporate RDF management from the start, that all Drupal 7 pages will be annotated via RDFa. The RDFa community has been  fairly excited about that for a while now. But the work done by Stéphane and others provide some additional modules that makes it easy to add a SPARQL endpoint to a Drupal based site easily, to import other RDF content, or to manage the vocabularies used on the pages and the like. They already have such a system running with the current Drupal, but these modules will become part of the standard Drupal 7 module set that one can download from the drupal site. And that is cool.  It significantly lowers the barrier to build Web sites that are prepared to be part of the Linked Data cloud, even if the system administrators are not SW experts. I expect this to open up quite a lot of possibilities…

Off to the next day! More paper and the presentation of the Semantic Web challenge finalists…

June 19, 2009

SemTech2009 impressions

The first and possibly most important aspect of SemTech 2009 is that… it happened! I must admit that back in April-May, when the conference’s Web Site did not include any news of the program yet, I was a bit concerned that the general economic malaise would kill this year’s conference. O.k., I might have been paranoiac, but I think some level of concern was indeed legitimate. And… not only did the conference happen as planned, but the numbers were essentially the same as last year’s (over 1000). I think that by itself is an important sign of the interest in Semantic Technologies. Kudos to the organizers!

A general trend that was reaffirmed this year: by now, Semantic Web technologies are the obvious reference points for almost all presentations, products, etc, that were presented at the event. RDF(S), RDFa, OWL, SPARQL, etc, have become household names; newer specs like SKOS or POWDER may not have been as widely referred to yet, but I am sure that will come, too. Linked Data (and, more specifically, the Linked Open Data cloud) were almost ubiquitous this year while I do not believe that it was even mentioned last year. That is a huge change (although I still miss real “user facing” applications of LOD to show up; some, like Talis’ system deployed at UK universities, were presented but not as part of the regular conference). All that being said, I somehow seem to have missed more sessions than last year, which make my impressions more patchy. There were several journal interviews that I could not escape, hallway discussions that were great but made me miss a presentation here and there… I guess this is what happens when you have such a number of people around!

Tom Tague (from Open Calais) gave a very nice opening keynote. His talk was actually not on Open Calais (he did that in 2008), but rather on his experience in talking to different people who tried to start up new ventures in the Semantic Web area (a quote from his talk: “in 80% of the discussions I did not understand what the vendors wanted, and I walked away with my cheque book intact… Simplify!”). The main areas that he looked at were tools, social, advertising, search, publishing, user interface. One of the remarks I liked was on search: in his view (and I think I agree with that) Semantic Technologies may not be really interesting for general search (where the statistical, i.e., brute force methods work well) but for specialized, area-specific search tools (things like GoPubMed or applications deployed at, eg, Eli Lilly or experimented with at Elsevier come to my mind as good examples). Similarly, these technologies are not necessarily of interest for general, “robotic” publication tools like Google’s news, but for high quality publishing, with possible editorial oversight (reducing costs and difficulties).

(He also had a nice text on one of his slides: “Web2.0: Take Web 1.0, add a liberal dash of social, generous amounts of user generated content, atomize your content assets and stir until fully confused”:-)

Tom Gruber talked about his newest project: SIRI. A super-duper personal assistant running on an iPhone with conversational (voice directed) interface. The group behind it integrates a bunch of info on the Web (the “usual” stuffs like restaurants and travel sites), categorize them, and hide the complexities behind a sexy user interface. The problem I have is that I just do not see how this would scale. I see one of the major promises of the Semantic Web getting data in RDF out there so that such, essentially mash-up applications would become much easier to create and maintain. Until then, it is really tedious… On a more personal note, I am not sure I would like the voice conversational interface. I know that I have never used the voice commands on my phone for example; I do not feel comfortable with it. But, well, that is probably only me…

Chime Ogbuji made a really nice presentation on the system they have developed at the Cleveland Clinic. Great combination of RDF, OWL, and SPARQL. The interesting aspect (for me) was that usage of a medical expert system called Cyc, which is used to convert the doctor’s question in natural language (insofar as a question full of medical jargon can be considered as “natural”:-) into, essentially, a SPARQL query. The medical ontologies are used to direct this conversion process, and then the triple store could be queried through the generated query. Impressive work. (Part of it was documented in a W3C use case, but this presentation had a different emphasis.)

Unfortunately, I had to skip Peter Mika’s presentation on the SearchMonkey experiences, I will have to look at his slides… But, as a last minute addition to the program, the organizers succeeded in getting Othar Hansson and Kavi Goel to talk about Google’s rich sniplets. I have already blogged on this a few weeks ago but this presentation made the goal of the project way more understandable. Essentially, by recognizing specific microformat or RDFa vocabularies, they can improve the user experience by adding extra information on the search result. It is interesting to observe the difference between Yahoo! and Google in this respect: both of them use microformats/RDFa for the same general goal but, whereas Yahoo! relies on the community providing applications and on users personalizing their own search result page, Google controls the output in a generic way that does not require further user actions. It will be interesting to see how these differences influence people’s usage patterns. There were some discussion on the Google’s choice on vocabularies; the presenters made it quite clear that they are perfectly happy using other vocabularies (eg, vCard or FOAF) if they become pervasive, and this is a discussion that Google plans to engage with the community. There is of course a chicken-and-egg issue there (if a vocabulary is known by Google, then it will be more widely used, too), and this is cleary an area to discuss further. But these are details. The very fact that both Yahoo! and Google look at microformats and RDFa is what counts! Who would have thought just about a year ago?

I was not particularl impressed by the Semantic Search panel. I had the impression that the participants did not really know what they should say and talk about:-(

Nice presentation by Jeffrey Smitz from Boeing on a system called SPARQL Server pages. Essentially: the user can use similar structures like, say, a PHP page, ie, a mixture of HTML tags and server “calls”, except that this “calls” refer to SPARQL queries against a triple store on the server. Their system also includes some rule based OWL reasoning on the server side, although I am not sure I got all the details. All in all, the system seemed a bit complex, but the general approach is interesting! And it is nice to see that a company like Boeing seems to make good use of RDF+OWL+SPARQL; it would be good to know more…

I missed Zepheira’s presentation on freemix which is a shame, but, well, it happens. But I did play with freemix before travelling to San Jose;  I called it “Exhibit for the masses”. And this, I think, is a fair characterization. David Huynh’s exhibit is a really nice tool, but it is not easy to use it. On the other hand, it took me about 2 minutes to make a visualization of a json data set I used for an exhibit page elsewhere…

Andraz Tori talked about Common tag, a small vocabulary that, for example, can be used when marking up texts with tags (something that engines like Zemanta or Open Calais do). Bringing the RDF and the tagging worlds together is really important; I am very curious how successful this initiative will be…

The keynote on the last day was from the New York Times (by Evan Sandhaus and Robert Larson). It was quite interesting to see how a reputable journal like the NYT has developed a tradition of indexing, abstracting, cataloging articles, how these are archived and searched. Impressive. It is also great that the NYT Annotated Corpus has been released to the Research community. I did not know about that and, I presume, this must be a great resource for a lot of people active in the are of, say, natural language processing. Finally they announced their intention to release their thesaurus in a Semantic Web format, to add a “blob” to the Linked Data Cloud. They still have to work out the details (and expect feedback from the community) and I would hope they would publish a SKOS thesaurus and might even annotate the news items on their web site using this thesaurus in RDFa. But something in this space will happen, that is for sure! Other reputable newspapers, like Le Monde, the Guardian, NRC Handelsblatt,  el Pais, will you follow?

I also had my share of talking: gave an intro tutorial to SW, gave an overview of what is happening at W3C (quite a lot this year, including the finalization of POWDER, OWL 2, and SKOS!) and participated at an OWL 2 panel (with Mike Smith, Zhe Wu, Deb McGuinnis, and Ian Horrocks). I was quite happy with the tutorial and the way the panel went; the audience for the talk could have been a bit larger. But, well…

It was a long week, long trips, not much sleep… but well worth it!

Reblog this post [with Zemanta]

May 13, 2009

RDFa, Google

Filed under: Semantic Web,Work Related — Ivan Herman @ 8:31
Tags: , , , ,

Imagine that you have a review of a restaurant on your page. In your HTML, you show the name of the restaurant, the address and phone number, the number of users who have provided reviews, and the average rating. People can read and understand this information, but to a computer it is nothing but strings of unstructured text. With microformats or RDFa, you can label each piece of text to make it clear that it represents a certain type of data: for example, a restaurant name, an address, or a rating. This is done by providing additional HTML tags that computers understand.

No, this is not an extract of a page at W3C explaining the role of RDFa, or some academics’ paper on the role of RDF and RDFa. This is an extract of a Google page written for webmasters and explaining why adding such information to a page is useful. It is followed by:

These don’t affect the appearance of your pages, but Google and any other services that look at the HTML can use the tags to better understand your information, and display it in useful ways—for example, in search results.

This was published while I was sleeping; by the time I woke up in Brisbane twitter, mailing lists, news sites, or the blogosphere already talked about the news: RDFa is adopted by Google! Ie, this blog is hardly a prime news any more:-( But this is such a good news that I still felt compelled to write. After Yahoo’s SearchMonkey announcement last year (gosh, what a news that was, too!), the fact that social sites like SlideShare, systems like Drupal, or public thesauri sites like the Library Congress’ Subject Heading page have all adopted all RDFa, this Google announcement means that RDFa is in the mainstream now, that all the work that people have put into this technology is now paying off at last. I must admit, it is a good feeling…

Of course, lot is still to be done. The quoted Google page refers to some specific vocabularies that, I presume, will be indexed explicitly at first (reviews, people, products, and business and organizations). I would expect other vocabularies will follow in due time, developed by various communities around the globe. (As a first step, it would be great if Google adopted some of the exisiting vocabularies like SIOC, DOAP, FOAF, DC…). Developing the right vocabularies for the right communities is still a challenge that the Semantic Web community has to work on. And what would make me even happier would be to welcome Google’s developers to participate in the definition of those, together with the rest of the community, in line with the decentralized nature of vocabulary definitions.

But this is for tomorrow and the day after tomorrow. Today, I am just happy to have this news, and worry about the next step later…

May 1, 2009

Library of Congress Subject Headings in SKOS on line

We may remember the experimental lcsh.info site that published the LoC vocabularies on line in SKOS. Well, while lcsh.info was closed down at some point, the official version is now up and running at http://id.loc.gov/authorities/.

This is great news. This means that all items in the LoC subject heading have now a stable, dereferancable URI; the URI refers to a (SKOS) Concept and is linked to broader and narrower terms. The implementation follows the LOD principles; the URI can be dereferenced in an HTML browser providing an RDFa annotated HTML page. To take an example, the subject heading for “Semantic Web” has the URI: http://id.loc.gov/authorities/sh2002000569#concept; dereferencing it leads to an XHTML+RDFa page. The RDF content can be accessed describing the concept in SKOS, providing a link to, eg, the concept of World Wide Web. This vocabulary gives a stable target to characterize the subject of various entities using, eg, the Dublin Core “subject” term. The site provides a search page and the whole dataset can also be downloaded in RDF/XML or n triples.

This is a huge service of the Library of Congress to the Semantic Web community. Thanks to Ed Summers and anybody else who took an active role in it!

« Previous PageNext Page »

Theme: Rubric. Blog at WordPress.com.

Follow

Get every new post delivered to your Inbox.

Join 2,545 other followers