Ivan’s private site

April 24, 2012

Moved my RDFa/microdata python modules to github

Filed under: Code,Python,Work Related — Ivan Herman @ 12:37

In case you were using/downloading my python module for RDFa 1.1 or for Microdata: I have now moved away from the old CVS repository site, and moved to GitHub. The two modules are in the RDFLib/pyrdfa3 and RDFLib/pymicrodata repositories, respectively. Both of these modules are more or less final (some testing is still happening for RDFa, but not much is left) and I would be happy if others chimed in on the future of these modules.

Although they are part of the RDFLib project on GitHub and are built on top of the core RDFLib library, the two modules are pretty much independent of it. I hope that, with the help of people who know the RDFLib internal structures better, both modules can become, eventually, part of the core. But this may take some time…

April 18, 2012

Structured Data in HTML in the mainstream

Filed under: Semantic Web,Work Related — Ivan Herman @ 8:31

As referred to in my previous blog on LDOW2012, Hannes Mühleisen and Chris Bizer, but also Peter Mika and Tim Potter, published some findings on structured data in HTML based on Web crawl results and analysis. Both Hannes’ and Peter’s papers are now online. Hannes and Chris based their results on CommonCrawl, whereas Peter and Tim rely on Bing.

Although there are some controversies as to the usability of these crawls as well as the interpretation of their results (see Martin Hepp’s mail, and the answer by Peter Mika as well as the resulting thread on the mailing list) I think what is really important is the big picture which emerges from both sets of results: no one can reasonably dispute the importance of structured data in HTML any more. Although I vividly remember a time when this was a matter of bitter discussions, I think we can put this issue behind us now. I do not think I can summarize it better than Peter did in another of his emails:

…both studies confirm that the Semantic Web, and in particular metadata in HTML, is taking on in major ways thanks to the efforts of Facebook, the sponsors of schema.org and many other individuals and organizations. Comparing to our previous numbers, for example we see a five-fold increase in RDFa usage with 25% of webpages containing RDFa data (including OGP), and over 7% of web pages containing microdata. These are incredibly impressive numbers, which illustrate that this part of the Semantic Web has gone mainstream.

April 17, 2012

Linked Data on the Web Workshop, Lyon

(See the Workshop’s home page for details.)

The LDOW20** series has become more than a series of workshops; these are really small conferences. I did not count the number of participants (the meeting room had a fairly odd shape which made it a bit difficult) but I think it was well over a hundred. Nice to see…

The usual caveat applies for my notes below: I am selective here with some papers which is no judgement on any other paper at the workshop. These are just some of my thoughts jotted down…

Giuseppe Rizzo made a presentation related to all the tools we now have to tag texts and thereby be able to use these resources in linked data (“NERD meets NIF: Lifting NLP Extraction Results to the Linked Data Cloud”), i.e., the Zemanta or Open Calais services of this world. As these services become more and more important, having a clear view of what they can do, how one can use them individually or together, etc., is essential. Their project, called NERD, will become an important source for this community, bookmark that page:-)

Jun Zhao made a presentation (“Towards Interoperable Provenance Publication on the Linked Data Web”) essentially on the work of the W3C Provenance Working Group. I was pleased to see and listen to this presentation: I believe the outcome of that group is very important for this community and, having played a role in the creation of that group, I am anxious to see it succeed. B.t.w., a new round of publications coming from that group should happen very soon, watch the news…

Another presentation, namely Arnaud Le Hors’ on “Using read/write Linked Data for Application Integration — Towards a Linked Data Basic Profile” was also closely related to W3C work. Arnaud and his colleagues (at IBM) came to this community after a long journey working on application integration; think, e.g., of systems managing software updates and error management. These systems are fundamentally data oriented and IBM has embarked on a Linked Data based approach (after having tried others). The particularity of this approach is to stay very “low” level, insofar as they use only the basic HTTP protocol for reading and writing RDF data. This approach seems to strike a chord with a number of other companies (Elsevier, EMC, Oracle, Nokia) and their work forms the basis of a new W3C Working Group that should be started this coming summer. This work may become a significant element of the palette of technologies around Linked Data.

Luca Costabello talked about Access Control, Linked Data, and Mobile (“Linked Data Access Goes Mobile: Context-Aware Authorization for Graph Stores”). Although Luca emphasized that their solution is not a complete solution for Linked Data access control issues in general, it may become an important contribution in that area nevertheless. Their approach is to modify SPARQL queries “on-the-fly” by including access control clauses; for that purpose, an access control ontology (S4AC) has been developed and used. One issue is: how would that work with a purely HTTP level read/write Linked Data Web, like the one Arnaud is talking about? Answer: we do not know yet:-)

Igor Popov concentrated on user interface issues (“Interacting with the Web of Data through a Web of Inter-connected Lenses”): how to develop a framework whereby data-oriented applications can cooperate quickly, so that lambda users could explore data, switching easily to applications that are well adapted to a particular dataset, and without being forced to use complicated programming or use too “geeky” tools. This is still alpha level work, but their site-in-development, called Mashpoint, is a place to watch. There is (still) not enough work on user-facing data exploration tools, so I was pleased to see this one…

What are the dynamics of Linked Data? How does it change? This is the question Tobias Käfer and his friends will try to answer in the future (“Towards a Dynamic Linked Data Observatory”). For that, data is necessary, and Tobias’ presentation was on how to determine what collection of resources to regularly watch and measure. The plan is to produce a snapshot of the data once a week for a year; the hope is that based on this collected data we will learn more about the overall evolution of linked data. I am really curious to see the results of that. One more reason to be at LDOW2013:-)

Tobias’ presentation has an important connection to the last presentation of the day, made by Axel Polleres (“OWL: Yet to arrive on the Web of Data?”) insofar as what he presented was based on the analysis of the Linked Data out there. The issue has been around, with lots of controversy, for a while: what level of OWL should/could be used for Linked Data? OWL 2 as a whole seems to be too complex for the amount of data we are talking about, both in terms of program efficiency and in terms of conceptual complexity for end users. OWL 2 has defined a much simpler profile, called OWL 2 RL, which does have some traction but may be still too complex, e.g., for implementations. Axel and his friends analyzed the usage of OWL statements out there, and also established some criteria on what type of rules should be used to make OWL processing really efficient; their result is another profile called OWL LD. It is largely a subset of OWL 2 RL, though it does adopt some datatypes that OWL 2 RL does not have.

Some OWL 2 RL features are left out of OWL LD, and I am not fully convinced that they should be; after all, their measurement was based on data from 2011, and it is difficult to say how much time it takes for new OWL 2 features to really catch on. I think that keys and property chains should/could be really useful for Linked Data, and can be managed by rule engines, too. So the jury is still out on this, but it would be good to find a way to stabilize this at some point and see the LD crowd look at OWL (i.e., the subset of OWL) more positively. Of course, another approach would be to concentrate on an easy way to encode Rules into RDF which might make this discussion moot in a certain sense; one of the things we have not succeeded in doing yet:-(

The day ended with a panel, in which I also participated; I will let others judge whether the panel was good or not. However, the panel was preceded by a presentation by Chris on the current deployment of RDFa and microdata which was really interesting. (His slides will be on the workshop’s page soon.) The deployment of RDFa, microdata, and microformats has become really strong now; structured data in HTML is a well established approach out there. RDFa and microdata now cover half of the cases, the other half being microformats, which seems to indicate a clear shift towards RDFa/microdata, i.e., a more syntax oriented approach (with a clear mapping to RDF). Microdata is used almost exclusively with schema.org vocabularies (which is to be expected) whereas RDFa makes use of a larger palette of various other vocabularies. All of this was to be expected, but it is nice to see it reflected in collected data.

It was a great event. Chris, Tim, and Tom: thanks!

December 16, 2011

Where we are with RDFa 1.1?

There has been a flurry of activity around RDFa 1.1 in the past few months. Although a number of blogs and news items have been published on the changes, all those have become “officialized” only in the past few days with the publication of the latest drafts, as well as with the publication of RDFa 1.1 Lite. It may be worth looking back at the past few months to have a clearer idea of what happened. I make references to a number of other blogs that were published in the past few months; interested readers should consult those for details.

The latest official drafts for RDFa 1.1 were published in Spring 2011. However, a lot has happened since. First of all, the RDFWA Working Group, working on this specification, has received a significant number of comments. Some of those were rooted in implementations and the difficulties encountered therein; some came from potential authors who asked for further simplifications. Also, the announcement of schema.org had an important effect: indeed, this initiative drew attention to the importance of structured data in Web pages, which also raised further questions on the usability of RDFa for that usage pattern. This came to the fore even more forcefully at the workshop organized by the stakeholders of schema.org in Mountain View. A new task force on the relationships of RDFa and microdata has been set up at W3C; beyond looking at the relationship of these two syntaxes, that task force also raised a number of issues on RDFa 1.1. These issues have been, by and large, accepted and handled by the Working Group (and reflected in the new drafts).

What does this mean for the new drafts? The bottom line: there have been some fundamental changes in RDFa 1.1. For example, profiles, introduced in earlier releases of RDFa 1.1, have been removed due to implementation challenges; however, management of vocabularies has acquired an optional feature that helps vocabulary authors to “bind” their vocabularies to other vocabularies, without introducing an extra burden on authors (see another blog for more details). Another long-standing issue was whether RDFa should include a syntax for ordered lists; this has been done now (see the same blog for further details).

A more recent important change concerns the usage of @property and @rel. Although usage of these attributes was never a real problem for RDF-savvy authors (the former is for the creation of literal objects, whereas the latter is for URI references), they have proven to be a major obstacle for ‘lambda’ HTML authors. This issue came up quite forcefully at the schema.org workshop in Mountain View, too. After a long technical discussion in the group, the new version reduces the usage difference between the two significantly. Essentially, if, on the same element, @property is present together with, say, @href or @resource, and @rel or @rev is not present, a URI reference is generated as an object of the triple. I.e., when used on, say, a <link> or <a> element, @property behaves exactly like @rel. It turns out that this usage pattern is so widespread that it covers most of the important use cases for authors. The new version of the RDFa 1.1 Primer (as well as the RDFa 1.1 Core, actually) has a number of examples that show these. There are also some other changes related to the behaviour of @typeof in relation to @property; please consult the specification for these.
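To make the rule concrete, here is a minimal Python sketch of the decision described above. It only models what this post states: @property combined with @href or @resource, and without @rel or @rev, yields a URI reference. The function name and the dictionary representation of an element's attributes are my own illustration, not part of any RDFa library, and the specification has further conditions (for example around other attributes) that are not modelled here.

```python
def property_object_kind(attrs):
    """Say whether @property produces a 'uri' or a 'literal' object.

    `attrs` is a plain dict of the attributes found on one element
    (a hypothetical representation chosen for this sketch).
    """
    if "property" not in attrs:
        raise ValueError("element has no @property")

    # A resource-bearing attribute on the same element...
    has_resource = "href" in attrs or "resource" in attrs
    # ...and no @rel/@rev that would take over the resource.
    has_rel_or_rev = "rel" in attrs or "rev" in attrs

    if has_resource and not has_rel_or_rev:
        return "uri"
    return "literal"
```

With this sketch, an element carrying only @property and @href is classified as generating a URI reference, whereas a bare @property, or one accompanied by @rel, still generates a literal.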

The publication of RDFa 1.1 Lite was also a very important step. This defines a “sub-set” of the RDFa attributes that can serve as a guideline for HTML authors to express simple structured data in HTML without bothering about more complex features. This is the subset of RDFa that schema.org will “accept”, as an alternative to microdata, as a possible syntax for schema.org vocabularies. (There are some examples of what some schema.org examples look like in RDFa 1.1 Lite on a different blog.) In some sense, RDFa 1.1 Lite can be considered the equivalent of microdata, except that it leaves the door open for more complex vocabulary usage, mixture with different vocabularies, etc. (The HTML Task Force will soon publish a more detailed comparison of the different syntaxes.)

So here is, roughly, where we are today. The recent publications by the W3C RDFWA Working Group have, as I said, “officialized” all the changes that were discussed since spring. The group decided not to publish a Last Call Working Draft, because the last few weeks of work of the HTML Task Force may reveal some new requirements; if not, the last round of publications will follow soon.

And what about implementations? Well, my “shadow” implementation of the RDFa distiller (which also includes a separate “validator” service) incorporates all the latest changes. I also added a new feature a few weeks ago, namely the possibility to serialize the output in JSON-LD (although this became outdated a few days ago, due to some changes in JSON-LD…). I am not sure of the exact status of Gregg Kellogg’s RDF Distiller, but, knowing him, it is either already in line with the latest drafts or will be within a few days. And there are surely more around that I do not know about.

This last series of publications has provided a nice closure for a busy RDFa year. I guess the only thing now is to wish everyone a Merry Christmas, a peaceful and happy Hanukkah, or other festivities you honor at this time of the year. In any case, a very happy New Year!

April 20, 2011

RDFa 1.1 Primer (draft)

Filed under: Semantic Web,Work Related — Ivan Herman @ 10:21

I have had several posts in the past on the new features of RDFa 1.1 and where it adds functionalities to RDFa 1.0. The Working Group has just published a first draft for an RDFa 1.1 Primer, which gives an introduction to RDFa. We did have such a primer already for RDFa, but the new version has been updated in the spirit of RDFa 1.1… Check it out if you are interested in RDFa!

April 9, 2011

Announcement on rNews

Filed under: Semantic Web,Work Related — Ivan Herman @ 6:38

A few days ago IPTC published a press release on rNews: “Standard draft for embedding metadata in online news”. This is, potentially, a huge thing for Linked Data and the Semantic Web. Without going into too many technical details (no reason to repeat what is on the IPTC pages on rNews, you can look it up there) what this means is that, potentially, all major online news services on the globe, from the Associated Press to the AFP, or from the New York Times to the Süddeutsche Zeitung, will have their news items enriched with metadata, and this metadata will be expressed in RDFa. In other words, the news items will be usable, by extracting RDF, as part of any Semantic Web application, can be mashed up with other types of data easily, etc. In short, news items will become a major part of the Semantic Web landscape, with the extra specificity of being an extremely dynamic set of data that is renewed every day. That is exciting!

Of course, it will take some time to get there, but we should realize that IPTC is the major standard setting body in the news publishing world. I.e., rNews has a major chance to be largely adopted. It is time for the Semantic Web community to pay attention…

April 1, 2011

2nd Last Call for RDFa 1.1

Filed under: Semantic Web,Work Related — Ivan Herman @ 2:58

The W3C RDFa Working Group published a “Last Call” for RDFa 1.1 back at the end of October last year. This was meant to be a “feature freeze” version and was asking for public comments. Well, the group received quite a number of those. Lots of small things, requiring changes of the documents in many places to make them more precise even in various corner cases, and some more significant ones. In some ways, it shows that the W3C process works, ensuring quite an influence of the community on the final shape of the documents. Because of the many changes the group decided to re-issue a Last Call (yes, the jargon is a bit misleading here…), aimed at a last check before the document goes to its next phase on the road to becoming a standard. Almost all the changes are minor for users, though important for, e.g., implementers to ensure interoperability. “Almost all”, because there is one new and, I believe, very important though controversial feature, namely the so-called default profiles.

I have already blogged about profiles when they were first published back in April last year. In short, profile documents provide an indirection mechanism to define prefixes and terms for an RDFa source: publishers may collect all the prefixes they deem important for a specific application and authors, instead of being required to define a whole set of prefixes in the RDFa file itself, can just refer to the profile file to have them all at their disposal. I think the profile feature was the feature stirring the biggest interest in the RDFa 1.1 work: profiles are undeniably useful, and undeniably controversial… Indeed, in theory at least, profiles represent yet another HTTP round trip when extracting RDF from an RDFa file, which is never a good thing. But a good caching mechanism or other implementation tricks can greatly alleviate the pain… (B.t.w., the group has also created some guidelines for profile publishers to help implementers.)

This draft goes one step further by introducing default profiles. These are profiles just like any other, but they are defined with fixed URI-s (namely http://www.w3.org/profile/rdfa-1.1 for RDFa 1.1 in general, and, additionally, http://www.w3.org/profile/html-rdfa-1.1 for the various HTML variants) and the user does not have to declare them in an RDFa source. Which means that a very simple HTML+RDFa file of the sort:

<html>
  <body>
    <p about="xsd:maxExclusive" rel="rdf:type" resource="owl:DatatypeProperty">
      An OWL Axiom: "xsd:maxExclusive" is a Datatype Property in OWL.
    </p>
  </body>
</html>

(note the missing prefix declarations!) will produce the RDF triple that you might expect. Can’t be simpler, can it?

Why? Why was it necessary to introduce this? Well, experience shows that many HTML+RDFa authors forget to declare the prefixes. One can look, for example, at the pages that include Facebook’s Open Graph Protocol RDFa statements: although I do not have exact numbers, I would suspect that around 50% of these pages do not have them. That means that, strictly speaking, those statements cannot be interpreted as RDF triples. The Semantic Web community may ask, try to convince, beg, etc., the HTML authors (or the underlying tools) to do “the right thing”, and we certainly should continue doing so, but we also have to accept this reality. A default profile mechanism can alleviate that, thereby greatly extending the number of triples that can become part of a Web of Data. And even for seasoned RDF(a) users not having to declare anything for some of the common prefixes is a plus.
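As a rough illustration of what a processor has to do with such a file, the Python sketch below expands the three CURIEs of the example using a small built-in prefix table. The table holds the usual namespace URIs for rdf, owl, and xsd; treating it as "the" default profile is an assumption made for the sketch, and the function names are hypothetical rather than taken from any existing RDFa implementation.

```python
# Stand-in for the prefixes a default profile might provide (assumption:
# only the three prefixes needed by the example are listed here).
DEFAULT_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_curie(curie, local_prefixes=None):
    """Expand 'prefix:reference' into a full URI.

    Prefixes declared in the document itself (local_prefixes) take
    precedence over the defaults.
    """
    prefixes = dict(DEFAULT_PREFIXES)
    prefixes.update(local_prefixes or {})
    prefix, separator, reference = curie.partition(":")
    if not separator or prefix not in prefixes:
        raise KeyError("no prefix declared for %r" % curie)
    return prefixes[prefix] + reference


def triple_from_element(attrs):
    """Build the (subject, predicate, object) triple for an element
    carrying @about, @rel and @resource, given as a plain dict."""
    return (
        expand_curie(attrs["about"]),
        expand_curie(attrs["rel"]),
        expand_curie(attrs["resource"]),
    )
```

Applied to the attributes of the <p> element above, triple_from_element returns the three full URIs without the document declaring a single prefix; a CURIE whose prefix is neither local nor in the table raises an error instead of silently producing a wrong URI.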

Of course, the big, nay, the BIG issue is: what prefixes and terms would those default profiles declare? What is the decision procedure? At this time, we do not have a final answer yet. It is quite obvious that all the vocabularies defined by W3C Recommendations and official Notes and that have a fixed prefix (most of them do) should be part of the list. We may want to add Member Submissions to this list. If you look at the default profile, these are already there in the first table (i.e., the code example above is safe). The HTML variant would add all the traditional @rel values, like license, next, previous, etc.

But what else? At the moment, the profiles include a set of prefixes and terms that are just there for testing purposes (although they do indicate a tendency), so do not take the default profile as the final content. For the HTML @rel values, we would, most probably, rely on any policy that the HTML5 Working Group will define eventually; the role of the HTML default profile will simply be to reflect those. That seems quite straightforward. However, the issue of default prefixes is clearly different. For those, the Working Group is contemplating two different approaches:

  1. Set up some sort of a registration mechanism, not unlike the xpointer registry. This would also include some accompanying mailing lists where objections can be raised against the inclusion of a specific prefix, etc.
  2. Try to get some information from search engines on the Semantic Web (Sindice, Yahoo!, anyone else?) that may provide a list of, say, the top 20 prefixes as used on the Semantic Web. Such a list would reflect the real usage of vocabularies and prefixes. (We still have to see whether this is information these engines can provide or not.)

At this moment it is not yet clear which way is realistic. Personally, I am more in favour of the second approach (if technically feasible), but the end result may be different; this is a policy that W3C will have to set up.

Apart from the content, another issue is the change mode and frequency of the default profile. First of all, the set of default prefixes can only grow. I.e., once a prefix has made it on the default profile, it has to stay there with an unchanged URI. That is obviously important to ensure stability. I.e., new prefixes coming to the fore by virtue of being used by the community can be added to the set, but no prefix can be removed. As for the frequency: a balance has to be found between stability, i.e., that RDFa processors can rely (e.g., for caching) on a not-too-frequent change of the default profiles, and relevance, i.e., that new vocabularies could find their way into the set of default prefixes. Again my personal feeling is that an update of the profiles once every 6 months, or even once a year, might strike a good balance here. To be decided.
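The "can only grow" rule is easy to check mechanically. The following Python sketch compares two versions of a prefix table and reports any prefix that was removed or re-bound to a different URI; the function name and the dict-based representation of a profile are assumptions made for illustration only.

```python
def check_profile_update(old_prefixes, new_prefixes):
    """Return a list of human-readable problems found when moving from
    old_prefixes to new_prefixes (both are dicts: prefix -> URI).

    An empty list means the update only added prefixes, which is the
    stability rule described above.
    """
    problems = []
    for prefix, uri in sorted(old_prefixes.items()):
        if prefix not in new_prefixes:
            problems.append("prefix %r was removed" % prefix)
        elif new_prefixes[prefix] != uri:
            problems.append("prefix %r now maps to a different URI" % prefix)
    return problems
```

A publisher of the default profile could run such a check before each periodic update, whatever the chosen frequency turns out to be.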

As before, all comments are welcome but, again as before, I would prefer if you sent those comments to the RDFa WG’s mailing list rather than commenting on this blog: public-rdfa-wg@w3.org (see also the archives).

Finally: I have worked on a new version of my RDFa distiller to include all the 1.1 features. This version of the distiller is now public, so you can try out the different new features. Of course, it is still not a final release, there are bugs, so…

October 27, 2010

Publication of the Last Call for RDFa Core 1.1

The W3C RDFa Working Group has just published the “Last Call Working Draft” for RDFa Core 1.1. As Manu Sporny, the co-chair of the group, said in his tweet, this W3C jargon is equivalent to a “feature freeze”. I.e., the group does not know of any outstanding technical issues or of missing features that it would reasonably plan to add. Put another way, this is the last round of commenting before proceeding to final implementation testing and, hopefully, to a final W3C Standard. I.e., Last Call doesn’t mean that the group takes no more comments; on the contrary, technical comments are very welcome and necessary to make sure that the final outcome is correct. Please send your comments to the group’s mailing list: public-rdfa-wg@w3.org (there is also a public archive).

Although lots of things have been discussed in the past few months (i.e., since the last draft published in August) not many things have significantly changed, in fact. Most of the changes are editorial, making the text clearer, more precise, etc. (You can look at the “diff” file, if you are interested.) This document is for the Core, i.e., the generic RDFa processing that can be used for any DOM. A similar document is expected to be published within a few days for XHTML+RDFa 1.1 by the same Working Group, and for HTML5+RDFa 1.1 by the HTML Working Group.

I have also worked, in parallel to the specification work, on a modified version of the RDFa distiller. While the “official” service remains unchanged and relies on the current RDFa Recommendation, there is now a “Shadow” version, that relies on RDFa 1.1. The underlying code has undergone some cleanups beyond the adaptation to RDFa 1.1 so I am sure there are bugs…

Finally, a blatant self-promotion: Stéphane Corlosquet, Lin Clark and I will give a tutorial at the upcoming ISWC conference in Shanghai on RDFa and Drupal. The RDFa part relies on 1.1… (There are links to the slides on the page but you do not expect us not to touch them any more before the tutorial itself, do you? So make sure you look at them again after the event…)

August 3, 2010

New RDFa Core 1.1 and XHTML+RDFa 1.1 drafts

Filed under: Semantic Web,Work Related — Ivan Herman @ 20:49

W3C has just published updated drafts of RDFa Core 1.1 and XHTML+RDFa 1.1. These are “just” new heart-beat documents, meaning that they are not fundamentally new (the first drafts of these documents were published last April) but not yet “Last Call” documents, i.e., the group does not yet consider the specification work finished. Although… in fact it is not far from that point. The WG has spent the last few weeks getting through open issues, and not many are left open at this moment.

So what has changed since my last blog on the subject where I introduced the new features compared to RDFa 1.0? In fact, nothing spectacular. Lots of minor clarifications to make things more precise. There has been a change in the treatment of XML Literals: whereas, in RDFa 1.0, XML Literals are automatically generated any time XML markup is in the text, RDFa 1.1 explicitly requires a corresponding datatype specification; otherwise a plain literal is created in RDF. (This is the only backward incompatibility with RDFa 1.0, as foreseen by the charter.)
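The XML Literal change can be sketched in a few lines of Python. The function below is an illustration under simplifying assumptions, not the processing rules of the specification: it takes one element as a well-formed XML string, keeps the inner markup only when the rdf:XMLLiteral datatype is requested explicitly, and otherwise flattens the content to its text. The function name and the (value, datatype) return shape are invented for the sketch.

```python
import xml.etree.ElementTree as ET

RDF_XMLLITERAL = "http://www.w3.org/1999/02/22-rdf-syntax-ns#XMLLiteral"


def literal_for(element_xml, datatype=None):
    """Return (lexical value, datatype) for the content of one element.

    RDFa 1.1 behaviour as described in the post: markup survives only
    if the XML Literal datatype is asked for; otherwise the text
    content alone is used.
    """
    element = ET.fromstring(element_xml)
    if datatype == RDF_XMLLITERAL:
        # Inner markup: leading text, then each child serialized
        # (ElementTree includes the child's trailing text).
        inner = (element.text or "") + "".join(
            ET.tostring(child, encoding="unicode") for child in element
        )
        return (inner, RDF_XMLLITERAL)
    # No explicit XML Literal datatype: concatenate the text nodes.
    return ("".join(element.itertext()), datatype)
```

Under RDFa 1.0 the first branch would have been taken automatically whenever child markup was present; here it needs the explicit datatype.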

Probably the most important addition to RDFa Core was triggered by a comment of Jeni Tennison (though the problem was raised by others, too). Jeni emphasized a slightly dangerous aspect of the profile mechanism in RDFa 1.1. To remind the reader: using the @profile attribute the author of an RDFa 1.1 file can refer to another file somewhere on the Web; that “profile file” may include, in one place, prefix declarations, term specifications, and (this is also new in this version!) a default term URI (see again my earlier blog on the details). The question is: what happens if the profile file is unreachable? The danger is that an RDFa 1.1 processor would possibly generate wrong triples, which is actually worse than not generating triples at all. The decision of the group (as Jeni actually proposed) was that the whole DOM subtree would be ignored, i.e., all triples would be dropped, starting with the element with the un-referenceable profile.

The profile mechanism has stirred quite some interest both among users of RDFa and elsewhere. Martin Hepp was probably the first to publish an RDFa 1.1 profile for GoodRelations and related vocabulary prefixes at http://www.heppnetz.de/grprofile/. To use, essentially, his example, this means that one can use
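The effect of that decision can be shown with a toy extractor in Python. This is only a sketch of the "drop the subtree" behaviour: it looks at @about, @property, and @profile, ignores everything else in RDFa, and relies on a caller-supplied load_profile function (hypothetical) that raises OSError when a profile cannot be retrieved.

```python
import xml.etree.ElementTree as ET


def extract(element, load_profile, subject=None):
    """Yield (subject, property, text) tuples from an element tree.

    If an element refers to a profile that cannot be loaded, the
    element and all of its descendants are skipped.
    """
    profile = element.get("profile")
    if profile is not None:
        try:
            load_profile(profile)
        except OSError:
            # Unreachable profile: generate nothing for this subtree.
            return
    subject = element.get("about", subject)
    if element.get("property") is not None:
        yield (subject, element.get("property"), "".join(element.itertext()))
    for child in element:
        yield from extract(child, load_profile, subject)
```

Statements outside the affected subtree are still extracted, so a single broken profile reference does not silence the whole document.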

<div profile="http://www.heppnetz.de/grprofile/">
  <span about="#company" typeof="gr:BusinessEntity">
    <span property="gr:legalName">Hepp's bakery</span>,
    see also the <a rel="rdfs:seeAlso" href="http://example.org/bakery">
    home page of the bakery.</a>
  </span>
</div>

Because Martin’s profile includes a prefix definition for rdfs, too (alongside a number of other prefixes), the profile definition replaces a whole series of namespace declarations that were necessary in RDFa 1.0. I would guess that similar profile files, with term or prefix definitions, will be defined for foaf or for Dublin Core, too. Other obvious candidates for such profile definitions are the “big” users of RDFa information like Facebook or Google, who can specify the vocabularies they understand, i.e., index. (This did come up at the W3C camp in Raleigh, during the exciting discussion on the Facebook vocabulary.) Finally, another interesting discussion generated by RDFa’s profile mechanism occurred at the “RDF Next” Workshop in Palo Alto a few weeks ago: some participants proposed to consider a similar mechanism in a next version of Turtle (I must admit this came as a surprise, although it does make sense…)

As for implementations of profiles? Profiles are defined in such a way that an RDFa processor can recursively invoke itself to extract the necessary information for processing; indeed, RDFa is also used to encode the prefix, term, etc., definitions (Turtle or RDF/XML can also be used, but RDFa is the only required format). This means that an RDFa processor does not have to implement a different parser to handle the profile information. My “shadow” RDFa distiller implements this (as well as all RDFa 1.1 features) and it was not complicated. It actually implements a caching mechanism, too: some well known and well published profiles can be stored locally so that the distiller does not go through an extra HTTP request all the time (yes, I know, this may lead to inconsistencies in theory but if such a cache is refreshed regularly via, say, a crontab job, it should be o.k. in practice). At the moment the content of that cache is of course curated by hand. (The usual caveat applies: this is code in development, with bugs, with possibly frequent and unannounced changes…) You are all welcome to try the shadow distiller to see what RDFa is capable of. Of course, other RDFa 1.1 implementations are in the making. If you have one, it would be good to know about it; the Working Group is constantly looking for implementation experiences…
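For readers curious about what such a cache might look like, here is a small Python sketch. It is not the distiller's code: the distiller stores hand-curated profiles locally and relies on an external refresh, whereas this sketch keeps entries in memory and simply re-fetches them after a maximum age. The class name, the injected fetch function, and the injectable clock are all assumptions made for the example.

```python
import time


class ProfileCache:
    """Keep profile contents around so that repeated lookups do not
    trigger a new HTTP request each time."""

    def __init__(self, fetch, max_age_seconds=24 * 3600, clock=time.time):
        self._fetch = fetch            # callable: url -> prefix/term mappings
        self._max_age = max_age_seconds
        self._clock = clock            # injectable so expiry can be tested
        self._entries = {}             # url -> (fetched_at, mappings)

    def get(self, url):
        now = self._clock()
        entry = self._entries.get(url)
        if entry is not None and now - entry[0] < self._max_age:
            return entry[1]            # still fresh: no network access
        mappings = self._fetch(url)    # missing or stale: fetch again
        self._entries[url] = (now, mappings)
        return mappings
```

The expiry time plays the role of the periodic refresh mentioned above: within that window the cached copy may be out of date, which is the theoretical inconsistency one accepts in exchange for fewer requests.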

June 29, 2010

SemTech2010 & co.

I am on my way home from a long trip in the US (writing these lines on the plane, to be posted from home). A few days in Seattle, SemTech 2010 in San Francisco, and finally the “RDF Next Steps” workshop in Palo Alto (i.e., Stanford). I do not want to write about the last one now, simply because we hope to have a more extended public report available within 10-15 days. I.e., more about that later.

Seattle consisted of a number of company visits, but it also included a talk at the SemWeb Meetup in Seattle. I gave a presentation on what happened at W3C in the last year which, I think, was well received. (Although one is never sure about these things.) I had a bunch of discussions and chats after the presentation; it was pleasant, relaxing… I and mainly my colleague from W3C, Eric Prud’hommeaux, also had a long discussion with two developers from Microsoft who are involved in the oData work; that was really interesting because we reached the conclusion that we could outline a plan together to write down how to “export” oData into RDF, and publish that, e.g., as a W3C Note (note that there are already systems doing something like that out there, but I am not knowledgeable enough to judge how complete those solutions are). I think it would be good for the community if this happens. It is important for a general Web of Data to include, well, all the data on the Web…

SemTech… it was big. Bigger than last year (I heard and read a figure of a 30% increase in attendance). This industry is lively indeed! The only problem was that it was almost too big; it was the conference of eternal frustration:-( Indeed, there were so many things in parallel that one always had the feeling of having missed something because another, parallel session may have been more interesting! I heard presentations from Facebook, from Google, saw stunning visualizations of RDF graphs, and heard about plans on ontology hosting and management. There was a report on the US and UK governmental data work (this stuff still amazes me, though it is not the first time I hear about it), and there was a presentation from BestBuy (alas! I missed that one). There was a separate track on the publication world as a separate “vertical” area (and we also had some great discussions with the people from the New York Times with whom we outlined a possible first step in gathering that community). Lots of hallway conversations with companies and institutions and, of course, the social life, chatting with David, and Ian, and the other Ian, and Eric, and the other David, and Christine, and Jeremy, and Jim, and Fabien, and Sandro, and Jenni, and… I should stop and not even try to list everybody because it is simply impossible! I also gave an introductory Semantic Web Tutorial (quite a lot of people in the audience, and I think it went well), and we had a panel on the W3C RDB2RDF work and another one on SPARQL 1.1. As a nice little touch, I could announce the publication of the W3C RIF Recommendation as a primeur during the tutorial, just as I was talking about RIF (the publication itself happened while I was talking…)

There were, as every year, some “buzz” topics. My impression is that the linked open governmental data effort was a buzz and was still new information for many. Facebook’s keynote on the Open Graph Protocol created another buzz. More generally, RDFa was definitely a buzz (big time!). I.e., as I said, this industry is lively and continues to be exciting.

But there are of course challenges. The way I see it, the biggest challenge is not technical. Yes, of course, there are technical issues, but those will be solved, eventually. The issue is outreach: getting to those new communities who may understand the value of a Web of Data in general but do not have enough guidance on how to start doing something. How to publish the data, how to link it to other data, how to consume it, use it, mash it up… How to talk to “C-level” people, how to reach out to them. There are books, of course, but not enough; there are tutorials and guides, of course, but not enough; there are experts around but definitely not enough. As one of our discussion partners put it: if I go to any better bookshop, there are rows of books on, say, XML (good or bad, but they are there). But books on RDF, on Linked Data, on SPARQL, on SKOS, on OWL: only a few here and there (comparatively, that is), and some of them are actually quite old. Let alone the problem of trying to hire experts who could do the job. I really feel that this is the biggest challenge our community faces. I say “community” and not only a single organization like W3C or any other; the challenge is too great to be solved by one group only. We have been fighting with this issue for a while now, but it is still a challenge… And a challenge for all of us who care about this stuff!

It was a good week!
