November 2, 2011

Some notes on ISWC2011…

The 10th International Semantic Web Conference (ISWC2011) took place in Bonn last week. Others have already blogged on the conference in a more systematic way (see, for example, Juan Sequeda’s series on semanticweb.com); there is no reason to repeat that. Just a few more personal impressions, with the obvious caveat that I may have missed interesting papers or presentations, and the ones I picked here are also the results of my personal bias… So, in no particular order:

Zhishi.me is the outcome of the work of a group from the APEX lab in Shanghai and Southeast University: it is, in some ways, the Chinese DBpedia. “In some ways” because it is actually a mixture of three different Chinese, community-driven encyclopedias, namely the Chinese Wikipedia, Baidu Baike and Hudong Baike. I am not sure of the exact numbers, but the combined dataset is probably a bit bigger than DBpedia. The goal of Zhishi.me is to act as a “seed” and a hub for Chinese linked open data contributions, just like DBpedia did and does for the LOD in general.

It is great stuff indeed. I do have one concern (which, hopefully, is only a matter of presentation, i.e., may be a misunderstanding on my side). Although zhishi.me is linked to non-Chinese datasets (DBpedia and others), the paper talks about a “Chinese Linked Open Data (COLD)”, as if this were something different, something separate. As a non-English speaker myself I can fully appreciate the issues of language and culture differences, but I would nevertheless hate to see the Chinese community develop a parallel LOD, instead of being an integral part of the LOD as a whole. Again, I hope this is just a misunderstanding!

There were a number of ontology or RDF graph visualization presentations, for example from the University of Southampton team (“Connecting the Dots”), on the first results of an exploration done by Magnus Stuhr and his friends in Norway, called LODWheel (the latter was actually at the COLD2011 Workshop), or another one from a mixed team, led by Enrico Motta, on a visualization plugin to the NeOn toolkit called KC-Viz. I have downloaded the latter, and have played a bit with it already, but I haven’t had the time to reach a really informed conclusion on it yet. Nevertheless, KC-Viz was interesting for me for a different reason. The basic idea of the tool is to use some sort of an importance metric attached to each node in the class hierarchy and direct the visualization based on that metric. It was reminiscent of some work I did in my previous life on graph visualization: the metric was different, the graph was only a tree, and the visualization approach was different, but nevertheless, there was a similar feel to it… Gosh, that was a long time ago!

The paper of John Howse et al. on visualizing ontologies was also interesting. Interesting because different: the idea is a systematic usage of Euler diagrams to visualize class hierarchies combined with some sort of a visual language for the presentation of property restrictions. In my experience property restrictions are a very difficult (maybe the most difficult?) OWL concept to understand without a logic background; any tool, visual or otherwise, that helps in teaching and explaining this can be very important. Whether John’s visual language is the one, I am not sure yet, but it may well be. I will consider using it the next time I give a tutorial…

I was impressed by the paper of Gong Cheng and his friends from Nanjing, “Empirical Study of Vocabulary Relatedness…”. Analyzing the results of a search engine (in this case Falcons) to draw conclusions on the nature, the usage, the mutual relationship, etc., of vocabularies is very important indeed. We need empirical results, bound to real life usage. This is not the first work in this direction (see, for example, the work of Ghazvinia et al, from ISWC2009), but there is still much to do. Which reminds me of some much smaller scale work Giovanni, Péter and I did on determining the top vocabulary prefixes for the purpose of the RDFa 1.1 initial context (we used to call it default profile back then). I should probably try to talk to the Nanjing team to merge with their results!

I think the vision paper of Marcus Cobden and his friends (again at the COLD2011 Workshop) on a “Research Agenda for Linked Closed Data” is worth noting. Although not necessarily earthshaking, the fact that we can and we should speak about Linked Closed Data alongside Linked Open Data is important if we want the Semantic Web to be adopted and used by the enterprise world as well. One of the main issues, which is not really addressed frequently enough (although there have been some papers published here and there), is access control. Who has the right to access data? Who has the right to access a particular ontology or rule set that may lead to the deduction of new relationships? What are the licensing requirements, and how do we express them? I do not think our community has a full answer to these. B.t.w., W3C organizes a Workshop concentrating on the enterprise usage of Linked Data in December…

Speaking about research agenda… I really liked Frank van Harmelen’s keynote on the second day of the conference. His approach was fresh, and the question he asked was different: essentially, after 10 or more years of research in the Semantic Web area, can we derive some “higher level” laws that describe and govern this area of research? I will not repeat all the laws that he proposed; it is better to look at his Web site with the HTML version of his slides. The ones that are worth repeating again and again are that “Factual knowledge is a graph”, “Terminological knowledge is a hierarchy”, and “Terminological knowledge is much smaller than the factual knowledge”. Why are these important? To quote from his keynote slides:

  1. traditionally, KR has focussed on small and very intricate sets of axioms: a bunch of universally quantified complex sentences
  2. but now it turns out that much of our knowledge comes in the form of very large but shallow sets of axioms.
  3. lots of the knowledge is in the ground facts, (not in the quantified formula’s)

Which is important to remember when planning future work and activities. “Reasoning”, usually, happens on a huge set of ground facts in a graph, with a shallow hierarchy of terminology…

I was a little bit disappointed by the Linked Science Workshop; probably because I had wrong expectations. I was expecting a workshop looking at how Linked Data in general can help in the renewal of the scientific publication process as a whole (a bit along the lines of the Force11 work on improving the future of scholarly communication). Instead, the workshop was more on how different scientific fields use linked data for their work. Somehow the event was unfocussed for me…

As in some previous years, I was again part of the jury for the Semantic Web Challenge. It was interesting how our own expectations have changed over the years. What was really a wow! a few years ago has become so natural that we are not excited any more. Which is of course a good thing: it shows that the field is maturing further, but we may need some sort of a Semantic Web Super-Challenge to be really excited again. That being said, the winners of the challenge really did impressive work, and I do not want to give the impression of being negative about them… It is just that I was missing that “Wow”.

Finally, I was at one session of the industrial track, which was a bit disappointing. If we wanted to show the research community that the Semantic Web technologies are really used by industry, then the session did not really do a good job of that. With one exception, and a huge one at that: the presentation of Yahoo! (beware, the link is to a PowerPoint slidedeck). It seems that Yahoo! is building an internal infrastructure based on what they call “Web of Objects”, by regrouping pieces of knowledge in a graph-like fashion. By using internal vocabularies (a superset of schema.org) and using the underlying graph infrastructure they aim at regrouping similar or identical knowledge pieces harvested on the Web. I am sure we will hear more about this.

Yes, it was a full week…

  1. As far as I know (talked to author of the paper), they do plan to integrate Zhishi.me into LOD, i.e., as an integrated part of LOD.

    Comment by Jesse Jiaxin Wang (@aiwang) — November 3, 2011 @ 6:05

    • Thanks. I am happy if I am proven wrong!

      Comment by Ivan Herman — November 3, 2011 @ 10:00

