Ivan’s private site

January 4, 2014

Data vs. Publishing: my change of responsibilities…

Fairly Lake Botanical Garden, Shenzhen, China

There was an official announcement, as well as some references, on the fact that the structure of data-related work has changed at W3C. A new activity has been created, called the “Data Activity”, that subsumes what used to be called the Semantic Web Activity. “Subsumes” is an important term here: W3C does not abandon the Semantic Web work (I emphasize that because I did get such reactions); instead, the existing and possible future work simply continues within a new structure. The renaming is simply a sign that W3C also has to pay attention to the fact that there are many different data formats used on the Web, not all of which follow the principles and technologies of the Semantic Web, and those other formats and approaches also have technological and standardization needs that W3C might be in a position to help with. It is not the purpose of this blog, however, to look at the details; the interested reader may consult the official announcements (or consider Tim Finin’s formula: Data Activity ⊃ Semantic Web ∪ eGovernment 🙂)

There is a much less important but more personal aspect of the change, though: I will not be the leader of this new Data Activity (my colleague and friend, Phil Archer, will do that). Before anybody tries to find some complicated explanation (e.g., that I was fired): the reason is much simpler. About a year ago I got interested in a fairly different area, namely Digital Publishing. What used to be, back then, a so-called “headlight” project at W3C, i.e., an exploration into a new area, turned into an Activity of its own, with me as the lead, last summer. There is a good reason for that: after all, digital publishing (e.g., e-books) may represent one of the largest usage areas of the core W3C technologies (e.g., HTML5, CSS, or SVG) right after browsers; indeed, for those of you who do not realize it (I did not know it just a year and a half ago either…), an e-book is “just” a frozen and packaged Web site, using many of the technologies defined by W3C. A major user area, thus, but one whose requirements may be special and not yet properly represented at W3C. Hence the new Activity.

However, this development at W3C had its price for me: I had to choose. Heading both the Digital Publishing and the Data Activities was not an option. I have led W3C’s Semantic Web Activity for ca. 7 years; 7 years that were rich in events and results (the forward march of Linked Open Data, a much more general presence and acceptance of the technology, specifications like OWL 2, RDFa, RDB2RDF, PROV, SKOS, SPARQL 1.1, with RDF 1.1 just around the corner now…). I had my role in many of these, although I was merely a coordinator for the work done by other amazing individuals. But, I had to choose, and I decided to go towards new horizons (in view of my age, probably for the last time in my professional life); hence my choice for Digital Publishing. As simple as that…

But this does not mean I am completely “out”. First of all, I will still actively participate in some of the Data Activity groups (e.g., in the “CSV on the Web WG”), and I have a continuing interest in many of the issues there. But, maybe more importantly, there are some major overlapping areas between Digital Publishing and Data on the Web. For example, publishing also means scientific, scholarly publishing, and this particular area is increasingly aware of the fact that publishing data, as part of reporting on a particular scientific endeavor, is becoming as important as publishing a traditional paper. And this raises tons of issues on data formats, linked data, metadata, access, provenance, etc. Another example: the traditional publishing industry makes increasingly heavy use of metadata. There is a recognition among publishers that well chosen and well curated metadata for books is a major business asset that may make a publication win or lose. Many (overlapping…) vocabularies, and relationships to libraries, archival facilities, etc., come to the fore. Via this metadata the world of publishing may become a major player in the Linked Data cloud. A final example may be annotation: while many aspects of the annotation work are inherently bound to the Semantic Web (see, e.g., the work of the W3C Community Group on Annotation), it is also considered to be one of the most important areas for future development in, say, the educational publishing area.

I can, hopefully, contribute to these overlapping areas with my experience from the Semantic Web. So no, I am not entirely gone, just changed hats! Or, as in the picture, acting (also) as a bridge…

November 2, 2011

Some notes on ISWC2011…

The 10th International Semantic Web Conference (ISWC2011) took place in Bonn last week. Others have already blogged on the conference in a more systematic way (see, for example, Juan Sequeda’s series on semanticweb.com); there is no reason to repeat that. Just a few more personal impressions, with the obvious caveat that I may have missed interesting papers or presentations, and that the ones I picked here also reflect my personal bias… So, in no particular order:

Zhishi.me is the outcome of the work of a group from the APEX lab in Shanghai and Southeast University: it is, in some ways, the Chinese DBPedia. “In some ways” because it is actually a mixture of three different Chinese, community-driven encyclopedias, namely the Chinese Wikipedia, Baidu Baike and Hudong Baike. I am not sure of the exact numbers, but the combined dataset is probably a bit bigger than DBpedia. The goal of Zhishi.me is to act as a “seed” and a hub for Chinese linked open data contributions, just like DBpedia did and does for the LOD in general.

It is great stuff indeed. I do have one concern (which, hopefully, is only a matter of presentation, i.e., may be a misunderstanding on my side). Although zhishi.me is linked to non-Chinese datasets (DBPedia and others), the paper talks about a “Chinese Linked Open Data (COLD)”, as if this were something different, something separate. As a non-English speaker myself I can fully appreciate the issues of language and culture differences, but I would nevertheless hate to see the Chinese community develop a parallel LOD, instead of being an integral part of the LOD as a whole. Again, I hope this is just a misunderstanding!

There were a number of ontology or RDF graph visualization presentations: for example, one from the University of Southampton team (“Connecting the Dots”), one on the first results of an exploration done by Magnus Stuhr and his friends in Norway, called LODWheel (the latter was actually at the COLD2011 Workshop), and another from a mixed team, led by Enrico Motta, on a visualization plugin to the NeOn toolkit called KC-Viz. I have downloaded the latter and have played a bit with it already, but I have not had the time to form a really informed conclusion on it yet. Nevertheless, KC-Viz was interesting to me for a different reason. The basic idea of the tool is to attach some sort of an importance metric to each node in the class hierarchy and direct the visualization based on that metric. It was reminiscent of some work I did in my previous life on graph visualization; the metric was different, the graph was only a tree, and the visualization approach was different, but there was nevertheless a similar feel to it… Gosh, that was a long time ago!

The paper of John Howse et al. on visualizing ontologies was also interesting. Interesting because different: the idea is a systematic usage of Euler diagrams to visualize class hierarchies, combined with some sort of a visual language for the presentation of property restrictions. In my experience property restrictions are a very difficult (maybe the most difficult?) OWL concept to understand without a logic background; any tool, visual or otherwise, that helps in teaching and explaining them can be very important. Whether John’s visual language is the right one I am not sure yet, but it may well be. I will consider using it the next time I give a tutorial…

I was impressed by the paper of Gong Cheng and his friends from Nanjing, “Empirical Study of Vocabulary Relatedness…”. Analyzing the results of a search engine (in this case Falcons) to draw conclusions on the nature, the usage, the mutual relationships, etc., of vocabularies is very important indeed. We need empirical results, bound to real life usage. This is not the first work in this direction (see, for example, the work of Ghazvinia et al. from ISWC2009), but there is still much to do. Which reminds me of some much smaller scale work Giovanni, Péter and I did on determining the top vocabulary prefixes for the purpose of the RDFa 1.1 initial context (we used to call it the default profile back then). I should probably talk to the Nanjing team about merging our results with theirs!

I think the vision paper of Marcus Cobden and his friends (again at the COLD2011 Workshop) on a “Research Agenda for Linked Closed Data” is worth noting. Although not necessarily earthshaking, the fact that we can and should speak about Linked Closed Data alongside Linked Open Data is important if we want the Semantic Web to be adopted and used by the enterprise world as well. One of the main issues, which is not really addressed frequently enough (although there have been some papers published here and there), is access control. Who has the right to access data? Who has the right to access a particular ontology or rule set that may lead to the deduction of new relationships? What are the licensing requirements, and how do we express them? I do not think our community has a full answer to these. By the way, W3C organizes a Workshop concentrating on the enterprise usage of Linked Data in December…

Speaking about research agendas… I really liked Frank van Harmelen’s keynote on the second day of the conference. His approach was fresh, and the question he asked was different: essentially, after 10 or more years of research in the Semantic Web area, can we derive some “higher level” laws that describe and govern this area of research? I will not repeat all the laws that he proposed; it is better to look at the HTML version of his slides on his Web site. The ones worth repeating again and again are that “Factual knowledge is a graph”, “Terminological knowledge is a hierarchy”, and “Terminological knowledge is much smaller than the factual knowledge”. Why are these important? To quote from his keynote slides:

  1. traditionally, KR has focussed on small and very intricate sets of axioms: a bunch of universally quantified complex sentences
  2. but now it turns out that much of our knowledge comes in the form of very large but shallow sets of axioms.
  3. lots of the knowledge is in the ground facts (not in the quantified formulas)

Which is important to remember when planning future work and activities. “Reasoning”, usually, happens on a huge set of ground facts in a graph, with a shallow hierarchy of terminology…

I was a little bit disappointed by the Linked Science Workshop; probably because I had wrong expectations. I was expecting a workshop looking at how Linked Data in general can help in the renewal of the scientific publication process as a whole (a bit along the lines of the Force11 work on improving the future of scholarly communication). Instead, the workshop was more on how different scientific fields use linked data for their work. Somehow the event was unfocussed for me…

As in some previous years, I was again part of the jury for the Semantic Web Challenge. It was interesting to see how our own expectations have changed over the years. What was really a wow! a few years ago has become so natural that we are not excited any more. Which is of course a good thing, as it shows that the field is maturing further, but we may need some sort of a Semantic Web Super-Challenge to be really excited again. That being said, the winners of the challenge really did impressive work; I do not want to give the impression of being negative about them… It is just that I was missing that “Wow”.

Finally, I was at one session of the industrial track, which was a bit disappointing. If we wanted to show the research community that Semantic Web technologies are really used by industry, then the session did not really do a good job of that. With one exception, and a huge one at that: the presentation of Yahoo! (beware, the link is to a PowerPoint slidedeck). It seems that Yahoo! is building an internal infrastructure based on what they call a “Web of Objects”, regrouping pieces of knowledge in a graph-like fashion. By using internal vocabularies (a superset of schema.org) and the underlying graph infrastructure, they aim at regrouping similar or identical knowledge pieces harvested on the Web. I am sure we will hear more about this.

Yes, it was a full week…


November 12, 2009

Pay to be free…

Filed under: Social aspects,Work Related — Ivan Herman @ 17:00

I may not be well informed, so this may be a known approach for some of you, but it is the first time I see this…

There has been a tension between (scientific) publishers and authors for a while on whether one is allowed to put one’s publication on the Web. When dealing with traditional publishers the author usually gives away his/her copyright, and the papers are rarely available on the Web (which is a source of constant frustration to readers). Fortunately, this is not always the case; for example, the proceedings of the World Wide Web conference series are published by ACM, but the papers are nevertheless available on the Web for free (thanks to IW3C2).

Well, a counter-proposal from a publisher is quite amazing. A Hungarian publisher, Akadémiai Kiadó, offers authors a deal, called the “Optional Open Article”: if you pay the nice sum of 900€, then the paper is also put into an online edition and made freely available on the Web. (The fact that it is then freely available is clear in the agreement posted on the web site.) Pay for your freedom. Isn’t this wonderful?

And, to make it clear: this is a very prestigious publisher in Hungary, affiliated with the Hungarian Academy of Sciences and, therefore, the prime local publisher of Hungarian scientists…

I find it appalling.  But this may only be me.

November 4, 2008

Semantic Web for dummies…

Filed under: Semantic Web,Work Related — Ivan Herman @ 20:14

A possible sign that a technology is getting into the mainstream is when a book is published in the “XXX for dummies” series. Well, I just realized today that the “Semantic Web for dummies”, by Jeff Pollock, is in the publication pipeline, to be out in the bookshops in March! We are getting there…

July 5, 2008

Low hanging… dogfood?

Filed under: Semantic Web,Work Related — Ivan Herman @ 10:52

This should, actually, be a comment on Péter’s comment on my previous blog post, but it really has become a separate topic, so I decided to put it into a separate post. Besides, it is a bit too long for a comment…

To summarize: the JWS journal has a pre-print service running, as a back end, the openacademia software developed by Péter and his friends. Which also means that the JWS data should be accessible in RDF, probably following the SWC ontology (although I have not found a pointer on the JWS site).

But, if so, don’t we have a low hanging, hm, dogfood here for the SW community? We begin to have most of the recent SW publications in RDF somewhere on the net. Beyond the JWS papers, the Semantic Web Conference Corpus site not only includes the RDF data for ISWC, ESWC, ASWC, and some related workshops, but it also has a SPARQL endpoint. I know that Daniel Schwabe is working on getting the WWW2008 conference material into a similar format and, hopefully, we can have the material for the WWW200X conferences available somewhere on the Web. I maintain a list of books on a wiki (well, hopefully, the community maintains it…), but I also keep the same list on Bibsonomy, and the list is therefore available in RDF, too (again, using the SWC ontology). And there might be other resources that I do not know about.

So… the easy thing to do is to integrate all this RDF data via some SPARQL endpoint. Because the data is already in RDF, that does not cost anything (although I am not 100% sure all the datasets follow the same vocabulary, so querying might be a bit tricky). But what I would love to see is a general service with a nice user interface on top of the data. I want to be able to search easily through the data without writing SPARQL queries or diving into the RDF graph directly with an RDF browser. The scale can be tricky. A few weeks ago David Huynh created a nice Exhibit page for the ESWC2008 data. It really looks great and helps a lot in searching the data. However… as an experiment I copied his file and added a few more datasets from the SW Corpus. Well… it turned out to be too much for Exhibit (I may have made a mistake somewhere, of course, but I do not believe Exhibit is good enough for that amount of data). I.e., a more dedicated interface should be created to provide this service for end users (maybe along the lines of openacademia?).
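To make the integration idea a bit more concrete, here is a sketch of the kind of query such a service could answer over the merged data. It assumes the datasets share the vocabularies used by the Semantic Web Conference Corpus (SWRC, FOAF, Dublin Core), which, as noted above, is not guaranteed; the prefixes and the author name are illustrative placeholders only:

```sparql
PREFIX swrc: <http://swrc.ontoware.org/ontology#>
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
PREFIX dc:   <http://purl.org/dc/elements/1.1/>

# All paper titles by a given author, across the merged datasets
SELECT DISTINCT ?title
WHERE {
  ?paper a swrc:InProceedings ;
         dc:title ?title ;
         foaf:maker ?author .
  ?author foaf:name "Some Author" .
}
```

If the datasets do not agree on vocabularies, each such query would need per-dataset variants (or some mapping layer), which is exactly why the “querying might be a bit tricky” caveat matters.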

And, of course, it is easy to have nice ideas on how to add new features with all the data around… For example, the book wiki page has references to Chris Bizer’s bookmashup data via the ISBN numbers. We could use DBpedia and Geonames to access information on conference cities, or FOAF data on authors and editors… We could use some good service (like MOAT) to have a uniform tagging system for the papers’ topics, or use Ed Summers’ Library of Congress Subject Headings in SKOS… In other words, this could become a nice LOD application, too! (Hm, maybe it is not such a low hanging dogfood after all?)

What I would really like is to get a comment on this blog saying “you uninformed fool, this already exists here and here!”. I would humbly stand corrected, and would happily use the service. Anyone with this comment?

July 1, 2008

Journal of Web Semantics Preprint Server

Filed under: Semantic Web,Work Related — Ivan Herman @ 13:21

(Maybe it is common knowledge and I am the only one who has missed it, but maybe it is not. If the latter, this can be useful information for somebody else, too…)

I spent some time yesterday scanning through some papers of the latest issues of the Journal of Web Semantics. I could do that relatively easily, because I happen to be employed by a research institute (CWI, in Amsterdam), which pays a hefty price to access the digital library of Elsevier. This means I could download and print the papers I was interested in without problems. But what about those in the community who do not happen to work at a university or a research institute? Why isn’t JWS available online? The subscription price for the journal is €77 per year; not a huge amount of money, maybe, but a significant sum nevertheless for a private person.

However, after some searching, I was pleasantly surprised to find out that Elsevier has a preprint server for JWS up and running. It is still in beta (I had some temporary access problems today, for example), but it works well, and most of the papers I was interested in are downloadable in PDF (the format is not necessarily the same as the final journal print, but who cares?). It is good to see that access to the JWS papers is not restricted to institutionalized researchers only; they are accessible to everyone, alongside the papers of conferences like ISWC or WWW…

(In case you only see the blog item somewhere but not the comments: it is worth looking at them. Jim Hendler corrected me on the preprint server: it turns out not to be Elsevier’s. My mistake or, rather, lack of knowledge… — IH, 2008-07-02)
