Ivan’s private site

October 25, 2007

Former classmates and Hungary

Filed under: General, Hungary, Private — Ivan Herman @ 20:24

Some days ago I had the pleasure of spending an evening with a friend whom I had not seen for more than 27 years; we used to be classmates in one of the top high schools in Budapest back in the seventies. After a while the unavoidable happened: we began to enumerate our common classmates, to share information about who did what, what happened to them, how their lives evolved, etc. One of the striking facts was that (out of ca. 35 kids back then) about half (if not more) left Hungary at some point or other. Some people live in France, others in Germany, the US, the UK (like my friend), the Netherlands (like myself and my wife),… And practically none of them have moved back, or plan to move back, to Hungary, in spite of the changes that occurred there. Though not necessarily surprising, this struck me again as a somewhat unfortunate fact about a whole generation in Hungary. Slightly sad and discomforting…

October 13, 2007

Move the Hungarian away…

Filed under: Hungary, Private — Ivan Herman @ 18:00

Nico made a comment on my previous, Hungarian blog post, and he was absolutely right. Mixing two languages, with one of the two being as peculiar as Hungarian :-), is really not a good idea. So I created a separate, Hungarian-only blog. From now on, the two blogs are strictly separated…

A few related news items…

Filed under: Hungary, Private — Ivan Herman @ 12:27

October 12, 2007

Wikipedia URI-s as reliable identifiers for the Semantic Web?

Filed under: Semantic Web, Work Related — Ivan Herman @ 15:05

Martin Hepp drew my attention to one of his upcoming publications [1]; some related thoughts…

The issues around URI-s come up regularly on the various SW related mailing lists and discussion fora, sometimes leading to passionate discussions. That is all right; it is indeed a complicated issue. But, somehow, the question of where to find the URI-s for various concepts does not always get enough attention (at least in my view). I remember that a while ago, when Frederick, Yves, and some others were working on the Music Ontology, my question was: all that is fine, but what is the authoritative URI for, say, Beethoven’s 7th symphony?

Of course, in some areas, communities are working on such naming schemes for their own constituencies. LSID-s are a prime example in the Life Science domain. Online catalogs of digital libraries (see my earlier blog referring to RDA-s, for example) might provide us with another rich source of stable URI-s. As yet another example, the lingvoj site of Bernard Vatant (just updated a few days ago) might establish itself as a set of stable URI-s for spoken languages (i.e., the URI http://www.lingvoj.org/lang/hu might become the URI for Hungarian). A number of similar datasets are appearing, for example, through the Open Linked Data project, and could eventually play similar roles.

Yes, but in the meantime, what happens to the vast number of other “things”? What is the answer to my 7th symphony question? An idea I have heard before: why not use the Wikipedia URI-s for that purpose? And that sounds like a good idea indeed. However, for that to work, a number of questions should be answered. E.g., how stable are those URI-s? How reliable are they? And this is where Martin et al.’s paper comes in. They do a series of statistical measurements and analyses on the evolution of Wikipedia entries (they rely on data from this year). Their measurements indicate, for example, that the Wikipedia URI-s are indeed stable enough. To be more precise, their measurements show that this year around 93% of the URI-s on Wikipedia had a stable meaning (i.e., the text of the corresponding article may have changed in some details, but the URI can still be considered as referring to the same notion). Given the large number of articles, this seems pretty o.k. to me… There are also some other statistical details in the paper (on the subject of the articles, for example), as well as further references, but, succinctly, that is probably the most important result. I am sure that further analysis on Wikipedia is still necessary (and I am also sure it will happen); this paper is certainly an interesting one among those!

So, should we rely on Wikipedia for the 7th symphony? Almost. If we go in this direction, my choice would be to use DBpedia instead. DBpedia being a dump of Wikipedia, it inherits all the stability results that Martin et al. describe. Also, the current DBpedia setup makes a clear distinction between a non-information resource URI and its RDF representation (an issue raised as a problem in [1] for Wikipedia URI-s). Last, but certainly not least, the RDF graphs in DBpedia are linked to an increasing number of other data sets via the Open Linked Data setup that applications may also exploit. I.e., a suitable URI for the 7th symphony might be:

http://dbpedia.org/resource/Symphony_No._7_%28Beethoven%29

derived from the corresponding Wikipedia URI.
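To make the "derived from" step concrete, here is a minimal sketch in Python. It rests on assumptions of mine rather than on anything DBpedia documents in this post: that the English Wikipedia article lives under /wiki/, that its URI is http://en.wikipedia.org/wiki/Symphony_No._7_(Beethoven), and that the DBpedia URI is simply the article title, percent-encoded, appended to http://dbpedia.org/resource/. The function name is hypothetical.

```python
from urllib.parse import quote, unquote, urlsplit

# Assumption: DBpedia resource URIs share this prefix (taken from the example URI above).
DBPEDIA_RESOURCE_PREFIX = "http://dbpedia.org/resource/"


def dbpedia_uri_from_wikipedia(wikipedia_url: str) -> str:
    """Derive a DBpedia-style resource URI from an English Wikipedia article URL.

    Hypothetical helper: it only mirrors the pattern visible in the example,
    it does not check that the resource actually exists in DBpedia.
    """
    path = urlsplit(wikipedia_url).path
    prefix = "/wiki/"
    if not path.startswith(prefix) or len(path) == len(prefix):
        raise ValueError(f"not a Wikipedia article URL: {wikipedia_url!r}")
    # Decode first, so that already-encoded input is not encoded twice.
    title = unquote(path[len(prefix):])
    # Percent-encode everything except letters, digits and "_.-~";
    # this turns "(" and ")" into %28 and %29, as in the URI above.
    return DBPEDIA_RESOURCE_PREFIX + quote(title, safe="")


if __name__ == "__main__":
    print(dbpedia_uri_from_wikipedia(
        "http://en.wikipedia.org/wiki/Symphony_No._7_(Beethoven)"))
    # prints http://dbpedia.org/resource/Symphony_No._7_%28Beethoven%29
```

Whether titles containing other special characters are encoded by DBpedia in exactly this way is something I have not checked; the sketch only reproduces the one example I have at hand.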

Of course, this is not a silver bullet. There can be lots of criticism of which topics are treated in Wikipedia (or not). To continue my example, the list of Beethoven’s works is fairly well covered by Wikipedia articles, but this is less true for, say, Robert Schumann. New, more systematic vocabularies might appear, in which case we may have URI aliases on our hands. Etc. However… do we have another, existing choice for today? I would be curious to hear…

(Note that the URI alias issue might be solved by automatically adding owl:sameAs predicates wherever appropriate. For example, the lingvoj data already includes such a link for each language, linking to… the corresponding DBpedia URI.)
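As an illustration of what such automatically added links could look like, here is a small Python sketch that writes owl:sameAs statements in N-Triples syntax. The function name is hypothetical, and the DBpedia URI used for Hungarian is my guess at the alias, not something copied from the lingvoj data.

```python
from typing import Iterable, Iterator, Tuple

# The owl:sameAs predicate, written out as a full URI.
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"


def same_as_triples(aliases: Iterable[Tuple[str, str]]) -> Iterator[str]:
    """Yield one N-Triples line per (uri, alias) pair.

    Hypothetical helper: the URIs are assumed to be valid and are
    written out as given, without any escaping.
    """
    for uri, alias in aliases:
        yield f"<{uri}> <{OWL_SAME_AS}> <{alias}> ."


if __name__ == "__main__":
    pairs = [
        # The second URI is an assumed DBpedia alias, for illustration only.
        ("http://www.lingvoj.org/lang/hu",
         "http://dbpedia.org/resource/Hungarian_language"),
    ]
    for line in same_as_triples(pairs):
        print(line)
```

An application that loads these statements next to its own data can then treat the two URI-s as names for the same thing, provided it applies OWL semantics for owl:sameAs.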

[1] M. Hepp, K. Siorpaes, and D. Bachlechner, “Harvesting Wiki Consensus Using Wikipedia Entries as Vocabulary for Knowledge Management,” IEEE Internet Computing, vol. 11, pp. 54-65, 2007. Also available on-line at Martin’s site.

October 5, 2007

Agenda clash (W3C Technical Plenary and ISWC2007)

Filed under: Semantic Web, Work Related — Ivan Herman @ 12:04

The W3C Technical Plenary Day has been announced (this is a public day, part of the W3C Technical Plenary Week). It is really a pity I cannot be there (the W3C day is in Boston, and I go to ISWC2007 in Korea a few days later; the two events are just too close to one another for a person living in Amsterdam…). Some (all?) sessions are relevant to the various aspects of the evolution of the Semantic Web: maybe not technically, but certainly socially. How should W3C groups be organized with respect to openness vs. member confidentiality? How should we handle extensibility based on URI-s (we’ll hit this issue with respect to HTML5 vs. GRDDL and RDFa at some point)? Just two of the obvious examples… There will be public slides, of course, but that is not the same :-(

Oh well… one cannot have it all, right? I am looking forward to going to ISWC2007, but I will miss being at this event, that is for sure!
