Ivan’s private site

November 28, 2007

GRDDL support in proposed Dublin Core in X/HTML guidelines

Filed under: Semantic Web,Work Related — Ivan Herman @ 10:07

I think this is worth emphasizing. Pete Johnson’s mail in the SW IG:

The Dublin Core Metadata Initiative currently has a proposed update to
its specification for expressing DC metadata in X/HTML meta and link
elements [1] available for public comment [2].

That document serves as an X/HTML “meta data profile” document, and
although this isn’t stated explicitly in human-readable form in the
document itself – it will be in the final version! -, it includes a
GRDDL profile transformation link. So for any XHTML instance using this
profile, a GRDDL-aware processor can extract the corresponding RDF
triples. The draft XSLT transform itself is currently hosted elsewhere,
but will be assigned a DCMI URI when it is more fully tested.

Comments are welcome, and should be sent to the DCMI Architecture
Community mailing list [3].

Cheers

Pete

[1] http://dublincore.org/documents/2007/11/05/dc-html/
[2] http://dublincore.org/news/2007/#dcmi-news-20071105-02
[3] http://www.jiscmail.ac.uk/lists/dc-architecture.html

This is a major step in bringing the Dublin Core metadata into RDF smoothly…

November 20, 2007

Life in the parks of Beijing

Filed under: General,Private — Ivan Herman @ 10:00

People dancing at a park in Beijing

A lot has been written about Beijing, about the way the Chinese think, how different they are from, say, Europeans or average Americans. Well, here is one more small element that struck me again last Sunday.

It was a beautiful day, so a friend and I went to walk in the Jingshan Park. It was full of people. And lots of them did what would certainly be difficult to see in a European park: people spontaneously got together and started to sing, dance, or listen to a (again spontaneous) storyteller… The music could be traditional Chinese or some “modern” pop; people could be old or young, men or women, they could be good dancers or clumsy like a bear… no problem. I saw a woman of a certain age walking by, putting her coat down, joining the dance for 5-10 minutes, then picking her coat up again and walking away. Smiling.

I wonder what makes the difference. It seems that people in Beijing (and in most of China I guess, I remember seeing similar scenes in Hong Kong) have much less inhibition for these sorts of things than, for example, we have in Europe. Sure, you may see people dancing and singing in, say, the Vondelpark of Amsterdam, but almost exclusively groups of friends and certainly under a certain age. Beyond that… well, yes, if you are almost a professional, or if you are one of the very few who look like a photo model; but otherwise? Hell no… You do not want to make a fool of yourself, do you?

ISWC2007 — second and third days (with a little delay)

Filed under: Semantic Web,Work Related — Ivan Herman @ 4:13

First of all, why the delay? Well, I typed this blog at the airport of Busan and stored it on my machine. So far so good. But then I arrived in Beijing (where I am now) and I wanted to post it on my blog… and I failed. It turns out that all blogs of the type XXX.wordpress.com are, ahem, difficult to reach from China. Why? No idea. It took me several days to find a way to access my blog again and post the original text. It is a few days old; hopefully it is still ok…

The paper of Saggion et al. reported on an EU project (called MUSING) which applies Semantic Web techniques to business intelligence. Business intelligence is one of the possible “verticals”, ie, application areas of the Semantic Web that one hears about, but I have not seen that many specific examples yet. The reason why this may be an area of interest is obvious: it is really about gathering, integrating and analyzing very different datasets (public or otherwise). So it was good to see something more specific. This type of project might be a good application example for the Linking Open Data project results, too: some of the data this MUSING project relies on, for example, is already available there (although they did not realize it until now :-( ).

YARS2 has already found its way into the news around the Semantic Web as one of the high-end triple store facilities. But it was good to hear about the technical details from Andreas Harth. A cluster of servers offering distributed storage of many, many millions of triples… Cool!

It was interesting that two new Semantic Web search and indexing engines were published sometime last week, more or less at the same time, namely Sindice and Falcons. And it so happens that, on the one hand, Giovanni Tummarello and his team made a presentation on Sindice at ISWC2007 and, on the other hand, I have just arrived in Beijing for a Chinese Semantic Web Symposium where the development team of Falcons (from the Southeast University in Nanjing) will also make a presentation. Nice coincidence. Staying with Sindice for now: it is really a cool tool. One aspect I liked is the extra functionality they have put in to retrieve resources via inverse functional properties. I am curious to see when extensions to Sindice, for example, will appear for Semantic Web browsers (the authors claim that it is very easy to do…)

I was interested by the paper of F. Abel et al., adding access control to an RDF query. What happens is as follows: (1) the access control policy is described in what looks like a simple rules language; (2) when an RDF query arrives, each triple pattern is analysed by a rule engine and the result is the extension/modification of the original RDF query (eg, filters or extra triple patterns are added). This modified query is then processed by a usual query engine (at the moment they have not implemented it for SPARQL, but they plan to do it soon). This looks like a neat idea and a good use case for SPARQL (access control is certainly one of the very important issues that come to the fore these days). This is all the more relevant because, now that SPARQL is a Proposed Recommendation (yey!), it will be interesting to collect use cases to see if extensions to SPARQL will become necessary or not (probably yes).
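To make the rewriting step a bit more concrete, here is a minimal Python sketch of the general idea. It is emphatically not the rule language or the engine of the paper: the policy format, the `rewrite` function, and the FOAF-based example rule are all my own assumptions, chosen only to show how a protected triple pattern can pull extra patterns into the query before it reaches the query engine.

```python
# Illustrative only: the policy format and function names are assumptions,
# not the rule language described in the paper.

def phone_rule(user, subject):
    """Hypothetical rule: a phone number is visible only if its owner knows the user."""
    return [(subject, "foaf:knows", user)]

# Maps a protected predicate to the rule that guards it.
POLICY = {"foaf:phone": phone_rule}

def rewrite(patterns, user, policy):
    """Return the triple patterns extended with the constraints the policy demands."""
    rewritten = []
    for subject, predicate, obj in patterns:
        rewritten.append((subject, predicate, obj))
        rule = policy.get(predicate)
        if rule is None:
            continue  # unprotected predicate: pattern passes through unchanged
        for extra in rule(user, subject):
            if extra not in rewritten:
                rewritten.append(extra)
    return rewritten

query = [("?p", "foaf:name", "?n"), ("?p", "foaf:phone", "?t")]
for pattern in rewrite(query, "<http://example.org/alice>", POLICY):
    print(" ".join(pattern))
```

The point of the design is that the query engine itself stays untouched: it simply receives a stricter query than the one the user wrote.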

There was an interesting BOF on the third day (we should be grateful to Paola Di Maio who has pushed for it) on usability and the Semantic Web. There was some discussion as to what this exactly means: for some, this not only includes the (human) usability issues of Semantic Web applications, but also the programmers’ usability of various SW API-s. (Although, personally, I believe these two issues are better kept separate.) It was decided to continue discussions on a separate mailing list and to try to build a more visible and active community around the issue. The (hitherto pretty much dormant) SW User Interaction mailing list, run as part of the W3C Semantic Web IG, will be used for that purpose. If you are interested, sign up!

The third day also included a panel with the catchy title “Prodigy or Sociopath: the Adolescent Semantic Web”. I very much liked the format of the panel. I am often bothered by panels where each panelist makes a short (well, sometimes not-so-short) presentation first and then they expect the audience to participate after 40-45 minutes into the panel’s time. That model often fails in practice. In this case the organizers formulated 5 statements (like for example “The SW mainly requires light ontologies”), each statement was elaborated upon by one panelist, and the floor was open for discussion after each such statement. And this model worked well although, as far as I could see, nothing very controversial was said after all during the discussions. Maybe the statement “We need social scientists to join the SW conferences” was the one generating most of the comments. (Although I think everybody agreed that some sort of stronger cooperation with social scientists is really essential. As far as I am concerned, this is the most interesting aspect of the Web Science Initiative, too.)

Last but not least: there was a great keynote by Chris Welty. The type of keynote that forces some sort of humility and self-analysis on what should be and shouldn’t be done and how; to realize that, well, every one of us gets it wrong, maybe more often than not, and that should be all right. For me, the most important message was his “Better not Perfect” statement: we should recognize the value when something is really “better” and accept it, use it, disclose it, even if we know it is not “perfect”. Scientists (I must also say, mainly European scientists) often have the tendency to try to cover, eg, all the edge cases thoroughly, strive for aesthetics, etc, thereby often losing both simplicity and speed of deployment and usage…

It was a good conference. Some areas were missing, although this may be not because the organizers did not try but simply because they did not succeed (yet). For example, there was only one paper on relationships to (traditional) databases (the paper of W. Hu et al.) although this area is really very important (check out the presentations of the W3C Workshop on the subject and consider the fact that, according to some statistics, almost 80% of the current Web content comes from databases…). Security, privacy, and related issues were also more or less missing, with the exception of the access control paper that I already mentioned (there was a separate workshop on the subject, but there was no real presence at the core conference). I would have loved to see more application papers, too, but maybe that is asking too much from a mostly scientific conference. There was also less “buzz” than at ISWC2006 in the corridors, at coffee breaks, etc; this may have been the result of both the setting (this conference was a bit lost in a major conference centre) and the predominance of Asian attendees, who are known to be less outgoing than their European, Australian, or US colleagues. But, again, it was a good conference and a good week!

November 14, 2007

ISWC2007 — first day

Filed under: Semantic Web,Work Related — Ivan Herman @ 0:45

It is always difficult to write about a conference… One has to make a selection of sessions, there is jet-lag involved (big time:-), hallway conversations instead of sessions, etc. This means that one will clearly unfairly forget some papers and presentations. Well, my apologies, but that is the way it is…

That being said: the keynote of Brewster Kahle on the open library project was certainly interesting. For those who may not know, these guys have no less of a goal than to make most of the culture of this World (books, music, video, etc) available on the Web, for eternity, and free of charge. Fascinating goal, though they are clearly facing huge hurdles, the least of which are technical. The really tough problems, as Brewster emphasized several times, are legal: copyright and similar issues…

I must admit that such projects always make me a little bit, well, suspicious. Clearly, Brewster and his team are making huge efforts to be as international as possible, they work with people in Egypt, China, countries in Africa and South America, etc… nevertheless, I am always a bit afraid of other cultures being left behind by such an endeavour. With the best will of all involved, the result will be highly influenced by those who really have the means and the possibilities to participate; ie, with the way the World is shaped today, it will be dominated by an Anglo-Saxon, primarily American, view of the World. What about small languages and cultures? Of course, this is already a problem: who has read the poetry of, say, Miklós Radnóti? A Hungarian poet, whose war-time poetry is one of the most dramatic accounts of life in work camps during the war… but he wrote in Hungarian. Difficult to translate properly (the wiki page refers to some translations, but, well…). On the other hand, imagine a world where this open library is the reference point for, say, literature around the World, including for Hungarians, merely because this will be the reference on the Web? One can already see a similar effect with Wikipedia; in spite of the local language versions, the English version dominates, and, well, not only “if it is on Wikipedia, it must be true” but also “if it is not on Wikipedia, it does not exist”…

Well, enough whining. I am really hopeful that these guys will prove me wrong. However, another point is more technical and exciting; let me quote here directly from the Linking Open Data Wiki page:

Brewster Kahle from The Internet Archive gave a most inspiring talk on “Universal Access to Human Knowledge”. He proposed a challenge to the SemWeb community to work with them to interlink their Open Library project into the SemWeb. It is a gold mine of data for the LOD group. TomHeath & I (DavidPeterson) had a quick chat with Brewster and he is extremely interested in this work and opening a line of communication. Even to the point of putting a new /RDF/ style link into their URIs for books (ex: http://archive.org/details/owlandpussycat00leariala). They already have a mass of metadata.

And it seems that this discussion has already started on the LOD mailing list. Yes, it would be a formidable addition to the Open Data set!

I quite liked the RDFSync paper of Morbidoni et al. It is one of those papers which have a relatively simple and nevertheless powerful message: one can partition an RDF graph into Minimum Self-contained Graphs (MSG-s); each of those partitions can be individually digitally signed and checksummed (using Jeremy Carroll’s algorithm); the full graph can therefore be represented (uniquely!) by a lexicographically ordered set of such checksums. Why is that good? Because if copies of RDF graphs are to be synchronized, one can compare those individual checksums, and move over only those parts of the graphs where the checksums are different (ie, the underlying partition is different). When data are duplicated, for example, the win can be huge. As I said, a basically simple and nevertheless very powerful approach. Check it out. It is worth it.
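For the curious, the idea can be sketched in a few lines of Python. This is a toy under loud assumptions, not the RDFSync implementation: triples are plain tuples of strings, a partition is either a single ground triple or the set of triples connected through shared blank nodes, and the checksum is a SHA-256 hash over the sorted triples. In particular, I assume that blank node labels are identical in both copies, which sidesteps the canonicalization that Jeremy Carroll’s algorithm is there to solve.

```python
import hashlib
from collections import defaultdict

def is_blank(term):
    return term.startswith("_:")

def partition(triples):
    """Split a graph into blocks: one per ground triple, one per blank-node cluster."""
    parent = {}

    def find(node):
        # Union-find lookup with path halving.
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    # First pass: merge blank nodes that appear in the same triple.
    for s, _, o in triples:
        blanks = [t for t in (s, o) if is_blank(t)]
        for b in blanks:
            find(b)
        if len(blanks) == 2:
            parent[find(blanks[0])] = find(blanks[1])

    # Second pass: assign every triple to its block.
    blocks = defaultdict(set)
    for triple in triples:
        blanks = [t for t in (triple[0], triple[2]) if is_blank(t)]
        key = ("bnode", find(blanks[0])) if blanks else ("ground", triple)
        blocks[key].add(triple)
    return [frozenset(block) for block in blocks.values()]

def checksum(block):
    """Hash a block over its sorted triples (simplified stand-in for a real digest)."""
    digest = hashlib.sha256()
    for triple in sorted(block):
        digest.update(" ".join(triple).encode("utf-8") + b"\n")
    return digest.hexdigest()

def fingerprint(triples):
    return {checksum(block): block for block in partition(triples)}

def diff(source, target):
    """Blocks the target is missing, and blocks the target should drop."""
    src, tgt = fingerprint(source), fingerprint(target)
    to_send = [src[c] for c in sorted(src.keys() - tgt.keys())]
    to_drop = [tgt[c] for c in sorted(tgt.keys() - src.keys())]
    return to_send, to_drop
```

Only the sorted checksum lists need to cross the wire first; the actual triples travel only for the blocks whose checksums differ.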

I also quite liked the paper of Alani et al., reporting on a study on a pilot project with various governmental organizations in the UK to use SW technologies in their operations. This is a typical use case paper: what are the (possibly non-technical) hurdles to overcome, how to create a value proposition without scaring these organizations away, etc. I think it is a good reference paper for everyone who intends to use SW technologies in practical projects with the participation of non-SW-savvy partners.

There were, obviously, a number of papers where I got the feeling “there is something to check out here at some point”, but I could not necessarily follow all the technical details during the presentations. This was the case, for example, for the paper of Zeginis et al. on computing deltas of RDF Models, or the details of another interesting case study of Srinivas et al. on how to choose clinical trial candidates using complex medical ontologies, patient records and (highly non-trivial) inferencing. The paper of Tamilin et al. on Heterogeneous Ontology Environments was also interesting; I must admit it is the first time I saw a formalism on distributed DL processing. All these leave me some homework…

See what the next day will bring!

November 6, 2007

Dublin Core Metadata and RDF

Filed under: Semantic Web,Work Related — Ivan Herman @ 17:55

The DCMI has just released a new version of their document: “Expressing Dublin Core metadata using HTML/XHTML meta and link elements”. I am not really familiar with the details of the DCMI process but my understanding is that public comments are still welcome before finalizing the document. It is interesting to note that a clear mechanism is provided to define and use the different DCMI vocabularies via some sort of a namespace declaration mechanism, ie, this is not only about using the well-known properties like dc:date. Check it out!

Although this is not explicitly said in the document proper (everything is expressed in terms of the abstract Dublin Core data model), it is noteworthy that the specification provides a clear way of getting the DC metadata into RDF. On the one hand, the abstract model can be expressed in RDF (see again the data model definition); on the other hand, the encoding in HTML/XHTML is based on an HTML profile for DCMI. And using this profile, one gets a hook into the GRDDL processing world.

This is more than just a theoretical possibility. Pete Johnson, one of the editors of the document, posted an extra mail on the DCMI archives referring to the fact that the namespace document is already prepared for GRDDL and it links to an appropriate XSLT transform. Ie, already today, if one uses the profile and the conventions described in the DCMI documents, Dublin Core metadata can be mashed up with other RDF data via GRDDL. Eg, the example document on the DCMI site can yield the RDF content via the W3C GRDDL service. Yey!
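To give a feeling for what comes out at the other end, here is a small Python sketch of the kind of mapping involved. To be clear about the assumptions: this is not GRDDL (a real GRDDL processor follows the profile link and runs the XSLT transform it finds there), and the class and function names are mine. The sketch only mimics the outcome for the simplest case, ie, a `schema.XX` link declaring a vocabulary prefix and `meta` elements whose names use that prefix.

```python
from html.parser import HTMLParser

class DCMetaExtractor(HTMLParser):
    """Collects 'schema.*' prefix declarations and prefixed <meta> statements."""

    def __init__(self):
        super().__init__()
        self.namespaces = {}   # prefix -> namespace URI, from <link rel="schema.XX">
        self.statements = []   # (prefix, local name, literal value), from <meta>

    def handle_starttag(self, tag, attrs):
        attributes = dict(attrs)
        rel = attributes.get("rel") or ""
        name = attributes.get("name") or ""
        if tag == "link" and rel.lower().startswith("schema."):
            self.namespaces[rel[len("schema."):]] = attributes.get("href") or ""
        elif tag == "meta" and "." in name and attributes.get("content") is not None:
            prefix, local = name.split(".", 1)
            self.statements.append((prefix, local, attributes["content"]))

def extract_triples(html, document_uri):
    """Return (subject, property URI, literal) triples for the declared prefixes only."""
    parser = DCMetaExtractor()
    parser.feed(html)
    parser.close()
    return [
        (document_uri, parser.namespaces[prefix] + local, value)
        for prefix, local, value in parser.statements
        if prefix in parser.namespaces
    ]

page = """<html><head>
<link rel="schema.DC" href="http://purl.org/dc/elements/1.1/" />
<meta name="DC.title" content="Services to Government" />
<meta name="keywords" content="not Dublin Core" />
</head><body></body></html>"""

for triple in extract_triples(page, "http://example.org/doc"):
    print(triple)
```

The document URI and the page content above are made up for the example; resource-valued statements expressed via `link` elements are left out to keep the sketch short.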

B.t.w.: it is interesting to note that the usage of (X)HTML profiles begins to come to the fore. GRDDL is based on profiles; the POWDER specification (see my earlier blog) uses profiles; now DCMI uses them big time. This is important to remember: there are voices in the HTML world who would like to strike HTML profiles. As far as I know (I may be wrong, though) the main reason being that they are not used. Well, this is changing!

November 3, 2007

Dostoevsky, the early psychologist

Filed under: General,Private — Ivan Herman @ 16:20

I read quite a lot of Dostoevsky as a teenager (alongside other Russian novels of his age) but I have not done it for a long time. Lately I read (again) one of his novels, “The Insulted and Humiliated”. Behind the sometimes a bit oldish literary style, it is quite an amazing book. Something I did not realize as a teenager: most of the human characters in the book appear as if they were case studies in a psychology handbook. Their reactions, their actions, etc, are all based on complex human behavioral patterns that could have been described by Freud. The noteworthy fact, however, is that the book was written in 1861, when Freud was still a 5 year old… I think I will have to pick the other novels of Dostoevsky from my bookshelf and read them again!

November 2, 2007

Powdering logos

Filed under: Semantic Web,Work Related — Ivan Herman @ 12:59

After some commotion the copyright issues around the W3C Semantic Web logos are now settled. A few simple lines of RDF/XML have also been added to the SVG versions of the logos, essentially using two cc:license predicates connecting to two URI-s in W3C space. SVG is nice, because you can add RDF/XML into the SVG file itself, making such additional metadata easy to add.

But what about the other files? Well, that is what POWDER is all about. And because the best way of learning something is to use it, I decided to put the copyright statements in place via POWDER. Of course, this is still a moving target, i.e., the exact predicate names might still change, the tools to follow up and extract all the triples are not yet in place, etc, but the general structure is there. So it is worth learning it.

The first step is to create an RDF file containing the so called POWDER resources. There are two types of license statements, depending on whether the logo includes the W3C logo or not. So I have to begin by:

<wdr:Package rdf:ID="SWLogos">
  <wdr:hasDRs rdf:parseType="Collection">
    <wdr:DR rdf:about="#SWLogosWithW3C"/>
    <wdr:DR rdf:about="#SWLogosWithoutW3C"/>
  </wdr:hasDRs>
</wdr:Package>

So far so good. The resources #SWLogosWithW3C and #SWLogosWithoutW3C are where the real meat is: information has to be provided on the “scope”, ie, which resources we are making statements on (in this case the logo URI-s themselves) and the “descriptors”, ie, what those statements are. So we have the following structure:

<wdr:DR rdf:ID="SWLogosWithoutW3C">
  ...
  <wdr:hasScope>
    ...
  </wdr:hasScope>
  <wdr:hasDescriptors>
    <wdr:Descriptors>
     ...
    </wdr:Descriptors>
  </wdr:hasDescriptors>
</wdr:DR>

The second (i.e., descriptors’) part is easy. One defines properties for the <wdr:Descriptors> resource (i.e., the cc:license properties); the idea is that these properties will apply to the resources that are identified via the object of the <wdr:hasScope> predicate. (Yeah, that is a little bit tricky; properties are defined for a subject that are then also applied to other subjects via a POWDER processor… There are discussions around this, and it is still a design issue the group is fighting with).

The more interesting part is to identify the resources, and this is where the scope comes in. For that purpose, URI patterns are used: one should say that, well, the URI-s are on the W3C domain, starting exactly with this and this path, they should end with specific suffixes (ie, svg, png, etc). As a small extra complication in this specific case, names that have the string ‘w3c’ in them should be excluded; indeed, as said above, another type of license applies to those. (The definition of these resources is very much at the heart of the POWDER specification, so it has been spun off into a separate document.) Here is how it is done:

<wdr:hasScope>
  <wdr:ResourceSet>
     <wdr:includeHosts>www.w3.org</wdr:includeHosts>
     <wdr:includePathStartsWith>/Icons/SW/</wdr:includePathStartsWith>
     <wdr:excludePathStartsWith>/Icons/SW/Buttons</wdr:excludePathStartsWith>
     <wdr:includePathEndsWith>png svg gif eps</wdr:includePathEndsWith>
     <wdr:excludeRegEx>w3c</wdr:excludeRegEx>
   </wdr:ResourceSet>
</wdr:hasScope>

It pretty much says what I wrote above, right?
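Just to spell out how I read these constraints, here is a small Python sketch of the matching logic. It is my own reading of the draft, not a POWDER implementation: I assume that the values are whitespace-separated lists, that every “include” constraint must hold and no “exclude” constraint may hold, and that the regular expression is tested against the path only. The function name and the sample logo names other than sw-cube.gif are made up.

```python
import re
from urllib.parse import urlsplit

# The constraints of the <wdr:ResourceSet> above, as plain strings.
SCOPE = {
    "includeHosts": "www.w3.org",
    "includePathStartsWith": "/Icons/SW/",
    "excludePathStartsWith": "/Icons/SW/Buttons",
    "includePathEndsWith": "png svg gif eps",
    "excludeRegEx": "w3c",
}

def in_scope(uri, scope=SCOPE):
    """True if the URI satisfies every include and no exclude constraint (assumed semantics)."""
    parts = urlsplit(uri)
    host = (parts.hostname or "").lower()
    path = parts.path
    if host not in scope["includeHosts"].split():
        return False
    if not any(path.startswith(p) for p in scope["includePathStartsWith"].split()):
        return False
    if any(path.startswith(p) for p in scope["excludePathStartsWith"].split()):
        return False
    if not any(path.endswith(s) for s in scope["includePathEndsWith"].split()):
        return False
    if re.search(scope["excludeRegEx"], path):
        return False
    return True

for candidate in (
    "http://www.w3.org/Icons/SW/sw-cube.gif",
    "http://www.w3.org/Icons/SW/sw-cube-w3c.png",      # hypothetical file name
    "http://www.w3.org/Icons/SW/Buttons/example.png",  # hypothetical file name
):
    print(candidate, in_scope(candidate))
```

Only the first of the three candidates falls within the scope; the second is caught by the ‘w3c’ exclusion and the third by the excluded path prefix.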

The last element of the puzzle is to make this file available somehow. One way is to add a reference to the W3C Semantic Web logos index page itself. How? For example, use an HTML profile defined by the POWDER group and a single <link> statement referring to the resource file itself. A POWDER implementation will, eventually, do the whole POWDER processing, returning triples like:

<rdf:Description rdf:about="http://www.w3.org/Icons/SW/sw-cube.gif">
   <cc:license rdf:resource="http://www.w3.org/2007/10/sw-logos.html#LogosWithoutW3C"/>
</rdf:Description>

Cute… I am eager to see the first implementation!

Of course, there are a number of issues I did not cover here (authentication, dates, etc); but maybe I made you curious enough to read the document itself!
