Ivan’s private site

May 26, 2007

Jena supports RDFa

Filed under: Semantic Web, Work Related — Ivan Herman @ 15:12

This is certainly good news: Jena now supports RDFa. Thanks to Jeremy… (I am not sure when this will become part of the standard Jena and Joseki distributions, but it is only a matter of time…). I am sure other implementations will follow!

May 24, 2007

Faceted view of the WWW2007 papers

Filed under: Semantic Web, Work Related — Ivan Herman @ 10:34

In case you have not seen this yet: David Huynh has created an Exhibit-based view of the WWW2007 papers. It looks great and is much easier to navigate… I have certainly changed my bookmark!

May 23, 2007

“Unifying Reasoning and Search to Web Scale” (IEEE Internet Computing)

Filed under: Semantic Web, Work Related — Ivan Herman @ 8:44

An interesting short note by Dieter Fensel and Frank van Harmelen in the March/April issue of IEEE Internet Computing, entitled “Unifying Reasoning and Search to Web Scale”. Unfortunately, IEEE articles are available to subscribers only, but here is the abstract:

Researchers have developed reasoning methods for rather small, closed, trustworthy, consistent, and static domains, but Web-scale reasoning remains elusive. The authors seek to merge semantic reasoning and search in something new that reflects proper unification without adding bizarre syntax to programming languages or nonscalable logic to superficially align Web principles and reasoning.

I think the essence of their argument is well summarized in a piece of pseudo-code describing how reasoning can, or should, happen on a very, very large scale (ie, on Web scale):

do
  draw a sample,
  do the reasoning on the sample;
  if    you have more time,
        and/or if you don't
        trust the result,
  then draw a bigger sample,
repeat

Of course, the devil is in the details, ie, this is a general approach rather than a solution (and the authors do not claim otherwise). But it is certainly an interesting direction to consider!
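Just to make the loop a bit more tangible for myself, here is a minimal Python sketch of it. This is my own reading, not the authors’ code: I interpret “if you have more time and/or if you don’t trust the result” as “keep going while there is time left and the result is not yet trusted”. The function name, the doubling of the sample size, and the `reason`/`trusted` callbacks are all assumptions made for the illustration; the real difficulty (what “reasoning on a sample” and “trusting” mean for Web data) is exactly the devil in the details.

```python
import random
import time
from typing import Callable, Sequence, Tuple, TypeVar

T = TypeVar("T")
R = TypeVar("R")


def sample_and_reason(
    data: Sequence[T],
    reason: Callable[[Sequence[T]], R],
    trusted: Callable[[R], bool],
    time_budget_s: float,
    initial_size: int = 100,
    growth: int = 2,
    seed: int = 0,
) -> Tuple[R, int]:
    """Reason over growing random samples of `data` (hypothetical helper).

    Stops when the result is trusted, the time budget is used up, or the
    sample already covers the whole data set. Returns (result, sample size).
    """
    rng = random.Random(seed)
    deadline = time.monotonic() + time_budget_s
    size = min(initial_size, len(data))
    while True:
        sample = rng.sample(data, size)          # draw a sample
        result = reason(sample)                  # do the reasoning on the sample
        if trusted(result) or time.monotonic() >= deadline or size == len(data):
            return result, size
        size = min(size * growth, len(data))     # draw a bigger sample


# Toy usage: the "reasoning" estimates the share of even numbers, and we only
# trust an estimate that was computed from at least 800 items.
facts = list(range(10_000))
estimate, used = sample_and_reason(
    facts,
    reason=lambda s: (sum(1 for f in s if f % 2 == 0) / len(s), len(s)),
    trusted=lambda r: r[1] >= 800,
    time_budget_s=60.0,
)
```

With a starting sample of 100 that doubles each round, the toy run stops at a sample of 800 items; with a zero time budget it would stop after the very first sample.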

(Bibliographic details: “Unifying Reasoning and Search to Web Scale”, by Dieter Fensel and Frank van Harmelen, IEEE Internet Computing, Volume 11, No. 2, March/April 2007.)

May 13, 2007

WWW Conference (4th day)

Filed under: Semantic Web, Work Related — Ivan Herman @ 17:30

The last day of a long week (I also had a W3C AC meeting before the conference…).

The day began with a keynote by Dick Hardt on what he calls “Identity 2.0”. (Everybody seems to live in 2.0 these days. I wonder whether I should not call myself “Ivan 2.0”, just to sound fashionable:-). His presentation was very enjoyable and well prepared, but a bit too abstract. I have the slight impression that he reused a presentation originally prepared for a non-technical audience, which means that it stayed on an abstract level without giving any details on the underlying technology. Pity. Anyway, his vision is to use a centralized “security agent” that knows all the necessary data about you, about your different identities, etc. This agent would automatically check claims with the issuers of other identity elements (eg, did I really go to school at the place I claim?) and would also send out identity information to relying parties. Without more details on how this would be done, where that security agent would live, etc, all this sounded a bit scary to me. On the other hand, Dick Hardt is (as far as I know) one of the editors of the OpenID spec, so he has obviously had to think about all the social and technical pitfalls such an approach would have. This is why it was a real pity he did not take the time to go into more technical detail. Oh well…

The paper on Yago by Fabian Suchanek et al was pretty interesting. By analyzing Wikipedia category entries, reinforced by an analysis of the terms using WordNet, they semi-automatically create a large knowledge base with about 900,000 entities and around 6 million facts on those entities. It has a query interface, but can also be downloaded from their site. The system also includes a rule-like inference engine through which new facts can be found. All that is great and impressive, but they have developed it in isolation, using non-Semantic Web technologies. Ie, they did not use RDF (though they do have triplets, in fact), and they do not use OWL, Rules, or SPARQL… as it stands, the system is completely disjoint from the rest of the Semantic Web. The only technical reason I heard at the presentation is that they had difficulties time-stamping their facts, so they needed an alternative structure. I do not want to minimize the problem of timed RDF statements, and it is of course their right to use whatever technologies they want for their research, but it would be a pity if it stayed that way. Luckily, this may not be the case. I asked this question after the presentation and Fabian said they would combine this somehow with dbpedia; and I actually found a reference to exactly this on Chris Bizer’s latest slides. My understanding is that Fabian and Chris found ways of binding Yago to RDF during the conference. If so, Yago may become an impressive addition to the available Semantic Web knowledge bases!

(Note on 2007-06-07: Fabian Suchanek contacted me to draw my attention to two misunderstandings. I am happy to copy them here:

  • “The YAGO model (as described in the paper) is basically RDFS plus some additional semantics. YAGO can also be downloaded in RDF.”
  • “The reason why we did not use OWL was that OWL does not support acyclic transitive relations – which are important for YAGO. Thus, we basically use RDFS and add acyclic transitive relations, but this is just the semantics.”

And yes, since then, Yago has been incorporated into dbpedia! Yey!)
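To see what an “acyclic transitive relation” means in practice, here is a small Python sketch of my own (not YAGO’s code): it computes the transitive closure of a set of asserted pairs and refuses any input that contains a cycle, which is the extra constraint Fabian mentions. The function name and the class names in the example are made up for the illustration.

```python
from collections import defaultdict
from typing import Dict, Hashable, Iterable, Set, Tuple


def acyclic_transitive_closure(
    pairs: Iterable[Tuple[Hashable, Hashable]]
) -> Set[Tuple[Hashable, Hashable]]:
    """Transitive closure of a relation that must also be acyclic (hypothetical helper).

    Raises ValueError if the asserted pairs contain a cycle (including a
    self-loop), since an acyclic transitive relation cannot have one.
    """
    successors: Dict[Hashable, Set[Hashable]] = defaultdict(set)
    for a, b in pairs:
        successors[a].add(b)

    reach: Dict[Hashable, Set[Hashable]] = {}  # node -> everything it is transitively related to
    in_progress: Set[Hashable] = set()          # nodes on the current depth-first path

    def visit(node: Hashable) -> Set[Hashable]:
        if node in reach:
            return reach[node]
        if node in in_progress:
            raise ValueError(f"cycle through {node!r}")
        in_progress.add(node)
        above: Set[Hashable] = set()
        for nxt in successors.get(node, ()):
            above.add(nxt)
            above |= visit(nxt)
        in_progress.discard(node)
        reach[node] = above
        return above

    for node in list(successors):
        visit(node)
    return {(a, b) for a, above in reach.items() for b in above}


# Illustrative hierarchy (names made up for the example).
hierarchy = [("physicist", "scientist"), ("scientist", "person"), ("person", "entity")]
closure = acyclic_transitive_closure(hierarchy)
```

For this three-step chain the closure has six pairs, including ("physicist", "entity"); adding ("entity", "physicist") would be rejected. The recursion is fine for a sketch but would need to be iterative for very deep hierarchies.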

Another paper I found particularly interesting was “Analysis of Topological Characteristics of Huge Online Social Networking Services”, by Yong-Yeol Ahn et al. It is not really a Semantic Web paper (although it was part of the Semantic Web track) but more in the direction of the Web Science discussion that took place on the first day. They retrieve connection data from social sites like MySpace, Orkut, or Cyworld (the latter is a Korean social contact system) and create a huge graph from the “who knows whom”-like relationships. They then use social network analysis to analyze the graphs. They found, for example, that in all these graphs they could spot a number of nodes with a huge number of links, way beyond the number one would have with “normal” social contacts (they referred to these nodes as “super users”). In answer to a question, it turned out that these super users are persons, and not the various types of organizations or associations that regularly appear on these sites. They could also analyze how the graphs evolved over time. Beyond the specific result, I think this line of research is really interesting and important. By the way, the next WWW conference in Beijing (WWW2008) will have a “Social networks and Web 2.0” track; worth keeping an eye on that one.
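Just to illustrate the kind of graph analysis involved, here is a toy Python sketch that counts node degrees in an undirected “who knows whom” graph and flags nodes whose degree is far above the typical one. The “ten times the median degree” cutoff is purely my assumption for the example; the paper itself analyzes degree distributions and other topological properties in much more depth.

```python
from collections import Counter
from statistics import median
from typing import Hashable, Iterable, List, Tuple


def super_users(edges: Iterable[Tuple[Hashable, Hashable]], factor: float = 10.0) -> List[Hashable]:
    """Nodes whose degree is at least `factor` times the median degree (hypothetical criterion)."""
    # Treat "who knows whom" links as undirected; ignore duplicates and self-links.
    links = {frozenset((a, b)) for a, b in edges if a != b}
    degree: Counter = Counter()
    for link in links:
        for node in link:
            degree[node] += 1
    if not degree:
        return []
    cutoff = factor * median(degree.values())
    return sorted(node for node, d in degree.items() if d >= cutoff)


# Toy graph: one very connected node plus a couple of ordinary links.
edges = [("hub", f"user{i}") for i in range(30)] + [("user0", "user1")]
```

Here `super_users(edges)` returns `["hub"]`: the hub has 30 links while the median degree is 1.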

Yong-Yeol also showed a nice cartoon on social spaces, an image that Sandro also discovered the other day:

A cartoon showing the various social spaces on an imaginary map

As I said before: it was a long but interesting and fruitful week (and an even longer trip, because I spent a week in Boston before coming to Banff). And I still have to go through the conference proceedings to see what else of interest was presented in sessions I could not attend. Time to go back home tomorrow…

May 12, 2007

WWW Conference (3rd day)

Filed under: Semantic Web, Work Related — Ivan Herman @ 23:30

First of all, I realized how stupid I am. I mixed up my diary entries and missed the Linked Data BOF at WWW2007. There is no excuse, other than human stupidity… But I found out that Paul Miller even blogged twice about it (on the 10th and the 11th), so at least I could read something about what happened (and Chris, Susie, Tim, and others also told me about it).

I went to the keynote by Bill Buxton. Great speaker, very entertaining presentation, but it made me think a lot about what the takeaway message really was. The funny thing was that I did not really find any at first, but it made one think and discuss with others. And there are some general, though difficult-to-describe, messages that I could take away after all: that we, computer scientists (or, if you prefer, geeks:-), should take a step back sometimes (often?) and realize that what happens around us in technology has a long history and background that is worth remembering (ie, it is worth putting things in perspective sometimes) but, also, that our community has a major responsibility for how social evolution happens. I liked his remark that the annoying ads on web pages are also “our” fault insofar as the community has not yet come up with other business models that would be less of a, shall we say, pain in the back. Nothing fundamentally new, in fact, but these are the kind of things that are always worth discussing again and again.

Then the Semantic Web track began, lasting for two days. Obviously, lots of papers, and a lot to read again once I am back home. The general picture (over the two days, actually) is that I missed the more “application”-oriented papers, something like the “Semantic Web in Use” track at ISWC. Most of the papers were quite theoretical and, more often than not, concentrated on issues around ontologies, their management, etc, rather than on simpler levels and applications (the “dark side” of Jim Hendler). However, with this caveat, there were quite a number of interesting papers.

The paper by Christian Halaschek-Wiener and Jim Hendler (Toward Expressive Syndication on the Web) on analyzing news feeds using OWL had an interesting aspect (beyond the application proper), namely the speed gain they could achieve by carefully fiddling with the core algorithms (see the paper for the details if you are interested). It refers back to what I wrote yesterday: the slowness of Semantic Web tools should soon be considered an urban legend… The paper by Axel Polleres on Rules and SPARQL can also be related to the same issue: Axel shows how SPARQL queries can be mapped onto rule engines (like Prolog). Beyond the theoretical interest of the technique, this also means that SPARQL implementations can capitalize on the long tradition of rule language implementations bound to, say, a relational database. Similarly, the paper by Jean-Sebastien Brunner (Explorations in the Use of Semantic Web Technologies for Product Information Management) showed how the group at IBM China achieved spectacular speed-ups in their Minerva environment: a query that also involved some level of OWL reasoning went from a response time of an hour to one second!
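To get a feeling for the SPARQL-to-rules idea, here is a deliberately tiny Python sketch that turns the simplest kind of SPARQL query (a list of selected variables plus a basic graph pattern) into a single Prolog/Datalog-style rule. This is only my illustration, not Axel’s translation: his covers much more (OPTIONAL, UNION, FILTER, etc.), and the `triple/3` predicate, the `V_` variable prefix, and the `answer` head are assumptions made for the example. The point it shows is simply that shared variables in the pattern become joins in the rule body.

```python
from typing import Sequence, Tuple

Triple = Tuple[str, str, str]


def to_term(t: str) -> str:
    """?x becomes the Prolog variable V_x; anything else becomes a quoted atom."""
    if t.startswith("?"):
        return "V_" + t[1:]
    return "'" + t.replace("'", "''") + "'"  # '' escapes a quote inside a Prolog atom


def bgp_to_rule(select_vars: Sequence[str], patterns: Sequence[Triple], head: str = "answer") -> str:
    """Translate SELECT variables + a basic graph pattern into one rule (hypothetical helper).

    Each triple pattern becomes a triple/3 goal; repeated variables express the joins.
    """
    head_atom = f"{head}({', '.join(to_term(v) for v in select_vars)})"
    body = ", ".join(f"triple({', '.join(to_term(x) for x in p)})" for p in patterns)
    return f"{head_atom} :- {body}."


# Roughly: SELECT ?name WHERE { ?p rdf:type foaf:Person . ?p foaf:name ?name }
rule = bgp_to_rule(["?name"], [("?p", "rdf:type", "foaf:Person"), ("?p", "foaf:name", "?name")])
```

The resulting rule is `answer(V_name) :- triple(V_p, 'rdf:type', 'foaf:Person'), triple(V_p, 'foaf:name', V_name).`, which a Prolog-like engine with the RDF data loaded as `triple/3` facts could evaluate directly.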

David Huynh created quite a stir with his Exhibit presentation. For those who do not know it yet, Exhibit is a really nice tool to very quickly create a faceted browser for structured data. Great and nice stuff (I have actually used it for my publication and presentation list on my CWI home page). But what David did was great: he had the guts to make a presentation by building up a web page with his tools on the spot, editing his page directly while the whole audience was watching. It went beautifully, and he got well-deserved applause for his performance…

The paper by Kemafor Anyanwu on SPARQ2L adds path expressions and corresponding filters to SPARQL. It is in line with other, similar work (like PSPARQL, for example): more complex and also more expressive queries become possible (e.g., (X,??p,?Y) connects X to ?Y via some path ??p). Of course, SPARQL has to be finished first (hopefully by the end of the year), but these ideas are interesting and important if and when work reopens on a second version of SPARQL…
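As a rough illustration of what a path pattern like (X,??p,?Y) asks for, here is a small Python sketch that, starting from a node, finds every node reachable along a chain of triples together with the path leading to it. This is a simplification of my own, not SPARQ2L’s semantics or algorithm: it returns only one shortest path per reachable node (found by breadth-first search), and it ignores the path filters the paper defines. The function name and the example data are made up.

```python
from collections import defaultdict, deque
from typing import Dict, Iterable, List, Tuple

Triple = Tuple[str, str, str]


def paths_from(triples: Iterable[Triple], start: str) -> Dict[str, List[Triple]]:
    """For every node reachable from `start`, one shortest path of triples leading to it.

    Roughly the bindings of ?Y and ??p in (start, ??p, ?Y), but with a single
    shortest path per ?Y instead of all paths (hypothetical helper).
    """
    outgoing: Dict[str, List[Tuple[str, str]]] = defaultdict(list)
    for s, p, o in triples:
        outgoing[s].append((p, o))

    paths: Dict[str, List[Triple]] = {start: []}
    queue = deque([start])
    while queue:  # breadth-first search: the first path found to a node is a shortest one
        node = queue.popleft()
        for p, o in outgoing.get(node, []):
            if o not in paths:
                paths[o] = paths[node] + [(node, p, o)]
                queue.append(o)
    del paths[start]  # the empty path from start to itself is not a result
    return paths


# Made-up data: alice knows bob, bob works for w3c, carol knows alice.
data = [("alice", "knows", "bob"), ("bob", "worksFor", "w3c"), ("carol", "knows", "alice")]
result = paths_from(data, "alice")
```

Here `result` binds ?Y to `bob` (one step) and `w3c` (two steps, via `bob`); `carol` is not reachable from `alice` because the triples are followed in their subject-to-object direction only.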

A long day…

May 11, 2007

WWW2007 Conference (part of 2nd day)

Filed under: Semantic Web, Work Related — Ivan Herman @ 15:20

I already blogged on part of the 2nd day of the conference yesterday, so I won’t repeat that…

I was quite pleased by how the Semantic Web session of the W3C Track turned out. The room was packed full, which is a good sign. RIF was presented by Sandro; this was the first time a more public overview of what is going on there was given, and Sandro did a great job at that. Rules are becoming an integral part of the Semantic Web technology landscape; I think the publication of the RIF Core will really be an important step. It is still a first draft, but more can be expected later this year. I remember joking with Chris Welty last December that 2006 may have been “the year of SPARQL” and 2007 may become “the year of Rules”. Well, maybe not 2007 but only 2008, but nevertheless…

Harry Halpin and Fabien Gandon also did a great job on GRDDL and RDFa. Fabien ran a number of demos on his machine, mashing up different data sources that use microformats, eRDF, or RDFa, all combined via GRDDL and displayed in the browser via Javascript. Great stuff. Quite a lot of questions from potential users (I had to cut the discussion to move on to the next presentation:-)

Susie Stephens and Alan Ruttenberg gave a demo on how Semantic Web technologies can be used in Health Care and Life Sciences. Susie gave a brief introduction, followed by Alan for the demo part. (The presentation was based on the work of the whole HCLS IG.) It was impressive. This was not just a toy demo: these HCLS IG guys integrated a whole range of public databases (Susie has a slide on which ones), concentrating on Alzheimer’s disease information, and Alan showed how this integrated data can be queried by various types of SPARQL queries, getting answers in a few seconds, whereas finding the same answers via traditional means would have taken hours if not days. The data set contains around 350M triplets, stored on commodity hardware at MIT. (More about the demo can be found on the IG’s wiki page.) There weren’t really any questions; only Giovanni Tummarello stood up and said “thank you”. I had the impression that he expressed the reaction of most of the people in the audience; certainly mine.

By the way, the demo highlighted one more thing: remarks that Semantic Web tools are slow are becoming really outdated. Of course, improvements are always welcome, but getting a SPARQL query response on 350M triplets in a few seconds is a very respectable time and, as we know, these are far from the top scores: we are hearing about triple stores holding several billion triplets, significant improvements in reasoner response times, etc. The lack of speed of SW tools is gradually becoming an urban legend…

In some ways, this leads to the afternoon session I went to, namely the panel on Multimedia Semantics. The panel was a little bit hijacked by a slightly provocative remark by Mor Naaman (Yahoo! Research), who declared the Semantic Web dead. It was the usual Web 2.0 argumentation, referring to (in this case) Flickr tagging, including Flickr’s machine tags, as a superior way of using semantics that makes all Semantic Web approaches obsolete. I have the impression, though, that Mor did not mean this 100% and put it on his slide to generate discussion. He did a good job at that, because some of us took the bait… I mentioned a remark by Giles Day at a conference earlier this year on the importance, within Pfizer, of ontologies rather than tagging (I blogged about this back in March). I think we agreed that tagging à la Flickr and more Semantic Webby tools can happily live side by side, depending on the application at hand (sometimes tagging is the o.k. answer…). And, in fact, tagging, mainly machine tagging, is really not that different from Semantic Web tools: machine tags are RDF triplets in disguise; it is just very unfashionable these days in certain communities to call this RDF. Actually, Dave Beckett also came up to the microphone, referring to his Flickcurl library, which can be used to export Flickr tags to RDF, and noting that this was very easy to implement… We also agreed with Mor that the image of the Semantic Web, and of RDF in particular, is still distorted out there, and people still tend to think of it as something too complex. Sigh, we still have a long way to go in changing this image and turning it into an urban legend, too…
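To show what I mean by “RDF triplets in disguise”, here is a minimal Python sketch that turns a Flickr-style machine tag of the form namespace:predicate=value into a (subject, predicate, object) triple about the photo. It is not Dave Beckett’s library and not its mapping; the exact character rules for namespaces and predicates, the handling of quoted values, and the `http://example.org/machinetag/` base URI are all assumptions made for the illustration.

```python
import re
from typing import Optional, Tuple

# Assumed syntax: namespace:predicate=value, where namespace and predicate start with
# a letter and contain only letters, digits and underscores; the value may be quoted.
MACHINE_TAG = re.compile(r"^([A-Za-z][A-Za-z0-9_]*):([A-Za-z][A-Za-z0-9_]*)=(.+)$")

# Hypothetical base URI for machine-tag predicates; not an official vocabulary.
BASE = "http://example.org/machinetag/"


def machine_tag_to_triple(photo_uri: str, tag: str) -> Optional[Tuple[str, str, str]]:
    """Map a machine tag on a photo to (subject, predicate URI, literal value); None for plain tags."""
    m = MACHINE_TAG.match(tag)
    if m is None:
        return None  # an ordinary tag like "banff" carries no predicate
    namespace, predicate, value = m.groups()
    if len(value) >= 2 and value.startswith('"') and value.endswith('"'):
        value = value[1:-1]  # strip optional surrounding quotes
    return (photo_uri, f"{BASE}{namespace}#{predicate}", value)


triple = machine_tag_to_triple("http://example.org/photo/1", "geo:lat=51.17")
```

The tag `geo:lat=51.17` becomes the triple (`http://example.org/photo/1`, `http://example.org/machinetag/geo#lat`, `"51.17"`), while a plain tag yields `None`: the namespace and predicate of the tag are, in effect, the RDF property.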

It was a good day again.

May 10, 2007

WWW2007 Conference (1st and part of the 2nd day)

Filed under: Work Related — Ivan Herman @ 22:58

I arrived in Banff last Saturday already (we had a W3C meeting before the WWW2007 conference). Absolutely beautiful place; I have some photos on the Web from an earlier visit, but I already have a set of additional ones on my machine (to update the photo site sometime next week).

Meeting and seeing lots of people; I do not even want to attempt listing them, as I am sure I would forget somebody. It is almost too much; there is no time to really sit down and have quality time with one person only…

I was at a Web Science panel yesterday morning, largely initiated by WSRI. It was a good panel with Peter Patel-Schneider, Danny Weitzner, and Nigel Shadbolt as panelists. This Web Science stuff may become really interesting. I think that the discussion on whether this is science or not is a bit sterile (I think it was Nigel who said that); what is really interesting (for me…) is that this can lead to a body of research by the “soft” sciences on the social effects of the Web. And this is really exciting stuff. How does the Web change societies, politics, the way people interact with one another, the way they build up common knowledge… Peter referred to this as cultural anthropology, which may be the right term; whatever it is, I think there are really lots of things to do there. It was interesting that today’s keynote by Prabhakar Raghavan referred to the same issue; he spent a pretty large part of his (quite nice) keynote on the need to understand social behaviour, the social incentive mechanisms behind some social sites, etc. “Computing meets humanities like never before — sociology, economics, anthropology”, as he put it. Lots of things to keep an eye on…

I quite liked the paper on CSurf in the User Interface session. It is a nice way of analyzing web page links: it builds up the context around a link on the source page and then analyzes the target page to match that context, yielding a quicker, more efficient non-visual browser. Great for blind people but, also, for mobile web usage (though the latter is not yet publicly available). This stuff is worth remembering.

Another paper in the same session was GeoTracker: it analyzes the content of an RSS feed for possible clues to time and/or geographical location, and then displays the results on a map. Ie, instead of having a list of RSS feed entries on the screen for, say, news items, I can have a map with little flags on the location(s) the news items are about. Cute stuff, though clearly not easy; this information can be very ambiguous (especially in the US, where city names are repeated in almost every state…). Nevertheless, exploring new user interfaces for RSS feeds is always a nice thing to do…
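The ambiguity problem is easy to see even in a toy version of the first step. The Python sketch below is my own illustration, not GeoTracker’s method: it looks up place names from a tiny, hand-made gazetteer in the text of a feed item and only “resolves” the names that have a single candidate location; a name like Springfield stays unresolved and would need more context. The function names and the gazetteer are made up for the example.

```python
import re
from typing import Dict, List, Mapping, Sequence


def place_candidates(text: str, gazetteer: Mapping[str, Sequence[str]]) -> Dict[str, List[str]]:
    """Gazetteer names occurring in `text` as whole words, with all their candidate locations."""
    hits: Dict[str, List[str]] = {}
    for name, locations in gazetteer.items():
        if re.search(r"\b" + re.escape(name) + r"\b", text):
            hits[name] = list(locations)
    return hits


def resolved(hits: Mapping[str, Sequence[str]]) -> Dict[str, str]:
    """Keep only names with exactly one candidate; ambiguous names need more context."""
    return {name: locs[0] for name, locs in hits.items() if len(locs) == 1}


# Tiny hand-made gazetteer, for illustration only.
GAZETTEER = {
    "Springfield": ["Springfield, Illinois", "Springfield, Massachusetts", "Springfield, Missouri"],
    "Beijing": ["Beijing, China"],
}
hits = place_candidates("Storm damage reported in Springfield; WWW2008 heads to Beijing", GAZETTEER)
```

Both names are found in the item, but only Beijing can be put on the map directly; Springfield has three candidates and would need further clues from the text.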

Nice reception, again chatting with a huge number of people… I was exhausted by the end of the day. But it was worth it.

Blog at WordPress.com.