Ivan’s private site

April 26, 2009

WWW2009 Impressions

As usual, when making notes of a conference like WWW2009, in Madrid, one has only a partial view. This is all the more true for a conference of the size of WWW2009, with around 1000 attendees and 5-6 parallel tracks. I must admit that I usually have difficulties with so many tracks at the same time; I obviously miss some of the events, which is a source of unavoidable frustration. With this caveat, here are just some of the topics that I will probably remember…

The power of Twitter. Although this was not a “topic” of the conference, this was the first WWW conference where Twitter was king. Twitter was everywhere: the #www2009 topic was getting several new entries per second (it even got spammed:-(), and other Twitter tags were used for some of the specialized events (like #w3ctrack or #ldow2009). One could get a glimpse of what was happening elsewhere just by following these topics. In fact, this report is much sketchier than usual simply because my own tweets from the conference or, of course, all the tweets of the #www2009 topic can very well replace some of the notes I wrote in blogs in earlier years.

Social networks. Going beyond Twitter, the ubiquitous presence of social networks and their effect on just about anything is still a major topic, as shown by the continuous flow of papers trying, eg, to extract semantics from tag clouds (eg, the paper of Benjamin Markines et al) or by the Googles and Yahoo!-s of this world trying to exploit these tags to improve their search results. (Yahoo’s experimental tag explorer is a good example of trying to exploit these further.) Nothing radically new here, but progress is reported at all conferences, and this one was no exception. One of the keynotes, by Pablo Rodriguez from Telefonica, actually claimed that the needs of social networks in terms of network infrastructure are so different that they are bound to require changes at the hardware/firmware level of networks. Posting, for example, a video on a social site may create a sudden peak of high-volume access (for example if posted by a “celebrity”), which makes it very different from the steadier flow of data that more traditional sites provide and require. Local caching in routers, for example, might be needed. I am no expert in this at all (anything that is close to hardware is sort of a black box to me) so I cannot judge these statements, but it was interesting to hear. Another interesting point he made was that “celebrities” of a specific network may (not necessarily intentionally) start a DoS attack against a site: think of the number of HTTP requests flowing to a site mentioned by one of these social network stars!

Web Science. There was a panel (organized by Nigel Shadbolt, with Tim Berners-Lee, Ricardo Baeza-Yates, and Mike Brodie). The whole topic is still fairly open (at least for me): what exactly is Web Science and where are the boundaries? What types of research belong to WS, and what is better kept outside to be handled by other disciplines? What type of abstractions would be necessary to study the Web as a whole (just as chemistry can be seen as a set of abstractions on top of physics)? What type of interdisciplinary research groups should be established? As far as I am concerned, I do not have an answer to any of these questions:-( What I could see happening is that under the banner “Web Science” many different sub-disciplines will appear very soon and gain independent life without much relationship among themselves. Personally, I would be more interested in the relationship between the Web and society at large than in the technical aspects, but that is only me. An interesting practical point for the future is that there are plans to combine (eg, co-locate) future WWW conferences with Web Science events; that would really be a gain for both event series in my view.

Computing cloud. Yep, this comes up more and more often. Obviously a big deal in the keynote of Alfred Spector, from Google, but it came up elsewhere, too. The mini-tutorial on Hadoop, MapReduce, and Hive, given by Tom White as part of the Developers’ track, was really interesting and instructive for me. We know that the computing cloud is of great interest to the Semantic Web community; it may indeed be a tool to handle the significant amount of data out there. The LOD data is already available on the Amazon services (thanks to OpenLink), Chris Bizer and friends’ Mobile DBpedia makes use of cloud facilities, and the LarKC project also makes use of massively parallel computing (though I am not sure they use the cloud). Something to keep an eye on, that is for sure; I am sure the topic will gain more importance in future conferences. (And one more technology I should familiarize myself with…)

Power of data. Issues around search have become the dominating theme of the WWW conferences, and this one was no exception. Many researchers try to exploit the sheer amount and variety of data that has been accumulated by the big search engines, for example. I have heard several talks over the years coming from Google’s R&D lab (including a keynote at this conference). I must admit the overall impression I get from these is that a more or less straightforward exploitation of a huge amount of data is used like a sledgehammer for all problems. (I am probably unfair.) Ricardo Baeza-Yates (from Yahoo!) also reported some work in his keynote on, eg, analyzing the search queries themselves, ie, the paths of different searches performed by users between the time they begin some search and the time they find what they were looking for. (Interesting stuff! By the way, there is also a conference on weblogs and social media, ICWSM; one more conference coming up around Web technologies.) I also listened to a presentation on Yahoo!’s BOSS by Ted Drake (again on the Developers’ track): what is interesting is that one can access (a part of) Yahoo!’s accumulated indexes to build, eg, one’s own search engine but, I presume, one could also use this data for other types of research. Power of data for the masses? (I have heard of BOSS before and I would have welcomed more technical details at the presentation but, well…)

Web of data, a.k.a. Semantic Web. The conference started with a great workshop on Linked Data. I again rely on my own Twitter notes and the general Twitter notes for more details; no need to repeat them here. Suffice it to say that, beyond the individual papers, there was a general “buzz” in the air, a general enthusiasm that was reflected by the high number of participants (over 100). For anybody interested, it is worth looking at all the papers; they were good! Having said that, what I am really waiting for is to see many real applications of the LOD (and not only experimental, university usage), but that takes time; there was no really breathtaking news on that at the workshop.

But, of course, the workshop was for the converted; what was more interesting was to see that the Linked Data concept, and the Semantic Web in general, created more and more interest at the conference proper, and not only among the long-time Semantic Web adepts. Jim Hendler did a surprise presentation at the Developers’ track (a surprise because an announced speaker could not come, so he took his place), talking to non-Semantic Web developers about what can already be done today with this technology, about the excitement that is out there, and about the companies that have already picked up this technology. It is good to get these messages out there again and again. Georgi Kobilarov also gave a great presentation on DBpedia at the track; there were several people I talked to afterward who were really carried away by the possibilities opened up by having access to a huge amount of data through the unifying abstraction of RDF, RDFS, and possibly (a little bit of:-) OWL.

I also went to the Semantic Web refereed paper track, obviously. I must admit I was a little bit disappointed because lots of colleagues whom I would typically see at such an event were not around. I presume ISWC has now become a major competitor to WWW in this area and, when money is tight, people have to make a choice. In earlier years ISWC was considered to be much more theoretical while WWW had more practical papers, but the last few ISWCs I attended seemed to indicate that this is changing. I think any of the WWW papers could have been presented at ISWC without any problems. As a consequence, I guess many people decided that ISWC is a better place to be. It will be interesting to see how things evolve in the future; it is not impossible that the Semantic Web, as a topic, will gradually move away from WWW to ISWC. (I would expect specifically Linked Data papers to appear at ISWC very soon!)

That being said: it was nice to see a paper on DERI Pipes (by Danh Le-Phuoc et al) or on Triplify (by Sören Auer et al). This is not the first time I heard about these but it is good to have them more widely published. There was a paper on a rule system benchmark (by Senlin Liang et al); although I am no expert on this, with the advancement of RIF it will be good to have such benchmarks being put forward. The paper of Philippe Cudré-Mauroux et al on the issue of disambiguating IDs in linked data caught my attention: with the advancement of linked data we enter (as the presenter put it) an “ID Jungle” with tons of URI-s referring, more or less, to the same concept (eg, a specific person), and a simple owl:sameAs is not an ideal solution to handle this. The idMesh system provides a means to analyze relationships among those ID-s. I must admit I did not follow all details of the paper but it is certainly one of the papers I will have to study in more detail when I get to it!

W3C’s “camps”. W3C tried another model this year, replacing the more traditional W3C tracks by two ‘camps’ on mobile web and on social web. But… this is where the large number of parallel tracks backfired: I could not go to either of them:-( There were all kinds of overlaps with other presentations (eg, the social web camp fully coincided with the Semantic Web paper track). Pity, because the feedback I heard from participants was very positive. Sigh. Well, actually, courtesy of Fabien Gandon, I was present at the social web camp virtually, witness this slide.

It was a slightly exhausting but good week!

3 Comments

  1. Thanks Jim!
    I was able to gather a fair bit thanks to Twitter, but a nice write-up helps :)

    Comment by Danny Ayers — April 26, 2009 @ 21:05

  2. Eek – I typed Jim! Sorry Ivan, must be the Hagrid slide confused me (brilliant, btw).

    Comment by Danny Ayers — April 26, 2009 @ 21:07

  3. […] a solution to tackle the problem of disambiguating entities (e.g. people) on the Web. Found via Ivan Herman’s www2009 impressions, see also Orri Erlang’s extensive www […]

    Pingback by www2009: Twentieth Web Anniversary « O’Really? — May 1, 2009 @ 17:18

