Ivan’s private site

February 13, 2010

Semantic 3D (Visit to the Fokus 3D workshop)

I had the pleasure, in the past two days, to participate in a workshop called Fokus3D. It was the closing event of a European R&D project of a similar name, concentrating on what is called Semantic 3D. I was invited because the project made use of certain types of Semantic Web technologies (e.g., OWL) and, also, because it is the community of my previous professional life: I spent many years in Computer Graphics… (Which also meant that I met old friends and colleagues whom I had not seen for many years, which was really very pleasant…)

So what is this “Semantic 3D”? What does it have to do with Semantic Web?

Here is a crash course on 3D graphics: when systems display those beautiful 3D objects that we are all used to, the underlying system transforms complex mathematical descriptions of shapes, surfaces, or 3D bodies into a load of (triangular) meshes that are displayed by the graphics hardware. The mathematical descriptions are purely geometrical and define, say, spline surfaces, planes, or some geometric transformations that place those surface descriptions into space.

These 3D objects usually represent some real object: a chair, a car, a tree, or a house. The representation of a chair is a combination of several such shapes; some of those describe the arms, the back, etc. But this information, i.e., that this and this combination of shapes is actually the arm of a chair, is usually lost somewhere in the process. Modelers start with a concept, a “semantic”, and end up with shapes; information is gone on the way. This means that many things cannot be done well: one would like to have semantics-based search (“searching for the arm of a chair”), one would like to know the origin of a particular shape (i.e., how it was created, under what process and transformation), one would like to follow the evolution in time of a particular shape to retrace the designer’s actions, etc. And, due to the huge number of shapes, managing this type of (meta)data is far from obvious. Keeping that information in a manageable way together with the geometric processing: that is what gives us Semantic 3D.

There was, of course, a slight confusion of terms for me: this notion of semantics would be considered as (meta)data by Semantic Web people. That being said, such data requires controlled vocabularies, and very complicated ones at that, so there are strong connections nevertheless. But there is also semantics in terms of knowledge representation: there are relationships among, and classifications of, shape elements, and these relationships can represent constraints and other features that can be used for reasoning, for inference. So more complex ontologies come into the picture (and OWL is widely used in this space). These ontologies are often application dependent, reflecting the diversity of application areas from CAD to gaming, or from cultural heritage systems to medical and biological applications. In the future, such ontologies should also incorporate features like uncertainty (reflecting the fact that, at least in some areas like protein modeling, those relationships are not necessarily crisp); they should also include features such as provenance or time relationships.

Last but not least: there is a lot of data there. I mean a lot, stored in biological databases, shape libraries, and collections of scanned historical artifacts, each item representing an object (like the reproduction of the Ramses statue on the figure) with many, many shapes. Integration of that data is a challenge even within one application area, let alone with data at large. It will take a long time before this data is organized in a way that it could be, say, exposed and integrated as Linked Open Data. But we may get there, eventually (and yours truly has done his best to convince the community of the value of doing that…). Standard representations have to be developed, algorithms crystallized, vocabularies and ontologies defined, etc. The good news is that there is a community that is determined to continue working in this direction. The workshop organizers plan to write down a research roadmap (to be put online within 1-2 weeks), and a special issue of the journal “Computers & Graphics” has been announced, co-edited by Bianca Falcidieno, from the CNR in Genova, and myself. So… stay tuned.


December 6, 2009

LOD and the top 10 SW products 2009

Filed under: Semantic Web,Work Related — Ivan Herman @ 11:19

Richard MacManus has just published the “Top 10 Semantic Web products of 2009” (see part I and part II) in ReadWriteWeb. What I found interesting about that list is that it includes products that are related to, and use, the output of the Linked Open Data project: Open Calais, Zemanta, BBC’s Semantic Music project, Freebase, DBpedia, Data.gov. (Ok, listing DBpedia as a “product” may not be absolutely right, but, well…).

Why is this interesting? Because one of the negative comments that one hears sometimes (often?) about the LOD is that it is nothing more than an academic exercise, i.e., that it does not make any sense for business. Well, here we are!

November 2, 2009

Promise kept (NYT and the LOD)

Filed under: Semantic Web,Work Related — Ivan Herman @ 11:41

I was at the SemTech conference in June when Evan Sandhaus from the New York Times gave a keynote and announced that the NYT would gradually publish many of their data as Linked Data using Semantic Web technologies. Unfortunately, I had to leave on the last day of ISWC2009 last week, which is when they announced that they were keeping their promise and releasing the first 5,000 subject heading tags to the LOD. Which is really great news.

I remember Evan saying in Santa Clara (maybe privately, I do not remember that detail) that they are newcomers in this area, and that it will be difficult to get it right (and, well, there are bugs, as, for example, Eric Hellman or Richard Cyganiak pointed out in their respective blogs). But I think we should really applaud when such a promise is kept…

October 26, 2009

ISWC2009 I.

This year’s ISWC is held in Chantilly, Virginia, in a nice conference building in a beautiful park with autumn colours that, for reasons I do not really know, are always much more striking and amazing in America than in Europe. It is a bit of a pity that it is so far from Washington but, well, you can’t have it all…

First day: tutorials.

(For me, because there were also a bunch of workshops.) In the morning I was at the tutorial on how to consume Linked Open Data, by Juan Sequeda, Jamie Taylor, Patrick Sinclair, and Olaf Hartig; in the afternoon I went to the one on legal and social frameworks for sharing data on the Web, by Leigh Dodds, Jordan Hatcher, Tom Heath, and Kaitlin Thaney.

Juan and his friends actually had a difficult task, and that became clear right at the start, during Juan’s intro: part of the audience did not really know what LOD was all about, whereas there were also others who were, shall we say, old timers on the subject. I think the speakers did a really good job in navigating through these constraints, making short introductions to what LOD is all about but talking about issues and showing examples that were interesting for all of us. Kudos for that. Issues were raised by the audience that were really to the point (who should create sameAs links, how trustworthy are they, how to choose vocabularies and how they map to one another, etc) and, in his closing slides, Juan actually gave a list of the open R&D issues in LOD. Worth looking at those (and no reason to repeat the list here…). B.t.w., the slides of the tutorial are online.

One very interesting technology I heard about that, shame on me, I did not know was SQUIN, a tool based on a traversal-based execution scheme for SPARQL. Olaf did a presentation on that. What essentially happens is as follows. At the beginning the default graph of the SPARQL query is empty. However, the system systematically fetches RDF triples by dereferencing the URIs in the query pattern, adding those to the default graph. The query is matched against it; some variables will match, thereby ‘adding’ new URIs to the pattern. And the process starts again, possibly yielding a complete solution (or more) to the original query. At the end of the process, solutions will be found on the Web, even if the system itself does not have any ‘real’ data behind it at the start. Of course, no one can guarantee that all solutions will be found, and you need some ‘seed’ URIs in the original query pattern, but it nevertheless looks like a very powerful tool to explore, say, the LOD. Very interesting!
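
The traversal scheme described above can be sketched in a few lines of Python. This is only a toy illustration under stated assumptions, not the actual implementation of the tool Olaf presented: the Web is simulated by an in-memory dictionary (FAKE_WEB), all the ex: URIs and the data are invented, and the helper names (dereference, evaluate, traverse_and_query) are hypothetical.

```python
# Simulated Web: each URI "dereferences" to a set of (s, p, o) triples.
# All URIs and data below are invented for illustration.
FAKE_WEB = {
    "ex:alice": {("ex:alice", "foaf:knows", "ex:bob")},
    "ex:bob": {("ex:bob", "foaf:name", "Bob"),
               ("ex:bob", "foaf:knows", "ex:carol")},
    "ex:carol": {("ex:carol", "foaf:name", "Carol")},
}


def dereference(uri):
    """Stand-in for an HTTP GET that returns the RDF triples behind a URI."""
    return FAKE_WEB.get(uri, set())


def is_var(term):
    return term.startswith("?")


def is_uri(term):
    # Toy convention: anything of the form prefix:name is a URI.
    return not is_var(term) and ":" in term


def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` matches `triple`, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def evaluate(patterns, graph):
    """Naive matching of a list of triple patterns against the local graph."""
    solutions = [{}]
    for pattern in patterns:
        extended = []
        for binding in solutions:
            for triple in graph:
                candidate = match(pattern, triple, binding)
                if candidate is not None:
                    extended.append(candidate)
        solutions = extended
    return solutions


def traverse_and_query(patterns, max_rounds=10):
    graph, seen = set(), set()  # the default graph starts out empty
    # Seed URIs: the constants that appear in the query pattern itself.
    frontier = {t for p in patterns for t in p if is_uri(t)}
    for _ in range(max_rounds):
        if not frontier:
            break
        for uri in frontier:
            graph |= dereference(uri)
        seen |= frontier
        # Partial matches bind variables to new URIs, which are fetched next.
        bound = {value
                 for k in range(1, len(patterns) + 1)
                 for binding in evaluate(patterns[:k], graph)
                 for value in binding.values()}
        frontier = {v for v in bound if is_uri(v)} - seen
    return evaluate(patterns, graph)


query = [("ex:alice", "foaf:knows", "?x"), ("?x", "foaf:name", "?n")]
result = traverse_and_query(query)
```

Note that ex:carol is never fetched in this run, because no partial match of the query binds a variable to it: the traversal is driven by the query, which is also why there is no guarantee that all solutions on the Web will be found.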

Then there were some examples on how LOD is used. Jamie talked about Freebase, and how Freebase is, in fact, a way for everybody to easily add information to the LOD (after all, Freebase works like a wiki, and all the data is reflected on the LOD). He also had a very important message that is worth repeating (go to his slides for the rest): it takes very little effort to add a republishing capability to your triple-store-based application, thereby extending the general LOD. So… do it! This is how the system evolves…

Patrick described a quite geeky system that the BBC folks have developed (and that will hopefully become public soon): take the BBC’s musical data in RDF (which is available), plus the LOD cloud, plus… an IRC bot. What you get is an IRC channel which will pick up data on music, including the sound tracks, photos, etc, and display it on the machine. I presume you can give orders and preferences through the IRC. Obviously geeky stuff, not for the masses:-) but it shows what you can do…

The afternoon tutorial on the Legal and Social frameworks was of course very different. I think that one of the many aspects of this tutorial, and maybe the most important one, is that… it took place! This may sound a bit strange but it is important for all our community to realize that we will have issues around copyright, licensing, waivers, etc, when it comes to the Web of Data, whether we like these issues or not. Tutorials like this, written notes and information, etc, are essential. Let us face it: most of us do not understand the details of the legal issues. So I was simply listening and trying to absorb what I heard…

I do not want to repeat the details of what I heard here; one thing I learned over the years is that I should leave legal arguments and descriptions to those who really understand them. Ie, look at the slides. It is worth it. But just to show the complexities: I did not know or fully realize that there are major differences among countries in what can or cannot be copyrighted: for example, a phone book cannot be copyrighted in the US or Europe, but can in Australia. Or that the seemingly simple notion of ‘attribution’ can, in fact, become a bottomless pit when it comes to data and the queries thereof (eg, if I have a filter in a query that results in data, should I give an attribution to the facts that were, in fact, filtered out?). Etc.

There is also a takeaway message for me (though it may be quite trivial) among the things I learned. Tom showed some practical examples on how one can add, say, licensing information to data by adding some RDF triples. However, for a larger data set the licensing may differ within the dataset. Eg, if you retrieve data from somewhere, and you enrich it with additional metadata, the metadata itself may have a different licence (it is yours) than the data that you use (which may have its own licence). What this means is that when you organize your data internally, you should think about the licensing information you will add well in advance: organize your URIs accordingly, for example. If you don’t, and you want to add a licence at the end, you might find yourself in trouble! Sounds like a simple message, but it is important. (Reminds me of what accessibility people always say: if you take accessibility issues into account right at the beginning when you build up a Web site, it is not complicated; but if you have to add accessibility features after the fact, it may become hell…)
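
As a rough illustration of that last point, here is a minimal Python sketch of what organizing URIs with licensing in mind might look like. It is not taken from Tom’s slides: the example.org URIs and the helper names are hypothetical, and the two licence URIs are picked only as examples. The idea is simply that the retrieved data and my own enrichment live under separate URI prefixes, so that one dcterms:license triple per prefix is enough and the licence of any resource can be found from its URI.

```python
DCT_LICENSE = "http://purl.org/dc/terms/license"  # Dublin Core 'license' property
BASE = "http://example.org/"                       # hypothetical base URI

# One URI prefix per sub-dataset, each with its own licence (illustrative choices).
datasets = {
    BASE + "source/": "http://opendatacommons.org/licenses/pddl/1.0/",
    BASE + "enrichment/": "http://creativecommons.org/licenses/by/3.0/",
}


def triple(s, p, o):
    """Serialize one triple with URI subject, predicate and object as N-Triples."""
    return f"<{s}> <{p}> <{o}> ."


def license_for(resource_uri):
    """Find the licence of a resource from the prefix its URI lives under."""
    for prefix, licence in datasets.items():
        if resource_uri.startswith(prefix):
            return licence
    return None  # no licensing information known for this resource


# One licensing triple per sub-dataset, instead of one per resource.
license_triples = [triple(prefix, DCT_LICENSE, licence)
                   for prefix, licence in datasets.items()]
```

With this layout, license_for(BASE + "enrichment/note42") yields the licence of my own metadata, while anything under BASE + "source/" keeps the licence of the original data. If the URIs had been mixed under one prefix, that separation would have to be reconstructed resource by resource.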

By the way, Leigh has made a kind of survey of the current ‘blobs’ on the LOD cloud to see whether any kind of licensing information is available or not. He has an overview of the results in his slides. The main fact is: the majority of data sets have no information whatsoever (or, at least, nothing that can be found in about 10 minutes)…

It was a good day. Looking forward to the rest.

June 20, 2009

SemTech2009 impressions (addendum)

I wrote a blog yesterday on my SemTech impressions; I realized this morning that I forgot to add an item although I intended to. Peter Deitz gave a presentation on a site called “social actions”: essentially a specialized index and search engine on various social, non-governmental actions around the World that one might want to join, contribute to, etc. (Eg, the search on climate change will point you to a number of corresponding actions around the globe.) The interesting aspect, from the Semantic Web point of view, is that Peter would like to integrate the data, the access, etc, with the rest of the SW, essentially with the LOD (although he did not use this term), but he needs (and asks for) help from the community. Beyond the clear value of this particular dataset this is becoming a pattern (the NYT example in my blog yesterday is similar): people realize the value of publishing their data in a Linked Data format, but it is difficult to make the first steps. Even more tutorials, descriptions, and mainly community help are needed. That is essential for the success of Linked Data!


June 19, 2009

SemTech2009 impressions

The first and possibly most important aspect of SemTech 2009 is that… it happened! I must admit that back in April-May, when the conference’s Web site did not include any news of the program yet, I was a bit concerned that the general economic malaise would kill this year’s conference. O.k., I might have been paranoid, but I think some level of concern was indeed legitimate. And… not only did the conference happen as planned, but the numbers were essentially the same as last year’s (over 1000). I think that by itself is an important sign of the interest in Semantic Technologies. Kudos to the organizers!

A general trend that was reaffirmed this year: by now, Semantic Web technologies are the obvious reference points for almost all presentations, products, etc, that were presented at the event. RDF(S), RDFa, OWL, SPARQL, etc, have become household names; newer specs like SKOS or POWDER may not have been as widely referred to yet, but I am sure that will come, too. Linked Data (and, more specifically, the Linked Open Data cloud) was almost ubiquitous this year, while I do not believe that it was even mentioned last year. That is a huge change (although I am still waiting for real “user facing” applications of LOD to show up; some, like Talis’ system deployed at UK universities, were presented but not as part of the regular conference). All that being said, I somehow seem to have missed more sessions than last year, which makes my impressions more patchy. There were several journal interviews that I could not escape, hallway discussions that were great but made me miss a presentation here and there… I guess this is what happens when you have such a number of people around!

Tom Tague (from Open Calais) gave a very nice opening keynote. His talk was actually not on Open Calais (he did that in 2008), but rather on his experience in talking to different people who tried to start up new ventures in the Semantic Web area (a quote from his talk: “in 80% of the discussions I did not understand what the vendors wanted, and I walked away with my cheque book intact… Simplify!”). The main areas that he looked at were tools, social, advertising, search, publishing, user interface. One of the remarks I liked was on search: in his view (and I think I agree with that) Semantic Technologies may not be really interesting for general search (where the statistical, i.e., brute force methods work well) but for specialized, area-specific search tools (things like GoPubMed or applications deployed at, eg, Eli Lilly or experimented with at Elsevier come to my mind as good examples). Similarly, these technologies are not necessarily of interest for general, “robotic” publication tools like Google’s news, but for high quality publishing, with possible editorial oversight (reducing costs and difficulties).

(He also had a nice text on one of his slides: “Web2.0: Take Web 1.0, add a liberal dash of social, generous amounts of user generated content, atomize your content assets and stir until fully confused”:-)

Tom Gruber talked about his newest project: SIRI. A super-duper personal assistant running on an iPhone with a conversational (voice directed) interface. The group behind it integrates a bunch of info on the Web (the “usual” stuff like restaurants and travel sites), categorizes it, and hides the complexities behind a sexy user interface. The problem I have is that I just do not see how this would scale. I see one of the major promises of the Semantic Web as getting data in RDF out there so that such, essentially mash-up, applications would become much easier to create and maintain. Until then, it is really tedious… On a more personal note, I am not sure I would like the voice conversational interface. I know that I have never used the voice commands on my phone, for example; I do not feel comfortable with it. But, well, that is probably only me…

Chime Ogbuji made a really nice presentation on the system they have developed at the Cleveland Clinic. Great combination of RDF, OWL, and SPARQL. The interesting aspect (for me) was the usage of a medical expert system called Cyc, which is used to convert the doctor’s question in natural language (insofar as a question full of medical jargon can be considered as “natural”:-) into, essentially, a SPARQL query. The medical ontologies are used to direct this conversion process, and then the triple store can be queried through the generated query. Impressive work. (Part of it was documented in a W3C use case, but this presentation had a different emphasis.)

Unfortunately, I had to skip Peter Mika’s presentation on the SearchMonkey experiences; I will have to look at his slides… But, as a last minute addition to the program, the organizers succeeded in getting Othar Hansson and Kavi Goel to talk about Google’s rich snippets. I already blogged on this a few weeks ago but this presentation made the goal of the project way more understandable. Essentially, by recognizing specific microformat or RDFa vocabularies, they can improve the user experience by adding extra information to the search result. It is interesting to observe the difference between Yahoo! and Google in this respect: both of them use microformats/RDFa for the same general goal but, whereas Yahoo! relies on the community providing applications and on users personalizing their own search result page, Google controls the output in a generic way that does not require further user actions. It will be interesting to see how these differences influence people’s usage patterns. There was some discussion on Google’s choice of vocabularies; the presenters made it quite clear that they are perfectly happy using other vocabularies (eg, vCard or FOAF) if they become pervasive, and this is a discussion that Google plans to have with the community. There is of course a chicken-and-egg issue there (if a vocabulary is known by Google, then it will be more widely used, too), and this is clearly an area to discuss further. But these are details. The very fact that both Yahoo! and Google look at microformats and RDFa is what counts! Who would have thought just about a year ago?
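
To give an idea of what “recognizing an RDFa vocabulary” means in practice, here is a small Python sketch. It is a deliberately naive illustration, not a real RDFa processor and certainly not how the search engines do it: it only collects the text content of elements that carry a property attribute. The markup in SAMPLE, including the v: prefix, its namespace and the property names, is an assumption made up for the example.

```python
from html.parser import HTMLParser

# Hypothetical page fragment annotated with RDFa attributes.
SAMPLE = """
<div xmlns:v="http://rdf.data-vocabulary.org/#" typeof="v:Person">
  <span property="v:name">Jane Doe</span>,
  <span property="v:title">Editor</span>
</div>
"""


class PropertyCollector(HTMLParser):
    """Collect {property: text} pairs from elements with a 'property' attribute."""

    def __init__(self):
        super().__init__()
        self.current = None  # property of the element being read, if any
        self.found = {}

    def handle_starttag(self, tag, attrs):
        self.current = dict(attrs).get("property")

    def handle_data(self, data):
        if self.current and data.strip():
            self.found[self.current] = data.strip()

    def handle_endtag(self, tag):
        self.current = None


collector = PropertyCollector()
collector.feed(SAMPLE)
snippet_fields = collector.found
```

A consumer that knows the vocabulary could then use the collected name and title values to decorate a search result; a vocabulary it does not know would simply be ignored, which is exactly where the chicken-and-egg issue mentioned above comes from.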

I was not particularly impressed by the Semantic Search panel. I had the impression that the participants did not really know what they should say and talk about:-(

Nice presentation by Jeffrey Smitz from Boeing on a system called SPARQL Server Pages. Essentially: the user can use structures similar to those of, say, a PHP page, ie, a mixture of HTML tags and server “calls”, except that these “calls” refer to SPARQL queries against a triple store on the server. Their system also includes some rule based OWL reasoning on the server side, although I am not sure I got all the details. All in all, the system seemed a bit complex, but the general approach is interesting! And it is nice to see that a company like Boeing seems to make good use of RDF+OWL+SPARQL; it would be good to know more…

I missed Zepheira’s presentation on freemix which is a shame, but, well, it happens. But I did play with freemix before travelling to San Jose; I called it “Exhibit for the masses”. And this, I think, is a fair characterization. David Huynh’s Exhibit is a really nice tool, but it is not easy to use. On the other hand, it took me about 2 minutes to make a freemix visualization of a JSON data set I used for an Exhibit page elsewhere…

Andraz Tori talked about Common tag, a small vocabulary that, for example, can be used when marking up texts with tags (something that engines like Zemanta or Open Calais do). Bringing the RDF and the tagging worlds together is really important; I am very curious how successful this initiative will be…

The keynote on the last day was from the New York Times (by Evan Sandhaus and Robert Larson). It was quite interesting to see how a reputable newspaper like the NYT has developed a tradition of indexing, abstracting, and cataloging articles, and how these are archived and searched. Impressive. It is also great that the NYT Annotated Corpus has been released to the research community. I did not know about that and, I presume, this must be a great resource for a lot of people active in the area of, say, natural language processing. Finally they announced their intention to release their thesaurus in a Semantic Web format, to add a “blob” to the Linked Data Cloud. They still have to work out the details (and expect feedback from the community) and I would hope they would publish a SKOS thesaurus and might even annotate the news items on their web site using this thesaurus in RDFa. But something in this space will happen, that is for sure! Other reputable newspapers, like Le Monde, the Guardian, NRC Handelsblad, El País, will you follow?

I also had my share of talking: I gave an intro tutorial to SW, gave an overview of what is happening at W3C (quite a lot this year, including the finalization of POWDER, OWL 2, and SKOS!) and participated in an OWL 2 panel (with Mike Smith, Zhe Wu, Deb McGuinness, and Ian Horrocks). I was quite happy with the tutorial and the way the panel went; the audience for the talk could have been a bit larger. But, well…

It was a long week, long trips, not much sleep… but well worth it!


June 1, 2009

PWC report on Semantic Web

There have already been a number of blogs and tweets on PricewaterhouseCoopers’ Spring ’09 Technology Forecast on Semantic Web, but it may still be worth writing about it. The document can be downloaded from the Web free of charge in return for a registration. It includes some of PWC’s own overview of the technology, plus interviews with Tom Scott (BBC), Uche Ogbuji (Zepheira), Lynn Vogel (University of Texas M.D. Anderson Cancer Center), and Frank Chum (Chevron).

The document is clearly not aimed at technologists of the Semantic Web. But there are a number of well chosen wordings and quotes that might help us to talk to people around us who have to be convinced about the value of Linked Data/Semantic Web. Just a few of those:

PricewaterhouseCoopers believes a Web of data will develop that fully augments the document Web of today. You’ll be able to find and take pieces of data sets from different places, aggregate them without warehousing, and analyze them in a more straightforward, powerful way than you can now.

[…]

Let’s say your agency represents musicians, and you want to develop your own ontology […]. You might create your own ontology to keep better tabs on what’s current in the music world […]. You also can link your ontology to someone else’s and take advantage of their data in conjunction with yours. Contrast this scenario with how data rationalization occurs in the relational data world. Each time, for each point of data integration, humans must figure out the semantics for the data element and verify through time consuming activities that a field with a specific label […] is actually useful, maintained, and defined to mean what the label implies. Although an ontology-based approach requires more front-end effort than a traditional data integration program, ultimately the ontological approach to data classification is more scalable […]. It’s more scalable precisely because the semantics of any data being integrated is being managed in a collaborative, standard, reusable way.

[…]

With the Semantic Web, you don’t have to reinvent the wheel with your own ontology, because others […] have already created ontologies and made them available on the Web. As long as they’re public and useful, you can use those. Where your context differs from theirs, you make yours specific, but where there’s commonality, you use what they have created and leave it in place. Ideally, you make public the non-sensitive elements of your business-specific ontology that are consistent with your business model, so others can make use of them. All of these are linked over the Web, so you have both the benefits and the risks of these interdependencies. Once you link, you can browse and query across all the domains you’re linked to.

[…]

Traditional data integration methods have fallen short because enterprises have been left to their own devices to develop and maintain all the metadata needed to integrate silos of unconnected data. As a result, most data remain beyond the reach of enterprises, because they run out of integration time and money after accomplishing a fraction of the integration they need.[…] The most basic lesson is that data integration must be rethought as data linking—a decentralized, federated approach that uses ontology-mediated links to leave the data at their sources. The philosophy behind this approach embraces different information contexts, rather than insisting on one version of the truth, to get around the old-style data integration obstacles.

Yeah, we all know that, right? But can we really put it in succinct terms for outsiders? That is not that easy… Ie, it is worth reading the report (and thanks to PWC!).

April 26, 2009

WWW2009 Impressions

As usual, when making notes of a conference like WWW2009, in Madrid, one has only a partial view. This is all the more true for a conference of the size of WWW2009 with around 1000 attendees and with 5-6 parallel tracks. I must admit that I usually have difficulties with so many tracks at the same time; I obviously lose some of the events happening, which is a source of unavoidable frustration. With this caveat, just some of the topics that I will probably remember…

The power of Twitter. Although this was not a “topic” of the conference, this was the first WWW conference where twitter was king. Twitter was everywhere, the #www2009 topic was getting several new entries per second (it even got spammed:-(, and other twitter tags were used for some of the specialized events (like #w3ctrack or #ldow2009). One could get a glimpse of what was happening elsewhere just by following these topics. In fact, this report is much more sketchy than usual simply because my own tweets from the conference or, of course, all tweets of the #www2009 topic can very well replace some of the notes I wrote in blogs in earlier years.

Social networks. Going beyond twitter, the ubiquitous presence of social networks and their effect on just about anything is still a major topic, witness the continuous flow of papers trying, eg, to extract semantics from tag clouds (eg, the paper of Benjamin Markines et al) or the Googles and Yahoo!-s of this World trying to exploit these tags to improve their search results. (Yahoo’s experimental tag explorer is a good example trying to exploit these further.) Nothing radically new here, but progress is reported at all conferences, and this one was no exception. One of the keynotes, by Pablo Rodriguez from Telefonica, actually claimed that the needs of social networks in terms of network infrastructure are so different that they are bound to require changes on the hardware/firmware level of networks. Posting, for example, a video on a social site may create a sudden peak of high volume access (for example if posted by a “celebrity”) that makes it very different from the more steady flow of data that more traditional sites provide and require. For example local caching in routers might be needed. I am no expert in this at all (anything that is close to hardware is sort of a black box to me) so I cannot judge these statements but it was interesting to hear. Another interesting point he made was that “celebrities” of a specific network may (not necessarily intentionally) start a DoS attack against a site: think of the amount of HTTP requests flowing to a site mentioned by one of these social network stars!

Web Science. There was a panel (organized by Nigel Shadbolt, with Tim Berners-Lee, Ricardo Baeza-Yates, and Mike Brodie). The whole topic is still fairly open (at least for me): what exactly is Web Science and where are the boundaries? What types of research belong to WS, and what is better kept outside to be handled by other disciplines? What type of abstractions would be necessary to study the Web as a whole (just as chemistry can be seen as a set of abstractions on top of physics)? What type of interdisciplinary research groups should be established? As far as I am concerned, I do not have a response to any of these questions:-( What I could see happening is that under the banner “Web Science” many different sub-disciplines will appear very soon and gain independent life without much relationship among themselves. As far as I am concerned, I would be more interested in the relationship between the Web and society at large than in the technical aspects, but that is only me. An interesting practical point for the future is that there are plans to combine (eg, co-locate) future WWW conferences with Web Science events; that would really be a gain for both event series in my view.

Computing cloud. Yep, this comes up more and more often. Obviously a big deal in the keynote of Alfred Spector, from Google, but it came up elsewhere, too. The mini-tutorial on Hadoop, MapReduce, and Hive, given by Tom White as part of the Developers’ track, was really interesting and instructive for me. We know that the computing cloud is of great interest for the Semantic Web community; it may indeed be a tool to handle the significant amount of data out there. The LOD data is already available on the Amazon services (thanks to OpenLink), Chris Bizer and friends’ Mobile DBpedia makes use of cloud facilities, and the LarKC project also makes use of massively parallel computing (I am not sure they use the cloud). Something to keep an eye on, that is for sure; I am sure the topic will gain more importance in future conferences. (And one more technology I should familiarize myself with…)

Power of data. Issues around search have become the dominating theme of the WWW conferences, and this one was no exception. Many researchers try to exploit the sheer amount and variety of data that has been accumulated by the big search engines, for example. I have heard several talks over the years coming from Google’s R&D lab (including a keynote at this conference). I must admit the overall impression I get from these is that a more or less straightforward exploitation of a huge amount of data is used like a sledgehammer for all problems. (I am probably unfair.) Ricardo Baeza-Yates (from Yahoo!) also reported some work in his keynote on, eg, analyzing the search queries themselves, ie, the paths of different searches performed by users between the time they begin some search and the time they find what they were looking for. (Interesting stuff! By the way, there is also a conference on weblogs and social media, ICWSM; one more conference coming up around Web technologies.) I also listened to a presentation on Yahoo!’s Boss by Ted Drake (again on the Developers’ track): what is interesting is that one can access (a part of) Yahoo!’s accumulated indexes to build, eg, one’s own search engines but, I presume, one could also use this data for other types of research exploiting the data. Power of data for the masses? (I had heard of Boss before and I would have welcomed more technical details at the presentation but, well…)

Web of data, a.k.a. Semantic Web. The conference started with a great workshop on Linked Data. I again rely on twitter notes and the general twitter notes for more details; no need to repeat them here. Suffice it to say that, beyond the individual papers, there was a general “buzz” in the air, a general enthusiasm that was reflected by the high number of participants (over 100). For anybody interested, it is worth looking at all the papers; they were good! Having said that, what I am really waiting for is to see many real applications of the LOD (and not only experimental, university usage), but that takes time; there was no really breathtaking news on that at the workshop.

But, of course, the workshop was for the converted; what was more interesting was to see that the Linked Data concept, and the Semantic Web in general, attracted more and more interest at the conference proper and not only among the long-time Semantic Web adepts. Jim Hendler did a surprise presentation at the Developers’ track (surprise, because an announced speaker could not come, so he took his place), talking to non-Semantic Web developers about what can be done already today with this technology, about the excitement that is out there, about the companies that have already picked up this technology. It was good to get these messages out there again and again. Georgi Kobilarov also gave a great presentation on DBpedia at the track; there were several people I talked to afterward who were really carried away by the possibilities opened up by having access to a huge amount of data through the unifying abstraction of RDF, RDFS, and possibly (a little bit of:-) OWL.

I also went to the Semantic Web refereed paper track, obviously. I must admit I was a little bit disappointed because lots of colleagues that I would typically see at such events were not around. I presume ISWC has now become a major competitor to WWW in this area and, when money is tight, people have to make a choice. In earlier years ISWC was considered to be much more theoretical while WWW had more practical papers, but the last few ISWCs I attended seemed to indicate that this is changing. I think any of the WWW papers could have been presented at ISWC without any problems. As a consequence, I guess many people decided that ISWC is a better place to be. It will be interesting to see how things evolve in the future; it is not impossible that Semantic Web, as a topic, will gradually move away from WWW to ISWC. (I would expect specifically Linked Data papers to appear at ISWC very soon!)

That being said: it was nice to see a paper on DERI Pipes (by Danh Le-Phuoc et al) or on Triplify (by Sören Auer et al). This is not the first time I heard about these but it is good to have them more widely published. There was a paper on a rule system benchmark (by Senlin Liang et al); although I am no expert on this, with the advancement of RIF it will be good to have such benchmarks being put forward. The paper of Philippe Cudré-Mauroux et al on the issue of disambiguating IDs on linked data caught my attention: with the advancement of linked data we enter (as the presenter put it) an “ID Jungle” with tons of URIs referring, more or less, to the same concept (e.g., a specific person), and a simple owl:sameAs is not an ideal solution to handle this. The idMesh system provides a means to analyze relationships among those IDs. I must admit I did not follow all details of the paper but it is certainly one of the papers I will have to study in more detail when I get to it!

W3C’s “camps”. W3C tried another model this year, replacing the more traditional W3C tracks by two ‘camps’ on mobile web and on social web. But… this is where the large number of parallel tracks backfired: I could not go to either of them:-( There were all kinds of overlaps with other presentations (e.g., the social web camp fully coincided with the Semantic Web paper track). Pity, because the feedback I heard from participants was very positive. Sigh. Well, actually, courtesy of Fabien Gandon, I was present at the social web camp virtually, witness this slide.

It was a slightly exhausting but good week!

March 10, 2009

Governments, Web Standards, Semantic Web

Filed under: Semantic Web,Work Related — Ivan Herman @ 16:06

The W3C Interest Group on eGovernment has just published its first Working Draft: Improving Access to Government through Better Use of the Web. It is only a first draft, but it may be of interest to the Semantic Web crowd; look at the separate chapter on Open Government Data.

March 6, 2009

Colourful Linked Data cloud (and HCLS)

Filed under: Semantic Web,Work Related — Ivan Herman @ 11:04

Coloured version of the LOD cloud, emphasizing the various application areas
Anja Jentzsch and Chris Bizer have published a new version of the LOD cloud figure. One of the interesting new things is that they have also produced a coloured version of the same figure that emphasizes the various application areas of the individual datasets. It is striking to see how active the HCLS community is: the lower third of the bubbles are all from that area! This might give an incentive to other communities (like eGovernment or the Oil & Gas industries) to do the same…
