Ivan’s private site

December 6, 2009

LOD and the top 10 SW products 2009

Filed under: Semantic Web,Work Related — Ivan Herman @ 11:19

Richard MacManus has just published the “Top 10 Semantic Web products of 2009” (see part I and part II) in ReadWriteWeb. What I found interesting is that the list includes products that are related to, and make use of, the output of the Linked Open Data project: Open Calais, Zemanta, BBC’s Semantic Music project, Freebase, DBPedia, Data.gov. (OK, listing DBPedia as a “product” may not be absolutely right, but, well…).

Why is this interesting? Because one of the negative comments that one hears sometimes (often?) about the LOD is that it is nothing more than an academic exercise, i.e., that it does not make any sense for business. Well, here we are!

October 26, 2009

ISWC2009 I.

This year’s ISWC is held in Chantilly, Virginia, in a nice conference building in a beautiful park with autumn colours that, for reasons I do not really know, are always much more striking and amazing in America than in Europe. It is a bit of a pity that it is so far from Washington but, well, you can’t have it all…

First day: tutorials.

(For me, because there were also a bunch of workshops.) In the morning I was at the tutorial on how to consume Linked Open Data, by Juan Sequeda, Jamie Taylor, Patrick Sinclair, and Olaf Hartig; in the afternoon I went to the one on legal and social frameworks for sharing data on the Web, by Leigh Dodds, Jordan Hatcher, Tom Heath, and Kaitlin Thaney.

Juan and his friends actually had a difficult task, and that became clear right at the start, during Juan’s intro: part of the audience did not really know what LOD was all about, whereas others were, shall we say, old-timers on the subject. I think the speakers did a really good job of navigating these constraints, giving short introductions to what LOD is all about while discussing issues and showing examples that were interesting for all of us. Kudos for that. The audience raised issues that were really to the point (who should create sameAs links, how trustworthy are they, how to choose vocabularies and how they map to one another, etc.) and, in his closing slides, Juan actually gave a list of the open R&D issues in LOD. Those are worth looking at (and there is no reason to repeat the list here…). B.t.w., the slides of the tutorial are online.

One very interesting technology I heard about that, shame on me, I did not know was SQUIN, a tool based on a traversal-based execution scheme for SPARQL. Olaf did a presentation on that. What essentially happens is as follows. At the beginning, the default graph of the SPARQL query is empty. The system then systematically fetches RDF triples by dereferencing the URI-s in the query pattern and adds them to the default graph. The query is matched against that graph; some variables will match, thereby ‘adding’ new URI-s to the pattern. And the process starts again, possibly yielding a complete solution (or more) to the original query. At the end of the process, solutions will have been found on the Web, even though the system itself does not have any ‘real’ data behind it at the start. Of course, no one can guarantee that all solutions will be found, and you need some ‘seed’ URI-s in the original query pattern, but it nevertheless looks like a very powerful tool to explore, say, the LOD. Very interesting!
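To make the traversal idea a bit more concrete, here is a minimal sketch in Python. It is my own toy illustration and not the actual implementation of the tool: the `FAKE_WEB` dictionary, the `ex:` and `foaf:` shorthand names, and all function names are assumptions made up for the example, and `dereference` only simulates an HTTP lookup. Triple patterns are plain tuples, and terms starting with `?` are variables.

```python
from typing import Dict, List, Optional, Set, Tuple

Triple = Tuple[str, str, str]
Binding = Dict[str, str]

# Hypothetical stand-in for the Web: "dereferencing" a URI returns
# the triples published at that URI. A real system would do HTTP GETs.
FAKE_WEB: Dict[str, List[Triple]] = {
    "ex:alice": [("ex:alice", "foaf:knows", "ex:bob")],
    "ex:bob": [("ex:bob", "foaf:name", "Bob")],
}


def dereference(uri: str) -> List[Triple]:
    """Simulated lookup; unknown URIs (and literals) yield no triples."""
    return FAKE_WEB.get(uri, [])


def is_var(term: str) -> bool:
    return term.startswith("?")


def match(pattern: Triple, triple: Triple, binding: Binding) -> Optional[Binding]:
    """Extend a binding so that the pattern matches the triple, or return None."""
    new = dict(binding)
    for p, t in zip(pattern, triple):
        if is_var(p):
            if new.setdefault(p, t) != t:
                return None
        elif p != t:
            return None
    return new


def evaluate(patterns: List[Triple], graph: Set[Triple]) -> List[Binding]:
    """Naive join of all triple patterns over the triples collected so far."""
    solutions: List[Binding] = [{}]
    for pattern in patterns:
        solutions = [
            extended
            for b in solutions
            for t in graph
            for extended in [match(pattern, t, b)]
            if extended is not None
        ]
    return solutions


def traverse_query(patterns: List[Triple], max_lookups: int = 50) -> List[Binding]:
    graph: Set[Triple] = set()  # the default graph starts out empty
    seen: Set[str] = set()
    # Seed: the constant terms that appear in the query itself.
    frontier = {term for p in patterns for term in p if not is_var(term)}
    while frontier and len(seen) < max_lookups:
        uri = frontier.pop()
        seen.add(uri)
        graph.update(dereference(uri))
        # Partial matches bind variables to new terms, which are looked up next.
        for k in range(1, len(patterns) + 1):
            for b in evaluate(patterns[:k], graph):
                frontier.update(v for v in b.values() if v not in seen)
    return evaluate(patterns, graph)
```

Calling `traverse_query([("ex:alice", "foaf:knows", "?x"), ("?x", "foaf:name", "?n")])` starts from the seed `ex:alice`, discovers `ex:bob` through the first pattern, and only then finds the name. A query without any seed, such as `[("?s", "foaf:name", "?n")]`, finds nothing in this sketch, which is exactly the limitation mentioned above.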

Then there were some examples on how LOD is used. Jamie talked about Freebase, and how Freebase is, in fact, a way for everybody to easily add information to the LOD (after all, Freebase works like a wiki, and all the data is reflected on the LOD). He also had a very important message that is worth repeating (go to his slides for the rest): it takes very little effort to add a republishing capability to your triple-store-based application, thereby extending the general LOD. So… do it! This is how the system evolves…

Patrick described a quite geeky system that the BBC folks have developed (and that will hopefully become public soon): take the BBC’s musical data in RDF (which is available), plus the LOD cloud, plus… an IRC bot. What you get is an IRC channel which will pick up data on music, including the sound tracks, photos, etc., and display it on the machine. I presume you can give orders and preferences through the IRC. Obviously geeky stuff, not for the masses:-) but it shows what you can do…

The afternoon tutorial on the Legal and Social frameworks was of course very different. I think that one of the many important aspects of this tutorial, and maybe the most important one, is that… it took place! This may sound a bit strange, but it is important for our whole community to realize that we will have issues around copyright, licensing, waivers, etc., when it comes to the Web of Data, whether we like these issues or not. Tutorials like this, written notes and information, etc., are essential. Let us face it: most of us do not understand the details of the legal issues. So I was simply listening and trying to absorb what I heard…

I do not want to repeat the details of what I heard here; one thing I learned over the years is that I should leave legal argumentations and descriptions to those who really understand them. I.e., look at the slides. It is worth it. But just to show the complexities: I did not know, or fully realize, that there are major differences among countries in what can or cannot be copyrighted: for example, a phone book cannot be copyrighted in the US or Europe, but can in Australia. Or that the seemingly simple notion of ‘attribution’ can, in fact, become a bottomless pit when it comes to data and the queries thereof (e.g., if I have a filter in a query that results in data, should I give an attribution to the facts that were, in fact, filtered out?). Etc.

There is also a takeaway message for me (though it may be quite trivial) among the things I learned. Tom showed some practical examples of how one can add, say, licensing information to data by adding some RDF triples. However, for a larger data set the licensing may differ within the dataset. E.g., if you retrieve data from somewhere and you enrich it with additional metadata, the metadata itself may have a different licence (it is yours) than the data that you use (which may have its own licence). What this means is that when you organize your data internally, you should think well in advance about the licensing information you will add: organize your URI-s accordingly, for example. If you don’t, and you want to add licensing at the end, you might find yourself in trouble! Sounds like a simple message, but it is important. (Reminds me of what accessibility people always say: if you take accessibility issues into account right at the beginning when you build up a Web site, it is not complicated; but if you have to add accessibility features after the fact, it may become hell…)
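Just to illustrate the point about organizing URI-s, here is a small sketch in Python. It is not what Tom showed; it is my own example, and the `example.org` URI layout, the choice of licences, and the function names are all assumptions made for illustration only. The only real vocabulary term used is the Dublin Core `license` property. The idea is that if retrieved data and your own annotations live under different URI prefixes, the right licence triple can be generated mechanically.

```python
# Real vocabulary term: the Dublin Core "license" property.
DCT_LICENSE = "http://purl.org/dc/terms/license"

# Hypothetical URI layout: retrieved data and own annotations are kept
# under separate prefixes, decided before any data is published.
SOURCE_DATA = "http://example.org/data/source/"
OWN_METADATA = "http://example.org/data/annotations/"

# Licence choices are arbitrary examples, not legal advice.
LICENSES = {
    SOURCE_DATA: "http://creativecommons.org/licenses/by/3.0/",
    OWN_METADATA: "http://creativecommons.org/publicdomain/zero/1.0/",
}


def license_for(resource_uri: str) -> str:
    """Pick the licence from the URI prefix the resource lives under."""
    for prefix, licence in LICENSES.items():
        if resource_uri.startswith(prefix):
            return licence
    raise ValueError(f"no licence known for {resource_uri}")


def license_triple(resource_uri: str) -> str:
    """Return one N-Triples line stating the licence of the resource."""
    return f"<{resource_uri}> <{DCT_LICENSE}> <{license_for(resource_uri)}> ."


if __name__ == "__main__":
    print(license_triple(SOURCE_DATA + "item1"))
    print(license_triple(OWN_METADATA + "item1-tags"))
```

With a layout like this the licensing decision is a simple prefix lookup; if everything had been minted under one flat namespace, there would be no easy way to tell afterwards which triples are yours and which came from elsewhere.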

By the way, Leigh has made a kind of survey of the current ‘blobs’ on the LOD cloud to see whether any kind of licensing information is available or not. He has an overview of the results in his slides. The main fact is: the majority of data sets have no licensing information whatsoever (or, at least, nothing that can be found in about 10 minutes)…

It was a good day. Looking forward to the rest.

March 10, 2009

Governments, Web Standards, Semantic Web

Filed under: Semantic Web,Work Related — Ivan Herman @ 16:06

The W3C Interest Group on eGovernment has just published its first Working Draft: Improving Access to Government through Better Use of the Web. It is only a first draft, but it may be of interest to the Semantic Web crowd; look at the separate chapter on Open Government Data.

March 6, 2009

Colourful Linked Data cloud (and HCLS)

Filed under: Semantic Web,Work Related — Ivan Herman @ 11:04

Coloured version of the LOD cloud, emphasizing the various application areas
Anja Jentzsch and Chris Bizer have published a new version of the LOD cloud figure. One of the interesting new things is that they have also produced a coloured version of the same figure that emphasizes the various application areas of the individual datasets. It is striking to see how active the HCLS community is: the lower third of the bubbles are all from that area! This might give an incentive to other communities (like eGovernment or the Oil & Gas industries) to do the same…

December 9, 2008

Zemanta and the Linked Data Cloud…

A few weeks ago I blogged on Open Calais and the Linked Data Cloud. I just received a comment on that blog:

Hi Herman,

yes, today Zemanta’s API officially stopped flying low on the radar and was released to wider public.

It does support RDF output, links to Linking Open Data entities and has properly defined namespace:

http://www.zemanta.com/api/

which is great! B.t.w., just as a pure coincidence, a new SW Use Case was published yesterday on Faviki, one of Zemanta’s users…

November 14, 2008

Calais Release 4 and the Linking Data cloud…

Just got to this news via Yves’s blog: Reuters’ Open Calais service comes with a new release in January, and this will bind to the Linked Data cloud. To quote the official blog of Reuters:

Release 4 of Calais will be a big deal. In that release we’ll go beyond the ability to extract semantic data from your content. We will link that extracted semantic data to datasets from dozens of other information sources, from Wikipedia to Freebase to the CIA World Fact Book. In short – instead of being limited to the contents of the document you’re processing, you’ll be able to develop solutions that leverage a large and rapidly growing information asset: the Linked Data Cloud.

I.e.: when analyzing a text, Open Calais will return URIs into DBPedia, Freebase, Musicbrainz… This opens up the possibility for a variety of applications that would not be possible (or would be fairly complicated) without it. One more step to make it possible to reuse all those data on the Web… Yey!

B.t.w.: I write these lines using WordPress and I have Zemanta’s Firefox plugin running to generate the tags. However, as far as I know (I may be wrong!), the Zemanta service does not provide those URI-s yet (they do provide some URI-s in their return format, but I am not sure those are LOD URIs). Maybe some day?

(Thanks to Yves for drawing my attention to this…)

(Note after the original publication of the blog: it seems I was wrong and Zemanta does have a similar feature, see Andraz’ comment.)
