August 11, 2013

The value of community-driven content (OSM vs. Google Maps)

Notre Dame de la Garde

This is just a nice little example which might be worth noting for those who do not know Open Street Map (I am also a relatively new user of it).

I had a nice walk in Marseille yesterday, which included going down from the big cathedral on the top of the hill (“Notre Dame de La Garde“) to the seaside. There is a not-very-well-known path behind the church that one can take which is, for my taste anyway, a gorgeous way of doing it.

The path of course appears on Google’s Map: look at the small path going from the church to the “Rue du Bois Sacré”. However, look at the same area using Open Street Map: not only is the path there, but it comes with a bunch of details. Indeed, it is not really a simple path: it is a long series of steps, i.e., do not try to ride even a bike there:-( And because it is a hot city, it is also good to know that there is a small public fountain along the path (and, indeed, it is there and it works!)…

It is not really Google’s fault. They probably got the material from some sort of an official mapping system (they could not get their camera cars or bikes up there…) and there is no way a company, even as huge as Google, can cover such details. But a community-driven site can: people can add such details easily. (Actually, there was part of the path that was missing, and I will add it soon using my GPS readings.) Therein lies the power:-)

March 16, 2013

Multilingual Linked Open Data?

Logo of the EU Multilingual Web Project

Experts developing Web sites for various cultures and languages know that it is way better to include such features into Web pages at the start, i.e., at the time of the core design, rather than to “add” them once the site is done. What is valid for Web sites is also valid for data deployed on the Web, and that is especially true for Linked Data whose mantra is to combine data and datasets from all over the place.

Why do I say all this? I had the pleasure of participating, earlier this week, in the MultilingualWeb Workshop in Rome, Italy. One of the topics of the workshop was Linked (Open) Data and its multilingual (and, also, multicultural) aspects. There were a number of presentations at a dedicated session (the presentations are online, linked from the Workshop Page; just scroll down and look for a session entitled “Machines”), and there was also a separate break-out session (the slides are not yet online, but they should be soon). There are also a number of interesting projects and issues in this area beyond those presented at the event, for example the lemon model or the (related) Monnet EU project.

All these projects are great. However, the overall situation in the Linked Data world is, in this respect, not that great, at least in my view. If one looks at the various Linked Data (or Semantic Web) related mailing lists, discussion fora, workshops, etc., multilingual or multicultural issues are almost never discussed. I have not made any systematic analysis of the various datasets on the LOD cloud, but I have the impression that only a few of them are prepared for multilingual use (e.g., by providing alternative labels and other metadata in different languages). URIs are defined in English, and most of the vocabularies we use are documented in only one language; they may be hard to use for non-English speakers. Worse, vocabularies may not even be properly prepared for multicultural use (just consider the complexity of personal names, which is hardly ever properly reflected in vocabularies). And this is where we hit the same problem as for Web sites: for all its successes, we are still at the beginning of the deployment of Linked Data. Our community should have much more frequent discussions on how to handle this issue now, because after a while it may be too late.
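
To make "alternative labels in different languages" a bit more concrete, here is a minimal sketch in Python. It assumes a dataset that stores, for each resource, one label per language tag; the `LABELS` table and the `pick_label` helper are hypothetical names used only for illustration, not part of any particular Linked Data toolkit.

```python
from typing import Dict, List, Optional

# Hypothetical labels for a single resource, keyed by language tag.
LABELS: Dict[str, str] = {
    "en": "cathedral",
    "fr": "cathédrale",
    "de": "Kathedrale",
}


def pick_label(labels: Dict[str, str], preferred: List[str],
               default: str = "en") -> Optional[str]:
    """Return the label for the first preferred language that has one.

    Falls back to the `default` language, and to None if even that is missing.
    """
    for tag in preferred:
        # Try the exact tag first, then its primary subtag ("fr-CA" -> "fr").
        for candidate in (tag, tag.split("-")[0]):
            if candidate in labels:
                return labels[candidate]
    return labels.get(default)


if __name__ == "__main__":
    print(pick_label(LABELS, ["fr-CA", "en"]))
    print(pick_label(LABELS, ["hu"]))
```

A dataset that only ships English labels leaves this kind of lookup with nothing to choose from, which is the situation described above.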

By the way, one of the outcomes of the break-out session at the Workshop was that a W3C Community Group should be created soon to produce some best practices for Multilingual Linked Open Data. There is already some work done in the area; look at the page set up by José Emilio Labra Gayo, Dimitris Kontokostas, and Sören Auer, which may very well be the starting point. Watch this space!

It is hard. But it will be harder if we miss this particular boat.

November 7, 2011

March 13, 2011

Example for the power of open data…

Earthquakes around the globe on the week of the 11th of March

I wish I did not have to use this example… But I came across it this morning via a tweet from Jim Hendler. RPI has an example of how one can combine public government data (in this case, a Data.gov dataset on earthquakes), its RDF version with a SPARQL query, and a visualization tool like Exhibit. The result is an interactive map of the earthquakes of the last week. Running the demo today reveals an incredible number (over 160) of events on the coast of Honshu, Japan, which led to the earthquake and tsunami disaster on the 11th of March. I do not know how much time it took for Li Ding to prepare the original demo, but I suspect it was not a big deal once the tools were in place.

The demo is dynamic, in the sense that in a week it will probably show different data than it does today. So I have made a screen dump as a memento (I hope that is all right with Jim and Li Ding). If you are looking at it now, it is worth zooming into the area around Japan to gain some more insight into the sheer dimensions of the disaster: there were 325 quakes (out of 411 around the globe) in that area during the week! I must admit I did not know that…
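
The kind of count quoted above can be sketched in a few lines of Python. This is not the RPI demo (which uses RDF, SPARQL and Exhibit); it only assumes a list of quake records with coordinates. The `Quake` type, the sample records and the rough bounding box around Japan are all made up for illustration.

```python
from typing import Iterable, NamedTuple, Tuple


class Quake(NamedTuple):
    lat: float        # degrees north
    lon: float        # degrees east
    magnitude: float


# Assumed, very rough bounding box around Japan: (south, west, north, east).
JAPAN_BOX = (30.0, 128.0, 46.0, 146.0)

# Made-up sample records, not taken from the Data.gov dataset.
SAMPLE = [
    Quake(38.0, 142.0, 6.0),
    Quake(36.5, 141.0, 5.2),
    Quake(-33.0, -72.0, 5.0),
    Quake(61.0, -150.0, 4.5),
]


def in_box(quake: Quake, box: Tuple[float, float, float, float]) -> bool:
    """True if the quake's coordinates fall inside the bounding box."""
    south, west, north, east = box
    return south <= quake.lat <= north and west <= quake.lon <= east


def share_in_box(quakes: Iterable[Quake],
                 box: Tuple[float, float, float, float]) -> Tuple[int, int, float]:
    """Return (quakes inside the box, total quakes, fraction inside)."""
    records = list(quakes)
    inside = sum(1 for q in records if in_box(q, box))
    share = inside / len(records) if records else 0.0
    return inside, len(records), share


if __name__ == "__main__":
    print(share_in_box(SAMPLE, JAPAN_BOX))
    # The figures quoted in the post: 325 of 411 quakes, roughly 79%.
    print(round(100 * 325 / 411))
```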

I have the, hopefully not too naïve, belief that tools like this may not only increase our factual knowledge but also, in the future, help those who are now struggling to cope with the aftermath of this disaster. Yes, having open data, and tools to handle and integrate it, is really important.

October 16, 2010

Open Data as a Tangram

The Open Data Tangram of the W3C Brazil Office

Many of us have seen, or heard of, Tim’s talk on linked open data using a bag of potato chips as an example. Yesterday I stumbled upon another analogy for the usefulness of open data. It is not my idea; I am just passing it on…

I spent a day at the W3C Brazil Office in São Paulo yesterday. As part of their goodies to distribute, the Office has produced a small Tangram. (For those who do not know, a Tangram is an old Chinese puzzle: it consists of a small set of simple geometric forms that can be arranged in a square. The cute thing is that there is a large number of figures that can be created by simply rearranging those pieces.) The distributed Tangram has puzzle pieces that are annotated with terms like (I hope I get it right, I do not speak Portuguese) “Compatibility”, “Transparency”, etc.

And indeed. Organizations would publish their data in some configuration (say, a square…). The rich possibilities come from the fact that anybody can take those pieces of data, rearrange them and produce different, cute configurations, i.e., applications using the very same data. And that is where the power of open data comes from…

July 12, 2010

Experiences of LOD publication

Frank van Harmelen’s tweet drew my attention to a paper by Jan Hannemann and Jürgen Kett on Linked Data for Libraries. I hope Jan and Jürgen will not be upset if I copy some quotes from their paper, but I thought that giving more publicity to some of their experiences in deploying linked data at the German National Library is worthwhile. Reproduced here without change though somewhat shortened:

  • Setting up a service is not trivial. […] the essential software solutions (tools) involved have not reached full maturity yet. […] documentation may be lacking the required depth. […] multiple software components need to be setup to work together  […] which requires appropriate expertise.[…]
  • Data modeling can be complex. When publishing data on the web, it is advantageous to use existing, registered ontologies. Unfortunately, these ontologies do not always match the data representation of each individual library […] the definitions of individual properties can vary considerably. […] There is no simple answer to the question which is the right thing to do.[…]
  • Open data exchange mentality does not exist everywhere. Even before linked data, libraries have exchanged and aligned their data sets. The results of such projects could be prime information sources for connecting linked data sets. Sadly, not all institutions involved share the open exchange mentality, and shared ownership may make it difficult to publish these results.
  • Best practices are seen as rules. Linked open data is based largely on best practices rather than rules. However, this pragmatic aspect is not seen as essential in all areas of the linked data community. Deviations from perceived standards tend to be criticized, which can cause institutions new to the semantic web to doubt their decisions – even if they make sense for the organization in question. Libraries should not be deterred by such feedback and rather see this as a motivation to contribute their own experiences and knowledge to the community. Guidelines and best practices should be carefully considered in the context of each institution’s needs, especially in this early forming phase of the semantic cloud.[…]
  • Properly modeled data is very useful. Once the data modeling is completed and the data made available, it can be used by others. A colleague at the Technical University of Braunschweig has shown that with properly modeled data, this can result in very useful applications: within a day, he imported our data into a database, added a web interface and had thus created a searchable access to our data.
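
The last point, importing the data into a database and making it searchable, can be illustrated with a minimal Python sketch using the standard `sqlite3` module. This is not the Braunschweig application; the `works` table, the helper names and the few sample records are assumptions chosen for illustration, and the web interface is left out.

```python
import sqlite3
from typing import List, Tuple

# A few well-known titles standing in for imported library records.
RECORDS = [
    ("Faust", "Johann Wolfgang von Goethe"),
    ("Die Verwandlung", "Franz Kafka"),
    ("Der Prozess", "Franz Kafka"),
]


def build_index(records: List[Tuple[str, str]]) -> sqlite3.Connection:
    """Load (title, author) pairs into an in-memory SQLite database."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE works (title TEXT, author TEXT)")
    conn.executemany("INSERT INTO works VALUES (?, ?)", records)
    conn.commit()
    return conn


def search(conn: sqlite3.Connection, term: str) -> List[Tuple[str, str]]:
    """Return works whose title or author contains the search term."""
    pattern = f"%{term}%"
    rows = conn.execute(
        "SELECT title, author FROM works "
        "WHERE title LIKE ? OR author LIKE ? ORDER BY title",
        (pattern, pattern),
    )
    return rows.fetchall()


if __name__ == "__main__":
    conn = build_index(RECORDS)
    for title, author in search(conn, "Kafka"):
        print(f"{title} ({author})")
```

Once the data is modeled consistently, the search layer itself stays this small; most of the effort sits in the modeling step the authors describe.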

