Ivan’s private site

April 1, 2011

2nd Last Call for RDFa 1.1

Filed under: Semantic Web,Work Related — Ivan Herman @ 2:58

The W3C RDFa Working Group published a “Last Call” for RDFa 1.1 back at the end of October last year. This was meant to be a “feature freeze” version and was asking for public comments. Well, the group received quite a number of those: lots of small things, requiring changes to the documents in many places to make them more precise even in various corner cases, and some more significant ones. In some ways, it shows that the W3C process works, ensuring quite an influence of the community on the final shape of the documents. Because of the many changes the group decided to re-issue a Last Call (yes, the jargon is a bit misleading here…), aimed at a last check before the document goes to its next phase on the road to becoming a standard. Almost all the changes are minor for users, though important for, e.g., implementers to ensure interoperability. “Almost all”, because there is one new and, I believe, very important though controversial feature, namely the so-called default profiles.

I have already blogged about profiles when they were first published back in April last year. In short, profile documents provide an indirection mechanism to define prefixes and terms for an RDFa source: publishers may collect all the prefixes they deem important for a specific application and authors, instead of being required to define a whole set of prefixes in the RDFa file itself, can just refer to the profile file to have them all at their disposal. I think the profile feature was the one stirring the biggest interest in the RDFa 1.1 work: profiles are undeniably useful, and undeniably controversial… Indeed, in theory at least, profiles represent yet another HTTP round trip when extracting RDF from an RDFa file, which is never a good thing. But a good caching mechanism or other implementation tricks can greatly alleviate the pain… (B.t.w., the group has also created some guidelines for profile publishers to help implementers.)

This draft goes one step further by introducing default profiles. These are profiles just like any other, but they are defined with fixed URI-s (namely http://www.w3.org/profile/rdfa-1.1 for RDFa 1.1 in general, and, additionally, http://www.w3.org/profile/html-rdfa-1.1 for the various HTML variants) and the user does not have to declare them in an RDFa source. Which means that a very simple HTML+RDFa file of the sort:

<html>
  <body>
    <p about="xsd:maxExclusive" rel="rdf:type" resource="owl:DatatypeProperty">
      An OWL Axiom: "xsd:maxExclusive" is a Datatype Property in OWL.
    </p>
  </body>
</html>

(note the missing prefix declarations!) will produce the RDF triple that you might expect. Can’t be simpler, can it?
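To make the mechanics concrete, here is a minimal Python sketch of how a processor might expand the prefixed names in the snippet above. The names `DEFAULT_PREFIXES` and `expand_curie` are made up for illustration. The three namespace URIs are the standard ones for rdf, owl and xsd, but the table is only a hard-coded stand-in for the default profile, and letting locally declared prefixes win over the defaults is an assumption of the sketch.

```python
# Hypothetical, hard-coded stand-in for what a processor would read
# from the default profile; the real profile content is not final.
DEFAULT_PREFIXES = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "owl": "http://www.w3.org/2002/07/owl#",
    "xsd": "http://www.w3.org/2001/XMLSchema#",
}


def expand_curie(curie, local_prefixes=None):
    """Expand 'prefix:reference' into a full URI.

    Prefixes declared in the document (local_prefixes) take precedence
    over the defaults in this sketch.
    """
    prefixes = dict(DEFAULT_PREFIXES)
    prefixes.update(local_prefixes or {})
    prefix, sep, reference = curie.partition(":")
    if not sep or prefix not in prefixes:
        raise KeyError(f"undeclared prefix in {curie!r}")
    return prefixes[prefix] + reference


# The attribute values of the <p> element in the example above.
subject = expand_curie("xsd:maxExclusive")
predicate = expand_curie("rdf:type")
obj = expand_curie("owl:DatatypeProperty")
triple = f"<{subject}> <{predicate}> <{obj}> ."
print(triple)
```

Without the default table, all three calls would fail with an undeclared prefix, which is exactly the situation of a page that forgot its prefix declarations.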

Why? Why was it necessary to introduce this? Well, experience shows that many HTML+RDFa authors forget to declare the prefixes. One can look, for example, at the pages that include Facebook’s Open Graph Protocol RDFa statements: although I do not have exact numbers, I would suspect that around 50% of these pages do not declare them. That means that, strictly speaking, those statements cannot be interpreted as RDF triples. The Semantic Web community may ask, try to convince, beg, etc., the HTML authors (or the underlying tools) to do “the right thing”, and we certainly should continue doing so, but we also have to accept this reality. A default profile mechanism can alleviate that, thereby greatly extending the number of triples that can become part of a Web of Data. And even for seasoned RDF(a) users not having to declare anything for some of the common prefixes is a plus.

Of course, the big, nay, the BIG issue is: what prefixes and terms would those default profiles declare? What is the decision procedure? At this time, we do not have a final answer yet. It is quite obvious that all the vocabularies defined by W3C Recommendations and official Notes and that have a fixed prefix (most of them do) should be part of the list. We may want to add Member Submissions to this list. If you look at the default profile, these are already there in the first table (i.e., the code example above is safe). The HTML variant would add all the traditional @rel values, like license, next, previous, etc.

But what else? At the moment, the profiles include a set of prefixes and terms that are just there for testing purposes (although they do indicate a tendency), so do not take the default profile as the final content. For the HTML @rel values, we would, most probably, rely on any policy that the HTML5 Working Group will define eventually; the role of the HTML default profile will simply be to reflect those. That seems quite straightforward. However, the issue of default prefixes is clearly different. For those, the Working Group is contemplating two different approaches:

  1. Set up some sort of a registration mechanism, not unlike the xpointer registry. This would also include some accompanying mailing lists where objections can be raised against the inclusion of a specific prefix, etc.
  2. Try to get some information from search engines on the Semantic Web (Sindice, Yahoo!, anyone else?) that may provide a list of, say, the top 20 prefixes as used on the Semantic Web. Such a list would reflect the real usage of vocabularies and prefixes. (We still have to see whether this is information these engines can provide or not.)

At this moment it is not yet clear which way is realistic. Personally, I am more in favour of the second approach (if technically feasible), but the end result may be different; this is a policy that W3C will have to set up.

Apart from the content, another issue is the change mode and frequency of the default profile. First of all, the set of default prefixes can only grow. I.e., once a prefix has made it into the default profile, it has to stay there with an unchanged URI. That is obviously important to ensure stability. In other words, new prefixes coming to the fore by virtue of being used by the community can be added to the set, but no prefix can be removed. As for the frequency: a balance has to be found between stability, i.e., that RDFa processors can rely (e.g., for caching) on a not-too-frequent change of the default profiles, and relevance, i.e., that new vocabularies could find their way into the set of default prefixes. Again my personal feeling is that an update of the profiles once every 6 months, or even once a year, might strike a good balance here. To be decided.
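The grow-only rule is easy to state as a check. The following Python sketch is only an illustration of the rule as described above, not part of any specification; `check_profile_update` is a hypothetical helper that compares two versions of a profile, each given as a prefix-to-URI mapping.

```python
def check_profile_update(old, new):
    """Return a list of problems that would break the grow-only rule.

    old and new map prefix -> namespace URI.
    """
    problems = []
    for prefix, uri in old.items():
        if prefix not in new:
            problems.append(f"prefix {prefix!r} was removed")
        elif new[prefix] != uri:
            problems.append(f"prefix {prefix!r} changed its URI")
    return problems


old_profile = {"rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#"}
# Adding a prefix is fine...
good_update = dict(old_profile, owl="http://www.w3.org/2002/07/owl#")
# ...but dropping an existing one is not.
bad_update = {"owl": "http://www.w3.org/2002/07/owl#"}

print(check_profile_update(old_profile, good_update))
print(check_profile_update(old_profile, bad_update))
```

A processor that caches the default profile could rely on exactly this property: anything it has cached remains valid, it may only be missing newer prefixes.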

As before, all comments are welcome but, again as before, I would prefer if you sent those comments to the RDFa WG’s mailing list rather than commenting on this blog: public-rdfa-wg@w3.org (see also the archives).

Finally: I have worked on a new version of my RDFa distiller to include all the 1.1 features. This version of the distiller is now public, so you can try out the different new features. Of course, it is still not a final release, there are bugs, so…


March 29, 2011

LDOW2011 Workshop

Filed under: Semantic Web,Work Related — Ivan Herman @ 15:29

The Linked Open Data Workshop (LDOW20XX) has become an integral part of the yearly WWW conferences, and this year was no exception under the unsurprising name of LDOW2011. And, as always, it was an enjoyable, pleasant event. The organizers (Chris Bizer, Tom Heath, Michael Hausenblas, and Tim Berners-Lee) made the choice of accepting slightly fewer papers to leave room for more discussions. That was a good choice; the workshop was really more of a workshop than just listening to presentations, there were nice discussions, lots of comments… and that was great.

It is very difficult to summarize a whole day, and I do not want to go and comment on each individual paper. The papers (and, I believe, soon the presentation slides) are on the Web, of course, and it is worth glancing at each of them. For me, and that is obviously very personal, maybe the most important takeaway is actually close to the blog I wrote yesterday on the empirical study of SPARQL queries. And this is the general fact that we are at the point when the size and complexity of the linked open data cloud is such that we can begin to make meaningful measurements, experimental data analysis, empirical studies, etc., to understand how the data is really used out there, what is the shape and behavior of the beast, and how these affect the tools and specifications we develop.

The workshop started with an overview by Chris (I hope his slides will be on the Web at some point) doing exactly that. He looked at the evolution of the LOD cloud and tried to analyze its content. There were some nice cosy figures: the growth in 2010, in terms of the number of triples, was 300%, with some spectacular application areas coming into the game, like a 955% growth of library related data, or the appearance of governmental data from nothing in 2009 to about 11B triples in November 2010. Although Danny Vrandecic made the remark at the end of the Workshop that we should stop measuring the LOD cloud in terms of pure number of triples (and I can agree with that), those numbers are nice nevertheless. Some figures were less satisfactory: the number of links among datasets is relatively low (90 out of the 200 datasets have only around 1000 links to the outside, and the majority interlink with only one other dataset); only around 9% of the datasets publish machine readable licenses (although 31% publish machine readable provenance data, which is a bit nicer). Some of the common vocabularies are commonly reused (31% use Dublin Core terms, for example), but way too many dataset publishers define their own vocabulary even if that is not strictly necessary, and only about 7% publish mapping relationships from their own vocabulary to others.

Beyond the numbers themselves, I believe the important point is that somebody does collect and publish these data regularly to understand where we should put some emphasis in future. For example (and this came up during the discussion) work should be done on simple (in my view, rule, i.e., RIF or N3 based) mappings among vocabularies, those should be published for others to use; that figure of 7% is really too low. Work on helping data providers to create additional links easily is another area of necessary improvement (and there were, in fact, several papers on that very topic during the day).

I do not know whether it was a coincidence or whether the organizers did it on purpose, but the day ended with a similar paper, this time on vocabularies. A group from DERI collected some specific datasets to see how a particular vocabulary (in this case the GoodRelations vocabulary) is being used on the Web of Data, what the usage patterns are, how it can be used for specific possible use cases, etc. The issue here is not the GoodRelations ontology as such (you can see the details of the results in the paper) but rather the methodology: we are at the point when we can measure what we got, and we can therefore come up with empirical data that will help us to concentrate on what is essential. I hope this approach will come to the fore more and more in future. We need it.

It was a good day.

March 28, 2011

Empirical study of real-world SPARQL queries

Filed under: Semantic Web,Work Related — Ivan Herman @ 12:21

A nice paper I just heard at the USEWOD2011 Workshop at the WWW2011 conference: “Empirical study of real-world SPARQL queries”, by M.A. Gallego and his friends from the Univ. of Valladolid, in Spain. What they did was to analyse the SPARQL queries as issued by various clients to the DBPedia and the Semantic Web Dogfood dataset, to see if some general features appear that RDF triple stores and SPARQL implementers can take into account. This is a workshop paper, i.e., work in progress, so the results must be taken with a pinch of salt. E.g., it seems that DESCRIBE and CONSTRUCT queries are very rarely used (not a big surprise), that the OPTIONAL and UNION are used quite a lot, so their optimization is important, that most of the queries are dead simple, but around half of them rely on FILTER (albeit with one variable only), etc.

The interesting point for me is, however, that some of these data were radically different between these two datasets. E.g., 16% of the queries used OPTIONAL for DBPedia, whereas only 0.41% for the Dogfood dataset. What this tells me is that it is extremely difficult to optimise data stores in general. I.e., the characteristics of the data set, and indeed the application area (e.g., I would expect SPARQL queries to be much more complicated in the health care domain) have to play an important role. What the dimensions of optimizations are is not clear, but the type of research Gallego and his friends are doing might shed some light… Kudos for having started this discussion!
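To give an idea of what such a measurement involves, here is a rough Python sketch that computes, for a list of query strings, the share of queries using a given keyword. This is not the method of the paper: the keyword match is deliberately crude (it would also hit keywords inside string literals or URIs), and the sample log is made up purely to exercise the function.

```python
import re
from collections import Counter

FEATURES = ("OPTIONAL", "UNION", "FILTER", "DESCRIBE", "CONSTRUCT")


def feature_shares(queries):
    """Return, per keyword, the percentage of queries that use it."""
    counts = Counter()
    for query in queries:
        for feature in FEATURES:
            # Whole-word, case-insensitive match; no real SPARQL parsing.
            if re.search(rf"\b{feature}\b", query, re.IGNORECASE):
                counts[feature] += 1
    total = len(queries)
    return {f: 100.0 * counts[f] / total for f in FEATURES} if total else {}


# Made-up queries, only to exercise the function.
sample_log = [
    "SELECT ?s WHERE { ?s ?p ?o }",
    "SELECT ?s WHERE { ?s ?p ?o OPTIONAL { ?s ?q ?r } }",
    "SELECT ?s WHERE { ?s ?p ?o FILTER (?o > 3) }",
    "SELECT ?s WHERE { { ?s ?p 1 } UNION { ?s ?p 2 } }",
]
shares = feature_shares(sample_log)
print(shares)
```

Running the same function over the logs of two different endpoints is what would expose the kind of discrepancy mentioned above for OPTIONAL.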

March 13, 2011

Example for the power of open data…

Earthquakes around the globe on the week of the 11th of March

I wish I did not have to use this example… But I just hit it this morning via a tweet of Jim Hendler. RPI has an example of how one can combine public gov data (in this case, a Data.gov dataset on Earthquakes), its RDF version with a SPARQL query, and a visualization tool like Exhibit. The result is an interactive map on Earthquakes of the last week. Running the demo today reveals an incredible number (over 160) of events on the coast of Honshu, Japan, which led to the earthquake and tsunami disaster on the 11th of March. I do not know how much time it took for Li Ding to prepare the original demo, but I suspect it was not a big deal once the tools were in place.

The demo is dynamic, in the sense that in a week it will probably show some other data than today. So I have made a screen dump as a memento (I hope it is all right with Jim and Ding). If you are looking at it now, it is worth zooming into the area around Japan to gain some more insight into the sheer dimensions of the disaster: there were 325 quakes (out of 411 around the globe) in that area during the week! I must admit I did not know that…

I have the, hopefully not too naïve, belief that tools like this may not only increase our factual knowledge, but would also, in future, help those who are now struggling to cope with the aftermath of this disaster. Yes, having open data, and tools to handle them and integrate them, is really important.

November 23, 2010

My first mapping from RDB to RDF using a direct mapping, cont.

A few days ago I posted a blog on how the RDB to RDF direct mapping could be used for a simple example. I do not want to repeat the whole blog: the essence of it was that database tables were mapped onto a simple RDF Graph (this is what the direct mapping does) and the resulting graph was transformed into the “target” graph using the following SPARQL 1.1 construct:

CONSTRUCT {
  ?id a:title ?title ;
    a:year  ?year ;
    a:author _:x .
  _:x a:name ?name ;
    a:homepage ?hp .
}
WHERE {
  SELECT (IRI(fn:concat("http://...",?isbn)) AS ?id)
          ?title ?year ?name
         (IRI(?homepage) AS ?hp)
  {
    ?book a <Book> ;
       <Book#ISBN> ?isbn ;
       <Book#Title> ?title ;
       <Book#Year> ?year ;
       <Book#Author> ?author .
    ?author a <Author> ;
       <Author#Name> ?name ;
       <Author#Homepage> ?homepage .
  }
}

where the trick was to use a nested SELECT whose main job was to create URI references from strings. I realized that if one uses the latest editors’ version of SPARQL 1.1 (i.e., that version that is much closer to what SPARQL 1.1 will be) then the solution is actually simpler due to the variable assigning possibility that makes the nested SELECT unnecessary:

CONSTRUCT {
  ?id a:title ?title ;
    a:year  ?year ;
    a:author _:x .
  _:x a:name ?name ;
    a:homepage ?hp .
}
WHERE {
  ?book a <Book> ;
     <Book#ISBN> ?isbn ;
     <Book#Title> ?title ;
     <Book#Year> ?year ;
     <Book#Author> ?author .
  ?author a <Author> ;
     <Author#Name> ?name ;
     <Author#Homepage> ?homepage .
  BIND (IRI(fn:concat("http://...",?isbn)) AS ?id)
  BIND (IRI(?homepage) AS ?hp)
}

which makes, at least in my view, the mapping even clearer.

But SPARQL is not the only way to transform the graph. Another possibility is to use RIF Core. Essentially the same transformation can indeed be expressed using the RIF Presentation syntax. Here it is (with a little help from Sandro Hawke and Harold Boley):

Forall ?book ?title ?author ?isbn ?year ?id (
  ?id[a:year->?year a:title->?title a:author->?author] :-
    And(
      ?book[rdf:type-> <Book>
             a:isbn->?isbn
             a:title->?title
             a:year->?year
             a:author->?author]
      External(pred:iri-string(?id External( func:concat("http://..." ?isbn ) )))
    )
)
Forall ?author ?name ?hp ?homepage (
 ?author[a:name->?name a:homepage->?hp] :-
   And(
        ?author[rdf:type-> <Author>
                a:name->?name
                a:homepage->?homepage]
        External(pred:iri-string(?hp ?homepage))
  )
)

(As I did in the earlier examples, I did not put the prefix declarations and other syntactic details into the code above.)

The only difference between the two is that I retained the URI for the author, because generating a blank node on the fly in RIF Core does not seem to be possible. A better solution would be, probably, to mint a URI from the ?author variable just like I did for the ISBN value. Other than that, the two solutions are pretty much identical…

November 19, 2010

My first mapping from RDB to RDF using a direct mapping

A few weeks ago I wrote a blog on my first RDB to RDF mapping using R2RML; the W3C RDB2RDF Working Group had just published a first public Working Draft for R2RML. That mapping was based on a specific mapping language (i.e., R2RML). R2RML relies on an R2RML processing done by, for example, the database system, interpreting the language, using some SQL constructions, etc. The R2RML processing depends on the specific schema of the database which guides the mapping.

As I already mentioned in that blog, a “direct” mapping was also in preparation by the Working Group; well, the first public Working Draft of that mapping has just been published. That mapping does not depend on the schema of the database: it defines a general mapping of any relational database structure into RDF; only a base URI has to be specified for the database, everything else is generated automatically. The resulting RDF graph is of course much coarser than the one generated by R2RML; whereas the result of an R2RML mapping may be a graph using well specified vocabularies, for example, this is not the case for the output of the direct mapping. But that is not really a problem: after all, we have SPARQL or RIF to make transformations on graphs! I.e., the two approaches are really complementary.

What I will do in this blog is to show how the very same example as in my previous blog can be handled by a direct mapping. As a reminder: the toy example I use comes from my  generic Semantic Web tutorial. Here is the (toy) table:

which is then converted into an RDF Graph:

(Just as in the previous case I will ignore the part of the graph dealing with the publisher, which has the same structure as the author part. I will also ignore the prefix definitions.)

The direct mapping of the first and second tables is pretty straightforward. The URI-s are a bit ugly but, well, this is what you get when you use a generic solution. So here it is:

@base <http://book.example/> .
<Book/ID=0006511409X#_> a <Book> ;
  <Book#ISBN> "0006511409X" ;
  <Book#Title> "The Glass Palace" ;
  <Book#Year>  "2000" ;
  <Book#Author> <Author/ID=id_xyz#_> .

<Author/ID=id_xyz#_> a <Author> ;
  <Author#ID> "id_xyz" ;
  <Author#Name> "Ghosh, Amitav" ;
  <Author#Homepage> "http://www.amitavghosh.com" .

Simple, isn’t it?
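The generation rule behind that output can be sketched in a few lines of Python. This is a simplified illustration that only mimics the URI shapes of the Turtle above, not the algorithm of the Working Draft; the function `direct_map_row` and its parameters are hypothetical names, and literal escaping is ignored. Because the sketch builds the subject from the name of the key column, the book subject comes out as `Book/ISBN=…` rather than the `Book/ID=…` shown above.

```python
def direct_map_row(table, key_column, row, foreign_keys=None):
    """Turn one table row (a dict) into a list of (s, p, o) triples.

    foreign_keys maps a column name to (target_table, target_key_column).
    URIs are relative to the base URI, as in the Turtle above.
    """
    foreign_keys = foreign_keys or {}
    subject = f"<{table}/{key_column}={row[key_column]}#_>"
    triples = [(subject, "a", f"<{table}>")]
    for column, value in row.items():
        predicate = f"<{table}#{column}>"
        if column in foreign_keys:
            # A "jump" to another table yields a URI, not a literal.
            target_table, target_key = foreign_keys[column]
            obj = f"<{target_table}/{target_key}={value}#_>"
        else:
            obj = f'"{value}"'
        triples.append((subject, predicate, obj))
    return triples


author_triples = direct_map_row(
    "Author", "ID",
    {"ID": "id_xyz", "Name": "Ghosh, Amitav",
     "Homepage": "http://www.amitavghosh.com"},
)
book_triples = direct_map_row(
    "Book", "ISBN",
    {"ISBN": "0006511409X", "Title": "The Glass Palace",
     "Year": "2000", "Author": "id_xyz"},
    foreign_keys={"Author": ("Author", "ID")},
)
for s, p, o in author_triples + book_triples:
    print(s, p, o, ".")
```

Note that nothing in the function knows about books or authors: only the table name, the key column and the foreign keys are needed, which is the whole point of a direct mapping.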

The result is fairly close to what we want, but not exactly. First of all, we want to use different vocabulary terms (like a:name). Also, note that the direct mapping produces literal objects most of the time, except when there is a “jump” from one table to another. Finally, the resulting graph should use a blank node for the author, which is not the case in the generated graph.

Fortunately, we have tools in the Semantic Web domain to transform RDF graphs. RIF is one possible solution; another is SPARQL, using the CONSTRUCT form. Using SPARQL is an attractive solution because, in practice, the output of the direct mapping may not even be materialized; instead, one would expect a SPARQL engine attached to a particular relational database, mapping the SPARQL queries to the table on the fly. I will use SPARQL 1.1 below because that gives nice facilities to generate RDF URI Resources from strings, i.e., to have “bridges” from literals to URI-s. Here is a possible SPARQL 1.1 query/construct that could be used to achieve what we want:

CONSTRUCT {
  ?id a:title ?title ;
    a:year  ?year ;
    a:author _:x .
  _:x a:name ?name ;
    a:homepage ?hp .
}
WHERE {
  SELECT (IRI(fn:concat("http://...",?isbn)) AS ?id)
          ?title ?year ?name
         (IRI(?homepage) AS ?hp)
  {
    ?book a <Book> ;
      <Book#ISBN> ?isbn ;
      <Book#Title> ?title ;
      <Book#Year>  ?year ;
      <Book#Author> ?author .
    ?author a <Author> ;
      <Author#Name> ?name ;
      <Author#Homepage> ?homepage .
  }
}

Note the usage of a nested query; this is used to create new variables representing the URI references to be used by the outer query. The key is the IRI operator. (Both the nesting and the AS in the SELECT are SPARQL 1.1 features.)
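For readers who think more easily in procedural terms, the effect of the query can be imitated in plain Python. This is only a sketch of the same transformation over an in-memory list of triples, not a SPARQL engine; the `IRI` class, the helper names and the blank node labels are assumptions of the sketch, and the elided `"http://..."` prefix is kept exactly as it appears in the query.

```python
from itertools import count


class IRI(str):
    """Marks a string as a URI reference; a plain str is a literal."""


ISBN_PREFIX = "http://..."  # elided in the post, kept as-is

book = IRI("Book/ID=0006511409X#_")
author = IRI("Author/ID=id_xyz#_")

# Output of the direct mapping, as (subject, predicate, object) tuples.
direct_graph = [
    (book, "rdf:type", IRI("Book")),
    (book, "Book#ISBN", "0006511409X"),
    (book, "Book#Title", "The Glass Palace"),
    (book, "Book#Year", "2000"),
    (book, "Book#Author", author),
    (author, "rdf:type", IRI("Author")),
    (author, "Author#Name", "Ghosh, Amitav"),
    (author, "Author#Homepage", "http://www.amitavghosh.com"),
]


def properties(graph, subject):
    """Collect the predicate -> object pairs of one subject."""
    return {p: o for s, p, o in graph if s == subject}


def construct(graph):
    """Build the target graph, one blank node per book/author match."""
    blank_ids = count()
    target = []
    for s, p, o in graph:
        if (p, o) != ("rdf:type", IRI("Book")):
            continue
        b = properties(graph, s)
        a = properties(graph, b["Book#Author"])
        book_id = IRI(ISBN_PREFIX + b["Book#ISBN"])  # literal -> URI
        homepage = IRI(a["Author#Homepage"])         # literal -> URI
        x = f"_:b{next(blank_ids)}"
        target += [
            (book_id, "a:title", b["Book#Title"]),
            (book_id, "a:year", b["Book#Year"]),
            (book_id, "a:author", x),
            (x, "a:name", a["Author#Name"]),
            (x, "a:homepage", homepage),
        ]
    return target


target_graph = construct(direct_graph)
```

The two lines marked “literal -> URI” play the role of the IRI operator in the query; everything else is just pattern matching and renaming of predicates.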

That is it. Of course, the question does arise: which one would one use? The direct mapping or R2RML? Apart from the possible restriction that the local database system may implement the direct mapping only, it also becomes a question of taste. The heavy tool in R2RML is, in fact, the embedded SQL query; if one is comfortable with SQL then that is fine. But if the user is more comfortable with Semantic Web tools (e.g., SPARQL or RIF) then the direct mapping might be handier.

(Note that these are evolving documents still. I already know that my previous blog is wrong in the sense that it is not in line with the next version of R2RML. Oh well…)

November 2, 2010

My first mapping from RDB to RDF using R2RML

The W3C RDB2RDF Working Group has just published a first public Working Draft for the standardized RDB->RDF mapping language called R2RML. I decided that the only way to understand a specification like that is to try to use it for an example. Caveat: this is a “First Public Working Draft” for R2RML, so many things still have to happen and there will be changes.

For several years now I have used a simple example in my generic Semantic Web tutorial (see, e.g., the one at SemTech). It is an artificial example referring to an imaginary bookshop’s table:

which is then converted into an RDF Graph:

(And the tutorial story is how this graph can be merged with a graph coming from another bookshop’s data.) Up until now I always glossed over how this mapping is done. Well, so how could that be done with R2RML?

R2RML defines mappings that describe how an RDB table is mapped onto triples. (R2RML is in itself in RDF, b.t.w.) Simply put, in R2RML, each row of a table is mapped to an RDF subject; the individual cells and the column names provide the objects and the predicates, respectively.

If we look at the middle table in the example, it corresponds to the lower right hand part of the graph. The R2RML mapping has to specify that the homepage column should actually produce an RDF resource (i.e., a URI reference) and not a literal string. Furthermore, the first column should become a blank node; that has to be specified, too. Here is the way this is all specified:

:Table2 rdf:type rr:TriplesMap ;
    rr:logicalTable "Select ('_:' || ID) AS pid, Name, ('<' || Homepage || '>') AS Home from person_table";
    rr:subjectMap [ a rr:BlankNodeMap ; rr:column "pid" ; ] ;
    rr:propertyObjectMap [ rr:property a:name; rr:column "Name" ] ;
    rr:propertyObjectMap [ a rr:IRIMap ; rr:property a:homepage; rr:column "Home" ] .

What happens here is:

  1. a mapping is defined that turns the original table into a virtual, “logical” table using SQL. The goal here is to generate a blank node ID on the fly, and a URI in NTriple syntax (note, however, that I am not sure it is o.k. to use that approach in the spec!);
  2. the subject for the triples is chosen to be a cell in a specific column (“pid”, generated by the SQL transform of the previous point), and it is also specified that this is a blank node;
  3. the other two properties are specified (for the same subject); the one for the home page also specifies that the object must be a URI resource (as opposed to a Literal).

That is it. Mapping of the bottom table to the lower left hand corner of the graph is also quite similar, I will not go into this here.

But we still need the “root”, so to say, i.e., the node in the upper right hand corner, the top portion of the graph (with the title and the year) and, mainly, we also have to relate the root to the portion of the graph that is generated from the middle table.

First, the following R2RML part does the job of generating the top part of the graph:

:Table1 rdf:type rr:TriplesMap ;
    rr:logicalTable """Select ('<http:..isbn/' || ISBN || '>') AS isbn,
                     Author, Title, Publisher, Year from book_table""";
    rr:subjectMap [ rdf:type rr:IRIMap ; rr:column "isbn" ] ;
    rr:propertyObjectMap [ rr:property a:title ; rr:column "Title" ; ] ;
    rr:propertyObjectMap [ rr:property a:year ; rr:column "Year" ; ] ;

The only role of the mapping to a logical table is to generate a URI from the ISBN; all the other cells are, conceptually, simply copied on the logical table. The rest is fairly straightforward.

The missing trick is to combine, i.e., to “join”, the two tables on the graph. R2RML has a separate construction for that, referred to as “mapping” the foreign keys. The following additional statements should be added to :Table1:

    rr:foreignKeyMap [ 
       rr:key a:author ; 
       rr:parentTriplesMap :Table2 ; rr:joinCondition "{child}.Author = {parent}.pid"
    ] .

This combines the nodes defined by :Table1 with those of :Table2. And voilà! We’re done: the R2RML document is ready, i.e., an R2RML engine would turn my example table into my example graph.
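To see what such an engine would have to do with the two logical tables, here is a small Python sketch that runs essentially the same SQL against an in-memory SQLite database and emits triples by hand. It is an imitation under stated assumptions, not an R2RML processor: the table contents are the toy data of the tutorial example, the Publisher column is left out, the elided `<http:..isbn/` prefix is kept as-is, and the join is done on the raw ID column rather than on the generated pid.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
    CREATE TABLE person_table (ID TEXT, Name TEXT, Homepage TEXT);
    CREATE TABLE book_table (ISBN TEXT, Author TEXT, Title TEXT, Year TEXT);
    INSERT INTO person_table VALUES
        ('id_xyz', 'Ghosh, Amitav', 'http://www.amitavghosh.com');
    INSERT INTO book_table VALUES
        ('0006511409X', 'id_xyz', 'The Glass Palace', '2000');
""")

triples = []

# :Table2 -- one blank node per row of the person table.
for pid, name, home in db.execute(
    "SELECT ('_:' || ID) AS pid, Name, ('<' || Homepage || '>') AS Home "
    "FROM person_table"
):
    triples.append((pid, "a:name", f'"{name}"'))
    triples.append((pid, "a:homepage", home))

# :Table1 plus the foreign key map -- each book is joined to the
# blank node of its author (join on the raw ID: an assumption).
for isbn, title, year, pid in db.execute(
    "SELECT ('<http:..isbn/' || b.ISBN || '>'), b.Title, b.Year, "
    "       ('_:' || p.ID) "
    "FROM book_table AS b JOIN person_table AS p ON b.Author = p.ID"
):
    triples.append((isbn, "a:title", f'"{title}"'))
    triples.append((isbn, "a:year", f'"{year}"'))
    triples.append((isbn, "a:author", pid))

for s, p, o in triples:
    print(s, p, o, ".")
```

The SQL strings do the real work here, just as the logical tables do in the R2RML mapping; the Python around them only plays the role of the subject and property maps.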

Of course, there are more complicated possibilities. Triples, or whole rows, can be explicitly stored in a specific named graph, for example. Or a column defining a predicate could, actually, use a cell in another column as an object. Etc. And, to be honest, I am not even 100% sure that the above is correct; I may have misunderstood some details. But the “melody” is still clear.

Note the role the SQL based mapping of the original table to the logical table has. For SQL experts, most of the work can be done there, i.e., the resulting RDF graph can be ready for further usage by an application, to be linked into the LOD, to be used with the right attributes, namespaces, etc. Which is very powerful indeed, provided… the user has the necessary SQL expertise. And, while that is obviously true for database managers, it is not necessarily true for RDF experts. For those, a slightly different model seems to be more appropriate: they would prefer to get an RDF graph ASAP, so to say, without any fancy transformation, and would then use RIF, SWSRL, SPARQL’s CONSTRUCT, etc., to turn it into the RDF graph they eventually want to have. In other words, they may not need the concept of a logical table. That is what is referred to by the group as the “default” mapping. I.e., what graph does one get if nothing is specified? If that is properly defined then, say, RIF experts can use their expertise instead of SQL. This default mapping is not yet fully specified by the group, but it is on its way; it will be published shortly, and will complete the R2RML picture. So watch that space…

October 27, 2010

Publication of the Last Call for RDFa Core 1.1

The W3C RDFa Working Group has just published the “Last Call Working Draft” for RDFa Core 1.1. As Manu Sporny, the co-chair of the group, said in his tweet, this W3C jargon is equivalent to a “feature freeze”. I.e., the group does not know of any outstanding technical issues or of missing features that it would reasonably plan to add. Put another way, this is the last round of commenting before proceeding to final implementation testing and, hopefully, to a final W3C Standard. I.e., Last Call doesn’t mean that the group takes no more comments; on the contrary, technical comments are very welcome and necessary to make sure that the final outcome is correct. Please send your comments to the group’s mailing list: public-rdfa-wg@w3.org (there is also a public archive).

Although lots of things have been discussed in the past few months (i.e., since the last draft published in August) not many things have significantly changed, in fact. Most of the changes are editorial, making the text clearer, more precise, etc. (You can look at the “diff” file, if you are interested.) This document is for the Core, i.e., the generic RDFa processing that can be used for any DOM. It is to be expected to have, in a few days, a similar document published for XHTML+RDFa 1.1 by the same Working Group, and an HTML5+RDFa 1.1 by the HTML Working Group.

I have also worked, in parallel to the specification work, on a modified version of the RDFa distiller. While the “official” service remains unchanged and relies on the current RDFa Recommendation, there is now a “Shadow” version, that relies on RDFa 1.1. The underlying code has undergone some cleanups beyond the adaptation to RDFa 1.1 so I am sure there are bugs…

Finally, a blatant self-promotion: Stéphane Corlosquet, Lin Clark and I will give a tutorial at the upcoming ISWC conference in Shanghai on RDFa and Drupal. The RDFa part relies on 1.1… (There are links to the slides on the page but you do not expect us not to touch them any more before the tutorial itself, do you? So make sure you look at them again after the event…)

October 16, 2010

Open Data as a Tangram

Filed under: Semantic Web,Work Related — Ivan Herman @ 14:59

The Open Data Tangram of the W3C Brazil Office


Many of us have seen, or heard of Tim’s talk on linked open data using a bag of potato chips as an example. I have just stumbled into another analogy for the usefulness of open data yesterday. It is not my idea, just telling about it…

I spent a day at the W3C Brazil Office in São Paulo yesterday. As part of their goodies to distribute, the Office has produced a small Tangram. (For those who do not know, a Tangram is an old Chinese puzzle: it consists of a small set of simple geometric forms that can be arranged in a square. The cute thing is that there is a large number of figures that can be created by simply rearranging those pieces.) The distributed Tangram has puzzle pieces that are annotated with terms like (I hope I get it right, I do not speak Portuguese) “Compatibility”, “Transparency”, etc.

And indeed. Organizations would publish their data in some configuration (say, a square…). The rich possibilities come from the fact that anybody can take those pieces of data, rearrange them and produce different, cute configurations, i.e., applications using the very same data. And that is where the power of open data comes from…

September 28, 2010

ICT2010 Event Brussels, 2nd day: eGov (#ict2010eu for twitter…)

The main event today, as far as I am concerned, was the Governmental Linked Data session that some of us organized under the auspices of the Open Knowledge Foundation. The idea was to talk about the goals, dreams, and problems of Governmental Linked Data to the non-initiated (and the non-converted:-). I believe (although one is never objective about one’s own child) that the session went really well. There were cca. 140 people in the audience which, frankly, exceeded my expectation. Josema gave a nice overview of his “dreams”, i.e., what are the goals and promises of this whole move; this was followed by Jonathan’s dreams that were, of course, largely identical to Josema’s, but he also gave some data and facts about what is happening in Europe these days (e.g., in the area of data catalogues). He also referred to the upcoming European data catalogue project (PublicData.eu) which will be a great asset when it comes. Jeni talked not only about her dreams but also some of the practical experiences in deploying that stuff; as somebody deeply involved in the UK governmental project, i.e., as a person in the trenches, so to say, Jeni was really a great person to talk about that. The fourth and last speaker was Andreas, showing some existing applications on linked governmental data, and also talking about his dream of an application that would, e.g., help in the discussion on problematic societal issues like the Stuttgart 21 project. (Actually, Andreas had the temerity of using the Internet for live demos; with the absolutely awful quality network at the conference I would not have dared to do so!) There was also a lively discussion and questions after the presentations, both as part of the official session as well as after it. It is difficult to say how many people we “reached”, of course, but I think we were successful in getting the idea of Governmental Linked Data more accepted by a wider audience. (B.t.w., there is also a page with all the slide references.) 
It was interesting that, later in the day, I had a chat with a colleague who claimed that by now the very idea of linked data, and of governmental linked data, is widely accepted by everybody as the way to go, though, of course, lots of details have to be fleshed out. I may not be as upbeat as he is, but, well, it may just be my usual pessimism…

Other than this session, I also listened to several sessions on the Future Internet. There is now a new funding round on this topic (with a deadline in mid-January), so it obviously drew quite some attention, in spite of the fact that it is quite difficult to grasp what this thing is all about. The goals described by various speakers put an emphasis on the societal aspects of upcoming work: trying to understand what the profound societal consequences of the ubiquitous internet presence are, what social changes it will bring, how we can understand its evolution via interdisciplinary work, etc. These are all really exciting questions, although also very difficult ones. What bothered me a little bit was that all this sounded very familiar: it was the same set of goals outlined by the Web Science Initiative, these days the Web Science Trust: just make a global change of “internet” to “Web”, and you get the same! This was all the more disturbing as, when asked about other organizations doing similar work, the representative of the Commission referred to “a UK project called Web Science Initiative, you know, started by Wendy Hall and Tim Berners-Lee…”, i.e., they completely missed the fact that WST is not a UK thing… Missing communications here?

I ranted yesterday about some of the oddities of the conference organization. Sorry, I have to add some more: we (the organizers of the session) sent them the detailed program of the session a few weeks ago. They did put it up on the Web in… Microsoft Word format. What would it have cost them to convert it at least into PDF (or to ask us to do it, if necessary), let alone to turn it into HTML? At a time when everybody is talking about mobile devices and mobile internet, putting up a piece of information that no mobile phone, for example, can read… (B.t.w., they distributed the program of the conference on a USB stick, which is fine, but with a bunch of programs running on Windows only… When will such organizers learn that there are people out there using Linux or a Mac? Sigh…)

B.t.w.: if you have not realized it yet, the #ict2010eu twitter feed contains a huge number of entries, a bunch of which are related to our session…
