Ivan’s private site

January 24, 2012

Nice reading on Semantic Search

I had a great time reading a paper on Semantic Search[1]. Although the paper covers the details of a specific Semantic Web search engine (DERI’s SWSE), I read it as somebody not really familiar with the intricate details of setting up and operating such a search engine (i.e., I would not dare to give an opinion on whether the choices made by this group are better or worse than the ones made by the developers of other engines) and wanting to get a good picture of what is happening in general. For that purpose, the paper was really interesting and instructive. It is long (ca. 50 pages), so I did not even try to understand everything at my first reading, but it did give a great overall impression of what is going on.

One of the “associations” I had, maybe somewhat surprisingly, is with another paper I read lately, namely a report on basic profiles for Linked Data[2]. In that paper Nally et al. look at what “subsets” of current Semantic Web specifications could be defined, as “profiles”, for the purpose of publishing and using Linked Data. This was also a general topic at a W3C Workshop on Linked Data Patterns at the end of last year (see also the final report of the event) and it is no secret that W3C is considering setting up a relevant Working Group in the near future. Well, the experiences of an engine like SWSE might come in very handy here. For example, SWSE uses a subset of the OWL 2 RL Profile for inferencing; that may be a good input for a possible Linked Data profile (although the differences are really minor, if one looks at the appendix of the paper that lists the rule sets the engine uses). The idea of “Authoritative Reasoning” is also interesting and possibly relevant; that approach makes a lot of pragmatic sense, and I wonder whether this is not something that should be, somehow, documented for general use. And I am sure there are more: in general, analyzing the experiences of major Semantic Web search engines on handling Linked Data might provide a great set of inputs for such pragmatic work.

I was also wondering about a very different issue. A great deal of work had to be done in SWSE on the proper handling of owl:sameAs. On the other hand, one of the recurring discussions on various mailing lists and elsewhere is on whether the usage of this property is semantically o.k. or not (see, e.g., [3]). A possible alternative would be to define (beyond owl:sameAs) a set of properties borrowed from the SKOS Recommendation, like closeMatch, exactMatch, broadMatch, etc. It is almost trivial to generalize these SKOS properties for the general case but, reading this paper, I was wondering: what effect would such predicates have on search? Would they make it more complicated or, in fact, would such predicates make the life of search engines easier by providing “hints” that could be used for the user interface? Or both? Or is it already too late, because the usage of owl:sameAs is already so prevalent that it is not worth touching that stuff? I do not have a clear answer at this moment…
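To make the question a bit more concrete, here is a minimal sketch of how an engine might treat the two kinds of links differently. It is not how SWSE does it; the function and class names are made up for illustration, triples are plain Python tuples, and skos:closeMatch stands in for any weaker, non-merging predicate. Resources connected by owl:sameAs are merged under one canonical identifier, while closeMatch links are only collected as “hints”:

```python
from collections import defaultdict

SAME_AS = "owl:sameAs"
CLOSE_MATCH = "skos:closeMatch"  # assumption: used here as a weaker, non-merging link


class UnionFind:
    """Tracks equivalence classes of identifiers (hypothetical helper)."""

    def __init__(self):
        self.parent = {}

    def find(self, x):
        self.parent.setdefault(x, x)
        while self.parent[x] != x:
            # Path halving: point x at its grandparent while walking up.
            self.parent[x] = self.parent[self.parent[x]]
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra != rb:
            # Keep the lexicographically smaller identifier as canonical.
            if rb < ra:
                ra, rb = rb, ra
            self.parent[rb] = ra


def consolidate(triples):
    """Merge owl:sameAs aliases; keep closeMatch links aside as UI hints."""
    uf = UnionFind()
    for s, p, o in triples:
        if p == SAME_AS:
            uf.union(s, o)

    merged = set()
    hints = defaultdict(set)
    for s, p, o in triples:
        if p == SAME_AS:
            continue  # already absorbed into the equivalence classes
        if p == CLOSE_MATCH:
            hints[uf.find(s)].add(uf.find(o))
            continue
        merged.add((uf.find(s), p, uf.find(o)))
    return merged, dict(hints)


triples = [
    ("ex:a", SAME_AS, "ex:b"),
    ("ex:b", "foaf:name", "Alice"),
    ("ex:a", "foaf:age", "30"),
    ("ex:a", CLOSE_MATCH, "ex:c"),
]
merged, hints = consolidate(triples)
```

In this toy setting the statements about ex:a and ex:b end up on one canonical resource, whereas ex:c stays separate and is only offered as a related entry; whether that makes a real engine simpler or more complicated is exactly the open question.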

Thanks to the authors!

  1. A. Hogan, et al., “Searching and Browsing Linked Data with SWSE: the Semantic Web Search Engine”, Journal of Web Semantics, vol. 9, no. 4, pp. 365-401, 2011.
  2. M. Nally and S. Speicher, “Toward a Basic Profile for Linked Data”, IBM developerWorks, 2011.
  3. H. Halpin, et al., “When owl:sameAs Isn’t the Same: An Analysis of Identity in Linked Data”, Proceedings of the International Semantic Web Conference, pp. 305-320, 2010.

November 7, 2011

November 2, 2011

Some notes on ISWC2011…

The 10th International Semantic Web Conference (ISWC2011) took place in Bonn last week. Others have already blogged on the conference in a more systematic way (see, for example, Juan Sequeda’s series on semanticweb.com); there is no reason to repeat that. Just a few more personal impressions, with the obvious caveat that I may have missed interesting papers or presentations, and the ones I picked here are also the results of my personal bias… So, in no particular order:

Zhishi.me is the outcome of the work of a group from the APEX lab in Shanghai and Southeast University: it is, in some ways, the Chinese DBpedia. “In some ways” because it is actually a mixture of three different Chinese, community driven encyclopedias, namely the Chinese Wikipedia, Baidu Baike and Hudong Baike. I am not sure of the exact numbers, but the combined dataset is probably a bit bigger than DBpedia. The goal of Zhishi.me is to act as a “seed” and a hub for Chinese linked open data contributions, just like DBpedia did and does for the LOD in general.

It is great stuff indeed. I do have one concern (which, hopefully, is only a matter of presentation, i.e., may be a misunderstanding on my side). Although zhishi.me is linked to non-Chinese datasets (DBpedia and others), the paper talks about a “Chinese Linked Open Data (COLD)”, as if this was something different, something separate. As a non-English speaker myself I can fully appreciate the issues of language and culture differences but I would nevertheless hate to see the Chinese community develop a parallel LOD, instead of being an integral part of the LOD as a whole. Again, I hope this is just a misunderstanding!

There were a number of ontology or RDF graph visualization presentations, for example one from the University of Southampton team (“Connecting the Dots”), one on the first results of an exploration done by Magnus Stuhr and his friends in Norway, called LODWheel (the latter was actually at the COLD2011 Workshop), and another one from a mixed team, led by Enrico Motta, on a visualization plugin to the NeOn toolkit called KC-Viz. I have downloaded the latter, and have played a bit with it already, but I haven’t had the time to reach a really informed conclusion on it yet. Nevertheless, KC-Viz was interesting for me for a different reason. The basic idea of the tool is to use some sort of an importance metric attached to each node in the class hierarchy and direct the visualization based on that metric. It was reminiscent of some work I did in my previous life on graph visualization, though the metric was different, the graph was only a tree, the visualization approach was different, but nevertheless, there was a similar feel to it… Gosh, that was a long time ago!

The paper of John Howse et al. on visualizing ontologies was also interesting. Interesting because different: the idea is a systematic usage of Euler diagrams to visualize class hierarchies combined with some sort of a visual language for the presentation of property restrictions. In my experience property restrictions are a very difficult (maybe the most difficult?) OWL concept to understand without a logic background; any tool, visual or otherwise, that helps in teaching and explaining this can be very important. Whether John’s visual language is the one I am not sure yet, but it may well be. I will consider using it the next time I give a tutorial…

I was impressed by the paper of Gong Cheng and his friends from Nanjing, “Empirical Study of Vocabulary Relatedness…”. Analyzing the results of a search engine (in this case Falcons) to draw conclusions on the nature, the usage, the mutual relationship, etc., of vocabularies is very important indeed. We need empirical results, bound to real life usage. This is not the first work in this direction (see, for example, the work of Ghazvinian et al., from ISWC2009), but there is still much to do. Which reminds me of some much smaller scale work Giovanni, Péter and I did on determining the top vocabulary prefixes for the purpose of the RDFa 1.1 initial context (we used to call it default profile back then). I should probably try to talk to the Nanjing team to merge with their results!

I think the vision paper of Marcus Cobden and his friends (again at the COLD2011 Workshop) on a “Research Agenda for Linked Closed Data” is worth noting. Although not necessarily earthshaking, the fact that we can and we should speak about Linked Closed Data alongside Linked Open Data is important if we want the Semantic Web to be adopted and used by the enterprise world as well. One of the main issues, which is not really addressed frequently enough (although there have been some papers published here and there), is access control. Who has the right to access data? Who has the right to access a particular ontology or rule set that may lead to the deduction of new relationships? What are the licensing requirements, how do we express them? I do not think our community has a full answer to these. B.t.w., W3C organizes a Workshop concentrating on the enterprise usage of Linked Data in December…

Speaking about research agenda… I really liked Frank van Harmelen’s keynote on the second day of the conference. His approach was fresh, and the question he asked was different: essentially, after 10 or more years of research in the Semantic Web area, can we derive some “higher level” laws that describe and govern this area of research? I will not repeat all the laws that he proposed; it is better to look at his Web site, which has the HTML version of his slides. The ones that are worth repeating again and again are that “Factual knowledge is a graph”, “Terminological knowledge is a hierarchy”, and “Terminological knowledge is much smaller than the factual knowledge”. Why are these important? To quote from his keynote slides:

  1. traditionally, KR has focussed on small and very intricate sets of axioms: a bunch of universally quantified complex sentences
  2. but now it turns out that much of our knowledge comes in the form of very large but shallow sets of axioms.
  3. lots of the knowledge is in the ground facts, (not in the quantified formula’s)

Which is important to remember when planning future work and activities. “Reasoning”, usually, happens on a huge set of ground facts in a graph, with a shallow hierarchy of terminology…

I was a little bit disappointed by the Linked Science Workshop; probably because I had wrong expectations. I was expecting a workshop looking at how Linked Data in general can help in the renewal of the scientific publication process as a whole (a bit along the lines of the Force11 work on improving the future of scholarly communication). Instead, the workshop was more on how different scientific fields use linked data for their work. Somehow the event was unfocussed for me…

As in some previous years, I was again part of the jury for the Semantic Web Challenge. It was interesting how our own expectations have changed over the years. What was really a wow! a few years ago has become so natural that we are not excited any more. Which is of course a good thing, it shows that the field is maturing further, but we may need some sort of a Semantic Web Super-Challenge to be really excited again. That being said, the winners of the challenge really did impressive work, I do not want to give the impression of being negative about them… It is just that I was missing that “Wow”.

Finally, I was at one session of the industrial track, which was a bit disappointing. If we wanted to show the research community that the Semantic Web technologies are really used by industry, then the session did not really do a good job of that. With one exception, and a huge one at that: the presentation of Yahoo! (beware, the link is to a PowerPoint slidedeck). It seems that Yahoo! is building an internal infrastructure based on what they call “Web of Objects”, by regrouping pieces of knowledge in a graph-like fashion. By using internal vocabularies (a superset of schema.org) and using the underlying graph infrastructure they aim at regrouping similar or identical knowledge pieces harvested on the Web. I am sure we will hear more about this.

Yes, it was a full week…

April 9, 2011

Announcement on rNews

Filed under: Semantic Web,Work Related — Ivan Herman @ 6:38

A few days ago IPTC published a press release on rNews: “Standard draft for embedding metadata in online news”. This is, potentially, a huge thing for Linked Data and the Semantic Web. Without going into too many technical details (no reason to repeat what is on the IPTC pages on rNews, you can look it up there), what this means is that, potentially, all major online news services on the globe, from the Associated Press to the AFP, or from the New York Times to the Süddeutsche Zeitung, will have their news items enriched with metadata, and this metadata will be expressed in RDFa. In other words, the news items will be usable, by extracting RDF, as part of any Semantic Web application, can be mashed up with other types of data easily, etc. In short, news items will become a major part of the Semantic Web landscape, with the extra specificity of being an extremely dynamic set of data that is renewed every day. That is exciting!

Of course, it will take some time to get there, but we should realize that IPTC is the major standard setting body in the news publishing world. I.e., rNews has a good chance of being widely adopted. It is time for the Semantic Web community to pay attention…

September 28, 2010

ICT2010 Event Brussels, 2nd day: eGov (#ict2010eu for twitter…)

The main event today, as far as I am concerned, was the Governmental Linked Data session that some of us organized under the auspices of the Open Knowledge Foundation. The idea was to talk about the goals, dreams, and problems of Governmental Linked Data to the non-initiated (and the non-converted:-). I believe (although one is never objective about one’s own child) that the session went really well. There were ca. 140 people in the audience which, frankly, exceeded my expectations. Josema gave a nice overview of his “dreams”, i.e., what the goals and promises of this whole move are; this was followed by Jonathan’s dreams that were, of course, largely identical to Josema’s, but he also gave some data and facts about what is happening in Europe these days (e.g., in the area of data catalogues). He also referred to the upcoming European data catalogue project (PublicData.eu) which will be a great asset when it comes. Jeni talked not only about her dreams but also about some of the practical experiences in deploying that stuff; as somebody deeply involved in the UK governmental project, i.e., as a person in the trenches, so to say, Jeni was really a great person to talk about that. The fourth and last speaker was Andreas, showing some existing applications on linked governmental data, and also talking about his dream of an application that would, e.g., help in the discussion on problematic societal issues like the Stuttgart 21 project. (Actually, Andreas had the temerity of using the Internet for live demos; with the absolutely awful quality network at the conference I would not have dared to do so!) There was also a lively discussion and questions after the presentations, both as part of the official session as well as after it. It is difficult to say how many people we “reached”, of course, but I think we were successful in getting the idea of Governmental Linked Data more accepted by a wider audience. (B.t.w., there is also a page with all the slide references.)

It was interesting that, later in the day, I had a chat with a colleague who claimed that by now the very idea of linked data, and of governmental linked data, is widely accepted by everybody as a way to go, though, of course, lots of details have to be fleshed out. I may not be as upbeat as he is, but, well, it may just be my usual pessimism…

Other than this session, I also listened to several sessions on the Future Internet. There is now a new funding round on this topic (with a deadline mid January), so it obviously drew quite some attention, in spite of the fact that it is quite difficult to grasp what this thing is all about. The goals described by various speakers put an emphasis on the societal aspects of upcoming works, on trying to understand what the profound, societal consequences of the ubiquitous internet presence are, what social changes that will bring, how we can understand, via interdisciplinary work, the evolutions, etc. These are all really exciting questions although also very difficult. What bothered me a little bit was that all this sounded very familiar: it was the same set of goals outlined by the Web Science Initiative, these days Web Science Trust: just make a global change of “internet” to “Web”, and you get the same! This was all the more disturbing because, when asked about other organizations doing similar work, the representative of the Commission referred to “a UK project called Web Science Initiative, you know, started by Wendy Hall and Tim Berners-Lee…”, i.e., they completely missed the fact that WST is not a UK thing… Missing communications here?

I ranted yesterday on some of the oddities of the conference organization. Sorry, I have to add some more: we (the organizers of the session) sent them the detailed program of the session a few weeks ago. They did put it up on the Web in… Microsoft Word format. What would it have cost them to convert that at least into PDF (or to ask us to do it, if necessary), let alone to turn it into HTML? At a time when everybody is talking about mobile devices and mobile internet, putting up a piece of information that no mobile phone, for example, can read… (B.t.w., they distributed the program of the conference on a USB stick, which is fine, but with a bunch of programs running on Windows only… When will such organizers learn that there are people out there using Linux or a Mac? Sigh…)

B.t.w.: if you have not realized yet, the #ict2010eu twitter feed contains a huge number of entries, a bunch of them are related to our session…

July 12, 2010

Experiences of LOD publication

Filed under: Semantic Web,Work Related — Ivan Herman @ 10:39

Frank van Harmelen’s tweet drew my attention to a paper of Jan Hannemann and Jürgen Kett on Linked Data for Libraries. I hope Jan and Jürgen will not be upset if I copy some quotes from their paper, but I thought that giving more publicity to some of their experiences in deploying linked data at the German National Library is worthwhile. Reproduced here without change though somewhat shortened:

  • Setting up a service is not trivial. […] the essential software solutions (tools) involved have not reached full maturity yet. […] documentation may be lacking the required depth. […] multiple software components need to be setup to work together  […] which requires appropriate expertise.[…]
  • Data modeling can be complex. When publishing data on the web, it is advantageous to use existing, registered ontologies. Unfortunately, these ontologies do not always match the data representation of each individual library […] the definitions of individual properties can vary considerably. […] There is no simple answer to the question which is the right thing to do.[…]
  • Open data exchange mentality does not exist everywhere. Even before linked data, libraries have exchanged and aligned their data sets. The results of such projects could be prime information sources for connecting linked data sets. Sadly, not all institutions involved share the open exchange mentality, and shared ownership may make it difficult to publish these results.
  • Best practices are seen as rules. Linked open data is based largely on best practices rather than rules. However, this pragmatic aspect is not seen as essential in all areas of the linked data community. Deviations from perceived standards tend to be criticized, which can cause institutions new to the semantic web to doubt their decisions – even if they make sense for the organization in question. Libraries should not be deterred by such feedback and rather see this as a motivation to contribute their own experiences and knowledge to the community. Guidelines and best practices should be carefully considered in the context of each institution’s needs, especially in this early forming phase of the semantic cloud.[…]
  • Properly modeled data is very useful. Once the data modeling is completed and the data made available, it can be used by others. A colleague at the Technical University of Braunschweig has shown that with properly modeled data, this can result in very useful applications: within a day, he imported our data into a database, added a web interface and had thus created a searchable access to our data.

June 29, 2010

SemTech2010 & co.

I am on my way home from a long trip in the US (writing these lines on the plane, to be posted from home). A few days in Seattle, SemTech 2010 in San Francisco, and finally the “RDF Next Steps” workshop in Palo Alto (i.e., Stanford). I do not want to write about the last one now, simply because we hope to have a more extended public report available within 10-15 days. I.e., more about that later.

Seattle consisted of a number of company visits, but it also included a talk at the SemWeb Meetup in Seattle. I gave a presentation on what happened at W3C the last year which, I think, was well received. (Although one is never sure about these things.) I had a bunch of discussions and chats after the presentation; it was pleasant, relaxing… I and mainly my colleague from W3C, Eric Prud’hommeaux, also had a long discussion with two developers from Microsoft who are involved in the oData work; that was really interesting because we reached the conclusion that we might jointly outline a plan for writing down how to “export” oData into RDF, and publish that, e.g., as a W3C Note (note that there are already systems doing something like that out there, but I am not knowledgeable enough to judge how complete those solutions are). I think it would be good for the community if this happens. It is important for a general Web of Data to include, well, all the data on the Web…

Semtech… it was big. Bigger than last year (I heard and read a figure of a 30% increase in attendance). This industry is lively indeed! The only problem was that it was almost too big; it was the conference of eternal frustration:-( Indeed, there were so many things in parallel that one always had the feeling of having missed something because another, parallel session may have been more interesting! I heard presentations from Facebook, from Google, saw stunning visualizations of RDF graphs, or heard about plans on ontology hosting and management. There was a report on the US and UK governmental data work (this stuff still amazes me, though it is not the first time I hear about it), there was a presentation of BestBuy (alas! I missed that one). There was a separate track on the publication world as a separate “vertical” area (and we also had some great discussions with the people from the New York Times with whom we outlined a possible first step in gathering that community). Lots of hallway conversation with companies and institutions and, of course, the social life, chatting with David, and Ian, and the other Ian, and Eric, and the other David, and Christine, and Jeremy, and Jim, and Fabien, and Sandro, and Jenni, and… I should stop and not even try to list everybody because it is simply impossible! I also gave an introductory Semantic Web Tutorial (quite a lot of people in the audience, and I think it went well), we had a panel on the W3C RDB2RDF work and another one on SPARQL 1.1. As a nice little touch, I could announce the publication of the W3C RIF Recommendation as a primeur during the tutorial, as I was talking about RIF (the publication itself happened while I was talking…)

There were, as every year, some “buzz” topics. My impression is that the linked open governmental data effort was a buzz and was still new information for many. Facebook’s keynote on the Open Graph Protocol created another buzz. More generally, RDFa was definitely a buzz (big time!). I.e., as I said, this industry is lively and continues to be exciting.

But there are of course challenges. The way I see it, the biggest challenge is not technical. Yes, of course, there are technical issues, but those will be solved, eventually. The issue is outreach, to get to those new communities who may understand the value of a Web of Data in general but do not have enough guidance on how to start doing something. How to publish the data, how to link it to other data, how to consume it, use it, mash it up… How to talk to “C-level” people, how to reach out to them. There are books, of course, but not enough; there are tutorials and guides, of course, but not enough; there are experts around but definitely not enough. As one of our discussion partners put it: if I go to any better bookshop, there are rows of books on, say, XML (good or bad, but they are there). But books on RDF, on Linked Data, on SPARQL, on SKOS, on OWL: only a few here and there (comparatively, that is), and some of them are actually quite old. Let alone the problem of trying to hire experts that could do the job. I really feel that this is the biggest challenge our community faces. I say “community” and not only a single organization like W3C or other; the challenge is too great to be solved by one group only. We have been fighting with this issue for a while now, but it is still a challenge… And a challenge for us all who care about that stuff!

It was a good week!

May 12, 2010

RIF (Core) and LOD

W3C has just published a Proposed Recommendation for the Rule Interchange Format (RIF); this means, in the W3C jargon, that the technical work is done, and the W3C asks its members for a seal of approval to publish it as a Recommendation.

Somehow the RIF development was not on the radar screen of the Semantic Web community. There may be many reasons for that, and I think we should just accept this as part of history. The future is much more important; we should indeed realize that RIF is an important piece of the Semantic Web technical architecture and let us do our best to get it embraced widely.

RIF Core is the simplest variant of RIF. It is not very complicated. It is a simple rule language; one can define a series of Horn rules, there are some safety features built in so that the rules can be executed, conceptually, by a forward chaining engine, it has the familiar XSD datatypes with the usual operations, it operates on URIs, and it has a notion analogous to RDF blank nodes. There is a separate document that describes how RIF (Core) rules operate with RDF data and how the various semantics (RIF, RDF(S), OWL) work together. The details are not really important here; suffice it to say that it, essentially, works like one would expect as a layperson… The RIF syntax is a little bit convoluted for the moment, but there may be work coming up to improve that in the form of alternative, more readable syntaxes.
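For readers who have never seen a forward chaining engine, here is a toy sketch of the idea in Python. It is not RIF syntax and not any existing RIF implementation: rules are written as hypothetical (body, head) pairs of triple patterns, variables are strings starting with “?”, and datatypes, built-ins and blank nodes are ignored. The engine simply keeps applying Horn rules to a set of triples until nothing new can be derived:

```python
def match(pattern, triple, binding):
    """Extend `binding` so that `pattern` equals `triple`; None if impossible."""
    result = dict(binding)
    for term, value in zip(pattern, triple):
        if term.startswith("?"):
            # A variable: bind it, or check it against its earlier binding.
            if result.setdefault(term, value) != value:
                return None
        elif term != value:
            return None
    return result


def solutions(body, facts, binding=None):
    """Yield every variable binding that satisfies all patterns in `body`."""
    binding = {} if binding is None else binding
    if not body:
        yield binding
        return
    first, rest = body[0], body[1:]
    for fact in facts:
        extended = match(first, fact, binding)
        if extended is not None:
            yield from solutions(rest, facts, extended)


def forward_chain(facts, rules):
    """Naive forward chaining: apply all rules until a fixed point is reached."""
    closure = set(facts)
    while True:
        new = set()
        for body, head in rules:
            for b in solutions(body, closure):
                derived = tuple(b.get(term, term) for term in head)
                if derived not in closure:
                    new.add(derived)
        if not new:
            return closure
        closure |= new


# One Horn rule: parent(x, y) and parent(y, z) implies grandparent(x, z).
rules = [
    ([("?x", "ex:parent", "?y"), ("?y", "ex:parent", "?z")],
     ("?x", "ex:grandparent", "?z")),
]
facts = {
    ("ex:ann", "ex:parent", "ex:bob"),
    ("ex:bob", "ex:parent", "ex:cat"),
}
closure = forward_chain(facts, rules)
```

The “safety” mentioned above corresponds, in this toy, to the requirement that every variable in the head also appears in the body; otherwise the derived triple would still contain an unbound “?” term.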

So what can it be used for? At the W3C LOD Camp in Raleigh (held as part of the WWW2010 conference), Sandro Hawke already gave a simple example why RIF should be interesting for LOD applications. Let me add a few further examples that might be of interest.

Remember OWL-RL? The OWL Working Group has defined a subset of OWL that can be handled by rules. The rules themselves were also published by the OWL WG, albeit using an abstract notation. Those rules can be described in RIF Core as well; the RIF group has published this mapping in a separate document. Following those rules a RIF Core engine can handle OWL-RL directly.

Why is that interesting, you might ask? Well, there have been quite some discussions when defining OWL RL on whether the features included in OWL RL represent the right set for users. Some claimed that there are other OWL features that could be added; others said that, on the contrary, the complexity of OWL RL is already too high and the features should be reduced to make them more palatable to users. In some ways, the usage of RIF Core may make this discussion moot. Indeed, users, or user communities, can define the rules they are interested in, in RIF, by cherry picking from the rules described by the RIF WG in the document cited above. They can send those rules to their RIF Core reasoner alongside their data, and get what they want. If that rule set consists only of 2-3 OWL rules, because that is all the application cares about, then all the better: the RIF inference engine will just do its job faster. If the user wants to add OWL features that are not in OWL RL, that may also be doable; the OWL 2 RDF-Based semantics specification is such that, in many cases, the extra rules can be extracted fairly easily from the OWL 2 Full semantics, using the patterns in the RIF/OWL RL document (although I have to emphasize that this does not work in all cases!). Note that this model of “sending” the RIF rule set alongside the RDF data to a reasoner is exactly the way RIF reasoning is being defined for SPARQL 1.1 in the separate Entailment Regimes document (still in draft). Note also that I referred to OWL RL here, but the same approach could be used with RDFS with, obviously, a smaller RIF rule set.

Another, albeit related, application of RIF came to my mind reading an email discussion on whether inferences should be materialized for large LOD datasets or not and, if yes, which ones. As an answer to Vasiliy Faronov’s question, Leigh Dodds also proposed a text to be added to his Linked Data Patterns book. The resulting discussion thread was really about which inferences should really be materialized. Materializing them all may not be realistic; but if only a selection of the possible inferences is used (e.g., a subset of RDFS or OWL), how would consumers of the data know? It looks like RIF may come to the rescue. Publishers could simply publish the rules they use for materializing their inferences in RIF. (Again, this is not always possible; RIF cannot cover the whole of OWL. But it does cover a very large percentage of the use cases.) Consumers may actually choose whether they want to download all the triples, including the inferred triples, or whether they choose to download data from the “core” dataset only together with the RIF file, and materialize the inferences locally using a local RIF engine (or use the RIF file with a RIF Entailment aware SPARQL 1.1 engine).
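The consumer side of this scenario can be sketched in a few lines of Python. This is only an illustration under strong assumptions: the published “rule file” is reduced to a list of rule names, the rules themselves are hand-written Python functions rather than RIF, and the catalogue and function names are invented for the example. The two rule names, cax-sco and scm-sco, are borrowed from the OWL 2 RL rule tables (class membership via subclass, and subclass transitivity):

```python
TYPE = "rdf:type"
SUBCLASS = "rdfs:subClassOf"


def cax_sco(triples):
    """?x rdf:type ?c1, ?c1 rdfs:subClassOf ?c2  =>  ?x rdf:type ?c2"""
    supers = {(s, o) for s, p, o in triples if p == SUBCLASS}
    return {(x, TYPE, c2)
            for x, p, c1 in triples if p == TYPE
            for sub, c2 in supers if sub == c1}


def scm_sco(triples):
    """?c1 subClassOf ?c2, ?c2 subClassOf ?c3  =>  ?c1 subClassOf ?c3"""
    supers = {(s, o) for s, p, o in triples if p == SUBCLASS}
    return {(c1, SUBCLASS, c3)
            for c1, c2 in supers
            for mid, c3 in supers if mid == c2}


# Hypothetical catalogue: what a consumer's local engine knows how to run.
RULE_CATALOGUE = {"cax-sco": cax_sco, "scm-sco": scm_sco}


def materialize(core, published_rule_names):
    """Apply only the rules the publisher declared, until nothing new appears."""
    rules = [RULE_CATALOGUE[name] for name in published_rule_names]
    closure = set(core)
    while True:
        derived = set().union(*(rule(closure) for rule in rules)) - closure
        if not derived:
            return closure
        closure |= derived


core = {
    ("ex:rex", TYPE, "ex:Dog"),
    ("ex:Dog", SUBCLASS, "ex:Mammal"),
    ("ex:Mammal", SUBCLASS, "ex:Animal"),
}
typing_only = materialize(core, ["cax-sco"])
both_rules = materialize(core, ["cax-sco", "scm-sco"])
```

With only cax-sco declared, the consumer learns that ex:rex is an ex:Animal but does not get the extra ex:Dog rdfs:subClassOf ex:Animal triple; declaring both rules adds it. The point is that the published rule list, not guesswork, tells the consumer which of the two closures the publisher had in mind.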

RIF is and should be considered as an integral and essential part of the Semantic Web Technology landscape. Let us hope many implementations of, at least, RIF Core will bloom to make this a reality! (There is a public list of existing implementations so far.)

April 17, 2010

AR and Linked Data

Filed under: Semantic Web,Work Related — Ivan Herman @ 16:56

I had the pleasure to be at an Augmented Reality (AR) Dev Camp today in Amsterdam. It was a very heterogeneous crowd, from Semantic Web people (after all, one of the organizers was Dan Brickley) to artists. But that is probably the nature of AR these days…

AR is of course not a new discipline; I guess the R&D in AR goes back at least 15 years. But the appearance of high-end mobile devices made this, suddenly, a viable business: the fact that the devices have location capabilities as well as compasses makes it possible to create really cool applications. Johannes la Poutré gave a nice and short overview of what is happening in this area; another nice example is the “Berlin Wall is back” application.

What does this have to do with Linked Data, you might ask. Well, the very essence of these applications is to use data to enrich the visual experience of a mobile phone camera. And use lots of data. And use lots of up-to-date and semantically organized data, because applications have to have intelligent filtering to save bandwidth. This means that developers in AR look at linked data with lots of interest; they were pleased to hear about, e.g., Dutch governmental data becoming (gradually…) available as linked data, about the LOD cloud, about technologies like Zemanta, Open Calais, RDFa… Yes, AR on mobile might become a significant application area for Linked Data. A space to watch!

(B.t.w., although it was not an augmented reality project, some of you might remember Christian Becker’s and Chris Bizer’s work on DBpedia Mobile: that was some sort of a precursor for some of the ideas that appear today as part of AR applications. Just imagine those Wikipedia/DBpedia data appearing on top of what you see with your camera!)

P.S. Putting my W3C hat on: W3C organizes a Workshop on Augmented Reality and Virtual Interactivity, to be held in June, in Barcelona. Interested?

February 13, 2010

Semantic 3D (Visit to the Fokus 3D workshop)

I had the pleasure, in the past two days, to participate in a workshop called Fokus3D. It was the closing event of a European R&D project of a similar name, concentrating on what is called Semantic 3D. I was invited because the project made use of certain types of Semantic Web technologies (e.g., OWL) and, also, because it is the community of my previous professional life: I did spend many years in Computer Graphics… (Which also meant that I met old friends and colleagues that I had not seen for many years, which was really very pleasant…)

So what is this “Semantic 3D”? What does it have to do with Semantic Web?

Here is a crash course on 3D graphics: when systems display those beautiful 3D graphics objects that we are all used to, the underlying system transforms complex mathematical descriptions of shapes, surfaces, or 3D bodies into a load of (triangular) meshes that are displayed by the graphics hardware. The mathematical descriptions are purely geometrical and define, say, spline surfaces, planes, or some geometric transformations that place those surface descriptions into space.

These 3D objects represent, usually, some real object. A chair, a car, a tree, or a house. The representation of a chair is a combination of several such shapes; some of those describe the arms, the back, etc. But this information, i.e., that this and this combination of shapes is actually the arm of a chair, is usually lost somewhere in the process. Modelers start with a concept, a “semantic”, and end up with shapes; information is gone on the way. This means that many things cannot be done well: one would like to have semantics-based search (“searching for the arm of a chair”), one would like to know the origin of a particular shape (i.e., how was it created, under what process and transformation), one would like to follow the evolution in time of a particular shape to retrace the designer’s actions, etc., etc. And, due to the huge number of shapes, managing this type of (meta)data is far from obvious. Keeping that information in a manageable way together with the geometric processing: we get Semantic 3D.
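
Just to illustrate the idea (this is my own toy sketch in Python, not what the project or any of the tools presented at the workshop actually do): if each shape kept a few semantic labels and a link to the shapes it was derived from, both the “arm of a chair” search and the question about the origin of a shape would become simple lookups. The field names (`part_of`, `role`, `derived_from`) and the sample data are hypothetical.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Shape:
    shape_id: str
    geometry: str                       # stand-in for the real geometric description
    part_of: Optional[str] = None       # the object this shape belongs to, e.g. "chair"
    role: Optional[str] = None          # what it represents there, e.g. "arm"
    derived_from: List[str] = field(default_factory=list)  # ids of source shapes


def find(shapes, part_of, role):
    """Semantics-based search: all shapes playing `role` in a `part_of` object."""
    return [s for s in shapes if s.part_of == part_of and s.role == role]


def provenance(shapes, shape_id):
    """Follow derived_from links back to the shapes this one originates from."""
    by_id = {s.shape_id: s for s in shapes}
    chain, todo = [], list(by_id[shape_id].derived_from)
    while todo:
        current = todo.pop(0)
        if current not in chain:
            chain.append(current)
            todo.extend(by_id[current].derived_from)
    return chain


# Hypothetical model of a chair: a spline surface, the mesh computed from it,
# and a plane for the seat.
shapes = [
    Shape("s1", "spline surface", part_of="chair", role="arm"),
    Shape("s2", "triangular mesh", part_of="chair", role="arm", derived_from=["s1"]),
    Shape("s3", "plane", part_of="chair", role="seat"),
]

arms = find(shapes, "chair", "arm")     # the two shapes labeled as the arm
origin = provenance(shapes, "s2")       # where the mesh came from
```

Of course, with the huge number of shapes and with vocabularies much richer than two string labels, this is where controlled vocabularies and ontologies come into the picture.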

There was, of course, a slight confusion of terms for me: this notion of semantics would be considered as (meta)data by Semantic Web people. That being said, such data requires controlled vocabularies, and very complicated ones at that, so there are strong connections nevertheless. But there is also semantics in terms of knowledge representation: there are relationships among, and classifications of, shape elements; these relationships can represent constraints and other features that can be used for reasoning, for inference. So more complex ontologies come into the picture (and OWL is widely used in this space). These ontologies are often application dependent, reflecting the diversity of application areas from CAD to gaming, or from cultural heritage systems to medical and biological applications. In the future, such ontologies should also incorporate features like uncertainty (reflecting the fact that, at least in some areas like protein modeling, those relationships are not necessarily crisp); they should also include features such as provenance or time relationships.

Last but not least: there are lots of data there. I mean lots, stored in biological databases, shape libraries, scanned historical artifacts, each representing an object (like the reproduction of the Ramses statue on the figure) with many, many shapes. Integration of that data is a challenge even within one application area, let alone with data at large. It will take a long time before this data is organized in a way that it could be, say, exposed and integrated as Linked Open Data. But we may get there, eventually (and yours truly has done his best to convince the community of the value of doing that…). Standard representations have to be developed, algorithms crystallized, vocabularies and ontologies defined, etc. The good news is that there is a community that is determined to continue working in this direction. The workshop organizers plan to write down a research roadmap (to be put online within 1-2 weeks), and a special issue of the journal “Computers & Graphics” has been announced, co-edited by Bianca Falcidieno, from the CNR in Genova, and myself. So… stay tuned.
