Ivan’s private site

January 24, 2012

Nice reading on Semantic Search

I had a great time reading a paper on Semantic Search[1]. Although the paper covers the details of a specific Semantic Web search engine (DERI’s SWSE), I read it as somebody who is not really familiar with the intricate details of setting up and operating such a search engine (i.e., I would not dare to give an opinion on whether the choices made by this group are better or worse than those made by the developers of other engines) and who wants to get a good picture of what is happening in general. For that purpose, the paper was really interesting and instructive. It is long (circa 50 pages), so I did not even try to understand everything at my first reading, but it did give a great overall impression of what is going on.

One of the “associations” I had, maybe somewhat surprisingly, is with another paper I read lately, namely a report on basic profiles for Linked Data[2]. In that paper Nally et al. look at what “subsets” of current Semantic Web specifications could be defined, as “profiles”, for the purpose of publishing and using Linked Data. This was also a general topic at a W3C Workshop on Linked Data Patterns at the end of last year (see also the final report of the event) and it is no secret that W3C is considering setting up a relevant Working Group in the near future. Well, the experiences of an engine like SWSE might come in very handy here. For example, SWSE uses a subset of the OWL 2 RL Profile for inferencing; that may be a good input for a possible Linked Data profile (although the differences are really minor, if one looks at the appendix of the paper that lists the rule sets the engine uses). The idea of “Authoritative Reasoning” is also interesting and possibly relevant; that approach makes a lot of pragmatic sense, and I wonder whether it should be, somehow, documented for general use. And I am sure there are more: in general, analyzing the experiences of major Semantic Web search engines in handling Linked Data might provide a great set of inputs for such pragmatic work.

I was also wondering about a very different issue. A great deal of work had to be done in SWSE on the proper handling of owl:sameAs. On the other hand, one of the recurring discussions on various mailing lists and elsewhere is whether the usage of this property is semantically o.k. or not (see, e.g., [3]). A possible alternative would be to define (beyond owl:sameAs) a set of properties borrowed from the SKOS Recommendation, like closeMatch, exactMatch, broadMatch, etc. It is almost trivial to generalize these SKOS properties for the general case but, reading this paper, I was wondering: what effect would such predicates have on search? Would they make it more complicated or, in fact, would such predicates make the life of search engines easier by providing “hints” that could be used for the user interface? Or both? Or is it already too late, because the usage of owl:sameAs is already so prevalent that it is not worth touching that stuff? I do not have a clear answer at this moment…
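To make the question a bit more concrete, here is a minimal Python sketch of one possible division of labour. It is purely illustrative and is not how SWSE works: the assumption is that only owl:sameAs is used to merge identifiers into one canonical entity, while the SKOS-style predicates are merely collected as “hints” that a user interface could display. The function name, the predicate sets and the `ex:` identifiers are all hypothetical.

```python
from collections import defaultdict

# Assumption for this sketch: only owl:sameAs triggers merging of identifiers.
MERGE = {"owl:sameAs"}
# Assumption: SKOS-style predicates are kept only as user-interface hints.
HINT = {"skos:exactMatch", "skos:closeMatch", "skos:broadMatch"}


def consolidate(triples):
    """Return (canonical, hints) for an iterable of (s, p, o) string triples."""
    parent = {}

    def find(x):
        # Union-find lookup with path halving.
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]
            x = parent[x]
        return x

    hints = defaultdict(set)
    for s, p, o in triples:
        if p in MERGE:
            a, b = find(s), find(o)
            if a != b:
                # Keep the lexicographically smaller identifier as canonical.
                small, large = sorted((a, b))
                parent[large] = small
        elif p in HINT:
            hints[s].add((p, o))
    canonical = {x: find(x) for x in list(parent)}
    return canonical, dict(hints)


if __name__ == "__main__":
    data = [
        ("ex:a", "owl:sameAs", "ex:b"),
        ("ex:b", "owl:sameAs", "ex:c"),
        ("ex:a", "skos:closeMatch", "ex:d"),
    ]
    print(consolidate(data))
```

In this toy setup `ex:a`, `ex:b` and `ex:c` collapse into one entity, whereas `ex:d` stays separate and is only attached to `ex:a` as a “close match” hint.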

Thanks to the authors!

  1. A. Hogan, et al., “Searching and Browsing Linked Data with SWSE: the Semantic Web Search Engine”, Journal of Web Semantics, vol. 9, no. 4, pp. 365-401, 2011.
  2. M. Nally and S. Speicher, “Toward a Basic Profile for Linked Data”, IBM developerWorks, 2011.
  3. H. Halpin, et al., “When owl:sameAs Isn’t the Same: An Analysis of Identity in Linked Data”, Proceedings of the International Semantic Web Conference, pp. 305-320, 2010.

December 30, 2011

Mac OS Lion: The Good, the Bad and the Ugly

Filed under: Mac,Private — Ivan Herman @ 12:33

I have made use of the winter recess to install Mac OS Lion on my powerbook. I must admit I hesitated for a while (I was not sure that it was worth the trouble) but then, partially driven by sheer curiosity, I did it. And, as usual, there are pros and cons… Maybe others will find my experiences useful.

1. The Good

My tactic of waiting, i.e., not installing Lion when it was still a cub, paid off. I have seen many stories on the Web, mostly dating back to July, about installation difficulties (e.g., issues with the installation of Xcode). Well, none of these for me. It installed easily and relatively quickly (after download, the installation process took about an hour, with an additional round for the installation of Xcode). Most things worked without further ado, although I did have to update some programs (e.g., iTunes, Safari, Mercurial, some additional tools for Mail like GPG or Mail Act-On). But these were to be expected and otherwise the system worked smoothly. For example, my local Apache server started and worked as before, in contrast to the stories I saw on the Web. There were also some user interface adjustments I had to make (sorry Apple, I do not like the “natural” scrolling, and I also like to have the scrollbar always on), but the Web is full of references to the necessary tricks to do these.

The system is faster. Not hugely, but faster in booting, in logging in, and also some applications, like Safari, got some speed improvements. That is always a welcome feature!

I quite like Mission Control. I used “Spaces” on Snow Leopard, but Mission Control is nicer, and works well with the full-screen feature. B.t.w., the full-screen feature is also great.

I use Mail App as my primary mailer and there are (as far as I am concerned) two major improvements. On the one hand, it has a nice “conversation” feature; the particular aspect I like is that it manages conversations and “related” mails across mail folders (and I have loads of them) regardless of the fact that I use IMAP. This is great. The other nice feature is the improved search, both in speed and in the various options it gives you. Mail is my everyday workhorse, so such improvements made the upgrade to Lion already worthwhile.

I love the fact that, at last, I can resize my windows easily. I change screens often (I have an external screen at home, another one at my institute, and they are different in size…) and the fact that, on Snow Leopard, I had to grab the lower right hand corner of a window to resize it was really a drag.

At this moment I am not at my usual place, meaning I am without an external screen; I can only refer to what I have read, namely that handling external screens became smoother in Lion, too. I hope that is true; the old way of closing, restarting, whatnot, was also a pain.

There are a number of additional small improvements (e.g., better spellcheck in Safari; really helpful as I write these lines:-). I am sure I will find out more as it goes.

2. The Bad

Of course, not everything is nice and rosy:-(

I miserably failed with iCloud. I tried to use it to synchronize my iPhone and iPad easily with my Mac. It simply did not work reliably as far as the calendar was concerned. I regularly ran into the problem of adding an event to my calendar on, say, my iPhone, and the result was not visible anywhere else (I tried explicit synchronization when it was clear how to do it, waiting for half an hour, etc.; no success). I tried it through the built-in calendar application on the iPhone (which I do not particularly like, b.t.w.) as well as some other calendar apps, to no avail. After a while I just gave up, and reverted to my previous setup, i.e., using iTunes’ synchronization. Taking into account that, with iOS 5, one can also sync from iTunes over Wi-Fi, it is so easy to synchronize that it does not really bother me. It is, nevertheless, surprising that Apple comes out with such a much heralded feature that simply does not work properly.

I did run into some awkwardness in the user interface of the Mail App, too. For example, one would think that this application is a prime candidate to be used full screen. However, beware: if you reply to a mail in full screen mode, you cannot switch windows (e.g., you cannot reply to two mails in parallel, stuff like that) which might make it awkward. In a sense it is understandable, but it was a surprise nevertheless. Another issue is with the conversation feature: I display my mails with increasing date order but, within a conversation, Mail keeps on using decreasing dates; I have not found a way to change that…

And then there is Launchpad. Having it is a great idea, in fact. If set up properly, it gives you an easy way to get to applications, it reduces the size of the Dock (which can be an issue on a small screen), etc. If set up properly, that is. But… I did run into several issues. Some examples:

  • At the start I saw loads of duplicate entries. This is because I had organized my Application collection to my own taste before, with subdirectories, aliases, etc.; I have too many applications to leave them as a flat list. This led to a bunch of duplicates. That is understandable, but it is fairly difficult to remove an application from Launchpad: the “official” version is that one can do the same as on an iPhone (pressing an icon, and using a big X on it), but this method did not work for most of the applications. (No idea why.) Fortunately, I have found a program called Launchpad Control, which can do that for you (thank you, Andreas Ganske!)
  • There are missing entries. Hence the big question: how does one add an application to Launchpad? Answer: no idea. I have seen proposals on the Web (e.g., move the application’s icon on top of the Launchpad icon on the Dock, or create an alias and put it into ~/Applications): none worked for me. (Maybe if I restart? I did log out and in again, that did not change anything, and I did not want to restart the computer only for this.) For the time being, I gave up on that.
  • Launchpad is the typical case of an application that asks for a keyboard shortcut to start. I have found, after all, a way to do it; but does it have to be that complicated? (Actually, I saw some notes on the Web that the keyboard shortcut will disappear after reboot. I hope that will not be the case…)

Bottom-line: although I will use Launchpad, probably, it is not what it should be. Hopefully later releases will improve this.

3. The Ugly

No new item here, just a remark: it is really surprising to me that Apple would come out with such unfinished products as iCloud or Launchpad. It would be perfectly o.k. to come out with Lion, add these programs in the state they are in, and make it clear to people that this is work in progress. Everybody would understand that. But doing it this way simply reduces the credibility of Apple… Pity.

December 20, 2011

“Hungary’s Constitutional Revolution”–a sad example

Filed under: Hungary,Private — Ivan Herman @ 12:32

Kim Lane Scheppele published an analysis in the New York Times on “Hungary’s Constitutional Revolution”. In my view it is a very good, and fairly depressing, analysis of the current situation in Hungary: how can a country possibly slide into some sort of authoritarianism dominated by one single ideological view, following a path that is perfectly “legal” (though morally objectionable) at every step of the way? A sad example:-(

December 16, 2011

Where are we with RDFa 1.1?

There has been a flurry of activity around RDFa 1.1 in the past few months. Although a number of blogs and news items have been published on the changes, all those have become “officialized” only in the past few days with the publication of the latest drafts, as well as with the publication of RDFa 1.1 Lite. It may be worth looking back at the past few months to get a clearer idea of what happened. I make references to a number of other blogs that were published in the past few months; interested readers should consult those for details.

The latest official drafts for RDFa 1.1 were published in Spring 2011. However, a lot has happened since. First of all, the RDFWA Working Group, working on this specification, has received a significant number of comments. Some of those were rooted in implementations and the difficulties encountered therein; some came from potential authors who asked for further simplifications. Also, the announcement of schema.org had an important effect: indeed, this initiative drew attention to the importance of structured data in Web pages, which also raised further questions on the usability of RDFa for that usage pattern. This came to the fore even more forcefully at the workshop organized by the stakeholders of schema.org in Mountain View. A new task force on the relationships of RDFa and microdata has been set up at W3C; beyond looking at the relationship of these two syntaxes, that task force also raised a number of issues on RDFa 1.1. These issues have been, by and large, accepted and handled by the Working Group (and reflected in the new drafts).

What does this mean for the new drafts? The bottom line: there have been some fundamental changes in RDFa 1.1. For example, profiles, introduced in earlier releases of RDFa 1.1, have been removed due to implementation challenges; however, the management of vocabularies has acquired an optional feature that helps vocabulary authors to “bind” their vocabularies to other vocabularies, without introducing an extra burden on authors (see another blog for more details). Another long-standing issue was whether RDFa should include a syntax for ordered lists; this has been done now (see the same blog for further details).

A more recent important change concerns the usage of @property and @rel. Although usage of these attributes was never a real problem for RDF-savvy authors (the former is for the creation of literal objects, whereas the latter is for URI references), they have proven to be a major obstacle for ‘lambda’ HTML authors. This issue came up quite forcefully at the schema.org workshop in Mountain View, too. After a long technical discussion in the group, the new version reduces the usage difference between the two significantly. Essentially, if, on the same element, @property is present together with, say, @href or @resource, and neither @rel nor @rev is present, a URI reference is generated as the object of the triple. I.e., when used on, say, a <link> or <a> element, @property behaves exactly like @rel. It turns out that this usage pattern is so widespread that it covers most of the important use cases for authors. The new version of the RDFa 1.1 Primer (as well as the RDFa 1.1 Core, actually) has a number of examples that show these. There are also some other changes related to the behaviour of @typeof in relation to @property; please consult the specification for these.
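The rule described above can be sketched in a few lines of Python. This is only an illustration of the simplified rule as stated in this post (the specification has further conditions that are ignored here); the function name and the dictionary-based representation of an element’s attributes are assumptions made for the example.

```python
def property_object_kind(attrs):
    """Classify the object generated by @property on a single element.

    attrs maps attribute names (without the '@') to their values.
    Returns "uri", "literal", or None when @property is absent.
    Simplified: only @href/@resource and @rel/@rev are considered.
    """
    if "property" not in attrs:
        return None
    has_resource = "href" in attrs or "resource" in attrs
    has_rel_or_rev = "rel" in attrs or "rev" in attrs
    if has_resource and not has_rel_or_rev:
        # @property behaves like @rel: the object is a URI reference.
        return "uri"
    # Otherwise @property keeps its original role and yields a literal.
    return "literal"


if __name__ == "__main__":
    print(property_object_kind({"property": "foaf:homepage",
                                "href": "http://example.org/"}))
    print(property_object_kind({"property": "dc:title"}))
```

So an <a> element carrying @property and @href yields a URI reference in this sketch, whereas a plain <span> with only @property yields a literal, as before.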

The publication of RDFa 1.1 Lite was also a very important step. This defines a “sub-set” of the RDFa attributes that can serve as a guideline for HTML authors to express simple structured data in HTML without bothering about more complex features. This is the subset of RDFa that schema.org will “accept”, as an alternative to microdata, as a possible syntax for schema.org vocabularies. (There are some examples of what some schema.org examples look like in RDFa 1.1 Lite on a different blog.) In some sense, RDFa 1.1 Lite can be considered the equivalent of microdata, except that it leaves the door open for more complex vocabulary usage, mixing of different vocabularies, etc. (The HTML Task Force will soon publish a more detailed comparison of the different syntaxes.)

So here is, roughly, where we are today. The recent publications by the W3C RDFWA Working Group have, as I said, “officialized” all the changes that were discussed since spring. The group decided not to publish a Last Call Working Draft yet, because the last few weeks of work in the HTML Task Force may reveal some new requirements; if not, the last round of publications will follow soon.

And what about implementations? Well, my “shadow” implementation of the RDFa distiller (which also includes a separate “validator” service) incorporates all the latest changes. I also added a new feature a few weeks ago, namely the possibility to serialize the output in JSON-LD (although this has become outdated a few days ago, due to some changes in JSON-LD…). I am not sure of the exact status of Gregg Kellogg’s RDF Distiller, but, knowing him, it is either already in line with the latest drafts or it is only a matter of a few days to be so. And there are surely more around that I do not know about.

This last series of publications has provided a nice closure for a busy RDFa year. I guess the only thing now is to wish everyone a Merry Christmas, a peaceful and happy Hanukkah, or other festivities you honor at this time of the year. In any case, a very happy New Year!

November 7, 2011

November 2, 2011

Some notes on ISWC2011…

The 10th International Semantic Web Conference (ISWC2011) took place in Bonn last week. Others have already blogged on the conference in a more systematic way (see, for example, Juan Sequeda’s series on semanticweb.com); there is no reason to repeat that. Just a few more personal impressions, with the obvious caveat that I may have missed interesting papers or presentations, and the ones I picked here are also the result of my personal bias… So, in no particular order:

Zhishi.me is the outcome of the work of a group from the APEX lab in Shanghai and Southeast University: it is, in some ways, the Chinese DBpedia. “In some ways” because it is actually a mixture of three different Chinese, community-driven encyclopedias, namely the Chinese Wikipedia, Baidu Baike and Hudong Baike. I am not sure of the exact numbers, but the combined dataset is probably a bit bigger than DBpedia. The goal of Zhishi.me is to act as a “seed” and a hub for Chinese linked open data contributions, just like DBpedia did and does for the LOD in general.

It is great stuff indeed. I do have one concern (which, hopefully, is only a matter of presentation, i.e., may be a misunderstanding on my side). Although Zhishi.me is linked to non-Chinese datasets (DBpedia and others), the paper talks about a “Chinese Linked Open Data (COLD)”, as if this was something different, something separate. As a non-native English speaker myself I can fully appreciate the issues of language and culture differences, but I would nevertheless hate to see the Chinese community develop a parallel LOD, instead of being an integral part of the LOD as a whole. Again, I hope this is just a misunderstanding!

There were a number of ontology or RDF graph visualization presentations, for example from the University of Southampton team (“Connecting the Dots”), on the first results of an exploration done by Magnus Stuhr and his friends in Norway, called LODWheel (the latter was actually at the COLD2011 Workshop), or another one from a mixed team, led by Enrico Motta, on a visualization plugin to the NeOn toolkit called KC-Viz. I have downloaded the latter, and have played a bit with it already, but I haven’t had the time to reach a really informed conclusion on it yet. Nevertheless, KC-Viz was interesting for me for a different reason. The basic idea of the tool is to use some sort of an importance metric attached to each node in the class hierarchy and direct the visualization based on that metric. It was reminiscent of some work I did in my previous life on graph visualization; the metric was different, the graph was only a tree, the visualization approach was different, but nevertheless, there was a similar feel to it… Gosh, that was a long time ago!

The paper of John Howse et al. on visualizing ontologies was also interesting. Interesting because different: the idea is a systematic usage of Euler diagrams to visualize class hierarchies combined with some sort of a visual language for the presentation of property restrictions. In my experience property restrictions are a very difficult (maybe the most difficult?) OWL concept to understand without a logic background; any tool, visual or otherwise, that helps in teaching and explaining them can be very important. Whether John’s visual language is the one I am not sure yet, but it may well be. I will consider using it the next time I give a tutorial…

I was impressed by the paper of Gong Cheng and his friends from Nanjing, “Empirical Study of Vocabulary Relatedness…”. Analyzing the results of a search engine (in this case Falcons) to draw conclusions on the nature, the usage, the mutual relationship, etc., of vocabularies is very important indeed. We need empirical results, bound to real life usage. This is not the first work in this direction (see, for example, the work of Ghazvinian et al., from ISWC2009), but there is still much to do. Which reminds me of some much smaller scale work Giovanni, Péter and I did on determining the top vocabulary prefixes for the purpose of the RDFa 1.1 initial context (we used to call it default profile back then). I should probably try to talk to the Nanjing team to merge with their results!

I think the vision paper of Marcus Cobden and his friends (again at the COLD2011 Workshop) on a “Research Agenda for Linked Closed Data” is worth noting. Although not necessarily earthshaking, the fact that we can and we should speak about Linked Closed Data alongside Linked Open Data is important if we want the Semantic Web to be adopted and used by the enterprise world as well. One of the main issues, which is not really addressed frequently enough (although there have been some papers published here and there), is access control. Who has the right to access data? Who has the right to access a particular ontology or rule set that may lead to the deduction of new relationships? What are the licensing requirements, how do we express them? I do not think our community has a full answer to these. B.t.w., W3C organizes a Workshop concentrating on the enterprise usage of Linked Data in December…

Speaking about research agendas… I really liked Frank van Harmelen’s keynote on the second day of the conference. His approach was fresh, and the question he asked was different: essentially, after 10 or more years of research in the Semantic Web area, can we derive some “higher level” laws that describe and govern this area of research? I will not repeat all the laws that he proposed; it is better to look at his Web site with the HTML version of his slides. The ones that are worth repeating again and again are that “Factual knowledge is a graph”, “Terminological knowledge is a hierarchy”, and “Terminological knowledge is much smaller than the factual knowledge”. Why are these important? To quote from his keynote slides:

  1. traditionally, KR has focussed on small and very intricate sets of axioms: a bunch of universally quantified complex sentences
  2. but now it turns out that much of our knowledge comes in the form of very large but shallow sets of axioms.
  3. lots of the knowledge is in the ground facts, (not in the quantified formula’s)

Which is important to remember when planning future work and activities. “Reasoning”, usually, happens on a huge set of ground facts in a graph, with a shallow hierarchy of terminology…
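As a toy illustration of that last point (my own sketch, not taken from the keynote), the Python fragment below assumes the terminology is just a small rdfs:subClassOf hierarchy and the ground facts are a long list of type assertions: the hierarchy is expanded once per class, and “reasoning” is then a single pass over the facts. All names and the `ex:` identifiers are hypothetical.

```python
def ancestors(cls, subclass_of):
    """All direct and indirect superclasses of cls in a child -> parents map."""
    seen, stack = set(), [cls]
    while stack:
        for parent in subclass_of.get(stack.pop(), ()):
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def infer_types(type_facts, subclass_of):
    """Add inherited types to an iterable of (instance, class) ground facts."""
    cache = {}  # the terminology is small, so each class is expanded only once
    result = set()
    for instance, cls in type_facts:
        if cls not in cache:
            cache[cls] = ancestors(cls, subclass_of)
        result.add((instance, cls))
        for parent in cache[cls]:
            result.add((instance, parent))
    return result


if __name__ == "__main__":
    hierarchy = {"ex:Dog": {"ex:Mammal"}, "ex:Mammal": {"ex:Animal"}}
    facts = [("ex:rex", "ex:Dog"), ("ex:tom", "ex:Mammal")]
    for fact in sorted(infer_types(facts, hierarchy)):
        print(fact)
```

The cost is dominated by the single loop over the facts; the hierarchy lookup is cached, which is only reasonable because the terminology is assumed to be much smaller than the set of facts.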

I was a little bit disappointed by the Linked Science Workshop; probably because I had wrong expectations. I was expecting a workshop looking at how Linked Data in general can help in the renewal of the scientific publication process as a whole (a bit along the lines of the Force11 work on improving the future of scholarly communication). Instead, the workshop was more on how different scientific fields use linked data for their work. Somehow the event was unfocussed for me…

As in some previous years, I was again part of the jury for the Semantic Web Challenge. It was interesting how our own expectations have changed over the years. What was really a wow! a few years ago has become so natural that we are not excited any more. Which is of course a good thing, it shows that the field is maturing further, but we may need some sort of a Semantic Web Super-Challenge to be really excited again. That being said, the winners of the challenge really did impressive work, and I do not want to give the impression of being negative about them… It is just that I was missing that “Wow”.

Finally, I was at one session of the industrial track, which was a bit disappointing. If we wanted to show the research community that the Semantic Web technologies are really used by industry, then the session did not really do a good job of that. With one exception, and a huge one at that: the presentation of Yahoo! (beware, the link is to a PowerPoint slidedeck). It seems that Yahoo! is building an internal infrastructure based on what they call “Web of Objects”, by regrouping pieces of knowledge in a graph-like fashion. By using internal vocabularies (a superset of schema.org) and using the underlying graph infrastructure they aim at regrouping similar or identical knowledge pieces harvested on the Web. I am sure we will hear more about this.

Yes, it was a full week…

May 17, 2011

HTTP Protocol for RDF Stores

Filed under: Semantic Web,Work Related — Ivan Herman @ 9:43

Last week the W3C SPARQL Working Group published a number of Last Call Working Drafts for SPARQL 1.1. Much has already been said on various fora on the new features of SPARQL 1.1, like update, entailment regimes, property paths; I will not repeat that here. But I think it is worthwhile calling attention to one of the documents that may not be seen as a “core” SPARQL query language document, namely the Graph Store HTTP Protocol.

Indeed, this document stands a little bit apart. Instead of adding to the query (and now also update) language, it concentrates on how the HTTP protocol should be used in conjunction with graph stores. I.e., what is the meaning of the well known HTTP verbs like PUT, GET, POST, or DELETE for graph stores, what should be the response codes, etc. It is important to emphasize that this HTTP behaviour is not bound to SPARQL endpoints; instead, it is valid for any Web site that serves as a graph store. This could include, for example, a Web site simply storing a number of RDF graphs with minimal services to get or change the content of those. (In this respect, this document is closer to, e.g., the Atom Publishing Protocol, which includes similar features for Atom data, and which also plays an important role for technologies like, for example, OData.) Because such setups, i.e., “just” stores of RDF graphs without a SPARQL endpoint, are fairly frequent, it is important to have these HTTP details set. So… it is worth looking at this document and sending feedback to the Working Group! (Use the public-sparql-dev@w3.org mailing list for comments.)
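To give a feel for what this means in practice, here is a minimal Python sketch that only builds (and does not send) the four kinds of requests. It assumes the indirect identification of a graph through a `?graph=` query parameter, Turtle as the payload format, and a hypothetical store at `http://example.org/rdf-graph-store`; the helper name is also made up for the example, and the draft itself should be consulted for the exact details.

```python
from urllib.parse import urlencode
from urllib.request import Request

STORE = "http://example.org/rdf-graph-store"  # hypothetical graph store URL


def graph_request(method, graph_iri, turtle=None):
    """Build an HTTP request that manipulates one named graph in the store.

    GET    retrieves the graph
    PUT    replaces the graph with the payload
    POST   merges the payload into the graph
    DELETE removes the graph
    """
    url = STORE + "?" + urlencode({"graph": graph_iri})
    if turtle is not None:
        data = turtle.encode("utf-8")
        headers = {"Content-Type": "text/turtle"}
    else:
        data = None
        headers = {"Accept": "text/turtle"}
    return Request(url, data=data, headers=headers, method=method)


if __name__ == "__main__":
    req = graph_request("PUT", "http://example.org/g1",
                        "<http://example.org/a> <http://example.org/b> <http://example.org/c> .")
    print(req.get_method(), req.full_url)
```

Passing such a request to `urllib.request.urlopen` would send it; the response codes then tell the client whether, for example, a PUT created a new graph or replaced an existing one.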

April 20, 2011

RDFa 1.1 Primer (draft)

Filed under: Semantic Web,Work Related — Ivan Herman @ 10:21

I have had several posts in the past on the new features of RDFa 1.1 and where it adds functionalities to RDFa 1.0. The Working Group has just published a first draft for an RDFa 1.1 Primer, which gives an introduction to RDFa. We did have such a primer already for RDFa, but the new version has been updated in the spirit of RDFa 1.1… Check it out if you are interested in RDFa!

April 18, 2011

Open data from Fukushima

This is just an extended tweet… Masahide Kanzaki has just posted an announcement on the LOD mailing list on releasing some data he collected on the radioactivity levels at different places in Japan, enriched with metadata (e.g., geo data or time). Though the original data were in PDF, the results are integrated in RDF with a SPARQL endpoint. He also added a visualization endpoint that gives a simple visualization of the SPARQL query results:

Visualization results for radioactivity data for Tokyo and Fukushima, using integrated datasets and SPARQL query

Simple but effective, and makes the point on the usage of open data in RDF… Thanks!

April 9, 2011

Announcement on rNews

Filed under: Semantic Web,Work Related — Ivan Herman @ 6:38

A few days ago IPTC published a press release on rNews: “Standard draft for embedding metadata in online news”. This is, potentially, a huge thing for Linked Data and the Semantic Web. Without going into too many technical details (no reason to repeat what is on the IPTC pages on rNews, you can look it up there), what this means is that, potentially, all major online news services on the globe, from the Associated Press to the AFP, or from the New York Times to the Süddeutsche Zeitung, will have their news items enriched with metadata, and this metadata will be expressed in RDFa. In other words, the news items will be usable, by extracting RDF, as part of any Semantic Web application, can be mashed up with other types of data easily, etc. In short, news items will become a major part of the Semantic Web landscape, with the extra specificity of being an extremely dynamic set of data that is renewed every day. That is exciting!

Of course, it will take some time to get there, but we should realize that IPTC is the major standard setting body in the news publishing world. I.e., rNews has a major chance to be largely adopted. It is time for the Semantic Web community to pay attention…
