Ivan’s private site

May 12, 2012

The fallacy of scientific publications…

Filed under: Social aspects,Work Related — Ivan Herman @ 14:08
Image of Bólyai's original manuscript for non-Euclidean geometry

János Bólyai’s seminal work on non-Euclidean geometry was published as an “Appendix” to his father’s mathematical textbook. Would hardly be considered as an academic publication today…

Yesterday a colleague in the UK, Jeni Tennison, published a great blog post on her site. The title is probably quite opaque to the uninitiated (“Using “Punning” to Answer httpRange-14”) and the details are not relevant for now. Suffice it to say that she touches on one of the “permathread” discussions that regularly rage on the various technical mailing lists related to the Semantic Web. Jeni’s post offers a very clear explanation of the problem and proposes a way forward.

Apart from the technical content I was wondering: would that blog post ever be considered part of Jeni’s academic achievements if she were working at an academic institution? And the answer is, sadly, a clear “no”. “No”, because she “just” wrote it as a personal communication, and she did not go down the time-consuming road of “official” publication in a journal or at a conference. “No”, because she does not have formal scientific references, “just” references to mailing lists, wiki pages and the like. “No”, because the post was not officially peer reviewed; alas! the fact that she had very long discussions on some of her ideas on public mailing lists with some of the best known experts in the field does not count. “No”, in spite of the fact that, if her ideas are accepted by the community (which is, of course, in no way sure), they would influence the technical direction of the work of hundreds of people, as well as the practical deployment of systems, software, etc.; at the minimum, there will be dozens if not hundreds of reactions and references to this post in the days and weeks to come. I can easily bet that her piece will have a greater influence on the advancement of a particular area of science and technology than many of the hundreds of academically highly valued papers that are published this year.

Is this Jeni’s loss? If she is to pursue an academic career then, of course, it is. But it is a much greater loss for science, which ignores such intellectual achievements by keeping to its outdated scholarly communication rules. In fact, it shows that science may have to go back to its old traditions of communication: after all, in the good old times, many of the greatest achievements of science were first published as personal letters or journals. Something has been lost…

(If you are interested in these issues, you may consider looking at the Force11 Community’s Web site and the Force11 Manifesto… that community will, hopefully, evolve significantly in the months to come.)

April 24, 2012

Moved my RDFa/microdata python modules to github

Filed under: Code,Python,Work Related — Ivan Herman @ 12:37

In case you were using/downloading my python module for RDFa 1.1 or for Microdata: I have now moved away from the old CVS repository site to GitHub. The two modules are in the RDFLib/pyrdfa3 and RDFLib/pymicrodata repositories, respectively. Both modules are more or less final (there is still some testing happening for RDFa, but not much is left) and I would be happy if others chimed in on the future of these modules.

Although they are part of the RDFLib project on GitHub and are built on top of the core RDFLib library, the two modules are pretty much independent of it. I hope that, with the help of people who know the RDFLib internal structures better, both modules can eventually become part of the core. But this may take some time…

April 20, 2012

How does Watson work?

Filed under: Work Related — Ivan Herman @ 10:14

I was at Chris Welty’s keynote yesterday at the WWW2012 Conference. His talk was on Jeopardy/Watson and, although this is not the first time I heard/saw something on Watson, some things really became clear only at his keynote. Namely: what is really the central paradigm that made the question answering mechanism so successful in the case of Watson?

Well… query answering in Watson is not some sort of a deterministic algorithm that turns a natural language question into a query against a huge set of data. This approach does not work. Instead, a question is analyzed and, based on searches in various sets of data, a large set of possible answers is extracted. These “candidate” answers are analyzed separately along a whole series of different dimensions (geographical or temporal dimensions, or, which I found the most interesting, putting candidate answers back into the original question and searching that again against various sources of information to rank them again). The result is a vector of numerical values representing the results of the analysis along those different dimensions. That “vector” is summed up into one final value using a weight for each dimension. The weights themselves are obtained through a prior training process (in this case using a number of stored Jeopardy question/answer pairs). Finally, the answer with the highest value (I presume over a certain threshold value) is returned.
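To make the last steps a bit more concrete, here is a minimal sketch of the "weighted sum plus threshold" idea as I understood it. Everything in it is an assumption for illustration: the dimension names, the weights (which, in reality, would come out of the training process), the candidate scores, and the function names. It is certainly not how Watson is actually implemented.

```python
from typing import Dict, Optional

# Hypothetical weights, one per analysis dimension; in the real system
# these would be learned from stored question/answer pairs.
WEIGHTS: Dict[str, float] = {"geo": 0.2, "temporal": 0.3, "requery": 0.5}


def score(features: Dict[str, float], weights: Dict[str, float]) -> float:
    """Collapse the per-dimension scores of one candidate into a single value."""
    return sum(w * features.get(dim, 0.0) for dim, w in weights.items())


def best_answer(
    candidates: Dict[str, Dict[str, float]],
    weights: Dict[str, float],
    threshold: float,
) -> Optional[str]:
    """Return the top-scoring candidate, or None if it stays below the threshold."""
    if not candidates:
        return None
    answer = max(candidates, key=lambda name: score(candidates[name], weights))
    return answer if score(candidates[answer], weights) >= threshold else None


# Made-up candidate answers with made-up per-dimension scores.
CANDIDATES = {
    "Budapest": {"geo": 0.9, "temporal": 0.5, "requery": 0.8},
    "Vienna": {"geo": 0.4, "temporal": 0.5, "requery": 0.3},
}

print(best_answer(CANDIDATES, WEIGHTS, threshold=0.5))
```

The threshold is what allows the system to stay silent: if even the best candidate scores too low, no answer is returned.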

I hope I got it right:-). But the mechanism is certainly something like that. And it is interesting: it is different from the traditional question/answer approaches which are, usually, much more “deterministic”. This is some sort of a new computing paradigm (not necessarily invented by the Watson team, but used by them). Is it a really important new paradigm? Well… to quote Chris: “We won!”.

April 18, 2012

Structured Data in HTML in the mainstream

Filed under: Semantic Web,Work Related — Ivan Herman @ 8:31

As referred to in my previous blog on LDOW2012, Hannes Mühleisen and Chris Bizer, but also Peter Mika and Tim Potter, published some findings on structured data in HTML based on Web Crawl results and analysis. Both Hannes’ and Peter’s papers are now online. Hannes and Chris based their results on CommonCrawl, whereas Peter and Tim rely on Bing.

Although there are some controversies about the usability of these crawls as well as the interpretation of their results (see Martin Hepp’s mail, and the answer by Peter Mika as well as the resulting thread on the mailing list) I think what is really important is the big picture which emerges from both sets of results: no one can reasonably dispute the importance of structured data in HTML any more. Although I vividly remember a time when this was a matter of bitter discussions, I think we can put this issue behind us now. I do not think I can summarize it better than Peter did in another of his emails:

…both studies confirm that the Semantic Web, and in particular metadata in HTML, is taking on in major ways thanks to the efforts of Facebook, the sponsors of schema.org and many other individuals and organizations. Comparing to our previous numbers, for example we see a five-fold increase in RDFa usage with 25% of webpages containing RDFa data (including OGP), and over 7% of web pages containing microdata. These are incredibly impressive numbers, which illustrate that this part of the Semantic Web has gone mainstream.

April 17, 2012

Linked Data on the Web Workshop, Lyon

(See the Workshop’s home page for details.)

The LDOW20** series has become more than a series of workshops; these are really small conferences. I did not count the number of participants (the meeting room had a fairly odd shape which made it a bit difficult) but I think it was well over a hundred. Nice to see…

The usual caveat applies for my notes below: I am selective here with some papers which is no judgement on any other paper at the workshop. These are just some of my thoughts jotted down…

Giuseppe Rizzo made a presentation related to all the tools we now have to tag texts and thereby be able to use these resources in linked data (“NERD meets NIF: Lifting NLP Extraction Results to the Linked Data Cloud”), i.e., the Zemanta or Open Calais services of this world. As these services become more and more important, having a clear view of what they can do, how one can use them individually or together, etc., is essential. Their project, called NERD, will become an important source for this community; bookmark that page:-)

Jun Zhao made a presentation (“Towards Interoperable Provenance Publication on the Linked Data Web”) essentially on the work of the W3C Provenance Working Group. I was pleased to see and listen to this presentation: I believe the outcome of that group is very important for this community and, having played a role in the creation of that group, I am anxious to see it succeed. B.t.w., a new round of publications coming from that group should happen very soon, watch the news…

Another presentation, namely Arnaud Le Hors’ on “Using read/write Linked Data for Application Integration — Towards a Linked Data Basic Profile” was also closely related to W3C work. Arnaud and his colleagues (at IBM) came to this community after a long journey working on application integration; think, e.g., of systems managing software updates and error management. These systems are fundamentally data oriented and IBM has embarked on a Linked Data based approach (after having tried others). The particularity of this approach is that it stays very “low” level, insofar as they use only the basic HTTP protocol for reading and writing RDF data. This approach seems to strike a chord with a number of other companies (Elsevier, EMC, Oracle, Nokia) and their work forms the basis of a new W3C Working Group that should be started this coming summer. This work may become a significant element of the palette of technologies around Linked Data.
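Just to illustrate what “only basic HTTP, reading and writing RDF” may look like in practice, here is a small sketch using nothing but the Python standard library. The resource URL, the Turtle content, and the function names are all hypothetical, and this is only my reading of the general idea, not something taken from the Basic Profile itself. The requests are only built, not sent.

```python
from urllib.request import Request

# Hypothetical resource, e.g., a bug report managed by some integration system.
RESOURCE_URL = "http://example.org/bugs/42"


def read_request(url: str) -> Request:
    """Build a GET request asking for the Turtle representation of a resource."""
    return Request(url, headers={"Accept": "text/turtle"}, method="GET")


def write_request(url: str, turtle: str) -> Request:
    """Build a PUT request that replaces the resource with new Turtle content."""
    return Request(
        url,
        data=turtle.encode("utf-8"),
        headers={"Content-Type": "text/turtle"},
        method="PUT",
    )


# Made-up description of the resource, in Turtle.
NEW_STATE = '<http://example.org/bugs/42> <http://purl.org/dc/terms/title> "Crash on update" .'

get_req = read_request(RESOURCE_URL)
put_req = write_request(RESOURCE_URL, NEW_STATE)
print(get_req.get_method(), put_req.get_method())
```

Sending either request would be a matter of passing it to `urllib.request.urlopen`; no SPARQL endpoint or special API is involved, which is the whole point of staying at this “low” level.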

Luca Costabello talked about Access Control, Linked Data, and Mobile (“Linked Data Access Goes Mobile: Context-Aware Authorization for Graph Stores”). Although Luca emphasized that their solution is not a complete solution for Linked Data access control issues in general, it may become an important contribution in that area nevertheless. Their approach is to modify SPARQL queries “on-the-fly” by including access control clauses; for that purpose, an access control ontology (S4AC) has been developed and used. One issue is: how would that work with a purely HTTP level read/write Linked Data Web, like the one Arnaud is talking about? Answer: we do not know yet:-)
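To give an idea of what modifying a query “on-the-fly” may mean, here is a deliberately naive sketch: the user’s graph pattern is wrapped so that it can only match inside named graphs the user is allowed to read. The function name, the graph URIs, and the rewriting strategy are my own assumptions for illustration; this is not the S4AC mechanism as presented, and a real implementation would work on a parsed query rather than on strings.

```python
from typing import List


def restrict_to_graphs(where_pattern: str, allowed_graphs: List[str]) -> str:
    """Wrap a SPARQL graph pattern so it only matches in the allowed named graphs."""
    if not allowed_graphs:
        # No readable graph at all: refuse instead of building a useless query.
        raise PermissionError("no named graph is accessible in this context")
    values = " ".join(f"<{g}>" for g in allowed_graphs)
    return (
        "SELECT * WHERE {\n"
        f"  VALUES ?g {{ {values} }}\n"
        f"  GRAPH ?g {{ {where_pattern} }}\n"
        "}"
    )


# Hypothetical graphs readable in the current (e.g., mobile) context.
ALLOWED = ["http://example.org/graphs/public", "http://example.org/graphs/friends"]

print(restrict_to_graphs("?s ?p ?o .", ALLOWED))
```

The access decision itself (which graphs end up in the list for a given user and context) is exactly the part that an access control ontology would have to drive.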

Igor Popov concentrated on user interface issues (“Interacting with the Web of Data through a Web of Inter-connected Lenses”): how to develop a framework whereby data-oriented applications can cooperate quickly, so that average users could explore data, switching easily to applications that are well adapted to a particular dataset, without being forced to do complicated programming or use overly “geeky” tools. This is still alpha-level work, but their site-in-development, called Mashpoint, is a place to watch. There is (still) not enough work on user-facing data exploration tools, so I was pleased to see this one…

What are the dynamics of Linked Data? How does it change? This is the question Tobias Käfer and his friends will try to answer in the future (“Towards a Dynamic Linked Data Observatory”). For that, data is necessary, and Tobias’ presentation was on how to determine what collection of resources to regularly watch and measure. The plan is to produce a snapshot of the data once a week for a year; the hope is that based on this collected data we will learn more about the overall evolution of linked data. I am really curious to see the results of that. One more reason to be at LDOW2013:-)

Tobias’ presentation has an important connection to the last presentation of the day, made by Axel Polleres (“OWL: Yet to arrive on the Web of Data?”) insofar as what he presented was based on the analysis of the Linked Data out there. The issue has been around, with lots of controversy, for a while: what level of OWL should/could be used for Linked Data? OWL 2 as a whole seems to be too complex for the amount of data we are talking about, both in terms of program efficiency and in terms of conceptual complexity for end users. OWL 2 has defined a much simpler profile, called OWL 2 RL, which does have some traction but may still be too complex, e.g., for implementations. Axel and his friends analyzed the usage of OWL statements out there, and also established some criteria on what type of rules should be used to make OWL processing really efficient; their result is another profile called OWL LD. It is largely a subset of OWL 2 RL, though it does adopt some datatypes that OWL 2 RL does not have.

Some OWL 2 RL features are left out of this new profile, a choice I am not fully convinced of; after all, their measurement was based on data from 2011, and it is difficult to say how much time it takes for new OWL 2 features to really catch on. I think that keys and property chains should/could be really useful on Linked Data, and can be managed by rule engines, too. So the jury is still out on this, but it would be good to find a way to stabilize this at some point and see the LD crowd look at OWL (i.e., the subset of OWL) more positively. Of course, another approach would be to concentrate on an easy way to encode Rules into RDF, which might make this discussion moot in a certain sense; one of the things we have not succeeded in doing yet:-(

The day ended with a panel, in which I also participated; I will let others judge whether the panel was good or not. However, the panel was preceded by a presentation by Chris on the current deployment of RDFa and microdata which was really interesting. (His slides will be on the workshop’s page soon.) The deployment of RDFa, microdata, and microformats has become really strong now; structured data in HTML is a well established approach out there. RDFa and microdata now cover half of the cases, the other half being microformats, which seems to indicate a clear shift towards RDFa/microdata, i.e., a more syntax oriented approach (with a clear mapping to RDF). Microdata is used almost exclusively with schema.org vocabularies (which is to be expected) whereas RDFa makes use of a larger palette of various other vocabularies. All this was to be expected, but it is nice to see it reflected in collected data.

It was a great event. Chris, Tim, and Tom: thanks!

March 31, 2012

Political decency (or the lack of it)

Filed under: Hungary,Private — Ivan Herman @ 12:05

Here is the story. A high profile politician in a democratic country has a PhD. This also means that he proudly displays the “Dr” as part of his official name; indeed, we are talking about a country where it is the tradition to use that title, and this is usually highly respected by society at large.

However, a problem occurs. The rumor is that the PhD has been tainted by plagiarism, i.e., that a substantial part of the PhD thesis is not original work, but had been copied verbatim (though possibly translated if the original was in another language) from other scholarly works. In academic circles this is not considered acceptable; the high standards of academic publishing, be it a thesis or an average publication, require the published work to be original. To be blunt: the politician in question is accused of having cheated by transgressing those standards.

Because this is a high profile person, this issue is taken seriously, further investigation follows and it turns out that the rumors are indeed well founded. As a result, the University that originally issued the PhD strips our politician of his title.

How does that affect our politician? Well, if you follow the news, this story may sound familiar: indeed, you may think of the (former) defense minister of Germany, Karl-Theodor zu Guttenberg, whose PhD title was annulled by the University of Bayreuth. Mr. zu Guttenberg (and not Dr. zu Guttenberg any more) has done the only thing a politician of his stature should do: he resigned. A decent choice in a decent, democratic country.

However… not all politicians are equally decent. The very same story happened with the current president, no less, of Hungary, Mr. (and not Dr!) Pál Schmitt. Rumors of plagiarism, public inquiry… and the Semmelweis University of Budapest annulled his PhD because the rumors were indeed well founded. Does he resign? No. He sees no reason to quit. An indecent choice in a, hopefully, still decent and democratic country, but with an increasingly indecent political leadership.

A shame.

March 16, 2012

Old quote on the UN, also valid for the EU…

Filed under: Private — Ivan Herman @ 21:01

I found a nice quote in a book on the European Union written by a Dutch writer called Geert Mak (“De hond van Tišma”). The quote is attributed to Dag Hammarskjöld, the second secretary general of the UN. The quote is as follows:

“It was not created to bring us to heaven, but to save us from hell.”

This quote is highly relevant for the European Union, too. Unfortunately, it seems that many of the politicians all around Europe, from Hungary to France and from the Netherlands to Greece, forget that, turning the EU into the scapegoat for all our problems. There are of course many issues with the EU that should be solved, but it would be wise to remember the reasons for its creation, and not to forget the history of the past 50 years and beyond…

January 24, 2012

Nice reading on Semantic Search

I had a great time reading a paper on Semantic Search[1]. Although the paper is on the details of a specific Semantic Web search engine (DERI’s SWSE), I was reading it as somebody not really familiar with all the intricate details of such a search engine’s setup and operation (i.e., I would not dare to give an opinion on whether the choices taken by this group are better or worse than the ones taken by the developers of other engines) and wanting to gain a good picture of what is happening in general. And, for that purpose, this paper was really interesting and instructive. It is long (ca. 50 pages), i.e., I did not even try to understand everything at my first reading, but it did give a great overall impression of what is going on.

One of the “associations” I had, maybe somewhat surprisingly, is with another paper I read lately, namely a report on basic profiles for Linked Data[2]. In that paper Nally et al. look at what “subsets” of current Semantic Web specifications could be defined, as “profiles”, for the purpose of publishing and using Linked Data. This was also a general topic at a W3C Workshop on Linked Data Patterns at the end of last year (see also the final report of the event) and it is not a secret that W3C is considering setting up a relevant Working Group in the near future. Well, the experiences of an engine like SWSE might come in very handy here. For example, SWSE uses a subset of the OWL 2 RL Profile for inferencing; that may be a good input for a possible Linked Data profile (although the differences are really minor, if one looks at the appendix of the paper that lists the rule sets the engine uses). The idea of “Authoritative Reasoning” is also interesting and possibly relevant; that approach makes a lot of pragmatic sense, and I wonder whether this is not something that should be, somehow, documented for general use. And I am sure there are more: in general, analyzing the experiences of major Semantic Web search engines in handling Linked Data might provide a great set of inputs for such pragmatic work.

I was also wondering about a very different issue. A great deal of work had to be done in SWSE on the proper handling of owl:sameAs. On the other hand, one of the recurring discussions on various mailing lists and elsewhere is on whether the usage of this property is semantically o.k. or not (see, e.g., [3]). A possible alternative would be to define (beyond owl:sameAs) a set of properties borrowed from the SKOS Recommendation, like closeMatch, exactMatch, broadMatch, etc. It is almost trivial to generalize these SKOS properties for the general case but, reading this paper, I was wondering: what effect would such predicates have on search? Would they make it more complicated or, in fact, would such predicates make the life of search engines easier by providing “hints” that could be used for the user interface? Or both? Or is it already too late, because the usage of owl:sameAs is already so prevalent that it is not worth touching that stuff? I do not have a clear answer at this moment…

Thanks to the authors!

  1. A. Hogan, et al., “Searching and Browsing Linked Data with SWSE: the Semantic Web Search Engine”, Journal of Web Semantics, vol. 4, no. December, pp. 365-401, 2011.
  2. M. Nally and S. Speicher, “Toward a Basic Profile for Linked Data”, IBM developerWorks, 2011.
  3. H. Halpin, et al. “When owl:sameAs Isn’t the Same: An Analysis of Identity in Linked Data”, Proceedings of the International Semantic Web Conference, pp. 305-320, 2010

December 30, 2011

Mac OS Lion: The Good, the Bad and the Ugly

Filed under: Mac,Private — Ivan Herman @ 12:33

I have made use of the winter recess to install Mac’s Lion on my powerbook. I must admit I hesitated for a while (I was not sure that it was worth the trouble) but then, partially driven by sheer curiosity, I did it. And, as usual, there are pros and cons… Maybe others will find my experiences useful.

1. The Good

My tactic of waiting, i.e., not installing Lion when it was still a cub, paid off. I have seen many stories on the Web, mostly dating back to July, about installation difficulties (e.g., issues with the installation of Xcode). Well, none of these for me. It installed easily and relatively quickly (after download, the installation process took about an hour, with an additional round for the installation of Xcode). Most things worked without further ado, although I did have to update some programs (e.g., iTunes, Safari, mercurial, some additional tools for Mail like GPG or Mail Act-On). But these were to be expected and otherwise the system worked smoothly. For example, my local apache server started and worked as before, in contrast to the stories I saw on the Web. There were also some user interface adjustments I had to make (sorry Apple, I do not like the “natural” scrolling, and I also like to have the scrollbar always on), but the web is full of references to the necessary tricks to do these.

The system is faster. Not hugely, but faster in booting, in logging in, and also some applications, like Safari, got some speed improvements. That is always a welcome feature!

I quite like Mission Control. I used “Spaces” on Snow Leopard, but Mission Control is nicer, and works well with the full-screen feature. B.t.w., the full-screen feature is also great.

I use Mail App as my primary mailer and there are (as far as I am concerned) two major improvements. On the one hand, it has a nice “conversation” feature; the particular aspect I like is that it manages conversations and “related” mails across mail folders (and I have loads of them) regardless of the fact that I use IMAP. This is great. The other nice feature is the improved search, both in speed and in the various options it gives you. Mail is my everyday workhorse, so such improvements made the upgrade to Lion already worthwhile.

I love the fact that, at last, I can resize my windows easily. I change screens often (I have an external screen at home, another one at my institute, and they are different in size…) and the fact that, on Snow Leopard, I had to grab the lower right hand corner of a window to resize it was really a drag.

At this moment I am not at my usual place, meaning I am without an external screen; I can just refer to what I read, namely that handling external screens became smoother in Lion, too. I hope that is true; the old way of closing, restarting, whatnot, was also a pain.

There are a number of additional small improvements (e.g., better spellcheck in Safari; really helpful as I write these lines:-). I am sure I will find out more as it goes.

2. The Bad

Of course, not everything is nice and rosy:-(

I miserably failed with iCloud. I tried to use it to synchronize my iPhone and iPad easily with my Mac. It simply did not work reliably as far as the calendar was concerned. I regularly ran into the problem of adding an event to my calendar on, say, my iPhone, and the result was not visible anywhere else (I tried explicit synchronization when it was clear how to do it, waiting for half an hour, etc.; no success). I tried it through the built-in calendar application on the iPhone (which I do not particularly like, b.t.w.) as well as some other calendar apps, to no avail. After a while I just gave up, and reverted to my previous setup, i.e., using iTunes’ synchronization. Taking into account that, with iOS 5, one can also sync from iTunes over Wi-Fi, it is so easy to synchronize that it does not really bother me. It is, nevertheless, surprising that Apple comes out with such a much heralded feature that simply does not work properly.

I did run into some awkwardness in the user interface of the Mail App, too. For example, one would think that this application is a prime candidate to be used full screen. However, beware: if you reply to a mail in full screen mode, you cannot switch windows (e.g., you cannot reply to two mails in parallel, stuff like that) which might make it awkward. In a sense it is understandable, but it was a surprise nevertheless. Another issue is with the conversation feature: I display my mails with increasing date order but, within a conversation, Mail keeps on using decreasing dates; I have not found a way to change that…

And then there is Launchpad. Having it is a great idea, in fact. If set up properly, it gives you an easy way to get to applications, it reduces the size of the Dock (which can be an issue on a small screen), etc. If set up properly, that is. But… I did run into several issues. Some examples:

  • At the start I saw loads of duplicate entries. This is because I had organized my Application collection to my own taste before, with subdirectories, aliases, etc.; I have too many applications to leave them as a flat list. This led to a bunch of duplicates. Which is understandable, but it is fairly difficult to remove an application from Launchpad: although the “official” version is that one can do the same as on an iPhone (pressing an icon, and using a big X on it), this method did not work for most of the applications. (No idea why.) Fortunately, I have found a program called Launchpad Control, which can do that for you (thank you, Andreas Ganske!)
  • There are missing entries. Hence the big question: how does one add an application to Launchpad? Answer: no idea. I have seen proposals on the Web (e.g., move the application’s icon on top of the Launchpad icon on the Dock or create an alias and put it in ~/Applications): none worked for me. (Maybe if I restart? I did log out and in again, that did not change anything, and I did not want to restart the computer only for this.) For the time being, I gave up on that.
  • Launchpad is the typical case of an application that asks for a keyboard shortcut to start. I have found, after all, a way to do it; but does it have to be that complicated? (Actually, I saw some notes on the Web that the keyboard shortcut will disappear after reboot. I hope that will not be the case…)

Bottom line: although I will probably use Launchpad, it is not what it should be. Hopefully later releases will improve this.

3. The Ugly

No new item here, just a remark: it is really surprising to me that Apple would come out with such unfinished products as iCloud or Launchpad. It would be perfectly o.k. to come out with Lion, add these programs in the state they are in, and make it clear to people that this is work in progress. Everybody would understand that. But doing it this way simply reduces the credibility of Apple… Pity.

December 20, 2011

“Hungary’s Constitutional Revolution”–a sad example

Filed under: Hungary,Private — Ivan Herman @ 12:32

Kim Lane Scheppele published an analysis in the New York Times on “Hungary’s Constitutional Revolution”. It is, in my view, a very good and fairly depressing analysis of the current situation in Hungary. How can a country possibly slide into some sort of authoritarianism dominated by one single ideological view, following a path that is perfectly “legal” (though morally objectionable) at every step of the way? A sad example:-(
