Ivan’s private site

December 29, 2009

Stories of a move (from WindowsXP to Mac) Part II

Filed under: Private, Work Related — Ivan Herman @ 13:08

A few days ago I wrote a note on my move from Windows XP to Mac. I received some comments since, and have also discovered some additional tricks; maybe it is helpful if I write a follow-up… It serves as a set of notes for myself but, if it is useful for anybody else, all the better!

First of all, as one of the comments by Daniel Grace to my previous note made me understand, I could have used the installation DVD I got with my machine to install Xcode, instead of getting on the Web for that. My impression was that this DVD is there only when one has to re-install Snow Leopard, hence I did not really consider this. My bad, I could have saved some transfer time…

So here are some other smallish tricks and discoveries:

  • You can recycle a bunch of hardware goodies that you might have had for your PC:
    • My mouse works out of the box and, contrary to popular belief, the right click also works automatically. Nothing to do…
    • I also have a cheap Logitech USB keyboard: just plugged it in and it worked. The ‘Windows’ key (the one with the Windows Logo, ironically) maps to the most typical Mac key, the one with this curved symbol and usually referred to as ‘Cmd’; the ‘Alt’ and the ‘Ctrl’ are simply reused.
    • Actually… the default keyboard setup, though it works, is not ideal. There are indeed some unfortunate small differences in the physical layouts (I use a US keyboard): the horizontal order of the ‘Cmd’ and ‘Alt’ keys is reversed compared to the laptop’s own keyboard, and the ‘~’ and ‘±’ characters are also switched. Disagreeable, because one’s fingers get messed up. But the flexibility of the Mac comes to your rescue for the command keys. Indeed: go to the Keyboard setup in the system preferences, click on the “modifier keys”, be careful to choose the right keyboard in the top menu, and change the setting. I switched the command and option keys and, voilà! it is exactly like on the laptop.
    • I also switched the default setup for the laptop’s keyboard so that the function keys would behave, by default, like the external keyboard’s function keys (instead of the built-in facilities like dimming the screen). This helps my fingers remember the right usage… If you begin to use things like Exposé (most of us have already seen Mac users displaying a small version of all windows on the screen to switch quickly among them, that is the one!) it is good to have the same keyboard setup as on your external keyboard. You can make that change in the ‘Keyboard’ setup panel, too.
    • I also have a small Polycom® Communicator C100 that I use for Skype: although the Polycom site claims that it is usable with Windows only, that is actually not true. I plugged it in and it works. The only thing you cannot do is to start up Skype using the button on the device. Big deal.
    • One difference, though, that cannot really be handled: PCs usually have two sockets for headphones, ie, one for listening and one for the microphone. If you want to use a headphone on the Mac for Skype, for example, you will have to invest in a separate headphone with USB. Which is unfortunate because the Mac laptop has only two USB slots, which is not much these days. That being said, the microphone of the laptop itself may be good enough, in which case any headphone will do for listening.
  • I need accented characters, plus some other special characters like quote marks or ellipses. Most West-European characters (e.g., for French or German) are available using a two-key solution. For example, to type the character ‘ü’, you have to type ‘Alt-u’ then ‘u’. You also have a help tool: go to the Language & Text setting and choose the “Keyboard & Character Viewer”. You should also click the “Show input menu in menu bar”. You will get a symbol in the upper right hand corner of the screen and you can then get a virtual keyboard on your screen which shows what you have to type. The rest is just trying and getting used to it. B.t.w., you can also add other keyboard types; e.g., I checked the Hungarian keyboard, too. What happens is that using the same menu item you can change the keyboard to be Hungarian. Although the physical keyboard remains the same, using the virtual keyboard you can get characters like ‘ű’ or ‘ő’. A bit convoluted (better use a real Hungarian keyboard for this case) but can be helpful in some cases.
  • The Mac user interface, e.g., the Finder, is the land of keyboard shortcuts. It will take many weeks before I get used to all of them. If you do have David Pogue’s book, keep the relevant appendix under your pillow. The possibilities in getting around in the Finder are rich and well worth getting used to.
  • One of the tiny goodies: if you want your shell windows’ title to show the directory you are in, add this to your profile (I use bash, so it is in ‘.bash_profile’):
    PROMPT_COMMAND='echo -ne "\033]0;${PWD/#$HOME/~}\007"'

    A bit cryptic, but it works… (thanks to Carine and Coralie).

  • As Karl said in his comment, some GNU software that is usually installed on a Linux box or with cygwin (e.g., wget) does not come installed. But downloading the source code from the GNU site and going through the configure+make dance seems to work. I tried it with wget, although I had to run the configure script with --prefix=$HOME (ie, install the program in my home directory, not into /usr/local). I presume that this is related to the super user vs. administrator account issue that I noted in my earlier blog.
  • Of course, there are programs that crash or otherwise behave strangely although, truth be told, until now I have only had problems with Thunderbird (in combination with some extensions) and with Komodo Editor, i.e., not with Mac software (I know, this will come:-). It is therefore good to know about the “Force Quit” menu entry under the Apple menu (upper left hand corner). It gives you an easy way to shoot a program.
  • OpenOffice (or its Apple equivalent, NeoOffice) is fairly easy to find and well documented. I had more difficulties finding LaTeX, but in the end I found MacTeX. It is a fairly standard (though large) Mac distribution and it seems to work (add /usr/texbin to your PATH variable if you want to use it from bash).
  • Bluetooth is always black magic. I paired the Mac with my Nokia and (with the extra driver I had to install, see my previous blog) it synchronizes and I can also browse the content of the phone (submenu for the small bluetooth sign in the upper right hand corner) and send files to and from it. Great. But I also paired it with my EEE PC (running Linux); I can send files to the EEE PC but not from it. No idea why (this worked without problems on Windows).
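For those who find the window title one-liner above too cryptic, here is a more verbose sketch of the same idea. The helper names short_pwd and set_title are made up for this note, and, unlike the one-liner, this version only abbreviates the home directory when it is a complete leading path component. Treat it as an illustration, not as the canonical way to do it.

```shell
# Shorten a directory for display: replace a leading home directory with "~".
# short_pwd is a hypothetical helper name, not something bash provides.
short_pwd() {
  local dir=$1 home=$2
  case $dir in
    "$home")   printf '%s' '~' ;;                 # exactly the home directory
    "$home"/*) printf '~%s' "${dir#"$home"}" ;;   # somewhere below it
    *)         printf '%s' "$dir" ;;              # anywhere else: leave as-is
  esac
}

# Wrap a string in the xterm-style "set window title" escape sequence:
# ESC ] 0 ; <title> BEL   (\033 is ESC, \007 is BEL)
set_title() {
  printf '\033]0;%s\007' "$1"
}

# Ask bash to refresh the title before it prints each prompt.
PROMPT_COMMAND='set_title "$(short_pwd "$PWD" "$HOME")"'
```

The two control characters are the whole trick: the terminal swallows everything between them and uses it as the window title instead of displaying it.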

December 22, 2009

Stories of a move (from WindowsXP to Mac)

Filed under: Private, Work Related — Ivan Herman @ 20:46

A few days ago I changed laptops. After about 13-14 years of Windows usage I decided to take a deep breath and change to a Mac running Snow Leopard. I was never a pure Windows user in the sense that the first thing I always did was to install cygwin to give me a pseudo-Unix environment on Windows (I had used various Unix look-alikes for about 15 years before and I still use various Linux boxes on and off). Also: I stopped at Windows XP, never used Vista or Windows 7 (I have heard that some of the features I found on the Mac are now around on those, too). Finally, I am a computer person, working on and with computers, so I do need some features that the average user does not. I thought writing down my journey may be useful for others.

A dear friend and colleague of mine used to say “I know the jungle, and therefore I am afraid of the jungle”; ie, with all the praise you hear about OS X, I was still a bit wary and expected hiccups. And there were of course small issues, essentially finding the right information; some of my colleagues both at W3C and at CWI were of great help. And, of course, when you do not find something, there is also a google search, which often yields the answer. And the bottom line of my 3 days’ experience: this jungle is friendly:-)

First of all, the book of David Pogue, “Switching to the Mac: The Missing Manual, Snow Leopard Edition”, was of great help (as an aside, kudos to O’Reilly that all their books are available as electronic only, too…). That book, plus some chats with my colleague Jack Jansen at CWI, gave me some information in advance that may not be absolutely obvious at first. As an example, and in contrast to Windows or Linux, the “thing” you click on when starting a program is not an executable, but a special folder that carries everything the program needs. This is why installing a program, moving it around, etc, becomes so much easier than on Windows; no trace of that damn registry that makes re-installation and uninstallation so complex there.

So here are some of the issues I did hit, however (I do not want to spend time on installing, say, Thunderbird. That just goes smoothly and is well documented…)

  • Snow Leopard does not come with CVS installed. Bugger. However, after poking around on the Web, I found out that you have to install the Xcode tools from the Apple developers’ site. You have to register as a Mac developer (it is free), installation is simple, and it does install CVS. To be honest, I am not sure what else is installed…
  • There is an installed Apache server on the machine (to be precise, Apache 2), but it is fairly well hidden. I expected to find it as a program to be started from the command line (that is the only way I could get it reliably working on Windows for various reasons) but that is not the case. Apple->System Preferences->Sharing gives a bunch of preferences, and you have to check the “Web Sharing” box to start the server (not really obvious, I must say). Then it almost works, except for PHP: luckily, I found a blog item from Kev Chapman that gives details on how this should be done. Essentially, the httpd.conf file should be changed (good I found it because I had my own extra settings to add).
  • Although the machine is mine and I am an administrator, I am not a super user automatically. One has to use, say, the sudo command in some rare cases. I have seen that in Linux, but it is unknown to an average Windows XP user… something to get used to.
  • Coming from cygwin I was used to being able to start up an editor for a specific file from bash (it was not always easy to set that up in Windows, but that is another matter). After my queries, a bunch of colleagues ran to my rescue (thanks to Coralie, Bert, Yves, Carine, Thomas from W3C and Jack from CWI) telling me that the open command can be used to open a file with its default “handler”; even better, the handler can be overridden. Eg, to open a file with the Komodo Editor, one can say open -a /Applications/Komodo\ Edit.app fname and off you go.
  • I was of course a bit wary of the old files moving over from the old environment. No real problem. The only slight issue I had was with iTunes: I expected to simply move my sound files and set iTunes to take that as its library. Nope. You have to import the sound files to the local iTunes set up. No big deal, it just takes a bit of time with the 40+GB of music I have on my disc. All other moves from my external PC hard discs were just a piece of cake.
  • At first my Nokia E90 did not synchronize with iCal and Address Book out of the box. Thanks to Thomas I found out that one has to install an extra driver from the Nokia site and then it works.
  • The only failure: my old HP printer+scanner does not work as a scanner (although it works without problems as a printer). Unfortunately, Snow Leopard has scrapped this 10-year-old model from its list. Nothing I can do about it. A little investment to come…
  • It took me a while to find out how to use the Mac with an external display and only the external display (eg, with the lid of the Mac closed). After a while (and poking around the Web) I found out: you set up the external display with mirror (that is relatively straightforward), then you close the lid (ie, the Mac goes to sleep) then you, say, hit a key on the external keyboard, put something into the USB slot, or something similar. Ie, you wake the system up with the lid closed; it will use the external screen. I found that a bit convoluted (maybe there is a better way), this is usually a matter of a function key on Windows…
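To get back the cygwin habit of starting an editor from bash, the open -a command mentioned above can be wrapped in a small shell function. This is only a sketch: the function name edit and the variables EDITOR_APP and OPEN_CMD are my own inventions, and the application path is the one from the example above, so adjust it to wherever your editor lives.

```shell
# Hypothetical wrapper around the macOS `open -a` command.
# EDITOR_APP and OPEN_CMD are names invented for this sketch.
EDITOR_APP="/Applications/Komodo Edit.app"

edit() {
  if [ "$#" -eq 0 ]; then
    echo "usage: edit FILE..." >&2
    return 2
  fi
  # OPEN_CMD can be overridden for a dry run; it defaults to the real `open`.
  # Quoting "$EDITOR_APP" avoids having to escape the space in the path.
  "${OPEN_CMD:-open}" -a "$EDITOR_APP" "$@"
}
```

With this in ‘.bash_profile’, edit notes.txt opens the file in the editor, and OPEN_CMD=echo edit notes.txt just prints the arguments that would have been passed to open.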

Of course, I had to install a bunch of extra software. This is largely a matter of taste, though, not really of major interest here. Many programs (Komodo Editor, Mendeley desktop, Skype, browsers like Opera or Firefox, mailers like Thunderbird) have a version for both Windows and Mac, so that was an easy choice for now. I found Colloquy as an IRC client; it seems to work well. I found the ease of the backup system (TimeMachine) remarkable; backup has always been such a complicated issue on Windows…

Many people told me that once you have transferred to a Mac, you do not look back. I cannot say that yet, of course, but the transfer has been remarkably smooth. Maybe these notes will make it even easier for others…

(I have received some useful comments since the original version of this blog. In case you face the same transition problem as I did and you read this blog, make sure to read the comments! See also a continuation blog…)

December 12, 2009

RDFa usage spreading…

Filed under: Semantic Web, Work Related — Ivan Herman @ 14:53

It may be that I was not attentive enough, ie, some of these may be old(er) news. But I did hit two interesting pieces of RDFa-related news yesterday and today (both via twitter, b.t.w.) that I think are really noteworthy.

1. A blog from Priyank Mohan “Online retail : How is using Semantic Technology to define a new trend” reported on a talk given by Jay Myers from Best Buy. Best Buy started using RDFa a while ago already, but they recently added statements using the GoodRelations Ontology that Martin Hepp published. What Jay said (quoting from Priyank’s blog here):

  1. GoodRelations + RDFa improved the rank of the respective pages in Google tremendously… In fact, if you try the query “BestBuy Ferris Bueller” on Google, then the page comes on rank # 1 ahead of the much more established page . This indicates a strong effect of GoodRelations + RDFa on Google’s appreciation of a page.…
  2. Jay also reported a 30 % percent (!) increase in traffic on the BestBuy stores pages
  3. Yahoo observes a 15% increase in the Click-through-Rate (CTR). Nick Cox from Yahoo also recently reported that augmented search results, e.g. those with GoodRelations / RDFa in Yahoo get a 15 % higher Click-through-Rate (CTR).

There have been some discussions on twitter about whether those numbers (eg, 30%) are really reliable, and maybe these statements are indeed too good to be fully true. But even if the 30% is only 15%, it is still quite an achievement!

2. This morning I found out that O’Reilly has begun to systematically add RDFa to their catalog pages. Eg, the page on the “Switching to the Mac” book can produce the RDF information using the RDFa distiller. Note that the code uses well established vocabularies: Core FRBR, GoodRelations, FOAF, Dublin Core… ie, using this data with other mashup sites becomes much easier!

Great news. And, by the way, it is worth noting that both also relate to Martin’s GoodRelations Ontology. That stuff is really coming to the fore, too…

December 6, 2009

LOD and the top 10 SW products 2009

Filed under: Semantic Web, Work Related — Ivan Herman @ 11:19

Richard MacManus has just published the “Top 10 Semantic Web products of 2009” (see part I and part II) in ReadWriteWeb. What I found interesting on that list is to see that products have been included that are related to, and are using, the output of the Linked Open Data project: Open Calais, Zemanta, BBC’s Semantic Music project, Freebase, DBPedia, Data.gov. (Ok, listing DBPedia as a “product” may not be absolutely right, but, well…).

Why is this interesting? Because one of the negative comments that one hears sometimes (often?), related to the LOD, is that this is nothing more than an academic exercise, ie, it does not make any sense for business. Well, here we are!

November 12, 2009

Pay to be free…

Filed under: Social aspects, Work Related — Ivan Herman @ 17:00

I may not be well informed, so this may be a known approach for some of you, but it is the first time I see this…

There has been a tension between (scientific) publishers and authors for a while on whether one is allowed to put one’s publication on the Web. When dealing with traditional publishers the author usually gives away his/her copyright and the papers are rarely available on the Web (which is a source of constant frustrations to readers). Fortunately, this is not always the case; for example, the proceedings of the World Wide Web conference series are published by ACM, but the papers are nevertheless available on the Web for free (thanks to IW3C2).

Well, a counter-proposal from a publisher is quite amazing. A Hungarian publisher, Akadémiai Kiadó, offers authors a deal, called the “Optional Open Article”: if you pay the nice sum of 900€, then the paper is also put into an online edition and is made freely available on the Web. (The fact that it is then freely available is clear in the agreement posted on the web site). Pay for your freedom. Isn’t this wonderful?

And, to make it clear: this is a very prestigious publisher in Hungary; it is related to the Hungarian Academy of Sciences and is, therefore, the prime local publisher for Hungarian scientists…

I find it appalling. But this may only be me.

November 2, 2009

Promise held (NYT and the LOD)

Filed under: Semantic Web, Work Related — Ivan Herman @ 11:41

I was at the SemTech conference in June when Evan Sandhaus from the New York Times gave a keynote and announced that the NYT would gradually publish many of their data as Linked Data using Semantic Web technologies. Unfortunately, I had to leave on the last day of ISWC2009 last week, when they announced that they were keeping their promise and releasing the first 5,000 subject heading tags to the LOD. Which is really great news.

I remember Evan saying in Santa Clara (maybe privately, I do not remember that detail) that they are newcomers in this area, and it will be difficult to get it right (and, well, there are bugs, as, for example, Eric Hellman or Richard Cyganiak pointed out in their respective blogs). But I think we should really applaud when such a promise is kept…

October 30, 2009

ISWC2009 4-5

Filed under: Semantic Web, Work Related — Ivan Herman @ 0:59

Fourth day

Shame on me, but I missed the morning keynote… I was a bit late arriving at the conference site and I got stuck in a conversation at breakfast. Things happen…

The most notable event in the morning, at least for me, was the SPARQL WG panel. All members of the Working Group (me included) were on the panel and the room was full. I mean, full, people were standing in the back. And I regard that as a success by itself; it shows not only the overall importance of SPARQL, but the real interest around the new version, ie, SPARQL 1.1 (in case you have missed it, the first working draft was published just a few days ago). Lee Feigenbaum (co-chair of the group) gave a quick overview of the new features and then questions came.

The difficulty of the SPARQL 1.1 work is that it has to find a balance between what is realistic to standardize in a relatively short time frame and what could be good to see in a new query language. As a consequence, there are features that the community has discussed but that have not made it into the document, or only in a simple format. That came up during the discussion but I had the impression that the audience, by and large, understood this balance. Actually, for some, the set of new features was even too much for an efficient implementation. I have the feeling that the WG will have to publish a separate conformance document (a bit like OWL 2 has), because there is a certain confusion on whether a conforming SPARQL implementation will have to implement, say, update or inference regimes or not. That clearly came up through the questions. Anyway, remember one email address (yes, it is a bit of a mouthful): public-rdf-dawg-comments@w3.org. This is where comments on SPARQL 1.1 have to be sent!

I chaired a session in the in-use track in the afternoon. The paper of Daniel Elenius et al on reasoning about resources (for military exercises) was interesting to me because it was based on reasoning with relatively large OWL ontologies plus rules. The OWL ‘side’ was not very complex (Daniel referred to DLP, today I would probably say OWL 2 RL) but extended with extra rules. What this shows is that once RIF is finished and published, the combination of OWL with RIF may become very important for tons of practical applications. (As an aside, a nice little joke from Daniel: what is the system used by the military today when planning for exercises? The system is called BOGSAT. It stands for ‘Bunch Of Guys Sitting Around a Table’…)

Roland Stuhmer gave a very different style presentation on how user events (clicks, combination of clicks, etc) can be collected, categorized, integrated into an application, and analyzed with some rules for, eg, targeted ads. The system is based on harvesting not only the structure of the Web page, but also annotations appearing in the Web page via RDFa. The result is an RDF structure describing the events that can be sent to a server, analyzed locally, distributed, etc. Nice usage of RDFa, but it also shows the importance of having a Javascript API that can retrieve the RDF triples from the RDFa structure attached to a specific node. (B.t.w., the old graphics standards of the 80’s and 90’s, called GKS or PHIGS, had notions of combined event structures with different event types. I do not remember all the details any more, but it may be worth looking at those again in a modern setting?)

Personally, the highlight of the day was the presentation of the semantic web challenge finalists. I was a member of the jury, which meant that I had to review the submissions in advance and we had two very enjoyable discussions with the rest of the jury on the submissions. We had the first selection the day before, and this time all finalists gave their presentations and demos. And it was a tough task to choose (that is why we had such long discussions:-) because, well, the submissions were great overall. I do not really want to analyze each of the entries; I do not think it would be appropriate for me in this position. But the winning entry for the challenge, namely TrialX, really made a great impression on me. In short, the application is a consumer-centric tool through which patients can find matching clinical trials in which they want to participate; it also helps those who organize those trials, etc. It is some sort of a matchmaking tool using all kinds of medical ontologies and vocabularies, public health record data and the like. We should realize the importance of this: here is a great Semantic Web application, winner of the challenge, which is really an application, not only a demonstration, already deployed on the Web (soon as an iPhone app, too), and which, to be a bit dramatic, may save (and possibly already has saved) lives. What else do we want as a proof that this technology is not only an academic exercise any more?

Fifth day

Only a partial day for me, as far as the conference goes, because I had to fly out before the end… But I could listen to the last keynote of the conference, ie, that of Nova Spivack.

Not surprisingly, Nova talked about Twine-2, a.k.a. T2. I did not really know what T2 was to be, I only heard that Twine, ie, T1, is moribund. As Nova acknowledged, it is too complicated, it is too hard for users to really figure it out; in fact, most of the users used it for search. Which is not the strongest feature of T1 in the first place.

So T2 is (well, will be) all about semantically backed search. It semantically indexes the Web, with an attempt to extract semantic information from the pages. The user interface would then be some sort of, essentially, faceted interface that would automatically classify the search hit results into different tabs; the user can use these tabs, drill down along other categories, etc. So far nothing radically new, though the user interface Nova showed was indeed very clean and nice. All this is done, internally, via vocabularies/ontologies, using RDF, RDFS, or OWL.

The interesting aspect of T2 (at least as far as I am concerned) is the incorporation of collective knowledge. First of all, T2 will include a system whereby users can add vocabularies that T2 will use in categorization. Users can get back those ontologies in OWL/RDF, they can improve them, etc. The other tool they will provide is a means to help semantically index pages that are, by themselves, not semantically annotated. This can be done via a Firefox extension; users can identify parts of the web pages (I presume, essentially, the DOM nodes) and associate these with classes of specific ontologies. The extension produces an XSLT transformation that can be sent back to the T2 system. Some social mechanism should of course be set up (eg, webmasters annotating their own pages should get a higher priority than third party annotators) but, essentially, it is some sort of a GRDDL transformation by proxy: T2 will have information on how to find transformation to semantically index specific pages without requiring the modification of the pages themselves (in contrast to GRDDL where such transformation is to be referred to from the page itself).

Of course, the system was a bit controversial in this community; indeed, it was not clear whether T2 would make use of the semantic information that does exist in pages already (microformats, RDFa, …) let alone the Linked Open Data information that is already out there. When asked, Nova did not seem to give a clear answer though, to be fair, he did not specifically say no and he also said that the semantic index might be put back to the public in the form of linked data. To be decided. It is also not fully clear whether those proxy-GRDDL transformations would be available for the community at large (hopefully the answer is yes…). It will be interesting to see how it plays out (T2 comes out in beta sometime in early 2010). Certainly a project to keep an eye on.

From a slightly more general point of view it is also interesting to note that two out of the three Semantic Challenge winners are also semantic search engines with different user interfaces (though sig.ma and VisiNav definitely do use the LOD cloud, no question there…). Definitely an area on the move!

I had the time and, frankly, the energy to really listen to only one more paper in the regular track, namely the paper on functions of RDF language elements, by Bernhard Schandl. A nice idea: imagine a traditional spreadsheet, where each cell is a collection of resources from an RDF Graph, or a function that can manipulate those resources (extract information, produce a new set of resources, etc). Just like in a spreadsheet, if you modify the underlying graph, ie, the resources in a cell, everything is automatically recalculated. Because, just like in a spreadsheet, a function can refer to the result of another function in another cell, one can do fairly complicated transformations and information extraction quite easily. Neat idea, to be tried out from their site.

That is it for ISWC2009. I obviously missed a lot of papers, partly because social life and hallway conversations sometimes had the upper hand, and sometimes simply because there were too many parallel sessions. But it was definitely an enriching week… See you all, hopefully, at ISWC2010, in Shanghai!

October 28, 2009

ISWC2009 2-3

Second day

In fact, there is much less to say… In the morning I was at two workshops; I was at the Uncertainty Reasoning on the SW one for a while, but then I was asked to participate in a panel at the Semantics for the Rest of Us one, so I had to switch. This was a bit unfortunate, because I could not really ‘dive in’ to either of the two. And my afternoon was taken up by ‘networking’, catching up with some people on many many issues that are not worth blogging (yet?).

I listened to Kathryn Laskey’s presentation on how to combine probability theory in the mathematical sense (the good old Kolmogorov axiomatic theory on probability that I learned at university in a distant past…) with first order logic. I cannot claim to have really understood all the details but it made me curious enough to put reading her paper on my to do list…

As for the panel “Little vs Large Semantics: What’s next for the Semantic Web languages?”, with Leigh, Kendall and Ora on the panel besides me… it was not that exciting, I must admit. Maybe the main message I took away from it was the passionate request of Chris Welty to re-open RDF (see also Pat’s keynote below!).

Third day (well, first real conference day)

Preamble: I would have wanted to add links to papers. And I couldn’t: I have not found the papers on the Web. Neither on Springer’s site nor elsewhere. I may have missed a reference somewhere, if somebody knows then tell me. But if the papers are not available, I think it is a shame…

The conference began with a keynote of Pat Hayes. Entertaining and also thought provoking; Pat is a great speaker. What really interested me was his talk on ‘RDF Redux’; I was actually anxious to listen to that one at SemTech last June but he had to call it off back then. So he repeated it here. This is typically the kind of talk that needs more thinking afterwards to understand it (and Pat has promised to write it down!), but he essentially proposed to re-think and re-do some of the fundamentals of RDF semantics. Instead of the set-based model theory which we have today, and which makes the treatment of b-nodes, shall we say, a bit complicated (some would use harsher words:-) we should consider RDF graphs as ‘things’ on a ‘surface’ (think of it as a real surface on a sheet of paper) and b-nodes are just ‘scratches on that surface’. (A bit like the ‘context’ of a graph?) Because these surfaces are different from one graph to the other, when a merge occurs then in fact a new surface is created where the unified graph is put, and the issue of b-nodes becomes natural (instead of the ‘renaming’ procedure that the current semantics document describes). Pat claims that the whole semantics could be re-written that way and none of the current RDF implementations would change. But one can go one step further: there may be different kinds of surfaces (eg, negations) and surfaces can have a name (a bit like named graphs) and all can be put together to provide a powerful semantics for these entities. His further claim was that such an extended semantics of RDF could be powerful enough to describe, conceptually, RDFS or even OWL, ie, the semantics should not be layered any more.

No way I would accept all this argumentation at face value:-), so I have to think about this and, mainly, read whatever Pat may want to write down to understand it. In the meantime, I may have to look into the concepts of conceptual graphs, and the Peircian notation of logic that Pat referred to as inspiration…

A more general takeaway (see also Chris’ remark above): maybe it is time to look into RDF again? A scary thought. Touching something that is fundamental on the SW has to be done with extreme care… We will see.
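For readers who have not run into the ‘renaming’ issue: b-node labels are local to a graph, so two graphs may use the same label for different things, and a merge has to keep them apart. Here is a toy sketch of that step, with triples written as simple ‘subject predicate object’ lines; the function name merge_graphs, the g1_/g2_ prefixes and the data are all made up for illustration, and a real RDF toolkit does this far more carefully.

```shell
# Toy merge of RDF graphs: blank node labels ("_:x") are only meaningful inside
# one graph, so each graph's labels get a graph-specific prefix before the
# triples are put together. merge_graphs is a hypothetical helper name.
merge_graphs() {
  local i=0 graph
  for graph in "$@"; do
    i=$((i + 1))
    # "_:x" in the first graph becomes "_:g1_x", in the second "_:g2_x", ...
    printf '%s\n' "$graph" | sed "s/_:/_:g${i}_/g"
  done
}

graph_a='_:x name "Alice"'
graph_b='_:x name "Bob"'

merge_graphs "$graph_a" "$graph_b"
# prints:
#   _:g1_x name "Alice"
#   _:g2_x name "Bob"
```

Simply concatenating the two graphs would instead claim that one single node is called both Alice and Bob.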

There were two papers in the same session that were very close in subject and topic: one of Jesse Weaver and Jim Hendler on the parallel materialization of RDFS graphs and the one of Jacopo Urbani et al on using MapReduce for RDFS reasoning. (Sigh…, this is where I would like to put a reference!) Both aimed at similar challenges, namely the materialization of RDFS inference results of a graph using parallel computing methods. And there was one more similarity: both had some sort of a classification of the rules in the rule set described in the RDF Semantics document to help improving the processing. (Eg, to analyze which rules should be duplicated among processing nodes and which one can be handled without, or which one need a special treatment for a map-reduce pair). It seems that it would be worthwhile to see if some of these classifications (‘ontology rules’ and the like) could be extended to OWL 2 RL (Jesse Weaver told me afterwards that they want to look into this).  But, to put things into perspective: we are the points when billions of triples can be expanded with relative ease. Who would have thought a few years ago? There was also a remark on one of Jesse’s slide (I do not remember the exact wording) which said that RDFS is insanely parallelizable:-)  It was a really interesting session.
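To give a feel for why RDFS lends itself to this, here is a toy sketch of my own (not the algorithm of either paper). It assumes only two rules from the RDF Semantics document, rdfs11 (subClassOf is transitive) and rdfs9 (instances inherit the superclasses of their class), simplified triples written as ‘subject predicate object’ lines, and made-up data. The small schema is copied to every ‘node’, the instance triples are split up, and each part is expanded on its own.

```shell
# Schema triples: small, and replicated to every processing node.
schema='Dog subClassOf Mammal
Mammal subClassOf Animal'

# Instance triples: the big part in real life, and the part that gets partitioned.
instances='rex type Dog
tom type Mammal
ivy type Animal'

# Expand the instance triples read from stdin, using the replicated schema.
materialize() {
  { printf '%s\n' "$schema"; cat; } | awk '
    $2 == "subClassOf" { sub_of[$1, $3] = 1 }
    $2 == "type"       { inst[++n] = $1; cls[n] = $3 }
    END {
      # rdfs11: add (a, c) whenever (a, b) and (b, c) are known, until stable.
      do {
        added = 0
        for (ab in sub_of) {
          split(ab, p, SUBSEP)
          for (bc in sub_of) {
            split(bc, q, SUBSEP)
            if (p[2] == q[1] && !((p[1], q[2]) in sub_of)) fresh[p[1], q[2]] = 1
          }
        }
        for (k in fresh) { sub_of[k] = 1; added = 1 }
        split("", fresh)   # portable way to empty an awk array
      } while (added)
      # rdfs9: every instance of a class is also an instance of its superclasses.
      for (i = 1; i <= n; i++) {
        print inst[i], "type", cls[i]
        for (ab in sub_of) {
          split(ab, p, SUBSEP)
          if (p[1] == cls[i]) print inst[i], "type", p[2]
        }
      }
    }' | sort -u
}

# One node sees everything...
single=$(printf '%s\n' "$instances" | materialize)

# ...versus one "node" per instance triple; no iteration needs any other.
partitioned=$(
  printf '%s\n' "$instances" | while IFS= read -r triple; do
    printf '%s\n' "$triple" | materialize
  done | sort -u
)

[ "$single" = "$partitioned" ] && echo "same closure either way"
```

The loop runs sequentially here, but since no iteration looks at another iteration’s triples, each one could just as well run on a different machine; for these two rules only the schema has to be shared.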

The SW in use session included a paper from Landong Zuo et al, “Supporting multi-view network analysis to understand company value chains”. The idea is to integrate a bunch of UK data on companies in an RDF store, and let users get information on the ‘value chains’, ie, how companies relate to one another as producers/consumers. Technically, the interesting point was the fact that users had the possibility to interactively add new relationships and new classifications to the system, essentially new rules that could be evaluated. The whole system seemed to be a really cool, well engineered and well functioning machinery. As the speaker put it, although all conclusions drawn from the system could be found by the users by analyzing databases, it would take weeks to do what this system can give them in a few minutes. This is exactly the kind of message we need for the outside world about the usefulness of Semantic Web technologies.

In another session Martin Szomszor presented an experiment they conducted at the ESWC conference, combining RFID-based personal badges with an underlying SW system. The resulting system could be used to show personal contacts among delegates, could help people find others with similar interests, could retrace later whom one met at what point (“I remember talking to that chap, but I do not remember his name!”), etc. Lots of privacy issues, for example, but I would have liked to see that in practice, that is for sure!

Stéphane Corlosquet’s presentation on SW and Drupal was really exciting. I already knew about the plans of Drupal 7 to incorporate RDF management from the start, and that all Drupal 7 pages will be annotated via RDFa. The RDFa community has been fairly excited about that for a while now. But the work done by Stéphane and others provides some additional modules that make it easy to add a SPARQL endpoint to a Drupal based site, to import other RDF content, or to manage the vocabularies used on the pages and the like. They already have such a system running with the current Drupal, but these modules will become part of the standard Drupal 7 module set that one can download from the Drupal site. And that is cool. It significantly lowers the barrier to build Web sites that are prepared to be part of the Linked Data cloud, even if the system administrators are not SW experts. I expect this to open up quite a lot of possibilities…

Off to the next day! More papers and the presentation of the Semantic Web challenge finalists…

October 26, 2009

ISWC2009 I.

This year’s ISWC is held in Chantilly, Virginia. In a nice conference building in a beautiful park with autumn colours that, for reasons I do not really know, are always much more striking and amazing in America than in Europe. It is a bit of a pity that it is so far from Washington but, well, you can’t have it all…

First day: tutorials.

(For me, because there were also a bunch of workshops.) In the morning I was at the tutorial on how to consume Linked Open Data, by Juan Sequeda, Jamie Taylor, Patrick Sinclair, and Olaf Hartig; in the afternoon I went to the one on legal and social frameworks for sharing data on the Web, by Leigh Dodds, Jordan Hatcher, Tom Heath, and Kaitlin Thaney.

Juan and his friends actually had a difficult task, and that became clear right at the start during Juan's intro: part of the audience did not really know what LOD was all about, whereas there were also others who were, shall we say, old timers on the subject. I think the speakers did a really good job in navigating through these constraints, making short introductions to what LOD is all about but talking about issues and showing examples that were interesting for all of us. Kudos for that. Issues were raised by the audience that were really to-the-point (who should create sameAs links, how trustworthy are they, how to choose vocabularies and how they map to one another, etc) and, in his closing slides, Juan actually gave a list of the open R&D issues in LOD. Worth looking at those (and no reason to repeat the list here…). B.t.w., the slides of the tutorial are on line.

One very interesting technology I heard about that, shame on me, I did not know was a tool called SQUIN, based on a traversal based execution scheme for SPARQL. Olaf did a presentation on that. What essentially happens is as follows. At the beginning the default graph of the SPARQL query is empty. However, the system systematically fetches RDF triples by dereferencing the URI-s in the query pattern, adding those to the default graph. The query is matched against it, and some variables will match, thereby ‘adding’ new URI-s to the pattern. And the process starts again, possibly yielding a complete solution (or more) to the original query. At the end of the process, solutions will be found on the Web, even if the system itself does not have any ‘real’ data behind it at the start. Of course, no one can guarantee that all solutions will be found, and you need some ‘seed’ URI-s in the original query pattern, but it nevertheless looks like a very powerful tool to explore, say, the LOD. Very interesting!
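
As a way to fix the idea for myself, here is a toy Python sketch of such a traversal based evaluation. It is certainly not how Olaf's tool is implemented: the ‘Web’ is a hypothetical dictionary mapping URI-s to the triples one would get by dereferencing them, the query is a list of triple patterns with `?variables`, and, as a simplification, every URI appearing in a triple that matches at least one pattern gets dereferenced in turn.

```python
def is_var(term):
    return term.startswith("?")


def is_uri(term):
    """In this toy model everything that is not a variable or a literal is a URI."""
    return not is_var(term) and not term.startswith('"')


def unify(pattern, triple, binding):
    """Extend the binding so that the pattern equals the triple, or return None."""
    new = dict(binding)
    for p_term, t_term in zip(pattern, triple):
        if is_var(p_term):
            if new.setdefault(p_term, t_term) != t_term:
                return None
        elif p_term != t_term:
            return None
    return new


def match(patterns, graph, binding=None):
    """Yield every binding that satisfies all triple patterns against the graph."""
    binding = binding or {}
    if not patterns:
        yield binding
        return
    for triple in graph:
        new = unify(patterns[0], triple, binding)
        if new is not None:
            yield from match(patterns[1:], graph, new)


def traverse_and_query(patterns, dereference, max_lookups=100):
    """Grow the default graph by following URI-s, then evaluate the query on it."""
    graph, seen = set(), set()
    todo = [t for tp in patterns for t in tp if is_uri(t)]  # the seed URI-s
    while todo and len(seen) < max_lookups:
        uri = todo.pop()
        if uri in seen:
            continue
        seen.add(uri)
        graph |= set(dereference(uri))
        # Any triple matching a single pattern may lead to further data.
        for tp in patterns:
            for triple in graph:
                if unify(tp, triple, {}) is not None:
                    todo.extend(t for t in triple if is_uri(t) and t not in seen)
    return list(match(patterns, graph))


# A hypothetical three-document 'Web'.
WEB = {
    "ex:alice": [("ex:alice", "ex:knows", "ex:bob")],
    "ex:bob": [("ex:bob", "ex:name", '"Bob"')],
    "ex:carol": [("ex:carol", "ex:name", '"Carol"')],
}
fetched = []


def dereference(uri):
    fetched.append(uri)
    return WEB.get(uri, [])


query = [("ex:alice", "ex:knows", "?p"), ("?p", "ex:name", "?n")]
solutions = traverse_and_query(query, dereference)
```

The only seed that leads anywhere is `ex:alice`; `ex:bob` is discovered along the way, while `ex:carol` is never reached from the query and is therefore never fetched, which also shows why completeness cannot be guaranteed.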

Then there were some examples on how LOD is used. Jamie talked about Freebase, and how Freebase is, in fact, a way for everybody to easily add information to the LOD (after all, Freebase works like a wiki, and all the data is reflected on the LOD). He also had a very important message that is worth repeating (go to his slides for the rest): it takes very little effort to add a republishing capability to your triple store based application, thereby extending the general LOD. So… do it! This is how the system evolves…

Patrick described a quite geeky system that the BBC folks have developed (hopefully it will become public soon): take the BBC’s musical data in RDF (which is available), plus the LOD cloud, plus… an IRC bot. What you get is an IRC channel which will pick up data on music, including the sound tracks, photos, etc, and display it on the machine. I presume you can give orders and preferences through the IRC. Obviously geeky stuff, not for the masses:-) but it shows what you can do…

The afternoon tutorial on the Legal and Social frameworks was of course very different. I think that one of the many important aspects of this tutorial, and maybe the most important one, is that… it took place! This may sound a bit strange but it is important for all our community to realize that we will have issues around copyright, licensing, waivers, etc, when it comes to the Web of Data, whether we like these issues or not. Tutorials like this, written notes and information, etc, are essential. Let us face it: most of us do not understand the details of the legal issues. So I was simply listening and trying to absorb what I heard…

I do not want to repeat the details of what I heard here; one thing I learned over the years is that I should leave legal argumentations and descriptions to those who really understand them. Ie, look at the slides. It is worth it. But just to show the complexities: I did not know or fully realize that there are major differences among countries in what can or cannot be copyrighted: for example, a phone book cannot be copyrighted in the US or Europe, but can in Australia. Or that the seemingly simple notion of ‘attribution’ can, in fact, become a bottomless pit when it comes to data and the queries thereof (eg, if I have a filter in a query that results in data, should I give an attribution to the facts that were, in fact, filtered out?). Etc.

There is also a takeaway message for me (though it may be quite trivial) among the things I learned. Tom showed some practical examples on how one can add, say, licensing information to data by adding some RDF triples. However, for a larger data set the licensing may be different within the dataset. Eg, if you retrieve data from somewhere, and you enrich it with additional metadata, the metadata itself may have a different licensing (it is yours) than the data that you use (which may have its own licence). What this means is that when you organize your data internally, you should think about the licensing information you will add well in advance: organize your URI-s accordingly, for example. If you don’t, and you want to add licences at the end, you might find yourself in trouble! Sounds like a simple message, but it is important. (Reminds me of what accessibility people always say: if you take accessibility issues into account right at the beginning when you build up a Web site, it is not complicated; but if you have to add accessibility features after the fact, it may become hell…)
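
A small sketch of what ‘organize your URI-s accordingly’ could mean in practice, written in Python. This is my own illustration, not one of Tom's examples: the two namespaces and the choice of licences are hypothetical, and I simply assume that the licence is expressed with the Dublin Core `license` property and written out as N-Triples lines.

```python
DCT_LICENSE = "http://purl.org/dc/terms/license"

# Hypothetical layout: imported data and my own enrichment live in
# separate namespaces, so each namespace can carry its own licence.
LICENSES = {
    "http://example.org/source/": "http://creativecommons.org/licenses/by/3.0/",
    "http://example.org/enrichment/": "http://creativecommons.org/publicdomain/zero/1.0/",
}


def license_for(uri, licenses=LICENSES):
    """Return the licence of the longest matching namespace prefix, or None."""
    best = max((p for p in licenses if uri.startswith(p)), key=len, default=None)
    return licenses[best] if best else None


def license_triples(resource_uris, licenses=LICENSES):
    """Produce one N-Triples line per resource whose namespace has a licence."""
    lines = []
    for uri in resource_uris:
        lic = license_for(uri, licenses)
        if lic is not None:
            lines.append(f"<{uri}> <{DCT_LICENSE}> <{lic}> .")
    return lines


resources = [
    "http://example.org/source/company/42",
    "http://example.org/enrichment/tag/7",
    "http://elsewhere.example/unknown",
]
triples = license_triples(resources)
```

If the imported and the enriched data had been mixed under one namespace, there would be no simple rule like this to fall back on, which is the trouble mentioned above.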

By the way, Leigh has made a kind of an overview of the current ‘blobs’ on the LOD cloud to see whether any kind of licensing information is available or not. He has an overview of the results in his slides. The main fact is: the majority of data sets have no licensing information whatsoever (or, at least, nothing that can be found in about 10 minutes)…

It was a good day. Looking forward to the rest.

October 16, 2009

Seduce with free services?

Filed under: General, Private, Social aspects, Work Related — Ivan Herman @ 8:01

I ran into this twice in a week. I hope it is just a coincidence…

The story is simple. You find some service on the Web which looks nice and helpful. There are various options: you may take a minimal service, which is free of charge, or you can also choose extra services for a fee. It sounds like a decent choice: if the minimal service fits your needs, you are happy, if you need more, you pay something. I presume we all use services like that.

But then… if you take the free option, you may get a mail after 2-3 years of usage saying that sorry, the free service is discontinued next month; you are welcome to upgrade to the paying service, otherwise, well, good bye. As I said I got this type of mail twice in a week: one from a service giving a minimal synchronization of my phone’s calendar with Google’s, the other providing a simple email certificate for signing my mails. As a matter of principle I will not upgrade; I do not find this approach really acceptable.

So… will Gmail, WordPress, or other similar services decide that they have attracted enough customers and that they can now start charging? As I said, I hope this was just a coincidence and not some sort of a general direction…
