Ivan’s private site

January 18, 2014

Some W3C Documents in EPUB3

Filed under: Code,Digital Publishing,Python,Work Related — Ivan Herman @ 13:04

I have been having fun the past few months, when I had some time, with a tool to convert official W3C publications (primarily Recommendations) into EPUB3. Apart from the fact that this helped me to dive into some details of the EPUB3 Specification, I think the result might actually be useful. Indeed, it often happens that a W3C Recommendation consists, in fact, of several different publications. This means that just archiving one single file is not enough if, for example, you want to have those documents offline. On the other hand, EPUB3 is perfect for this; one creates an eBook that contains all constituent publications as “chapters”. Yep, EPUB3 as a complex archiving tool :-)
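To make the "several publications as chapters" idea concrete, here is a minimal sketch in Python. It is not the tool described in this post: the function name `build_epub`, the file names, and the placeholder chapter content are all assumptions made for illustration, and a real converter also has to handle stylesheets, images, and cross-links. It only shows the bare container layout: a `mimetype` entry, a `META-INF/container.xml` that points at the package file, a navigation document, and one XHTML file per constituent publication.

```python
import io
import zipfile
from xml.sax.saxutils import escape

# Tells a reading system where the package file lives inside the archive.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="package.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""


def build_epub(title, identifier, chapters):
    """Bundle (file_name, xhtml_text) pairs into EPUB3 bytes, one chapter each.

    Hypothetical helper; file names are assumed to be plain ASCII names
    that need no escaping.
    """
    items = "\n".join(
        f'    <item id="ch{i}" href="{name}" media-type="application/xhtml+xml"/>'
        for i, (name, _) in enumerate(chapters)
    )
    spine = "\n".join(f'    <itemref idref="ch{i}"/>' for i in range(len(chapters)))
    toc = "\n".join(f'      <li><a href="{name}">{name}</a></li>' for name, _ in chapters)

    # The modification date below is a fixed placeholder value.
    package = f"""<?xml version="1.0" encoding="UTF-8"?>
<package xmlns="http://www.idpf.org/2007/opf" version="3.0" unique-identifier="uid">
  <metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
    <dc:identifier id="uid">{escape(identifier)}</dc:identifier>
    <dc:title>{escape(title)}</dc:title>
    <dc:language>en</dc:language>
    <meta property="dcterms:modified">2014-01-18T00:00:00Z</meta>
  </metadata>
  <manifest>
    <item id="nav" href="nav.xhtml" media-type="application/xhtml+xml" properties="nav"/>
{items}
  </manifest>
  <spine>
{spine}
  </spine>
</package>
"""

    nav = f"""<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:epub="http://www.idpf.org/2007/ops">
  <head><title>{escape(title)}</title></head>
  <body>
    <nav epub:type="toc">
      <ol>
{toc}
      </ol>
    </nav>
  </body>
</html>
"""

    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        # The mimetype entry comes first and is stored without compression.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML)
        archive.writestr("package.opf", package)
        archive.writestr("nav.xhtml", nav)
        for name, text in chapters:
            archive.writestr(name, text)
    return buffer.getvalue()


def page(heading):
    """Placeholder chapter; a real book would carry the publication itself."""
    return (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<html xmlns="http://www.w3.org/1999/xhtml">'
        f"<head><title>{escape(heading)}</title></head>"
        f"<body><h1>{escape(heading)}</h1></body></html>\n"
    )


book = build_epub(
    "Demo book",
    "urn:example:demo-book",  # made-up identifier
    [("part-one.xhtml", page("Part one")), ("part-two.xhtml", page("Part two"))],
)
with zipfile.ZipFile(io.BytesIO(book)) as archive:
    print(archive.namelist())
```

The order of the `<itemref>` elements in the spine is what gives the book its reading order, so each constituent publication shows up as one chapter in sequence.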

The Python tool (which is available on GitHub) has now reached a fairly stable state, and it works well for documents that have been produced by Robin Berjon’s great respec tool. I have generated, and put up on the Web, two books for now:

  1. RDFa 1.1, a Recommendation that was published last August (in fact, there was an earlier version of an RDFa 1.1 EPUB book, but that was done largely manually; this one is much better).
  2. JSON-LD, a Recommendation published this week (i.e., 16th of January).

(Needless to say, these books have no formal standing; the authoritative versions are the official documents published as a W3C Technical Report.)

There is also a draft version of a much larger book on RDF 1.1, consisting of all the RDF 1.1 specifications to come, including the various serializations (RDFa and JSON-LD among them). I say “draft”, because those documents are not yet final (i.e., not yet Recommendations); a final version (with, for example, all the cross-links properly set) will be at that URI when RDF 1.1 becomes a Recommendation (probably in February).

February 22, 2013

Browsers and eBook Readers

Filed under: Digital Publishing,Work Related — Ivan Herman @ 23:02

eBook Readers Galore (Photo credit: libraryman)

My last week was all about digital publishing: first, I was at the W3C Workshop on eBooks and the Open Web Platform, which I helped to organize. If I extrapolate from the discussions at the W3C Workshop, there are good prospects that this topic will become more important at the W3C, and that it will also keep us busy (in addition to my role on the Semantic Web). By the way, the minutes of the W3C Workshop (both for the 1st and the 2nd days) and the presentations are public; a somewhat more detailed workshop report should also be available soon.

The Workshop was followed by O’Reilly’s Tools of Change (TOC) conference: a first time for me. And it was extremely interesting to find myself in an environment where I had never been before. I saw some great keynotes (e.g., Mark Waid’s on “Reinventing Comics And Graphic Novels For Digital”, or Maria Popova’s, from Brain Pickings), and learned a lot at some of the sessions (for example, at Bill Rosenblatt’s session on some of the legal aspects surrounding eBooks).

My interest in this whole area is, primarily, in how digital publishing in general, and electronic books in particular, relate to technologies developed at W3C. For those of you who may not realize it: if an electronic book uses the ePUB standard (and more and more books do) then the book is, in fact, a “frozen” Web site (based on either XHTML1 or HTML5, depending on the ePUB version). Technically, it is a zip file containing all the files necessary to render the content, plus some ePUB specific files to manage the table of contents, to help readers display the content more quickly, etc. (Actually, as far as I know, most of the ePUB readers are based on the same core technology as many of the Web browsers, namely Webkit.) The strong relationship between the Web and publishing in general, and eBooks in particular, was emphasized several times at the conference, especially by the keynote of Jeff Jaffe, the CEO of W3C.
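The "zip file plus some ePUB specific files" structure is easy to look at with any zip library. The Python sketch below is an illustration under stated assumptions: `describe_epub` and `make_demo_archive` are made-up helper names, and the demo archive is a bare stand-in built in memory, not a valid book. What it relies on is the container convention itself: a `mimetype` entry, and a `META-INF/container.xml` whose `rootfile` element gives the path of the package file that lists the book's content.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

CONTAINER_NS = {"c": "urn:oasis:names:tc:opendocument:xmlns:container"}


def describe_epub(data):
    """Return (mimetype, package_path, file_names) for the bytes of an ePUB file."""
    with zipfile.ZipFile(io.BytesIO(data)) as archive:
        mimetype = archive.read("mimetype").decode("ascii").strip()
        # container.xml is the fixed entry point; it names the package file.
        container = ET.fromstring(archive.read("META-INF/container.xml"))
        rootfile = container.find("c:rootfiles/c:rootfile", CONTAINER_NS)
        return mimetype, rootfile.get("full-path"), archive.namelist()


def make_demo_archive():
    """Build a tiny stand-in archive in memory (placeholder content only)."""
    container_xml = (
        '<?xml version="1.0" encoding="UTF-8"?>\n'
        '<container version="1.0" '
        'xmlns="urn:oasis:names:tc:opendocument:xmlns:container">'
        "<rootfiles>"
        '<rootfile full-path="content/package.opf" '
        'media-type="application/oebps-package+xml"/>'
        "</rootfiles></container>"
    )
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        archive.writestr("mimetype", "application/epub+zip")
        archive.writestr("META-INF/container.xml", container_xml)
        archive.writestr("content/package.opf", "<package/>")  # placeholder
        archive.writestr("content/chapter1.xhtml", "<html/>")  # placeholder
    return buffer.getvalue()


mimetype, package_path, names = describe_epub(make_demo_archive())
print(mimetype)
print(package_path)
for name in names:
    print(" ", name)
```

To try it on a real book, read the file's bytes (for example with `open(path, "rb").read()`, where `path` is wherever the book is stored) and pass them to `describe_epub`; everything beyond the two bookkeeping entries is ordinary Web content.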

But then… if so, why do we need separate eBook readers, either in hardware or in software? (Let us put aside for now the issues of DRM, vendor lock-in, etc.; these are of course reasons, but let us hope the business will evolve towards a more open environment where those issues will be less relevant.) Do we really need separate ePUB reader software on, say, my iPad, or should we simply rely on the browsers taking care of ePUB files either directly or through some extensions? (There is, for example, a project called Readium to add such capabilities to Chrome.) The answer is not obvious, and there are proponents of both approaches. My 2 cents here is: it is not a core technology issue, but a user experience and interface one. Reading a book, electronic or otherwise, is a different intellectual activity than reading an average Web page. Here are some differences that I feel are important, and I am sure there are many more:

  • A book must be available off-line; this is, actually, its natural state. This difference is obvious, but worth noting: for example, the user interface for books has to be able to list what is and what is not available at a given moment (all readers have some sort of an imitation of a traditional bookshelf).
  • The amount of “information” you want to absorb is different. A typical Web page is not terribly long; even the more detailed Wikipedia articles, when printed, are rarely longer than 4-5 pages. Compare that to an average book that may be hundreds or even thousands of pages. What this means in practice is that, whereas a Web page is usually read, understood, “absorbed” in one go, reading a single book may take several days or weeks. This has all kinds of consequences on how one navigates, uses traditional bookmarks (not the ones browsers usually provide, i.e., to store URLs, but what used to be bookmarks in the past), tables of contents, indexes, glossaries, etc. These features are essential for books but much less so for an average Web page.
  • Modern Web pages have more and more interactive features, and they are tied to various social sites like Twitter or Facebook; very often these pages are Web applications with very complex features (think of Gmail, for example). Obviously, browsers have to be prepared for a high level of interactivity and have to be optimized to offer the best user experience for it. Books are much less interactive. Newer generations of books may include some level of interactivity, which is important for, say, the educational book market or for children’s books, but it is a far cry from what Web sites do. Also, some readers (like Kobo’s) try to include some level of Social Web facilities (sharing information about books with friends, that sort of thing); to be honest, I never found those social features interesting or important (o.k., I may just be old-skool). Reading a book remains, for me, a linear activity, whether it is fiction, poetry, history, or politics. I want my eBook reader to optimize for that, and avoid distractions.
  • There are some features that a good eBook reader should offer and that browsers traditionally do not. A prime example is annotation facilities. Many people like to scribble on their books, underline full sentences, highlight words; I still have not found any tools to do that properly in a Web browser, although all the eBook readers that I have tested so far have such functionality. This is a typical user interface difference that comes from different demands. (Another example that comes to my mind is quick access to a dictionary, to an encyclopedia, etc.)
  • Some sort of a payment/rights management system must be part of the reader. I personally consider the current DRM system, as used in the eBook world, fundamentally broken insofar as it may drive people away from this market. However, I recognise that something should be available that allows authors of books to get some reward for their work. Whether that is some sort of watermarking, social DRM, or whatever, I do not know, but something is needed, and the reader environment has to handle this.

I realize, of course, that this is a continuum: with ePUB3 we have the ability to make eBooks much more interactive, possibly with scripts, multimedia, etc.; in effect, electronic books are becoming more and more like Web applications. That is, some of these differences may disappear or become less important. Nevertheless, I believe there will always be a difference in user expectations, and in the emphasis that a piece of software (or hardware) may have. eBook readers are not browsers, although electronic books are, in fact, part of the Web just like other types of Web content. Is it a sign that we may need a more diverse landscape for accessing the Web than we have today?
