Opera CTO on Microsoft Office and Open Document Format — E-books using XHTML+CSS

Today on CNET, Opera’s CTO Håkon Wium Lie takes a critical look at both Microsoft’s Office Open XML and the Open Document Format. He describes both as essentially “memory dumps with angle brackets.” Instead, he believes that the better way is to build upon the long-established, universal standards of XHTML and CSS. To demonstrate this, he and co-author Bert Bos wrote a book (published by Addison-Wesley Professional) mastered entirely in XHTML and CSS3, using the Prince application to convert the master directly to PDF for the print edition.

An example chapter from their book is available online in two views: one optimized for online reading, and one optimized for printing. Both views are derived from essentially the same XHTML document used for the high-quality Prince-generated print version, which shows the flexibility of this approach. (Also look at this beautifully styled online article written by Bert and Håkon.)

The future of e-books?

The ramifications for the future of the digital publishing industry (including e-books) are significant. The reason is that this approach begins to merge the two worlds of online reading using web browsers with the traditional world of fixed-page formats primarily intended for print. It is now possible to reach both worlds from the same XML-based source because of the new CSS3 properties.
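To make the "one source, two worlds" idea concrete, here is a minimal Python sketch that builds a single XHTML document carrying one rule set for screens and another for print. Everything in it is an assumption made for illustration: the function names, the sample text, and the CSS rules are hypothetical and are not taken from the Bos/Lie book or from Prince.

```python
import xml.etree.ElementTree as ET

XHTML_NS = "http://www.w3.org/1999/xhtml"

# Illustrative rules only; they are not taken from the Bos/Lie book.
STYLES = {
    "screen": "body { max-width: 40em; margin: 0 auto; }",
    "print": "@page { size: A5; margin: 2cm; } h1 { page-break-before: always; }",
}


def q(tag):
    """Qualify a tag name with the XHTML namespace."""
    return f"{{{XHTML_NS}}}{tag}"


def build_master(title, paragraphs):
    """Return one XHTML document with a style block per output medium."""
    ET.register_namespace("", XHTML_NS)
    html = ET.Element(q("html"))
    head = ET.SubElement(html, q("head"))
    ET.SubElement(head, q("title")).text = title
    for medium, rules in STYLES.items():
        # The media attribute tells the renderer when each rule set applies.
        style = ET.SubElement(head, q("style"), {"type": "text/css", "media": medium})
        style.text = rules
    body = ET.SubElement(html, q("body"))
    ET.SubElement(body, q("h1")).text = title
    for text in paragraphs:
        ET.SubElement(body, q("p")).text = text
    return ET.tostring(html, encoding="unicode")


def rules_by_medium(xhtml):
    """Parse the document back and map each medium to its CSS rules."""
    root = ET.fromstring(xhtml)
    return {style.get("media"): style.text for style in root.iter(q("style"))}


if __name__ == "__main__":
    master = build_master("Sample chapter", ["First paragraph.", "Second paragraph."])
    print(rules_by_medium(master))
```

The content is written once. A browser applies the screen rules, while a print-oriented formatter applies the paged-media rules to the very same markup.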

And before someone brings up the “Tower of eBabel,” this is exactly the road already taken by both IDPF’s Open eBook Publication Structure (OEBPS) and the similar OpenReader Format. Both of these compatible e-book formats take the XHTML+CSS approach advocated by Håkon Lie. The only difference is that both of these formats add a special document (OEBPS “Package” and OpenReader “Binder”) to enable certain useful e-book features (including some important for accessibility) which are otherwise kludgy to enable using the bare-bones XHTML web content approach.

In essence, both OEBPS and OpenReader can be described as “turbocharged XHTML+CSS.” They thus fit into Håkon’s vision.

The last piece just fell into place: the Generalized Container Format

The IDPF OEBPS Container Format (IDPF/OCF) was released last September to define a portable container for conveniently distributing OEBPS Publications as e-books. The recently announced draft Generalized Container Format (GCF) builds upon and generalizes the excellent foundation established by IDPF/OCF. We now have an e-book-optimized means to distribute e-books formatted as web content (XHTML+CSS) for use by web browsers. Because GCF is ZIP-based, it will be almost trivial for most web browsers on most platforms to support it.
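The "almost trivial" claim rests on how widely ZIP is supported. As a rough illustration, the Python sketch below packs an XHTML file and a stylesheet into a ZIP archive and reads one entry back, using only the standard library. It is a plain ZIP round trip and nothing more: it does not reproduce the internal layout that GCF or IDPF/OCF require, and the file names and helper functions are hypothetical.

```python
import io
import zipfile


def pack_book(files):
    """Pack a mapping of archive path -> text content into ZIP bytes."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path, text in files.items():
            archive.writestr(path, text.encode("utf-8"))
    return buffer.getvalue()


def read_entry(container, path):
    """Return the decoded text of one entry from ZIP bytes."""
    with zipfile.ZipFile(io.BytesIO(container)) as archive:
        return archive.read(path).decode("utf-8")


if __name__ == "__main__":
    # Hypothetical file names; a real container format defines its own layout.
    book = {
        "book.xhtml": '<html xmlns="http://www.w3.org/1999/xhtml"><body><p>Hello</p></body></html>',
        "style.css": "body { margin: 1em; }",
    }
    data = pack_book(book)
    print(read_entry(data, "style.css"))
```

A reading system that already renders XHTML+CSS needs little more than this unpacking step to open such a package, which is the point made above.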

Certainly, OEBPS and OpenReader enable a superior e-book reading experience compared to ordinary web content, but the release of the first OEBPS/OpenReader-capable readers (OSoft’s interactivity-capable dotReader and Adobe’s Digital Editions) is still a few months off, while standards-capable web browsers are ubiquitous, available on almost every handheld (including cellphones), laptop, and desktop computing device in the world.

This suggests an intriguing step-by-step strategy to topple the Tower of eBabel: start off with bare (X)HTML+CSS contained and distributed in GCF. Then naturally evolve web browsers (in addition to the specialized readers like dotReader) to support the turbocharged OEBPS/OpenReader formats. Current web browsers like Opera and Firefox could support (anytime they want) the OEBPS/OpenReader formats with pretty much the same quality of presentation planned by Adobe’s Digital Editions — they are tantalizingly close!

We live in interesting times.

The author of this article, Jon Noring, is VP of Development for DigitalPulp Publishing.

5 Comments on Opera CTO on Microsoft Office and Open Document Format — E-books using XHTML+CSS

  1. Great post, Jon,
    HTML is such an obvious basis for eBooks it’s amazing that anything else was even considered. If I understand the package concept, this could include graphics, allowing off-line reading which is clearly desirable (anyone who flies needs off-line reading, at a minimum). HTML allows flowing to automatically meet the needs of different sized displays (great to have the same book in the same format on my PC, my Jornada, and my Palm), and if zip-based packages are used, high compression can be supported with HTML, although this is less of an issue than it was a decade ago when most access was dialup.

    Rob Preece
    Publisher, http://www.BooksForABuck.com

  2. The ‘Index’ to the Bos/Lie book is electronically generated and shows dramatically the weakness of this approach – even for a document of four pages! Words like ‘rule’ which make sense in context become meaningless when extracted to an ‘index’ of this kind. And where are the entries for ‘Bach’ and ‘CSS Specifications’? It would be more accurate and kinder to call it a ‘list of words italicised in the text’ – but who would want one of those anyway?

    Electronic compositing may be ready for prime time. Electronic indexing is not.

    Jon.

  3. “The reason is that this approach begins to merge the two worlds of online reading using web browsers with the traditional world of fixed-page formats primarily intended for print. It is now possible to reach both worlds from the same XML-based source because of the new CSS3 properties.”

    It has long been possible to create documents for different media by applying transformations to a single source format. The reason this hasn’t been done before is not because the vapourware kings haven’t come up with the right format until now, but because it is rarely useful.

  4. Jon, a very interesting post.

    The Wikipublisher project takes an open source approach to this problem. It uses the PmWiki engine to transform wiki markup into print-oriented XML, then a typesetting server transforms this into LaTeX and PDF. This means one typesetting server can support many wiki web sites and produce typeset versions of the pages on request, on-the-fly. A reader can customise the look of the printed output, including choice of chapter styles, font sets, watermark, paper size, and so on. You can typeset individual pages, pre-defined page collections, and search results. It supports typical print artefacts such as table of contents, list of figures, list of tables, page number cross-references, long tables with running heads, footnotes, floating images and tables, and so on.

    It creates a book-like structure for long multi-page collections by combining nested page lists and page headings into chapter / section / subsection / … / subparagraph. In this model, the web page and print page are both created on-the-fly from the wiki source. It would, I think, be relatively straightforward to transform XHTML into print XML, giving an open source print option for content marked up as XHTML.

    For more information, see http://www.wikipublisher.org.

  5. John, thanks for your information. There’s really a multitude of ways to go from XHTML to print, and this is one more.
