Jon Noring will tell you OpenReader is the next generation of the Open eBook spec, encompassing many of the proposals to improve its scope and usability made since OEB appeared in 1999.

David Rothman will tell you OpenReader is a consumer-friendly, non-proprietary, single-file format. A universal consumer format is needed in any medium to prevent lock-in to one vendor’s program. This of course is as true of Microsoft’s office applications (countered by the OpenDocument office formats) and Apple’s iTunes (mp3 is something of an alternative), as it is of eReader, Mobipocket, Sony or Gemstar [1] e-book formats.

Are those reasons enough for OpenReader to exist?

PDF and the 1999 Open eBook publication standard have legitimacy that OpenReader lacks, but also many flaws that OpenReader purports to address in unequaled ways. Hopefully that’s true, if OpenReader moves to a standards organization and achieves its published goals.

But I say that so far OpenReader hasn’t dealt with the biggest issues of openness that underlie its inception. Political openness and anti-proprietary openness could be achieved in more than one way. (For instance, by expanding OpenDocument to incorporate an e-reading vocabulary.)

If OpenReader will have any use, I believe it must go much farther than currently planned and incorporate these principles:

  • Use one file for reading, distribution, presentation, study. The reader can access and change — re-author — the book. Immutability should not be a goal. Right now, plain text (as distributed by Project Gutenberg), zml, HTML, XML, OpenDocument and Sophie provide this.
  • Specify one file in non-binary form. If distribution in zip or jar formats is useful because they collect multiple files into a single package, this shouldn’t be the only alternative that OpenReader provides. Michael Hart, who often seems like a prophet crying in the wilderness, has warned us against non-text formats for years and years. We shouldn’t think the evils of inaccessibility stem only from DRM and proprietary formats; binary by itself has long-term consequences.
  • Stipulate APIs that enable plug-ins to work in any e-reader that’s OpenReader-compliant. Yes, the plug-in might have other dependencies — python on your system, for instance — that OR can’t guarantee, but a Chemical ML translator shouldn’t be limited to one e-reader: it should work anywhere an OpenReader text can be opened.
  • Allow any XML vocabulary whatsoever. HTML has its purposes; that doesn’t mean it should provide the structure for every text ever to be electronified. Anything an e-reader needs to know for display or accessibility can easily be required. This would open up every text already in structured form, and almost everything ever written and formatted, without losing that information.
  • Open metadata. Whatever metadata the ebook has should be accessible in the e-reader.
  • Make annotations shareable. This doesn’t mean “follow this single standard or else.” It just means it has to be exportable in XML. Highlighting? Bookmarks? Yes — not because I’m giddy with the power to make the e-book world do as I say, but because these can be extremely useful in a class or collaborative environment. Not “Class, turn to page 38,” but “Class, download my bookmarks and jump to bookmark 7.”
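To make the shareable-annotations idea concrete, here is a minimal sketch of what an exportable annotation file might look like. This is purely hypothetical and illustrative — no such OpenReader schema exists; the element and attribute names are invented for this example.

```xml
<!-- Hypothetical annotation export; not a published OpenReader schema. -->
<annotations book="urn:example:some-novel">
  <!-- "Class, download my bookmarks and jump to bookmark 7." -->
  <bookmark id="7" target="chapter3.html#para12" label="Start here, class"/>
  <highlight target="chapter3.html#para12" start="0" end="142"/>
  <note target="chapter4.html#para2">Compare with the opening speech.</note>
</annotations>
```

The point is not this particular vocabulary but the requirement: whatever internal form a reader uses, annotations must round-trip through an open XML export like this one.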

I believe OpenReader should think through the implications of what being “open” means — it’s not just the e-book that has to be kept free but your annotations as well.

Why not specify search requirements while we’re at it? Because OpenReader shouldn’t be in the business of specifying features except as they pertain to keeping things open and compatible. (I include requiring interactivity and multimedia support as opening things up.) But specifying sub-pixel rendering? Hyphenation engine? Kerning? Those are not the realm of standards but of feature-table comparisons. You have kerning, you don’t have kerning — that shouldn’t affect compliance with a document format.

What Jon and David have brought to our attention are significant matters. Let’s not lose the focus by trying to legislate e-reader quality, but instead deal with the full implications of what an open e-book format means.


[1] You remember Gemstar, don’t you? There appear to be more abandoned and inaccessible e-books sold by Gemstar, either locked up or lost permanently, than any others on the e-book scene.

8 COMMENTS

  1. Hi, Roger. I’ll let Jon address the details here–other than to express my main concern. The software reader should be for the human readers using it, and that means making sure that the capabilities are present for optimal presentation. Jon may know of situations where the actual document format would affect the end results in ways you and I wouldn’t envision. With OSoft around to raise the bar for other implementers, of course, things such as subpixel font rendering will come whether they’re in specs or not. So I myself would be flexible. But about the possible end results, I won’t be–including those for the print-impaired.

    Meanwhile, if there’s any doubt on the openness of the standards process as I personally envision it, methinks this blog should dispense with ’em 😉 I’m confident Jon would feel the same.

    Cheers,
    David

  2. I think it a waste to say, “Your company is so small that you can’t afford to invent a technology like ClearType, and therefore you can’t be OpenReader compliant,” or some such, when there are bigger issues at stake. That’s what this post is about. And if Jon has some secret knowledge of the sort you attribute to him, please, let him share it.

    As for doubts about the standards process, or the openness of the standards process, I haven’t expressed any here. As for dispensing with them, that would mean to give up those doubts.

    If I thought OEB or PDF, say, were better candidates for what we truly need in an ebook format, I would be pushing to improve them. But I do not think advocating instead for OpenReader inherently means refraining from pointing out its flaws as conceived or managed. Hopefully OpenReader will move from the editorial pages of this blog and other websites and into a standard standards process soon.


    Apologies to early readers. Apparently in adding a rule above the footnote I inadvertently removed the conclusion of this post. I have revised and restored it.

  3. Well, as I said, Roger, even little OSoft has a ClearType equivalent–so the features will indeed come with the territory. That’s why I myself would be flexible on stuff like that.

    > Hopefully OpenReader will move from the editorial pages of this blog and other websites and into a standard standards process soon

    Jon does have mailing lists he’s publicized, but I welcome discussions here as well. Even better, it’ll be great to have a formal process. That’ll come in time. Considering the budget of OpenReader, $0, we’ve come a long way as a true grassroots standards effort. Meanwhile the IDPF continues to get funding from Adobe and the like, and so far the results show it. Our way is harder, and it’s painful not to have the formal infrastructure right now, but we will. Needless to say, we’ll welcome support from companies that want honest standards.

    Thanks,
    David

  4. > Specify one file in non-binary form.

    I’ve said this before, and I’m saying it again: There is no non-binary form. So-called “plain text” is a myth. Inside computers text is always encoded somehow, and since there are many text encoding schemes in use, data can’t be regarded as “just plain text”.

    Mind you, that doesn’t stop idiots from pretending that “plain text” exists. E.g. various linux filesystems completely ignore what encoding file names are in, causing a plethora of i18n woes (e.g. a single path can be using 5 different, incompatible encodings at the same time, and there is no way to know which).

    So why don’t we just use some existing file format, add some encoding info to it, and be happy about it? Because then 90% of the users won’t add it (because they use existing tools that don’t), and we’re right back where we started, except now we have even more text that we don’t know how to decode.

    Take for example the random moron who edits an XML file with his text editor and inserts e.g. an ä in the file, saves it in the platform default encoding (e.g. iso-8859-1) but ignores the “encoding=’UTF-8′” in the xml declaration at the top of the file (or, more specifically, the text editor ignores the declaration). The resulting file claims to be UTF-8 although it isn’t (it’s not even a valid UTF-8 string, so it might even crash the parser). These easy-to-miss errors are what we get when we make new formats that are slightly compatible with existing tools. Happy, happy, joy, joy.. not.
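The failure mode this comment describes is easy to reproduce. A minimal sketch in Python (my choice of language for illustration, not anything from the original discussion): the same document is encoded once correctly as UTF-8 and once as ISO-8859-1 with a declaration that still claims UTF-8. A conforming XML parser accepts the first and rejects the second outright, because the lone 0xE4 byte (Latin-1 “ä”) is not valid UTF-8.

```python
import xml.etree.ElementTree as ET

doc = "<?xml version='1.0' encoding='UTF-8'?><p>\u00e4</p>"

# Honest file: the bytes really are UTF-8, matching the declaration.
good = doc.encode("utf-8")

# Lying file: the editor saved Latin-1 bytes but left the UTF-8 declaration.
bad = doc.encode("iso-8859-1")

print(ET.fromstring(good).text)  # parses fine

try:
    ET.fromstring(bad)
except ET.ParseError as err:
    # 0xE4 followed by '<' is not a valid UTF-8 sequence, so the
    # parser refuses the document rather than guessing.
    print("rejected:", err)
```

A strict parser refusing the mislabeled file is the best case; the worst case is a lenient tool silently producing mojibake that propagates downstream.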

    I don’t see any other decent solution than to make a new file format that is free (both kinds), and new tools that enforce compliance. If a suitable file format already exists, along with tools that enforce compliance, then it could be used, but I haven’t seen such a format yet.

  5. My e-book display is 225 pixels per inch. There are more than 5 times as many pixels in a square inch there as on my laptop, and you cannot improve the lettershapes by sub-pixel font rendering. That’s an artifact of the time when 72 or 96 ppi were (and are) typical screen resolutions and rendering tricks got you slightly better letters. But embedding requirements in an e-book format standard that are not needed with the incoming technology, and are solely a quality issue otherwise, just seems, well, inappropriate.

  6. To me the OEB group is a group of very bright people squatting around under a freeway debating whether the wheel should be square or triangular. There is already a perfectly adequate method for displaying text and graphics which suits hundreds of thousands of developers, millions of authors and billions of readers. It’s been under development for fifteen years, it’s an open system and there are a plethora of gadgets and add-ins available for those rare cases where it doesn’t work straight out of the box. It’s called HTML. I read thousands of words in HTML format every week. So do millions of other people including – I suspect – most members of the OEB group.

    Unless and until the OEB group can demonstrate that there is a genuine market for something HTML can’t do – supplemented with Flash, ASP etc. where necessary – I will continue to regard this all as a tremendous waste of time and talent. A well-designed portable Web browser would be immeasurably more useful to everyone than any new eBook format, no matter how many hoops it can jump through.

    Jon.

  7. In many ways I agree with Jon Jermey, as probably would most of the other long-time contributors to OEBPS. Why? Because OEBPS, as well as OpenReader, is based on the HTML paradigm, but with necessary improvements for use as an ebook-industry-strength digital publication standard. Think of the OEBPS specification as HTML on steroids; and an OEBPS Publication as a “super web site.”

    In the first days of OEBPS development back in 1999, the starting point was to represent ebooks and similar publications with a standards-conformant web site. But as the real-world (not academic) requirements rolled in (from publishers, ebook sellers, the accessibility community, librarians/archivists, etc.), it became apparent that a few additions were necessary. These changes did not materially require a change in existing HTML rendering engines, but simply required a few “front-end add-ons,” mostly of a bookkeeping nature.

    The biggest problem with assembling a set of web pages and calling it a “book” is that there really is no “central” focal point. The so-called “home page” is simply a particular page in the set of pages. How does one identify it in a portable package? Where does one add publication metadata? How does one know what’s actually in the publication as it is being packaged and transported around? And a slew of other questions arising from the hundreds of requirements submitted by real-world folk.

    Thus was invented the OEBPS Package, which is simply a separate XML document that lists and organizes all the HTML content, CSS style sheets, image/multimedia objects, and whatever else comprises the publication. The OEBPS Package is sort of a “control center” of the publication. But the content is still in HTML (as XHTML), the CSS is still CSS, the multimedia objects are still the same multimedia objects used in web pages, and hypertext linking looks and acts just like it does on web sites, etc.
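The “control center” idea described above can be sketched as a package document. The following is a simplified, from-memory illustration of the OEBPS-style package shape, not a verbatim excerpt from the specification; identifiers, filenames, and the metadata namespace here are placeholders.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- Illustrative sketch of an OEBPS-style package, not spec-exact. -->
<package unique-identifier="bookid">
  <metadata>
    <dc-metadata xmlns:dc="http://purl.org/dc/elements/1.1/">
      <dc:Title>Example Title</dc:Title>
      <dc:Identifier id="bookid">example-0001</dc:Identifier>
    </dc-metadata>
  </metadata>
  <!-- Everything the publication contains... -->
  <manifest>
    <item id="ch1" href="chapter1.html" media-type="text/x-oeb1-document"/>
    <item id="css" href="style.css"     media-type="text/x-oeb1-css"/>
  </manifest>
  <!-- ...and the default reading order through it. -->
  <spine>
    <itemref idref="ch1"/>
  </spine>
</package>
```

The manifest answers “what’s actually in the publication?” and the spine answers “where is the front door, and what order do the pages come in?” — exactly the questions a bare set of web pages cannot answer on its own.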

    In building OEBPS (and OpenReader) reading systems, one could use the same rendering engines built for web browsers. In fact, a plug-in to Opera and Mozilla to render OEBPS/OpenReader Publications is certainly possible (that plug-in would have to do some of the required front-end bookkeeping, set up a table of contents, and maintain a “library” of books — things which web browsers don’t natively do now but which the ebook reading public must have!)

    Interestingly, it is the rendering engine (parsing the XML, decorating it with CSS, and then splashing the glyphs on the screen) which is what really does the heavy lifting. Everything else pales in comparison. For example, OSoft’s new reading system to replace its current ThoutReader, which is planned to render OpenReader, OEBPS, etc., will use the Mozilla rendering engine.

    So, in summary, while I understand Jon Jermey’s argument, he may have forgotten, or was not aware, of the HTML pedigree of OEBPS and OpenReader. In the important aspects, we are on the same wavelength!

  8. Well, in addition to a website being representable on the web as, well, a website, it can be represented in a single file as CHM.

    So in a number of ways, Jon J’s complaint isn’t really answered by Jon N. What was true then doesn’t tell us what OEB and OR should be doing now to meet Jon J’s test of utility: What are we getting from a new standard that we can’t manage already in our current environment?

    Annotations and highlighting are among the things that Jon N means when he talks about functions book readers want that aren’t typically provided in browsers alone. As I recall, OEBPS passes by these without comment.

    It’s my strong opinion, voiced often, that now is the time for significant steps, revolutionary rather than evolutionary efforts in an e-book standard. Otherwise, what’s the point? Like Jon J, I think that if OpenReader is only marginally better, that won’t be enough to cause people to adopt it in preference to the alternatives.
