What’s so open about OpenReader?

Is it David Rothman’s contention that every e-book should be able to be opened in more than one e-reader? Laudable as that is, without other incentives it may require divine intervention to prevent OpenReader’s becoming merely an Esperanto of e-book formats in the Tower of eBabel he jeremiads about.

And what’s so open about Jon Noring’s single-author OpenReader spec? Is it his acceptance of most every good idea anyone has proposed for e-books these last seven years? His open attitude to making others’ ideas fit in with his own firm notion of what is needed isn’t exactly the best way to create an industry-wide spec, a criticism that, infuriatingly, he even seems open to.

To my mind, there’s a different “open” in OpenReader worth pushing for. It’s like the “open” in open-source, and the “extensible” in XML. This open means open-ended.

In open-source, the code is available for anyone to add to its capabilities. Users frustrated by a program’s (or an OS’s) limitations are free to fix the problem; they have the code to work from so they’re not stuck. Nothing’s blocking them from improving on it. Being open-ended is why open-source came about.

XML is open-ended as well, letting users choose precisely the terms they need to make sense of their content (data or text). Find the <p> tag too limiting, and need <poem>, <stanza> and <line-of-verse> to mark up your literary efforts? You can make them. And to prevent a Tower of xBabel, proper XML documents are self-documenting: anyone who encounters your file should be able to figure out what a <poem> element is for, even if there’s no schema or DTD accompanying it.
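
To make that concrete, here is a rough sketch of what such markup might look like; no particular schema is assumed, and the element names are just the invented ones above:

<!-- no schema or DTD is needed to guess what these elements mean -->
<poem>
  <stanza>
    <line-of-verse>Two roads diverged in a yellow wood,</line-of-verse>
    <line-of-verse>And sorry I could not travel both</line-of-verse>
  </stanza>
</poem>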

All the e-book formats around are based on HTML or its close relative, XHTML. And as anyone who works with text knows, HTML isn’t rich enough to represent all the things we want to encode in our books — you want <poem> for your book of poetry, and I want <journal-entry> and <letter> for my parents’ book about the year they spent in Cuba in 1938.

I want to use my own markup to publish the journal of that gypsy year, not transform it down to HTML’s level just so it can be displayed in the e-reader. The copy people are reading should be just that: a copy of the master file, not a version with all the non-readable content stripped out. That’s like publishing a screenplay with only the dialog, and no indication of who’s talking, or where, or when scenes change or what the screen is showing.

So the OpenReader I envision doesn’t restrict an OR file to a limited set of tags, the way every e-book spec in creation does. It wouldn’t even start with a set of pre-accepted tags, the way OpenReader seems destined to (yes, the ones in XHTML — what else?). Sure, adding DocBook and TEI along the way will push the limit out to something technical publishers and academics will find roomy enough. But that’s not enough.

No matter how many tags you make, there’ll always be another one you want for some book or another.

I say make OpenReader open enough to handle any tag. Forget specifying which tags are OK in advance. XHTML isn’t enough, and XHTML plus DocBook and TEI isn’t enough, and really, there’s nothing you can add that will be enough for every circumstance. Just require the OpenReader binder to reference a file (CSS? DTD? XSD? RNG?) that tells an e-reader everything it needs to know to render whatever it encounters.
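
Purely as a sketch of the idea (the element and attribute names here are invented, not taken from any actual OpenReader binder syntax), a binder entry might point at both the content document and the resources an e-reader needs to make sense of it:

<!-- hypothetical binder fragment; names made up for illustration -->
<content-document href="cuba-journal.xml">
  <rendering href="cuba-journal.css"/>
  <grammar href="cuba-journal.rng"/>
</content-document>

An e-reader that already knows the vocabulary could ignore the references; one that doesn’t would use them (or a plug-in) to work out how to render the file.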

That’s something XML editors have been doing for at least six years. It’s not like I’m proposing something radical.

Let’s make the open in OpenReader mean open-ended.


PS: I suppose you don’t have to have the rendering file in your package. If the e-reader recognizes XHTML, DocBook or TEI vocabularies, then the binder would reference its built-in knowledge. And even with other vocabularies the e-reader is ignorant of, the package doesn’t have to carry the reference file if an extension/plug-in has been added. Publishers would include a format plug-in with e-book purchases — “Put this in your e-reader plug-ins folder, and your OpenReader-compatible e-reader will render any of our nonfiction books marked up using Historical Event ML.”

Coming up in Part II: A more radical meaning of “open”

11 COMMENTS

  1. Hi, Roger. Jon can answer in more detail when you’re done. I do agree that flexibility is important. Just the same, it’ll be great if consumers don’t have to mess too often with plug-ins, etc. OpenReader as Jon envisions it can handle most any typographical situation that you’d encounter on paper, and to me that’s no small feat. As for OpenReader as Esperanto–well, we’ll be starting right out of the box with a powerful first implementation in the radically redesigned ThoutReader. You’ve seen for yourself what ThoutReader can do, and I hope that others will follow. Thanks. David

  2. Just so there’s no confusion — I’m not saying OpenReader is an Esperanto. I’m saying it could turn into an Esperanto if universality is the only feature it has that people find useful.

    And saying that OpenReader “can handle most any typographical situation that you’d encounter on paper” doesn’t really face up to the issue at hand. How does OpenReader handle something like this line, marked up in Theological Markup Language:

    <scripture passage="Mark 7:16" version="NKJV">If anyone has ears to hear, let them hear!</scripture>

    There’s no option other than to discard the information that it’s a passage of scripture, Mark 7:16, and that the translation is the New King James Version. Doesn’t have to be that way.

  3. Roger wrote:

    <scripture passage="Mark 7:16" version="NKJV">If anyone has ears to hear, let them hear!</scripture>

    This discussion is interesting and important in that it touches upon the relative advantages of requiring a particular XML vocabulary vs. full vocabulary agnosticism.

    Certainly, with CSS “display” (plus XLink as needed for hypertext linking and image/multimedia embedding), it is possible for one to “roll their own” markup vocabulary (and to some extent the associated grammar). The current OEBPS 1.2 Specification allows one to add their own elements to documents — CSS is used to instruct the reading system how the element is to be rendered.

    However, from a reading system perspective, the downside to letting anyone roll their own vocabulary, where CSS is used to “decipher” the vocabulary, is that CSS “display” is quite coarse in describing what something is, even coarser than HTML. For example, display: block only says the element is a block-level element. In HTML, there are quite a few block-level elements whose meanings have been predefined (e.g., <p>, <blockquote>, etc.). CSS does not give us any level of structural and semantic precision. Furthermore, CSS is intended for visual rendering — it was never intended to convey metadata about elements in any standardized, universal fashion.
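
    To see how little that conveys, the stylesheet rule for Roger’s <scripture> example might be nothing more than this sketch; it tells a reading system how to lay the element out, but nothing about what the element is (the particular property values here are arbitrary):

    /* all the reading system learns here is "block, indented, italic";
       the fact that this is scripture, or which passage and translation, never reaches it */
    scripture { display: block; margin: 1em 2em; font-style: italic; }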

    Knowing document structure and semantics is important for various applications, such as text-to-speech, high-quality searching, language translation (where layout conventions may differ from Western conventions), etc. — and of course understandability of the markup by others.

    The accessibility community has been pushing for years for a fixed, high-quality markup vocabulary that unambiguously tells the text-to-speech engine what something is, at least to a level useful for high-quality text-to-speech. Furthermore, if future use by others is important, then markup following some norm is useful. For a minimum set of the structural/semantic items the accessibility community has in mind, refer to the Digital Talking Book 2005 Specification.

    Remember, the blind cannot “see” typographical layout (which helps convey document structural and semantic information) in a linear text-to-speech presentation — to them, the text just comes streaming. (Those familiar with reciting poetry, or even reading books to their children, know that voice inflection, timing, even verbal statements of structural metadata are necessary to convey the structure of a document to the listener. Otherwise it would just be a monotonous stream of words, and it is oftentimes impossible to discern document structure in such a stream without audible cues. Given adequate knowledge of the structure/semantics associated with a snippet of text, the text-to-speech engine can present it aurally in a way that lets the listener know what it represents.)

    For example, let’s say I use a particular markup vocabulary designed to be as compact as possible. I’d consider using one-letter element names, like <a>, <b>, <c>, <d>, etc. Their meanings (document structure; text semantics) are known to me, but unless I communicate somewhere what these elements mean, how will someone (or especially a machine) understand what the tags mean? And CSS is no help, since all CSS does is style — and now we’re back to the issue of inferring document structure and semantics by intelligent analysis of visual presentation. How would a text-to-speech engine know what the <g> tag stands for or how to present it aurally? Even visual CSS will provide no clue other than that it either represents block-level content or is inline.

    There are two answers to this conundrum that come to mind:

    1. Everyone settle upon a few established markup vocabularies, or
    2. Devise some sort of “Rosetta Stone” system which assigns pre-defined document structure/textual semantics metadata to custom elements and attributes.

    In the short-term, OpenReader plans to strictly define the vocabularies that could be used. We are starting off with a structurally-oriented subset of XHTML. But as Roger noted, HTML is actually quite poor at communicating markup metadata. So OpenReader is being designed to allow ready expansion to support other and better markup vocabularies, such as TEI and DocBook (or some well-defined subsets of them.) There are also specialized vocabularies such as NewsML that are of interest. And Digital Talking Book. Even the FictionBook 2 vocabulary is of especial interest.

    The “Rosetta Stone” idea is intriguing. I’ve talked with a couple of accessibility advocates about it, and they are intrigued. It essentially would be a systematic way to apply pre-defined metadata to elements so reading systems would “know” what the elements mean. In the example I gave above, for instance, we might assign the metadata meaning “blockquote” to the <g> tag. The formalism of XSLT might be used for the metadata assignment/mapping, not to effect a real document transformation but simply to tell reading systems (such as text-to-speech engines) what a particular element means. Then anyone who “rolls their own” markup vocabulary would be asked to provide the standardized metadata describing what each tag means in a document structure/semantics sense.
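
    To make the idea a bit more concrete, a mapping entry might look something like this; the syntax is entirely invented for illustration (nothing of the sort has been specified yet):

    <!-- hypothetical Rosetta Stone mapping; element and attribute names are made up -->
    <vocabulary-map vocabulary="my-terse-tags">
      <map element="g" means="blockquote"/>
      <map element="c" means="heading" level="2"/>
    </vocabulary-map>

    A text-to-speech engine that reads such a map would know to treat <g> content as a quotation even though the tag name itself tells it nothing.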

    And since Bowerbird is likely to reply to this, let me note that even though his ZML system employs plain text, the plain text is regularized to a pre-defined set of document structures and semantics. So it is the equivalent of building a particular markup vocabulary (it may even be possible to build an XML-based markup vocabulary equivalent which allows round-tripping from ZML to XML and back again to the original ZML.) From my few looks at ZML, it is superior to HTML in a few ways in conveying important document structures and semantics in a pre-defined standardized manner (in HTML one has infinite ability to assign document structure/semantics in a non-standardized way using the “class” attribute and the <div> and <span> elements.) The question is if it can represent most if not all of the document structures and text semantics supported by the Digital Talking Book (DTB.)

  4. I’m going to chide Jon here — if you know what OpenReader is going to do (“[it] plans to strictly define the vocabularies … starting off with a structurally-oriented subset of XHTML”) before it’s issued, then McCall’s point is telling. If a preliminary draft is going to go out for comment, and there is some way to process those comments and adapt the draft, and establish voting rights, and voting members then approve some or none of the changes, then no one can say with finality what the spec will or won’t include. Authorship of the proposal doesn’t give its author sole approval of it in a community — even Thomas Jefferson only got one vote on his proposal. 🙂

    But I know this is just a slip of the pen as you shift roles from lone visionary to participant in, well, an open form of governance.

    You describe something here, though, that confuses me. In a system like MS Reader with lit files compiled from an OEBPS package, I suppose the piece that renders the page doesn’t know <h1> from <p>.

    But if an OpenReader-compliant e-reader is reading a non-compiled file, wouldn’t it still be aware of a <scripture> tag? And to reiterate what I wrote yesterday, it should be clear what every element in your file is for, even if there’s no schema or DTD.

    (As a side point: Anyone perverse enough to use <a>, <b> and <c> tags in direct contravention to the self-documenting intent of XML seems unlikely to me to supply the additional metadata for a Rosetta Stone approach. Remember that two of the guiding principles of the XML working group were that “XML documents should be human-legible and reasonably clear” and that “terseness in XML markup is of minimal importance” [I quote from the recommendation itself].)

    Users who want more than displaying the words on-screen in page format will benefit from getting the original markup. Why strip it away to deliver it in an OpenReader package? Why have to convert it to TEI, when the explicit rationale for creating a Theological ML is that TEI has proved insufficient for that interest group’s needs? [1]

    One possible approach seems to be “Use XHTML tags for ver 1.0, add DocBook and/or TEI or possibly subsets in ver 1.1, and then 1.2 or 2.0 or some such can permit user-definable vocabularies.”

    Another approach is “OpenReader 1.0-compliant e-readers must be able to display any XML file accompanied by certain designated information (including fallbacks). Period.” OR ver 1.1 might then pre-supply all the necessary information for certain vocabularies such as DocBook or TEI (or maybe subsets thereof), enabling any e-reader maker or user to incorporate those preferred vocabularies. And ver 1.2 might contain … well, you’d better read Part II to see what I expect of that.

    As you can see, I prefer the second approach. But my opinion would only count for one vote, of course.


    [1] The introduction of the ThML defining document at http://www.ccel.org/ThML/ThML1.04.htm explains how TEI among other alternatives is unsuitable.

  5. Roger does make good points, and his chiding of me is also deserved.

    The issue at hand boils down to whether the primary vocabulary used in a document is a well-accepted standard or is “home brewed.”

    Any home-brewed vocabulary makes it much more difficult for user agents (and user agents are not restricted to simply providing visual rendering!) to “understand” the markup.

    For example, text-to-speech engines really, really, really would like to know when a snippet of marked-up text represents a header (and better yet, what level the header is at.) CSS “display” is not sufficient to identify any block of text as a header. It can only say it is a “block-level” tag, and that’s it. (And user agents for other applications will also benefit; for example, text search engines are enhanced when they can differentiate between various important types of document structures, such as section headers.)

    At least with (X)HTML, which defines what the markup tags mean, user agents can be programmed to know that something is a header. (There is the minor issue that someone may use something other than a header tag to represent a header in HTML, but then that’s poor markup practice, not a lack of variety of markup tags.)

    If OpenReader 1.0 is to allow arbitrary vocabularies from the get-go, it must, to meet with accessibility community approval, address and finally solve the issue of “How do we assign document structure/text semantic metadata to elements?” Without that, the resulting free-for-all will make for an accessibility nightmare, and more importantly it cannot be easily undone. And if we rush through some solution, without adequate discussion, review and testing, we may make a big mistake, causing problems when we later try to fix the spec.

    So, as an OpenReader Working Group contributor, I would recommend that we start off conservatively with a well-known, well-defined, even if deficient, markup vocabulary (such as a well-defined subset of XHTML 1.1 where the junk has been cleaned out), and only allow that for the time being.

    After the OpenReader Framework 1.0 is released, we can then begin work on supporting other “standardized” vocabularies, as well as begin the process of “vocabulary agnosticism” and investigate various solution pathways, such as the “Rosetta Stone” method I previously described. This allows us to expand in the future in a controlled fashion.

    But if we just open up the flood gates without careful thought, we may regret it later. W3C is still trying to pick up the pieces of all the bad decisions made in the early years of the HTML spec — the inertia of having billions of quirky, malformed, mis-marked-up web pages, and a huge established tool base to produce poorly structured HTML, is making it difficult for the web to be upgraded to web standards that benefit everyone.

    The bottom line is that it is better to be overly constrictive and open things up slowly and carefully. But if one opens up too much too soon, we may unleash a genie we can’t put back in the bottle. I opt for the former approach so long as it is not so constrictive that it chokes the life out of the spec. Fortunately, OEBPS 1.2 has proven to be quite stable (even if deficient), so using that as a starting point for the next-gen framework, OpenReader, makes a lot of sense.

  6. I think Jon is taking the wrong lesson from the example of HTML.

    If OpenReader is YAEBFBOH (yet another e-book format based on HTML), then you’ll get people creating e-books to the dumbed-down level of the HTML tag set. And thereafter people will create to OR 1.0, just as they did to OEB 1.0, even after improvements are implemented — it seems people will upgrade to higher-numbered software versions but create to earlier specs. (HTML and XHTML are a perfect case in point, but ask Adobe what percentage of PDF or SWF files are created to the latest version. I bet it’s lower than the percentage created to the version that was standard five years ago.)

    If instead OpenReader is the one format that allows authors to preserve the semantics they feel are necessary in their markup, by virtue of being the first one through the technological door OpenReader will become the format of choice for everyone with text that’s one inch outside of HTML.

    And denigrating work DTDs and industry DTDs as “home-brewed” is grotesquely inappropriate — those are the real “consumers” who will benefit from a universal consumer format. DRM issues may really prevent trade publishers from adopting such an openly structured format, but will be irrelevant to anyone not worried about losing sales.

    As for accessibility — yes there are some structures like volumes and parts and chapters and sections and subsections that might all take a <title> element as a child. If that’s confusing (and I concede it may be), it’s easy enough to add to the accessibility requirements that a user be able to learn an element’s parent and grandparent and other ancestors. Or, if you must, require the e-book creator to supply an accessibility map in the package. But apart from a few basic structures, assigning semantics to elements apart from the names their creator gave them is a morass you probably don’t want to enter.

    So really all you’re saying is that <h1>, <h2> and <p> elements are “good” for accessibility because they’re part of such a limited set of tags that you can tell something about their level when they show up. But surely you don’t think that <poem>, <stanza> and <line-of-verse> elements are less useful to someone listening to a TTS engine read a book?

    I would say that the better way to provide accessibility is to stay the heck away from HTML, except as a route for taking material already marked up in it (presumably for the web).

    I agree with the subtext of your comment, that OpenReader will only have one chance to make a first impression. Just seems like we’re drawing opposite conclusions on how to do that. You say, “Don’t rush into it,” with “it” meaning vocabulary agnosticism. And I say, “Don’t rush into it,” with “it” meaning version 1.0.

  7. Again, Roger brings up good points.

    A point I’d like to bring up is that one can’t rely on user agents (especially accessibility-related ones) being intelligent enough, at least in the short term, to figure out and assign pre-defined metadata to “home brewed” tag sets on their own. For example, OpenReader is intended to be international. For <poem>, user agents would also have to recognize variants such as <verse>, and the equivalents in a myriad of languages (German, French, Japanese, etc.). That’s the idea behind the Rosetta Stone proposal — it provides a standardized way (that user agents can easily understand) to assign semantic metadata to home-brewed tags — at least to the ones important for many user agent needs. Refer to the Digital Talking Book 2005 Specification for an idea of the kinds of document structures and text semantics that accessibility experts believe should be identified (and they probably would like more, which the Rosetta Stone system could easily support).

    Regarding using XHTML, the proposed OpenReader variant is a very well-defined subset of XHTML 1.1. It removes the presentational markup present even in OEBPS 1.2, and it forces much better authoring of HTML: it requires XML validity and well-formedness. In addition, it is a good place to start, since OEBPS documents can be upgraded fairly easily (depending on how bad they were in the first place) to the proposed OpenReader variant of XHTML 1.1/Basic OEBPS 1.2. I can guarantee that many coming from HTML will squawk and ask why their favorite tag (usually a presentational tag) isn’t allowed.

    I am not as pessimistic as Roger is that XHTML will hard-wire so much into OpenReader that it will prevent future upgradeability to other vocabularies, and therefore keep us from the long-term goal of vocabulary agnosticism. The current working draft continues to stress that we plan to expand support to other, non-XHTML vocabularies, so user agent developers need to build their user agents from the start with this future expansion in mind (the Mozilla engine is already pretty markup-agnostic, for example). That is, they should not use some brain-dead presentation engine that is hard-wired to HTML.

    So, again, I would not advocate allowing full markup vocabulary agnosticism in OpenReader 1.0 because, frankly, we probably would not get anything out until the end of 2007, if even that. Why? Because it has to be done right, or else there may end up being thousands or millions of OpenReader books done wrongly, which will be an impediment to fixing things. The current proposed path, constrained as it is, does not inhibit the long-term goal of reaching vocabulary agnosticism (after all, XHTML is just another vocabulary), so we start off with a well-defined yet flexible vocabulary. If we totally ditched even the highly-constrained subset of the current OpenReader proposal, we would have to look for another that is stable, and again we would be talking about 2007 or 2008. Let’s get OR 1.0 out, and then begin in earnest to add support for other vocabularies and to build the underlying mechanisms to achieve vocabulary agnosticism. My focus in the 1.0 architecture is to easily allow this expansion, which is something OEBPS 1.x cannot do — it is truly hard-wired to XHTML, as a careful reading of the OEBPS 1.x specs will show.

    (Another argument is that there are a number of work flows producing very detailed markup that will be difficult to directly render in the near future, such as DocBook and various flavors of TEI. The usual approach here is to use XSLT to convert that to XHTML, so OpenReader’s Basic Content Document vocabulary is a good transformation target, at least in the short term.)

    Roger and I are in total agreement about where we want to end up — where we differ is in the short-term and long-term strategy for how to get to the Promised Land. <smile/>

  8. I want to add that after I finish the first preliminary working draft of the proposed OpenReader 1.0 Framework Specification, the working group effort will begin in earnest. We are looking at more formal homes to host the specification work. We are making contacts at IDPF and elsewhere. I am wooing quite a few experts in the XML document community (which is a large and diverse community) to assist with specification development. That is when the full set of requirements will be established, and the work to get to the final OR 1.0 will begin. It is probable there will be many changes to the first preliminary draft — some could be light-years different. This is where Roger and other experts will contribute. There will likely be changes that I don’t agree with, but I do intend for it to be a true, cross-stakeholder community effort. The purpose of the preliminary working draft is to make it a lot easier to start from somewhere.

  9. […] Some good news from TeleRead Have you ever heard that good news comes in batches? Well, sometimes good blog news comes in like the tide, and at TeleRead the tide is in! Take a look at some of the great stuff that’s on topics near and dear to our hearts: * The really open reader * USA Today columnist calls for e-book standards * The iPod IS a threat to listeners’ ears * PDF, PDAs and the reflowability question * Downloading Dickens: Inevitable, or a Fantasy? * IDPF survey of e-book buyers: PDA platform and eReader lead, and, yes, book prices matter * Update on HP e-book reader * Manybooks.net readers use Palm, Pocket PC, desktop PC […]

  10. I have been following the development of OR for a while now, but I really don’t get the whole fuss about homebrewed vocabulary.
    As I understand it (correct me if I’m wrong), OR tries to separate content from format, putting all the content in an XML file and leaving the rendering (that is, the formatting) of the content to the reader and its CSS repository.
    But what we’re talking about here is not some stupid IE that crashes the moment it finds a tag it doesn’t know. In my opinion the solution is quite simple: OR defines a predetermined vocabulary to cover “most any typographical situation that you’d encounter on paper”, thus creating a standard that can be read on any reader, device, etc.
    So far so good, but what happens if the reader encounters a tag it doesn’t know? Well, not much really: it looks into the CSS, where it doesn’t find what it’s looking for, but where a “default” style is waiting for exactly this to happen (a rough sketch of such a default follows at the end of this comment). If you want to make it even more user-friendly, a message pops up and asks the user which style he would prefer to use for the unknown tag.
    This whole issue, however, falls away the moment the publisher ships a customized CSS file along with its content that covers the non-standard tags with all the necessary style info. Again with user-friendliness in mind, as soon as the software discovers an unknown tag it asks the user whether he would prefer to just use a standard style, but also informs him of any alternative CSS files on the system.
    So unless I missed some point here, I don’t see where the big problem should be.
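
    For concreteness, that catch-all “default” could be as simple as this sketch (CSS has no literal unknown-tag selector, so a universal rule stands in for the default, with more specific rules overriding it for the tags the stylesheet does know about):

    /* fallback for any element the stylesheet doesn't otherwise address:
       render it as plain block text instead of refusing to display it */
    * { display: block; margin: 0.5em 0; }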

  11. I agree with you, Paolo. I don’t see the big problem either.

    It may be that you and I are focusing on the block and inline aspects of things, and it’s the more obscure types that could present difficulties.

    Maybe the restriction will be that this non-XHTML/DocBook/TEI vocabulary will have to be represented as a block or inline element and nothing else is permitted. Or, rather, let the e-reader decide whether or not it will attempt to render anything that’s not block or inline.

    If it’s well-formed XML, then displaying the element content in a default font doesn’t seem to me to be a terribly awful fallback for those situations where the e-book contains things the e-reader doesn’t know about.
