“It’s OK for libraries to put things in their EPUB books.” That’s what Bill Kasdorf, a member of the EPUB Working Group, told me last week at the IDPF Digital Book 2011 Meeting. He checked with EPUB Revision Co-Editor Markus Gylling to make sure. I had been curious whether libraries could put all their cataloging information inside an EPUB file instead of siloing it in their catalog systems.

It may seem an odd question if you don’t know a few things about EPUB. EPUB is a standard format for ebooks. It’s used by Apple, Barnes and Noble, Kobo, Overdrive and many others not named Amazon. EPUB is near the end of a revision process that will result in EPUB 3.0.

The EPUB specs define a lot more than just a file format. Both EPUB 2 and EPUB 3 define a container format (in EPUB 3 it’s called the EPUB Open Container Format (OCF) 3.0) and then go on to define a number of formats for the files that go inside this container. These files are the resources (texts, graphics, etc.) that make up the ebook.

OCF uses the ubiquitous ZIP format to wrap all of a book’s resource files into a neat, transportable package. That’s pretty much standard these days; Java “.jar” and “.war” files use the same mechanism. As a consequence, you can use any unzip utility to look inside an EPUB file and manipulate its contents.
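To make that concrete, here is a minimal sketch using Python’s standard-library zipfile module. The function name inspect_epub and the file name book.epub are hypothetical. It lists an EPUB’s entries and checks the one OCF packaging rule that naive re-zipping usually breaks: the first entry must be an uncompressed file named mimetype containing exactly application/epub+zip. It does not check every OCF requirement.

```python
import zipfile


def inspect_epub(epub):
    """Return (entry names, mimetype_ok) for an EPUB path or file-like object.

    mimetype_ok is True when the first ZIP entry is an uncompressed file
    named "mimetype" whose content is exactly "application/epub+zip",
    as OCF requires. (OCF's no-extra-field rule is not checked here.)
    """
    with zipfile.ZipFile(epub) as zf:
        infos = zf.infolist()
        names = [info.filename for info in infos]
        first = infos[0] if infos else None
        mimetype_ok = (
            first is not None
            and first.filename == "mimetype"
            and first.compress_type == zipfile.ZIP_STORED
            and zf.read(first) == b"application/epub+zip"
        )
    return names, mimetype_ok


# Usage ("book.epub" is a placeholder path):
# names, ok = inspect_epub("book.epub")
# for name in names:
#     print(name)
```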

There’s even a reserved name in OCF for a file to contain book-level metadata, META-INF/metadata.xml, as well as another for rights information, META-INF/rights.xml. A third file, META-INF/signatures.xml, can be used to prove who made parts of the file and to determine whether anyone has mucked with them. When Gluejar issues Creative Commons editions of newly relicensed works, we’ll use the rights.xml file to make sure the CC declaration is explicit.
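Because the container is just a ZIP file, adding a file like rights.xml to a DRM-free EPUB amounts to a ZIP append. The sketch below uses a hypothetical helper, add_container_file. As far as I know, OCF reserves the rights.xml name without prescribing a schema for it, so the markup in the usage comment is a made-up placeholder, not a Creative Commons or IDPF format.

```python
import zipfile


def add_container_file(epub, name, data):
    """Append a new entry (e.g. META-INF/rights.xml) to an existing EPUB.

    "a" mode keeps the existing entries, including the leading uncompressed
    mimetype entry, in place and writes the new entry after them.
    """
    with zipfile.ZipFile(epub, "a", compression=zipfile.ZIP_DEFLATED) as zf:
        # Appending a name that already exists would create a duplicate entry.
        if name in zf.namelist():
            raise ValueError(f"{name} already exists in the EPUB")
        zf.writestr(name, data)


# Usage; "book.epub" and the <rights> markup are hypothetical placeholders,
# since OCF does not prescribe a schema for rights.xml:
# add_container_file(
#     "book.epub",
#     "META-INF/rights.xml",
#     '<rights><license href="https://creativecommons.org/licenses/by/3.0/"/></rights>',
# )
```

Appending rather than rewriting the archive is a deliberate choice: it leaves the mimetype entry exactly where OCF expects it.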

The new EPUB revision is coming fast. Last Monday, Bill McCoy, Executive Director of the International Digital Publishing Forum (IDPF), announced the release of the full EPUB 3 proposed specification. My guess is that when we look back on this event 10 years hence, we’ll recognize this as the moment EPUB began to revolutionize the world of information, and with it, the book industry.

Although Amazon still uses the aging MOBI format on its Kindle devices, it seems only a matter of time before the infrastructure accumulating behind EPUB pushes Amazon into the embrace of the IDPF. Already, most of the content flowing into the Amazon system is being produced in EPUB and converted to MOBI. Don’t expect this shift to happen soon, though; in his IDPF presentation, Joshua Tallent of eBook Architects described rumors that this would happen soon as “bunk,” but it will happen sometime.

EPUB 3 comes with lots of goodies. The revision adds several modules of sorely needed capability: MathML, SVG and JavaScript over a substrate of HTML5 and CSS 2.1. While MathML and SVG are essential for the education and technical markets, JavaScript has been somewhat controversial because it is difficult to make sure scripts run securely and work without a network connection. Most of the reading systems inherit JavaScript capability from the WebKit rendering engine they’re based on, so a lot of JavaScript functionality will work in ebook readers regardless.
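EPUB 3 also lets a book declare these features. In the package document, a manifest item can carry properties such as mathml, svg or scripted. The Python sketch below follows META-INF/container.xml to the package document and reports which items declare each of those three properties. feature_report is a hypothetical name, and the code assumes a single rootfile and relative hrefs without percent-escapes.

```python
import posixpath
import xml.etree.ElementTree as ET
import zipfile

CONTAINER_NS = "{urn:oasis:names:tc:opendocument:xmlns:container}"
OPF_NS = "{http://www.idpf.org/2007/opf}"


def feature_report(epub):
    """Map the mathml/svg/scripted manifest properties to the items declaring them."""
    with zipfile.ZipFile(epub) as zf:
        # container.xml points at the package document (assumes one rootfile).
        container = ET.fromstring(zf.read("META-INF/container.xml"))
        rootfile = container.find(f"{CONTAINER_NS}rootfiles/{CONTAINER_NS}rootfile")
        opf_path = rootfile.get("full-path")
        package = ET.fromstring(zf.read(opf_path))

    report = {"mathml": [], "svg": [], "scripted": []}
    opf_dir = posixpath.dirname(opf_path)
    for item in package.iter(f"{OPF_NS}item"):
        # "properties" is a space-separated list and may be absent.
        props = (item.get("properties") or "").split()
        for prop in report:
            if prop in props:
                # Manifest hrefs are relative to the package document.
                report[prop].append(posixpath.normpath(posixpath.join(opf_dir, item.get("href"))))
    return report
```

A reading system, or a library curious about what its collection actually uses, could run something like this to find scripted books before deciding how to handle them.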

Photo: Autography founder and author T. J. Waters

All this capability will remain latent unless people find compelling uses for it. I’m not worried. As BookExpo itself got started, I met two different companies that were manipulating ebook files to solve the same problem: how can an author sign a book when the book is digital? Both companies, Autography and InScribed Media, create personalized experiences that leave artifacts of an author-consumer interaction inside ebook container files. Both have compelling solutions; they differ in their business models. Autography is structured as an author-focused bookstore; InScribed is developing partnerships with existing bookstores.

Photo: InScribed Media founder and author Alivia Tagliaferri

To some extent, InScribed and Autography are forced to be a bit convoluted in the way they deliver their product because they need to live inside DRM green zones; users don’t have access to the files inside books without cracking the DRM (which is rather easy, by the way!). It’s unfortunate, because personalization of ebooks could be a good way to encourage responsible use. I certainly don’t want that picture of me torrenting around the world!
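What the DRM actually locks up is recorded in META-INF/encryption.xml, which lists each encrypted resource in a CipherReference element. The sketch below uses a hypothetical helper, encrypted_entries, to split an EPUB’s entries into listed and unlisted ones. It assumes the URI values are container-root-relative paths without percent-escapes. Obfuscated fonts are also listed in encryption.xml, so “listed” does not always mean DRM’d. OCF requires files such as mimetype, META-INF/container.xml and the package document itself to stay unencrypted.

```python
import xml.etree.ElementTree as ET
import zipfile

ENC_NS = "{http://www.w3.org/2001/04/xmlenc#}"


def encrypted_entries(epub):
    """Return (entries listed in encryption.xml, all other file entries)."""
    with zipfile.ZipFile(epub) as zf:
        names = zf.namelist()
        encrypted = set()
        if "META-INF/encryption.xml" in names:
            root = ET.fromstring(zf.read("META-INF/encryption.xml"))
            # Each encrypted (or obfuscated) resource is named by a CipherReference URI.
            for ref in root.iter(f"{ENC_NS}CipherReference"):
                encrypted.add(ref.get("URI"))
    # Skip directory entries; keep the archive order for the rest.
    plain = [n for n in names if n not in encrypted and not n.endswith("/")]
    return sorted(encrypted), plain
```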

Libraries face a similar dilemma. The insides of an EPUB file could be greatly enriched by libraries, which have every motivation to enhance discovery of both the book and the information inside it. But DRM gives the publisher and its delivery agents the exclusive ability to build context inside ebook containers. Libraries and readers are locked out. I think that for DRM systems to survive they will need to accommodate a more diverse set of user manipulations; author signatures are just the tip of the iceberg.

Coming soon, I’ll report on EPUB 3 metadata.

Via Eric Hellman’s Go to Hellman blog


5 COMMENTS

  1. I don’t see why Amazon will move to EPUB 3.0, now or in the future. Just as EPUB has been updated to expand its capabilities, Amazon can update its AZW format. In addition, Amazon could go to its own HTML5-based format, just as EPUB 3.0 is now largely based on HTML5.

  2. “EPUB is a standard format for ebooks”? On the publishing end, perhaps. On the consumer side, EPUB is a disastrous mess of incompatible “standards”, one that can be (and is) happily ignored by the majority of e-book buyers.

    For all of EPUB’s unquestioned technical advantages, it won’t become a serious player unless it goes DRM-free, universally (which the Big Six publishers won’t go for) or they adopt (and force) a single, universal DRM scheme a la CSS for DVDs (which Adobe and Apple won’t go for).

  3. “But DRM gives the publisher and its delivery agents the exclusive ability to build context inside ebook containers. Libraries and readers are locked out. ”
    The vast majority of libraries are not hosting the content, but are going through Overdrive instead. If you aren’t hosting the content, then you can’t modify it – DRM or not.

    For readers, the metadata, manifest and spine are not encrypted and can be modified regardless of DRM (in either ePub2 or ePub3, btw). So at a technical level it is possible to add content, but this takes tools such as Oxygen and definitely isn’t end-user friendly. Modifying existing content, such as fixing typos, is what is hard to do.

  4. @Peter: I would argue that epub is chosen more often than other formats by users, with the exception of Kindle users, of course.

    From my sales numbers, epubs sell the most, followed by PDFs, and then the rest of the formats. Personally, I choose epub when I can for my personal ebooks, especially if they are not DRM’d (which I try to avoid as much as possible).

  5. Len,
    I agree with you re the not-often expressed idea that Mobipocket (which Amazon bought) may be revising MOBI as well, to improve the layout capabilities to match what ePub 3 can do.

    Amazon also bought Lexcycle, makers of Stanza, focused on ePub. That Amazon could not get the Mobi layout features more up to date would be a curiosity; otherwise, what are those people doing?
    I did read though that there was an exodus of people who headed a lot of the work on Stanza.

    Then there is the HTML5-based work and the web versions of Kindle books they did announce they’ve been working on. I’d think that if publishers can continue to submit in ePub but see much better results from Amazon’s conversion to an updated Mobi, they’d be pretty happy.

The TeleRead community values your civil and thoughtful comments. We use a cache, so expect a delay. Problems? E-mail newteleread@gmail.com.