Small pieces loosely joined: Lessons from Unix for e-book developers
May 28, 2008 | 6:43 am
By Liza Daly of threepress.org
Moderator: Liza Daly, our newest contributor, runs threepress.org, an open source project. See her bio at the end. Welcome, Liza! – D.R.
"Do one thing, and do it well" is the core of the Unix Philosophy. Unix is the third major family of operating systems alongside Windows and Mac OS (which is itself a flavor of Unix), and it’s the platform that serves most of the content on the Internet. Whether you are aware of Unix or not, its software development ideology has had a pervasive influence in making the Internet an open platform not dominated by any one corporate interest.
I’d like to see this philosophy of loosely coupled, single-use tools applied more widely to digital publishing, and e-book development in particular. This is the time for publishers to look beyond the monolithic, closed-source frameworks that have defined conversion and digital workflows to date.
Three tenets in software development can apply here:
1. Most technical problems have been solved before. Start with those solutions and customize only when necessary.
2. Less code is better than more code. Specialized ("domain-specific") languages such as XSLT can dramatically reduce the amount of code that one has to write because they are so tightly coupled to the source XML.
3. Find ways to make all these different programs work together. If a better one comes along, make it easy to switch it in.
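As a rough sketch of the third tenet, assuming a toy text-cleanup workflow (the stage names below are hypothetical and not taken from any real conversion tool), small single-purpose functions can be joined into a pipeline much like Unix programs joined by pipes:

```python
from functools import reduce
import html
import re


def strip_tags(text):
    """Remove anything that looks like a markup tag (a deliberately naive stage)."""
    return re.sub(r"<[^>]+>", "", text)


def collapse_whitespace(text):
    """Squeeze runs of whitespace into single spaces and trim the ends."""
    return re.sub(r"\s+", " ", text).strip()


def wrap_paragraph(text):
    """Escape the text and wrap it in an XHTML paragraph element."""
    return "<p>" + html.escape(text) + "</p>"


def pipeline(*stages):
    """Join stages left to right: the output of one stage feeds the next."""
    return lambda data: reduce(lambda acc, stage: stage(acc), stages, data)


# Each stage does one thing; the pipeline only knows how to chain them.
convert = pipeline(strip_tags, collapse_whitespace, wrap_paragraph)
print(convert("<b>Small   pieces</b>,\n loosely joined"))
# prints: <p>Small pieces, loosely joined</p>
```

Because the pipeline knows nothing about its stages beyond "text in, text out," switching in a better tag stripper means changing one argument to pipeline() and nothing else.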
A lot of this philosophy was reflected in the thinking about the ePub standard:
1. XHTML and CSS already have the vocabulary and software support to display reflowable digital content.
2. ZIP has dozens of implementations in different programming languages and is widely understood.
3. Import from other formats where XHTML and CSS aren’t a good fit: DAISY NCX, SVG, DTBook.
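To illustrate how far off-the-shelf tools already go, here is a minimal sketch that uses only Python's standard library to write and then read an ePub-style ZIP container. The file names OEBPS/chapter1.xhtml and OEBPS/content.opf are illustrative assumptions, and the result is deliberately incomplete: a real ePub also needs the OPF package file and an NCX table of contents, which are omitted here.

```python
import io
import xml.etree.ElementTree as ET
import zipfile

XHTML_NS = "{http://www.w3.org/1999/xhtml}"

# container.xml tells a reader where the package file lives.
# The OPF path is an assumed name; the OPF itself is NOT written in this sketch.
CONTAINER_XML = """<?xml version="1.0" encoding="UTF-8"?>
<container version="1.0" xmlns="urn:oasis:names:tc:opendocument:xmlns:container">
  <rootfiles>
    <rootfile full-path="OEBPS/content.opf" media-type="application/oebps-package+xml"/>
  </rootfiles>
</container>
"""

CHAPTER_XHTML = """<?xml version="1.0" encoding="UTF-8"?>
<html xmlns="http://www.w3.org/1999/xhtml">
  <head><title>Chapter 1</title></head>
  <body><p>Hello, reader.</p></body>
</html>
"""


def build_epub_skeleton():
    """Return the bytes of a ZIP archive laid out like an ePub container."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w") as archive:
        # The container format expects "mimetype" to come first, uncompressed.
        archive.writestr("mimetype", "application/epub+zip",
                         compress_type=zipfile.ZIP_STORED)
        archive.writestr("META-INF/container.xml", CONTAINER_XML,
                         compress_type=zipfile.ZIP_DEFLATED)
        archive.writestr("OEBPS/chapter1.xhtml", CHAPTER_XHTML,
                         compress_type=zipfile.ZIP_DEFLATED)
    return buffer.getvalue()


epub_bytes = build_epub_skeleton()

# Reading it back needs nothing beyond a generic ZIP reader and XML parser.
with zipfile.ZipFile(io.BytesIO(epub_bytes)) as archive:
    names = archive.namelist()
    chapter = ET.fromstring(archive.read("OEBPS/chapter1.xhtml"))

title = chapter.find(f"{XHTML_NS}head/{XHTML_NS}title").text
print(names)
print(title)
```

Nothing in the sketch is specific to e-books: the same ZIP and XML libraries exist in most programming languages, which is the practical payoff of building the format from existing pieces.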
Of course, reader support still lags, and some concerns remain about the ePub standard itself. This is where vendors can meaningfully contribute.
4. Mistakes or omissions at the specification stage don’t have to be the end of a technology. At some point it’s time to just start writing code. Early versions of HTML were terrible: they didn’t address non-textual media at all, and there were dozens of arbitrary semantic tags and no clear distinction between semantic and presentational markup. Yet here we are today.
5. Commercial interests help as often as they hurt: Netscape introduced JavaScript (but also the blink tag); Microsoft released the first major browser with CSS support and invented Ajax (and then left us with IE).
6. Open source is great at low-level tools, server software and standards, but rarely has the same end-user focus that commercial software does.
Both camps contributed to the runaway success of the web.
The goal for e-book developers, then, is to innovate without forgoing standards, address problems with an agile, "fix it in the mix" approach, and above all, keep things flexible. Moving fast will be the best way to prevent the market from being locked up by a single company that may not have the best interests of readers and publishers in mind.
///////
Liza Daly is a software engineer who specializes in applications for the publishing industry. She was the lead developer on major online products for Oxford University Press and has designed reference sites for Columbia University Press, Rosen Publishing and SAGE Publications. Currently she runs a publishing consulting company and is the developer of threepress.org, an open source platform for experimenting with e-books and online reference material.



Comments:
Great article, Liza! And welcome to the TeleRead blog contributors fraternity.
Your assessment of why ePub is designed the way it is, is spot on. I contributed to many of the OEBPS and OPS/OPF/OCF working group meetings starting in 1999, and so understand the many stakeholder requirements and thought processes that went into the specs.
Definitely, leveraging already-established technologies was important in the design of OEBPS for the reasons Liza cites. Why re-invent the wheel without good reason?
The real innovation of OEBPS, which today’s OPS continues, is the “Package” construct (the OpenReader format continued this with the equivalent “Binder”.) During the time OEBPS 1.0 was hammered out in 1999, it was clear from the many requirements (including numerous ones from publishers) that we could not design OEBPS based on the “web site” paradigm. Thus was invented the Package.
(The second most important innovation of OEBPS, which unfortunately is not yet well-known, is the “out-of-spine” feature, now officially called “auxiliary content” in OPF. One of these days I’ll write a blog article explaining this feature and why it needs to become better known and implemented. Interestingly, this feature is trivial to enable in web browsers, and is one reason why leveraging web browsers to render ePub Publications is so intriguing.)
Of course, the Package decision has kept OEBPS in a world somewhat separate from web content and browsers, since the web paradigm does not use anything like the OEBPS Package. Instead, index.html has become the de facto “package mechanism” for most web site “publications.”
Since 1999, Web technologies have evolved and improved, including web browsers. Today we have a richness of web browser technology we did not have in 1999. The real innovators in the web browser space include Opera, Mozilla/Firefox, and the fast-upcoming Safari. This bounty of technology is one reason, among several, why ePub must not ignore the web browser, and as ePub evolves it needs to move closer to the Web world, and not away from it as some others are advocating.
Maybe a full merger is possible in a few short years? I don’t know, but it is certainly worth exploring. I am now exploring a digital publication format that is much more compatible with web browsers, while retaining conversion compatibility with ePub. Possible? Definitely. The current ePub is excellent at representing primarily linear narrative publications. But as soon as one considers other types of digital publications, which tend to be more non-linear and topic-oriented, ePub does not represent those well.
Furthermore, when one looks carefully, linear narrative books comprise only a minor portion of the full digital publication universe. ePub needs to evolve to embrace non-linearity, and the web site paradigm is certainly the best “current technology” way to accomplish this goal at the consumer level since everyone already has a powerful web browser, and web browsers are becoming ubiquitous on nearly all hardware, even cellphones.
No need to reinvent the wheel, as Liza stressed in her article.
Totally agree, Jon! Liza nicely summed up the issues at just the right technical level for the TeleBlog community (we’re a mix of publishing folks, other book-lovers and coders and people who overlap within these categories). We’re always open to new contributors, especially those as thoughtful as Liza. – David
[...] first on the TeleRead blog is up: Small pieces, loosely joined. This reflects my thinking in working with epub these last few weeks and with open source [...]
Mac OS X *is* UNIX. It’s BSD atop a Mach microkernel. HTH, HAND.
–C
Moderator: We’ve made a tweak. Thanks, Cerebus. – D.R.
I’m pretty sure Microsoft Reader’s format is based extensively on MHTML (IE’s way of packaging whole web pages as single files), and I would not be surprised if many of the others look a lot like TEI-C or DocBook once you take off the custom packaging and encryption. When it doesn’t look like e-book vendors are using standard document markup tools, it’s not because they don’t, but because they base their DRM business on the fact that you don’t know how it works.
One of the other traits of Unix was that there never really was a central authority defining what Unix should do; there was a bunch of companies doing their own thing within a set of loose conventions ensuring minimal interoperability.