On the A List Apart website, Joe Clark has written an extremely good, extremely long essay on why HTML-based formats are becoming the new standard for e-books, and what needs to be done to clean that standard up.

Clark points out that HTML “is great for expressing words”—and not just words in websites, but the form of words used for most fiction and some non-fiction books—what Craig Mod called “Formless Content”. Every e-book reader on the market can display some HTML-based formats—everything but the Kindle can do ePub, and the Kindle’s AZW format is just HTML-based in a different way.

Of course, every format decision blocks off other avenues, possible roads not taken. Clark is unequivocal that in advocating adoption of HTML, he may be blocking off new forms of “book” that have yet to be invented. But on the other hand:

I am happy to contribute to the death of “vooks” and other multimedia websites masquerading as books. (I do not want a rectangle of video yammering at me while I’m trying to read.) They’re like animated popunder ads in that no actual “user” wants them, but somebody with an agenda does. Exterminating that species is something to which I am proud to contribute. For other forms of books, advocating strict HTML markup will cause as-yet-unknowable harm.

He then goes into detail about the problems that need to be solved for HTML to succeed as the e-book format of choice. The semantics have to be cleaned up and standardized, so that e-books can be created with valid HTML code. Production methods also have a lot of room for improvement, especially given that the early generations of e-books were created largely out of unproofed scans of paper books.

Clark goes so far as to suggest that manuscripts should be written in HTML, then converted to Word for editing and change tracking, then passed to InDesign. (Though he does admit this point of view is “so optimistic as to be ridiculous.”)

Instead of avoiding errors to begin with, the publishing industry may choose to fix errors after they’re made—but only if authors, especially big-name authors with ruthless literary agents, complain loudly until publishers have entire imprints’ E-books repaired. This will not result in authors writing good strong HTML for new books, but will clean up part of the mess.

After this, Clark goes into considerable detail as to which formatting tasks should be handled by CSS, and which by the reader software. He also talks about changes that should be made to e-book text to make format conversions easier (such as using an en dash with spaces instead of an em dash with none).
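
To give a feel for how mechanical that kind of text cleanup can be, here is a minimal Python sketch of the dash substitution. It is my own illustration, not code from Clark's essay: the function name `normalize_dashes` is made up for this example, and it assumes the simplest possible rule, namely that every em dash, with or without spaces around it, becomes an en dash with one space on each side.

```python
import re

EM_DASH = "\u2014"  # the long dash, usually set without surrounding spaces
EN_DASH = "\u2013"  # the shorter dash, set here with a space on each side


def normalize_dashes(text: str) -> str:
    """Replace every em dash (plus any adjacent spaces or tabs) with a spaced en dash.

    Assumption for this sketch: all em dashes are treated alike, with no
    special handling for dialogue, attributions, or dashes at line edges.
    """
    return re.sub(rf"[ \t]*{EM_DASH}[ \t]*", f" {EN_DASH} ", text)


if __name__ == "__main__":
    sample = "room for improvement\u2014especially early scans"
    print(normalize_dashes(sample))
```

A real conversion pipeline would need more care than this (a dash at the very start of a line, for instance, would pick up a leading space), but it shows why a consistent convention makes automated conversion simpler.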

This article is well worth reading—and in a perfect world, publishers would be taking it to heart. I have to agree that adopting these standards would go a long way toward cleaning up the currently execrable state of a lot of e-book conversion efforts.

Standards would mean that a lot of conversion tasks could be automated, and it would be considerably easier for publishers and self-publishing authors to create e-books out of completed manuscripts.

But standards or not, there is no denying that HTML is becoming a de facto standard for presenting text, simply because it is so widely used on the web and relatively easy to work with. And it will be perfectly adequate for most of the e-texts people read.

It will be interesting to see where it goes from here.

 