My recent article on the non-breaking space paragraph issue created some controversy among book markup experts, many of whom disagree with the idea of using such a paragraph to represent a blank line. Typesetting expert Laura Brady suggested it would be a good idea to attend the BISG Ebook Accessibility Workshop to learn more about proper steps for creating accessibility-focused EPUB files. I’d certainly be happy to if I could, but given that it’s in New York and I’m in Indianapolis, it’s not really in the cards for me at this point. (I’m just glad I’m going to be able to attend BookExpo America next week in Chicago; for a while it was looking iffy.)
The discussion on that post has generated some interesting comments. Nate Hoffelder of The Digital Reader points out that the job of e-reader apps is not to make value judgements about whether a book uses the “correct” formatting—it is to display the book exactly as coded. The non-breaking space paragraphs date back to the early days of e-books, pre-dating the use of CSS and modern rules—but enough old e-books from those days are still out there that readers should nonetheless be able to display such books correctly.
Jim Chapman, developer of Freda, chimes in that EPUB3 is such an expansive standard that to create apps that support it in its entirety would take a huge budget.
This would matter less if we were happy for all e-book reader development to be done by big software enterprises, charging $19.99 a pop for their products. But the market expects its e-reader apps to be cheap (or even free/ad-supported). That necessarily means that they are lightweight simple programs … and as such they can only handle lightweight simple ebook representations. I, for one, would really welcome it if someone came up with an “EPUB0” standard that was just about putting the words on the page, with a decent minimum of support for images, references, tables of contents, titles, chapters, sections, block-quotes, footnotes, and so forth. It would not need to be large or complicated (actually, FB2 format shows the way!). All the rest (colour, alignment, font … ) is something that (in my opinion) we should let book user decide anyway, according to their preferences and reading environment.
Therefore, reading XHTML and applying CSS styles is basically a ‘roll-your-own’ development task, which gives results just as bad as you might expect.
That is why basing EPUB on HTML+CSS was basically a bad decision by the standards-writers: books are not web-sites.
In another comment, Chapman discusses the problems further and adds:
I’m sorry to bang on about this – but I have personally spent months wrestling with this problem, and they have been wasted months. Every time that I have tried to switch Freda over to a Webkit/whatever approach, I’ve eventually hit a brick wall, and had to go back to my tried, tested (and clunky) custom parsing and rendering. I agree with you – it would be nice if this stuff worked. But it doesn’t.
In the original MobileRead thread where I was discussing the question of blank lines between paragraphs, JSWolf pointed out that section breaks aren’t the only reason to have blank lines within e-books. In some cases, people might want to set off text within a section, such as when the text of a sign or other object is quoted and centered.
Meanwhile, I heard back from Scrivener developer Ioa (aka “AmberV”) on the Literature & Latte forum in regard to my question about why Scrivener uses non-breaking space paragraphs to separate sections in the EPUBs it creates. For a bit of background, Scrivener is a text processing app whose users write stories one scene at a time. The scenes are arranged in outline-style trees, so that writers can reorder them simply by dragging them to different positions within the tree. When the book is created, each of those scenes ends up separated by a blank line.
Given that Scrivener keeps those scenes separate anyway, I asked why it separates them with a blank line in the final product rather than semantic markup like <hr />. Ioa explained that Scrivener actually doesn’t create e-books from the section layout directly. Before converting to EPUB, Scrivener translates the Scrivener project into a single rich-text document. It then converts that document to the HTML that goes into an EPUB file. Because RTF doesn’t have a semantic scene separator element, the EPUB created from it doesn’t get one either. Ioa adds:
Have you considered using MultiMarkdown with Scrivener? A lot of what I’m saying here is owing to the limitation of being an RTF based editor and trying to generate clean HTML out of that. MMD works by ignoring all of the rich text stuff and using Scrivener more like a plain-text editor with a simple syntax based heavily on Markdown. MMD itself does not have an ePub generator, but (a) the HTML5 it produces is super clean and semantic, and (b) there is another tool called Pandoc which can take MMD files created in Scrivener and turn them into ePubs—it does a pretty good job of it, too.
I was honestly a little surprised to learn that Scrivener generates its EPUBs from an RTF file. If you’re already creating separate sections within the editor itself, it would seem like it’s throwing information away just to slap those sections together in one document before converting it to an e-book. But I suppose the whole section-based layout is actually meant to help writers organize their work as they write it, not necessarily to determine how the e-book is structured.
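To make the gap concrete, here is a minimal sketch of the kind of post-processing pass that could turn spacer paragraphs into semantic breaks after the RTF-to-HTML step. This is not how Scrivener works; the function name `spacers_to_hr` and the pattern are my own illustration, and the sketch assumes every non-breaking space paragraph is meant as a scene break, which (as JSWolf's point suggests) won't always be true. A real tool would probably use a proper XML parser rather than a regular expression.

```python
import re

# Matches a paragraph whose only content is one non-breaking space, written
# as the &nbsp; / &#160; / &#xA0; entity or as the literal U+00A0 character.
# Ordinary whitespace around it is tolerated; attributes on <p> are allowed.
SPACER_PARAGRAPH = re.compile(
    r"<p(?:\s[^>]*)?>[ \t\r\n]*(?:&nbsp;|&#160;|&#xa0;|\u00a0)[ \t\r\n]*</p>",
    re.IGNORECASE,
)


def spacers_to_hr(xhtml: str) -> str:
    """Replace each spacer paragraph with an <hr /> element.

    Hypothetical helper for illustration only: it assumes every spacer
    paragraph marks a scene break.
    """
    return SPACER_PARAGRAPH.sub("<hr />", xhtml)


if __name__ == "__main__":
    sample = "<p>End of scene one.</p>\n<p>&nbsp;</p>\n<p>Start of scene two.</p>"
    print(spacers_to_hr(sample))
```

Paragraphs that merely contain a non-breaking space among other text are left alone, since the pattern only matches when the space is the paragraph's entire content.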
It seems that the question of proper e-book markup comes down to a dichotomy between simple users and power users. The EPUB markup standard is complex enough that people who know the code can do pretty much anything they want to in terms of e-book arrangement, within the restrictions of the format.
But people who don’t know how to code and aren’t interested in learning can use tools like Scrivener or Calibre to make e-books without having to know any of that sort of thing. And those tools take shortcuts. When you get right down to it, that’s the nature of any tool that simplifies a complicated process. Such tools miss nuance in the name of being “good enough” for most people most of the time.
From that point of view, the MultiMarkdown suggestion isn’t really helpful. As a writer using Scrivener, I only really care that it produces an e-book that looks right to me. If I’m reading an e-book in an e-reading app, do I particularly care whether it was coded with <hr /> or with <p>&nbsp;</p> where a blank line is required? No, I just care that the blank line is there and it looks right. If it looks right to me, why am I going to want to bother to learn more complicated procedures for the sake of it being technically correct in every aspect?
Telling people who use these tools that they should learn to code their e-books properly instead is not helpful. Many writers have better things to do with their time, such as writing. And if Scrivener's output looks good enough to them, why should they have to learn something new for the sake of adhering to a technical standard that will still look exactly the same in most e-readers? (As I noted in my previous column, the popular e-readers that honor the non-breaking space paragraph vastly outnumber the ones that don’t, especially now that Freda has been updated to honor it as well.)
Yes, Scrivener should figure out some way to generate semantically-correct, standards-compliant code for its EPUBs. And I strongly encourage everyone who feels that way to contact Literature & Latte and ask its developers to come up with some way to fix it. But until and unless they do, it’s going to keep right on breaking standards with non-breaking spaces.
As long as quick-and-dirty e-book-creation tools like Scrivener are around, e-reading apps need to be able to support both the books made with such shortcuts and the ones made with advanced tools to be letter-perfect and standards-compliant. That’s why I made a big deal about Freda needing to support non-breaking space paragraphs, and why I’m glad Jim Chapman went ahead and added the feature.
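For the curious, here is a rough sketch of what supporting both styles might look like on the reader side. It is not Freda's code, and I have no idea how Jim Chapman actually implemented it; the class `BlankLineAwareParser` and the function `render_blocks` are names I made up, and the sketch assumes a reader that simply wants a blank line for either an <hr /> or a paragraph containing nothing but a non-breaking space.

```python
from html.parser import HTMLParser


class BlankLineAwareParser(HTMLParser):
    """Collects paragraph text, emitting "" wherever a blank line belongs."""

    def __init__(self):
        # convert_charrefs=True turns &nbsp; into U+00A0 before handle_data.
        super().__init__(convert_charrefs=True)
        self.lines = []       # rendered output, one entry per block
        self._buffer = None   # text pieces of the paragraph being read

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._buffer = []
        elif tag == "hr":
            self.lines.append("")   # semantic break: render as a blank line

    def handle_data(self, data):
        if self._buffer is not None:
            self._buffer.append(data)

    def handle_endtag(self, tag):
        if tag != "p" or self._buffer is None:
            return
        text = "".join(self._buffer)
        self._buffer = None
        if "\u00a0" in text and text.strip() == "":
            self.lines.append("")            # spacer paragraph: keep the gap
        elif text.strip():
            self.lines.append(text.strip())  # ordinary paragraph
        # A paragraph with no content at all is dropped.


def render_blocks(xhtml):
    """Hypothetical helper: return the document as a list of text blocks."""
    parser = BlankLineAwareParser()
    parser.feed(xhtml)
    parser.close()
    return parser.lines
```

Feeding it a chapter that mixes both conventions yields the same blank entry for each kind of break, so the reader of the book never needs to know which one the author's tool happened to produce.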
So, where do we go from here?