Moderator: Our newest contributor is a familiar name, Hadrien Gardeur—co-founder and CTO of the Feedbooks site offering e-books for many devices and in many formats. Feedbooks supports standards for both e-books and the Semantic Web. A hearty welcome to you, Hadrien! – D.R.

EPUB as a standard (it’s actually three standards: OPS, OPF and OCF) is a real step forward for e-books.

I really like how flexible the OPF standard is: with proper fallbacks you can very easily support all sorts of devices, and extend the document. But there’s still room for improvement, and the best way to support EPUB is to discuss how things could be better. For example, how about improved means of linking to other books? Or adding new capabilities to travel guides displayed on a GPS-enabled device?

In the official document for OPS there’s a section named “Future Directions”:

“This specification is designed to take advantage of current practices while preparing for future developments. Although details of subsequent versions of this specification remain to be determined, it is the expectation of the Publication Structure Working Group that continued evolutionary development will occur. The themes driving the creation of version 2.0 of OPS are: standards compliance (e.g. full namespace support), accessibility support, support for a wide-range of XML document vocabularies, enhanced navigation support, and improved content presentational fidelity.

Other themes deemed important for future versions include: more rigorous separation of content and presentation, greater accessibility, better support for international content, Reading Device-specific presentation control and/or Reading Device profiles, enhanced support for inter-Publication linking, layering and managing markups (e.g. inking, highlighting, notes) within Publications, application-specific markup (e.g. math, chemical), multiple reading orders, and support for active content (e.g. multimedia, scripting), all while maintaining alignment with relevant standards. Additionally, maintaining backward compatibility to this version of this specification ought to remain a high priority. Future directions can be tracked at http://www.idpf.org.”

For my first post, I will focus on how we can annotate markup with semantics, and why this is useful in a book through two examples:

  1. linking to another EPUB document
  2. a travel guide on a GPS-enabled device

Right after Adobe released Digital Editions 1.5 beta 1, I had a little chat with Peter Sorotokin from Adobe and the Digital Editions team over at MobileRead. You’ll notice that in every book on Feedbooks we link to additional resources: other books from the same author at the beginning of the book, and recommendations at the end. I wanted DE to handle these links properly, downloading these books directly and adding them to DE’s library instead of going through the browser.

Here’s what Peter told me:

“Yes, it occurred to us before that this would be a useful feature. Here is what I think:

I would not want to download a book if I already have it. Also, an inter-book link may point to a specific section of a target book. So I think we need more information than simply a book download link. Ideally, there should be a reference file inside epub container that would describe target book metadata, so you could match it in your library if it is there, download link(s), so that it can be downloaded if it is not, and a reference to a specific section of the book (if needed). A quotation from the target book would be nice as well, so that we can attempt navigation even if user has different edition by just searching for the text.”

That’s a very interesting point of view, and it’s similar to what IDPF defined in its future directions: “enhanced support for inter-Publication linking”.

But instead of a specific document that would strictly reference other publications, I really believe that we should use something much more powerful: RDF.

Aside from references to other documents, there are thousands of ways we could process semantic information to create an enhanced experience for the end-user.

Now the real question is: should we add inline semantic markup (RDFa), or should we reference all this information in an RDF file inside the container?

From a content producer perspective, I believe that it’s much easier to use RDFa, although it could be possible to pre-process the XML file, parse the RDF and create a separate file.

From a reading system perspective, parsing a separate file is easier, although it shouldn’t be too difficult to parse RDFa either (it’s not that much of a resource hog, compared to CSS and pagination).
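To give a rough idea of how little work the inline approach needs, here is a minimal Python sketch that pulls triples out of a small RDFa-annotated XHTML fragment. It is only an illustration, not a conforming RDFa processor: it handles just the `about`, `property` and `content` attributes, leaves prefixes such as `dc:` unresolved, and skips properties that have no enclosing `about`. The sample markup and the name `extract_rdfa` are my own assumptions.

```python
import xml.etree.ElementTree as ET

# Hypothetical XHTML fragment annotated with a small subset of RDFa.
XHTML = """<html xmlns="http://www.w3.org/1999/xhtml"
      xmlns:dc="http://purl.org/dc/elements/1.1/">
  <body>
    <p about="urn:ISBN:0143039717">
      <span property="dc:title">The Iron Heel</span> by
      <span property="dc:creator">Jack London</span>
    </p>
  </body>
</html>"""


def extract_rdfa(xhtml):
    """Return (subject, property, literal) triples from a tiny RDFa subset."""
    triples = []

    def walk(element, subject):
        # An `about` attribute sets the subject for this element
        # and for everything nested inside it.
        subject = element.get("about", subject)
        prop = element.get("property")
        if prop is not None and subject is not None:
            # Prefer an explicit `content` attribute, else the element text.
            value = element.get("content") or "".join(element.itertext()).strip()
            triples.append((subject, prop, value))
        for child in element:
            walk(child, subject)

    walk(ET.fromstring(xhtml), None)
    return triples


for triple in extract_rdfa(XHTML):
    print(triple)
```

A reading system could run a pass like this once when the book is opened and cache the result, which gives it the same data it would get from a separate RDF file.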

From the end-user perspective, there are three ways you should be able to interact with these sets of metadata:

  1. click on a link defined as an RDF resource
  2. the reading system could highlight some elements, add an icon, and let the user select an action
  3. at the end of the flow, or at the end of the book, the reading system could display a human-friendly list of the RDF resources

Let’s see through two examples how this could work in a book…

Inter-Publication Linking

There are a few ways we could identify this type of resource:

  1. using an identifier (ISBN for example)
  2. with Dublin Core (title, creator, etc.)
  3. using a URI

The best solution would be to use a mix of all three. Here’s a basic example:

<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="urn:ISBN:0143039717">
    <dc:creator>Jack London</dc:creator>
    <dc:title>The Iron Heel</dc:title>
    <dc:language>en</dc:language>
    <dc:date>1908</dc:date>
    <dcterms:hasFormat>
      <rdf:Description rdf:about="http://www.feedbooks.com/discover/epub/2381">
        <dcterms:format>application/epub+zip</dcterms:format>
      </rdf:Description>
    </dcterms:hasFormat>
  </rdf:Description>
</rdf:RDF>

In this example, the subject is the ISBN of the book. We could therefore link a footnote about this book using urn:ISBN:0143039717. The reading system could check the metadata to see if the book is already available on my computer, download the book from Feedbooks, or redirect me to an online shop.
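The lookup described above could be sketched roughly as follows in Python. This is only an assumption about how a reading system might behave, not how Digital Editions or any other reader works: the function names `parse_reference` and `resolve`, and the idea of a library held as a dictionary keyed by identifier, are hypothetical. The block embeds its own well-formed copy of the record so that it runs by itself.

```python
import xml.etree.ElementTree as ET

NS = {
    "rdf": "http://www.w3.org/1999/02/22-rdf-syntax-ns#",
    "dc": "http://purl.org/dc/elements/1.1/",
    "dcterms": "http://purl.org/dc/terms/",
}
RDF_ABOUT = "{%s}about" % NS["rdf"]

REFERENCE = """<rdf:RDF xmlns:rdf="http://www.w3.org/1999/02/22-rdf-syntax-ns#"
    xmlns:dc="http://purl.org/dc/elements/1.1/"
    xmlns:dcterms="http://purl.org/dc/terms/">
  <rdf:Description rdf:about="urn:ISBN:0143039717">
    <dc:creator>Jack London</dc:creator>
    <dc:title>The Iron Heel</dc:title>
    <dcterms:hasFormat>
      <rdf:Description rdf:about="http://www.feedbooks.com/discover/epub/2381">
        <dcterms:format>application/epub+zip</dcterms:format>
      </rdf:Description>
    </dcterms:hasFormat>
  </rdf:Description>
</rdf:RDF>"""


def parse_reference(rdf_xml):
    """Extract the identifier, basic metadata and EPUB download links."""
    book = ET.fromstring(rdf_xml).find("rdf:Description", NS)
    downloads = [
        fmt.get(RDF_ABOUT)
        for fmt in book.findall("dcterms:hasFormat/rdf:Description", NS)
        if fmt.findtext("dcterms:format", namespaces=NS) == "application/epub+zip"
    ]
    return {
        "id": book.get(RDF_ABOUT),
        "title": book.findtext("dc:title", namespaces=NS),
        "creator": book.findtext("dc:creator", namespaces=NS),
        "downloads": downloads,
    }


def resolve(reference, library):
    """Pick an action; `library` maps identifiers to local file paths."""
    if reference["id"] in library:
        return ("open", library[reference["id"]])  # already on my computer
    if reference["downloads"]:
        return ("download", reference["downloads"][0])
    # Last resort: hand title and author to an online shop search.
    return ("search", f'{reference["title"]} {reference["creator"]}')


print(resolve(parse_reference(REFERENCE), library={}))
```

Matching on the identifier alone is the simplest case. A real reader would probably also compare title and creator, since the same text is often published under several ISBNs.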

It would also be interesting to send the user to a specific point in the book. I can’t express this sort of information using Dublin Core, but another ontology could be used for it. Referencing a specific point in a document is very hard, though: using the name of the flow is a very bad idea, and DOM IDs won’t work that well either. I think Peter is right that you should include a quotation. This solution would work across multiple editions of the book, although it wouldn’t work across translations.
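Here is a small Python sketch of the quotation idea, under the assumption that the reading system can get the plain text of each flow. The function names and the sample flows are made up for illustration. Whitespace and case are normalized before searching, so differences in line breaks between editions don’t defeat the match; anything beyond that (changed punctuation, corrected typos) would need fuzzier matching than this.

```python
import re


def normalize(text):
    """Lowercase and collapse runs of whitespace into single spaces."""
    return re.sub(r"\s+", " ", text).strip().lower()


def locate_quotation(flows, quotation):
    """Find the first flow containing the quotation.

    `flows` is a list of (flow_name, plain_text) pairs in reading order.
    Returns (flow_name, offset), where the offset is a character position
    in the normalized text of that flow, or None if nothing matches.
    """
    needle = normalize(quotation)
    for name, text in flows:
        offset = normalize(text).find(needle)
        if offset != -1:
            return name, offset
    return None


# Placeholder flows, not text from a real book.
flows = [
    ("chapter1.xhtml", "It was a   quiet\nmorning."),
    ("chapter2.xhtml", "Nothing happened  until\n the storm arrived."),
]
print(locate_quotation(flows, "until the storm"))
```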

A travel guide on a GPS-enabled device

Now let’s see another example of what we could do using RDF in EPUB. I’ve just downloaded a travel guide on my GPS-enabled smartphone. I’m visiting Paris, I’ve just spent the whole day visiting le Louvre, and I’d love to have something to eat or drink. Flipping through the pages of the travel guide (it’s in EPUB), I’ve just found a place named Angelina: they have the best hot chocolate in Paris. Fortunately, my file is tagged with GeoRSS, so I can click on Angelina and select the right action:

  • tell me how I can go to Angelina from my current position and how long it’ll take
  • or display Angelina on the map
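As a rough sketch of the first action, the Python below reads a GeoRSS Simple point (latitude, then longitude, separated by a space) and estimates the distance from the current position. Several things here are assumptions for illustration: the wrapping `entry` element, the coordinates (approximate values for Angelina and the Louvre), and the 5 km/h walking speed. The distance is a straight line computed with the haversine formula, so a real device would ask a routing service instead.

```python
import math
import xml.etree.ElementTree as ET

GEORSS_POINT = "{http://www.georss.org/georss}point"

# Hypothetical tagged entry; the coordinates are approximate.
ENTRY = """<entry xmlns:georss="http://www.georss.org/georss">
  <title>Angelina</title>
  <georss:point>48.8651 2.3284</georss:point>
</entry>"""


def read_point(xml_text):
    """Return (latitude, longitude) from a georss:point child element."""
    point = ET.fromstring(xml_text).find(GEORSS_POINT)
    lat, lon = (float(value) for value in point.text.split())
    return lat, lon


def haversine_km(a, b):
    """Great-circle distance in kilometres between two (lat, lon) pairs."""
    lat1, lon1, lat2, lon2 = map(math.radians, (*a, *b))
    h = (math.sin((lat2 - lat1) / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin((lon2 - lon1) / 2) ** 2)
    return 2 * 6371.0 * math.asin(math.sqrt(h))


def walking_minutes(distance_km, speed_kmh=5.0):
    """Rough walking time, assuming a constant speed."""
    return distance_km / speed_kmh * 60


current_position = (48.8606, 2.3376)  # roughly the Louvre, as read from the GPS
distance = haversine_km(current_position, read_point(ENTRY))
print(f"{distance:.2f} km, about {walking_minutes(distance):.0f} min on foot")
```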

Conclusion

Let’s add something as powerful and flexible as RDF in the next generation of the EPUB standard. With the right implementation on the reading system, the potential is unlimited. Instead of using a new type of document strictly limited to inter-publication references, it would be very easy to explore the possibilities of RDF/RDFa.


4 COMMENTS

  1. So the URL would need to indicate the fragment of the text corresponding to the link. The ISBN could identify the local copy, which could be checked before trying to find the book externally. Is it possible to use a hyperlink now which could open up an e-book at a specific spot (with Digital Editions or Mobipocket)?

  2. Hadrien,

    I’m very glad to see a discussion of linking! I’ve pondered the issue myself here, and you bring up interesting additional concerns. While RDF is great at describing relationships and metadata, when it comes to linking, I think the URI approach is the most important one for the IDPF to address. Beyond identifying the resource itself, a URI should be able to:

    -identify any DOM element in the publication, WITHOUT relying on a unique ID
    -identify any range of elements or characters in the document
    -identify any COLLECTION of ranges of elements or characters in the publication
    -identify any point in the document (corresponding to a unicode point)
    -identify a fallback mechanism if the resource has changed or been removed: SUBSTITUTE, SEARCH, or REDIRECT

  3. Well, honestly, I think that in this case a URI is mostly an identifier. All the things that you listed could be represented as triples. You can’t strictly rely on a URI approach for linking: in some cases, you’ll have to use the Dublin Core metadata (author, title, etc.) describing the resource to find and launch the right download; in other cases, you might use the resource provided through dcterms:hasFormat, and so on.
    You could list all these attributes somehow directly in a URI, but it’s much cleaner to use something like RDF for this.
    The real challenge of linking is the existence of multiple editions of a book. If you know which edition you’re linking to, it’s pretty easy to use DOM elements, for example, to identify everything. Unfortunately, most of the time, you won’t have this sort of information.

  4. I like the possibilities of RDF but it seems heavy to me right now. My own experience with parsing it for practical reasons has not been good or fun. From a user agent perspective, it’s not lightweight enough, IMO.

    So you would propose a way to link to an RDF contained within the OPS? And within that would be information about highlights, bookmarks, ids, or whatever?

    My problem comes from having a purely web-based perspective, I think. I want epub to be something that browsers can run with without much overhead, and right now, many browsers don’t know what to do with RDF, and if you want to have them parse it, you’re talking about lag times.

    It would be nice for people, for example, to be able to have a list of hyperlinks on their blog or home page, each of which links into an OPS structure, and perhaps describes passages or collections of passages. That would be more in keeping with the state of the web right now.

    Re: multiple editions — this comes down to UUIDs or ISBNs, and back to URIs, right? The spec states that each package should have a unique identifier and a specified type for that identifier, so whatever it may be, each edition of a book should have a new one.

    I really like the fact that the spec for OPS allows for alternate formats of a book to exist within the package itself, such as PDF/HTML/XHTML versions and partitioned/non-partitioned or graphical/non-graphical versions all existing in one package. But I do think different editions should be separate packages.
