“The slow diffusion of e-books is related to hardware issues, issues concerning software, the e-book format chaos and unresolved conflict concerning intellectual property and DRM systems,” says Terje Hillesund, a Norwegian e-book expert.

In his paper “OpenReader, Critical Text Editions and Digital Libraries,” he describes how OpenReader could serve academia as well as other application areas. I’ll reproduce the paper in full in the “read on” part of this post. The paper not only makes recommendations for OpenReader but also provides valuable links to some interesting academic digitization efforts. Exactly the kind of feedback the OpenReader Consortium needs!

Whether you’re in the newspaper business or a medical journal publisher, we’re eager to hear from you about e-book formats or what you’d want in e-book-reading software. And if you want to write at length about the important issues for OpenReader, so much the better. You can also participate in our e-mail lists.

OpenReader, Critical Text Editions and Digital Libraries

Terje Hillesund
University of Stavanger
June 2005

Introduction

These reflections are based on a seminar held at AKSIS at the University of Bergen in May 2005. In the seminar a group of Norwegian and French researchers discussed critical text editions: encoding, reading and user agents. Although I refer to my own and others’ presentations and to the group discussions, this is not a résumé of the seminar; it is rather an afterthought in which I focus on my own concern: the potential use of OpenReader in relation to critical text editions and digital libraries.

The need for a universal reader format

In my presentation in Bergen I discussed the relatively low sales of e-books and the moderate use of free e-books in digital libraries, referring to an e-book report from The Oxford Text Archive. The slow diffusion of e-books is related to hardware issues, issues concerning software, the e-book format chaos and unresolved conflict concerning intellectual property and DRM systems. For teachers and researchers there is the additional problem of quality assurance (Project Gutenberg has been criticized for erratic quality). In my presentation I especially pointed to the lack of common, interoperable reading formats and reading applications as a problem for many digital libraries.

As part of my talk I presented the vision of OpenReader: the effort to establish an interoperable, open-standard end-user format (a common reader format) and to develop an open source user agent (a reading application) that can be used on various platforms and on computer displays of different sizes. I argued that a widespread OpenReader System would ease some of the e-book problems, especially those connected to the format chaos and to the proprietary and protective instincts of many of the players in the field (both hardware/software manufacturers and publishing companies).

I also spoke of OEBF and OEBPS and the original plan of developing a standardized reader format. By demonstrating MS Reader and ReaderWorks I showed how this plan had been torpedoed. In ReaderWorks you can import OEBPS-conformant documents, but the moment you want to make an e-book you must use the “build e-book” command, which wraps the whole thing into the proprietary LIT format. Often copyright protection is added as well. I said that OpenReader had taken up the original plan of OEB, hoping to free e-book producers and readers from the proprietary wrapping trap.

As part of the presentation I invited the research group to reflect on how an OpenReader system could be useful for critical text editions. I got several comments, and my experiences from the discussions are summarized here.

Different modes of reading

First of all, it was pointed out that there is a great diversity of reading practices and many ways of reading. The digitization of text and the use of the Web have generated a series of new reading modes. These include reading in connection with searching and browsing, and reading as part of new interactive and communicative forms: chat, discussions and web logs. For the sake of argument we divided the concept of reading into two main categories relevant to the discussion.

Intentional reading is concentrated reading of literature, journal articles and textbooks; it is a primary activity we perform in order to be entertained, receive information or learn something. This form of reading is usually sustained over a period of time. Intentional reading is what we usually think of when we talk about reading; it is a reading concept heavily influenced by the dominant book culture, a culture internalized through upbringing and education. In the discussion we also called the intentional form of reading “primary reading”.

My presentation evidently gave the impression that the OpenReader System, with its emphasis on typographical quality, flexible layout and paging, was designed to facilitate intentional reading, first of all sustained reading of books. I guess this impression is partly correct. One of the main objectives of OpenReader is clearly to create formats and software suitable for sustained reading of lengthy texts. For some critical text editions intentional reading is a highly relevant use of the texts. However, during the Bergen seminar it became quite clear that for many critical text editions intentional reading will be but one of several uses of the text corpuses.

In addition to intentional reading we can speak of functional reading, as when we handle different kinds of content, for instance when we search text databases, browse the Web for information or write and produce texts. In these instances reading practices are parts, although integral parts, of wider activities, such as finding, studying or creating texts. Most of our screen reading is functional reading: the reading is an integral part of performing complex tasks. The uses of critical text editions are often dominated by functional reading. Researchers do close readings of texts in order to study word forms, grammatical constructions, literary styles or philosophical arguments. Different tools are used for zooming in on or searching text, or for doing concordance or statistical studies.

If we think of reading practices as a continuum moving from purely intentional reading (such as reading suspense novels) to purely functional reading (such as reading while writing), it is obvious that OpenReader is meant to cover the intentional end of the continuum. But the success of OpenReader may also depend on its support for reading practices that involve certain forms of functional reading.

Potential use of OpenReader by critical editions

During the Bergen seminar several critical text editions were presented and even more were referred to. The uses of the corpuses differ substantially: for some editions OpenReader will be irrelevant, whereas it might be interesting for others. The presentations and discussions suggest that the potential use of OpenReader by critical text editions depends on four factors: to what degree the editions are meant for primary reading in the first place, to what degree the editions are properly marked up, to what degree OpenReader integrates tools related to functional uses of the texts, and to what degree OpenReader permits direct access to whole text corpuses.

In The Norwegian Newspaper Corpus (comprising 350 000 000 words), the editorial texts of six major Norwegian online newspapers are gathered (once a day, using w3mir), automatically tagged in XML and stored in an ever-growing database. New Norwegian words are registered as they occur, published and linked to their original web pages. The database is primarily used (through a web browser interface) by lexicographers and linguists for statistical studies of word use, for instance of foreign influence on the Norwegian language. This kind of statistical use of text corpuses lies at the far functional end of the reading continuum, and here OpenReader is irrelevant.

Another type of text edition is based on images, that is, digitally reproduced photographs of handwritten, typed or printed pages (of manuscripts, letters, official documents, newspapers or books). The Library of Congress American Memory provides many examples of collections of photographed manuscripts (for example The Abraham Lincoln Papers). Many collections of medieval and ancient manuscripts are also reproduced photographically (for example at the Bodleian Library, University of Oxford). These collections are not for intentional reading; historians, literary scholars, linguists and medievalists use them for closer study. The user interface is often an ordinary web browser or, in some cases, Adobe Reader or DjVu.

Obviously the OpenReader user agent is not an image viewer, but the more of the verbal text in a critical edition that is properly encoded, the more relevant OpenReader becomes. In France a group of researchers is currently publishing digital versions of the 19th-century Lyon newspaper “L’Écho de la Fabrique”. The weekly newspaper appeared for a few years during an important revolutionary period beginning in 1832, and one issue of the paper is now “republished” in a critical version each week. The text of “L’Écho de la Fabrique” can be read online, and image versions of the issues can be studied using DjVu. As the text volume increases, an OpenReader e-book version of the newspaper could be an interesting publication format. In an e-book, images of the paper would probably serve a merely illustrative function.

At the University of Bergen there are three major text encoding initiatives, the oldest being “The Wittgenstein Archives at the University of Bergen”. The Austrian/British philosopher Ludwig Wittgenstein published only one short book in his lifetime (“Tractatus Logico-Philosophicus”), but he left over 20 000 pages of manuscripts. This “Nachlass” has been encoded in its entirety, first in the markup language MECS and later in a TEI-guided XML version. Oxford University Press has published the whole corpus on CD-ROM. In this “Bergen Electronic Edition” one can study Wittgenstein’s writings in three versions: photographic reproductions of the written pages (facsimile), diplomatic transcriptions and normalized text versions. When working with a document in one mode, one can easily access the other versions. The manuscripts are thoroughly indexed and the whole corpus is fully searchable.

The kind of extensive use of a text corpus that the “Bergen Electronic Edition” offers is beyond the scope of OpenReader. But the Wittgenstein Archives are discussing the future use of the corpus. In the past, several edited printed editions have been produced on the basis of the encoded texts. OpenReader versions of such book editions would be an interesting alternative, given all the advantages of e-books over paper books.

The Medieval Nordic Text Archive (MENOTA) is also based in Bergen. This is an enterprise in which every single word of old Nordic manuscripts is encoded according to a TEI DTD in three different versions: text facsimiles, diplomatic transcriptions and normalized texts (using Icelandic characters). To be able to present the texts on screen, the project has developed several new digital characters (and glyphs) for which it seeks Unicode recognition. The use of the corpus is not yet decided, but it will be aimed at experts (few people are familiar with the Old Norse language). Web browsers will probably be chosen as the user interface. Users will be historians, archaeologists and literary scholars, but mostly linguists who will study word formation, grammatical constructions and the history of the language. The facsimile and diplomatic versions are not meant for intentional reading, and the normalized versions will attract few people who want to read the texts purely for their own sake. OpenReader is thus of little relevance.

Collections of classical literature

The situation is different with “Henrik Ibsen’s Writings”. Henrik Ibsen was an influential Norwegian playwright, and in this project all first editions of his plays and poems are marked up in a modified version of TEI, along with Ibsen’s letters, his other writings and a large collection of commentaries. This critical text edition will be published both digitally and in printed book form. The book edition will consist of 30 volumes: 15 volumes of text and 15 volumes of commentaries.

How the edition will be used digitally is yet to be decided. The hope is that the entire corpus will be freely available to the public, but the form has not been decided. In its presentations the project uses the following publication model:

[Figure: Henrik Ibsen’s Writings, publication model]

Even though Ibsen’s original language follows an old-fashioned Danish style, most of the texts are very well suited for intentional reading (which is why they are being published in print in the first place). Open-standard, interoperable e-book versions of the 30 volumes would therefore be an interesting alternative, provided the format permits high-quality presentation of plays and poems. E-book publications of single plays would also be very useful. If OpenReader materializes, it could be a very good alternative for the Ibsen edition.

At the seminar in Bergen it was also made clear that “Henrik Ibsen’s Writings” wants the digital collection to be used for study and research. For students and scholars to be able to compare works and follow themes and ideas through Ibsen’s entire authorship, it will be necessary to search the whole corpus. Cross-linking between books, especially between the commentaries and the literary works, would also be very useful. The OpenReader encapsulation system, which wraps books into rather closed entities with little network integration, makes OpenReader unfit for these kinds of studies. In addition to a possible use of OpenReader for intentional reading, “Henrik Ibsen’s Writings” will have to find other end-user solutions for research activities.

The situation is different for digital libraries, such as The Oxford Text Archive and the Electronic Text Center at the University of Virginia Library, which hold collections of classical literary works in e-book formats. These collections of free e-books are meant for pupils, teachers, students and the general public, and they are used first of all for intentional reading. The texts are marked up in XML, and today the libraries present the e-books in a variety of formats, most of them proprietary and some limited to certain operating systems. For digital libraries of this kind a widespread OpenReader format would be both liberating and advantageous.

Conclusion

The potential use of OpenReader by critical text editions will differ. Some text databases are mostly for statistical analyses; other critical editions are collections of images of manuscripts. Yet other text collections are primarily made for research, in which access to the entire corpus is essential. For all these critical editions OpenReader will be of little use.

For other editions OpenReader might be useful. The more a critical edition is dominated by lengthy texts and literary works, the more useful OpenReader will be. OpenReader will have its greatest potential with digital libraries that keep collections of literary works.

Some critical editions (not mentioned above):

Canterbury Tales Project
http://www.cta.dmu.ac.uk/projects/ctp/

Lyrical Ballads – an electronic scholarly edition
http://www.rc.umd.edu/editions/LB/

The Digital Nestle-Aland project, Universität Münster
http://nestlealand.uni-muenster.de

The European Society for Textual Scholarship
http://www.cta.dmu.ac.uk/ests/

The Victorian Web
http://www.victorianweb.org/

The William Blake Archive
http://www.iath.virginia.edu/blake/

The Walt Whitman Archive
http://www.whitmanarchive.org/

Victorian Women Writers Project
http://www.indiana.edu/~letrs/vwwp/

The Women Writers Project
Brown University
http://www.wwp.brown.edu/

Goethes Werke im WWW
Weimarer Ausgabe
http://goethe.chadwyck.com/

The Augsburg Web Edition of Llull’s Electoral Writings
Universität Augsburg
http://www.math.uni-augsburg.de/stochastik/llull/
