Some public domain books online are nightmares for academics: they’re full of errors and in some cases even combine print editions without leveling with readers. Do other people care? Well, actually, if you go by the results of a survey of eBook Community list members, they do, and a lot. Granted, the list isn’t necessarily representative of readers at large. But here are the results:

– I prefer the public domain etexts I read to be faithful to an authoritative printed edition, 21 votes, 52.50%
– I have no preference, 14 votes, 35.00%
– I don’t know (need further clarification on the poll question), 5 votes, 12.50%
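
For anyone who wants to check the arithmetic, here is a minimal sketch in Python that recomputes the percentages from the vote counts listed above. The names `votes`, `shares`, and `poll_shares` are made up for illustration; only the counts come from the poll.

```python
# Vote counts as reported in the poll results above.
votes = {
    "faithful to an authoritative printed edition": 21,
    "no preference": 14,
    "don't know": 5,
}


def shares(counts):
    """Return each option's share of the total vote as a percentage."""
    total = sum(counts.values())
    return {option: 100 * n / total for option, n in counts.items()}


poll_shares = shares(votes)
for option, pct in poll_shares.items():
    print(f"{option}: {votes[option]} votes, {pct:.2f}%")
```

The counts add up to 40 respondents, so the leading option's 21 votes is a narrow majority of a small, self-selected sample.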

Luckily, through means such as Distributed Proofreaders, the main supplier of digitized text for Project Gutenberg today, the public domain community is far better about accuracy-related matters than it has been in the past. Still, I can’t wait for the day when images of all the important classics–not just the texts alone–are online.

18 COMMENTS

  1. Rather pointless, this one.

    Would you rather have a car that might crash, not care, or have a car that will not crash?

    Now I venture to bet that HUGE numbers of those polled will want a car that will not crash. But I also bet that many people in the real world would (and do) choose cars based on other factors (sex appeal, speed, economy, luxury, style, etc., and price) … and that safety is either far down the list or not even considered.

    Free books appeal to real readers based on price. Look at the pirate groups.

  2. Thanks for your thoughts, Pond. Slightly flawed books can be fine for casual recreational reading, but in the long term, won’t it make sense to go to a little extra trouble and do things right–in terms of both accuracy and format? Someday digital books will be the norm. Shouldn’t we care about our cultural heritage? If the poll is representative–I don’t know if it is–others just might feel the same way. – David

  3. Would you rather have a car that is an exact reproduction of the original designs, or one that is largely built according to the design, but with undocumented changes made according to later insights and allowing for the builder’s skills and experience? That is the question here.

    Yet even that comparison is flawed. What some of the scientific community is interested in is artefacts. Some scholars would like to know everything there is to know about a certain print copy of a book that has existed at some point in time, because they would like to be able to compare it to other print copies of the same book. Darwin, for instance, is known to have revised his books in between print runs. It is interesting to study the differences, because they show how Darwin’s thoughts on a certain subject may have changed over time. It is furthermore interesting, because you want to be able to refer people to an author’s most matured thoughts on a subject, or you want to avoid revisionist history. Project Gutenberg has got an uncensored edition of “De Zoon van Dik Trom” (a children’s book). Apparently, none of the modern versions have the bits that were censored by the Nazis during their occupation of the Netherlands.

    These discussions rage at Project Gutenberg from time to time, and I always get a little bit miffed at how PG’s output gets portrayed as failing considerably, even by some of its volunteers. It is of course encouraging to see how volunteers want to improve the process, but it is discouraging to see that our products are being called utterly worthless, because they cannot be traced to a single edition. Yes, such strong opinions exist about PG etexts, and some of those who hold them are PG volunteers.

    From the poll: “The phrase “faithful to an authoritative printed edition” means the etext quite accurately reproduces the original text from a specific printed edition which both scholars and amateur enthusiasts highly recommend as being a good representation of the Work.”

    My position in the most recent of these discussions at Distributed Proofreaders: some scholars will never accept anything that comes out of PG “as being a good representation of the Work”, because it would undermine their position as gatekeepers.

    And of course, “good representation” is a pretty meaningless phrase. You need to define the sort of use you want to make of a work (or its copy), and you need to define what representation will be acceptable for that use. The suggestion that all scholars have exactly the same needs is preposterous.

    You can never please everyone. You may please some, but Project Gutenberg already does so admirably.

  4. In addition to accuracy of transcription of a Work (which implicitly includes faithfulness to the original) there is the issue of using an authoritative representation of the Work.

    For example, there exists an online edition of Sir Richard F. Burton’s translation of the “Perfumed Garden of Sheik Nefzaoui.” The problem is, the online edition is not the original “Perfumed Garden”. Instead, it comes from a significantly censored, pirated edition (dated ca. 1913) that for some reason is more common than the original. I’ve tried telling the folk administering the archives that the work is not the real “Perfumed Garden”, but I never heard back from them, nor did they either remove the title or at least add a disclaimer. The point is that once a book is digitized and placed online, it gets a life of its own, and even if greatly flawed will permanently persist. This is exacerbated when the digitized book says nothing internally about where it came from. Is this the way to carry our Public Domain into the digital era?

    Another example is that in sampling only one work from a major online public domain archive, Networker (an eBook Community regular) determined it to be identical to a modern, apparently edited edition published by a major publisher (the edits are probably covered under copyright, although some will argue.) Now this does not prove the text was derived from a copyrighted work (maybe the publisher simply adapted an older edited version), but because there was no metadata associated with the digitized version (no idea what the source book was), the end-user does not know the pedigree of the work vis-a-vis copyright. Nor do they have any idea of the potential edits done to the work because there’s no way to determine that in the digital realm, at least at this time.

    I could go on to describe more examples, but they are troubling to me and quite a few other people I’ve talked to. Some may believe the digital public domain will correct itself over time, so no need to make an issue out of it. But I’m not so sure myself. Rather, I see a digital repeat of what we historically see with many Works, where scribes significantly alter the original as it passes through their hands, and after a few generations the original Work has been so altered as to be quite different. And it may actually be worse in the digital realm, because of both greater persistence and, without metadata, the inability to recognize pedigree. At least with a paper book one has a built-in way to determine pedigree (by bibliographic analysis).

    When a public domain digital text says only “title” by “author”, how do you know it is a faithful transcription from an authoritative source? Do you care if it has been significantly edited from the original, or the original was itself not a faithful representation of the Work in question (like the “Perfumed Garden” example)? Maybe you don’t care if you are told this information, but what if all you are told is “title” by “author”? Do you care then? Do you just assume that it must be faithful and authoritative?

  5. I whole-heartedly agree with Branko that some scholars will never be pleased no matter how rigorously the digital transcription is done. But my focus in bringing up this topic is not from a scholarly perspective, but from a de facto preservation and end-user (consumer) perspective. Somehow this distinction is being missed (maybe because of the incorrect assumption that only scholars care about faithfulness and authenticity, with which I disagree).

    I won’t cover the “de facto preservation” aspect (which I do in the “Book People Forum” posts of the last few days).

    From the consumer side, I believe that when readers download a digital book which has for metadata only “title” by “author”, they implicitly assume the work is a true representation of the given title (the “Work”). It’s amazing how much trust is shown by those who download digital works, especially if the works come from a source they’ve heard about. They assume that if it comes from (fill-in-the-blank) that the Work is acceptable.

    The reason for my “poll” is to try to understand better what end-users of public domain texts really want, and in particular how important the issues of faithfulness and authenticity are to them. It is very important to me, but maybe my position is in the minority. The “poll”, however, suggests that my view is not in the extreme minority (as I’ve noted, I don’t consider the results the final word). More people care about faithfulness and authenticity than many of the pioneers of digitizing the public domain have assumed. That they may trust the pioneers does not indicate that they don’t care about faithfulness and authenticity.

  6. Authority presumes trust, and trust requires identity. You must either have access to the physical book that an etext was derived from, or you must trust the provider that the book did derive from the source that s/he listed.

    Project Gutenberg does not provide editions in the sense that you cannot claim that a certain copy you have is The Project Gutenberg Edition. Too much caution has led to a policy at PG to remove information about the source edition from its etexts, but a little over a year ago this policy was reversed. It wasn’t a hard policy to begin with, because PG’s white-washers (the volunteers that form the last barrier to books entering the PG database) could decide for themselves whether they would let this information through.

    The main reasons for not mentioning publisher and copyright date were different, though. PG does not want to create the impression that books are still copyrighted (which a mention of “copyright 1912” at the start of an etext might create), and it wants to stay out of legal quarrels with publishers over their trademarks.

    For the past three years, the bulk of Project Gutenberg books has come from the Distributed Proofreaders, who as a whole are much more concerned about preservation than perhaps the original PG volunteers. We tend to leave in information about who published the copy we work from, and when. If you trust us, we are authoritative to you.

    There are, of course, many intermediate levels of access to a book that may lead to more or less trust. A physical copy of a rare book may be locked up in a university’s vaults. I may need to trust the scholar who says he saw phrase X on page Y, because I may not get physical access to the physical copy itself.

    For me, having free (both as in beer and speech) access to the book or scans thereof or the services of the conservator goes a long way in trusting the source. When a university, especially one that is funded from public money, only allows non-commercial distribution of its book scans, or asks money for access to its physical copies, it presents me with a secondary motive for giving access to its artefacts. It means I can trust the source less. In that sense, recent Project Gutenberg etexts may be even more authoritative than their Alma Mater based counterparts, at least to me. We have no other motive than the desire to make documents available for dissemination.

  7. Great discussion above. Re Branko’s comments below…

    > For the past three years, the bulk of Project Gutenberg books has come from the Distributed Proofreaders, who as a whole are much more concerned about preservation than perhaps the original PG volunteers. We tend to leave in information about who published the copy we work from, and when. If you trust us, we are authoritative to you.

    DP is doing great things and will be (even) greater when the resources are there for images to be available. I like DP’s approach of leaving helpful info about parent works.

    Happy holidays,
    David

  8. Branko makes excellent points. He mentions Distributed Proofreaders, which I have come to regard (and admire!) as the preferred system to digitize public domain texts. The main reasons are that they take a more preservationist approach (that is, they aim for faithfulness which I do believe is not only a moral imperative but also what consumers want), plus their process is more open/transparent to the public, where multiple people participate and (hopefully) check each other’s work out. This makes DP’s work product much more trustworthy.

    Hopefully DP will be able to make their page scans publicly available as well. Since the Internet Archive will gladly archive them once a job is finished, this is no longer an issue. I’ve volunteered to assist DP in getting their scans online (as I am able to help), but there are a few technical and resource-related issues which are delaying DP taking the time to start this (with almost 8000 etexts, it is a major endeavor, even if the scan sets are not to be “normalized” in any fashion, such as uniform filenaming and metadata).

    It’s been difficult to talk about the “trustworthiness” of a lot of older public domain etexts since I know some to many of them are, if they are to be studied closely, authoritative and faithful (and thereby trustworthy if proven to be such.) I’ve met a few of the people who have laboriously typed in the books by hand and carefully proofed them. They’ve poured their hearts and a significant portion of their lives into the etexts, to make sure they are done right. I trust their work. But the problem is not a failure on their part, but the system that they contributed to which was not committed to the basic principles of AFT. How will people in the future know if their work is trustworthy? How can they tell? Does the source metadata exist so we can determine authenticity and faithfulness if that is our desire? Does the etext metadata even mention who did the transcribing? (The last would be nice if somehow a particular person can establish themselves to future generations as being a careful, meticulous and trustworthy “scribe”. But I fear this will not happen.)

    This, again, is the power of Distributed Proofreaders. By its nature and production processes it has an eminently higher level of trust than text conversion projects done by individuals working alone. Plus, the fact that they have saved the scans (to eventually place them online), are dedicated to faithfulness of text reproduction, and have sufficient metadata simply adds to the trustworthiness of their work product. Project Gutenberg should be proud that a significant portion of its etexts comes from DP, and hopefully over time nearly all of them will come from DP and similarly run projects.

  9. well, the noring/rothman spin machine finally has this “debate”
    in a place where they can control the spin of it, how convenient!

    if people want to see a more well-rounded treatment, they can
    visit the archives of the bookpeople listserve, which are found at:
    > http://onlinebooks.library.upenn.edu/webbin/bparchive
    you’ll find most posters are not in accordance with the spin here.

    you can look at the messages made around this time (december 7),
    or find the matter discussed numerous times (too many to count!)
    in the past. noring still hasn’t seemed to gather even enough people
    around his position to form a group to start to “solve” this “problem”.
    i wonder why that is?…

    -bowerbird

  10. I am sure the average reader would like their (free) ebooks to be “authoritative” editions, but in most cases they have no clue what constitutes an authoritative edition, whether it’s a PG text, a Penguin classic, or an edition printed by one of the mass market American publishers in 1902. Quick, outside of a few SF buffs and English Lit majors, how many know that there are two primary editions of Frankenstein? What’s that? Zero Percent? We have a winner! (Oh, and don’t get me started on My Ántonia.)

    Furthermore, I can’t see anyone getting excited about the results of a poll of 40 people out of a potential group of 3000, where the participants were not chosen at random, and the number that agreed with Noring’s pet position was barely over half.

    As far as Distributed Proofreaders is concerned, they’re not interested in doing “authoritative editions”, they’re interested in doing accurate conversions to electronic texts of the editions to which they have access. I’m sure that’s been the intent of all the rest of the Project Gutenberg volunteers as well.

  11. I appreciate Bowerbird’s reply, and invite him to directly address where he disagrees with my various positions on the general topic of AFT, and do so in an objective, rational, and non-ad-hominem manner. (E.g., his “noring/rothman spin machine” mentioned above is an example of an ad hominem attack, which has no place in any civilized discourse.) Bowerbird is very intelligent and capable of reasoned discourse.

    His characterization of the thread on Book People is also greatly skewed, and I also welcome everyone here to read the various messages posted there. Bill Janssen and Mark, for example, provided insightful posts (and they disagreed with some of my positions). Most of my private email has been supportive in one way or another (with some respectful disagreement), but it is clear a lot of the people are lurkers and some are scared away by the fierce ad hominem tone being directed against me by a couple of folk simply because I’m pointing out that, in my opinion, the Emperor is only partially clothed. Arguments that are directed at me (my motives, my deficiencies, of which of course I have many, etc.) are totally irrelevant in discussing this topic.

    I’m a firm believer that discussion should and must be totally rational, objective, and pointed at the arguments, and not at the person, nor their motives. I hope Bowerbird does likewise and adds to this discussion. If he chooses not to, that’s his decision, but please drop the ad hominem stuff, ok?

  12. i invite people to visit the bookpeople thread,
    and make their own judgments on the issues.

    doing a “debate” here in the comments section
    of a blog where spin doctors control the front page
    is close to worthless, especially since it will shortly
    go off to the “archive” section anyway…

    -bowerbird

  13. Yes, I agree that the blog comments section is not the best place to discuss this further.

    With respect to the claim of ‘spin doctoring’, that of course is very subjective. (You must love Bill O’Reilly! *laugh*.)

  14. sorry, i don’t know who that is.
    is it someone on television?
    i don’t watch much television.

    except for desperate housewives
    and grey’s anatomy on sunday,
    and lost and invasion on wednesday.

    my other favorite is boston legal,
    but it’s on tuesday now, when
    i have a prior standing engagement.

    anyway, jon, if you prefer the term
    “outspoken dedicated spokesbulldog”
    to “spin doctor”, do please feel free to
    substitute it in at the opportune time. :+)

    -bowerbird
