Here, from the Wall Street Journal via LISNews. It’s a good read, and I hope the WSJ will in time go on to discuss another matter: the optimal presentation of the scanned books. WSJ excerpts:

Ms. Ridolfo is part of a massive undertaking to digitize the world’s books. She is one of about a dozen scanners employed by the Internet Archive, a San Francisco nonprofit group that is spearheading the Open Content Alliance, a consortium of business and educational groups that includes Microsoft Corp., Yahoo Inc., Hewlett-Packard Co., Adobe Systems Inc. and several university libraries.

The group wants to build an online library of millions of old books and hopes to make a big batch accessible through Web searches as early as next year….

During each shift, Ms. Ridolfo sits in a soft office chair in front of an apparatus designed by engineers from the Internet Archive. The machine is about six feet tall and five feet wide, and is largely covered with a black tarp to keep out light. Ms. Ridolfo places each book on a V-shaped tray beneath two sheets of glass, also in a V-shape. Two digital cameras hang above her, mounted on brackets linked to the rest of the machine. The camera over her right shoulder is angled to snap photos of the left page; the camera at her left shoots the right page….

The Internet Archive’s effort to get books online is still in its early stages. In the little more than a year since the group started scanning books, it has digitized just 2,800 books, at a cost of about $108,250. Funding has come largely from libraries that have paid to have their texts digitized. Work will likely speed up now that Microsoft and Yahoo are on board; both companies joined the effort in October. Microsoft has pledged to pay for the scanning of about 150,000 books from collections at the U.K.’s British Library and elsewhere, and Yahoo will fund the scanning of 18,000 American classics at the University of California….

On a recent morning, Ms. Ridolfo and a fellow scanner, LaJolla Young, worked quietly…

Ms. Ridolfo scanned her first book — an early 20th century copy of works by William Shakespeare — in about 40 minutes. Then she encountered her toughest assignment of the shift: the book about English authors, which weighed 10 pounds. The most vexing part came between pages 364 and 365, where Ms. Ridolfo found a copy of a lengthy, two-sided letter by Robert Louis Stevenson, written in cursive. Mr. Young, a more experienced scanner, helped Ms. Ridolfo figure out how to position the book on her scanning machine to capture a clear image of the entire letter.

6 COMMENTS

  1. we don’t need to “discuss” the “optimal presentation
    of the scanned books” — and we most assuredly do not
    need to have the wall street journal lead that “discussion”
    — we need _actual_working_programs_to_evaluate_…

    but hey, since you now have a real live programmer
    who will be creating your “openreader”, you’re welcome
    to spout your thoughts about “the optimal presentation”,
    so long as we don’t have to wait _too_ long for an app.

    so what are your thoughts?

    what do you think “optimal presentation” looks like?

    what will “openreader” look like? what is its interface?

    i think thoutreader is a nice program. but is it “optimal”?
    i don’t think so. i think it’s a long way away from “optimal”.

    or, how about the viewer-program that the o.c.a. unveiled?
    do we think the interface there gives “optimal presentation”?

    examination of that viewer shows it’s geared toward _images_.

    will open-reader be able to display books that are image-sets?
    if not, how would it leverage the o.c.r. that is being done?

    you’re dodging a lot of hard questions with a simple link to
    the openreader website. which, by the way, looks very nice.
    not that i thought there was anything wrong with the old one.
    but it’s good to see you “got” mambo (or whatever it’s called
    now, depending on which side of the fight you came down on).

    -bowerbird

  2. First of all, there are definitely hard questions to ask on various topics, including OpenReader, but I don’t believe Bowerbird asked any hard questions of OpenReader in his comment (though he did ask a hard question of the Open Content Alliance folk). And it is not right to imply David was intentionally misleading.

    Second, as David explained nicely, the OpenReader Format is intended to be an open standard digital publication format. Anyone can build a user agent to render OpenReader Publications and compete with others.

    That OSoft is the first to commit to the OpenReader format does not mean they are the last. For example, Lee Passey, within the OpenReader Consortium, is starting an effort to build “Orca,” an open source, cross-platform OpenReader user agent. Lee has already posted a preliminary general design document to get things started.

  3. a file-format won’t guarantee “optimal presentation”.
    and any number of file-formats can _be_ “optimal”.

    so if you want to talk about “optimal presentation”,
    it doesn’t do you any good to point to a file-format.
    or, for that matter, to an app whose interface is vapor.

    i think the matter of “optimal presentation” is vital;
    but it doesn’t seem you _really_ want to discuss it.
    which is fine. as i said, we need working apps more…

    -bowerbird

  4. I have no difficulty in discussing what is meant by “optimal presentation”, but this comment area is not the best place for an elaborate discussion. Better to have it on Book People or another public forum designed for multi-party discussion.
