
Distributed Proofreaders

Distributed Proofreaders is not just a piece of software; it is also a string of communities using that software, some by themselves, some heavily connected.

Currently, the Distributed Proofreaders software is being used by three different projects: Project Gutenberg in the USA for mostly English texts, Project Rastko in Serbia and Montenegro for European languages and Project Madurai (in Australia?) for Tamil documents.

Having looked at the output of Distributed Proofreaders (DP) last time, we’ll now explore the human side of the mechanics of DP.

The three projects use different versions of the Distributed Proofreaders software. This software contains the PHP code that is needed to set up a site for distributed proofreading, and various extras, such as a font optimized for proofreading, a proofreading quiz, guidelines, et cetera. This entire package is published under the GNU General Public License, which means you can do with it what you want, but if you want to further distribute it, you have to abide by a small set of rules.

Because the three projects use different versions of the software, visiting their sites is like travelling through time. The Project Rastko version has many extras related to proofreading with UTF-8 characters. The UTF-8 encoding makes it possible to correct texts in many non-Western scripts.
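To make the UTF-8 point a little more concrete, here is a minimal Python sketch. It is not taken from the Distributed Proofreaders code, which is written in PHP; the sample strings and the helper name `utf8_roundtrip` are made up for illustration. It only shows that Cyrillic and Tamil text pass through the same encoding as plain English text, at the cost of a few more bytes per character.

```python
# Illustrative only: sample strings and the helper name are assumptions,
# not part of the Distributed Proofreaders software.
samples = {
    "English": "Proofreaders",
    "Serbian (Cyrillic)": "Пројекат Растко",
    "Tamil": "மதுரை",
}


def utf8_roundtrip(text):
    """Return (code points, encoded bytes, whether decoding restores the text)."""
    encoded = text.encode("utf-8")
    return len(text), len(encoded), encoded.decode("utf-8") == text


for label, text in samples.items():
    chars, size, ok = utf8_roundtrip(text)
    print(f"{label}: {chars} code points, {size} bytes, round-trip ok: {ok}")
```

Latin letters take one byte each, Cyrillic letters two and Tamil characters three, but all of them survive the round trip unchanged, which is what a proofreading site needs when it stores and redisplays page text.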

All this talk of technology hides the fact that these installations are used by very real people, who are sometimes fighting the software, sometimes enhancing it, sometimes finding uses for it that were not foreseen.

The leader of Project Rastko, Zoran Stefanovic, calls the site at www.pgdp.net the mothership. This is where Charles Franks’ original proofreading site is hosted. Over 15,000 people have registered there, although only a core group of some 500 people is active. They are organised in competing Teams, though to be honest, as a member of Team Non-Competitive I really have no idea how much these teams compete.

There are forums where the relative merits of different approaches to proofreading are discussed. Currently, the big debate is about the intelligent proofreader. Over the past year, the mothership has been in the grip of a push for higher quality, with the development of the site’s software lagging behind. The squirrels, the nickname for site developers and administrators, are currently unable to crank out improvements that will automatically lead to better output. And so, for better or for worse, more is expected from the proofers, or so the proofers feel.

The term squirrel was coined by Thierry Alberto, after a site outage prompted a proofreader to ask what caused it. He suggested that the squirrels that were running the treadmill that powered Franks’ server had had a little too much of their own moonshine.

In order to increase the proofreaders’ involvement with their site, Special Days, themed proofreading occasions, are organised from time to time. And from time to time I lament these occasions, because I see unfulfilled PR opportunities in them; why is R.L. Stevenson day used to proof instead of release R.L. Stevenson books? But of course, the Distributed Proofreaders are right; these days are not for the readers, but for the proofreaders.

Recently we had Proof Like a Pirate Day, as part of the international Talk Like a Pirate Day, and tomorrow there will be the DP Monster Bash for a third time in a row. The volunteers will be engaging with the scariest books they can find; and for the third year in a row, there will be a book about diseases of the horse. Don’t ask.

The proofreaders have all sorts of guidelines, organized in task-related books that have grown to the size of airplane manuals. These things did not use to be so large, but the desire to retain more of the original formatting of a book, and to resolve ambiguities where possible, has led to growth.

Often a new rule is introduced by the Project Managers, those responsible for shepherding an etext through the various rounds at Distributed Proofreaders (scanning, OCR, proofreading, formatting, post-processing). Sometimes proofreaders vehemently disagree with these rules imposed by one-book dictators, and will work on other texts, or engage in discussion with the Project Managers. The rules that survive this vetting process and that are generic enough will end up in the guidelines.

As noted, there aren’t enough developers to keep the site’s development in tune with the wishes of the proofreaders, and so the latter often develop low-tech means of getting what they want. Examples of these are several pools, such as the HTML pool, the OCR pool and more. These are forums where people can ask others to perform a certain task for them, such as OCRing a set of page scans.

And so the people of Distributed Proofreaders make the site and its software come alive; by constantly changing it, reshaping it, using it.

This is the third installment in a series of introductory articles on Distributed Proofreaders. The previous installments were: 5 Years of Distributed Proofreaders, and Things the Distributed Proofreaders come across. Distributed Proofreaders is a system for automating and distributing the task of proofreading etexts.
