FamousPlagiarists.com–and an e-book angle
June 23, 2005 | 1:43 am
By David Rothman
H.G. Wells (pictured), Jack London, Martin Luther King, Doris Kearns Goodwin, Stephen Ambrose, Osama bin Laden–all have this in common: accusations of plagiarism. I’d add more links, except that, at least on my system right now, FamousPlagiarists.com is so bloody slow. The fault might be at my end. Anyway, FamousPlagiarists.com is worth the hassle for those who have time. Quick, credit where due: I found FP through The Real Paul Jones.
The e-book angle: Imagine word-crunching all the Project Gutenberg texts to see the extent to which the greats stole from each other.



Comments:
And let’s not forget about Frank Abagnale, the subject of the recent “Catch Me If You Can” with DiCaprio and Tom Hanks. I really liked that movie because it clearly showed what a clever man can do with his plagiarist skills.
Oh, but that is one of my favorite movies! Clearly Frank Abagnale’s plagiarism of his victims’ signatures fell outside fair use guidelines.
I agree that word-crunching all the Project Gutenberg texts to look for plagiarism would be interesting. Current plagiarists are apparently blissfully unaware that the probability of detection is growing rapidly, and that it will be nearly unavoidable in the future. The corpus of documents available in electronic form is enormous and rapidly expanding. Even primitive tools such as “Google” give a preview of the speed and power of search on huge databases. First-generation companies specializing in detecting plagiarism, such as turnitin.com, already exist.
Intellectuals have been remarkably tardy in demanding that the Library of Congress electronically scan its entire collection for easy and universal access. Certainly, this should be done immediately for all documents in the public domain. For full-text searching and indexing purposes, it should also be done for all documents, even those that are copyrighted.
Optical character recognition is an imperfect technology, but it allows the extraction of searchable text from scanned printed documents with 98 or 99 percent accuracy. This is adequate for finding verbatim and near-verbatim plagiarism of text passages through the use of flexible approximate matching algorithms. (It will not, however, catch extensive paraphrasing.)
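To make the "flexible approximate matching" idea concrete, here is a minimal sketch in Python. It is only an illustration, not how any real plagiarism detector works: the function names `normalize` and `best_match` are made up for this example, and the technique is plain character-level similarity from the standard library's `difflib`, slid across the source one word at a time. The point is that a typical OCR slip (such as "rn" for "m") lowers the score slightly instead of hiding the match.

```python
import re
from difflib import SequenceMatcher


def normalize(text):
    """Lowercase the text and keep only letters and digits, separated by single spaces."""
    return " ".join(re.findall(r"[a-z0-9]+", text.lower()))


def best_match(passage, source):
    """Slide a window the length of `passage` (in words) across `source`.

    Returns (score, window_text) for the most similar window, where score
    is difflib's ratio between 0.0 and 1.0. Hypothetical helper for
    illustration; it is far too slow for a real corpus.
    """
    p_words = normalize(passage).split()
    s_words = normalize(source).split()
    target = " ".join(p_words)
    width = len(p_words)
    best = (0.0, "")
    # If the source is shorter than the passage, compare against it once.
    for start in range(max(1, len(s_words) - width + 1)):
        window = " ".join(s_words[start:start + width])
        score = SequenceMatcher(None, target, window, autojunk=False).ratio()
        if score > best[0]:
            best = (score, window)
    return best


if __name__ == "__main__":
    # Illustrative public-domain snippet and a copy with simulated OCR errors.
    source = ("It was the best of times, it was the worst of times, "
              "it was the age of wisdom, it was the age of foolishness")
    scanned = "it was the worst of tirnes, it was the age of wisdorn"
    print(best_match(scanned, source))
```

A passage would be flagged when its best score clears some threshold. Where to set that threshold is a judgment call that would need tuning on real scans, and, as noted above, none of this catches extensive paraphrasing.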
New automated tools will help shame thieving miscreants by comparing candidate documents against all the documents in a super-corpus which includes the current web, archives such as the “Wayback Machine” at archive.org, and the printed texts in the Library of Congress. Historians will be able to judge the “originality” of historical figures and previous historians with perhaps fascinating revelations.
(Note: I did not plagiarize this comment. I wrote it back in January 2003 at another website and thought it would be appropriate here also.)
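Comparing one candidate against a whole super-corpus would presumably need an index rather than document-by-document comparison. The sketch below shows one common way to do that, under stated assumptions: it indexes overlapping runs of four words ("shingles") and ranks corpus documents by how many runs they share with the candidate. The names `shingles`, `build_index`, and `rank_sources` are hypothetical, the run length of four is arbitrary, and the two-document corpus is a stand-in for illustration only.

```python
import re
from collections import defaultdict


def shingles(text, n=4):
    """Return the set of overlapping n-word runs in the text, lowercased."""
    tokens = re.findall(r"[a-z0-9]+", text.lower())
    return {tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)}


def build_index(corpus, n=4):
    """Map each n-word run to the set of document ids that contain it."""
    index = defaultdict(set)
    for doc_id, text in corpus.items():
        for gram in shingles(text, n):
            index[gram].add(doc_id)
    return index


def rank_sources(candidate, index, n=4):
    """Return (doc_id, shared_run_count) pairs, largest overlap first."""
    counts = defaultdict(int)
    for gram in shingles(candidate, n):
        for doc_id in index.get(gram, ()):
            counts[doc_id] += 1
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))


if __name__ == "__main__":
    # Tiny stand-in corpus; a real one would hold millions of texts.
    corpus = {
        "wells": "The Time Traveller (for so it will be convenient to speak "
                 "of him) was expounding a recondite matter to us.",
        "london": "Buck did not read the newspapers, or he would have known "
                  "that trouble was brewing.",
    }
    index = build_index(corpus)
    candidate = ("Our host was expounding a recondite matter to us when "
                 "Buck did not read the letter.")
    print(rank_sources(candidate, index))
```

Exact shingles are fast to look up but brittle against OCR errors, so a real pipeline would likely use the index only to shortlist documents and then apply a fuzzier comparison to the shortlist.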