
In my article on “Consumer Demand for Pirated eBooks”, I showed that Google Trends data tells a very different story from the one that anti-piracy services vendor Attributor derived from the very same data. I did not comment, however, on the headline that Attributor gave for its press release. The key finding of the report heralded by that release was that “Daily demand for pirated e-books can be estimated at 1.5-3 million people worldwide.” This result has garnered significant attention because the number is so large.

Extracting numbers with the tools Attributor used is rather involved, and it has taken me a while to examine the available data carefully. After doing this work, I’ve concluded that when Attributor wrote “can be estimated at 1.5-3 million”, they left out the word “blindly”. As far as I can tell, Attributor is recklessly inflating the magnitude of ebook piracy; using the very same traffic measurement tools, I estimate the true figure to be about 10% of the number they claim.

The Attributor numbers come from data generated by Google’s AdWords service. AdWords is designed to help advertisers select advertising keywords and to manage budgets. For example, AdWords will tell you that the keyword “PDF” is used in approximately 101 million searches per month, worldwide, or 3.32 million searches per day. “PDF” is a keyword that a searcher might use in the course of a search for a pirated ebook, so you could reasonably assume that some percentage of these searches involve a consumer looking for a book they can avoid paying for. The trouble with this assumption is that most searches that include “PDF” have nothing to do with ebooks.
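Converting AdWords' monthly volume to a daily one is simple division. Here is a minimal Python sketch, assuming an average month of 365.25 / 12 ≈ 30.4 days. The article does not say which divisor was used, and the constant and function names are illustrative only.

```python
# Assumption: an "average" month of 365.25 / 12 ≈ 30.44 days.
# Neither AdWords nor the article specifies this divisor.
DAYS_PER_MONTH = 365.25 / 12


def monthly_to_daily(monthly_searches: float) -> float:
    """Convert a monthly search volume (as AdWords reports it) to an average daily volume."""
    return monthly_searches / DAYS_PER_MONTH


# "PDF": about 101 million searches per month worldwide, per AdWords.
pdf_daily = monthly_to_daily(101_000_000)
print(f"{pdf_daily / 1e6:.2f} million searches per day")  # 3.32 million searches per day
```

Dividing by a flat 30 days instead would give about 3.37 million per day, so the choice of divisor does not change the order of magnitude.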

Another AdWords tool designed to assist Google advertisers is the keyword suggestion tool. In practice, you use this tool to refine keywords. Here is a table of the top ten refined searches for “PDF”:

[Screenshot: top ten AdWords keyword refinements for “PDF”]

Of these, it’s reasonable to assume that some percentage of the “pdf free” searches, and a smaller fraction of the “pdf download” searches, come from consumers trying to avoid paying for books. The other searches are clearly unrelated to books. The keyword suggestion tool can also be used to refine these estimates further. My review of over 700 refined keywords indicates that at most 4% of PDF searches, or about 132,000 per day, come from people looking for ebooks of any kind.
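The refinement review amounts to a weighted tally. You sum the volumes of the refined keywords judged ebook-related and divide by the total. Below is a minimal sketch, assuming the refinements and their volumes have been exported into a dictionary and the ebook-related keywords labeled by hand. `ebook_fraction` is a hypothetical helper, not part of any AdWords API. The hard part, deciding which keywords count, is left to the human reviewer.

```python
def ebook_fraction(refinements: dict[str, int], ebook_related: set[str]) -> float:
    """Share of total refinement volume coming from keywords labeled ebook-related.

    refinements: refined keyword -> search volume (same period for all entries).
    ebook_related: keywords a human reviewer judged to indicate ebook searches.
    """
    total = sum(refinements.values())
    if total == 0:
        return 0.0
    flagged = sum(volume for kw, volume in refinements.items() if kw in ebook_related)
    return flagged / total


# Applying the article's upper bound (at most 4%) to the daily "PDF" volume.
PDF_DAILY = 3.32e6
EBOOK_SHARE_UPPER_BOUND = 0.04
pdf_ebook_daily = PDF_DAILY * EBOOK_SHARE_UPPER_BOUND
print(f"{pdf_ebook_daily:,.0f}")  # 132,800
```

The result, 132,800, matches the article's figure of about 132,000 to within rounding.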

Rapidshare is a “file-locker” site, so its name might be expected to appear in searches for illegally distributed files. A review of AdWords’ suggested refinements for the term “rapidshare”, however, reveals that searcher interest in ebooks is negligible compared with interest in movies, TV, music and games. Of 743 suggested keywords, only one, accounting for 0.24% of “rapidshare” queries, or about 4,000 per day, is clearly related to ebooks:

[Screenshots: AdWords keyword suggestions for “rapidshare”]
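The article gives the ebook share of “rapidshare” queries and the resulting daily count, but not the total daily volume for “rapidshare” itself. The two stated figures imply that total. A quick Python check follows. The variable names are illustrative only.

```python
# Figures stated in the article for "rapidshare" refinements on AdWords.
EBOOK_SHARE = 0.0024   # 0.24% of "rapidshare" queries
EBOOK_DAILY = 4_000    # about 4,000 ebook-related queries per day

# If 0.24% of all queries is about 4,000, the total is about 4,000 / 0.0024.
implied_rapidshare_daily = EBOOK_DAILY / EBOOK_SHARE
print(f"{implied_rapidshare_daily:,.0f}")  # 1,666,667
```

That implies roughly 1.7 million “rapidshare” searches per day, of which ebooks account for only a sliver.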

Although direct interest in ebook torrents is so small that AdWords can barely measure it (about 1,500 searches per day), torrent search sites give us another way to estimate the magnitude of interest in pirated ebooks. According to “KickassTorrents”, recently active torrents had this composition:

[Screenshot: composition of recently active torrents on KickassTorrents, by category]

About 1.4 million searches using the keyword “torrent” are made on Google daily, according to AdWords. If the distribution of searches mirrors the distribution of files, searches for ebook torrents would number about 46,200 per day.
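The torrent estimate assumes that searches are split among content categories in the same proportions as the files. Working backward from the article's two numbers gives the ebook share that this assumption implies. Here is a minimal sketch. `proportional_estimate` is a hypothetical helper, and the 3.3% share is derived here from the stated figures rather than read off the KickassTorrents data.

```python
def proportional_estimate(total_searches: float, category_share: float) -> float:
    """Allocate a keyword's daily searches to one content category in
    proportion to that category's share of the files (the 'searches
    mirror files' assumption)."""
    return total_searches * category_share


TORRENT_DAILY = 1.4e6          # daily Google searches using "torrent", per AdWords
EBOOK_TORRENT_DAILY = 46_200   # the article's resulting estimate

# Share of active torrents that ebooks must represent for the two figures to agree.
implied_ebook_share = EBOOK_TORRENT_DAILY / TORRENT_DAILY
print(f"{implied_ebook_share:.1%}")  # 3.3%
print(round(proportional_estimate(TORRENT_DAILY, implied_ebook_share)))  # 46200
```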

All in all, I estimate that about 210,000 searches made on Google per day represent possible interest in pirated ebooks, about 30,000 of them from the US. The “real” number for all countries could be as high as 300,000 or as low as 100,000. The 1.5-3 million figure reported by Attributor falls far outside this range of plausibility.
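The comparison with Attributor's figure, and the US share, can be checked directly from the numbers in the paragraph above. This short Python sketch uses only the article's stated figures. The names are illustrative only.

```python
# The article's estimates of daily Google searches showing possible
# interest in pirated ebooks.
estimate = 210_000
estimate_low, estimate_high = 100_000, 300_000
us_estimate = 30_000

# Attributor's claimed range of daily demand.
attributor_low, attributor_high = 1_500_000, 3_000_000
attributor_mid = (attributor_low + attributor_high) / 2  # 2.25 million

print(f"{estimate / attributor_mid:.0%}")  # 9%, i.e. "about 10%"
print(f"{estimate / attributor_high:.0%} to {estimate / attributor_low:.0%}")  # 7% to 14%
print(f"{estimate_high / attributor_low:.0%}")  # 20%
print(f"{us_estimate / estimate:.0%}")  # 14%
```

Even the top of the plausible range, 300,000, is only a fifth of Attributor's lower bound.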

One difficulty with using Google AdWords to gain insight into piracy is that it measures only a “shadow cast by piracy”, as a commenter on my previous post put it. Nonetheless, AdWords sheds considerable light on patterns of demand. For example, the tools show clearly that people commonly search for movies and TV shows in order to acquire them extralegally. They also indicate that most of the demand for pirated ebooks, about 82%, comes from outside the US, UK and Canada. Publishers should plan antipiracy strategies accordingly, based on data that can be confirmed independently.

Editor’s Note: Eric Hellman is a technologist, entrepreneur, scientist and writer. After 10 years doing research at Bell Labs, he founded Openly Informatics, a linking technology business that was acquired by OCLC in 2006. Over the last year, he has been blogging about ebooks, libraries, and technology at Go To Hellman. PB

 