In my article on “Consumer Demand for Pirated eBooks”, I showed that Google Trends data tells a very different story from the one that anti-piracy services vendor Attributor derived from the very same data. I did not, however, comment on the headline that Attributor gave its press release. The key finding of the report heralded by that release was that “Daily demand for pirated e-books can be estimated at 1.5-3 million people worldwide.” This result has attracted significant attention because the number is so large.

Extracting numbers with the tools Attributor used is rather involved, and it has taken me a while to examine the available data carefully. After doing this work, I’ve decided that when Attributor wrote “can be estimated at 1.5-3 million”, they left out the word “blindly”. As far as I can tell, Attributor is recklessly inflating the magnitude of ebook piracy; using the very same traffic measurement tools, I estimate the true figure to be about 10% of the number they claim.

The Attributor numbers come from data generated by Google’s AdWords service. AdWords is designed to help advertisers select advertising keywords and to manage budgets. For example, AdWords will tell you that the keyword “PDF” is used in approximately 101 million searches per month, worldwide, or 3.32 million searches per day. “PDF” is a keyword that a searcher might use in the course of a search for a pirated ebook, so you could reasonably assume that some percentage of these searches involve a consumer looking for a book they can avoid paying for. The trouble with this assumption is that most searches that include “PDF” have nothing to do with ebooks.
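The monthly-to-daily conversion above is simple arithmetic. A minimal Python sketch, assuming an average month of 365.25 / 12 days (the article does not say which divisor it used):

```python
# Convert AdWords' monthly worldwide search volume into a per-day figure.
# Assumption: an average month of 365.25 / 12 (about 30.44) days.
DAYS_PER_MONTH = 365.25 / 12


def monthly_to_daily(monthly_searches: float) -> float:
    """Return the average number of searches per day."""
    return monthly_searches / DAYS_PER_MONTH


pdf_daily = monthly_to_daily(101_000_000)  # "PDF": ~101 million searches/month
print(f"{pdf_daily / 1e6:.2f} million searches per day")  # 3.32 million searches per day
```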

Another AdWords tool designed to assist Google advertisers is the keyword suggestion tool. In practice, you use this tool to refine keywords. Here is a table of the top ten refined searches for “PDF”:

[Screenshot: AdWords keyword suggestion tool, top ten refined searches for “PDF”]

Of these, it’s reasonable to assume that some percentage of the “pdf free” and a smaller fraction of the “pdf download” searches are related to consumers trying to avoid paying for books. The other searches are clearly unrelated to books. We can further use the keyword suggestion tool to refine these estimates. My review of over 700 refined keywords indicates that at most 4% of PDF searches, or 132,000 per day, are looking for ebooks of any kind.
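The upper bound in the previous paragraph is a share applied to the daily total. In this sketch the 4% share is an input taken from the author's manual review of refined keywords; the code does not derive it:

```python
# Upper bound on ebook-related "PDF" searches per day.
PDF_DAILY = 3_320_000   # "PDF" searches per day, worldwide (AdWords estimate)
MAX_EBOOK_SHARE = 0.04  # at most 4% judged to be looking for ebooks (manual review)

pdf_ebook_daily = round(PDF_DAILY * MAX_EBOOK_SHARE)
print(f"{pdf_ebook_daily:,}")  # 132,800
```

The exact product is 132,800, which the article rounds to about 132,000.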

Rapidshare is a “file-locker” site, so its name might be expected to appear in search terms for illegally distributed files. Yet a review of AdWords’ suggested refinements for the term “rapidshare” reveals that searcher interest in ebooks is negligible compared to that for movies, TV, music and games. Of 743 suggested keywords, only one, accounting for 0.24% of “rapidshare” queries, or about 4,000 per day, is clearly related to ebooks:
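The article gives the ebook share (0.24%) and the resulting count (about 4,000 per day) but not the overall “rapidshare” volume. Working backwards, as in this sketch, gives the total those two figures imply; the result is an inference, not a number AdWords reported, and it inherits the rounding in “about 4,000”:

```python
def implied_total(subset_daily: float, share: float) -> float:
    """Back out the total query volume implied by a subset and its share."""
    return subset_daily / share


# ~4,000 ebook-related queries/day are 0.24% of all "rapidshare" queries.
rapidshare_total = implied_total(4_000, 0.0024)
print(f"{rapidshare_total:,.0f}")  # roughly 1.67 million "rapidshare" queries/day
```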

[Screenshots: AdWords suggested refinements for “rapidshare”, including the single ebook-related keyword]

Although direct interest in ebook torrents is so small that AdWords can barely measure it (~1,500 searches per day), torrent search sites give us another way to estimate the magnitude of interest in pirated ebooks. According to “KickassTorrents”, the recently active torrents had this composition:

[Screenshot: KickassTorrents breakdown of recently active torrents by category]

About 1.4 million searches using the keyword “torrent” are made on Google daily, according to AdWords. If the distribution of searches mirrors the distribution of files, searches for ebook torrents would number about 46,200 per day.
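The ebook percentage from the torrent-site breakdown is not restated in the text, but it can be recovered from the two numbers that are. This sketch works backwards from 46,200 to the implied share (about 3.3%, an inference) and then applies that share forward, which is the proportional-scaling assumption the paragraph describes:

```python
TORRENT_DAILY = 1_400_000  # Google searches containing "torrent" per day (AdWords)
EBOOK_TORRENTS = 46_200    # the article's resulting estimate

# Share of active torrents that are ebooks, inferred from the two figures above.
implied_share = EBOOK_TORRENTS / TORRENT_DAILY
print(f"{implied_share:.1%}")  # 3.3%


def scale_by_file_share(total_searches: int, file_share: float) -> int:
    """Assume searches are distributed like the files on the torrent site."""
    return round(total_searches * file_share)


print(f"{scale_by_file_share(TORRENT_DAILY, implied_share):,}")  # 46,200
```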

All in all, I estimate that about 210,000 searches made on Google per day represent possible interest in pirated ebooks. About 30,000 of these come from the US. The “real” number for all countries could be as high as 300,000 or as low as 100,000. The 1.5-3 million figures reported by Attributor are well outside the range of plausibility.
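The article does not itemize its 210,000 total. One possible reconstruction, offered only as a sketch, adds the per-channel figures reported above to the 40% credit for “free ebook” queries that the author describes in a reply in the comments; the grouping into these four components is an assumption:

```python
# Hypothetical itemization of the ~210,000/day worldwide estimate.
components = {
    "pdf": 132_000,                      # at most 4% of "PDF" searches
    "rapidshare": 4_000,                 # 0.24% of "rapidshare" queries
    "torrent": 46_200,                   # ebook share of "torrent" searches
    "free ebook": round(0.40 * 74_000),  # 40% credit on 74,000 "free ebook" queries
}
total = sum(components.values())
print(f"total: {total:,}")  # total: 211,800

# Compare with Attributor's claim of 1.5-3 million per day.
for claim in (1_500_000, 3_000_000):
    print(f"estimate is {total / claim:.1%} of {claim:,}")
```

Under these assumptions the estimate comes to roughly 7% to 14% of Attributor's figures, in line with the “about 10%” stated earlier.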

One difficulty with using Google AdWords to gain insight into piracy is that it measures only a “shadow cast by piracy”, as a commenter on my previous post put it. Nonetheless, AdWords sheds considerable light on patterns of demand. For example, the tools show clearly that it’s common for people to search for movies and TV shows and acquire them extralegally. They also indicate that most of the demand for pirated ebooks, about 82%, comes from outside the US, UK and Canada. Publishers should plan antipiracy strategies accordingly, based on data that can be confirmed independently.

Editor’s Note: Eric Hellman is a technologist, entrepreneur, scientist and writer. After 10 years doing research at Bell Labs, he founded Openly Informatics, a linking technology business that was acquired by OCLC in 2006. Over the last year, he has been blogging about ebooks, libraries, and technology at Go To Hellman. PB

COMMENTS

  1. “Also, they indicate that most of the demand, about 82%, for pirated ebooks comes from outside of the US, UK and Canada. Publishers should plan antipiracy strategies accordingly”

    Suggested anti-“piracy” strategy: make it possible for potential customers outside the US (and to lesser extents UK and Canada) to actually *buy* books.

  2. You noted that some percentage of “free pdf” searches might be “people trying to avoid paying for” books.

    “Not paying for” does not equal “pirating”. I have many free books, but I didn’t pirate any of them. I’m sure you understand this, but it’s becoming obvious that many people don’t, so it’s worth making the point.

  3. Thanks for the response to our research, Eric.

    The estimate of the number of users searching for the top pirated books per day was computed by buying AdWords campaigns and counting impressions. We did not use the AdWords estimation tools, rather we counted the number of impressions generated during the course of the day for pirated ebook keyword combinations across the 89 top book titles as measured by Amazon. For example, we bought the keyword “lost symbol free ebook” and counted how many impressions were generated by Google.

    I hope this helps clarify how we arrived at our conclusions.

    Thanks

  4. Jim, thanks for the response. “lost symbol free ebook” is a good example. AdWords suggests a total query volume of 62 per day. If all 89 top book titles did as well, you should have expected about 5,500 queries per day, so there’s a factor of about 500 that needs to be explained.

    Over all keywords containing “free ebook”, AdWords predicts 74,000 per day over the entire world. The refinement identifies 1,800 of these as clearly not pirate interest and 1,400 of these as probably pirate interest. 22,000 of these are searches for places to find generic free content. As Stephen notes above, free ebooks can’t be equated to piracy; I’m reading a free ebook myself: Moby Dick. In my analysis, I credit 40% ± 20% of the 74,000 traffic in my total.

    If Attributor were to publish the list of keywords used in its campaign, we could do a much better assessment of what it measured.

    It seems to me that running an ad campaign falsely offering free ebooks and using it to measure piracy could be like putting dollar bills on the street and counting how many people pick them up to measure the rate of petty theft.

  5. Thanks for the response Eric.

    I think we’re both in agreement that Google’s prediction tool for AdWords is not very accurate and is not a solid foundation for ebook piracy demand estimation.

    This is why we purchased the keywords, ran the AdWords campaigns directly, and tallied the ad impressions delivered by Google for each keyword campaign. This removes any and all need for estimating – it’s the real number of impressions Google delivered, not a prediction.

  6. Stephen, how can you have free books that you did not pirate? Do you only download books that the author is offering for free? If so, then you are right, you are not pirating. But if you obtain them any other way, unless the authors send them to you in exchange for a review, then how did you get them? File-sharing? That’s obtaining books for free from someone else who did the pirating. Thinking of yourself as blameless might assuage your conscience, but if someone tries to sell you a stolen TV, and you can tell by the cheap price and the fact that it’s on the back of an unmarked truck, that it’s stolen, doesn’t that make you an accomplice? Would TVs get stolen if there were not people like you to buy them, in effect, paying the thief to do it for you?

  7. Fiona, I’m a little astonished by that question. Have you not been on the net very long? There are tens of thousands of free e-books available that aren’t pirated, and have been for years.

    Project Gutenberg. Feedbooks. Manybooks. Baen. Cory Doctorow. That’s just for starters.

    Stephen makes a good point.
