A Long Tail of translated e-books?
March 10, 2006 | 5:51 am
By David Rothman
Now that the TeleBlog is multilingual–with crude machine translations available for Spanish, German, Japanese and other languages–I’m starting to wonder what the technology could mean for e-books.
Whether e- or p-, most English-language books never make it outside the Anglo-reading market.
But what if, just like print on demand, you could easily do accurate translation on demand? New technology from Google, reported in the Christian Science Monitor and blog-found via LISNews, could hasten that day. And if near-flawless translation happens, that could open up a host of opportunities for English and non-English books alike. As noted by the Monitor last year:
English rules the Internet, which can be a frustrating thing for the world’s 1.3 billion Chinese and 322 million Spanish-speakers. They outnumber Anglophones. Even online, two-thirds of users speak something other than English at home.
If Google does perfect its translation act, perhaps publishers will have more reason to appreciate its book-search capabilities. You’ll never be able to buy a book if you can’t learn about it in a language you can understand. It’s something to ponder amid all the fiery words exchanged in the Google-related copyright debate.
Multilingual search as a boon for publishers
Far from subtracting value, multilingual searches and generous excerpts could send more money in the direction of publishers. I know. Some American publishers will shudder about all the copyright violations taking place in Third World languages they cannot even understand. But there’s no reason why multilingual search engines can’t help ferret out the offenders. While copyright enforcement isn’t as aggressive in most countries as in the States, publishers will still be better off–given the increased exposure that their wares will be enjoying.
But why should American publishers alone enjoy the bounty? More accurate machine translation could mean that people in non-English countries can more easily discover writings that never made it into English in the first place. That has all kinds of economic, cultural, scientific and other implications.
Good or bad for Long Tail?
Meanwhile, with more choices available than ever, thanks to the end of linguistic barriers, the Long Tail might benefit. Or maybe not. Perhaps aggressively marketed international best-sellers will dominate sales more than ever. We’ll see.
But is a revolution in translation truly ahead? To this layman, Google makes a good case for such a possibility:
“Nobody in my team is able to read Chinese characters,” says Franz Och, who heads Google’s machine-translation (MT) effort. Yet, they are producing ever more accurate translations into and out of Chinese – and several other languages as well.
To demonstrate the software’s prowess, Mr. Och displayed an Arabic newspaper headline at a recent media tour of Google’s headquarters in Mountain View, Calif. One commercially available MT program translated it: “Alpine white new presence tape registered for coffee confirms Laden.” Then he displayed the translation from Google’s prototype, which made considerably more sense: “The White House Confirmed the Existence of a New Bin Laden tape.”
Of course, every MT program can point to strengths in its approach versus weakness in others’, experts say. The key is whether statistical systems have become powerful enough to outperform the intensive, rules-based systems now available.
“These translations were impossible a few years ago,” Och says. But the advent of ever-cheaper and faster data-crunching and the mushrooming number of online documents have changed the equation. Google has improved the algorithms for its MT program, he says, by feeding its computers the equivalent of 1 million books of text, using sources such as parallel translations of United Nations documents.
So how do you feel about what’s going on?
Disclosure: I own a speck of Google stock.
Also Google-related: Osama bin Laden fan clubs build online communities.


