China's big data advantage isn't enough
(Bloomberg View) -- The digital world relies on data, and no one produces more of it than China's 1.4 billion people. The vast wealth of information they emit online has helped Chinese tech companies become some of the world's best, and led to speculation that China will inevitably dominate future technologies, such as artificial intelligence. But this is almost certainly mistaken. Data, it turns out, isn't destiny.
Even in the digital age, data may have a declining utility. Tech companies already have millions of users. To believe that China will have a significant advantage due to its population size requires us to believe that each additional user adds as much to an informational ecosystem as the first one. If that were true, then India -- with nearly as many people as China -- should be just as likely to be the next world leader.
In reality, a data set's usefulness can face diminishing returns to scale. Twitter Inc., with some 300 million monthly active users, faces almost no data disadvantage against WeChat, with 1 billion users. Although a larger sample will generally reveal more statistically significant correlations, the probability of finding a significant pattern among 1 billion users but not among 300 million is effectively zero. Plus, larger data sets tend to be more susceptible to false positives.
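To put a rough number on those diminishing returns, consider a minimal sketch in Python. It assumes a simple random sample and a 95 percent confidence level, and it uses the textbook margin-of-error formula for an estimated proportion. The function name and the choice of a 50 percent share are illustrative assumptions, not anything a particular company is known to use.

```python
from math import sqrt

Z_95 = 1.96  # z-score for a 95% confidence level (an assumption of this sketch)

def margin_of_error(n, p=0.5, z=Z_95):
    """Margin of error for a proportion p estimated from a simple random sample of n users."""
    return z * sqrt(p * (1 - p) / n)

# Compare small, large and very large user bases.
for n in (1_000, 1_000_000, 300_000_000, 1_000_000_000):
    print(f"{n:>13,} users: +/- {margin_of_error(n) * 100:.4f} percentage points")
```

Under these assumptions, going from a thousand users to a million shrinks the margin by a factor of about 32, while going from 300 million to 1 billion shrinks it by less than a factor of two. Both of the larger figures are already below one hundredth of a percentage point.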
Tech companies, moreover, don't crunch data in simplistic ways. They commonly use a technique called clustering, for instance, in which users are lumped together based on their commonalities. Rather than relying on blunt-force statistics, clustering separates people into refined groups that can be more precisely targeted. This leads to more efficiency and better results. Often, the sophistication of the clustering model is more important than the size of the data set.
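For readers who want to see what lumping users together can look like, here is a minimal sketch of k-means, one common clustering method. Nothing here says which technique any given company actually uses. The two features (weekly hours of video watched, monthly purchases), the six sample users and the starting centroids are all hypothetical.

```python
from math import dist  # Euclidean distance, available in Python 3.8+

def assign(points, centroids):
    """Return, for each point, the index of its nearest centroid."""
    return [min(range(len(centroids)), key=lambda k: dist(p, centroids[k]))
            for p in points]

def update(points, labels, centroids):
    """Move each centroid to the mean of its members; keep it in place if it has none."""
    new_centroids = []
    for k, old in enumerate(centroids):
        members = [p for p, label in zip(points, labels) if label == k]
        if members:
            new_centroids.append(tuple(sum(axis) / len(members) for axis in zip(*members)))
        else:
            new_centroids.append(old)
    return new_centroids

def kmeans(points, centroids, max_iter=100):
    """Alternate assignment and update steps until the centroids stop moving."""
    labels = assign(points, centroids)
    for _ in range(max_iter):
        new_centroids = update(points, labels, centroids)
        if new_centroids == centroids:
            break
        centroids = new_centroids
        labels = assign(points, centroids)
    return labels, centroids

# Hypothetical users: (weekly hours of video watched, purchases per month)
users = [(1.0, 0.5), (1.5, 1.0), (2.0, 0.5), (9.0, 6.0), (10.0, 7.0), (9.5, 6.5)]
labels, centroids = kmeans(users, centroids=[users[0], users[3]])
print(labels)
print(centroids)
```

In this toy data the light users and the heavy users end up in separate groups. A production system would weigh many more features and would have to choose the number of groups with care, which is where the sophistication of the model comes in.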
Perhaps most important, humans still matter. A huge data set isn't worth much without skilled workers to distill it and extract insights. Dating sites are continually updating their algorithms to account for what people say they want in a partner, for instance. But it turns out that what they don't say can be just as important, if not more so. Algorithms are poorly suited for understanding such behavior. Past a certain point, more data is useless without people who can make sense of it.
In this world, China faces some steep challenges.
One is attracting talent. Although China has a burgeoning tech scene, high-level data scientists are in short supply. The Communist Party has been struggling to lure tech workers from abroad, despite offering generous incentives. This is largely due to China's internet restrictions: In addition to blocking popular sites such as YouTube, the government also prevents access to code-hosting sites like GitHub and academic sites like Google Scholar. Given a choice, top technologists will look elsewhere.
Another concern is privacy. Because China's privacy laws aren't strictly enforced, tech companies can monitor their users intensively, which gives the companies an advantage in everything from optimizing ads to assessing credit risk. As one executive put it, these companies "know where you've traveled, what movies you saw, what restaurants you ate at." This intense surveillance may be a growing liability, however. A significant consumer backlash is building in China, driven partly by ubiquitous fraud and identity theft. And Chinese tech companies are running into stiff resistance when trying to expand into more privacy-conscious markets overseas.
This raises a final concern. Chinese tech firms are largely confined to China, where they're protected from competition. This gives them a dominant market position and other advantages. But a platform that censors searches for Winnie the Pooh simply isn't going to be competitive overseas. Google and Facebook Inc., with much more international experience, have proven adept at understanding a global audience and picking up on diverse socio-cultural norms. Extracting ever more data from local users won't help Chinese companies compete at that level.
Data is the lifeblood of the digital world. But learning how to use it requires human talent, insight and creativity. In that race, China's tech giants still have a ways to go.