I rely on top newspapers and business magazines for ideas for my weekly IM blog. In fact, I'm not sure how I'd fare without the inspiration from The New York Times, The Wall Street Journal, BusinessWeek, The Economist, and the Harvard Business Review. Each of these publications often focuses on issues surrounding the measurement and analysis of business processes. And each does so with the valued perspectives of both macro forests and micro trees.
The February 25, 2010 Economist revolves around a 14-page special report entitled Data, Data Everywhere, must reading for those looking for the big-picture future of BI and analytics. Alas, I can touch on only a smattering of the topics covered there.
Tech bellwether Cisco estimates that by 2013, 667 exabytes (1 exabyte=1000 petabytes) of data will flow over the internet annually. Retail giant Wal-Mart now handles 1M transactions per hour, with databases sized at more than 2.5 petabytes (1 petabyte = 1000 terabytes). CIO Rollin Ford asks every day: “how can I flow data better, manage data better, analyze data better?” According to IBM historical writer James Cortada, “We are at a different period because of so much information.” Johns Hopkins astrophysicist Alex Szalay opines: “How to make sense of these data? People should be worried about how we train the next generation, not just of scientists, but people in government or industry.”
The big four software giants Oracle, IBM, Microsoft and SAP certainly recognize the challenge, in recent years spending $15 billion on acquisitions of software firms specializing in data management and analytics. And IBM's “Smarter Planet” strategy is backed by a $12 billion investment that includes six new analytics centers and 4,000 “quant” hires. Indeed, the information management industry is now $100 billion and growing 10% per year – twice as fast as the larger software market. Craig Mundie, head of research and strategy at Microsoft, acknowledges, “The data-centered economy is just nascent,” even as Google chief economist Hal Varian notes that “statistician” is now the sexiest job around.
Having access to all that data, however, is just a point of departure: mining and analyzing the data – business intelligence – is where the information can be monetized. An early BI leader, Wal-Mart noted in 2004 a run on flashlights and batteries – in addition to Pop-Tarts – in anticipation of a hurricane. Wal-Mart now routinely stocks up on Pop-Tarts for hurricane season. Telecoms such as Cablecom have cut their annual churn from 20% to 5% through analytics. The Royal Shakespeare Company in Britain used analytics on seven years of sales data to devise a marketing campaign that more precisely targets its best customers, resulting in a 70% increase in visitors. The University of Ontario, in tandem with IBM, uses predictive models to spot potentially fatal infections in premature babies, monitoring seven streams of real-time data. Medical staff can then detect an infection before symptoms present. “You can't see it with the naked eye, but a computer can.”
Cloud computing and open source software are fueling the data and analytics binge. The cloud allows businesses to lease computing power when and as they need it, rather than purchase expensive infrastructure. And the combination of the R Project for Statistical Computing and the Apache Hadoop project, which provides for reliable, scalable, distributed computing, enables networks of PCs to analyze volumes of data that in the past required supercomputers. With the Hadoop platform, Visa recently mined two years of data, over 73 billion transactions amounting to 36 terabytes. The processing time dropped from one month to 13 minutes.
The internet economy is, not surprisingly, at the center of today's super-crunching. Companies like Google, eBay, Facebook and Amazon have long understood that their data, and the “data exhaust” visitors leave behind, are competitive differentiators. But the leaders spend little time promoting their successes with analytics. “They are uncomfortable bringing so much attention to this because it is at the heart of their competitive advantage,” says publisher Tim O'Reilly. Web leaders routinely use randomized experiments to test new features on their sites. Amazon and Netflix use collaborative filtering, a data mining technique, to make recommendations to customers. eBay makes adjustments to its commercial exchanges based on activity, bidding behavior, pricing terms and user visit profiles.
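For readers curious what collaborative filtering looks like in practice, here is a minimal Python sketch of one common variant, item-based filtering with cosine similarity. It is not a description of how Amazon or Netflix actually build their recommenders; the ratings data, the user and item names, and the function names are all invented for illustration.

```python
from math import sqrt

# Hypothetical ratings (user -> {item: rating}), invented purely for illustration.
RATINGS = {
    "ann": {"A": 5, "B": 4, "C": 1},
    "bob": {"A": 4, "B": 5},
    "cat": {"A": 1, "C": 5, "D": 4},
    "dan": {"C": 4, "D": 5},
}


def item_vectors(ratings):
    """Invert user -> item ratings into item -> user ratings."""
    items = {}
    for user, prefs in ratings.items():
        for item, score in prefs.items():
            items.setdefault(item, {})[user] = score
    return items


def cosine(u, v):
    """Cosine similarity between two sparse rating vectors (dicts)."""
    shared = set(u) & set(v)
    if not shared:
        return 0.0
    dot = sum(u[k] * v[k] for k in shared)
    norm_u = sqrt(sum(x * x for x in u.values()))
    norm_v = sqrt(sum(x * x for x in v.values()))
    return dot / (norm_u * norm_v)


def recommend(user, ratings, top_n=3):
    """Rank items the user has not rated by a similarity-weighted
    average of the ratings the user gave to other items."""
    items = item_vectors(ratings)
    seen = ratings[user]
    scores = {}
    for candidate, cand_vec in items.items():
        if candidate in seen:
            continue
        weighted_sum = 0.0
        total_sim = 0.0
        for item, rating in seen.items():
            sim = cosine(cand_vec, items[item])
            weighted_sum += sim * rating
            total_sim += sim
        if total_sim > 0:  # skip candidates with no overlap at all
            scores[candidate] = weighted_sum / total_sim
    return sorted(scores, key=scores.get, reverse=True)[:top_n]


if __name__ == "__main__":
    print(recommend("bob", RATINGS))
```

The idea is simply that items rated similarly by the same people are treated as "going together," so a customer's existing ratings can be projected onto items he or she has not yet seen. Production systems add a great deal on top of this, but the core inference is statistical rather than based on any understanding of the products themselves.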
Of course, Google is the leader in mining data, a fact reflected in its $170 billion market cap. According to Edward Felten of Princeton, “Looking at large data sets and making inferences about what goes together is advancing more rapidly than expected. Understanding turns out to be overrated, and statistical analysis goes a lot of the way.” Facebook apparently agrees, regularly using its data to boost usage. “If there are user-generated data to be had, then we can build much better systems than just trying to improve algorithms”, says Andreas Weigend, former chief scientist of Amazon, now a consultant, speaker, Stanford instructor and Predictive Analytics World keynoter.
The emergence of big data and network computing has exposed many of today's data regulations as antiquated, prompting the movement for comprehensive new principles to guide the deluge. First is privacy, especially involving the social networking tensions between the individual and shared information. Providing users with granular control of their data-sharing options is a step in the right direction. Information security is also a mandate. California requires companies to notify individuals of security breaches that compromise personal information. A prescription for annual security audits of large companies might also make sense. The friction over the retention of digital records, between the dictum of storing records for only as long as needed and the social and political expectation that records will be retained, seems to be resolving in favor of the latter, for better or worse. Attorney and Super Crunchers author Ian Ayres frets about the ethics of using analytics findings, or processing the data – e.g. for racial profiling, or predicting one's predisposition to commit a crime. A pertinent regulation might be that an individual's data cannot be used to discriminate against him based on the likelihood of something happening. Ownership or property rights to data that individuals leave as exhaust should belong to the individuals, not the collector. Google's “data liberation” initiative seems a step in that direction. Finally, integrity of information acknowledges the internet as a shared environment mandating international cooperation. Network neutrality and the absence of censorship are obvious goals. “The World Trade Organization, which oversees the free flow of physical trade, might be a suitable body for keeping digital goods and services flowing too.”
Steve Miller also blogs at miller.openbi.com.