New York, March 21, 2012 -- Can a hosted, cloud-based data analysis service really serve up insight from terabyte-scale swaths of uploaded information in near real-time?
Google thinks so, or at least is giving the premise a good run-through. Google's BigQuery, a beta program that lets users apply Google's own tools and structural expertise to analyze data, was further detailed at the GigaOM Structure Data conference held in New York City this week.
The product manager for the service, Ju-Kay Kwek, said the idea grew from Google's accumulated understanding of the Web as its own big data problem. That expertise extends to several products, including email, Web documents and, of course, search.
"As a consequence of Web search apps and email, being able to proactively monitor that and understand a business indicator so we can maintain those applications and continuously improve them requires a certain degree of insight," Kwek said. "How do you get insight on the behavior patterns or even the problems that 200 million users might be having at any given time? How do you understand the features different sets of users are favoring, say in Gmail?"
Kwek acknowledged that data transfer rates to the Web are problematic for Google as well as its customers, and that the issue is a focus of great attention. But given the value of big data insight, he said, it's a worthy undertaking.
Asked what beta customers are actually slicing and dicing, Kwek told GigaOM Senior Writer Stacey Higginbotham that scale is the real attraction of BigQuery. "I can quickly get a sense of what the application or set of users are doing from that very large set of fine-grained data," Kwek said.
Higginbotham asked whether customer uploads of huge data files can become practical, given the bandwidth restrictions and the time they consume.
Kwek said the project is about tradeoffs. "When you start to talk about data that could get into the multi-terabytes level you have to ask yourself what you want to get out of it."
Kwek said customers working with Google do see benefits in a hosted service with replication, data durability across multiple data centers, security and privacy.
Probably the most important thing about BigQuery, he added, is the use of data in the context of the algorithms and processing capabilities Google can bring to bear on that data. "It's a great way to leverage the power of these operating systems without building a large infrastructure on your own."
The most obvious use cases begin in marketing, where customers work with very large amounts of data spanning global campaigns, multiple languages and hundreds of ad words.
"These people have big data problems, especially when it comes time for a quantitative marketer to understand the ROI or effectiveness of a campaign," said Kwek. "I want to understand how that works across different regions and that is a big data problem."