(Updated March 25, 2011 with additional analyst comments.)

March 24, 2011 – Every federal agency must develop a big data strategy to deal with exponential data growth, according to “Designing a Digital Future,” a report from the President’s Council of Advisors on Science and Technology (PCAST).

Big data refers to data sets too large for most software tools to process within a tolerable elapsed time. The collection, management and analysis of data is a fast-growing concern of technology research at a time when data volumes are growing exponentially due to the proliferation of new data sources.

It is estimated that around 1.2 zettabytes (1.2 billion terabytes) of digital data are generated worldwide each year by numerous devices in numerous forms: web logs, sensor networks, social media, telecommunications, astronomical observations, biological systems, military surveillance, medical records, photographic archives and video archives.

“A very big issue that is often overlooked is when we start growing data to that exabyte class, scalability means you can manage that data without an army of DBAs,” says Stephen Brobst, Networking and Information Technology Research and Development group member and chief technology officer for Teradata Corporation. “To me, the automation in terms of managing that data is critical.”

The report’s contributors included the PCAST Federal NITRD program review working group. Through the NITRD Program, the federal government coordinates its unclassified research and development investments in networking and information technology. In 2010, PCAST appointed a fourteen-member working group to lead an assessment of U.S. leadership in high-performance computing, science and engineering.

Automated analysis techniques such as data mining and machine learning facilitate the transformation of data into knowledge, and of knowledge into action. The NITRD consensus is that a big data strategy is imperative for federal agencies, and Brobst points out that commercial organizations are already refining such strategies. However, the report notes that most private industry R&D focuses on engineering future products and product versions, not on fundamental research, and it emphasizes the need for an investment of at least $1 billion annually in networking and IT in general.

“Commercial enterprises have largely been focused on analyzing customer value to date. Going forward, the emphasis for leading organizations will be to focus on analyzing the customer experience – which will involve huge amounts of big data. The next generation of analytics will be about interactions, not just transactions. Think about click stream data (Web interactions) in addition to on-line purchase data (transactions). Think about the network packets in telecommunications delivery (interactions with the network) rather than simply analyzing (billable) call detail records (transactions). Think about analyzing social media interactions. Every organization needs to have a big data strategy – and build on it quickly,” says Brobst.

Although NITRD found that the U.S. is actually investing far less in NIT R&D than the federal budget shows, the report notes some progress in dealing with big data. It cites the Community Health Data Initiative as an example of successful federal innovation. In that initiative, the Office of Science and Technology Policy and the Department of Health and Human Services target information needs in community and public health. The effort integrates data sets across a broad range of private and public organizations, from county health departments to patient advocacy groups to social media startups.

Big data consultant Richard Winter suggests agencies deploy a range of analytic database tools. “It is crucial that agencies use the right tool for a given requirement. Only a well-defined, forward-looking analytic data strategy will position them to do that.”

