There is reason to hope and believe that the last few years have given rise to a new era of maturity in data and information management. The early limitations of hardware and networks, and ensuing challenges of application and data integration, are largely conquered (or at least understood) for now. CPU and storage costs are less and less a barrier to any project, and analytic tools have matured.
Now, organizations are turning their attention toward making information work.
We're seeing this in widespread uptake of data governance, where business rules, data quality, usefulness, context and access are becoming policies that allow people to more easily sort through data.
The more data is understood, the better it can be organized and packaged into services, in the enterprise or on the Web. And, with the connectivity of broadband HTTP and open APIs, software as a service and cloud infrastructure become basic components of a new transactional data economy.
Today, startups and traditional data providers are busy thinking up business models for information services. Brick-and-mortar institutions are also examining their data stores as assets to be resold.
There is a ready client base: sales and marketing teams under competitive pressure and looking for valuable information in someone else's lists, contact trails and transaction records.
"The last two decades have been about getting access to data but we haven't put it in enough hands to change the way we conduct business," says Dana Gardner at Interarbor Solutions. "A big part of Web and cloud computing that doesn't get talked about much is that it enables a kind of data economy where you access data, pay for it and use it in a system that has value and a currency to it."
Indeed, a new era of Web data services that are easier to find, buy or sell is coming.
However, it arrives with a downside. The order we've worked so hard to achieve, which is enabling this new information economy, can quickly unravel.
Every new data service is a new silo of information to be judged and matched to other data, meaning that a lot of governance lies ahead.
"The nice thing about Web data services, SaaS and cloud is that it's easy, it's relatively inexpensive to create or consume and it's immediate," says longtime BI analyst Howard Dresner. "Business doesn't want to wait until next quarter, and IT is gravitating this way too because they have only so much budget and so many people."
But Dresner and others add that outside the four walls and partner channels, we lose our grip on governance. Networks of services such as Salesforce.com can deliver integration and quick usability out of the box, but don't assure data quality by an enterprise standard. By a Google standard, quality on the Web is undefined and judged by popularity, not by whether the information is true or useful.
There is expectation, demand and confusion on all sides about how we will manage these issues, orchestrate services or understand their worthiness. We don't know how at-large information will compare with or intersect business intelligence and data quality as we know it. But we've already begun to treat the Web as its own integration platform.
The drive to governance arose when corporations were already less vertically and more virtually integrated across divisions, lines of business and scattered projects. The spread of data silos within organizations came to be managed with conventional connectors, and increasingly, Web service languages, protocols and application interfaces that break silos of data into chunks of content that can be mixed, matched and reused as feeds.
The same protocols and services benefit any number of nimble service and information providers across the Web. A mostly undocumented trend has been the rise of Web traffic that runs not through browsers, but through application programming interfaces. APIs serve as a conduit for developers to extract data into applications, and for gatekeepers to regulate traffic loads, control access permissions and manage the rights to data. Thus, a developer accesses or subscribes to services to build a desktop or iPhone application that might call on airline reservation systems, weather reports and hotel booking systems all in one view. As the ad says, whatever you need, there's an app for that.
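The kind of composite application described above can be sketched in a few lines. The endpoints, field names and services below are invented for illustration; real airline, weather and hotel APIs would each publish their own interfaces, and the fetcher is injected so the sketch runs without network access.

```python
# Hypothetical sketch: composing several Web APIs into one "trip view".
# All service URLs and response fields here are illustrative, not real APIs.

def fetch_json(url, fetcher):
    """Retrieve a JSON document from a service endpoint.

    `fetcher` is injected so the sketch can run without network access;
    in practice it would wrap urllib.request or an HTTP client library.
    """
    return fetcher(url)

def build_trip_view(city, fetcher):
    # One aggregated view drawn from three independent data services.
    flights = fetch_json(f"https://api.example-air.com/flights?to={city}", fetcher)
    weather = fetch_json(f"https://api.example-wx.com/forecast?city={city}", fetcher)
    hotels  = fetch_json(f"https://api.example-stay.com/hotels?city={city}", fetcher)
    return {
        "city": city,
        "cheapest_flight": min(f["price"] for f in flights),
        "forecast": weather["summary"],
        "top_hotel": max(hotels, key=lambda h: h["rating"])["name"],
    }
```

The point of the pattern is that the developer never touches the providers' databases directly: each API returns only the chunks of content its gatekeeper chooses to expose.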
Perhaps most important, APIs provide a point of transaction that might be free or paid. John Musser runs a site called programmableweb.com, which documents API and mashup releases. In four years of tracking, the biggest shift he's seen is the movement of APIs from a consumer phenomenon to something more tantalizing to the enterprise.
APIs are often built in house but increasingly come from third-party service providers. There are hundreds of examples. The U.K.'s Guardian newspaper uses API software-as-a-service from a vendor called Mashery to strategically resell content to hundreds of secondary news outlets with unique interests. Retailer Best Buy uses API services from the same vendor to distribute product specifications and pricing to resellers, and looks at the behavior of API traffic to measure whether price, newness, descriptions or pictures of a product are the best way to lead a campaign. Whatever bit of information most leads a customer to the "buy" button is what Best Buy tells resellers to feature first.
Traditional lookup services like credit report bureau TransUnion are using Web services and APIs to build new businesses on top of existing data. Scott Metzger, CTO of TransUnion's partner-facing Interactive division, uses API software as a service from Sonoa Systems to expose and sell credit report information to banking and other financial institutions that pass along or resell to their own customers. TransUnion Interactive benefits by being freed of the responsibility of building applications or Web sites for its partners, and by studying the value of different data services as they are consumed through the API.
Partners benefit from TransUnion data based on their own approach to market. "It might be credit management, identity theft protection, lead generation, there are a lot of variations on the theme," says Metzger, and he monitors API traffic to improve his services.
The actual credit information is regulated and remains secure behind the company's firewall. TransUnion's enterprise infrastructure was built on a service-oriented model and first used Sonoa for internal application interfaces. Now Metzger uses the API interface as a "policy layer" to meet regulations, enforce the terms of partner agreements and maintain service levels. TransUnion Interactive uses Tibco Spotfire to aggregate large data sets, confirm expected patterns or nail down the cause of unexpected ones.
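A policy layer of the kind Metzger describes can be sketched, in outline, as a gatekeeper that checks a caller's entitlements and meters usage before any request reaches the regulated data behind the firewall. The partner records, scopes and limits below are invented for illustration, not TransUnion's actual implementation.

```python
# Hedged sketch of an API policy layer: enforce partner entitlements,
# rate limits and metering at the edge. All records here are illustrative.

PARTNERS = {
    "bank-123": {"scopes": {"credit_report"}, "rate_limit": 100, "calls": 0},
}

def authorize(api_key, scope):
    """Return True if the keyed partner may make this call right now."""
    partner = PARTNERS.get(api_key)
    if partner is None:
        return False                      # unknown caller: reject at the edge
    if scope not in partner["scopes"]:
        return False                      # partner agreement doesn't cover this service
    if partner["calls"] >= partner["rate_limit"]:
        return False                      # service-level cap reached
    partner["calls"] += 1                 # meter the call for billing and analytics
    return True
```

The same call counter that enforces the service level also produces the per-service usage data an operator can study to see which offerings partners actually consume.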
The intrinsic value of Web data services, Metzger believes, is the ability for any organization to reuse the core capabilities they've developed over the years. "With the economic climate of today, companies are looking at how they can monetize the assets they already have instead of doing high risk R&D on new products."
There is no aggregate market data on API volume, but Salesforce and NetSuite report a majority of data transactions are now consumed through APIs rather than browsers, and other providers report the same. Anecdotally, Zoho desktop productivity software evangelist Raju Vegesna reports that API transactions for data stored by his company will increase from 50 percent to 80 percent of the total in the next 12 months. And Omniture, which holds sway in the analytics space for Web site reporting, launched a new service for monitoring API transactions as recently as October.
Every SaaS vendor and application that doesn't have an API today soon will, Musser says, because that's the way data becomes glued to other cloud or enterprise apps. "Just as every enterprise platform has had something you could build on for the last 20 years, whether it was Oracle or Windows, that will be true for every cloud platform because APIs are the glue of SaaS."
Web Data Services
No single boundary defines what a Web data service provider is, but all allow interaction with data streams directly or indirectly. Some originate proprietary feeds, some redistribute those feeds and some build software and analytics that gather competitive information or scour social networks or blogs. Still others offer cloud infrastructure and analytics to sift the data gathered from any source. Most can be used inside or outside the firewall.
It's an emerging and vibrant data marketplace almost too wide to document. There are intermediaries like StrikeIron that work with traditional lookup services and sell metered access to D&B or address verification on a per-API call basis. InsideView aggregates primary sources from LinkedIn to Thomson Reuters to integrate contact info, management changes and product news into its customers' CRM applications. Some startups, like Jigsaw, create value based on the data contributed by members and make their money on enterprise licenses. (See story sidebar "The Communal Data Service" at the end of this article. -ed) Another startup called GoodData provides a SaaS analysis tool and cloud infrastructure for clients to upload and crunch Web site data unsuited to their own databases or data warehouses.
"Sometimes you bring your own data, skills and technology to the surface," says Phil Wainewright, a London-based consultant. "But just as importantly, there are a lot of opportunities for a startup to make money by chasing money back to someone else."
And while API traffic looks like a bellwether of a data service economy, APIs are not required to gather and parse the majority of content on the Web that exists in unstructured form. Connotate, Mozenda and QL2 are prospering with screen-scraping services. Another vendor called Kapow contends that refined robotic scraping of content already outweighs the benefits of programming interfaces. (See story sidebar "Point and Extract" at the end of this article -ed)
By redirecting data, brick-and-mortar businesses can move from a product to a service model. Jeff Kaplan, founder of consultancy THINKstrategies, thinks of payroll processor ADP as an early example of a mainline business that morphed into a multifaceted SaaS model.
"Look at ADP and ask yourself, is this a software company or a business service company or an information service company?" To Kaplan, it's all three, a software-enabled business service and information provider. Along with regular payroll, ADP has added insurance, retirement and standalone services tied to its records that help HR departments manage and retain unique employees and look at new staffing needs.
"When you add insight to information, that can change the way people behave, and now you're talking about exerting influence in the marketplace." That's powerful stuff, Kaplan says.
The more services are aggregated, the more they become units of currency in a data economy. And services that can be packaged or bundled offer increasingly complete views of information to customers. Salesforce.com originated as a CRM application but is now an integrated network of more than 1000 applications and services that dwarf its original mission and value.
Data services that are integrated to enterprise applications reflect the most powerful form of data service usefulness. "A SaaS provider like Salesforce.com is a classic example, because the Salesforce API brings integration into all of the intelligence sources in an enterprise," says Musser. That includes HR, BI and transactional systems and explains why API-based Web integration is displacing browser traffic.
"The most relevant things in the cloud are SaaS enterprise apps like Salesforce because they already have a schema and a database that's organized and not freeform," says Forrester analyst Rob Karel.
While Salesforce and NetSuite are traditionally products for the small and midmarket, their effect is not lost on enterprise software vendors. Integration platform provider Informatica touts an ability to incorporate external data feeds in the latest release of its platform. SAP's new BI Explorer tool appeals to users of external databases; a research note from Saugatuck Technology says SAP's unsubtle reorientation to cloud technology is "now central to the company's future success."
Don Campbell, IBM's CTO for analytics and performance management, sees opportunities for IBM to be involved in the bigger picture of information commerce. "New uses for data have brought change to companies that produce, package, massage and sell information. We would like to be able to provide a platform that can consume your own data along with those outside sources and aggregated feeds." Combining outside operational feeds with historic patterns and time dimensions, Campbell says, is a big trend in predictive analytics as a way to drive business forward.
Governance and Risk
Opportunities abound, but what lies in the gap is data quality and governance. With a growing abundance of discrete data services on the Web, companies are gathering and using information from many more sources, some trusted and others less so.
Going forward, business will need to add risk management to the governance of data that no longer comes only from internal resources. "We almost had governance roped in and now we have a whole new Wild West and we have to go back to the drawing board in terms of discipline," says Saugatuck Technology VP Mike West. "Is information authentic? Is it valuable? What does a lifecycle of information look like now?"
For many feeds, governance and data quality are already implied by provenance. Legacy providers such as Dow Jones, D&B and Equifax have earned roles that were once managed within enterprises by providing information that is current and comes with a service level of authenticity. But even that has raised confusion over who really owns the golden record of corporate trust in data.
Flip a coin, Karel says. "Many organizations use D&B as the record of truth and many others use D&B to validate their own database. Before I was an analyst, I was an end user and I've gone both ways myself."
On the frontier of social and contact media, companies have to manage risk in data they can't control, but still need. It is difficult to merge the idea of "crowd wisdom" with our definitions of data quality, but any educated stock investor would agree that momentum (eyeballs) and fundamentals (quality data) are both valid strategies.
Campbell sees value in "gray" data that needs to be scored. "I know I can't trust information from Wikipedia, but odds are there is some truth there and if I ignore it I'm just hurting myself."
That is ready justification for risk analysis of curves of likelihood for different information scenarios. "The CIA has quality assessments assigned to any information they capture," West says. "'Something' is better than a rumor, so you need to think about a level of quality that creates a trigger for a decision."
There is no doubt that analytics can do much to advance our understanding of at-large data. As information economies develop toward identifiable standards of accuracy, Gardner and many others believe strongly that analytics will be the best answer available to rationalize a sample or universe of data. "Something that is not scientific is nonetheless reflective of a trend or zeitgeist [between] behavior and reality."
Standards for a Data Economy
When more networks of data become economies, companies will look for more metadata to segregate types of information and their relative merits and quality. "No one is suggesting we're going to replace our legacy systems and data farms," Kaplan says. "But you want to ask how effective you have been with your own data silos as well."
This is precisely why specialized service vendors and networks have emerged, since no organization can hope to close all the gaps or process the 15 petabytes of new information IBM estimates are created each day.
It's still garbage in, garbage out. "I hope people are taking a risk approach because, yes, things will get better, which means you'll have to revisit the bets you make internally and with others," Dresner says.
But like the dial tone of a telephone, infrastructure is always receding to a presumption of connectivity and trust. We know standards have arrived when we no longer pay attention to them. The toast always fits into the toaster and the toaster plugs into any wall socket.
When data economies are reality, they will likewise come with assumed connectivity that lets information producers and consumers focus on the data, not the details.
And while standards of dependability continue to mature, the market will set the rules and we'll judge the interim value of data in part by what we are willing to pay for it. Whether it's a pig in a poke or a highly regarded source, there is every reason to expect we will be talking more about the value of information and less about the details of the infrastructure that supports it. The Web as integration platform has arrived and with it, an economy of its own.
(STORY SIDEBARS REFERRED TO IN ARTICLE FOLLOW -ED)
The Communal Data Service
The ability to monetize a data feed can arise from dedicated users with a common interest. A service provider called Jigsaw brings a tribal approach to managing business contacts and prospects, backed by five years of growth and $18 million of venture funding.
"The way we aggregate our database is through community, over a million registered members at Jigsaw," says CEO Jim Fowler. "But we make our money by cleaning databases for enterprises with our data as a service product, DataFusion, where enterprise customers, if they share information, can lower cost further."
The premise of Jigsaw is that it's folly for an individual or an enterprise to maintain a current list of clients and prospects. So individual members trade contact updates for new contacts, and enterprises pay by the seat for access to 3.6 million company records and 18 million contact records, a list that grows by 25,000 records per day. Data records of enterprise customers are batch cleansed nightly and those that open their own records to Jigsaw lower their monthly $99 per seat cost to $79. Of roughly 100 enterprise customers signed in the last four months, 40 percent share their information with the greater Jigsaw database.
The goal is higher data quality and more complete records. All Jigsaw records come with an email address and 70 percent come with a direct dial phone number, the highest standard in the industry, Fowler says. "We're hitting the point where the value of data is exceeding the value of software. And going forward, the systems that understand changes to data are the ones that will win."
Point and Extract
A longstanding method of aggregating unstructured streams of data is through robots that search and scrape content from Web browser views. Kapow claims to have taken screen-scraping technology to a new level with extraction of any source of enterprise or Web data through a proprietary point and click browser interface.
"Early on we were trying to catch the wave of mashups, but it felt like old IT because of the code you had to write and maintain for APIs," says Kapow marketing VP Ron Yu. "Plus, new applications are coming out faster than APIs, and we want to automate data extraction inside and outside the firewall."
Kapow has many named corporate customers but can only speak about a few under non-disclosure agreements. One it can point to is Fortune 500 financial service tech provider Fiserv, documented in a 2009 case study published in the Journal of Corporate Treasury Management.
Fiserv had tried and failed to create a compliance dashboard to track more than 10,000 treasury transactions per day among the top global 300 banks. Using a login and password to open a view-only interface to secure bank Web sites, Fiserv used Kapow, an Oracle database and a dashboard view from Corda to successfully create a 10-bank proof of concept in three weeks, and a full production application within three months.
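At its core, the robot-style extraction described above turns a view-only HTML page into structured rows a database can hold. Tools like Kapow do this through a point-and-click interface; as a minimal hand-coded sketch, with an invented page structure, the parsing step might look like this:

```python
# Hedged sketch of screen-scraping: pulling transaction rows out of an
# HTML table. The page structure is invented for illustration.
from html.parser import HTMLParser

class TransactionScraper(HTMLParser):
    """Collect the text of every <td> cell, grouped by <tr> row."""

    def __init__(self):
        super().__init__()
        self.rows, self._row, self._in_td = [], [], False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []          # start a fresh transaction row
        elif tag == "td":
            self._in_td = True

    def handle_endtag(self, tag):
        if tag == "tr" and self._row:
            self.rows.append(self._row)
        elif tag == "td":
            self._in_td = False

    def handle_data(self, data):
        if self._in_td:
            self._row.append(data.strip())

def scrape_transactions(html):
    parser = TransactionScraper()
    parser.feed(html)
    return parser.rows
```

The hard parts in production, as the Fiserv case suggests, are upstream of this step: authenticating into each bank's site, surviving layout changes and scheduling the robots, which is exactly what the commercial tools automate.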
"Banks have many layers of policy for security and no one is going to let you punch a hole through their firewall to the application," Yu says. "Kapow gave them a way to automate access to transactions."
By such an example, Yu hopes point and click Web data collection will be an effective alternative to traditional SQL programming and ETL tools, especially when the focus is on operational data. "The problems with data quality arrive when you're moving data back and forth and asking what the system of record is. Why not just integrate two Web applications so you can skip all that?"