In the enterprise IT world, data is stored in a vast array of information repositories, such as corporate databases, enterprise resource planning systems, customer relationship management systems and many others. Finding data dispersed across these many repositories can be time-consuming and frustrating for end users. As companies deploy enterprise-wide business processes and work to achieve compliance with government and industry regulations, they must bridge the various data islands and provide transparent, secure access to data wherever it is stored.

Over the past couple of decades, a dedicated data integration industry has formed to address this challenge. These companies make it easier to access, search, synchronize, move and manage data from enterprise repositories, typically using tools and services that are installed on corporate servers and move data over corporate networks. Enterprise solutions generally leverage traditional data portability standards such as SQL, ACORD, ARTS, FIX/FPL, OFX and many others.

However, in recent years a new type of application has emerged, which you may know as Web 2.0. These applications use the Web as their essential platform and rely on a variety of Internet-based services - usually from independent service providers - for identity management, single sign-on, data management, content delivery, event management, reliable messaging and more. Accordingly, the traditional enterprise approach to data integration simply won't work here, yet Web 2.0 users still need to access data without administrative hassles. Fortunately, new Web data portability standards are emerging to help address these needs.

Web 2.0: New World, New Rules

A broad variety of Internet-based applications, such as Flickr and Facebook, is now available. They allow users to maintain profiles, participate in communities, create and post content, and perform a wide variety of other tasks. Some applications even mash up business logic and services from different sources. Tim O'Reilly, the well-known technology publisher who popularized the term, has defined this new phenomenon as follows:

Web 2.0 is the business revolution in the computer industry caused by the move to the internet as platform, and an attempt to understand the rules for success on that new platform. Chief among those rules is this: Build applications that harness network effects to get better the more people use them. (This is what I've elsewhere called "harnessing collective intelligence.")

He also described the common attributes of Web 2.0 applications:

  1. Don’t treat software as an artifact, but as a process of engagement with your users.
  2. Open your data and services for reuse by others, and reuse the data and services of others whenever possible.
  3. Don't think of applications that reside on either client or server, but build applications that reside in the space between devices.
  4. Remember that in a network environment, open APIs and standard protocols win, but this doesn't mean that the idea of competitive advantage goes away.
  5. Chief among the future sources of competitive advantage will be data, whether through increasing returns from user-generated data, through owning a name space or through proprietary file formats.

Data plays a central role in Web 2.0 applications. Data defines the business model and is the ultimate commodity in the new Internet-based economy. But as the number of sites and the volume of data multiply, the challenges of accessing and managing this data also grow. To truly realize the Web 2.0 vision, Internet-based applications must be able to securely access, share, synchronize and manage data stored in disparate repositories - the same problem enterprise computing has already addressed, but on a whole new platform. Accordingly, we need a new set of solutions.

A basic level of standardization is often the first step and is an important prerequisite for more powerful data integration services to emerge. In other words, data portability is the precursor to data integration. Today, new organizations and committees aimed at standardizing the representation of data on the Internet are already beginning to make data portability a reality.

New Standards Emerge

Dataportability.org is one of the more interesting groups driving data portability today. Originally, this organization focused on putting users in control of their own data as it accumulates across various Internet-based services. However, the organization now has a broader goal and champions overall standards for Web data portability. Let's take a quick look at some of them.

OpenID. OpenID aims to be a free and simple way to use a single digital identity across the Web, and has been adopted by Yahoo, AOL, Google, Microsoft, MySpace, Orange, France Telecom and many other providers. More than half a billion people now have an OpenID identifier, which should further encourage active adoption of this standard.

OpenID does have some issues, including usability: an OpenID identifier is a URL string that can be difficult to remember. It also has many powerful proprietary competitors, such as Facebook Connect. Nonetheless, given its broad audience, OpenID could potentially become the dominant way to represent user identities on the Internet.
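
To make the identifier-as-URL idea concrete, the sketch below shows a simplified form of OpenID 2.0 discovery: a relying party fetches the user's identifier URL and looks for the provider endpoint advertised in the page's link tags. It skips the Yadis/XRDS step of the full protocol, and the identifier URL is a hypothetical placeholder rather than a real account.

    # Simplified sketch of OpenID 2.0 HTML-based discovery: fetch the user's
    # identifier URL and look for the provider endpoint advertised in a
    # <link rel="openid2.provider"> tag. (The full protocol also supports
    # Yadis/XRDS discovery, which is skipped here.)
    from html.parser import HTMLParser
    from urllib.request import urlopen

    class ProviderLinkParser(HTMLParser):
        def __init__(self):
            super().__init__()
            self.provider = None

        def handle_starttag(self, tag, attrs):
            attrs = dict(attrs)
            rel = (attrs.get("rel") or "").lower()
            if tag == "link" and "openid2.provider" in rel.split():
                self.provider = attrs.get("href")

    # Hypothetical OpenID identifier; any page the user controls could serve it.
    identifier = "https://openid.example.net/alice"
    html = urlopen(identifier).read().decode("utf-8", errors="replace")
    parser = ProviderLinkParser()
    parser.feed(html)
    print("OpenID provider endpoint:", parser.provider)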

OAuth. OAuth is defined as a simple way to publish and interact with protected data on the Internet. For example, an Internet-based application may wish to receive additional information such as date of birth or email address from a user logging in with an OpenID string. It could prompt the user to enter this information into a form, but this approach can be tedious and inefficient - particularly if the same data is already available elsewhere. Alternatively, the application could ask the user where this information is located and then ask for a password to access it.

This model is already in use. Social networks often ask new members for an email address and password, and that password is then used to access the member's address book and send out friendship invitations. However, sharing passwords on the Web obviously raises many concerns and risks. OAuth addresses these problems by establishing a protocol for secure data sharing across services. With the help of OAuth, a user can authorize one service to access a limited subset of user-related information maintained by another service.
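
As a rough illustration of this delegation model, the sketch below walks through a three-legged OAuth 1.0a flow using the third-party requests_oauthlib Python library; the endpoint URLs and application credentials are hypothetical placeholders, not taken from any particular provider.

    # Sketch of a three-legged OAuth 1.0a flow with the third-party
    # requests_oauthlib library. Endpoint URLs and credentials below are
    # hypothetical placeholders, not tied to any real provider.
    from requests_oauthlib import OAuth1Session

    CLIENT_KEY = "app-key-issued-by-provider"
    CLIENT_SECRET = "app-secret-issued-by-provider"

    oauth = OAuth1Session(CLIENT_KEY, client_secret=CLIENT_SECRET)

    # 1. Obtain a temporary request token.
    oauth.fetch_request_token("https://provider.example/oauth/request_token")

    # 2. Send the user to the provider to approve (or deny) the request.
    print("Authorize here:",
          oauth.authorization_url("https://provider.example/oauth/authorize"))
    verifier = input("Verifier shown after approval: ")

    # 3. Exchange the approved request token for an access token.
    oauth.fetch_access_token("https://provider.example/oauth/access_token",
                             verifier=verifier)

    # 4. Call a protected resource (e.g., the user's address book) without
    #    ever handling the user's password.
    response = oauth.get("https://provider.example/api/contacts")
    print(response.status_code)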

RSS. RSS stands for Really Simple Syndication. It's a dialect of XML that is often used to share content on the Internet, signified by the orange RSS icon you'll spot on numerous sites. RSS feeds usually contain a subset of the site's content, presented as a list of items with a title, publication date, description, link to the original document and other attributes. Published RSS feeds can be consumed by a broad variety of applications and services, such as portal gadgets, feed aggregators and RSS readers, or even email clients such as Outlook or Apple Mail.

Many readers can automatically monitor updates in multiple RSS feeds from various sources, apply any desired filters and notify users of any changes. Media RSS has also emerged for publishing video and audio content and is used in video search and podcasting. RSS is simple, powerful and seminal: it has inspired many other standards, such as CMIS for content integration and SSE for file sharing.
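
The sketch below shows just how simple consuming RSS can be: it parses a minimal RSS 2.0 feed with Python's standard xml.etree.ElementTree module and lists each item's date, title and link. The feed content is invented for illustration.

    # Parse a minimal RSS 2.0 feed with the standard library and list its
    # items. The feed XML here is an invented example, not a real site's feed.
    import xml.etree.ElementTree as ET

    rss = """<?xml version="1.0"?>
    <rss version="2.0">
      <channel>
        <title>Example Feed</title>
        <item>
          <title>Web data portability standards</title>
          <link>http://example.com/articles/1</link>
          <pubDate>Mon, 02 Jun 2008 09:00:00 GMT</pubDate>
          <description>Why RSS, OPML and friends matter.</description>
        </item>
      </channel>
    </rss>"""

    root = ET.fromstring(rss)
    for item in root.iter("item"):
        print(item.findtext("pubDate"), "-",
              item.findtext("title"), "->", item.findtext("link"))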

OPML. OPML stands for Outline Processor Markup Language. It's closely related to RSS and is often used to share RSS subscription lists amongst readers and aggregators, which typically allow users to export and import RSS subscriptions and Web bookmarks as an OPML file; Google Reader, for example, provides these capabilities. Sharing subscriptions and bookmarks plays a key role in collaboration and social networking, as seen in the growth of services like del.icio.us or ma.gnolia, which could greatly benefit from the standardization OPML offers.
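
Here is a small sketch of what an aggregator does with such a file: it reads an OPML subscription list and collects the feed URLs from the xmlUrl attributes of its outline elements. The OPML content is an invented example.

    # Read an OPML subscription list (as exported by a feed reader) and
    # collect the feed URLs from the xmlUrl attributes. The OPML content is
    # an invented example.
    import xml.etree.ElementTree as ET

    opml = """<?xml version="1.0"?>
    <opml version="1.1">
      <head><title>My subscriptions</title></head>
      <body>
        <outline text="Tech news" type="rss" xmlUrl="http://example.com/tech.rss"/>
        <outline text="Data standards" type="rss" xmlUrl="http://example.org/data.rss"/>
      </body>
    </opml>"""

    root = ET.fromstring(opml)
    feeds = [o.get("xmlUrl") for o in root.iter("outline") if o.get("xmlUrl")]
    print(feeds)  # ['http://example.com/tech.rss', 'http://example.org/data.rss']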

Microformats. Microformats are a set of simple, open formats for representing data in XHTML and embedding it directly within Web pages. This information can then be viewed by users in its readable form or leveraged by Internet-based applications and services such as search engines and browser plug-ins. One could say that microformats are the poor man's semantic Web, providing context to information so computers can use it on an automated basis.

For example, a home phone number marked up with the hCard microformat standard looks something like this in XHTML:

    <div class="vcard">
      <span class="tel">
        <span class="type">home</span>:
        <span class="value">+1.415.555.1212</span>
      </span>
    </div>

While users may only view a phone number, search engines and other applications can automatically recognize what type of information is being provided, enabling efficient reuse of the data.
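
As a simplified illustration of that automated reuse, the following sketch pulls the phone number out of the hCard markup above using Python's standard html.parser module; a real consumer would use a full microformats parser, so treat this as a deliberately minimal example.

    # Simplified illustration of machine-readable markup: pull the
    # class="value" text out of an hCard "tel" property with the standard
    # html.parser module. Real consumers would use a full microformats parser.
    from html.parser import HTMLParser

    HCARD = """<div class="vcard">
      <span class="tel"><span class="type">home</span>:
      <span class="value">+1.415.555.1212</span></span>
    </div>"""

    class TelValueExtractor(HTMLParser):
        def __init__(self):
            super().__init__()
            self.in_value = False
            self.numbers = []

        def handle_starttag(self, tag, attrs):
            classes = (dict(attrs).get("class") or "").split()
            if "value" in classes:
                self.in_value = True

        def handle_endtag(self, tag):
            self.in_value = False

        def handle_data(self, data):
            if self.in_value and data.strip():
                self.numbers.append(data.strip())

    extractor = TelValueExtractor()
    extractor.feed(HCARD)
    print(extractor.numbers)  # ['+1.415.555.1212']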

A Data OASIS

Dataportability.org is just one of many organizations promoting standardization of data on the Internet. OASIS (Organization for the Advancement of Structured Information Standards) is another prominent player and has sponsored many standards; the most notable here is CMIS, or Content Management Interoperability Services.

CMIS aims to be for unstructured content what SQL is for structured data: SQL can be used to query and manage the structured data held in databases, while unstructured content is managed by content management systems. Despite the maturity of the content management industry, it still lacks a broadly accepted equivalent of SQL, and CMIS is designed to fill that gap.

A content management system, such as Documentum or SharePoint, can be deployed in an enterprise environment and accessed by enterprise applications; it can also be deployed as an Internet-based service, as with Flickr or YouTube. CMIS is a standard interface for accessing metadata and content stored in both enterprise and Internet-based content management systems. It has one domain model but two different API bindings: a Web services binding for the enterprise and a REST-based (Representational State Transfer) binding for Internet-based deployments.
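
To give a feel for the standard, here is a sketch of what a CMIS query might look like from Python using the Apache Chemistry cmislib client, which speaks the AtomPub (REST) binding; the repository URL, credentials and query below are hypothetical placeholders.

    # Sketch of a CMIS query from Python using the Apache Chemistry cmislib
    # client (third-party). The endpoint URL, credentials and query are
    # hypothetical placeholders, not tied to any particular repository.
    from cmislib import CmisClient

    client = CmisClient("http://cms.example.com/cmis/atom",  # AtomPub (REST) binding
                        "admin", "admin")
    repo = client.defaultRepository

    # CMIS defines a SQL-like query language over documents and folders.
    results = repo.query("SELECT cmis:name, cmis:objectId FROM cmis:document "
                         "WHERE cmis:name LIKE '%report%'")
    for doc in results:
        print(doc.getName())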

Taking Data to Go

As content and service providers continue to embrace the Web as their platform, the same data portability and integration problems that emerged in the enterprise will appear again. New Internet-based tools and services that can connect disparate data islands and provide transparent, secure access to Internet data regardless of where it is stored will be essential to making Web content and services truly accessible - and standardizing data-related formats and protocols will be a critical first step.

Interesting new standards such as OpenID, OAuth, CMIS and others have already begun to emerge. These standards are a huge step forward, but much more must be done to improve them and ensure they are supported by new and existing Internet-based services. Until then, data integration on the Internet will remain a big issue. However, a big problem is usually a big opportunity in disguise, and many companies will undoubtedly rise to the challenge and fill this important need.
