Exponentially growing data sources and the information they contain promise to bolster our professional decision-making on both a day-to-day and a long-term strategic basis. Yet, data professionals in large enterprises and government agencies face increasingly difficult data integration challenges, including:

  • Constant business change necessitating rapid IT response that includes changing information requirement specifications;
  • Growing data volumes and complexity that increase business risk and reduce agility; and
  • Financial constraints necessitating cost-effective IT solutions that meet business objectives and not simply “technology for technology’s sake.”

Traditional approaches such as data consolidation and replication alone have not kept pace. As a result, federated data models have evolved to complement these investments and fill the gap.

Data Federation at a Glance

Data federation, also called data virtualization, is an integration approach that combines data from multiple, disparate sources anywhere across the extended enterprise in a unified, logically virtualized manner for consumption by an array of transactional and business intelligence applications.

Frequently deployed at two levels, data federation typically complements existing data integration methods such as consolidation and replication. At the project level, data federation virtually integrates the data required in support of a specific application or use case. On an enterprise level, it is typically implemented as common services or as a loosely coupled data abstraction layer to share data from multiple sources across multiple applications and uses.
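As a rough illustration of a project-level federated view, the sketch below joins two sources at query time instead of copying them into a new physical store. Everything in it is an assumption made for the example: the `customers` table, the `orders_feed` records and the `customer_order_totals` function are hypothetical, and an in-memory SQLite database stands in for a real operational system. Commercial data federation middleware would add query optimization, security and caching on top of this basic idea.

```python
import sqlite3

# Source 1 (hypothetical): an operational database holding customer records.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Acme"), (2, "Globex")])

# Source 2 (hypothetical): order records held by a separate system,
# represented here as already-fetched rows.
orders_feed = [
    {"customer_id": 1, "amount": 120.0},
    {"customer_id": 1, "amount": 80.0},
    {"customer_id": 2, "amount": 50.0},
]


def customer_order_totals():
    """Federated view: read both sources on demand and join them in memory."""
    totals = {}
    for order in orders_feed:
        cid = order["customer_id"]
        totals[cid] = totals.get(cid, 0.0) + order["amount"]
    rows = crm.execute("SELECT id, name FROM customers ORDER BY id").fetchall()
    return [{"customer": name, "total": totals.get(cid, 0.0)} for cid, name in rows]


result = customer_order_totals()
print(result)
```

The consuming application sees a single result set and does not need to know that the data came from two places, which is the property the project-level model relies on.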

Common Data Federation Usage Models

IT teams across a diverse set of industries, including financial services, manufacturing, energy, telecommunications, media/entertainment, pharmaceutical, health care and more, have deployed a federated approach to data integration at both the project and the enterprise levels. The aggregated experiences of these enterprises can be summarized in five common data federation usage models. Enterprise architects and data professionals seeking to integrate disparately located and sourced data more rapidly and efficiently should consider applying one or more of the following five models to their own information systems:

  • Project-level data federation,
  • Data warehouse extension,
  • Enterprise data sharing,
  • Real-time enterprise data infrastructure, and
  • Cloud data integration.

Project-Level Data Federation

Project architects typically begin data federation at a project level, after evaluating a number of business, data source and data consumer considerations. These include, but are not limited to:

  • Time-to-solution: How rapidly does the business need a solution? For rapid turnarounds, data federation has the edge. For solutions requiring significant dimensioning of data, physical consolidation is the stronger, albeit longer-to-deliver, choice.
  • Resource allocation: What is the size of the budget? How big is the staff available to support the creation and maintenance of the data infrastructure? For projects without sizable budgets, data virtualization offers a high ROI and typically requires fewer staff members to deploy and support.
  • Risk tolerance: How likely is it that a solution built to current data integration specifications will not be agile enough to absorb the unexpected changes that often follow deployment? For projects where little change is anticipated and risk tolerance is high, physical consolidation may be the preferred choice. For enterprises where data sources and information requirements change frequently and risk tolerance is low, federated data integration is the best choice because it offers the most agility.

By taking a federated approach to data integration, architects and developers have a flexible palette of low-cost, rapid-deployment integration options including: federated views, data services, data mashups, in-memory and database caches, virtual data marts and virtual operational data stores.

Data Warehouse Extension

Supporting critical yet ever-changing information requirements in an environment of ever-increasing data volumes and complexity has driven, and will continue to drive, the demand for enterprise data warehouse (EDW)-centric solutions. However, business change has begun to outpace EDW evolution, with significant volumes of enterprise data residing outside the EDW, due to increasingly distributed business value chains, greater workforce mobility, cloud computing and more.

Data federation preserves and extends EDWs through a range of flexible data integration techniques including:

  • Integration with external, intra-day or detailed data,
  • MDM hub extension – 360-degree view,
  • Data warehouse federation,
  • Hub and virtual spoke,
  • Enterprise architecture,
  • ETL sources extension,
  • Data warehouse prototyping, and
  • Data warehouse migration.

By integrating information from outside data sources, data federation avoids the costs, potential risks and time required to modify and support ETL and data warehouse structures.
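The first technique in the list above, integration with intra-day data, can be sketched as a virtual view that unions warehouse history with rows still sitting in an operational source. This is a minimal sketch under stated assumptions: a single SQLite connection with an attached second in-memory database simulates two separate systems, and the table and view names (`sales_history`, `sales_today`, `sales_all`) are hypothetical. The point is only that neither the warehouse schema nor any ETL job changes.

```python
import sqlite3

# "main" stands in for the data warehouse; "ops" stands in for an
# operational source that holds today's transactions (both are assumptions).
conn = sqlite3.connect(":memory:")
conn.execute("ATTACH DATABASE ':memory:' AS ops")

conn.execute("CREATE TABLE main.sales_history (day TEXT, amount REAL)")
conn.execute("CREATE TABLE ops.sales_today (day TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO main.sales_history VALUES (?, ?)",
    [("2024-01-01", 100.0), ("2024-01-02", 150.0)],
)
conn.executemany(
    "INSERT INTO ops.sales_today VALUES (?, ?)",
    [("2024-01-03", 40.0), ("2024-01-03", 60.0)],
)

# A temporary view may span attached databases in SQLite, so it can act as
# the virtual layer: no rows are copied into the warehouse.
conn.execute(
    """
    CREATE TEMP VIEW sales_all AS
    SELECT day, amount FROM main.sales_history
    UNION ALL
    SELECT day, amount FROM ops.sales_today
    """
)

daily = conn.execute(
    "SELECT day, SUM(amount) FROM sales_all GROUP BY day ORDER BY day"
).fetchall()
print(daily)
```

Reports written against `sales_all` pick up intra-day rows as soon as they land in the operational source, without waiting for the next warehouse load.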

Enterprise Data Sharing

Enterprise data sharing is becoming increasingly popular in enterprises and government agencies where various teams share an assortment of analysis and reporting tools to access and analyze large amounts of diverse data across disparate sources and multiple geographical locations.

To enable enterprise data sharing and overcome the complexities inherent in multiple geographies and data sources, federated data integration leverages service-oriented architecture principles including abstraction, decoupling of data from its sources, reuse and more, while also supporting a range of internal and industry data standards.

Successful enterprise data sharing patterns enable a phased adoption that may start with the creation of individual data services and evolve to enterprise-wide architectures, including:

  • Shared data services,
  • Data abstraction layer,
  • Standards-compliant data services, and
  • Data virtualization competency center.
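The decoupling that a data abstraction layer provides can be illustrated with a small registry that maps logical dataset names to source-specific providers. This is a hedged sketch, not a description of any product: the `DataAbstractionLayer` class, the `employees` dataset and both provider functions are hypothetical names introduced for the example.

```python
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]
Provider = Callable[[], Iterable[Row]]


class DataAbstractionLayer:
    """Maps logical dataset names to whichever source currently serves them."""

    def __init__(self) -> None:
        self._providers: Dict[str, Provider] = {}

    def register(self, name: str, provider: Provider) -> None:
        # Registering an existing name again swaps the underlying source.
        self._providers[name] = provider

    def read(self, name: str) -> List[Row]:
        return list(self._providers[name]())


# Two hypothetical sources that can serve the same logical dataset.
def employees_from_hr_system() -> List[Row]:
    return [{"id": 1, "name": "Ada", "site": "London"}]


def employees_from_new_platform() -> List[Row]:
    return [
        {"id": 1, "name": "Ada", "site": "London"},
        {"id": 2, "name": "Grace", "site": "New York"},
    ]


layer = DataAbstractionLayer()
layer.register("employees", employees_from_hr_system)
before = layer.read("employees")

# The source changes; consumers keep calling read("employees") unchanged.
layer.register("employees", employees_from_new_platform)
after = layer.read("employees")
print(len(before), len(after))
```

Because consumers depend only on the logical name, a source migration becomes a change to the registration rather than to every report and application that reads the data.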

Real-time Enterprise Data Infrastructure

Data integration has become increasingly difficult where real-time data is essential, particularly in sectors such as brokerage, process manufacturing and power (energy) distribution. Real-time enterprise data sharing requires multiple, robust data integration capabilities including data federation, change data capture, messaging and data consolidation.

In a typical real-time data infrastructure, as transactions occur, change data capture immediately offloads data from the transaction systems to the operational data stores. Data services built and run with data federation middleware deliver this data to downstream consumers using both push and pull techniques. By providing a range of caching techniques (e.g., distributed and centralized, batch and incremental, in-memory and database), and by working transparently with enterprise service bus messaging middleware, the latest data is sent where it is required. This capability is especially useful in the financial services industry, where seconds can be the difference between a trading gain and loss.
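The flow described above can be approximated in a few lines: captured changes land in an operational store, a data service keeps an in-memory cache current incrementally, and consumers either receive pushed updates or pull the latest value. The sketch below is illustrative only. `OperationalStore`, `QuoteService` and the `ACME` key are hypothetical, and a direct method call stands in for both the change data capture process and the messaging middleware.

```python
from typing import Callable, Dict, List, Tuple

ChangeHandler = Callable[[str, float], None]


class OperationalStore:
    """Hypothetical operational data store fed by change data capture."""

    def __init__(self) -> None:
        self._rows: Dict[str, float] = {}
        self._subscribers: List[ChangeHandler] = []

    def subscribe(self, handler: ChangeHandler) -> None:
        self._subscribers.append(handler)

    def apply_change(self, key: str, value: float) -> None:
        # Called once per captured change; pushes it to every subscriber.
        self._rows[key] = value
        for handler in self._subscribers:
            handler(key, value)

    def get(self, key: str) -> float:
        return self._rows[key]


class QuoteService:
    """Hypothetical data service with an incrementally refreshed cache."""

    def __init__(self, store: OperationalStore) -> None:
        self._store = store
        self._cache: Dict[str, float] = {}
        store.subscribe(self._refresh)

    def _refresh(self, key: str, value: float) -> None:
        self._cache[key] = value

    def latest(self, key: str) -> float:
        # Pull path: serve from the cache, falling back to the store on a miss.
        if key not in self._cache:
            self._cache[key] = self._store.get(key)
        return self._cache[key]


store = OperationalStore()
store.apply_change("ACME", 10.0)  # captured before the service starts
service = QuoteService(store)

pushed: List[Tuple[str, float]] = []
store.subscribe(lambda key, value: pushed.append((key, value)))  # push consumer

first = service.latest("ACME")   # cache miss, pulled from the store
store.apply_change("ACME", 10.5)
second = service.latest("ACME")  # served from the refreshed cache
print(first, second, pushed)
```

An incremental refresh of this kind keeps the cache close to the source without periodic full reloads; a batch cache would trade some freshness for lower load on the transaction systems.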

Cloud Data Integration

With the advent of software as a service, platform as a service, and infrastructure as a service, cloud computing offers powerful computing resources at attractive, pay-as-you-go prices.

However, leveraging these capabilities greatly increases data integration complexity. Every new cloud source is effectively a new data silo that must be integrated with existing on-premise information sources. Further, cloud data integration requires integration methods not typically supported by traditional direct database queries and ETL scripts, because of technical issues involving firewalls, security, application programming interfaces, on-demand versus batch access and more.

A more virtualized integration model is especially well suited to solving these technical challenges and thereby enabling enterprises to take advantage of powerful cloud economics sooner.

Getting Started

While any of the federated data models outlined in this article may be adopted independently, most enterprises have started with project-based data federation and/or data warehouse extensions. These two models align most easily with existing data integration approaches and typically deliver the quickest ROI. Enterprise data sharing is frequently the next step and is typically aligned with SOA or industry-standards adoption initiatives. Real-time data infrastructures apply most directly to enterprises where real-time data strongly impacts corporate success. Although receiving much ink in today’s trade press and gaining popularity, cloud computing is still far from mass adoption. Therefore, federated data integration deployments typically follow cloud adoption once the on-premise data integration impact and technical challenges become more apparent.

Traditional approaches to data integration alone are no match for today’s increasingly complex information environments and accelerated business pace. To meet business objectives with more agile solutions that adapt as information requirements change, enterprises across a broad range of industries are deploying more virtualized, federated data integration solutions to extend and preserve their sizeable data warehouse investments. By applying one or more of the five federated data models presented in this article, enterprises can accelerate their data integration efforts and reap ROI more quickly as a result.
