
Federating Distributed Data

Published
  • November 01 1999, 1:00am EST

The business intelligence community has spent the past half-decade evangelizing the importance of consolidating and aggregating transaction data in an analytic data store. Consequently, data warehousing has become a mainstream information technology discipline. Yet new industry forces are already reshaping the types of data architectures companies are building to support analytic applications.

Today there's talk of federated data architectures. Just as a federated government divides power among parties or states, a federated data architecture consists of multiple, distributed data stores with varying degrees of affinity. The goal of a federated data architecture is to provide users with a unified view of distributed data sources.

Interestingly, the data warehouse may be just one of these distributed sources. The federated architecture may also encompass data marts, transaction systems and external data repositories. In some cases, these architectures may also integrate XML content, file systems, document databases and other sources of unstructured content.

Business Drivers

The basic driver of a federated architecture is the growing desire of companies to provide end users quick and easy access to any relevant information they need to perform their jobs. Federated architectures will soon become indispensable weapons to help companies efficiently manage supply chains, interact profitably with customers, deploy e-business strategies, build corporate portals, personalize Web sites and monitor key performance indicators, among other things.

For example, in many companies customer information is scattered across many different systems, including data marts and transaction systems. To provide customers on the Web or calling into a call center with a personalized experience and highly customized cross-selling offers, companies need to dynamically create a customer profile from distributed sources. Once the profile is created and held in memory, the application can apply predefined business rules to generate the personalized offer or telemarketing script.
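The profile-then-rules flow described above can be sketched in a few lines of Python. This is only an illustration: the three dictionaries stand in for separate systems, and every name, field and business rule in it is hypothetical rather than taken from any product.

```python
# Hypothetical in-memory stand-ins for three separate systems.
CRM_SYSTEM = {"C42": {"name": "Pat Lee", "segment": "small-business"}}
ORDERS_MART = {"C42": {"lifetime_spend": 1850.0, "last_category": "printers"}}
SUPPORT_SYSTEM = {"C42": {"open_tickets": 0}}


def build_profile(customer_id):
    """Merge whatever each source knows about the customer into one dict."""
    profile = {"customer_id": customer_id}
    for source in (CRM_SYSTEM, ORDERS_MART, SUPPORT_SYSTEM):
        # A source that has never seen the customer contributes nothing.
        profile.update(source.get(customer_id, {}))
    return profile


def choose_offer(profile):
    """Apply predefined (here: invented) business rules to the profile."""
    if profile.get("open_tickets", 0) > 0:
        return "no offer: resolve open support tickets first"
    if (profile.get("lifetime_spend", 0) >= 1000
            and profile.get("last_category") == "printers"):
        return "10% off toner subscription"
    return "standard newsletter signup"
```

Calling `choose_offer(build_profile("C42"))` walks the whole path: the profile is assembled on demand, held in memory and then handed to the rules. In a real deployment each dictionary lookup would be a query against a remote system.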

Although it would be easier if all customer information were consolidated in a single system, this is often not the case. For many companies, it's physically impossible and organizationally impractical to aggregate the necessary data in one place. It can be too costly and time-consuming to create a new data warehouse or data mart to support each new application or end-user request for information.

In addition, companies want to leverage existing investments in systems rather than build new ones that duplicate data that already exists. Moreover, different systems are designed to handle different workloads. For example, an operational data store, not a data warehouse, is best suited to support real-time analytic applications.

Federated Data Warehouses

In some respects, federated architectures are not new to data warehousing users. In the early days of the data warehousing movement, we used the term "virtual data warehouse" (or VDW) to refer to a federated architecture in which operational data was consolidated and aggregated on the fly, rather than in batch, as is commonly done now. The VDW concept was universally denounced because of its performance and scalability problems, its impact on operational systems and the difficulty of performing complex data integration on the fly. The VDW acronym quickly came to stand for "Voodoo and Witchcraft."

Today the most commonly espoused data warehousing architecture is one that federates data marts, not operational systems. In this scenario, a central data warehouse or staging area spawns multiple, dependent data marts. Each data mart essentially provides a different view of the data warehouse. In a slightly different twist, author Ralph Kimball's "dimensional bus" approach federates data marts by defining standard definitions (i.e., conformed dimensions and facts) that are shared among the data marts. This provides the basis for letting users query across marts without building a central data warehouse or staging area.
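To make the idea of querying across marts concrete, here is a minimal Python sketch. It assumes two hypothetical data marts whose fact rows carry the same product key from one shared (conformed) product dimension. The table contents and function names are invented for illustration.

```python
from collections import defaultdict

# Conformed dimension shared by both marts (hypothetical sample data).
PRODUCT_DIM = {1: "Laptop", 2: "Monitor"}

# Each mart keeps its own facts but uses the same product keys.
SALES_MART = [(1, 1200.0), (2, 300.0), (1, 800.0)]   # (product_key, revenue)
SHIPPING_MART = [(1, 2), (2, 1), (1, 1)]             # (product_key, units_shipped)


def summarize(fact_rows):
    """Total one mart's measure per product key."""
    totals = defaultdict(float)
    for product_key, measure in fact_rows:
        totals[product_key] += measure
    return totals


def drill_across():
    """Query each mart separately, then line the results up on the shared key."""
    revenue = summarize(SALES_MART)
    shipped = summarize(SHIPPING_MART)
    return {
        PRODUCT_DIM[key]: (revenue.get(key, 0.0), shipped.get(key, 0.0))
        for key in sorted(PRODUCT_DIM)
    }
```

The two result sets can be merged only because both marts agree on what a product key means. Without that shared definition, the final step would need a mapping table or manual reconciliation.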

Three Approaches

But the new class of business applications mentioned earlier requires companies to treat data warehouses and data marts as one set of resources among many. To federate multiple, heterogeneous data sources, companies need new tools and techniques.

nQuire Software

Startup nQuire Software recently released nQuire Server Suite 1.0 to let end users transparently query structured data resources, including data warehouses, data marts, transaction systems and XML stores. nQuire's stated goal is to become the universal search engine for structured data.

For example, nQuire Server Suite would make it possible for a CFO to get accurate, aggregated financial results the week before the end of the quarter in response to inquiries from Wall Street analysts. The nQuire engine would generate the answer by joining and calculating data pulled from three sources: a financial data mart that keeps monthly financial summaries, a data warehouse that keeps weekly summaries and an accounting system that manages daily financial data.
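The arithmetic behind such a quarter-to-date figure can be sketched as follows. This is not nQuire's method, only an illustration of stitching three grains together without double counting. The sample figures are invented, and the sketch assumes weekly summaries are keyed by week-ending date and that weeks restart on the first of each month.

```python
from datetime import date

# Hypothetical sample data, one dict per source system.
MONTHLY_MART = {(1999, 7): 410.0, (1999, 8): 395.0}        # completed months
WEEKLY_WAREHOUSE = {                                       # week-ending date -> total
    date(1999, 9, 7): 92.0,
    date(1999, 9, 14): 97.0,
    date(1999, 9, 21): 101.0,
}
DAILY_ACCOUNTING = {                                       # day -> total
    date(1999, 9, 21): 16.0,   # already inside the last weekly summary
    date(1999, 9, 22): 15.0,
    date(1999, 9, 23): 13.5,
}


def quarter_to_date(as_of):
    """Combine monthly, weekly and daily figures up to and including as_of."""
    first_month = 3 * ((as_of.month - 1) // 3) + 1
    month_start = as_of.replace(day=1)

    # 1. Completed months of the quarter come from the data mart.
    monthly = sum(MONTHLY_MART.get((as_of.year, m), 0.0)
                  for m in range(first_month, as_of.month))

    # 2. Completed weeks of the current month come from the warehouse.
    week_ends = sorted(d for d in WEEKLY_WAREHOUSE if month_start <= d <= as_of)
    weekly = sum(WEEKLY_WAREHOUSE[d] for d in week_ends)

    # 3. Only days after the last summarized week come from accounting.
    covered_through = week_ends[-1] if week_ends else None
    daily = sum(
        amount for day, amount in DAILY_ACCOUNTING.items()
        if month_start <= day <= as_of
        and (covered_through is None or day > covered_through)
    )
    return monthly + weekly + daily
```

With the sample data, `quarter_to_date(date(1999, 9, 23))` adds two monthly totals, three weekly totals and two daily totals. The September 21 daily row is skipped because the weekly summary ending that day already covers it.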

nQuire's GUI provides a point-and-click interface for submitting complex queries against a logical business model that represents distributed data sources. The nQuire Server is fundamentally an analytic engine that employs sophisticated distributed database management techniques and a user-centric business model (i.e., semantic layer) to give users a unified view of distributed data. The nQuire engine does as much processing as possible in each database to minimize the amount of data shipped across the network. In most cases the databases ship the data back to the nQuire engine, which performs the joins and applies any needed transformations and calculations.
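The division of labor described here, where each database does as much work as it can and the central engine joins the results, can be illustrated with Python's built-in sqlite3 module. Two in-memory databases stand in for independent sources. The schema, data and function names are assumptions made for this sketch and say nothing about how nQuire is actually implemented.

```python
import sqlite3


def make_source(ddl, insert_sql, rows):
    """Create an in-memory database that plays the role of one remote source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(ddl)
    conn.executemany(insert_sql, rows)
    return conn


orders_db = make_source(
    "CREATE TABLE orders (region TEXT, amount REAL)",
    "INSERT INTO orders VALUES (?, ?)",
    [("East", 100.0), ("East", 250.0), ("West", 75.0)],
)
returns_db = make_source(
    "CREATE TABLE returns (region TEXT, amount REAL)",
    "INSERT INTO returns VALUES (?, ?)",
    [("East", 50.0)],
)


def net_sales_by_region():
    # Pushed down: each source aggregates its own rows, so only one
    # summary row per region travels back to the central engine.
    sold = dict(orders_db.execute(
        "SELECT region, SUM(amount) FROM orders GROUP BY region"))
    returned = dict(returns_db.execute(
        "SELECT region, SUM(amount) FROM returns GROUP BY region"))

    # Done centrally: join the two small result sets and compute the net.
    return {region: total - returned.get(region, 0.0)
            for region, total in sorted(sold.items())}
```

Shipping raw order rows to the engine and summing them there would give the same answer, but it would move far more data across the network, which is the cost the push-down step is meant to avoid.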

Cohera Corporation

Another tool that works similarly to nQuire, but with significant differences in philosophy and approach, is Cohera Corporation's Data Federation System (DFS). Founded by Michael Stonebraker, Cohera commercializes research work begun at the University of California at Berkeley to develop a scalable, truly functional distributed data management system.

Like nQuire, Cohera's DFS provides a unified view of distributed data sources, including SAP R/3 applications and HTML pages, but it is more of a distributed query optimizer than an analytic engine. And Cohera takes federation to the extreme. Using a political analogy, Cohera would advocate that the U.S. return to its original constitution, the Articles of Confederation, which gave substantial power to individual states to govern themselves.

By contrast, nQuire's concept of federation more closely resembles the current U.S. Constitution, in which a central authority makes demands of local systems. In addition, while nQuire performs most of the joins and calculations in its analytic engine, Cohera's DFS distributes all query processing to remote database engines and its middleware.

In general, it seems that nQuire is geared more toward resolving analytical queries that involve on-the-fly aggregations and calculations, whereas Cohera is geared toward record-oriented queries that can be resolved with SQL. Whereas nQuire might support a trend analysis using entirely derived data, Cohera might create a composite customer profile using data elements pulled from multiple systems. nQuire is more user-friendly since it accepts queries couched in business terms, whereas Cohera uses SQL statements as its starting point.

Conclusion

Building applications using a federated data architecture is a new endeavor. Fortunately, new tools, technologies and techniques are emerging to support applications that must span and integrate distributed data sources. Whatever approach or technologies these tools use, it's clear they will play a vital role in supporting a whole host of next-generation applications that rely heavily on integrating information from diverse sources.
