"If all you have is a hammer, everything looks like a nail." Bernard Baruch's quote resonates in IT organizations, which often find themselves solving new problems with their old, reliable tools even when more appropriate new tools exist.
This truism especially applies when it comes to choosing data integration (DI) methods. Physical data consolidation - the combining of related data into a common physical store using ETL tools - is like a familiar, reliable hammer. Data virtualization is like a screwdriver. Which will be the most useful to solve a problem? Like their toolbox counterparts, both are indispensable for solving different DI problems. The challenge to IT comes in deciding early in the design cycle which tool best fits the job.
A Key Capability in Every DI Portfolio
For enterprises around the globe, ETL has been the traditional go-to tool for integrating data across disparate sources. However, data virtualization has moved from an interesting new approach to a standard DI method along with physical data consolidation and data synchronization (see Figure 1). Data virtualization brings together (federates) data from multiple, disparate sources - anywhere across the extended enterprise, both inside and outside the firewall - into unified, logical, virtualized views or data stores for consumption by nearly every front-end business solution, including portals, reports, applications and others. As project-oriented DI middleware, data virtualization is often referred to as virtual data federation, high-performance query or enterprise information integration (EII). As enterprise architecture, it is frequently described as a virtualized data layer, an information grid, an information fabric or as data services in SOA environments.
Industry analyst firm Gartner, in its June 2008 report, "Survey on Data Integration Practices Shows Move Toward Strategic Initiatives," shows that more than 50 percent of organizations surveyed say they are creating virtual integrated views of data from disparate databases via data federation techniques. Industry analyst firm Forrester Research, in its October 2008 report, "Securing Next-Generation Information Architectures," sums up this trend: "Next-generation information architectures such as data federation and information services are gaining increased adoption."
Figure 1: Data Virtualization within a Data Integration Portfolio
Making the Right DI Decision
Nearly every new solution that IT builds leverages data from existing sources and therefore can benefit from data integration. Data virtualization is a natural fit for many use cases; physical data consolidation is the right answer for others. Sometimes, the best solution is a combination or hybrid of the two.
Recognizing the importance of this decision, DI industry analysts and software vendors, working in concert with data architects and integration specialists, have published useful DI decision-making guidance. Among the offerings available, there are two distinct decision-making approaches: integration pattern matching and integration factor analysis. Both may be applied to individual projects using insight readily available in typical project justifications, higher-level designs and detailed specifications.
Integration Pattern Matching
Using integration pattern matching, data architects or integration specialists look for matches by comparing their specific use cases with typically deployed DI patterns. Let's consider a few examples:
- If the use case is improved corporate-wide sales reporting, then the appropriate integration pattern would be an enterprise data warehouse that physically consolidates and summarizes sales data from across the enterprise.
- If the need is to provide better supply chain information in order to optimize logistics service levels and spend, then a virtual operational data store that federates up-to-the-minute supply and demand data would be the best pattern.
- To support specialized portfolio analysis requirements by financial analysts in an investment management company, a virtual data mart sourced from the financial research data warehouse would be the best pattern.
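As a rough illustration, the three examples above can be written down as a small pattern catalog and matched against a use-case description. This is only a sketch: the `Pattern` class, the `match_pattern` function and the keyword sets are hypothetical names and values chosen for this example, and a real assessment would rely on an architect's judgment rather than keyword overlap.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Pattern:
    name: str
    method: str          # "physical" or "virtual"
    keywords: frozenset  # illustrative trigger words (an assumption)


# The three pairings mirror the examples in the text; the keywords are made up.
CATALOG = [
    Pattern("enterprise data warehouse", "physical",
            frozenset({"corporate-wide", "sales", "reporting"})),
    Pattern("virtual operational data store", "virtual",
            frozenset({"supply", "chain", "logistics"})),
    Pattern("virtual data mart", "virtual",
            frozenset({"portfolio", "analysis", "analysts"})),
]


def match_pattern(use_case: str):
    """Return the catalog pattern sharing the most keywords, or None if none match."""
    words = set(use_case.lower().split())
    best = max(CATALOG, key=lambda p: len(p.keywords & words))
    return best if best.keywords & words else None


if __name__ == "__main__":
    hit = match_pattern("Improved corporate-wide sales reporting")
    print(hit.name, "->", hit.method)
```

A use case that shares no keywords with any pattern returns `None`, which corresponds to the situation where no clear pattern match exists and a factor-by-factor assessment is the better route.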
In an April 2008 paper, "To V or Not to V: Business Intelligence Gets Virtual!", Claudia Imhoff, president of independent industry analyst firm Intelligent Solutions, uses this pattern-matching technique to provide clear virtual versus physical DI decision-making guidance. Eight common BI integration examples or patterns are identified:
- Data warehouse
- Virtual data mart
- Physical data mart
- Data mart or warehouse archiving
- Data mart or warehouse extension
- Virtual operational data store
- Physical operational data store
- Multisource operational BI
Integration Factor Analysis
In integration factor analysis, data architects or integration specialists evaluate factors including replication constraints, source system availability and consuming system volume to determine whether virtual, physical or a hybrid combination is best. For example, if significant data cleansing and transformation are required, then physical data consolidation is typically the most practical choice. On the other hand, where individual query volumes are reasonable (in the tens of thousands or less) and source systems are available 90 percent or more of the time, virtual makes sense.
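The two rules of thumb in this paragraph can be sketched as a simple check. The names `ProjectFacts` and `rule_of_thumb` are hypothetical, the numeric cutoff for "tens of thousands" is an assumption, and the "inconclusive" fallback is added here only so the function always returns something; the text does not say what to do when neither rule applies.

```python
from dataclasses import dataclass

MAX_QUERY_VOLUME = 99_999   # assumption: upper end of "tens of thousands"
MIN_AVAILABILITY = 0.90     # from the text: sources available 90 percent or more


@dataclass
class ProjectFacts:
    heavy_cleansing_or_transformation: bool
    query_volume: int            # volume of an individual query (unit left open)
    source_availability: float   # fraction of time the sources are available


def rule_of_thumb(facts: ProjectFacts) -> str:
    """Apply the two example rules; anything else is left inconclusive."""
    if facts.heavy_cleansing_or_transformation:
        return "physical"
    if (facts.query_volume <= MAX_QUERY_VOLUME
            and facts.source_availability >= MIN_AVAILABILITY):
        return "virtual"
    return "inconclusive"


if __name__ == "__main__":
    print(rule_of_thumb(ProjectFacts(False, 20_000, 0.95)))
```

Checking the cleansing rule first is a design choice for this sketch: it treats heavy transformation as decisive even when volumes and availability would otherwise favor the virtual approach.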
By assessing many factors, this bottom-up style of decision-making is often valuable where the DI decision is complex or where a clear integration pattern match is not obvious.
One challenge is the breadth of factors to consider. Independent BI expert Colin White was one of the first to publish research on these factors in his November 2005 TDWI report, "Data Integration: Using ETL, EAI and EII Tools to Create an Integrated Enterprise." This report listed 25 variables or factors that can influence decisions. For effective decision-making, integration factor analysis must rationalize the many factors into a focused, universal short list, logically grouping the factors in easy-to-understand categories such as source factors and consumer factors.
Beyond the numerous factors, the relevance of each factor may vary from project to project. In one case, source system availability might be significant; in another case, it may have little to no relevance. Integration factor analysis must provide a means of weighting various factors in a range from absolutely critical to not applicable, and everything in between.
Further, some factors are more difficult to evaluate than others. For example, while data volumes and formats can be precisely specified, requirements stability and transformation efforts are often not as easily measured. Integration factor analysis must leave room for imprecise inputs.
Finally, some factors may conflict. For example, achieving rapid time to solution may be difficult when large transformations must be developed. Integration factor analysis must expose rather than hide these conflicts, so architects and designers can address them head-on.
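One way to picture these three requirements (weighting, imprecise inputs and visible conflicts) is the sketch below. Everything in it is an assumption made for illustration: the 0 to 4 weight scale, the use of a low/high range from -1 (favors physical) to +1 (favors virtual) to express imprecision, the names `Factor`, `lean` and `find_conflicts`, and the example premise that a rapid time to solution leans virtual.

```python
from dataclasses import dataclass
from itertools import combinations


@dataclass
class Factor:
    name: str
    weight: int   # 0 = not applicable ... 4 = absolutely critical (assumed scale)
    low: float    # lower bound of the estimate, -1 = strongly physical
    high: float   # upper bound of the estimate, +1 = strongly virtual


def lean(factor: Factor) -> str:
    """Classify a factor only when its whole range sits on one side of zero."""
    if factor.low > 0:
        return "virtual"
    if factor.high < 0:
        return "physical"
    return "uncertain"


def find_conflicts(factors, min_weight=3):
    """List pairs of important factors that pull in opposite directions."""
    important = [f for f in factors if f.weight >= min_weight]
    return [(a.name, b.name)
            for a, b in combinations(important, 2)
            if {lean(a), lean(b)} == {"virtual", "physical"}]


if __name__ == "__main__":
    factors = [
        Factor("Time to solution", 4, 0.5, 1.0),
        Factor("Transformation effort", 4, -1.0, -0.2),
        Factor("Requirements stability", 3, -0.5, 0.5),  # too imprecise to call
    ]
    print(find_conflicts(factors))
```

Returning the conflicting pairs, rather than averaging them away, keeps the disagreement in front of the architects and designers who have to resolve it.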
A Practical Tool for Integration Factor Analysis
Architects, designers, integration competency center teams and project teams should assess and assign relative weighting to each of 13 decision factors (see Figure 2).
Figure 2: Relative Weighting for Decision Factors
- Time to solution,
- Cost sensitivity,
- Requirements stability,
- Replication constraints and
- Organizational personality.
Data source considerations:
- Source system availability,
- Source system load,
- Data cleansing needs and
Data consumer considerations:
- Application focus,
- Data format,
- Data freshness and
- Data volume.
Based on these assessments and weights, one of three DI method recommendations can be made:
- Virtual federation (EII) candidate - Virtual data federation and EII middleware is the best method for this specific project.
- Physical consolidation (ETL) candidate - Physical data consolidation and ETL middleware best meets the needs of this specific project.
- Inconclusive or hybrid - This recommendation can mean one of two things: more detailed analysis is needed or, in other cases, a hybrid combination of virtual federation (EII) and physical consolidation (ETL) may be necessary.
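The step from weighted assessments to one of these three recommendations could look like the sketch below. It is not the tool shown in Figure 2: the `recommend` function, the score scale (-1 favors physical, +1 favors virtual), the weights and the 0.25 threshold are all assumptions chosen to make the idea concrete.

```python
def recommend(assessments, threshold=0.25):
    """Map weighted factor scores to one of the three recommendations.

    assessments: {factor name: (weight, score)}, where weight >= 0 and
    score runs from -1 (favors physical) to +1 (favors virtual).
    Both scales and the threshold are illustrative assumptions.
    """
    total_weight = sum(weight for weight, _ in assessments.values())
    if total_weight == 0:
        return "Inconclusive or hybrid"
    score = sum(weight * s for weight, s in assessments.values()) / total_weight
    if score >= threshold:
        return "Virtual federation (EII) candidate"
    if score <= -threshold:
        return "Physical consolidation (ETL) candidate"
    return "Inconclusive or hybrid"


if __name__ == "__main__":
    project = {
        "Time to solution": (4, 1.0),
        "Data freshness": (3, 0.8),
        "Data cleansing needs": (1, -1.0),
    }
    print(recommend(project))
```

A score near zero lands in the middle band, which matches the two readings of the third recommendation: either the inputs need more detailed analysis, or the project has strong pulls in both directions and a hybrid is worth considering.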
Today's enterprises use DI to tap the plethora of data sources and deliver the business-critical information they contain. Fortunately, when building or developing their DI infrastructures, enterprise IT teams have multiple strategic solutions to choose from: physical data consolidation, data virtualization or a combination of the two. And when it comes to choosing among DI options in their tool portfolios, there are several excellent resources to guide them through the decision-making process as they determine the best tool for the job.