Data Virtualization: The Unstoppable Force
Information Management Magazine, January 2008
The virtualization revolution is upon us: first storage, then servers and applications, now data itself. Data virtualization, also referred to as a virtualized data layer, an information grid or information fabric, brings together data from multiple, disparate sources - anywhere across the extended enterprise - into a unified, logical virtualized data layer for consumption by nearly any front-end business solution, including portals, reports, applications and more.
Data virtualization is increasingly being recognized as the better way to integrate data when the consuming solutions need real-time data from multiple silos and complex sources.
Enterprises are facing growing challenges in using disparate sources of data managed by different applications, including problems with integration, security, performance, availability, and quality. Business users need fast, real-time and reliable information to make business decisions, while IT wants to lower costs, minimize complexity and improve operational efficiency. New technology is emerging that Forrester has dubbed information fabric, defined as a virtualized data layer that integrates heterogeneous data and content repositories in real time [1].
The Case for a New Approach to Data Integration
To keep pace with constant changes in the business, IT has been aggressively delivering new solutions that integrate existing data from a complex, ever-changing infrastructure. Enterprises that limit themselves to traditional data integration methods are less competitive than those that also adopt newer approaches.
When accessing a few data sources with well-understood syntax and common structures, integrating data using custom code is effective. But the limitations of hand coding materialize quickly as data silos proliferate, new structures appear (XML, or the complex syntax of enterprise applications such as SAP), and the data needs of consuming applications become more diverse.
Replication-based data integration methods, including file extracts, database replication, data marts and data warehouses, have emerged as an alternative to hand coding. However, replication-based approaches have their own limitations:
- Batch refreshes slow down real-time information delivery.
- Building and testing extracts and marts add development time to every project, delaying business decision-making.
- Controlling replicated data and maintaining additional physical data stores are resource intensive, thereby exacerbating the data proliferation problem and adding business costs.
- Only a subset of use cases requires multi-dimensionality and other complex transformation capabilities.
- Typical replicated architectures don't align easily with modern, real-time service-oriented architectures (SOAs).
Technology advances have opened the door for new data integration methods. Advanced query optimization techniques, combined with low-cost, high-performance server and network architectures, mitigate many of the performance issues that originally motivated replication. Furthermore, server and storage virtualization advancements have demonstrated dramatic cost savings while hiding the ever-increasing complexity of the IT factory.
What is Data Virtualization?
Data virtualization is a new approach to data integration based on virtualized or logical, rather than physical, integration. It leverages recent technology advances and overcomes many of the issues associated with hand coding and replication-based approaches. Enterprises are using data virtualization to gain dramatic time and cost savings for development projects where any or all of the following characteristics are important:
- Time to solution and frequent change place a premium on agility.
- The consuming business solution requires real-time insight from fast-changing sources.
- Data volumes, transformation and cleansing workloads are supportable at run time.
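To make the contrast between logical and physical integration concrete, here is a minimal sketch in Python. It is an illustration only, not how any particular data virtualization product works: the two in-memory SQLite databases, the table names, and the customer_totals function are all hypothetical stand-ins for separate source systems and a virtual view. The point it shows is that the combined result is computed from the sources at query time, so no replicated store has to be built or refreshed.

```python
import sqlite3

# Two independent "silos" (hypothetical): a CRM source and a billing source.
crm = sqlite3.connect(":memory:")
crm.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT)")
crm.executemany("INSERT INTO customers VALUES (?, ?)",
                [(1, "Acme"), (2, "Globex")])

billing = sqlite3.connect(":memory:")
billing.execute("CREATE TABLE invoices (customer_id INTEGER, amount REAL)")
billing.executemany("INSERT INTO invoices VALUES (?, ?)",
                    [(1, 100.0), (1, 50.0), (2, 75.0)])


def customer_totals(crm_conn, billing_conn):
    """Hypothetical logical view: combine both sources at query time.

    Nothing is copied into a separate physical store; each call reads
    the current state of the underlying sources.
    """
    # Push the aggregation down to the billing source.
    totals = dict(billing_conn.execute(
        "SELECT customer_id, SUM(amount) FROM invoices GROUP BY customer_id"))
    # Read customer names from the CRM source and join in memory.
    rows = crm_conn.execute("SELECT id, name FROM customers ORDER BY id")
    return [(name, totals.get(cid, 0.0)) for cid, name in rows]


if __name__ == "__main__":
    print(customer_totals(crm, billing))
    # A change in a source is visible on the next query, with no batch refresh.
    billing.execute("INSERT INTO invoices VALUES (2, 25.0)")
    print(customer_totals(crm, billing))
```

A real virtualization layer would add query optimization, security and caching on top of this idea; the sketch only shows the run-time, no-copy character of the integration.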






