Virtualization is a venerable computing concept that has found new life in recent years.
Virtualization opens up a new world of more flexible service provisioning while cleverly emulating the old world that it replaces. The term refers to any approach that abstracts the external interface from the internal implementation of some service, functionality, or other resource.
The promise of virtualization is that, no matter how scattered and diverse, all pooled resources behave as if they were a single unified resource, both for usage and administration. In a sense, this is the practical magic that Arthur C. Clarke identified with advanced technology. The external interface may conceal various facts about the implementations of the underlying resources. The virtualized resources may:
- Run on diverse operating and application platforms;
- Have been deployed on nodes in diverse locations;
- Have been aggregated across diverse hosting platforms (or partitioned within a single hosting platform, whether through virtual machine software, separate CPUs, or separate blade servers); and
- Have been provisioned dynamically in response to a client request.
When Noel Yuhanna and I presented on enterprise database virtualization last week at the Forrester IT Forum, we took pains to point out that this is not a radically new paradigm. In fact, database administrators (DBAs) have been doing virtualization for a long time without realizing it. We're all familiar with such database virtualization approaches as policy-based server clustering, massively parallel processing database grids, and enterprise information integration. In these environments, you can identify the virtualization layer as single system image, semantic abstraction, or some other approach.
What all these approaches share is that they make two or more repositories behave as if they were a single database for unified access, query, reporting, predictive analytics, and other applications. If you wish, I could drill down further into the layers of database virtualization (data virtualization, transaction virtualization, and platform virtualization), but that would be too much for a mere blog post.
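To make "two or more repositories behave as if they were a single database" concrete, here is a minimal sketch in Python. Everything in it is an assumption for illustration: the `VirtualDatabase` class, the `make_repo` helper, and the `sales` table are hypothetical names, and two in-memory SQLite databases stand in for the separate repositories. It is not how any particular federation product works.

```python
import sqlite3


class VirtualDatabase:
    """Hypothetical facade that presents several repositories as one query target."""

    def __init__(self, repositories):
        self.repositories = list(repositories)

    def query(self, sql, params=()):
        # Fan the same query out to every repository and merge the rows.
        rows = []
        for conn in self.repositories:
            rows.extend(conn.execute(sql, params).fetchall())
        return rows


def make_repo(rows):
    """Build one stand-in repository (an in-memory SQLite database)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (region TEXT, amount INTEGER)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    return conn


east = make_repo([("east", 100), ("east", 50)])
west = make_repo([("west", 70)])

# The caller sees one database; where each row lives is hidden.
vdb = VirtualDatabase([east, west])
result = sorted(vdb.query("SELECT region, amount FROM sales WHERE amount >= ?", (60,)))
print(result)
```

This naive fan-out only works for simple row filters. Aggregates, joins across repositories, and ordering would need a real federated query planner, which is exactly the hard part that the virtualization layer takes on.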
One twist that I didn't have time to explore in depth last week is the notion that the traditional hub-and-spoke enterprise data warehousing (EDW) architecture is itself a form of database virtualization. The hub-and-spoke model transforms analytic data to a common spoke-side semantic access model, such as star schema or columnar. As such, this approach abstracts from the data models (usually 3NF relational) implemented at the EDW hub tier, the staging tier (perhaps file-based), and OLTP sources (perhaps hierarchical, XML, or what have you).
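As a rough illustration of that abstraction, the sketch below keeps normalized (3NF-style) tables at a stand-in "hub" and exposes a star-like access model on top of them. The table and view names (`customer`, `region`, `orders`, `dim_customer`, `fact_sales`) are invented for the example, and SQL views in a single in-memory SQLite database stand in for physically separate spoke-side marts, which is a simplifying assumption.

```python
import sqlite3

hub = sqlite3.connect(":memory:")
hub.executescript("""
    -- Hub tier: normalized tables (hypothetical schema).
    CREATE TABLE region (region_id INTEGER PRIMARY KEY, region_name TEXT);
    CREATE TABLE customer (customer_id INTEGER PRIMARY KEY, name TEXT, region_id INTEGER);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount INTEGER);

    INSERT INTO region VALUES (1, 'East'), (2, 'West');
    INSERT INTO customer VALUES (10, 'Acme', 1), (11, 'Globex', 2);
    INSERT INTO orders VALUES (100, 10, 250), (101, 11, 400), (102, 10, 50);

    -- Spoke-side access model: a flattened dimension and a fact view.
    CREATE VIEW dim_customer AS
        SELECT c.customer_id, c.name, r.region_name
        FROM customer c JOIN region r ON r.region_id = c.region_id;
    CREATE VIEW fact_sales AS
        SELECT order_id, customer_id, amount FROM orders;
""")

# Analysts query the star-like model and never touch the hub's 3NF tables.
totals = hub.execute("""
    SELECT d.region_name, SUM(f.amount)
    FROM fact_sales f JOIN dim_customer d ON d.customer_id = f.customer_id
    GROUP BY d.region_name
    ORDER BY d.region_name
""").fetchall()
print(totals)
```

The point of the sketch is only that the consuming query is written against the access model, so the hub's physical data model could change without the analyst noticing.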
When you realize that each data-persistence approach has its optimal deployment sphere, you're thinking database virtualization. At that point, you start to realize that the various database religions ("relational is supreme," "columnar is king," and so forth) are not absolute truths. They're simply sectarian texts in a tradition of longer vintage: the evolution of truly all-encompassing data virtualization clouds.
Yes, I'm using "cloud" in this context because it best describes this new paradigm. Cloud-based virtualization is beginning to seep into analytic infrastructures. To support flexible mixed-workload analytics, the EDW, over the coming five to 10 years, will evolve into a virtualized, cloud-based, and supremely scalable distributed platform.
What are the outlines of this new paradigm? The virtualized EDW will allow data to be transparently persisted in diverse physical and logical formats to an abstract, seamless grid of interconnected memory and disk resources, and to be delivered with sub-second delay to consuming applications. EDW application service levels will be ensured through an end-to-end, policy-driven, latency-agile memory grid that provides distributed caching and dynamic query optimization within an information-as-a-service (IaaS) environment. Analytic applications will migrate to the EDW platform and leverage its full parallel-processing, partitioning, scalability, and optimization functionality. At the same time, DBAs will need to make sure that cloud-based DW offerings meet their organizations' most stringent security, performance, availability, and other service-level requirements.
I won't opine here and now on how much enterprise data will be persisted in public clouds vs. private environments that incorporate many of the same platform virtualization technologies. I'll save that discussion for the upcoming Forrester reports that Noel and I are developing on virtualization of transactional and analytic databases, respectively.
Expect those in Q3 or thereabouts. Thanks to everybody who attended our preso last week in Vegas!











Noel and you are on to something significant here.
I really like how you have identified both database access and database management as key capabilities within database virtualization.
In your presentation at the Forrester IT Forum, I noted that a common approach to query was a standard element.
And in your multiple virtualizations stack (storage, servers, databases, information, etc.), I like the database virtualization positioning relative to the information virtualization layer above it.
It seems to me that the information virtualization layer can call up the common query capabilities of the database virtualization layer.
And we all know who is the best-of-breed supplier of the virtualized, federated query services that support both the database and information virtualization layers... Composite Software!!!
I look forward to your future research on this topic.
- Bob Eve, EVP Composite Software
Plus, assuming that data is stored redundantly in these multiple repositories and that we want the cloud to support "operational analytics" in near real time, we now have to figure out not only which of the distributed platforms can serve the request most quickly and efficiently, but also whether it has an up-to-date copy of the data requested. And if it doesn't, can we serve the request most quickly and efficiently by queuing the request until the data is available on that platform, or by redirecting it to another platform?
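The queue-or-redirect question can be framed as a small cost comparison. The sketch below is purely illustrative and rests on assumptions: the `Platform` fields, the version-number notion of freshness, and the time estimates are all hypothetical, and real federation software would have to derive such estimates somehow.

```python
from dataclasses import dataclass


@dataclass
class Platform:
    name: str
    est_query_seconds: float  # estimated time to run the query here
    data_version: int         # newest data version this platform's copy holds
    sync_eta_seconds: float   # estimated wait until the copy catches up


def expected_completion(platform, required_version):
    """Total expected time: wait for fresh data if needed, then run the query."""
    is_current = platform.data_version >= required_version
    wait = 0.0 if is_current else platform.sync_eta_seconds
    return wait + platform.est_query_seconds


def route(platforms, required_version):
    """Pick the platform with the lowest expected completion time."""
    best = min(platforms, key=lambda p: expected_completion(p, required_version))
    action = "run now" if best.data_version >= required_version else "queue until synced"
    return best.name, action


# A fast platform with a stale copy vs. a slower platform that is current.
stale_fast = Platform("fast_grid", est_query_seconds=1.0, data_version=41, sync_eta_seconds=5.0)
fresh_slow = Platform("slow_edw", est_query_seconds=4.0, data_version=42, sync_eta_seconds=0.0)

decision_redirect = route([stale_fast, fresh_slow], required_version=42)

# If the stale copy will catch up soon, queuing on the fast platform wins.
stale_fast.sync_eta_seconds = 2.0
decision_queue = route([stale_fast, fresh_slow], required_version=42)

print(decision_redirect, decision_queue)
```

Even this toy version needs trustworthy estimates of query time and replication lag for every platform, which is a large part of the difficulty described here.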
So I stand by "complex". And, at least with current distributed database / query federation technology and business processes, I think "fantastic" is also a fair assessment!