Data Virtualization: The 13th Commandment
It has been 30 years since Edgar “Ted” Codd published his “12 rules defining a relational database.” Those 12 Commandments, as they became known, actually comprise 13 rules (0 through 12) and describe, among other things, the need for logical-physical independence in data management systems. But where do modern information architectures -- particularly data virtualization -- fit into the conversation? Here's the answer.
These days, it is virtually impossible not to be familiar with the idea of abstracting lower-level resources and exposing them through software as higher-level logical resources. We see that approach successfully put into practice every day, not just in DBMSs but in many other areas -- from Java virtual machines to virtual LANs, and from network function virtualization (NFV) to hardware and storage virtualization, to name a few.
But how could Codd’s proposition of logical-physical separation in DBMS be transposed to 2015’s information architectures? The rule was originally formulated as follows: “application programs should remain logically unimpaired whenever any changes were made in either storage representation or access method.” At that time, data was stored centrally on disk drives, and Codd was mainly concerned with the DBMS isolating applications from differences in the underlying storage structures, such as ordered flat files or B-trees, and from physical changes such as table files being moved from one disk drive to another or rows being reordered in a file.
Fast forward three decades and we find that enterprise data is a far more polymorphic and dispersed asset. It is orders of magnitude larger and, of course, no longer confined to a handful of centralized relational databases. Today’s enterprise data is the sum of a complete ecosystem of specialized data systems scattered across the organization, living either on premises or in the cloud, accessible in different ways, and each implementing its own optimized data representation paradigm. Such a level of heterogeneity demands a clear separation between how the data is physically held and how applications logically consume it. Even though I have shifted my argument from a single type of storage to the broader level of data architecture, the analogy is still pertinent, and it’s easy to draw a parallel with Codd’s concerns regarding abstraction. Now, the kind of “physical” or “lower-level” changes that should not impair consumer applications or processes may be:
- The replacement of an RDB-backed ERP system with a cloud-based one that might only be accessible through a JSON API.
- The offload of “cold data” from an Enterprise Data Warehouse to a low-cost storage system like Hadoop.
- The addition to the data management environment of a live clickstream data feed to enable analysis of customer behavior.
- The introduction of a graph database that models the interactions of your customers on social networks.
This is only the tip of the iceberg, as any data architect or CDO can easily add to this list based on their own experience. The need for a certain degree of abstraction is more acute in today’s complex data ecosystem than it was in the good old relational-only world of 1985.
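The first change in the list above (a relational system replaced by one reachable only through a JSON API) can be illustrated with a minimal Python sketch. Everything in it is an assumption made for illustration: the `LogicalView` class, the `customers` view, the field names, and the sample records are hypothetical and do not represent any particular product. The point is only that the consumer function stays untouched while the physical source behind the logical view is swapped.

```python
import json
import sqlite3
from typing import Callable, Dict, Iterable, List

Row = Dict[str, object]


class LogicalView:
    """Hypothetical logical view: consumers see rows, never the physical source."""

    def __init__(self, name: str, source: Callable[[], Iterable[Row]]) -> None:
        self.name = name
        self._source = source

    def rebind(self, source: Callable[[], Iterable[Row]]) -> None:
        """Swap the physical source without changing the consumer-facing contract."""
        self._source = source

    def rows(self) -> List[Row]:
        return [dict(row) for row in self._source()]


def relational_source() -> Iterable[Row]:
    """Stand-in for the original RDB-backed system (in-memory SQLite, sample data)."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE customers (id INTEGER, name TEXT)")
    conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ada"), (2, "Grace")])
    cursor = conn.execute("SELECT id, name FROM customers ORDER BY id")
    rows = [{"id": r[0], "name": r[1]} for r in cursor]
    conn.close()
    return rows


def json_api_source() -> Iterable[Row]:
    """Stand-in for a cloud system exposing JSON; the payload shape is assumed."""
    payload = (
        '{"data": [{"customerId": 1, "fullName": "Ada"},'
        ' {"customerId": 2, "fullName": "Grace"}]}'
    )
    # Map the provider's field names onto the logical schema (id, name).
    return [{"id": d["customerId"], "name": d["fullName"]} for d in json.loads(payload)["data"]]


def customer_names(view: LogicalView) -> List[str]:
    """Consumer code: depends only on the logical schema, not on the source."""
    return sorted(str(row["name"]) for row in view.rows())


customers = LogicalView("customers", relational_source)
before = customer_names(customers)

customers.rebind(json_api_source)  # the "physical" change
after = customer_names(customers)  # the consumer is logically unimpaired
```

In this sketch the mapping from source-specific fields to the logical schema lives entirely inside the source adapters, which is one simple way to keep consumers isolated from the change.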
Organizations can provision this logical-physical independence along at least three alternative paths:
- Ad-hoc and Point-to-Point Integration: This hardcodes the interface between producers and consumers, inevitably introducing inconsistencies and huge maintenance costs and leading to “spaghetti style” non-architectures.
- Traditional Data and Application Integration: Solutions like Extract-Transform-Load (ETL) or Enterprise Service Buses (ESBs), which lack the agility and flexibility that the highly dynamic requirements of both the business and the technology demand.
- Modern Lightweight Data Integration: Middleware like Data Virtualization that not only enables the required abstraction but also introduces agility and maximizes the usage of all enterprise data. Data Virtualization brings flexibility and efficient usage of the virtualized resources by leveraging existing IT infrastructures and, in some cases, even repurposing and enhancing them in ways that were not possible before. The result is modernization of legacy apps rather than rip and replace.
So, what does this mean in a real-world scenario? Take, for instance, the network management division of a mobile operator that is looking to roll out 4G on its mobile networks and upgrade its transmission networks to all-IP. By leveraging Data Virtualization, the organization can access and combine:
- Live and historic hourly subscriber usage statistics from their base stations.
- Live performance and status from the routers, switches, leased lines monitoring equipment, etc.
- Live unplanned downtimes in the network due to equipment failures through the event and alarm management system.
- Planning information for the ongoing maintenance regularly carried out on their network, which could potentially clash with their rollout/upgrade and generate unexpected side effects.
- Public meteorological information.
The operator can come up with the most efficient plan for the installation of the 4G and IP-enabled equipment and change -- or even abort -- it in real time if unexpected failures in the network or severe meteorological conditions require. More importantly, they can also minimize disruption to their services and the impact on customer experience.
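To make the scenario more concrete, the following Python sketch combines three of the sources listed above (alarms, maintenance planning, and weather) into one logical view that tells the planner which sites can proceed. It is an illustration under stated assumptions only: the site identifiers, the source functions, and the `SiteStatus` type are all hypothetical, and real feeds would be live systems rather than hardcoded sets.

```python
from dataclasses import dataclass
from typing import Dict, List, Set, Tuple


# Hypothetical stand-ins for live sources; each returns the affected site IDs.
def active_alarms() -> Set[str]:
    """Sites with unplanned downtime reported by the alarm management system."""
    return {"site-B"}


def planned_maintenance() -> Set[str]:
    """Sites with maintenance scheduled during the rollout window."""
    return {"site-C"}


def severe_weather() -> Set[str]:
    """Sites under a severe-weather warning from a public feed."""
    return {"site-C", "site-D"}


@dataclass(frozen=True)
class SiteStatus:
    site: str
    blockers: Tuple[str, ...]  # names of the sources that flag this site


def rollout_view(sites: List[str]) -> List[SiteStatus]:
    """Combine the sources at query time into one row per site."""
    sources: Dict[str, Set[str]] = {
        "alarm": active_alarms(),
        "maintenance": planned_maintenance(),
        "weather": severe_weather(),
    }
    view = []
    for site in sites:
        blockers = tuple(name for name, affected in sources.items() if site in affected)
        view.append(SiteStatus(site, blockers))
    return view


def ready_sites(sites: List[str]) -> List[str]:
    """Sites where installation can proceed because no source flags them."""
    return [status.site for status in rollout_view(sites) if not status.blockers]


all_sites = ["site-A", "site-B", "site-C", "site-D"]
statuses = rollout_view(all_sites)
ready = ready_sites(all_sites)
```

Because the view is computed from the sources each time it is queried, a new alarm or weather warning changes the plan on the next call without any data being copied into a separate store first.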
The mere possibility of data-driven planning and operation in such a complex and dynamic scenario is a great example of how Data Virtualization can provide the required abstraction while leveraging the horsepower in existing specialized data management systems and applications. Ultimately, it delivers standardized data services for both analytical and operational purposes.
Similarly, the fact that the same Data Virtualization platform also serves data for reporting on the progress of the whole rollout/upgrade process is a fine example of how Data Virtualization introduces a physical/logical separation and decoupling that enables both real-time and batch consumption of the same underlying data assets in a coherent and efficient manner.
Data Virtualization Goes Mainstream
The need for decoupling and physical/logical separation is driving Data Virtualization into the mainstream. Today’s organizations need to leverage live data for analysis and reporting, and even to inject real-time analytics into operational applications. Only those that succeed at capitalizing on all their data assets in this way will be able to react quickly and insightfully to change, and this will give them a competitive advantage. Data Virtualization is becoming a key element in modern data architectures at organizations of any size and in any industry, maximizing the use of all their data assets.
If Ted Codd were to add an extra rule to his 12 Commandments for 21st century information architectures, it would be that Data Virtualization is the natural enabler of logical-physical data independence.