Data Virtualization and the Four Missions of a CDO


Today, as organizations place more value in data as a strategic asset, the chief data officer (CDO) is coming into the limelight.

But the role of the CDO is not an easy one. The CDO is first and foremost charged with the business mission to leverage strategic data assets to enable a digital transformation of their business, enhance customer relationships, and perhaps even to use data to change the basis of competition in the industry.

In addition to the strategic remit, the CDO has several functional responsibilities, including governing data assets, enabling business agility through self-service for users to gain more value from the data, and reducing costs by leveraging new technologies and the cloud.

This is a multi-faceted role, which will change depending on the needs of the organization and the chosen strategies of the individual CDO. However, across the board, a CDO’s initiatives can be streamlined - if not expedited - by data virtualization, which enables seamless access to disparate data sources through a logical data layer, as well as detailed control over these sources.

Today’s CDO falls into four broad archetypes, based on their primary mission:

The Big-Data Revolutionary, who transforms the business through data-driven insight from advanced analytics.

The Friend of the Business, who enables business agility and self-service.

The Data Governor, who turns chaos into order.

The Efficient Data Operator, who reduces cost and complexity.

While these archetypes will all overlap to some degree, they are the key roles or missions that the CDO is expected to lead. If you are a CDO, perhaps you recognize one or more of these descriptions in your own priority agenda. Let’s explore how data virtualization can streamline key objectives in each area:

The Big-Data Revolutionary

The vast amounts of internal and external information now available to an organization must be leveraged through advanced analytics for business transformation. Often, even as these strategies are being developed, data is being landed in so-called data lakes.

One of the Big-Data Revolutionary’s most important initiatives will be to implement data lakes, along with developing enhanced analytics that include predictive analytics, in such a way as to deliver tangible business outcomes. However, having a good data lake strategy is not just about acquiring and dumping the data in the data lake.

According to Gartner, “…through 2018, 80 percent of data lakes will not include effective metadata management capabilities, making them inefficient.” Also Gartner predicts, “…through 2018, 70 percent of Hadoop deployments will fail to meet cost savings and revenue generation objectives due to skills and integration challenges.”

Therefore, CDOs should include data virtualization in their data lake strategy. This gives the data lake a strong semantic layer to make sense of the unstructured data, allows combinations with traditional data sources in other data marts and data warehouses, and lets analytics be leveraged not only by data scientists but also by general business users, by masking complexity and providing data access in simpler formats.

This strategy of creating a hybrid data lake or extended data warehouse uses data virtualization as the key layer to bridge between traditional data warehouses and Hadoop and Spark clusters with large volumes of dynamic operational data, and provides unified data for analytics engines as well as operational applications. 

Companies like Autodesk, which have built their data lakes with data virtualization to deliver such benefits, have gained recognition for their IT leadership in this area.

The Friend of the Business

To enable business agility, the CDO has to lead a virtualization-first data strategy that encourages new logical data marts and data warehouse capabilities rather than traditional replicated data stores. Besides accelerated time-to-market and time-to-change, this approach delivers reusable data access for self-service BI and digital applications.

An Intel IT white paper on Big Data and Business Intelligence succinctly summarized: “By deploying data virtualization solutions that combine disparate data sources into a single virtual layer, Intel IT expects to increase the agility of our business intelligence (BI). This agility will enable our business groups to more quickly solve business problems, discover operational efficiencies, and improve business results worldwide.”

Traditional BI relies on a data warehouse and data replication, whereas agile BI, self-service BI, and advanced analytics rely on fast, real-time access to disparate data.

Data virtualization can logically combine data warehouses, data lakes, cloud and enterprise systems, Web data, and any other source. The data virtualization platform maintains information about each source’s schema and creates a unified semantic view that any BI or reporting analytics tool can access, without the user needing to know the data’s location or format.
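To make the idea concrete, here is a minimal, purely illustrative Python sketch of a federated “virtual view.” The source names and records are hypothetical, and a real data virtualization platform would define such views declaratively against live systems; the point is simply that the two sources are joined on the fly, with no data copied or persisted.

```python
# Two hypothetical sources: rows from a relational data warehouse,
# and semi-structured customer records landed in a data lake.
warehouse_orders = [
    {"order_id": 1, "customer_id": "C1", "amount": 120.0},
    {"order_id": 2, "customer_id": "C2", "amount": 80.0},
]

lake_customers = {
    "C1": {"name": "Acme Corp", "segment": "Enterprise"},
    "C2": {"name": "Globex", "segment": "SMB"},
}

def unified_order_view():
    """Join the two sources on the fly; nothing is replicated."""
    for order in warehouse_orders:
        customer = lake_customers.get(order["customer_id"], {})
        yield {
            "order_id": order["order_id"],
            "customer_name": customer.get("name"),
            "segment": customer.get("segment"),
            "amount": order["amount"],
        }

rows = list(unified_order_view())
```

A consuming BI tool would query `unified_order_view` as if it were a single table, unaware that the underlying records live in two different systems.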

In addition to standard SQL access, the data virtualization layer goes a step further by packaging the data from the source systems as RESTful APIs, web services, and XML-over-SOAP formats, and makes them available in real time to consuming applications.
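As a rough sketch of the RESTful side of this, the following standalone Python example serves a unified view as JSON over HTTP. The view contents and the `/api/customers` endpoint path are hypothetical; a real platform would generate such endpoints from the virtual layer’s metadata rather than from hand-written handlers.

```python
import json
from http.server import BaseHTTPRequestHandler, HTTPServer

# Hypothetical unified view; a real platform would materialize this
# on demand from the underlying sources.
CUSTOMER_VIEW = [
    {"customer_id": "C1", "name": "Acme Corp", "open_orders": 3},
    {"customer_id": "C2", "name": "Globex", "open_orders": 1},
]

def view_as_json():
    """Serialize the unified view for API consumers."""
    return json.dumps(CUSTOMER_VIEW).encode("utf-8")

class CustomerViewHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path == "/api/customers":  # hypothetical endpoint path
            body = view_as_json()
            self.send_response(200)
            self.send_header("Content-Type", "application/json")
            self.send_header("Content-Length", str(len(body)))
            self.end_headers()
            self.wfile.write(body)
        else:
            self.send_error(404)

# To serve the view at http://localhost:8000/api/customers, run:
# HTTPServer(("localhost", 8000), CustomerViewHandler).serve_forever()
```

The consuming application sees only a stable JSON contract, regardless of how the sources behind the view change.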

The Data Governor

For data to be a strategic asset, it must be well understood, easily accessible to those who need it, and secured to keep it away from others. In addition to managing the catalog of data and metadata, relationships between entities, data lineage, change impact, stewardship, standard processes, and other aspects of governance, the Data Governor will also be concerned with providing security.

A data virtualization layer can connect master data with other sources, such as transactional data, unstructured data, and external sources, to establish an enriched single source of truth around key business entities that serves a variety of users. Sitting between disparate sources and multiple applications and users, data virtualization provides a single, authoritative, logical view across any and all of them, and enables secure role- and policy-based access to just the relevant data sets.
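The row-filtering and column-masking idea can be sketched in a few lines of Python. The roles, policies, and records below are entirely hypothetical; in a real virtual layer these rules would be declared centrally and enforced before any data leaves the platform.

```python
# Illustrative full view; in practice this would be a virtual view
# federated from master data and transactional sources.
FULL_VIEW = [
    {"name": "Alice", "region": "EMEA", "ssn": "123-45-6789"},
    {"name": "Bob", "region": "APAC", "ssn": "987-65-4321"},
]

# Hypothetical policies: role -> (allowed regions, columns to mask).
POLICIES = {
    "analyst": ({"EMEA", "APAC"}, {"ssn"}),
    "steward": ({"EMEA", "APAC"}, set()),
    "emea_rep": ({"EMEA"}, {"ssn"}),
}

def secured_view(role):
    """Apply row filtering and column masking for the given role."""
    regions, masked = POLICIES[role]
    result = []
    for row in FULL_VIEW:
        if row["region"] not in regions:
            continue  # row-level policy: drop rows outside allowed regions
        # Column-level policy: mask sensitive fields for this role.
        result.append({k: ("***" if k in masked else v) for k, v in row.items()})
    return result
```

Because every consumer goes through the same layer, changing a policy in one place changes what every application and user can see.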

Real-time integration also reduces replication and unwanted copies of the data, which in turn improves governance and security. Even the most demanding applications, like those safeguarding our nation’s nuclear stockpile, can rely on data virtualization to protect and share data.

The Efficient Data Operator

The Efficient Data Operator will cut costs and maintain efficiencies through several highly focused strategies, including data consolidation, reduction of replication, offloading of big data to open-source systems, and systems modernization and migration.

These projects are not done purely for cost reasons. As big data lakes have proliferated, the efficient use of new technologies like Hadoop, columnar databases, and graph databases, combined with traditional high-performance data warehouses and appliances, requires a balancing act to maximize useful analytics while reducing replication. This, in turn, also allows more information to be acquired and processed from more external and internal sources.

Data virtualization enables such projects to take place without ripping and replacing parts of the infrastructure and without downtime. In fact, data virtualization can facilitate large-scale logical data warehouses - combining data during mergers, migrating applications to the cloud, or offloading data from expensive mainframes or MPP data stores to Hadoop.

Thus, the logical data layer that sits between users and the data sources streamlines the data and IT infrastructure without business users even noticing as underlying systems are gradually phased out or phased in. With data virtualization, an organization like Vizient was able not only to save money but also to transition from a traditional data warehouse to a modern data lake or hybrid data warehouse with zero downtime.

A data virtualization layer can be established as an enterprise capability wherever users need to seamlessly access data across a variety of heterogeneous sources. If you want maximum impact, it should be a critical component of your data-driven business projects, including agile BI, self-service data, data lakes, big data analytics, logical data warehouse, and data services for single-view applications that drive the digital transformation of your business. No matter which CDO archetype you most closely resemble, data virtualization can be among your most versatile and powerful tools.

(About the Author: Suresh Chandrasekaran, senior vice president at Denodo Technologies, is responsible for global strategy and growth initiatives in addition to operational leadership in other areas. Before Denodo, he served in executive roles as general manager and VP of product management and marketing at leading Web and enterprise software companies and as a management consultant at Booz Allen & Hamilton.) 
