Today's enterprises rely on the information in their data warehouses more than ever to make informed, business-critical decisions and to comply with a growing number of regulations and compliance mandates. Data warehouses integrate and transform the complex, disparate and globally distributed data from back-office transaction systems and other sources into the rock-solid information stores that support a range of financial, customer and supply chain performance management analytics. This reporting is critical to helping enterprises increase revenues, decrease costs and reduce risk.
But are these enterprises really maximizing the return on their data warehousing investments? Might complementary technologies provide additional performance management insights and therefore valuable returns? In particular, how are enterprises using recent advances in data virtualization to achieve even greater revenues, larger cost reductions and better risk reduction?
Data Warehouses - Cornerstone For Business Intelligence
The data warehouse is the cornerstone of any business intelligence (BI) strategy and architecture. The warehouse is the repository of nonvolatile, historical enterprise data snapshots used across a range of analytical and reporting purposes. Securely storing the data needed for analytical purposes, the data warehouse provides stability, reliability, quality and consolidation. After so many years building and enhancing their data warehouses, few enterprises question their business value and return on investment.
However, data warehouse practitioners also understand that data warehouses have drawbacks. These include:
- Time-to-solution - Integrating disparate data, even for a single project, can be a difficult and time-consuming task. Typically, the data warehouse is built project-by-project, with each project contributing another useful set of data to the overall environment. It takes time, but eventually a deep wealth of analytical data is created.
- Generic data schema - Because the data warehouse serves many analytic purposes and departments, its design cannot be optimized for any one form of analysis over another (e.g., star schema versus cube versus flat files).
- Data latency - To achieve quality and consolidation goals, as mentioned, data loaded in the warehouse often undergoes significant physical transformation, cleansing and integration processing, which delays its availability. As a result, data in the data warehouse has built-in time latency (a few minutes to several hours to even a day).
- Deep resources/total cost of ownership - It takes a village to design, build and maintain a data warehouse. Vendors and IT departments have adopted a range of strategies to mitigate these costs, from data warehouse appliances on the tools side to integration competency centers on the people and process side. But significant inefficiencies and limitations remain.
Data Marts and Operational Data Stores Extend Value
Data marts and operational data stores have evolved as complementary repositories designed to address the schema and latency drawbacks identified above. Data marts can provide optimized schemas in support of specific analyses, for example, a cube for market segmentation or a relational database for departmental reporting. Operational data stores can combine historical warehouse data with up-to-the-minute operations data to overcome the latency challenge when performing operational BI activities such as supply chain planning or equipment dispatching. However, neither addresses the time-to-solution and total cost of ownership drawbacks, which remain lost value opportunities.
Physical data warehouses, data marts and operational data stores all rely on a common data integration middleware toolset: extract, transform and load (ETL). This combination can be seen in the physical data integration landscape in Figure 1.
Virtual Data Marts, Operational Stores and Layers Address Time-to-Solution and Cost Drawbacks
With the advent of data virtualization technology, enterprises can now implement virtual versions of data marts and operational data stores, as well as virtual data layers, to complement their existing physical data warehouses, marts and stores, as seen in Figure 2.
Data virtualization, which leverages virtual data federation or enterprise information integration (EII) middleware, provides three critical capabilities:
- Data virtualization serves up data as if it is available from one virtual data store, regardless of how it is physically distributed across source data silos. Query optimization and caching enable the high performance required to meet latency objectives without physical replication.
- Data abstraction simplifies complex data by transforming it from its native structure and syntax into reusable views and Web services that are easy for applications developers to understand and the applications themselves to consume. Common higher-level abstractions might include customers, invoices, shipments, payments and more that can be shared across numerous applications.
- Data federation securely accesses diverse operational and historical data, combining it into more complete and meaningful information for a range of application uses.
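As a rough illustration of how federation and abstraction fit together, the sketch below queries two separate sources and returns one simplified customer record per customer. It is a minimal sketch, not any vendor's implementation: the two in-memory SQLite databases stand in for a warehouse and an operational system, and every table, column and function name (`cust_dim`, `ord_hdr`, `customer_view` and so on) is hypothetical.

```python
import sqlite3


def make_warehouse():
    """Hypothetical historical source with cryptic native column names."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE cust_dim (cust_key INTEGER, cust_nm TEXT, ltv_amt REAL)")
    db.executemany(
        "INSERT INTO cust_dim VALUES (?, ?, ?)",
        [(1, "Acme", 1200.0), (2, "Globex", 560.5)],
    )
    return db


def make_operations():
    """Hypothetical operational source holding current order status."""
    db = sqlite3.connect(":memory:")
    db.execute("CREATE TABLE ord_hdr (cust_id INTEGER, ord_status TEXT)")
    db.executemany(
        "INSERT INTO ord_hdr VALUES (?, ?)",
        [(1, "OPEN"), (1, "OPEN"), (2, "SHIPPED")],
    )
    return db


def customer_view(warehouse, operations):
    """Federate both sources at query time; nothing is copied or stored."""
    # Current open-order counts come from the operational source.
    open_orders = dict(
        operations.execute(
            "SELECT cust_id, COUNT(*) FROM ord_hdr "
            "WHERE ord_status = 'OPEN' GROUP BY cust_id"
        )
    )
    # Historical attributes come from the warehouse.
    rows = warehouse.execute(
        "SELECT cust_key, cust_nm, ltv_amt FROM cust_dim ORDER BY cust_key"
    )
    # Abstraction: expose friendly field names instead of native ones.
    return [
        {
            "customer_id": key,
            "name": name,
            "lifetime_revenue": revenue,
            "open_orders": open_orders.get(key, 0),
        }
        for key, name, revenue in rows
    ]


if __name__ == "__main__":
    for record in customer_view(make_warehouse(), make_operations()):
        print(record)
```

A consuming application sees only the `customer_view` records and never needs to know which source supplied which field.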
At build time, data virtualization middleware provides an easy-to-use data modeler and code generator that leverage enterprise metadata to create abstracted relational views or Web data services of the source data. At run time, the data virtualization middleware acts as a scalable information server called by the consuming applications to execute high-performance queries that securely access, federate, transform and deliver the required data in real time.
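The run-time caching that helps such a server meet performance goals can be pictured with a simple time-to-live cache placed in front of a source query. This is only a sketch under stated assumptions: the `CachedView` class, its `fetch` callback and the `ttl_seconds` parameter are hypothetical names chosen for illustration, and commercial middleware applies far more elaborate caching and query optimization.

```python
import time


class CachedView:
    """Serve a view from memory until it is older than ttl_seconds."""

    def __init__(self, fetch, ttl_seconds, clock=time.monotonic):
        self._fetch = fetch          # callable that queries the real sources
        self._ttl = ttl_seconds      # how long a cached result stays valid
        self._clock = clock          # injectable clock, useful for testing
        self._value = None
        self._loaded_at = None

    def read(self):
        now = self._clock()
        expired = self._loaded_at is None or now - self._loaded_at >= self._ttl
        if expired:
            # Go back to the sources only when the cached copy is stale.
            self._value = self._fetch()
            self._loaded_at = now
        return self._value
```

A short time-to-live keeps data fresh for operational uses, while a longer one reduces load on the source systems; the right setting depends on each project's latency objective.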
By avoiding physical data replication with its associated storage and support requirements, data virtualization reduces both time to solution and ongoing operating costs, often as much as fourfold. Further, as requirements change or expand, modifying the models and regenerating the views from the virtual mart or store can be done in minutes, without requiring IT resources to rebuild physical marts or stores, which may require several days to schedule and execute.
Flexibility to Do the Right Thing
While the benefits of virtual marts, stores and data layers are significant, they should not be seen as one-for-one replacements for every new physical mart and operational data store project that comes along. Instead, enterprises should consider the virtual versions as additional options that provide greater flexibility in meeting a new project's specific business, data source and consuming application requirements. A number of decision trees, factor analysis tools and usage pattern guides are available from vendors and analysts to help enterprises with this decision. For example, Gartner, Inc. identified eight frequent usage patterns that enterprises can use to help with the virtual versus physical decision.[1] And an end-user customer presented its decision tree in a data and application integration track session at the March 2008 DAMA Conference.[2]
Maximizing Portfolio Returns from a Financial Research Data Warehouse
Enterprises across multiple industries have gained higher returns on their data warehouse investments by applying data virtualization to a wide range of business problems. One recent example occurred in the financial services industry where investment managers responsible for large-equity portfolios leveraged a virtual data layer to help improve their investment decision-making. At this firm, managers team with financial analysts to build portfolio analysis models with MATLAB and other analysis tools that leverage a wide range of equity financial data from their financial research data warehouse. Due to the complexity of the data in the warehouse, analysts would spend hours trying to access what they needed, often spawning new satellite data marts with useful data subsets for each new analysis activity. To accelerate and simplify data access and to stop the proliferation of costly, unnecessary marts, the firm encapsulated their financial research data warehouse with a virtual data layer that abstracts the data into a set of high-performance views shared by over 150 analysts and their financial models. This enabled analysts to spend more time on analysis and less on access, thereby improving portfolio returns. And it helped IT operations eliminate extra, unneeded marts and all the costs that go with them.
Accelerating the Drug Discovery Pipeline with Virtual Operational Data Stores
A drug discovery portfolio management portal illustrates the value of virtual operational data stores. This use case is one of more than 20 data virtualization projects now in production at this leading pharmaceutical company. Senior management, project team leaders, business analysts and research scientists continuously review and evaluate their portfolio of in-process drug discovery and development projects with the goal of accelerating time to market for new pharmaceuticals. This collaboration requires a wide range of project status information (costs, resources and timelines), as well as lab and clinical trial information from a variety of transaction and warehouse sources. The breadth of data required and the ever-changing business needs of these diverse users were the key design considerations favoring a virtual operational data store. Using the virtualization approach, IT reacts much faster to new requirements, with developers adding, testing and delivering data from new sources in minutes without involving the IT operations groups for physical rebuilds and similar efforts. These time-to-solution savings provide immense value in an industry where being first to market has huge revenue advantages.
Extending the Data Warehouse to Keep the Oil Pumping
California's largest oil producer has built an impressive enterprise data warehouse that runs over 1,800 ETL jobs a day to consolidate subsurface, surface and business data. Daily data meets most needs, but there are certain applications where up-to-the-minute data is a must. One example is repair rig dispatching. Whenever a well goes down, a large repair rig must be deployed to the well so the maintenance crew can perform the necessary repairs. Deciding where to deploy the limited number of repair rigs in relation to the many possible problem wells is a key decision that affects both revenue (barrels per hour from a down well) and compliance costs (significant EPA fines are possible). Data latency matters because making this decision requires a range of data, much of it real time. This oil company designed a virtual operational data store that pulls both fresh, up-to-the-minute data and enterprise warehouse data into an automated repair rig dispatching application on demand whenever a well goes down or a repair rig frees up. This maximizes oil production and revenue while mitigating repair and EPA compliance costs.
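The dispatching decision described above can be sketched as a small function that combines live well status with production rates held in the warehouse. This is a minimal sketch and not the company's actual application: the function name, the well and rig identifiers, and the ranking rule (lost barrels per hour only, ignoring compliance exposure and travel time) are all assumptions made for illustration.

```python
def next_dispatch(live_status, warehouse_rates, free_rigs):
    """Pair free rigs with down wells, highest lost production first.

    live_status:     well id -> "UP" or "DOWN" (fresh operational data)
    warehouse_rates: well id -> barrels per hour (historical warehouse data)
    free_rigs:       list of rig ids currently available
    """
    down_wells = [well for well, status in live_status.items() if status == "DOWN"]
    # Assumed ranking rule: the well losing the most production goes first.
    down_wells.sort(key=lambda well: warehouse_rates.get(well, 0.0), reverse=True)
    # zip stops at the shorter list, so extra wells simply wait for a rig.
    return list(zip(free_rigs, down_wells))


if __name__ == "__main__":
    status = {"W-1": "UP", "W-2": "DOWN", "W-3": "DOWN"}
    rates = {"W-1": 40.0, "W-2": 12.5, "W-3": 30.0}
    print(next_dispatch(status, rates, ["RIG-A"]))
```

With one free rig and two down wells, the sketch sends the rig to the well with the larger production rate and leaves the other waiting for the next available rig.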
Making the Case and Getting Started
The first step is to build the business case for purchasing the data middleware as well as the implementation resources that may be required. Any new project where a new data mart or operational data store is required is a candidate for data virtualization and can serve as the pilot. As highlighted above, the best candidates are ones where rapid time to solution and total cost of ownership are key considerations and the data volumes and transformation requirements can be handled on the fly.
Existing BI and data integration teams typically find the data virtualization middleware easy to use because these tools leverage common data modeling techniques and automate many of the functions. As such, adoption costs are surprisingly low, especially when an integration competency center is in place. In fact, most enterprises pay back these investments in less than a year and then use the time and cost savings realized from initial projects to help fund additional usage, eventually expanding enterprise-wide.
Virtualizing Data for Today's Enterprise Demands
Data warehousing continues to provide the essential cornerstone for enterprise decision-making. However, as intelligence needs become more sophisticated, competitive markets require a faster time to intelligence, and economic fears squeeze costs, enterprises have realized the need to develop complementary repositories along with the traditional warehouse. Physical data marts and physical operational data stores were the original wave, with virtual data marts, virtual operational data stores and virtual data layers forming the latest. Data virtualization, leveraging virtual data federation or enterprise information integration (EII) middleware, enables these new virtual stores to provide the more complete and meaningful information critical to today's revenue, cost and risk optimization strategies. These benefits not only pay for themselves through greater efficiencies, but also contribute directly to the bottom line.
1. Ted Friedman. "Data Federation Adoption Increases as Part of Complete Data Integration Strategy." Gartner Inc., October 5, 2007.
2. DAMA. "Data Application and Integration." The DAMA International Symposium and Wilshire Meta-Data Conference, March 2008.