Data warehouses support many kinds of analyses involved in business intelligence activities. There is also a natural tendency to create data warehouse "slices," or data marts, to keep activity areas to a manageable size. For instance, customer satisfaction, market penetration strategies and sales force productivity are common data mart themes. In order to add value to internal data, the picture of the markets, products and customers is enhanced through the addition and integration of data that is not generated from business transactions. Some of these data sources include: externally purchased demographic or business data; local, user-generated data; graphical and map-based data; statistical data; Web-sourced data; and key government-produced economic indicators. Consequently, many companies have significant data integration challenges. This article reviews these challenges and suggests a consolidated data mart architecture as a solution.
Placing Business Data in Context
"Extra" data is important because it places CONTEXT around the business event data. Competitors are profiled for comparing their market share and potential share to your business share.
For example, one context of a customer is their participation in a target market group based on their purchase records. Groupings can then be studied for their response to various delivery channels for products and services.
Customers can also be grouped based on their demographic or attitude profiles for special handling.
Markets can be forecasted with economic trend indicators.
Analysis in many of these cases involves data generated outside of the company: census, market research or market segments (e.g., North American Industry Classification System characterization of commercial and industrial customers used in NAFTA countries starting January 1, 1998).
Confronting Integration Limitations
Because the various types of data and sources differ fundamentally in their genesis and supportability, the amount and type of integration possible between them varies greatly today. What are the issues, and what can be done to minimize their impact?
Issue 1: Data Formats
Most data warehouses or data marts are designed to support one or maybe two types of data. When other types are added, the compromises begin, including possible redesign of the warehouse to accommodate the new types of data. Examples of various formats for integration are internal RDBMS data, denormalized purchased monthly data, geographic data and object data (reports, documents, spreadsheets, notes, multimedia).
Issue 2: Data Visualization
This is another opportunity for dis-integration of the data mart front-end environment. The requirements from user groups for graphs, charts, data animation, data landscapes, cartographic or dashboard representations, or even sophisticated virtual reality interfaces are key marketing features of specific software products. These visualizations may or may not be integrated with the base analytic tool. In fact, there may be no single base analytic tool in the architecture. In this case, there may be multiple visualization and analytic data marts running off the base warehouse.
Issue 3: Proprietary Databases
Today corporations are all trying to acquire data in order to complete their customer, market, competition and forecasting picture. This has generated a new industry of data packagers and resellers ready to cater to data subscription or even full-service analytics.
Data Marketing Tricks
Data integration and functionality integration usually go hand-in-hand within the products on the market. However, this is not always true. There are many tricks available to the data seller to make the data useful in the immediate analysis, but impractical to integrate as a whole in the subscriber's computing environment. These are listed here from the simplest to the most sophisticated:
- Making sure the data is current. Perhaps the simplest, but a highly effective barrier to local integration. Resellers package databases. This data can often be purchased at lower cost from the data originator, but the repackaging has a significant value-add (for example, stock market performance, competitors' rates, annual reports, regulated reporting data). With a number of data sources to integrate, IS staff are happy to have updates arrive reliably on a predetermined schedule, pre-formatted and custom run to your requirements.
- Re-coding difficult separate data sets. Examples include rolling census, state or province-based data into country, country into global, etc. Some vendors add value by keeping to the data re-coding activities alone. Products are raw data sets designed with some data "hook" that allows limited integration with your own customer data or full integration with your data if bundled with the vendor's consulting "one-off" project.
- Packaging the data into a more user-friendly programmed front-end analytical tool. In this scenario, repackagers deal with messy raw data and new official updates, providing a full-blown system where you can report and analyze only on the data in their system. Integration is not possible. Data extraction and conversion to other formats is limited to "one report at a time." If you try to look at the underlying data outside of the package, it is encrypted to prevent direct access.
- Providing full modeling environments. This is the most sophisticated approach to data service, but you must be able to "buy into" the predetermined analytical framework and assumptions, algorithms for analysis and the data visualizations supplied. These tools are often models for business scenarios, containing value-added algorithms. One example of this is an analytic tool capable of segmenting your customer database into "loyalty profile groups" based on past customer behavior with your company.
Solutions for a Dis-Integrated World
One Solution: Multiple Independent Data Marts
One obvious solution to the continuing and even multiplying integration problems is to support the multiple data mart solutions. On a practical basis, integrating the various data types into an RDBMS represents a great deal of work and redesign. This solution gives in to the complexity of the environment and supports additions to the corporate analytic environment in an ad hoc manner.
For example, data marts designed to follow natural limitations for integration might create an environment like this:
- Geographic-based data for demographic analysis;
- Statistical samples for survey data;
- Proprietary databases with built-in query environments purchased as a stand-alone;
- Data visualizers for 3-D, landscape and other advanced displays (must be loaded with data set to be viewed);
- Multidimensional databases for executive information systems (highly summarized critical performance indicator-based tracking);
- Mathematical models loaded with local data for scenario analysis.
This solution, although practical, raises other issues. Independent data marts are the antithesis of our fundamental objectives for data warehousing solutions. Data warehouses and data marts are meant to coordinate in a data integrating strategy and architecture. The lack of data integration that is inherent in legacy systems is relocated to the front-end presentation and analysis tools. In fact, we find that much has been written on the ultimate failure of the independent data mart precisely because in the long term, the lack of control in the system architecture framework leads to complexity that is unmanageable.
A Better Solution: Consolidated Data Marts
A consolidated data mart solution is a managed-for-integration data mart architecture. In this alternative, there is an acknowledgment that integration is not going to be seamless. A plan to coordinate data marts using the enterprise architecture and central planning is developed. The degree of integration between marts will vary from a full segmentation of the business into targeted data marts, to an as-needed segmentation. Limitations for data types, vendor-based constraints and data visualization challenges are recognized as key challenges to be controlled and leveraged.
Key Consolidated Data Mart Design Principles
One of the important patterns to establish in a consolidated data mart architecture is an understanding of how each part of the architecture contributes to the whole. A framework of principles that supports this understanding aids in the management of the data marts.
Figure 1 shows this approach.
The overall plan for data in the data marts has to coordinate on the following aspects:
- Common definitions: Common data in the data marts will have agreed-upon data definitions. Where definitions differ, this is documented.
- Common roll ups and data relations: This is necessary to ensure that understanding of organizations, structure and geographies is collective and aggregates are understood and performed in an agreed-to manner. Exceptions are documented.
- Common sources: For common information, the extract mechanisms for data, schedules, cleansing and transformations will be done once for all data marts in order to keep versions of the data under analysis in unison. This makes all reports from all data marts comparable at some level. Again, exceptions are documented.
- Standard measures (dollars, profit, margins, etc.): For evaluating business performance, these should be common between data marts. The ways in which these metrics are derived should also be adopted across the marts.
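As an illustration only, the rule that shared measures must match the central definition unless an exception is documented could be checked along these lines. This is a minimal sketch: the names MeasureDefinition, DataMart, CENTRAL_MEASURES and undocumented_deviations, and the sample formulas, are hypothetical and not part of any product or of the architecture described here.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class MeasureDefinition:
    """A standard measure and the agreed rule for deriving it."""
    name: str
    formula: str  # human-readable derivation rule


# Hypothetical central catalogue kept by the data administration unit.
CENTRAL_MEASURES = {
    "profit": MeasureDefinition("profit", "revenue - cost"),
    "margin": MeasureDefinition("margin", "(revenue - cost) / revenue"),
}


@dataclass
class DataMart:
    """A data mart with its local measure definitions and documented exceptions."""
    name: str
    measures: dict
    documented_exceptions: set = field(default_factory=set)


def undocumented_deviations(mart, central=CENTRAL_MEASURES):
    """Return shared measures that differ from the central definition
    without a documented exception."""
    deviations = []
    for name, central_def in central.items():
        local_def = mart.measures.get(name)
        if local_def is None:
            continue  # this mart does not use the measure
        if local_def != central_def and name not in mart.documented_exceptions:
            deviations.append(name)
    return sorted(deviations)
```

A mart that derives margin differently would be flagged until the difference is recorded as a documented exception, which mirrors the "exceptions are documented" rule above.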
Implementation Strategy
It's important to identify the data which is central to the business. Generally this information will be relatively static and, in most businesses, will be customer, market, internal organization or responsibilities data.
This modeling of common data and responsibility for all model changes might be best supervised by a centralized data administration unit.
Consolidated data marts differ in the type of local analysis and reporting undertaken. Local data that is central to the particular focus of analysis can, and should, remain local. Again, a centralized administration unit has a role here. The unit can consult on local data requirements and coordinate centralized model changes when mandatory.
This policing role optimizes data reuse and supports business needs to report and analyze on the same version of the data that is defined in a uniform manner. The requirement to run on the same version of the data implies that the centralized unit should also control the publishing of new versions of the centralized data to the consolidated data marts. The data warehouse that feeds the data marts contains only the centralized data and business structures needed by all data marts.
In this way, the centralized unit can also set the overall direction, and responsibilities, boundaries and reporting feeds to an executive scorecard of business performance can be established.
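The requirement that every consolidated data mart run on the same published version of the centralized data can be pictured with a small sketch. It assumes a simple integer version counter; the class CentralPublisher and its methods are hypothetical names chosen for illustration, not a description of a specific tool.

```python
class CentralPublisher:
    """Tracks which version of the centralized data each data mart has loaded."""

    def __init__(self):
        self.current_version = 0   # no version published yet
        self.mart_versions = {}    # mart name -> version loaded (None if never loaded)

    def register(self, mart_name):
        """Add a data mart to the set fed by the central warehouse."""
        self.mart_versions[mart_name] = None

    def publish(self):
        """Publish a new version of the centralized data and return its number."""
        self.current_version += 1
        return self.current_version

    def load(self, mart_name):
        """Record that a registered mart has loaded the current version."""
        if mart_name not in self.mart_versions:
            raise KeyError(f"unregistered data mart: {mart_name}")
        self.mart_versions[mart_name] = self.current_version

    def stale_marts(self):
        """Return marts that are not on the current version, so their reports
        are not yet comparable with the others."""
        return sorted(
            name
            for name, version in self.mart_versions.items()
            if version != self.current_version
        )
```

Reports from two marts would be treated as comparable only when neither appears in the stale list.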
Access Through a Meta Data Shell
The consolidated data mart architecture is, by definition, a distributed environment. Because of this, understanding what is available and where to find it is critical. This is achieved through a meta data interface to the data. The meta data is recognized as a critical part of the analytic environment. Meta data information creates semantic packages surrounding each piece of data in order to support data transformation into information. A determination of the usefulness of information can be made from the review of the semantic package. The validity of data is in direct relation to the agreed-upon authority and the completeness of a semantic package to address such issues surrounding the data interpretation and usage.
The semantic packages support users in their ability to establish many things about the data including:
- Data mart and reports participation;
- Analyses and special studies performed with this data;
- Appropriate usage and limitations to appropriate use (especially for survey samples of data);
- Access authority required;
- Data authority;
- Data ownership;
- Common definitions;
- Currency of the data available;
- Roll ups and rules for roll ups;
- Relations to other data;
- Update schedules;
- Data sources (legacy, external, local);
- Maximum and minimum values;
- Dated changes to any of the meta data so that understanding of the history of data mart content is supported.
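One way to picture a semantic package is as a record attached to each data element, with dated changes kept alongside it. The sketch below is an assumption-laden illustration: the class SemanticPackage, its field names and its methods are hypothetical, and it covers only part of the list above.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class SemanticPackage:
    """Meta data describing one data element (hypothetical field names)."""
    element: str
    definition: str              # common definition
    owner: str                   # data ownership
    authority: str               # data authority
    source: str                  # "legacy", "external" or "local"
    update_schedule: str
    current_as_of: date          # currency of the data available
    min_value: Optional[float] = None
    max_value: Optional[float] = None
    data_marts: list = field(default_factory=list)  # marts using this element
    usage_limitations: str = ""
    history: list = field(default_factory=list)     # dated meta data changes

    def amend(self, attribute, new_value, changed_on):
        """Change one meta data attribute and record the dated change."""
        old_value = getattr(self, attribute)
        setattr(self, attribute, new_value)
        self.history.append((changed_on, attribute, old_value, new_value))

    def in_range(self, value):
        """Check a value against the recorded minimum and maximum, if any."""
        if self.min_value is not None and value < self.min_value:
            return False
        if self.max_value is not None and value > self.max_value:
            return False
        return True
```

Keeping the history on the package itself is one design choice among several; a separate change log would serve the same purpose of supporting an understanding of how data mart content has evolved.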
Once loaded, the meta data interface should support access either to the application or directly to the raw data mart or enterprise warehouse as permissions allow. Ideally, this should be one uniform meta data interface with different data mart views. This interface should be maintained by the centralized IS unit.
Consolidated Data Mart Payoff
Consolidated data marts are a practical solution to data warehousing problems that can't be controlled well within the organization. These problems limit integration of data and functionality. The environment is flexible and yet well-controlled and understood both by the IS staff and the end user.
- Points of control can be established in this dis-integrated world and a flexible architecture for all data marts supported and enhanced.
- A meta data shell around the consolidated design supports the idea of semantic packages attached to the data.
- The appropriate data mart can then be accessed through the standard meta data interface, the necessary information found and analyses run.
- Shortcuts to preferred data marts and data can be established in order to customize the user desktop.
- All development and changes to the overall architecture are supported with a design implementation framework through the consolidated data mart vision.
But these are operational benefits. On the functional side, the whole is greater than the sum of the parts.
The Strategic Bonus
For business it is critical that, no matter what the internal schedule, the overall analytic environment supports the ultimate critical performance measures of the overall corporation. These indicators point the direction for sustaining and improving company performance. Analyses performed, scenarios run and selected for action, and strategies adopted from causal analysis all must support the ultimate goals of the business. Performance measurements reflecting the activity that is done inside the company as a result of the analysis must be reported regularly. Linking of performance with the customer and market drivers completes the picture. This linking of analysis and reporting activity with corporate critical performance measures is the ultimate deliverable of a data warehousing environment. Whatever IS can provide to make a cumulative whole out of disparate pieces will aid the leadership in measuring progress and indicate corrective steering for the corporate ship.
Efforts already undertaken in the business in data warehousing and data marts are not totally lost in the migration to a consolidated data mart architecture. Instead, the diversity is recognized as a business reality, and this diversity is supported in a controlled, logical architecture. This is a win/win solution for the IS group and the corporation.