It is almost impossible nowadays not to notice the trend toward data warehouses and data marts. Their role in strategic decision making is increasing rapidly as they turn data into knowledge. As the business community hears of their contribution to profits, almost everyone is making plans to implement one. Some are asking software vendors to bundle a decision support solution as part of their standard offering or as an option; others want to build their own with the help of experts. Even though many vendors offer tools to build a data warehouse/mart solution, many customers find that they still need to do some work themselves to get one up and running, mainly because every company has a unique culture, environment and set of requirements.
Implementing a corporate-wide DSS requires good planning from the start. According to a recent survey done by Sentry Market Research, "managing expectations" was rated as the most challenging aspect of building such a system. Another survey, done by the Data Warehouse Institute in 1996, shows that the number of enterprise DSS users doubles in the first six months. Choosing a scalable implementation methodology therefore becomes a major consideration, and one must plan for scalability.
This article discusses a way of accomplishing that goal while minimizing costs and risks for an enterprise DSS project.
Popularity of Decision Support Systems
Although DSS applications that analyze or monitor historical data have been around for a long time, why are so many organizations rushing to implement one today? There are several contributing factors.
The most important reason is the movement toward information-based management. When operational data is transformed into trends and aggregates to make future decisions, it becomes an advantage over the competition. It affects the bottom line. As we produce more data than ever before, users demand a better way to understand it. Consequently, there is a widespread realization that only by automating the analysis of data can users efficiently make sense of it all. The easier the process of getting to the relevant information, the faster the business decision can be made to adjust to the ever-changing conditions.
The Internet boom is also giving broader access to a larger number of people. Budget-conscious organizations can now distribute information inexpensively by providing access to their databases through the Web. A number of vendors also provide tools that convert 3270 and 5250 screens to Web pages, so turning existing host screen functions into a Web format by automatically generating HTML is an easy task.
Hardware affordability is another reason. The price/performance ratio for hardware is making more organizations look at these solutions. For example, it is rather inexpensive to purchase reliable and fast high-capacity hard drives to store lots of data. In addition, today's latest Wintel (Windows-Intel) systems with multiple, powerful processors deliver more horsepower than some of the midrange systems at half the cost.
The availability of more mature and function-rich software tools is also a factor. Among these are operating systems, database management systems, GUI tools, query tools and data mart solutions. While software vendors are trying to entice us, we are seeing more built-in functionality in software tools which makes it easier to create and maintain decision support systems.
There are many buzzwords in today's DSS field. Some terms, such as executive information systems (EIS), have been around for years; others are more recent, such as data warehouse (DW), data mart (DM), on-line analytical processing (OLAP), data mining, data extraction, data cleansing and data transformation.
Today, EIS and DSS mean practically the same thing. In the mid-1980s, GUI-based decision support systems were built for high-level managers and given the term EIS, mainly to appeal to executives. But today, as decision making gets closer to the customer level, many users in different roles demand access to the same information to stay competitive. So it is now common to interpret EIS as "everyone's information systems."
Data warehouses and data marts are essentially data architecture options for today's decision support systems, distinguished by the scope of data they contain. The data in a DW covers multiple subject areas, while DMs are usually subject-specific. When a DM is said to be dependent on a DW, it means that the data in the DM is populated from a DW.
OLAP and data mining are the types of query tools to view and analyze data that resides in either a DW or DM. Normally, with OLAP tools, you determine what kind of information you want to see and they allow you to view that information in different ways such as sales by region or state. On the other hand, data mining tools can search for information that you may not know existed by finding patterns in the data.
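The "different ways" of viewing data that OLAP tools offer amount to aggregating the same facts along different dimensions. As a minimal sketch of that idea (the region, state and sales figures below are invented for illustration, not drawn from any real system):

```python
from collections import defaultdict

# Hypothetical detail rows: (region, state, sales amount)
sales = [
    ("East", "NY", 120.0),
    ("East", "NJ", 80.0),
    ("West", "CA", 200.0),
    ("West", "WA", 50.0),
]

def rollup(rows, key_index):
    """Aggregate the sales column along one chosen dimension."""
    totals = defaultdict(float)
    for row in rows:
        totals[row[key_index]] += row[2]  # sales is the third column
    return dict(totals)

by_region = rollup(sales, 0)  # the "sales by region" view
by_state = rollup(sales, 1)   # the "sales by state" view
```

An OLAP tool does the equivalent of switching `key_index`, letting the user pivot from one view of the same data to another without rewriting the query.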
Data extraction is the process of getting data from operational systems into a decision support database. While data cleansing refers to the task of creating clean, valid and consistent data, data transformation deals with turning raw operational data into calculated fields such as sums and averages.
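The extract-cleanse-transform sequence can be sketched in a few lines of Python. Everything here is illustrative: the record layout, customer names and amounts are invented, and real extraction would read from operational systems rather than an in-memory list.

```python
# Hypothetical operational records, as extracted from a source system
raw_orders = [
    {"cust": " Acme ", "amount": "100.50"},
    {"cust": "Beta",   "amount": "not-a-number"},  # invalid; to be rejected
    {"cust": "acme",   "amount": "49.50"},
]

def cleanse(rows):
    """Produce clean, valid, consistent data: trim names, normalize case,
    and drop rows whose amounts cannot be parsed."""
    clean = []
    for r in rows:
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # reject inconsistent data rather than load it
        clean.append({"cust": r["cust"].strip().title(), "amount": amount})
    return clean

def transform(rows):
    """Turn raw detail into calculated fields: total and average per customer."""
    running = {}
    for r in rows:
        s = running.setdefault(r["cust"], {"total": 0.0, "count": 0})
        s["total"] += r["amount"]
        s["count"] += 1
    return {c: {"total": s["total"], "avg": s["total"] / s["count"]}
            for c, s in running.items()}

result = transform(cleanse(raw_orders))
```

Note how cleansing folds " Acme " and "acme" into one consistent customer key before transformation, so the calculated totals are not split across spelling variants.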
I will briefly compare two of the most common DSS types today: the data warehouse and the data mart.
Figure 1: Key Characteristics
| Attribute/DSS Type | Data Warehouse | Data Mart |
|---|---|---|
| Typically built first with | Top-down approach | Bottom-up approach |
| Data model employed | Enterprise | Department |
| Data sources used | Many | Few |
| Subject areas covered | Many | One |
| Next level of migration | Data mart | Data warehouse |
| Project time invested | Higher | Lower |
Data Warehouse or Data Mart?
Creating a DSS requires building a database and an application. The degree of work involved varies by the number of application subject areas and their scope. While a data warehouse type of DSS spans multiple subject areas, a data mart solution usually includes one subject area.
Although many agree on the benefits of a corporate DW, it cannot become a reality without commitment. As early implementers have found out the hard way, creating an enterprise-wide data warehouse with clean and consistent data is neither easy nor cheap. Ideally, it is the final goal. But, as we all know, what is ideal is not always the most practical and efficient. The project scope, especially in companies with high volumes of multiple data sources, is usually so enormous that it becomes a major undertaking. Because of that, companies are looking for a DSS strategy that would be comprehensive in subject areas but, at the same time, allow for an iterative implementation.
Choosing an implementation strategy for a DSS depends on several key characteristics in the DW and DM. Understanding of these differences, as listed in Figure 1, ultimately affects the planning process and the cost estimations for an enterprise-wide DSS project.
The DW and DM can either exist independent of each other or co-exist. The DSS architecture is named differently based upon its overall system design. The most common ones are listed in Figure 2.
For reasons such as lower costs and risks, the DM solution is gaining in popularity. According to a survey done by the META Group, data mart projects are estimated to make up the majority of decision support systems within the next year. Since data marts are subject-specific and smaller in scope, results are seen much sooner. Furthermore, it is usually much easier to justify costs for a data mart. The number of new DM products available indicates that software vendors are realizing and reacting to this change as well. They include companies such as Digital Equipment Corp., Fiserv, Hewlett Packard, IBM, Information Builders, Informatica, Oracle, Prism Solutions, SAS Institute, Sagent Technology and VMARK Software. Pre-packaged data mart solutions that are Web-enabled are also becoming common.
Figure 2: System Designs
| System Design Is | When There Exists... |
|---|---|
| Multitiered | Enterprise DW with several dependent DMs |
| Centralized | Single enterprise DW |
| Distributed | Several independent enterprise DWs or DMs |
Current Implementation Methodologies
Two of the most common methodologies for implementing a multitiered, corporate-wide DSS are "top-down" and "bottom-up." In a top-down approach, one big corporate database (DW) containing the historical detail data is created first from the operational systems; the subject-specific databases (DMs) are built later. Because of its large scope, the danger in this approach is that building from the top may not go deep enough to serve each department's specific needs. When that happens, the end-user applications are created with what is available rather than what should be there. One of the most difficult tasks is getting all the key people from different departments to agree on the same interpretation of the data before building any applications. The never-ending requests from users to expand a design built with this approach make on-time delivery rather difficult and costly.
Figure 3: Implementation Order for Top-Down Approach
Figure 4: Implementation Order for Bottom-Up Approach
Among software vendors, the preferred choice between these two methods of building a DSS is a major point of differentiation. Both methods have their risks. But, interestingly, when the advantages of each are combined, the risks become more controllable. Learning from past experience, another methodology is emerging that marries the two methods.
The Hybrid Approach
It is widely accepted that when many independent DMs exist without a common architecture, it later becomes harder to create a single corporate DW for the enterprise. But it is also well known that building a full-blown, easy-to-use DW is riskier and more costly than a DM, even for corporations with deep pockets. So, how can one increase the chances of putting a DSS into production within a reasonable time while minimizing the effort needed to have an enterprise DW later? The answer is to combine the strengths of the top-down and bottom-up approaches in a methodology called the "hybrid approach."
Figure 5: Implementation Order for Hybrid Approach
One of the key characteristics of the hybrid approach is how far the enterprise and department data models are taken during the design process. With this approach, it is not necessary to create a fully completed enterprise data model at the beginning, a time- and resource-consuming effort that only corporations with big budgets can undertake. It is, however, necessary to identify all major problem areas that will eventually be included in the total enterprise solution. For the high-level enterprise design, we only need to know which key fields will relate to each other. For the initial subject area chosen, though, a detailed, low-level design with granular data must be completed. In other words, a DM subset of the enterprise data model should be agreed upon by the key people before the department-specific DM is built. At that point, the data naming conventions, types and sizes must be clearly defined for the DM at hand only. These conventions then function as the standards for building the DSS across the enterprise. As a result, we build a strong foundation for the corporate data model one DM at a time, with each DM created on the corporate foundation of a future DW. Seen this way, scalability does not require "big budgets," and the goal of a common enterprise data model becomes achievable.
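One lightweight way to make those agreed conventions enforceable is to record the DM subset of the enterprise model as data and check each proposed DM definition against it. The sketch below assumes conventions are kept as a simple dictionary; the field names, types and sizes are invented for illustration:

```python
# Agreed naming/type/size standards for the subject area at hand (hypothetical)
CONVENTIONS = {
    "cust_id":    {"type": "char", "size": 10},
    "order_date": {"type": "date", "size": 8},
    "sale_amt":   {"type": "decimal", "size": 11},
}

def check_dm_fields(fields):
    """Return a list of violations of the agreed data model subset."""
    problems = []
    for name, spec in fields.items():
        agreed = CONVENTIONS.get(name)
        if agreed is None:
            problems.append(f"{name}: not in the agreed data model subset")
        elif (spec["type"], spec["size"]) != (agreed["type"], agreed["size"]):
            problems.append(f"{name}: type/size differs from the standard")
    return problems

# A proposed DM definition containing one nonstandard field
proposed = {
    "cust_id":  {"type": "char", "size": 10},
    "sale_amt": {"type": "float", "size": 8},  # violates the convention
}
violations = check_dm_fields(proposed)
```

Catching deviations like this before the department-specific DM is built is exactly what keeps each DM on the corporate foundation of the future DW.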
There are a couple of schools of thought on which subject area should be tackled first. Some prefer a low- to medium-priority area; others advocate a high-priority one. This is less of an issue with the hybrid approach because of its better scalability. But, to entice more project sponsors and attract users' interest, a highly visible problem area is the better choice. Jay Marquez shares this view: "The first iteration is the key to a successful implementation in that it sets a precedence of tangible benefits and success."3
As illustrated in Figure 3, the top-down approach calls for designing and building the DW first and then the DMs. In the bottom-up approach, this order is reversed (Figure 4). Only the hybrid approach starts the design at both levels: it then builds the DMs and finally rolls them up into the DW level (Figure 5). The good news is that with the hybrid approach, the effort to build an enterprise DW is greatly reduced, since every DM data model was decided with the corporate strategy in mind.
Some people prefer to create a "complete" enterprise data model instead of "complete" subsets of it (as in the top-down approach). This type of development is analogous to the "waterfall development" (sometimes known as a big-bang method) where each task has to be fully completed before moving on to the next task. On the other hand, starting a DSS with subsets of the enterprise data model is, by its nature, more analogous to the "incremental development" (also known as an iterative method) where tasks can be revisited to add more functionality. The leading software development companies have discovered that the incremental approach of building software is more manageable and less risky. It holds true in building a DSS as well. As W.H. Inmon states, "The most successful implementations of the architected environment have been those where the data warehouse has been built one iteration at a time."4 Simply stated, the hybrid approach represents an example of applying incremental development by subject areas in a DSS. And, not surprisingly, this practice is now beginning to be actively pursued by vendors such as IBM, Oracle and Computer Associates.
Migration to the Enterprise Level
There is an important point that must be stressed at this time: once the DM is proven to be a good investment, we have to avoid building too many DMs (generally no more than two to three) without implementing an enterprise DW. The main reason for this is to be able to create a common data model for the enterprise, by which smaller DMs can be refreshed, maintained and tuned for performance. It is here that we combine the strength of the top-down approach by reducing the data replication from the operational systems. Furthermore, it is essential to centralize the data extraction and removal process by having one common, shared repository. Data cleansing and transformation, which account for most of the unexpected cost, can now be minimized. At the end, this translates to less maintenance and better performance.
After satisfactory results are observed from the initial DM, one will need to decide how to grow the existing DSS. Before each additional DM is built, the existing data model should be examined to decide whether new data elements are needed. Creating another subject area for a new DM can be done on the same machine and database, utilizing the data model built so far. Depending upon the number of new data elements needed, either a new data model is created or the existing one is modified. There should be very little data duplication between the DMs so that a high degree of data integration can be achieved for the corporation. Therefore, subject areas must be chosen that are almost mutually exclusive, making these areas easier to integrate when building the enterprise DW. One must also pay special attention to the fact that there should be only one interpretation (format and semantics) of data elements among the multiple subject-area DMs. Each DM should be built with the same meta data requirements (source and meaning of a field, field type, its format, how it is calculated, etc.).
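The "only one interpretation" rule can be checked mechanically once each DM carries its meta data in a comparable form. This sketch is illustrative only: the meta data fields, DM names and source systems are invented, and a real shop would hold this information in its repository rather than in code.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class MetaEntry:
    """Meta data for one field: source, meaning, type and format."""
    source: str
    meaning: str
    ftype: str
    fmt: str

def find_conflicts(dm_catalogs):
    """Flag fields that appear in several DMs with different interpretations."""
    seen = {}       # field name -> (first DM seen in, its meta data entry)
    conflicts = []
    for dm_name, catalog in dm_catalogs.items():
        for field, entry in catalog.items():
            if field in seen and seen[field][1] != entry:
                conflicts.append((field, seen[field][0], dm_name))
            else:
                seen.setdefault(field, (dm_name, entry))
    return conflicts

# Two hypothetical subject-area DMs that disagree about cust_id
sales_dm = {"cust_id": MetaEntry("ORDERS", "customer key", "char(10)", "zero-padded")}
service_dm = {"cust_id": MetaEntry("CALLS", "customer key", "int", "plain")}
found = find_conflicts({"sales": sales_dm, "service": service_dm})
```

Running such a check before each new DM goes into production is one way to keep the later roll-up to an enterprise DW from becoming a reconciliation project.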
We are now at a point where we can build the multitier enterprise DW by moving the data model, representing the "detail" data of each DM, to a different machine. Rolling up an enterprise DW is simply a matter of taking this shared data model to a higher (enterprise) level, since their key relationships were established during the high-level DW design. The effort needed to accomplish this task was made easier because of the properly partitioned DM subject areas. Managing the common enterprise data (the identification and the interpretation of the new detail data elements) was undertaken and resolved as each DM was built. The effort to migrate existing DMs to an enterprise DW would be very tedious if they had been created using a bottom-up approach. "Opportunistic" data marts built without any intention of enterprise integration become isolated systems and maintenance nightmares.
Having users access a DM rather than a DW is also practical for performance and availability reasons. When many users run ad hoc queries against the DW, response time slows for everybody. Keeping mostly summarized data at the DM level further enhances the performance of the system. Ralph Kimball emphasizes this point when he concludes: "The use of pre-stored summaries (aggregates) is the single most effective tool the data warehouse designer has to control performance."5 And, as we know, nobody ever complains about faster response times!
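The point of pre-stored summaries is that the expensive pass over the detail happens once, at load time, and every subsequent query is a cheap lookup. A small sketch, with invented regions and amounts standing in for a real fact table:

```python
from collections import defaultdict

# Hypothetical detail-level fact rows: (region, sale amount), repeated to
# stand in for the volume a real warehouse would hold
detail = [("East", 10.0), ("East", 15.0), ("West", 30.0)] * 1000

# Build the aggregate once, when the data is loaded...
summary = defaultdict(float)
for region, amount in detail:
    summary[region] += amount

def sales_for(region):
    """...then answer each query with one lookup instead of a full scan."""
    return summary[region]

east_total = sales_for("East")
```

The trade-off is extra storage and load-time work in exchange for query response that no longer depends on the size of the detail data.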
The meta data for each DM is also easily combined to create a corporate DW repository. Since there is currently no industry standard to exchange meta data, using the same toolsets will be a wise choice here.
In order to create an enterprise data model, some suggest having smart algorithms that would scan the meta data to find what is common at the DM level. In my opinion, this is a valid suggestion only when one is not certain about how the current data models are structured and maintained. This situation is typically encountered with systems built by the "bottom-up" approach where creating a common enterprise data model is an afterthought. The "hybrid" approach attempts to prevent that by keeping a closer eye on the enterprise data model during the design of each DM.
Another important point to keep in mind is choosing a scalable database system and platform. Given that the size of the data and the number of users increase very rapidly in enterprise systems, selecting a vendor whose database runs on multiple operating systems (with support for multiple processors) can greatly decrease the growing pains. Planning for growth now will pay off in the future when you decide to increase the capabilities of your system.
In these times of intense competition, companies are doing everything they can to speed the decision-making process. Since making the correct decision at the right time involves fast access to the needed information, software vendors constantly bombard us with the marketing information about their latest data warehouse tools. Because of the high costs involved in implementing a corporate-wide DSS, the customers need to be very knowledgeable about the whole planning process before the tools are selected.
Before one can plan a corporate strategy for an enterprise DSS, the issues that need planning must be identified. This is easier said than done, especially in the data warehousing world where the toolsets and methods are highly mixed.
Given the importance of a scalable architecture for the enterprise DSS, one of the major considerations during the planning phase is deciding which implementation option to use: top-down, bottom-up or hybrid. Understanding the implications of choosing one approach over another will influence how the DSS project takes shape over time. When this decision is made with the right knowledge, managing expectations during the DSS project becomes much easier, causing less stress for the project team and company sponsors. Because of its ability to better control cost and risk without jeopardizing the creation of an enterprise data model, the hybrid approach should be the methodology of choice for the majority of enterprises undertaking this effort. It is suitable for both types of organizations: early adopters and the more conservative ones.
Scalable solutions get built by design. Therefore, it is imperative to identify the type of method that lets us get there. The hybrid approach is just that. And that is why it is now being practiced by many.
1. Inmon, W.H. Building the Data Warehouse, p. 292. John Wiley & Sons, 1996.
2. Gill, Harjinder S. and Rao, Prakash C. The Official Guide to Data Warehousing, p. 82. Que Corp., 1996.
3. Bischoff, Joyce; Alexander, Ted; and Marquez, Jay. Data Warehouse: Practical Advice from the Experts, p. 97. Prentice-Hall, 1997.
4. Inmon, W.H. Building the Data Warehouse, p. 275. John Wiley & Sons, 1996.
5. Kimball, Ralph. The Data Warehouse Toolkit, p. 190. John Wiley & Sons, 1996.