Many large organizations are preparing or executing projects aimed at improving business intelligence (BI) delivery by implementing a data warehouse on an enterprise scale. In practice, however, it proves very difficult to bring such ambitious projects to a successful implementation. In my opinion, there are four main reasons for this lack of success (or at least for the problems that are encountered):
- Money (and thus, governance),
- Design (flexibility),
- Heritage (existing systems), and
- Data quality.
A Dutch proverb says, "He who pays for the music will determine the song that is played." Money for IT projects is raised by the business, based on the added value as perceived by the future users of the projects' results. At least, that is how it should work. In the case of a data warehouse, the future user group consists of management and their supporting staff. The logic of an enterprise data warehouse - a single version of the truth and optimized data flows between sources and the BI applications - is in itself clear and in fact indisputable. But then another phenomenon comes into play, for which one of my former managers had a saying: "Managers and children have one thing in common: once they want something, they want it right away." An enterprise data warehouse is just one of those things you can't have right away. On top of that, managers have an information need that is, for the most part, fluid, subject to fast and unexpected changes, and unpredictable in nature. At the same time, this information need has to be satisfied now. And so the temptation is great to build or maintain - in spite of the bigger picture - another homegrown shortcut next to the enterprise initiative.
So, although the individual manager will support the idea of an enterprise data warehouse, his willingness to fund it from his own project ("the first pedestrian paying for the whole bridge") will not be great. The same goes for his willingness to wait until the enterprise data warehouse can fulfill his information need.
The lesson here is that funding of the enterprise data warehouse must be decoupled from the individual BI projects. It is a kind of basic infrastructure ("the plumbing and sewage") that can only play a role in the actual information delivery to its customers after a considerable threshold investment. Apart from that, the actual completion of the data warehouse - which data from which systems will be included, and when - must be determined by the priority and the willingness to participate of the various BI projects. You have to avoid the data warehouse becoming a solution in search of a problem.
A second reason why it is so difficult to succeed with an enterprise data warehouse project can be found in its internal design. One of the most important requirements is the capability to follow changes in the business at the pace dictated by the business. Managers want their information, and no matter which internal or external changes occur, they want an undisrupted continuation of their information flow, including answers to the questions raised by those changes. They want answers to questions that will only arise once the answers to the previous questions are on their desk.
If the data warehouse organization is not capable of following the required pace, the users will once more drop out and set up their own information provision next to the data warehouse. The data warehouse organization is in fact a logistics service provider that ensures that data originating in a number of source systems can be used - within the given time to market - as meaningful information in a number of target systems. Therefore, a logistics process needs to be designed and implemented, consisting of collection points, means of transport, decoupling points where (temporary) storage and processing take place, and distribution points. Such a logistics process is depicted at the conceptual level in the following figure.
Figure 1: Conceptual Information Logistic Model
While designing the processes, systems and applications for this logistics process, one should - in order to obtain the necessary flexibility and resilience to change - take into account the following four design principles:
- Generic instead of specific,
- Reusable instead of one time,
- Decoupling instead of integration, and
- Federation instead of centralization.
Generic instead of Specific
In the end, a data warehouse is nothing more than just another database consisting of a number of physical (relational) tables. Such a physical database model is based on a logical database model in which the relations between the different information elements are described. The logical database model, in its turn, is (or at least should be) based upon the information or business model. This information model represents the way users look at their world.
The most far-reaching changes in any enterprise data warehouse have to do with changes in this information model, followed by changes in the logical and physical data model of the data warehouse. Adding dimensions, changing the structure of dimensions, or adding new facts requires a lot of maintenance, not only in the data model itself, but also in the load processes between the source systems and the data warehouse on the one hand, and between the data warehouse and the data marts on the other.
Solutions are sought in standard (generic) industry sector models (e.g., from IBM or Teradata) or in data warehouses into which the standard business content of the underlying ERP system is loaded (e.g., SAP or Peoplesoft). However, in somewhat larger organizations, standard will never be standard.
Industry sector models contain too much on the one hand and must be expanded or adapted on the other, if only because every self-respecting company considers itself so unique that at least some customization of the operational and managerial processes has to be done. The alternative - organizing yourself in line with the standard solution - virtually never takes place, even though, at least when the costs of reorganization are lower than the costs of changing the standard system, it would bring the most return on the investment in the standard model.
Importing standard business content works for that ERP system, but almost always only if the underlying ERP is implemented in a "vanilla" fashion. Unfortunately, that ERP system is never the only source system, so changes remain necessary for the other source systems. And all of these changes (both with the industry model and with the standard content) need to be applied to a normal logical and physical data model. So, in the end, what you get is - depending on the degree of initial coverage - more or less a head start.
Fortunately, there are other possibilities. There is Data Vault from Core Integration Partners, a data modeling method for modeling your data warehouse in a generic fashion. There is also Kalido, a product originally developed back in the '90s within Shell, which is based on a fully generic model in which changes in the information model are (in principle) absorbed by merely changing or adding rows in the generic logical and physical model under the "hood." Some other products (e.g., from two small Dutch startups, Dynalytical and BI-Ready) generate new database schemas driven by changes in the business model.
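The row-driven idea behind such generic models can be sketched in a few lines. The following is a minimal illustration, not the schema of any actual product: the warehouse consists of a small, fixed set of tables, and a change in the information model (here, a new customer attribute) becomes an ordinary row insert instead of a disruptive DDL change.

```python
# A minimal sketch of a generic (attribute-driven) warehouse model.
# Table and column names are illustrative, not taken from any product.
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
    CREATE TABLE attribute_def (attr_id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE entity        (entity_id INTEGER PRIMARY KEY, entity_type TEXT);
    CREATE TABLE attr_value    (entity_id INTEGER, attr_id INTEGER, value TEXT);
""")

# Initial information model: customers have a name.
conn.execute("INSERT INTO attribute_def VALUES (1, 'name')")
conn.execute("INSERT INTO entity VALUES (100, 'customer')")
conn.execute("INSERT INTO attr_value VALUES (100, 1, 'Acme BV')")

# The business adds a 'region' attribute: no ALTER TABLE, just new rows.
conn.execute("INSERT INTO attribute_def VALUES (2, 'region')")
conn.execute("INSERT INTO attr_value VALUES (100, 2, 'EMEA')")

rows = conn.execute("""
    SELECT d.name, v.value
    FROM attr_value v JOIN attribute_def d ON v.attr_id = d.attr_id
    WHERE v.entity_id = 100 ORDER BY d.attr_id
""").fetchall()
print(rows)  # [('name', 'Acme BV'), ('region', 'EMEA')]
```

The trade-off, of course, is that queries now always go through the attribute join, which is why these products hide the generic model "under the hood" behind generated views or marts.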
Reusable instead of One Time
Use standardized and reusable process components as much as possible. This sounds completely logical and should go without saying, but in the chaos and pressure of timely delivery, it is quite easy to lose sight of this aspect, resulting in yet another specific set of software that is developed (and has to be maintained).
An underlying reason for this is that too often, too little attention is given to reusability while designing and developing the initial process and the first increments. This is often caused by the time squeeze in delivering the first increment.
Decoupling instead of Integration
At first sight, decoupling does not seem beneficial for resilience to change. After all, the idea of an integrated environment is that it requires fewer changes. However, it is better to decouple the different information logistic functions and to implement separate layers. In that case, changes will be limited - just as in the world of physical logistics - to one or just a few functions. And beyond decoupling, the more generic and reusable the design of the individual functions is, the easier it is to isolate the impact of changes and to minimize the throughput time of applying them.
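The layering can be sketched as a set of small functions, each of which only knows the decoupling point (a plain data store) before and after it. The layer names below follow the logistics metaphor of Figure 1 and are otherwise assumptions for the sake of illustration; the point is that replacing one layer, such as the collection step for a new source format, leaves the others untouched.

```python
# Sketch of decoupled information-logistic layers (illustrative names).
def collect(source_rows):
    """Collection point: normalize raw source rows into plain dicts."""
    return [dict(r) for r in source_rows]

def store(staged, warehouse):
    """Decoupling point: (temporary) storage between collection and delivery."""
    warehouse.extend(staged)
    return warehouse

def distribute(warehouse, predicate):
    """Distribution point: feed a data mart with only the rows it needs."""
    return [r for r in warehouse if predicate(r)]

warehouse = []
staged = collect([{"unit": "NL", "revenue": 10}, {"unit": "DE", "revenue": 7}])
store(staged, warehouse)
mart_nl = distribute(warehouse, lambda r: r["unit"] == "NL")
print(mart_nl)  # [{'unit': 'NL', 'revenue': 10}]
```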
Federation instead of Centralization
In designing an enterprise data warehouse, it is quite common to base the design on a central implementation, in which all data is brought together and managed from one central point in the organization. This central point is also responsible for refreshing the data marts. In most situations, however, it is preferable to apply a federated model, in which the enterprise data warehouse is split up into a number of well-aligned and connected partial implementations. Each individual business unit (region, country or operating company) has its own implementation under its own managerial responsibility, with full freedom to model and manage its own part of the company's business model. The common part is managed at a higher level and pushed down to the lower levels; data from a lower level that is of interest to the higher level can be pushed up. In this way, it is much easier to create support in the autonomous parts of the organization, and agreement is only needed on the commonly used part of the model.
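The push-down/push-up mechanics can be made concrete with a small sketch. All names here (the shared region dimension, the unit-local channel concept) are invented for illustration: each unit inherits the common part of the model, extends it freely for local use, and pushes up only the data that fits the commonly agreed part.

```python
# Sketch of a federated warehouse: common model pushed down, data pushed up.
common_model = {"region": ["EMEA", "APAC"]}  # managed centrally (illustrative)

class UnitWarehouse:
    def __init__(self, common):
        self.model = dict(common)        # inherited common part of the model
        self.local_facts = []

    def extend_model(self, key, values):
        """Full local freedom: unit-specific concepts stay local."""
        self.model[key] = values

    def push_up(self):
        """Only data that conforms to the common model goes to corporate."""
        return [f for f in self.local_facts if f["region"] in self.model["region"]]

nl = UnitWarehouse(common_model)
nl.extend_model("channel", ["web", "retail"])         # local-only concept
nl.local_facts = [{"region": "EMEA", "revenue": 12}]
corporate_view = nl.push_up()
print(corporate_view)  # [{'region': 'EMEA', 'revenue': 12}]
```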
The initiative to start the development of an enterprise data warehouse is always taken at a moment when - depending on the size of the company - dozens or maybe even hundreds of small and large systems already exist that provide management information: local systems, departmental systems, regional systems, corporate systems, systems on the PC of the individual user (60 percent of all management information is still Excel), and so on. And in this situation, it is true that "the spirit is willing, but the flesh is weak." Yes, everybody agrees that the existing situation is not good for anyone, but when push comes to shove, few people are really prepared to give up their autonomy.
Central funding of the basic infrastructure and a federated concept as described above are the best way to deal with this problem. The advantages of a shared and common infrastructure must outweigh the (perceived) disadvantages. In short, it is not a matter of imposing a central data warehouse and a common approach, but a matter of convincing, selling, creating buy-in, and a lot of time and patience.
And then, in the end, everything is arranged and settled as described. Money and governance are where they belong; the design of the data warehouse and the logistics process complies with the rules; the applied technology is state of the art; the organization's sensitivities with respect to politics and existing systems are dealt with in the proper way... and yet, after the celebration of the initial successes, progress comes (almost) to a halt. The better the integration of data from different systems is arranged, the more integrated reporting and analysis facilities become available. That is the moment when it becomes apparent that the quality and consistency of the data are not at the level they should be.
This lack of quality has two aspects. The first is defects in the source systems: many source systems contain a lot of faulty data. There are many reasons for this. In old systems, certain input fields are used for completely different data than originally intended (while the documentation still shows the original purpose). Often, users have to input data that is not relevant to their jobs. Many input checks are incorrect, incomplete, or easily bypassed. All too often, these faults surface only when more advanced analytical facilities are made available. The size and seriousness of problems with source data are, at first, often neglected or underestimated.
As users are just normal people, the chance exists that they will blame the new data warehouse and turn away from the system. Analyzing the defects, finding the root cause, correcting them, and preventing future defects requires time, patience, and also relatively big (extra) investments, not only in the data warehouse but also in the source systems.
The second data quality problem is the lack of consistency between master data within - and certainly between - different source systems and organizational units. A survey conducted by The Data Warehousing Institute (TDWI) in October 2006 shows that 83 percent of the responding companies have (had) problems caused by defects in master data, and that 54 percent have gained benefits from having correct master data. Similar percentages (81 and 54) are mentioned for problems or successes with reporting and other BI facilities.
Master data systems - just like "normal" systems - can and will have defects in the data itself. But often, the problem is more serious and lies deeper. In large organizations, ownership of notions such as customer, product, revenue, calendar and other core entities is just as splintered across the organization as business governance and the ownership of supporting systems are.
Individual systems and reports for the direct users of such a silo application probably present correct figures. But the moment someone higher up in the organization asks a question such as, "What is the profitability of the sales of the 20 best-selling products to the top 20 customers during the last quarter?" and the answer has to come from different systems, the confusion can be considerable. Again, the blame will fall on the data warehouse.
Solving this problem, however, is much more complicated than solving ordinary defects in data. Master data is first and foremost an organizational issue. Who owns which element of the master data, who is involved in defining it, which source system is leading and which other systems need to follow and be synchronized, and how do we ensure that the data warehouse holds "the golden copy"? The enterprise data warehouse might be the place where the problem becomes visible, but solving it is a responsibility of the business organization, facilitated by proper IT systems.
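Once the organization has answered the "which system is leading" question, the technical side of a golden copy can be as simple as a per-attribute precedence rule. The system names and precedence order below are assumptions for illustration only: the designated leading system wins for each attribute, and lower-ranked systems merely fill gaps.

```python
# Sketch of a golden-copy merge rule for master data.
# Source system names and their precedence are illustrative assumptions.
precedence = ["crm", "erp", "billing"]   # leading system first

def golden_record(records):
    """Merge per-system views of one customer into a golden copy."""
    merged = {}
    for system in reversed(precedence):  # lowest precedence applied first,
        merged.update({k: v              # so higher-ranked systems overwrite
                       for k, v in records.get(system, {}).items()
                       if v is not None})
    return merged

views = {
    "erp":     {"customer_id": "C1", "name": "Acme B.V.", "vat": "NL01"},
    "crm":     {"customer_id": "C1", "name": "Acme BV", "vat": None},
    "billing": {"customer_id": "C1", "name": "ACME", "vat": "NL01"},
}
golden = golden_record(views)
print(golden)  # {'customer_id': 'C1', 'name': 'Acme BV', 'vat': 'NL01'}
```

Note how the CRM wins on the name but, having no VAT number, lets the ERP value through: survivorship is decided per attribute, which is exactly the kind of rule the business, not IT, has to agree on.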
The success of an enterprise data warehouse can be measured by the disappearance of all kinds of "owned" systems and by the quality and use of the information in the data warehouse. The road to implementation is long and full of pitfalls, but success is not impossible. The ambition of this article is not to provide a recipe for success but to shed some light on a number of the ingredients.
The author wishes the readers a lot of patience, endurance and success in their quest for happy users of the BI facilities within their organizations.