Consultants experienced in implementing data warehouses have become frustrated with market-leading product offerings. In recent years, the success rate of data warehouse implementations has risen dramatically; unfortunately, so have maintenance costs. While the value of a hub-and-spoke product architecture has never been clearer, early adopters are finding that the data movement logistics can be complex. Providing the flexibility and dynamic responsiveness necessary for a truly successful decision support system (DSS) environment requires an infrastructure tool that can deliver visibility and control across the entire DSS landscape.
Figure 1 shows a conceptual view of a hub-and-spoke framework. There are two types of entities at the end of the spokes: source systems that feed source data to the warehouse and user-oriented systems that are fed data from the warehouse.
Ideally, rather than imposing intrusive extract programs on data sources, programs are built to push the data to the hub. The hub then handles the transformation, loading and archival of the data. In addition, and this critical requirement is sadly not addressed by most solutions, the hub manages the dispatching of data to end-user data marts, OLAP tools and directly to user tools. This allows for the monitoring and maintenance of data through the entire information supply chain. Through meta data and some coordinating user interface, the entire information supply chain can be centrally managed, coordinated and maintained.
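The flow described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a description of any product: the `Hub` class, its `push` and `subscribe` methods, and the source and mart names are all hypothetical, and in-memory lists stand in for the warehouse database and the archive.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List

Record = Dict[str, object]


@dataclass
class Hub:
    """Hypothetical hub: sources push records in, the hub does the rest."""
    transform: Callable[[Record], Record]
    warehouse: List[Record] = field(default_factory=list)  # stand-in for the warehouse database
    archive: List[Record] = field(default_factory=list)    # untouched copies of source records
    subscribers: Dict[str, Callable[[Record], None]] = field(default_factory=dict)
    log: List[str] = field(default_factory=list)           # one central trail for monitoring

    def subscribe(self, name: str, deliver: Callable[[Record], None]) -> None:
        """Register a data mart, OLAP tool or user tool at the end of a spoke."""
        self.subscribers[name] = deliver

    def push(self, source: str, record: Record) -> None:
        """Called by a source system; no extract program runs against the source."""
        self.archive.append(dict(record))      # archive the record as received
        clean = self.transform(record)         # transform
        self.warehouse.append(clean)           # load
        self.log.append(f"{source}: loaded")
        for name, deliver in self.subscribers.items():
            deliver(clean)                     # dispatch to each consumer
            self.log.append(f"{source} -> {name}: dispatched")


# Usage: one hypothetical source, one hypothetical data mart.
sales_mart: List[Record] = []
hub = Hub(transform=lambda r: {key.lower(): value for key, value in r.items()})
hub.subscribe("sales_mart", sales_mart.append)
hub.push("billing", {"AMOUNT": 100})
```

Because every step passes through `push`, the `log` list records both the supply side and the delivery side of the chain in one place, which is the property the paragraph above asks of the hub.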
Hub-and-spoke takes a framework or infrastructure approach to data warehousing. This is contrasted with the more traditional approach of using non-integrated, best-of-breed point solutions. While point solutions can often allow you to implement individual pieces of a data warehouse architecture successfully, the lack of integration most often makes maintenance a costly nightmare and responsiveness an impossibility.
Both META Group and Gartner Group have argued strongly for the need for a hub-and-spoke type of solution. META Group, for example, has written extensively on hub-and-spoke data warehousing architectures, the Information Mover Infrastructure (IMInf) and the information supply chain.
History has shown that successful data warehouses must focus first and foremost on the ability to meet the ever-changing information delivery needs of end users. In the information supply chain metaphor, the end user is an information customer. The data warehouse architecture must include the ability to manage on demand the flow of new and changed data out of the original source system, through all of the transformation, cleansing and loading steps, through the database and out to the end user.
In other words, the whole purpose of data warehousing is to get the right information to the right persons in the right format at the right time. The real problem is that it is impossible to predetermine what the right information is, who the right users are, what the right formats are and when the right times are. Unlike traditional operational systems development, these issues are at best partially predictable and are ever-changing. So the challenge is to implement an architecture that gets the most information to the most people in the most efficient manner and at the least possible cost, both initial and ongoing.
The iterative nature of data warehouse access, combined with the fact that the use of any data warehouse system is optional for end users, makes the ability to keep users interested by being responsive of paramount importance for long-term data warehouse success. It is impossible to reach the full potential for responsiveness without an integrated architecture that provides both visibility and flexibility across the entire information supply chain. Companies that focus on initial construction of the data warehouse without consideration for the architectural integration required to effectively manage such a dynamic environment either fail to realize the full potential of the data warehouse or fail completely.
Visibility across the supply chain is critical for tuning, change management and impact analysis. Flexibility across the supply chain is critical for responsiveness as changes occur. These cannot fully be achieved with a suite of non-integrated data warehousing tools, as they require a centralized view and control of the entire supply chain.
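To make the idea of impact analysis concrete, here is a minimal sketch of what a centralized view could look like if dependencies across the supply chain were held as meta data. Everything in it is an assumption for illustration: the `LINEAGE` mapping, the asset names and the `impacted` function are hypothetical and do not describe any particular tool.

```python
from collections import deque
from typing import Dict, List

# Hypothetical lineage meta data: each asset maps to the assets that consume it.
LINEAGE: Dict[str, List[str]] = {
    "finance.account_code": ["warehouse.dim_account"],
    "warehouse.dim_account": ["olap.finance_cube", "report.monthly_pnl"],
    "olap.finance_cube": ["spreadsheet.budget_model"],
}


def impacted(asset: str, lineage: Dict[str, List[str]]) -> List[str]:
    """Return every downstream consumer of `asset`, nearest consumers first."""
    visited = {asset}
    result: List[str] = []
    queue = deque([asset])
    while queue:
        current = queue.popleft()
        for consumer in lineage.get(current, []):
            if consumer not in visited:   # guard against cycles and repeats
                visited.add(consumer)
                result.append(consumer)
                queue.append(consumer)
    return result


# Which warehouse tables, cubes, reports and spreadsheets depend on this source field?
affected = impacted("finance.account_code", LINEAGE)
```

With a suite of non-integrated tools, the equivalent of `LINEAGE` is scattered across each tool's private repository, so no single traversal like this one is possible.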
Of course, any data warehouse architecture can be sufficiently flexible given limitless resources and time. Once you buy into the need for responsiveness in a data warehouse architecture (and, therefore, the need for flexibility and visibility), you begin to get a different view of the true costs of ownership, as follows:
Cost of initial construction.
All data warehouse architectures incur the cost of constructing the first released version.
Cost of change.
When changes need to occur to be responsive, an integrated architecture will minimize this cost. If you have to regenerate or change code, it costs significantly more than if you only have to reconfigure meta data. Likewise, if the change affects several different tools used at different points in the information supply chain, you will have to go to each tool individually to understand the impact and make the change. This can be extremely time-consuming and costly; and, in many cases, you must make the change no matter how costly (e.g., a custom source system gets replaced by an ERP package).
Cost of not changing.
This cost is the most frequently overlooked. If an end user needs information that will cause a change somewhere along the information supply chain, companies often make the mistake of assuming there is no cost if they simply do not make the requested change. The cost is that you lose any business advantage you may have gained by giving the end user what they needed. This forces a "lesser of two evils" approach to data warehousing, where you must compare the cost of change with the potential benefit. This is tricky at best and can be eliminated if you have an integrated architecture that can minimize the cost of change.
Cost of changing too slowly.
The speed with which change can be effected is directly related to your ability not only to implement the change, but also to see the impact of the change. A non-integrated tool suite will take longer to effect change and usually does not allow you to see the impact of a change at all.
The true cost of ownership is the sum of these four costs. Integrated product suites strive to minimize the cost of initial construction and the cost of change, and to eliminate the cost of not changing or changing too slowly. Any collection of non-integrated, point solution tools will fail to minimize all of these costs.
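The sum described above can be written down directly. The sketch below is only a bookkeeping aid: the `OwnershipCost` class and its field names are hypothetical, and any figures passed to it are placeholders that you would replace with your own estimates.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class OwnershipCost:
    """The four cost components named in the list above, in one currency unit."""
    initial_construction: float
    change: float
    not_changing: float
    changing_too_slowly: float

    def total(self) -> float:
        """True cost of ownership: the sum of the four components."""
        return (
            self.initial_construction
            + self.change
            + self.not_changing
            + self.changing_too_slowly
        )
```

Keeping the last two components as explicit fields is the point of the exercise: they are the ones most often left out of a budget, yet they count toward the total just as the first two do.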
What matters most to data warehouse end users is responsiveness (How fast can it deliver what I need?), meta data (What does the data in the warehouse mean in terms I can understand?) and data quality (Can I rely on the data being correct and know when it is not?). Consider the following scenarios, assuming the initial data warehouse has been constructed and deployed using a non-integrated tool suite:
Data source change.
Suppose the financial system feeding the warehouse is ported to a new ERP system or an operational procedure is rewritten. In the process, new data items are added or one of the fields changes from a character field of size 10 to a numeric field of size 15. While it may be fairly easy to implement this change on the supply side using an ETL tool, what is the impact on the OLAP database? How many canned reports are affected? What spreadsheets no longer work? These types of change become extremely costly when no centralized tool exists to inform you of the effect of the change, and each system must be considered separately. Further, you probably cannot even see all of the data consumption activity that depends on a certain data structure which has now changed. Your choices are to make the change and break some of the data delivery mechanisms on which the end users depend, or to send out a memo notifying everyone that the change will take place at some future time. Neither choice can be considered end-user responsive.
Physical schema change.
Often, the physical schema of the data warehouse must change, not only due to changes in the source data but for tuning reasons or to optimize the structure for certain kinds of end-user access. For the same reasons listed earlier, the impact of this change is difficult, if not impossible, to assess without an integrated tool suite that contains a centralized tool with visibility into everything that touches the data structures to be changed.
End-user requirements changes.
While performing the iterative analysis so common in data warehousing, end users realize they need several new data points not currently in the data warehouse or they need data delivered to them in an unsupported format. How quickly can you meet the request? Again, you must know how the required changes will affect other tools, processes and end users across the entire information supply chain.
Processing problems in the nightly load.
As you load the data warehouse on a periodic basis, how do end users know the quality of the data or if the loading process encountered problems? With a non-integrated tool suite, users querying the data warehouse using an OLAP tool, a reporting tool or a spreadsheet, for example, have no visibility into the status of the most recent loads. This can only be fixed with custom coding, unless you use an integrated suite.
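As an illustration of the load-status scenario, the sketch below shows the kind of shared status record that query tools could consult before presenting data. It is a simplified assumption, not a feature of any named product: `LoadStatus`, `LoadRegistry`, the table name and the "OK"/"CHECK" labels are all hypothetical.

```python
from dataclasses import dataclass
from datetime import datetime
from typing import Dict


@dataclass(frozen=True)
class LoadStatus:
    """Outcome of the most recent load of one warehouse table."""
    table: str
    finished_at: datetime
    rows_loaded: int
    rows_rejected: int
    succeeded: bool


class LoadRegistry:
    """Hypothetical central registry written by the load process, read by user tools."""

    def __init__(self) -> None:
        self._latest: Dict[str, LoadStatus] = {}

    def record(self, status: LoadStatus) -> None:
        """Store the newest status, replacing any earlier one for the same table."""
        self._latest[status.table] = status

    def describe(self, table: str) -> str:
        """One-line summary an OLAP tool, report or spreadsheet could show the user."""
        status = self._latest.get(table)
        if status is None:
            return f"{table}: no load recorded"
        state = "OK" if status.succeeded and status.rows_rejected == 0 else "CHECK"
        return (
            f"{table}: {state}, {status.rows_loaded} rows loaded, "
            f"{status.rows_rejected} rejected, "
            f"finished {status.finished_at:%Y-%m-%d %H:%M}"
        )
```

In a non-integrated suite, each front-end tool would need its own custom code to find and interpret this information; here a single `describe` call serves all of them.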
These are just a handful of examples of the dynamic nature of a data warehouse, which constantly pressures you to make changes to remain responsive to your end users. In the worst case, you have to make the change regardless of the cost. In the best case, you have to compare the cost of changing with the cost of not changing. In either case, you want to use a tool that enables you to manage this change at the least possible cost and without sacrificing the reason you built the DSS in the first place: to give users the information they need to keep your company competitive.