What Kind of Data Warehouse are We Going to Build?

  • November 19 1999, 1:00am EST

In last month’s column we focused on business process maturity and taking the pulse of the organization from a data warehouse readiness perspective. This month, we will focus on the first critical step in our data warehouse development life cycle process – the initiation and planning phase.

In setting up our data warehouse project, a number of perceptions and prejudices come to mind. The most significant question to settle is whether we are really building a data warehouse. A data warehouse has four primary characteristics, as defined by Bill Inmon:

  • Subject-oriented
  • Time-variant
  • Integrated
  • Contains nonvolatile data

If your user requirements focus on management reporting and "doing business as" style comparative analysis of the current state, then your database design will not be time-variant or integrated, since you will deal with a restricted set of source systems. If your users require you to overwrite or change data captured in the database, again for management reporting purposes, then the database will not be nonvolatile. If your user requirements have an operational flavor of reporting and analysis, drawing corporate data such as customer, product or location from the systems of only one line of business, then it will not be subject-oriented either.

Your user group and your IT group may think they are building a data warehouse; in fact, they are building an operational database or, at best, an operational data store fed by a data staging area. In summary, the biggest mistake to avoid at this stage is applying data warehouse design principles to an OLTP or ODS database design. If an OLTP or ODS database is what your customer needs, then build it, but know what it is you are building. You and your users can still call it a data warehouse (though be cautious about doing so). What you are actually doing is laying the foundation for the real data warehouse to follow, by raising awareness of the importance of data and by testing the waters with its first critical component, the data staging area.
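
The four-characteristic test described above can be written down as a simple checklist. The sketch below is only an illustration: the class name `RequirementProfile`, the function names and the yes/no framing are assumptions made for this example, not part of Inmon's definition or of any tool.

```python
from dataclasses import dataclass, fields


@dataclass(frozen=True)
class RequirementProfile:
    """Hypothetical yes/no summary of what the users are asking for."""
    subject_oriented: bool  # corporate data drawn from more than one line of business
    time_variant: bool      # history is kept, not only the current state
    integrated: bool        # data is consolidated from many source systems
    nonvolatile: bool       # captured data is never overwritten or changed


def missing_characteristics(profile):
    """Return the names of the four characteristics the requirements lack."""
    return [f.name for f in fields(profile) if not getattr(profile, f.name)]


def classify(profile):
    """Call it a data warehouse only when all four characteristics hold."""
    if not missing_characteristics(profile):
        return "data warehouse"
    return "operational database or ODS"


# Example: management reporting against a restricted set of source systems.
reporting_db = RequirementProfile(
    subject_oriented=True, time_variant=False, integrated=False, nonvolatile=True
)
print(classify(reporting_db), missing_characteristics(reporting_db))
```

Listing the missing characteristics, rather than returning only a label, gives the team something concrete to discuss with users before any design work starts.
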

To gain an appreciation for what you are building, the first stage in our data warehouse process should focus on three areas.

  1. Conduct an initial process and data analysis to frame and understand the scope of requirements.
  2. Undertake an investigation into the available technical infrastructure to address the required hardware, network and software resources (and to help you set the bounds from an IT standards and cultural perspective on what can be acquired and when).
  3. Develop an initial assessment of the available source systems to develop our data staging strategy.

Other required activities include staffing a core team and training it in our process, interfacing with or developing the framework for a data warehouse project management office, and defining or assessing the change management procedures required.

We have very little time to accomplish all these tasks, since we need to push a data warehouse increment "out the door" and have it running in production in four to six months. This time frame will depend on whether we are building a dependent or independent data mart (the subject of another column), an operational data store or a data warehouse. To meet this time window, we must plan to complete staffing, project management setup and deployment, source system analysis, initial process and data analysis, and technical infrastructure assessment in four to six weeks!

With all of this to consider, it is no wonder that data warehousing projects are among the most challenging in systems development. We also need a meta data management strategy that covers how the business rules and definitions of data in the warehouse will be provided to end users, and how we will manage the technical meta data dealing with source system data translation and loading. But we are not done yet. There is also the issue of benefits definition and ongoing measurement, touched on in last month’s column. To complete our four-to-six-week process, we need to package our iterative design and deployment process as a project charter and business case, stating what we propose to do, how much it will cost and the time required to deliver our solution. Developing a measurement mechanism to validate the value of the investment after the warehouse is deployed is just as important as the development process itself.

Next month we will continue our discussion of this first critical stage by reviewing a set of tasks to complete, time frames, resources and budgeting/costing considerations.
