In last month’s column, we focused on the first critical steps in our data warehouse project: staffing, planning, technical infrastructure, and project communications and marketing. Now we will focus on the foundation activities, which cover the analysis of user requirements and the assessment of available deployment options and approaches. These key activities include:

  1. Requirements gathering
  2. Source systems assessment
  3. Data quality assessment
  4. Technical infrastructure assessment
  5. Business value benchmarking

Requirements Gathering

Depending on the size and complexity of the proposed approach, requirements gathering can be done using a number of different methods, separately or in combination. However, the end result should be the same: a dimensional data model showing the logical structure of the database design, a process model showing the types of business activities to be supported, and the view or form the information should take on the user’s desktop. The types of activities that can be employed in requirements gathering include:

  • Brainstorming
  • Interviews
  • Dimensional data modeling
  • Process and context modeling
  • Prototyping
  • Story boarding

Once requirements gathering is complete, we should at least know the following (a brief schema sketch follows the list):

  • The grain, or level of granularity, of each fact table
  • The number of conformed (cross-subject-area) dimensions
  • The overall size in rows and gigabytes of each dimension and fact table, along with loading requirements
  • The initial number and type of aggregates
  • The types, number and frequency of the predictable queries to be run against the model
  • The currency and security requirements for the data in each dimension and fact table
  • That each attribute in our model is referenced or used by a process object
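To make the first two items concrete, here is a minimal sketch of what a requirements-gathering outcome might look like for a hypothetical retail sales subject area. The table names, columns and grain are illustrative assumptions, not taken from the article; the point is simply to show a stated fact table grain and a set of dimensions that could be conformed across subject areas.

```python
# Hypothetical retail sales star schema expressed as simple Python structures.
# Grain of the fact table: one row per product, per store, per day.
fact_daily_sales = {
    "grain": "product x store x day",
    "foreign_keys": ["date_key", "product_key", "store_key"],
    "measures": ["units_sold", "sales_amount", "cost_amount"],
}

# Conformed dimensions: shared, identically defined across subject areas
# (e.g., the same product dimension could serve both sales and inventory marts).
conformed_dimensions = {
    "dim_date":    ["date_key", "calendar_date", "fiscal_period"],
    "dim_product": ["product_key", "sku", "brand", "category"],
    "dim_store":   ["store_key", "store_name", "region"],
}

for name, columns in conformed_dimensions.items():
    print(f"{name}: {', '.join(columns)}")
print(f"fact grain: {fact_daily_sales['grain']}")
```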

Source Systems Assessment

A source systems assessment can be done in a number of ways. The point is to determine the best source of data (our system of record) for each fact and dimension in our warehouse, as well as the best method for extracting this information on a regular basis to populate our data warehouse database. Whether we use automated tools or develop our own extraction routines, we have to decide how often we need to extract source system data to our data staging environment for data preparation and loading into our warehouse. We must also determine how this can be done without causing a severe performance impact on the source system, whether it is an operational system, an operational data store or another data mart or warehouse. If the data source is an OLTP system, some of our options are:

  1. Write routines directly against the source system to extract the data (not advisable due to performance considerations)
  2. Read the system’s journal files for record changes (may not be an option depending on the type of DBMS used by the source system)
  3. Copy the source system database into a staging area and run a comparison against the last period’s dump for any changes (may not be possible due to available disk space or the complexity/timing of identifying records for update into the warehouse)

As we can see, the source system identification and extraction process is not easy, and it is further complicated by the different types of systems we need to access. We may also have to access more than one system for information that will eventually reside in a single unified dimension (again stressing the need for a separate and distinct data staging environment to sort all of this out).
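As a rough illustration of the third option above, the sketch below compares the current full dump of a source table against the prior period’s dump to isolate new and changed rows for staging. The file names, key column and layout are hypothetical assumptions; a real extraction routine would be driven by the source system’s actual structures and volumes.

```python
import csv

# Hypothetical snapshot files: full dumps of a source customer table
# taken at the end of the prior and current load periods.
PRIOR_SNAPSHOT = "customer_prior.csv"
CURRENT_SNAPSHOT = "customer_current.csv"
KEY_COLUMN = "customer_id"

def load_snapshot(path):
    """Load a full table dump keyed by its primary key."""
    with open(path, newline="") as f:
        return {row[KEY_COLUMN]: row for row in csv.DictReader(f)}

prior = load_snapshot(PRIOR_SNAPSHOT)
current = load_snapshot(CURRENT_SNAPSHOT)

# Rows present now but not in the prior dump are inserts;
# rows whose values differ from the prior dump are updates.
inserts = [row for key, row in current.items() if key not in prior]
updates = [row for key, row in current.items()
           if key in prior and row != prior[key]]

print(f"{len(inserts)} new rows, {len(updates)} changed rows to stage")
```

Note the trade-off this option implies: the comparison is simple to build, but it requires keeping a full prior-period copy on disk, which is exactly the space and timing constraint flagged in the list above.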

Data Quality Assessment

A data quality assessment is often done in conjunction with a source systems assessment to determine the general level and reliability of the source data. Data models and database DDL will not tell us what is physically stored and available. We must conduct data sampling to determine the overall level of quality that exists within our proposed source systems. Often data reengineering tools are used to audit these systems quickly and give us a quality/conformance map of our source data. Other places to look are the problem and change logs for each source system, which quickly indicate its general level of stability, availability and usage over time.
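Where a dedicated data reengineering tool is not available, even a simple profiling pass over a sample of extracted rows can expose quality problems. The sketch below assumes a hypothetical flat-file sample with an invented coded column (gender_code); it reports blank rates per column and the share of values falling outside an expected domain.

```python
import csv
from collections import Counter

SAMPLE_FILE = "customer_sample.csv"   # hypothetical extract of source rows
VALID_GENDER_CODES = {"M", "F", "U"}  # assumed domain for a coded column

with open(SAMPLE_FILE, newline="") as f:
    rows = list(csv.DictReader(f))
total = len(rows)

# Blank rate per column: a quick signal of completeness.
for column in rows[0].keys():
    blanks = sum(1 for r in rows if not r[column].strip())
    print(f"{column}: {blanks / total:.1%} blank")

# Domain conformance: how many values fall outside the expected code set.
gender_values = Counter(r["gender_code"].strip().upper() for r in rows)
bad = sum(n for code, n in gender_values.items() if code not in VALID_GENDER_CODES)
print(f"gender_code: {bad / total:.1%} outside expected domain {sorted(VALID_GENDER_CODES)}")
```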

Technical Infrastructure Assessment

While our analysis team delves into the business and determines user requirements, our technical team needs to conduct, in parallel, a gap-fit analysis of what is possible from a hardware and software perspective. Data warehousing often entails the deployment of advanced server and data storage array technology. It may also introduce new reporting and analytic tools, along with extensions such as data mining software and servers, requiring a radical change to the existing portfolio of IS technology. Even more important is the potential impact on the various IT support groups that will be needed to support and enable the process (i.e., hardware and network, change control, help desk and maintenance). Therefore, in assessing what is available against what is needed, a gap fit must be conducted for each layer of our architecture. This gap fit should cover:

  • Hardware and operating system technology
  • Application development software and end-user decision support technology
  • Data movement and storage/replication
  • Meta data management
  • Network and communications
  • Facilities

For each layer of our architecture, we must determine what we need, what we have and what we must acquire. Other considerations include how changes in one layer of the architecture affect the others and the potential impact of introducing vendor products.
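One simple way to organize the gap-fit findings is a matrix of have/need/acquire entries per architecture layer. The sketch below uses invented layer entries purely to show the shape of such a worksheet; the actual layers are the six listed above.

```python
# Hypothetical gap-fit matrix: one entry per architecture layer (sample values only).
gap_fit = {
    "Hardware and OS":      {"have": "4-CPU server",       "need": "8-CPU SMP server",  "acquire": "server upgrade"},
    "Data movement":        {"have": "hand-coded scripts", "need": "scheduled ETL tool", "acquire": "ETL product"},
    "Meta data management": {"have": "none",               "need": "shared repository",  "acquire": "repository tool"},
    "Network":              {"have": "10 Mbps LAN",        "need": "100 Mbps backbone",  "acquire": "switch upgrade"},
}

for layer, entry in gap_fit.items():
    print(f"{layer}: have {entry['have']}; need {entry['need']}; acquire {entry['acquire']}")
```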

Business Value Benchmarking

Requirements gathering and verification, source system and data quality assessments, and the technical infrastructure review will not mean much unless we are able to justify and prove the validity of the proposed investment to the business. As part of this process, we must be able to determine not only what to measure, but also how and when, so that our data warehouse can be monitored on an ongoing basis once in production. This step is often left to the end, done only to a limited extent or not done at all. Yet it is a critical success factor for the business. Our CSFs must be established based on what it is we are trying to achieve. If we are building a data mart or an operational data store, our CSFs will be radically different from those for an enterprise data warehouse view of the business. Data marts focus on department-level or, at best, line-of-business value mechanisms, while operational data stores focus on short-term corporate memory solutions involving operational decision making. Data warehouses look for more strategic, long-term business positioning value propositions and are, therefore, harder to determine, monitor and justify on an ongoing basis.

Next month we will continue this discussion by diving into each of these five areas in more detail. For a more complete description of this process, please contact the author at horse.musket@attcanada.net.
