This month, we’ll continue our review of the data warehouse analysis and design process for decision support systems. Unlike traditional waterfall systems development processes, data warehouse design is iterative – not only do we rapidly go through the analysis-to-build process, but the transition from development to production is different as well. The important point to remember is that we are still building a production system for business analysis. This means that even though we are moving through the analysis-to-build process in a rapid iterative form, we still must comply with all of the due diligence processes required to properly certify the data warehouse for production. How should we do this? To complete the process of DW design and production readiness, we usually can delegate the work to two project teams. The first deals with the design of the data warehouse database, extraction and presentation layer; the second will address technical infrastructure, production control, testing and certification, end-user training, and production handover and sign off. The aggressive time frames inherent in today’s design of operational and analytic information systems dictate that parallel work streams become the norm. The infrastructure team needs to get the development environment set up for the analysis team as an initial activity to be followed by the definition of the unit and acceptance testing environment, production preparation and hand-over of documentation and training. Once the development environment is defined, the analysis team can proceed with the process of gathering the user requirements and the initial design of the following:

  1. Database design and sizing for the data warehouse.
  2. Design of the extraction, transformation and load process.
  3. Identification of the various types of analysis requirements (reporting, data mining, drill-down/ad hoc requirements).
  4. Performance metrics in terms of potential number of users, geographic location, volume and types of information requests to be expected.
  5. Change management considerations in terms of culture change and impact on the end-user community.
  6. Change management in terms of culture and change impact on the IT community (which can be more of a headache than the end-user scenario).

While all this is occurring, the infrastructure team proceeds with the design and deployment of the following:

  1. Required hardware in terms of platform, application and data-server identification.
  2. Software in terms of database, ETL and OLAP vendor selection and product deployment and certification.
  3. Network requirements in terms of local and remote access and overall network design enhancement (if required).
  4. Licensing and deployment issues of the required technology.
  5. Production and change control.

When starting the analysis process, a number of factors need to be considered:

  • Data usually exists on many different platforms that must be combined into a common format.
  • A subset of relevant data must be determined from these disparate sources.
  • Query requirements need to be collected and quickly reviewed/revised.
  • Different output formats for the same data are often required to address the needs of critical business activities.
  • The mechanism selected to perform this task must integrate within the current suite of office automation products.

The need for this technology stems from knowledge workers needing to get information from their current production systems even though little or no help is available from IS, queries against the data need to be asked in business terms and questions are often ad hoc or nonprocedural in nature.
What differentiates data warehousing from operational systems design is that the process of data collection and operational reporting can be precisely determined. We also need to ensure during our requirements gathering process that we understand what it is we need to build. More than 90 percent of all initial user requirements will be operational-reporting based. If the client is not in marketing or senior management, he/she will ask for operational reporting – analysis and forecasting of business based on daily, weekly or monthly needs – not long-term trending and business direction types of enquiries. The general rule of thumb is to assume an ODS will be required and not a data warehouse the first time out of the gate. If we are designing an ODS, we will build a data store which is:

  1. Not time variant (limited time window for review and analysis).
  2. Integrated (in terms of data collection and consolidation).
  3. Volatile ( we will overwrite data on a continual basis and not retain history).
  4. Subject oriented (common profiles refined from collection of subsets of same topic information).

What we will require as a common building block for this environment is a data staging process where we can start thinking about the ETL process and how it will evolve as our data mart or ODS grows and integrate it within an overall data warehouse architecture. Again, both the infrastructure and analysis teams need to put their heads together to determine how much ETL is enough for the current set of project requirements and what to consider as we move forward.
Next month, we will continue the process of analysis and design of decision support data structures and related processes.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access