Editor’s Note: The first part of this series was published on July 20, 2000, and can be found in DMReview.com’s online columnist section ( http://www.dmreview.com/onlinecolumnists/).
This month we will continue our review of the data warehouse analysis and design process for decision support systems. As discussed in last month's column (Data Warehouse Design, Part 1), we are usually on a very aggressive time frame to develop and populate a user-friendly database that enables our users to make informed decisions about their business. How should we proceed? Listed below are the more traditional methods for gathering requirements; some of these could prove valuable for collecting requirements and gaining an understanding of what is needed. The more popular techniques include:
- Analysis package distribution and review
- Assumption surfacing
- Creative dynamics
- Data sampling
- Group synthesis
- Joint applications development (JAD)
- Resolution dynamics
- Creating a storyboard
A selected subset (two or three) of these techniques, used in parallel or in sequence, can provide a better understanding of how the business works as well as a framework (scope) for implementation.

Another technique for gathering requirements, driven by the users themselves, is to develop an exploratory warehouse. Using an extraction-transformation tool coupled with a browser capable of viewing the meta data structures of source systems, the user selects what is needed to solve business problems. The resulting ETL loads an interim database with the answer set for querying and reporting. By reviewing the SQL generated by these user queries over a period of a few weeks, we can quickly gain an understanding of what really is required (versus what the users told us they thought they needed).

The main thing here is thinking outside the box. Challenge your requirements teams to come up with rapid analysis methods that can lead to quick star schema designs and process requirements in terms of the breadth and depth of possible queries. What we need to determine during this period is how to gain just enough knowledge to get busy with our first star schema database design and to get the ball rolling on data extraction and transformation to load our test database (the hardest part of the job).
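To make the target of all this analysis concrete, here is a minimal star schema sketch. The table and column names (a sales fact table with time and product dimensions) are purely illustrative; the real grain, measures and dimensions would come out of the requirements work described above.

```python
import sqlite3

# In-memory database for illustration only; schema names are hypothetical.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()

# Dimension tables: the points of reference for interrogating the facts.
cur.execute("""CREATE TABLE dim_time (
    time_key       INTEGER PRIMARY KEY,
    calendar_date  TEXT,
    fiscal_period  TEXT,
    promotion_flag INTEGER)""")
cur.execute("""CREATE TABLE dim_product (
    product_key  INTEGER PRIMARY KEY,
    product_name TEXT,
    category     TEXT)""")

# Fact table: grain = one row per product per day (one discrete business event).
cur.execute("""CREATE TABLE fact_sales (
    time_key     INTEGER REFERENCES dim_time(time_key),
    product_key  INTEGER REFERENCES dim_product(product_key),
    units_sold   INTEGER,
    sales_amount REAL)""")

cur.execute("INSERT INTO dim_time VALUES (1, '2000-07-01', 'FY00-Q3', 1)")
cur.execute("INSERT INTO dim_product VALUES (10, 'Widget', 'Hardware')")
cur.execute("INSERT INTO fact_sales VALUES (1, 10, 25, 499.75)")

# A typical dimensional query: measures from the fact table,
# constrained and grouped by the surrounding dimensions.
row = cur.execute("""
    SELECT p.category, SUM(f.sales_amount)
    FROM fact_sales f
    JOIN dim_time t    ON f.time_key = t.time_key
    JOIN dim_product p ON f.product_key = p.product_key
    WHERE t.promotion_flag = 1
    GROUP BY p.category""").fetchone()
print(row)  # ('Hardware', 499.75)
```

Note that every analytical question ("sales of hardware during promotions") resolves to the same pattern: join the fact table to its dimensions, filter on dimension attributes, aggregate the measures.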
During the analysis/design phase of our project, our requirements team will be challenged with determining:
- Business process scope (the number of discrete business events that require analysis). This is key input to determining the number and grain of each fact table in our database design.
- The number of dimensions or points of reference to interrogate our answer set as contained in the fact tables. Remember, we will have to create a new dimension table called TIME to provide searching across events such as special promotions, financial or marketing cycles, etc.
- The grain of each fact table (read: unique business events) that we wish to analyze.
- The business measures to quantify for each dimensional star schema model (its cluster of fact tables and associated dimensions).
- Number of potential concurrent users, their location, frequency and volume of access for the more predictable patterns of analysis (not ad hoc).
- Business value over time, using a generally accepted financial measure such as ROI or NPV and/or a balanced scorecard, to give us some way to deliver and track the value we are providing by deploying this application.
- The types, nature and flexibility of information views that are to be provided. Are we providing text reports, graphs, charts etc.? What about browser-based interrogation of our data through an intranet? How do the various decision support tools stack up when confronted with a wide and varied list of presentation requirements? What limitations exist within each OLAP product in terms of how they provide an external schema view to the user? Can they insulate our data warehouse database from direct access by providing a layer of meta data between the DBMS and the end user? Does our DBMS and/or OLAP product offer a query navigator that can redirect the request to summary tables if an executive or high-level summary query is submitted instead of hitting the base fact and dimension tables each time?
- What kind of data sampling can we undertake to verify the provided meta data in the data models or database catalogs? How far apart are we in terms of what we think the data is and what is actually stored? We need to start extracting and viewing the data to be loaded into our warehouse to determine if the quality level is within an acceptable range. If not, what are we going to do about it in terms of managing our clients’ expectations? If the data is to be sourced from multiple departments, how difficult will it be based on the politics of the organization to gain consensus on the definition and structure of each dimension (where the bulk of our data quality problems hide)? Is the requested data even available? Will we have to create new information systems to collect this data? What about the quality of external feeds from third parties and outside sources? We need to sample and review these data sets as well.
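The data sampling step above can be reduced to a simple profiling pass over extracted rows: compare what the meta data claims against what is actually stored, and flag columns whose missing-value rate falls outside an acceptable range. The column names and the 5 percent threshold below are hypothetical.

```python
def profile_sample(rows, expected_columns, max_null_rate=0.05):
    """Return the columns whose missing-value rate exceeds max_null_rate.

    rows is a sample of extracted source records (dicts); expected_columns
    is what the meta data or database catalog says should be populated.
    """
    problems = {}
    for col in expected_columns:
        missing = sum(1 for r in rows if r.get(col) in (None, ""))
        rate = missing / len(rows)
        if rate > max_null_rate:
            problems[col] = rate
    return problems

# Hypothetical sample pulled from a source system extract.
sample = [
    {"customer_id": 1, "region": "EAST"},
    {"customer_id": 2, "region": ""},      # missing region
    {"customer_id": 3, "region": None},    # missing region
    {"customer_id": 4, "region": "WEST"},
]
print(profile_sample(sample, ["customer_id", "region"]))
# {'region': 0.5}
```

A report like this, run against samples from each department's feed (and from external third-party feeds), gives us an early, quantified read on how far apart the documented and actual data really are.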
As we can see, we have a lot of work ahead of us during the next few weeks of the project. We need to gain a sufficient level of understanding to either develop the application and/or refine its scope based on our findings.
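One of the determinations above, business value over time, reduces to a short calculation once we have estimates in hand. The discount rate and cash flows below are purely illustrative figures, not guidance on what a warehouse project costs or returns.

```python
def npv(rate, cash_flows):
    """Net present value of one cash flow per period, starting at period 0."""
    return sum(cf / (1 + rate) ** t for t, cf in enumerate(cash_flows))

# Illustrative only: a warehouse build cost up front, followed by
# four years of estimated decision-support value, discounted at 10%.
flows = [-500_000, 200_000, 200_000, 200_000, 200_000]
result = npv(0.10, flows)
print(round(result))  # a positive NPV supports proceeding with the project
```

The same figures can feed a balanced scorecard alongside nonfinancial measures; the point is simply to agree up front on how the application's value will be tracked.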
Next month, we will continue the process of analysis and design of decision support data structures and related processes. For a more complete description of the process and deliverables discussed in this month’s column, please contact the author (firstname.lastname@example.org).