This month we continue where we left off in the last column: working through our data warehouse design by way of requirements gathering. Depending on the size and complexity of the proposed approach, requirements gathering can be done using a number of different methods, separately or in combination. However, the end result should be the same: a dimensional data model showing the logical structure of the database design, a process model showing the types of business activities to be supported, and the view or form the information should take on the user’s desktop. The types of activities that can be employed in requirements gathering include:
- Dimensional data modeling
- Process and context modeling
- Story boarding
These techniques are applied in the following sequence:
- Background materials and systems research and assessment.
- Brainstorming and/or interviewing.
- JAD (joint application design) sessions, which include data and process modeling.
- Prototyping and revision of the data and process models.
Whatever analysis methods we choose, the focus here is to develop our understanding of what is required by:
- Establishing an understanding of the business (process modeling).
- Understanding how deep or detailed this analysis needs to be, which will set the grain of our fact tables and surrounding dimensions (data modeling).
The data model provides:
- An understanding of the core business properties (dimensions).
- The essential knowledge to be analyzed (facts).
- The level of detail (the grain of the base fact tables).
- How core business objects need to be shared (conformed dimensions).
- The business meaning of each dimension, fact and data item.
- A user view of the data which can be immediately recognized and utilized by the business (the star schema model).
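To make the items above concrete, here is a minimal sketch of a star schema built with Python's standard `sqlite3` module. The table names, column names, sample rows and the chosen grain (one fact row per claim filed against a policy on a date) are all illustrative assumptions, not part of the method described in this column.

```python
import sqlite3

# Hypothetical star schema for a "claim" business activity.
# Assumed grain: one fact row per claim, per policy, per date.
DDL = """
CREATE TABLE dim_date (
    date_key      INTEGER PRIMARY KEY,
    calendar_date TEXT NOT NULL
);
CREATE TABLE dim_policy (
    policy_key    INTEGER PRIMARY KEY,
    policy_number TEXT NOT NULL,
    product_line  TEXT NOT NULL
);
CREATE TABLE fact_claim (
    date_key     INTEGER NOT NULL REFERENCES dim_date(date_key),
    policy_key   INTEGER NOT NULL REFERENCES dim_policy(policy_key),
    claim_amount REAL NOT NULL          -- the fact being analyzed
);
"""


def build_claim_star():
    """Create the star in memory and load a few made-up rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(DDL)
    conn.executemany("INSERT INTO dim_date VALUES (?, ?)",
                     [(1, "2000-01-15"), (2, "2000-01-16")])
    conn.executemany("INSERT INTO dim_policy VALUES (?, ?, ?)",
                     [(10, "P-001", "auto"), (11, "P-002", "home")])
    conn.executemany("INSERT INTO fact_claim VALUES (?, ?, ?)",
                     [(1, 10, 500.0), (1, 11, 1200.0), (2, 10, 250.0)])
    return conn


def claims_by_product_line(conn):
    """Roll the fact up along one attribute of the policy dimension."""
    rows = conn.execute(
        "SELECT p.product_line, SUM(f.claim_amount) "
        "FROM fact_claim f JOIN dim_policy p ON p.policy_key = f.policy_key "
        "GROUP BY p.product_line ORDER BY p.product_line"
    ).fetchall()
    return dict(rows)
```

The fact table holds only keys and measures, while the descriptive attributes that users filter and group by live in the dimensions. A dimension such as `dim_date` would be a natural candidate for sharing with other fact tables.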
The process model provides:
- The ad hoc processes for accessing data at any level in the warehouse.
- The data quality audit processes for verifying data loaded into the warehouse.
- The data access authorization (security) processes governing access to the various levels of the data warehouse.
- The change and problem management processes required to support access to the data warehousing environment.
A common mistake, much as in OLTP systems analysis, is to undertake process and data analysis as separate and disjoint tasks. It is crucial that process and data modeling be done together. We usually start with a context model, a high-level view of the area of analysis. This model shows the major players and interfaces (source systems) that will feed our data warehouse.
The next step is to drive out the essential business activities that form the candidate fact tables of our warehouse design. For example, in an insurance-based data warehouse, the processes being modeled may include fraud detection, claims processing, invoicing and collections. These four business activities may eventually form candidate fact tables called "fraud," "claim," "invoice" and "collection." Once these key business activities are understood in terms of focus, frequency and content, the corresponding dimensions can be identified and modeled as our star schema design.
Once we have identified all the key business events, we can cross-check them against the identified fact and dimension tables to be sure that every process has a star schema view defined for it and that every star we have modeled will actually be accessed by a business activity identified in our process analysis sessions. To allow this cross-model validation, we usually flip back and forth between process and data analysis sessions with the user group until we have analyzed all the business activities within our scope. The context model forms the "fence" for our analysis and keeps us honest by preventing digressions from our stated scope.
Next, we combine all our process-centric star schema views into one overall model that shares our now "conformed" dimensions across all the fact tables, at the various levels of granularity that fall within our scope, as illustrated in Figure 1.
Figure 1: Develop user views containing fact tables of interest for each user department/function.
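The cross-check between the process model and the data model described above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the mapping of processes to fact tables follows the insurance example, while the dimension names and the extra "payment" star are hypothetical and exist only to show what a mismatch looks like.

```python
def cross_check(process_to_facts, star_models):
    """Compare the process model with the data model.

    Returns a pair of sorted lists:
      - processes for which none of their fact tables has a modeled star
      - modeled stars that no process refers to
    """
    referenced = {fact for facts in process_to_facts.values() for fact in facts}
    unmodeled_processes = sorted(
        process for process, facts in process_to_facts.items()
        if not any(fact in star_models for fact in facts)
    )
    unused_stars = sorted(set(star_models) - referenced)
    return unmodeled_processes, unused_stars


# Processes from the insurance example, each mapped to its candidate fact table.
PROCESS_TO_FACTS = {
    "fraud detection": ["fraud"],
    "claims processing": ["claim"],
    "invoicing": ["invoice"],
    "collections": ["collection"],
}

# Stars modeled so far (fact table -> dimensions). Dimension names are
# assumptions; "payment" is a hypothetical star that no process uses.
STAR_MODELS = {
    "fraud": ["date", "claimant"],
    "claim": ["date", "policy", "claimant"],
    "invoice": ["date", "policy", "customer"],
    "payment": ["date", "customer"],
}
```

With this sample data, `cross_check(PROCESS_TO_FACTS, STAR_MODELS)` reports that "collections" has no star yet and that the "payment" star is not accessed by any process, which are the two kinds of gap that the process and data sessions are meant to expose.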
The user-view star models can also be used to understand what data sharing will be required by the various user groups and the degree of data sensitivity. The final step in our analysis is to confirm the levels of aggregation required to satisfy our requirements based on the types of analysis being done, the volume of data and the number of required levels of aggregation.
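One way to read data sharing off the user views is to invert them and see which fact tables appear in more than one view. The sketch below assumes each user view is simply a list of fact table names; the department names and their contents are made up for illustration.

```python
from collections import defaultdict


def shared_items(views):
    """Invert {owner: [items]} and keep items used by more than one owner."""
    usage = defaultdict(set)
    for owner, items in views.items():
        for item in items:
            usage[item].add(owner)
    return {item: sorted(owners)
            for item, owners in usage.items() if len(owners) > 1}


# Hypothetical user views: department -> fact tables of interest.
USER_VIEWS = {
    "claims": ["claim", "fraud"],
    "finance": ["invoice", "collection", "claim"],
    "audit": ["fraud"],
}
```

The same function can be applied to a mapping of fact tables to dimensions, in which case the result lists the dimensions that are candidates for conforming.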
In summary, once the analysis is complete, we should know at least:
- The grain or level of granularity for each of our fact tables.
- The number of conformed (cross-subject area) dimensions.
- Overall size for each dimension and fact table.
- Initial number and type of aggregates.
- The types and number of the predictable queries to be run against the model and their frequency.
- Currency and security regarding data for each dimension and fact.
- Validation that each attribute in our model is referenced or used by a process object.
- A view into what will be required in terms of source system data to populate our model.
- The types of analytic tools that will give users access to the information contained in our model.
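For the sizing item in the list above, a rough first estimate of a fact table can be obtained from its grain alone. The sketch below is a back-of-the-envelope calculation, not a sizing method from this column: it ignores indexes, aggregates, compression and storage overhead, and every input figure is an assumption to be replaced with numbers from the requirements sessions.

```python
def estimate_fact_table_size(rows_per_day, retention_days, bytes_per_row):
    """Return (row_count, size_in_bytes) for a base fact table.

    rows_per_day   : expected business events per day at the chosen grain
    retention_days : how many days of history the warehouse will keep
    bytes_per_row  : assumed width of one fact row (keys plus measures)
    """
    rows = rows_per_day * retention_days
    return rows, rows * bytes_per_row
```

For example, 10,000 claims a day kept for 730 days at an assumed 40 bytes per row gives 7,300,000 rows and 292,000,000 bytes, or roughly 292 MB before indexes and aggregates.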
Business requirements analysis needs to cover these essential activities through the following sequence of tasks:
Conduct Business Requirements Analysis
1) Schedule end-user requirements gathering sessions.
2) Develop subject-area data model (in concert with the business requirements sessions).
Next month we will continue this discussion by reviewing source system assessment, the second of five major activities we need to accomplish as part of data warehouse design.
Richard J. Kachur is the author of The Data Warehouse Management Handbook (Prentice Hall, 2000). Kachur is the lead data warehouse architect for TMP Worldwide, a leading provider of Internet, data warehousing and data quality engineering solutions. His diverse 23-year background includes systems methodology consulting, project management, information systems planning, data administration, business process reengineering, customer relationship management, enterprise data modeling, object-oriented analysis, data warehousing, data quality management, repository administration and CASE technology management. His data warehousing background covers government, insurance, oil and gas, retail, telecommunications, transportation and utility corporations. For further information on Kachur's background or available publications, please contact him at email@example.com.