William wishes to thank Dan Stanford for his contributions to this month's column.
Once you have completed your requirements analysis as described in Part 1 of this column in the November issue of DM Review, you should be able to start developing a data model to meet the end users' requirements. You might also want to develop mock-ups of the screens that you believe would meet the requirements identified to be used with the end users for validation. You should test the data model by walking through how the model would meet each of the specific reporting or analysis requirements.
In the interview process, the objects mentioned as the way the data is organized are likely to become your dimensions. Slowly changing attributes of the dimensions become attributes of the dimension table specified in your model. Typical candidates for dimensions are objects such as customer, product, time, account, salesperson, organization, geography and channel. Attributes are objects such as color, name, address, year-to-date sales (might be a fact), asset/liability indicator, long description and parent. Facts or measures are typically numeric and change with time. They would include interest income, interest expense, non-interest income, non- interest expense, outstandings and balances. Calculated measures would include return on investment, total revenue, total expenses, net income and other measures that can be calculated based on your other facts.
For example, if the user's only requirement is to be able to review revenue by customer, product and time, you would have three dimension tables and one fact table. Each dimension table would have a key field for the unique identifier of that customer, product or time, as well as fields for all of the relevant attributes of the dimension. The fact table would have a key field for each dimension (three of these) and a single field for sales (fact). This could be depicted in an entity-relationship (ER) diagram as shown in Figure 1.
Figure 1: Entity-Relationship Diagram
Figure 1 is a classic example of a star schema. You need to develop a similar model based on the users' specific requirements. The actual model you develop will also likely incorporate the schema preferred by your online analytical processing (OLAP) tool of choice, such as a snowflake.
Once the data model has been developed, you must identify the source applications that can provide the data for the model. Working with the "owners" of these source applications, develop the plan and process to extract the data from the source and transport it to the target data model. If there is a single source, the process is greatly simplified. If there are multiple sources, there are potential complications. For example, do both applications code the keys for customer in the same way? Are there other differences in how the applications organize and store the data? Is there a difference in the frequency with which they are updated?
A process flow diagram can be helpful in explaining and detailing the various sources, processes and targets involved in the extract, transform and load process.
Generally, a table listing of the relevant files on the source application can be provided, and these tables can be mapped to the tables and fields identified in the modeling step. Any transformation of keys, summarization/aggregation of detailed transaction data or other data transformations can be documented on the resulting source-to- target mapping document. Data gaps should be identified and may result in the need for the source application to be modified, which can significantly impact the scope of the project.
We've reviewed the primary activities in performing a comprehensive requirements analysis at the onset of a data warehousing project. This process allows us to avoid one of the greatest pitfalls in a data warehousing initiative: delivering what we think the users want, rather than what they really want. This is not a field of dreams - they won't necessarily come simply because we build it. By investing the time up front, we can ensure that our initiative is business-driven and meets the requirements of the users.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access