I was recently at a data warehouse event, and the sign on the door read "Data Wherehouse." Although this was an obvious spelling error, it also pointed to one of the benefits of the data warehouse: it tells the business community where its data is located. The data warehouse does this in a number of ways.
Where Did the Data Originate?
The first answer to "where" comes from source system analysis. Source system analysis entails a number of steps, the first of which is identifying the location of data that could serve as a source for the data warehouse. While there is usually an obvious source for data, the search should also review less likely sources so that the integration process can be designed thoroughly. For example, if the data warehouse needs customer data, the obvious choice is the customer master file (or files). The search should not stop there, however; it may uncover small marketing or sales applications that are used to enter or modify customer data in disparate files. A more complete search for where data exists improves data quality within the data warehouse and also identifies opportunities to improve the operational data collection process.
Once all the potential data sources have been identified and reviewed, the team must decide which "system (or systems) of record" will feed the data "wherehouse." Once that decision is made, the question "Where did the data originate?" can be answered.
Where Has the Data Been?
Where the data has been is described by the transformation process. Transformation mapping entails specifying every process that must be applied to the source data before it is brought into the data warehouse. These processes include cleansing, integration, modification and reformatting. Information about cleansing includes the criteria used to accept or reject data and the filters applied. Information about integration includes the criteria used to select the source for each element and the rules used to reconcile the different sets of identifiers. Information about modification includes the default values applied and the codes that were changed to conform to the universal set required for the warehouse. Information about reformatting includes adjustments to field length, data type and domain. Knowing where the data has been gives business users of the data "wherehouse" a better understanding of the data and better equips them to use it.
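The four kinds of transformation information can be made concrete with a small sketch. It assumes two hypothetical source systems ("customer_master" and "sales_app"), an identifier cross-reference table, and made-up rules for source priority, defaults, a universal status code set and a name length; none of these come from the article. Each step records a note in a lineage list, which is one way to capture "where the data has been."

```python
"""Hypothetical transformation mapping for one customer row.

Every field name, code set and rule here is an illustrative assumption.
Each step appends a note to `lineage`, recording where the data has been.
"""

# Integration rule (assumed): preferred source system for each element.
SOURCE_PRIORITY = {
    "name": ["customer_master", "sales_app"],
    "region": ["sales_app", "customer_master"],
    "status": ["customer_master", "sales_app"],
}
# Modification rules (assumed): default values and a universal code set.
DEFAULTS = {"region": "UNKNOWN"}
STATUS_CODES = {"A": "ACTIVE", "I": "INACTIVE", "1": "ACTIVE", "0": "INACTIVE"}
# Reformatting rule (assumed): warehouse length of the name field.
NAME_LENGTH = 30


def cleanse(system, record, lineage):
    """Cleansing: accept a source record only if it carries an identifier."""
    if not record.get("id"):
        lineage.append(f"{system}: rejected, missing id")
        return False
    return True


def transform(sources, xref):
    """sources maps system -> raw record; xref maps (system, id) -> warehouse key."""
    lineage = []
    accepted = {s: r for s, r in sources.items() if cleanse(s, r, lineage)}

    # Integration: every accepted record must resolve to the same warehouse key.
    keys = {xref.get((s, r["id"])) for s, r in accepted.items()}
    if len(keys) != 1 or None in keys:
        lineage.append("identifier resolution failed")
        return None, lineage
    row = {"customer_key": keys.pop()}

    # Integration: take each element from the highest-priority source that has it.
    for element, systems in SOURCE_PRIORITY.items():
        for system in systems:
            value = accepted.get(system, {}).get(element)
            if value not in (None, ""):
                row[element] = value
                lineage.append(f"{element} taken from {system}")
                break

    # Modification: apply defaults, then conform codes to the universal set.
    for element, default in DEFAULTS.items():
        if element not in row:
            row[element] = default
            lineage.append(f"{element} defaulted to {default}")
    if "status" in row:
        raw = str(row["status"]).upper()
        row["status"] = STATUS_CODES.get(raw, "UNKNOWN")
        lineage.append(f"status code {raw} -> {row['status']}")

    # Reformatting: trim and truncate the name to the warehouse field length.
    row["name"] = str(row.get("name", "")).strip()[:NAME_LENGTH]
    return row, lineage
```

Returning the lineage alongside the row keeps the "where has it been" record next to the data it describes; a production process would more likely write it to a meta data repository.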
Where Is the Data Available?
The location and manner in which business users can actually retrieve the data answer another aspect of "where." A data warehouse presents business users with many alternatives, so guidance in selecting the best one is very important.
Many users repeat the same analysis on a regular basis, and preset routines can meet their needs. For these users, a directory of the available queries, reports and data "cubes," along with instructions for using them, is extremely useful. The sophisticated users who research data in the atomic data warehouse also need a map. For these analysts, a directory that identifies the data elements, their definitions and their physical locations is critical. With this information, they can explore the wealth of data in the "wherehouse" and help it meet its strategic objectives.1
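Both kinds of directory can be sketched as one searchable catalog. In this minimal, in-memory illustration, the entry fields, the "kind" values and the location naming scheme (for example "dw01.sales.customer.region_code") are all hypothetical; users who rely on preset routines would filter on reports, queries or cubes, while analysts exploring atomic data would filter on elements.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class DirectoryEntry:
    name: str        # business name of the element, query, report or cube
    kind: str        # "element", "query", "report" or "cube"
    definition: str  # business definition, or instructions for use
    location: str    # physical location (hypothetical naming scheme)


class Directory:
    """A minimal in-memory directory; a real one would live in a meta data repository."""

    def __init__(self):
        self._entries = []

    def add(self, entry):
        self._entries.append(entry)

    def find(self, term, kind=None):
        """Return entries whose name or definition mentions `term`, optionally by kind."""
        term = term.lower()
        return [
            e for e in self._entries
            if (kind is None or e.kind == kind)
            and (term in e.name.lower() or term in e.definition.lower())
        ]


directory = Directory()
directory.add(DirectoryEntry(
    "Monthly sales by region", "report",
    "Run on the first business day; prompts for the month.",
    "reports/sales/monthly_region"))
directory.add(DirectoryEntry(
    "region_code", "element",
    "Sales region assigned to the customer at the time of sale.",
    "dw01.sales.customer.region_code"))
```

A call such as `directory.find("region", kind="element")` returns only the data element entry, including its physical location.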
Where Is the Data Being Used?
For the data warehouse administrator, knowing where the data is being used is extremely important. The administrator is charged with ensuring that the data warehouse continues to meet both its business and service-level objectives. To do this job, the administrator needs to know the organizational and physical locations of the data warehouse users, the physical location of the data warehouse itself and the applications that use data from the warehouse.
The organizational location of the data warehouse users, combined with information about the applications, helps the administrator understand usage patterns and anticipate changes. By combining knowledge of how a data warehouse can help businesses leverage information with an understanding of the existing applications, the administrator can help identify opportunities to gain additional benefits. The physical location of the data warehouse and information about usage patterns help the administrator identify opportunities to adjust the warehouse's physical structure and continuously improve its usability and performance.
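One way to turn usage information into patterns is to aggregate a log of requests. This sketch assumes a hypothetical usage log in which each event records the user's organizational unit and site, the requesting application and the table read. The "remote hot spot" rule, which flags tables read heavily from sites other than the warehouse's own, is an illustrative heuristic, not a method from the article.

```python
from collections import Counter
from dataclasses import dataclass


@dataclass(frozen=True)
class UsageEvent:
    org_unit: str     # organizational location of the user
    site: str         # physical location of the user
    application: str  # application that issued the request
    table: str        # warehouse table that was read


def usage_patterns(events):
    """Count requests by (organizational unit, application)."""
    return Counter((e.org_unit, e.application) for e in events)


def remote_hot_spots(events, warehouse_site, threshold):
    """Flag (site, table) pairs read at least `threshold` times from remote sites.

    Illustrative heuristic: such pairs are candidates for physical changes,
    such as replicating a table closer to the users who read it.
    """
    counts = Counter((e.site, e.table) for e in events if e.site != warehouse_site)
    return sorted(pair for pair, n in counts.items() if n >= threshold)
```

The threshold is a tuning choice; in practice it would be set against the service-level objectives the administrator is responsible for.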
No discussion of "wherehousing" would be complete without a mention of meta data. In his recent article on meta data management, David Marco describes the meta data architecture needed to help both the business and technical communities locate meta data.2 Meta data is much more than data about data. The expanded definition of meta data includes information about where the data originated, where the data has been, where the data is available and where the data is being used. This information gives business users the understanding they need to apply the data "wherehouse" effectively for strategic advantage.
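The expanded definition can be pictured as a single meta data record per element, with one field for each "where" question. The field names, the question strings and the sample values below are hypothetical; the sketch only shows how the four answers might sit together for one element.

```python
from dataclasses import dataclass, field


@dataclass
class ElementMetaData:
    """Meta data for one warehouse element, organized by the four 'where' questions."""
    element: str
    originated: list = field(default_factory=list)  # systems of record
    has_been: list = field(default_factory=list)    # transformations applied
    available: list = field(default_factory=list)   # tables, queries, reports, cubes
    used_by: list = field(default_factory=list)     # applications that read it

    def where(self, question):
        """Answer one of: 'originated', 'has been', 'available', 'used'."""
        answers = {
            "originated": self.originated,
            "has been": self.has_been,
            "available": self.available,
            "used": self.used_by,
        }
        if question not in answers:
            raise ValueError(f"unknown question: {question!r}")
        return list(answers[question])  # copy so callers cannot alter the record


region = ElementMetaData(
    element="region_code",
    originated=["sales_app"],
    has_been=["defaulted to UNKNOWN when missing"],
    available=["dw01.sales.customer.region_code", "Monthly sales by region"],
    used_by=["campaign_app"],
)
```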
1 Inmon, W.H. "Profiling the DSS Analyst." DM Review, March 1995, p. 58. In this article, Bill Inmon differentiates between these two types of users, calling the first set "farmers" and the second set "explorers."
2 Marco, David. "Managing Meta Data." DM Review, March 1998, p. 58.