Essential Steps for the Integrated Enterprise Data Warehouse, Part 2
If you plan to combine data across subject areas, these personality types will do the job.
Information Management Magazine, May 2008
In the first half of this article, published in the April issue, we saw that drilling across was the defining act of the integrated enterprise data warehouse (EDW). Remember that drilling across requires the integration of data from two more business process subject areas. We then defined two personality archetypes: the dimension manager and the fact provider. The dimension manager is a centralized authority who builds and distributes a conformed dimension to the rest of the enterprise, and the fact provider is the client who receives and utilizes the conformed dimension, almost always while managing one or more fact tables within a subject area. We are now ready to roll up our sleeves and describe exactly what the dimension manager and fact provider do. Advertisement The Dimension Manager The dimension manager defines the content and structure of a conformed dimension and delivers that conformed dimension to downstream clients known as fact providers. This role can definitely exist within a master data management (MDM) framework, but the role is much more focused than just being the keeper of the single truth about an entity. The dimension manager has a list of deliverables and responsibilities, all oriented around creating and distributing physical versions of the dimension tables that represent the major entities of the enterprise. In many enterprises, key conformed dimensions include customer, product, service, location, employee, promotion, vendor and calendar. As we describe the dimension managers tasks, we will use customer as the example to keep the discussion from being too abstract. The tasks of the customer dimension manager include: Defining the content of the customer dimension. The dimension manager chairs the design meeting for the conformed customer dimension. At that meeting, all the stakeholders from the customer-facing transaction systems come to agreement on a set of dimensional attributes that everyone will use when drilling across separate subject areas. Remember that these attributes are used as the basis for constraining and grouping customers. Typical conformed customer attributes include Type, Category, Location (multiple fields implementing an address), Primary Contact (name, title, address), First Contact Date, Credit Worthiness, Demographic Category and others. Every customer of the enterprise appears in the conformed customer dimension. Receiving notification of new customers. The dimension manager is the keeper of the master list of dimension members, in this case, customers. The dimension manager must be notified whenever a new customer is registered. Deduplicating customer dimension. The dimension manager must deduplicate the master list of customers. Customer lists in the real world are nearly impossible to deduplicate completely. Even when customers are registered through a central MDM process, it is often possible to create duplicates, either for individual customers or business entities. Assigning a unique durable key to each customer. The dimension manager must identify and keep track of a unique durable key for each customer. Many database administrators (DBAs) automatically assume that this is the natural key, but quickly choosing the natural key may be the wrong choice. A natural key may not be durable! Using the customer example, if there is any conceivable business rule that could change the natural key over time, then it is not durable. Also, in the absence of a formal MDM process, natural keys can arise from more than one customer-facing process. In this case, different customers could have natural keys of very different formats. Finally, a source systems natural key may be a complex, multifield data structure. For all these reasons, the dimension manager needs to step back from literal natural keys and assign a unique durable key that is completely under the control of the dimension manager. I recommend that this unique, durable key be a simple sequentially assigned integer, with no structure or semantics embedded in the key value. Assigning surrogate keys for the customer dimension. Type 2 is the most common and powerful of the SCD techniques because it provides precise synchronization of a customer description with that customers transaction history. Because Type 2 creates a new record for the same customer, the dimension manager is forced to generalize the customer dimension primary key beyond the unique, durable key. The primary key should be a simple surrogate key, sequentially assigned as needed, with no structure or semantics in the key value. This primary key is separate from the unique durable key, which simply appears in the dimension as a normal field. The unique, durable key is the glue that binds the separate SCD Type 2 records for a single customer together. Handling late arriving dimension data. When the dimension manager receives late notification of a Type 2 change affecting a customer, special processing is needed. A new dimension record must be created and the effective dates of the change adjusted. The changed attribute must be propagated forward in time through existing dimension records.* Providing version numbers for the dimension. Before releasing a changed dimension to the downstream fact providers, the dimension manager must update the dimension version number if Type 1 or Type 3 changes have occurred or if late-arriving Type 2 changes have occurred. The dimension version number does not change if only contemporary Type 2 changes have been made since the previous release of the dimension.
Page 1 of 3.







