I am working in the HR Data warehousing Application in Ford Motor Company. We have a situation in our HR DW application where our source system is Peoplesoft HR. This system maintains an effective date in all the lookup data tables and employee job records tables. This effective date is used for maintaining history (time snapshots) in the source system.

We are planning to follow a CIF architecture in which we bring all the data in the warehouse in normalized form and then create data marts on top of it to satisfy end user's requirement. The source system is linked to the data warehouse's staging area using Oracle's Advance replication. So any data changes occurring in the source system automatically apply changes to the DW's staging area which also stores some timestamps (indicating when a record was changed in the source system). These timestamps will be used by ETL routines to process only changed records.

Should we be maintaining history by using source system updated timestamp as a part of the unique key (and avoiding any updates) or should we just use the effective date (provided by the source PSHR system) as a way to snapshot our lookup tables? Some people here are very sensitive about updating any existing lookup record in the warehouse table.

2) The 2nd question I have is regarding the merging of 2 lookup tables (which exist in the warehouse) to form a dimension in the data mart. I have not seen a solid design technique anywhere which lists the details about how to merge 2 effective dated normalized tables (or even more tables) into a single denormalized effective dated dimension. The only way I can think of is that we create an effective date and expiry date in the dimension and create multiple records in the dimension based on different time intervals existing in the lookup warehouse tables. Will this approach work?


Chuck Kelley's Answer: Whew... that's a long question ... Here goes.

1) I think you are making this way to hard. The question is do you want to keep history of all changes or not. It seems like you do. Therefore, you have a few choices. A) Treat each of the changes as a new row, B) only keep the latest version of the row, or C) group all of the days changes into a single new row. In each of these choices, there is no updating of the row but only adding new rows.

2) If what you are doing is trying to combine normalized structures into dimensions, I think it can be done easier. If there are relationships between the two structures (for example, category {category_code, category_desc} and sub-category {subcategory_code, subcategory_desc, category_code}), then you just need to tie the two together product {category_code, category_desc, subcategory_code, subcategory_desc} where they are joined by category_code.

If what you are trying to accomplish is combining two structures that have no relationship, then I would have to ask "Why are you doing this if the data is not related?" Good luck!

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access