MAY 25, 2006 1:00am ET

Related Links

Which CDC method is the best to achieve staging database with changed data?
March 7, 2008
Apart from bloated dimension, what are the negatives of using all known attributes in your SCD?
March 7, 2008
When is it better to have normalized data to create data marts and when is it better to have dimensional data?
March 7, 2008

Web Seminars

Getting Started with Big Data
Available On Demand
Transactions & Interaction: The Correlation of Structured and Unstructured Data
Available On Demand
Deliver Better Enterprise Data through Better Reference Data Management
Available On Demand

Could you please provide tips on getting tables into our data mart?

Print
Reprints
Email

Q: We are in the process of designing a sales data mart for a retail company. We have a source table with flex fields. The name and type of the fields are dynamically updated and stored in configuration tables. The values of the fields are stored in another table. We need to get these tables into our data mart. We are planning to store these as key value pairs in our data mart. Any tips on this would help.

Tom Haughey's Answer:

First, a key value pair is a data model abstraction structure which stores a generic value that is interpreted using other data type and definition tables. Key value pairs will be useful under two circumstances: there are many flex fields and the flex fields are widely variable. If there are few fields, say three, then they could simply be inserted into a table as a small array of columns, together with a Data Type and Units to define their meaning. Even if there are many, as long as all or most flex fields occur all the time, it is possible to have even a large set of columns within one table with each repeating column dedicated to one flex field. If the flex fields vary a lot, then it is better to use key value pairs to avoid the possibility of having many empty columns.

Key value pairs are one way to solve your requirement and are very flexible. These key value pairs can be in a separate table, or part of an existing table. If the number of flex fields is few, they can be put into an existing table, and the domain of these contained in a separate code table. These domain code tables would have multiple relationships to the data table in question, one for each flex field. Alternatively, they can be put into a separate key value pair table. This would be a generic table. The advantage of either of these approaches is flexibility. The disadvantages are performance and understandability. Performance may suffer because of the degree of interpretation necessary to use these columns; in practical terms, it means the number of joins to use them. Understandability can be an issue in two ways. First, SQL uses the column name for the displayed column name. In this case, the name will be some generic name, say, FlexField. Your SQL would have to be very sophisticated to be able to display the instance of data in the flex field. Second, the SQL to use this will be somewhat complex. Of course, once it is written once, everyone can use it because it is generic.

Figure 1 is an example with the value pair within the data table:

  • A single attribute with different domains and other qualifying characteristics
  • Variable attribute is probably defined as character and of the maximum length.

    Figure 1.

    Figure 2 is an example with a generic key value pair:

    • Multiple attributes with different domains
    • Variable attribute is probably defined as character and of the maximum length.

    If the number of flex fields is few, then a simple choice is to create fixed columns in an existing table, one for each possible domain of the flex field. "Few" is a relative term and you have to be the judge as to its value. It could be that even a dozen flex fields would not be too much. For any one instance, several of these fixed flex fields might be empty.

  • Tom Haughey is the president of InfoModel LLC, a training and consulting company specializing in data warehousing and data management. He has worked on dozens of database and data warehouse projects for more than two decades. Haughey was former CTO for Pepsi Bottling Group and director of enterprise data warehousing for PepsiCo. He may be reached at (201) 337-9094 or via e-mail at tom.haughey@InfoModelUSA.com.

    Advertisement

    Comments (0)

    Be the first to comment on this post using the section below.

    Add Your Comments:
    You must be registered to post a comment.
    Not Registered?
    You must be registered to post a comment. Click here to register.
    Already registered? Log in here
    Please note you must now log in with your email address and password.
    Twitter
    Facebook
    LinkedIn
    Login  |  My Account  |  White Papers  |  Web Seminars  |  Events |  Newsletters |  eBooks
    FOLLOW US
    Please note you must now log in with your email address and password.