MAY 1, 2006 1:00am ET

Related Links

Visiting Nurse Service Cares About Cloud Security
October 25, 2011
Light at the End of the Silo
October 28, 2010
Pitney Bowes Releases Enhancements to MapInfo Professional
September 13, 2010

Web Seminars

The Big Deal About Big Data Governance
May 22, 2012
Getting Started with Big Data
Available On Demand
Transactions & Interaction: The Correlation of Structured and Unstructured Data
Available On Demand

Ask the Experts

Print
Reprints
Email

Editor's note: This is a sample of the questions asked of and answered by our illustrious panel of experts. Find questions archived by topic at http://www.dmreview.com/editorial/online/ate_archive.cfm.

Q: What is the current state of data warehousing and data mining in terms of its widespread use, its growth and technological advances? And also what are likely to be the next biggest breakthroughs in data mining technology?

Anne Marie Smith's Answer: The current state of data warehousing (DW) and data mining has been influenced greatly by the recession and technical spending slump, but it has also been influenced by the advances in technologies and in the increasing need for reliable, robust and timely data for decision-making. This combination of forces has led to many companies re-examining the DW and data mining projects that they began in the late 1990s and early in this decade - with a view toward expanding the use of available operational data for decision-making. These companies are enhancing their DW programs and may be adding new data for analysis, new functionality for their users and, perhaps, bringing the DW to more departments if it is an enterprise warehouse. Companies that began data mart projects are split between those who are staying with data marts and either expanding them or creating new marts for new areas and those who are examining the need to move to an enterprise DW. Some new DW projects have been started by companies who avoided DW's in their first incarnation, and it seems as if these new projects are split 50/50 between enterprise-oriented DWs and data marts. Data mining technology is benefiting from advances in tools (graphical, increased query power, more support for unstructured data analysis) and in performance and storage of the servers used for these operations.

(Published 4/10/06)

Q: We have a raging debate going here about whether to load null values into the data warehouse or not. One side argues that nulls give misleading query results and that we should translate nulls to some kind of 'dummy' value. The other side insists that loading artificial data violates the integrity of the DW and also carries a cost in terms of storage coonsumption, performance, and data management. What's the best practice approach to loading nulls into the DW?

Joe Oates' Answer: It is been my experience that loading nulls into a data warehouse is generally not a good idea. You don't say whether the issue of nulls pertains to fact tables, dimension tables or both. I am going to assume that you are referring to fact tables.

For fact tables, there are two situations that need to be addressed. One is having nulls where you're expecting an integer or a decimal value such as the number of items sold or totals. The other is when doing surrogate key replacement no valid value is found in the appropriate cross-reference table.

In the case of having nulls for your expecting an integer or a decimal value such as number of items sold or totals, substituting zero for null is no worse than storing the null and not including the row(s) in a calculation. In my opinion, it also gives more accurate results in counts and other functions that can give misleading or incorrect values when nulls are present. You can always run reports to identify rows that contain unexpected zero values. It is unusual for a customer to buy zero products or to buy some number of products and be charged to zero dollars and cents.

If the issue is that a valid customer ID, product ID, etc. could not be found so that proper surrogate key replacement could be done, a strategy of providing a role in each dimension table with the ID of zero and a description field set to "unknown," or some similar value will allow you to take these source system transaction rows into the data warehouse and allow routine queries to be run without having complex logic to handle null values. Also, you can run a separate report after each ETL run showing those transactions with invalid source keys. If you don't want to do that, when reports are run there will be grouping headers with the title of "unknown."

Either way, the end users will be alerted to which transactions in the source system need to be corrected.

Clay Rehm's Answer: For each field, a list of all possible values or range of values must be defined in your metadata repository/data dictionary, which includes what a null means for that field. Having said that, why not have a value of "No Value Available" or something to that effect? I think most nontechnical users struggle with the null concept, so provide an easy to use value that means the same thing.

Adrienne Tannenbaum's Answer: Both scenarios will cause equivalent problems if you cannot guarantee that your future data warehouse users (developers as well as business users) will have easy access to absolutely up to date and correct metadata. And of course there is a lot more to be said as to where these "values" fit in the grand scheme of things. Is the specific field crucial to the data warehouse itself, or is it secondary info?

Tom Haughey's Answer: You are not the only one. Here are some considerations: There are several factors to consider, as follows:

  • class of the column, such as amount, count, quantity, percentage, etc.
  • role of the column, specifically is it a primary key, foreign key or non-key column,
  • how the column is used, for summation or joining and
  • finally, the type of system, namely, OLTP or data warehouse.
Filed under:

Advertisement

Twitter
Facebook
LinkedIn
Login  |  My Account  |  White Papers  |  Web Seminars  |  Events |  Newsletters |  eBooks
FOLLOW US
Please note you must now log in with your email address and password.