Q: Fact tables contain records which represent event (e.g., sales transaction). How would you generate a report that shows non-events (e.g., items not bought or slow-moving products)?

Tom Haughey's Answer:

Items Not Sold

First, let's assume your data warehouse stores data at the transaction level. Keeping the transaction level data is highly recommended and is the main grain in the data warehouses for most companies that sell stuff. Every item bought is stored there. Because this represents transactions, items that are not transacted or not sold are not represented there.Even with this, there are several ways to determine what was not sold by using SQL against this data even without making changes to the data structure. An SQL outer join could allow you to get the volume and revenue for all items, including those that did not sell. This is the explicit purpose of the outer join. The original SQL join was an equijoin and only showed a result where there were values present in each joined table. By doing a LEFT OUTER JOIN on Product to Sales you could retrieve sales for all products and it would show a zero result for those Products with no sales. Take the example of a real data warehouse that has orders and deliveries. They need to know which orders didn't result in a delivery, and which deliveries were made without an order. The LEFT OUTER JOIN was used to determine the answer to this need. I guess theoretically it is possible at the end of each day to insert a transaction row in the Sales fact table with volume and revenue of zero for any record that did not sell that day but that sort of interferes with the granularity of the main fact table.

Several other SQL options are possible, namely, SQL NOT EXISTS and MINUS. The SQL NOT EXISTS subquery could be used determine (in several subqueries) which items existed at some point in time but do not exist within sales. This could be used to enable you to retrieve only those with no sales for a given period. Similarly it could be possible to create a result set of those items that exist at a certain time MINUS those items that sold at the same time. The difference is those that did not sell.

Slow Moving Products

Queries do not have to start with dimensions. There are two general types of dimensional queries: outside-in and inside-out.

Outside-out queries start with dimensions and retrieve the appropriate fact: "Give me sales volume and revenue for the Eastern Region by Customer by Product and by Week." This is the most frequent access pattern. However, you can also query inside-out, starting with a fact and retrieving the dimensions, as in: "Give me sales where the sales volume or revenue was above or below a given threshold." This could be used to determine slow-moving products; the SQL would have to have the threshold in its selection parameters, as in, "Give me all products in orders where the sales quantity is less than XYZ or the sales revenue is less than ABC."

Aggregate Tables

In each of the above approaches, no special data structures are put in place to handle these special requirements. In this next option, you could create special aggregates to contain this sales data. The previous options used data at the transaction level. No transaction, no fact. Instead, this option creates aggregates off the transaction data. At least two aggregates are possible: a customer sales summary by customer by product by period; and a product activity summary.

With a customer sales summary, these separate tables could include all Products even those with a sales quantity or revenue amount of zero. If the aggregate contains rows for all products, you could select those that have zero activity or a low activity. The aggregated data would be right there and would be easy to query. However, there is no such thing as a free lunch. First, you have to have enough batch time to build this table. The summarizing queries to build this aggregate would have to use some combination of the techniques described in option one and two mentioned previously. The table would have to be built every period. If any of the data varied over time due to organizational changes or errors, the aggregate table might have to be rebuilt, entailing even more cost.

If you create a product activity history table, in addition to storing zero activity, you could also put a value band in there indicating level of activity for that period whether the product was fast or slow moving, and a new row would be added each period. I'd recommend determining slow-moving products in a query rather than storing a special codes or value bands.

When no history is kept, it is called current occurrence only, which is Kimball's type one. Either the data does not change or if it does you do not care to keep the change. In this case, it is possible to put an indicator or value band in the product table indicating that this is a slow-moving product. However, this might entail a good bit of update in the following ways. If your threshold for slow moving changed, then you would have to revise the indicators. If the sales for the product fluctuated throughout the period, you would only know the last activity level and it would probably be updated regularly. Again, I'd recommend determining slow-moving products in a query rather than by storing a special code.

Chuck Kelley's Answer:

Here are some options that you might use for non-events. 1) You can use an outer join where not exists clause. 2) Depending on your database system, you could do an {All Products} MINUS {Products with Events in Fact table} to give you those that have no events against them. Depending on how many non-event items there are, you could create Fact (column or new table) that represent Non-Events (maybe where a column is NULL or low/high - values or a new column that says an event exists or not, much like the factless fact table - there are many ways to do this). Then query where fact has that value and get the product(s) associated with it (them).

Now the issue with slow-moving products is not the same, due to the fact that there are events occurring, just not very fast. Now for these, one way you can do this is to define a "threshold" to determine what "slow-moving" means. Then a query to find where the count(*) of events >= the "threshold."

Joe Oates' Answer:

These are actually two different questions. The first question (e.g. items not bought) is different from asking what are the slow moving products. In both instances, you must specify a particular day or a period of time.

To answer the question regarding what did not sell, I refer you to an article entitled "What Didn't Happen" by Ralph Kimball. It discusses precisely this issue.

To answer the question regarding slow moving products, it is necessary to compare all or a group of products according to the total number of units sold within a given period of time. Once you sum each product over a period of time and order them from most sold to least sold, you need to define a cutoff point below which the products are deemed to be slow moving. The time period is usually a month, quarter or year.

I have seen instances in which looking only at "slow moving products" was a mistake. Some high-value, high-cost products are by their nature slow moving. Therefore, in addition to the slow moving criteria, the revenue and profitability of a product should also be considered.

Clay Rehm's Answer:

Because the dimension tables hold all of the values you are looking to identify, you should be able to develop SQL that utilizes an outer join between the fact table and the necessary dimension tables. An outer join will show the dimension values that do not appear as events in the fact table, which will tell you the dimensions that are not having any activity.

Another way to identify non-events is that because most likely you are already performing inner joins, the dimensions that do not show up on a query are dimensions with no activity.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access