What does granularity mean in the context of a data warehouse and what are the various levels of granularity?


Les Barbusinski’s Answer: Granularity is usually mentioned in the context of dimensional data structures (i.e., facts and dimensions) and refers to the level of detail in a given fact table. The more detail there is in the fact table, the higher its granularity and vice versa. Another way to look at it is that the higher the granularity of a fact table, the more rows it will have.

Let me illustrate with the following example: Say we have a data mart with a single fact (Sales) and three dimensions (Time, Organization and Product). The fact table contains three metrics (Unit Price, Units Sold and Total Sale Amount). The Time dimension consists of four hierarchical elements (Year, Quarter, Month and Day). The Organization dimension consists of three hierarchical elements (Region, District and Store). The Product dimension consists of two hierarchical elements (Product Family and SKU).

As always, the metrics in the Sales fact table must be stored at some intersection of the dimensions (i.e., Time, Organization and Product). Hence, in this data mart, the highest granularity that we can store Sales metrics is by Day/Store/SKU (i.e., the lowest level in each dimensional hierarchy). Conversely, the lowest granularity that we can aggregate Sales metrics to in this data mart is by Year/Region/Product Family (i.e., the highest level in each dimensional hierarchy). We may also (for a variety of performance reasons) choose to store Sales metrics at some intermediate level of granularity (e.g., by Month/District/SKU).

Chuck Kelley’s Answer: Granularity is the level of depth of data. For example, you might have a date/time dimension which could be at the year, month, quarter, period, week, day, hour, minute, second, hundreths of seconds level of granularity. Most data warehouses do not go to the second or hundreths of seconds level, but it could be possible. The granularity with be the lowest level of the depth of data.

Joe Oates’ Answer: Granularity refers to the level of detail of the data stored fact tables in a data warehouse. High granularity refers to data that is at or near the transaction level. Data that is at the transaction level is usually referred to as atomic level data. Low granularity refers to data that is summarized or aggregated, usually from the atomic level data. Summarized data can be lightly summarized as in daily or weekly summaries or highly summarized data such as yearly averages and totals.

Clay Rehm’s Answer: Granularity simply means the level of detail. A typical data warehouse will have some tables in it that have a lot of detail and have other tables that are summarized or aggregated, which means less detail. Each non- key column in a fact table must be at the same level of granularity (detail). For example, if your primary measure on a specific fact table is daily total sales, then that is defining the granularity. Only when the grain for the fact table is chosen can we identify the dimensions of that fact table.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access