A star schema is designed to efficiently answer metric business questions, such as those questions that return amounts, quantities and counts. For example, "What are our total sales by customer and product over the last six months?" Stars are generally small in scope, with a handful of dimensions surrounding a single fact table. But one day, a colleague stops by your desk with a printout of a star schema that makes you jump right out of your seat. It is a sales super star! More than 1,000 data elements from the sales area are on this model. You learn that this star was produced by placing every metric data element from sales in the fact table and every nonmetric data element in a dimension. The fact table contains more than 100 different metrics and there are more than 30 dimensions! You are even more shocked to learn that this star schema is in production!
The sales star schema is experiencing some performance and space issues, and your colleague would like advice on how best to restructure this model. What is your response?
Many database problems, such as performance and space, have as their root a gap in analysis. In this scenario, the gap is not fully understanding the scope or subject matter of the original requirements. Once the requirements are fully understood and properly scoped, a variety of different modeling techniques can enter the picture to reduce the super star into a series of fast and user-friendly stars.
Understanding the Requirements
A star needs to be defined by a set of business questions and not by a broad functional area such as sales. Ideally, a representative set of the business questions within the sales area can be identified and documented, and the metrics behind these questions can be grouped together into efficient stars. The metrics can be grouped by usage or by business area.
Grouping metrics by usage means assigning metrics to stars based on common reporting and queries. Nishant Upadhyay, data architect, would analyze usage by studying the reports generated from this super star. "This will enable me to categorize the reports into sales lifecycle reports, sales process reports, pipeline sales reports, etc. The super star can be reviewed to see if it can be broken up into these subject areas to make them more manageable and focused." Michael Workman, systems analyst, would interview business users and create a series of stars according to different audiences.
Grouping metrics by business area means breaking the sales function down into its business processes. Maurice Frank, data modeler, would begin by investigating the sales business processes. "Sales is very general. Break it down into smaller business processes and map the measures and dimensions to each process which would have its own fact table." Mike Nicewarner, data analyst, supports this approach by saying, "Divide and conquer! Instead of just sales, you have tables named outside sales or marketing efforts."
Applying Different Modeling Techniques
After understanding the requirements, design challengers suggested a number of modeling techniques to restructure this super star, including shifting simple calculations to the reporting tool, selectively applying indexes, re-evaluating history and restructuring dimensions.
Both Mustufa Kapadia, BI analyst, and Eve Halberg, BI team lead, stress the need to shift simple calculations to the reporting tool. Eve says, "I would review all metric elements to determine whether they are required. Base metrics are required; calculated metrics may be better handled by reporting tools, such as averages."
Indexes can be a quick fix to a performance issue when applied properly. Mustufa Kapadia would analyze indexes on both the facts and dimensions. He would also look for opportunities to reduce or offload the amount of history in the sales fact table. Another approach would be to store the oldest sales fact records at a higher level of granularity to save space and increase performance.
After understanding the requirements, the dimensions themselves can undergo a restructuring. Eve Halberg says, "Dimensions that can be merged are status or code dimensions. For performance, I usually look to create what I call 'key dimensions' for large dimension tables. For example, I broke my customer dimension into a customer dimension and customer key dimension. The key dimension contains the elements that are frequently accessed in reports. Also, sometimes I've seen metrics in a sense repeated, such as checking account deposit amount and savings account deposit amount. I would replace these with one amount and create a key dimension that types the amount."
If you would like to become a design challenger and have the opportunity to submit modeling solutions, go to www.stevehoberman.com/designchallenge.htm and add your email address. If you have a challenge for us, please email me a description of the scenario at email@example.com.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access