I am in the process of designing the data model for the data warehouse. Is star schema or snowflake better?


Les Barbusinski’s Answer: The choice of whether to use star schemas, snowflake schemas or a hybrid of the two in a data mart is largely dictated by the business intelligence tool you choose to use. Some tools, such Business Objects or Brio, prefer standard star schemas with conforming dimensions. Others, such as MicroStrategy, work best with snowflake schemas. MOLAP engines such as Oracle Express, Hyperion Essbase or Cognos PowerPlay prefer to draw their data from long, flattened tables that contain both dimensional data and metrics in order to build their proprietary hypercube databases. The data architecture pretty much has to conform to the tool vendor’s recommended structure. Otherwise, you’re in for a lot of performance problems.

Chuck Kelley’s Answer: That’s like asking which fruit is better, apples or oranges? The answer will depend on your tastes (or in the case of the data warehouse, what you are building). Some BI tools work best in a star schema environment, other work best in a snowflake. Star schemas and snowflake schemas are part of dimensional modeling. Your data warehouse can be either a dimensional model or the relational model. The data marts should be whatever your BI tool works best with.

Joe Oates’ Answer: I would suggest a multidimensional design. Snowflaking increases the number of tables that you must join for many queries.

Clay Rehm’s Answer: None of the above. Without knowing what the purpose of the data warehouse is and without seeing the actual data and a number of other factors, I would not ever recommend what data design to recommend. There is not one perfect design that will work as a generic model across companies or industries.

You have many options – your model could be normalized to a 3rd normal form, a Boyce-Codd Normal Form (BCNF), or even higher (4th and 5th) or even Domain/Key Normal Form. Then there are star schemas, snowflakes and then there are hybrids between all of the above.

You may have one design type for your staging area/ODS (if you need one) and a more dimensional design for the data your users will touch. The model that works best for your users is the model that should be developed.

Scott Howard’s Answer: To claim best or better is too open ended as your motivations or needs may vary. Let’s describe the benefits and drawbacks of each schema so you can interpret which is better for you.

A snowflake schema is the traditional multi-dimensional model implemented on a flat system, like a relational database management system. It consists of a central FACT containing the measures or some content of interest, and is surrounded by dimension entities containing conformed context for the analysis of the FACT, or simply put, the questions you ask of the fact. The dimensions usually relate to the facts in one to many relationships. The snowflake schema exposes the dimensions as fully normalized structures usually consisting of many entities with often complex intra dimension relationships. An advantage of snowflakes is that usually represent and clearly expose the dimensional hierarchy which often directly relates to the aggregation levels that can be applied to the fact. For example, the geography dimension may consist of a hierarchy representing store, city, state/province, country and major market. This hierarchy is directly related to the geographic aggregations that can be applied to the fact, that is, view the facts consolidated by store, or by city, or by state, etc. This is easy for the end user and support personal to visualize and thus better supports the human factors involved in an ad-hoc analysis environment. The disadvantage of the snowflake is the large number of tables that could need to be joined to support even the most basic queries.

Star schemas on the other hand, produce models that boast a minimum join distance. That is that the star simply consists of the fact surrounded by a single level of collapsed or consolidated dimension entities. For our example, the information representing stores, cities, states, countries and markets would all have to be effectively represented in a single entity. It’s obvious that this will usually lead to better query performance as queries now will join fewer tables, a benefit to the RDBMS that does not perform multi-table joins efficiently.

Most modern RDBMS provide a complex star join optimization technique that upon recognizing a star join request, can greatly improve performance. However, some of these RDBMSs can not recognize a snowflake schema that could also benefit from the performance benefits of such a join technique. (Intelligent optimization is so critical in BI systems. Please do your homework researching this ensuring that you have the right RDBMS for the task.) Also, like any denormalized model, the star schema hides relationships like the dimensional hierarchy or consolidation path so evident in the snowflake. The star will also introduce redundancy and processing anomalies that require a more skillful query professional to address. Now you can decide which is best for your needs.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access