Question: Could you please let me know when to use a star schema and when to use a snowflake schema? What are the major differences between them?
Chuck Kelleys Answer: My personal opinion is to use the star by default, but if the product you are using for the business community prefers a snowflake, then I would snowflake it. The major difference between snowflake and star is that a snowflake will have multiple tables for a dimension and a start with a single table. For example, your company structure might be
Corporate à Region à Department à Store
In a star schema, you would collapse those into a single "store" dimension. In a snowflake, you would keep them apart with the store connecting to the fact.
Joe Oates Answer: First of all, some definitions are in order. In a star schema, dimensions that reflect a hierarchy are flattened into a single table. For example, a star schema Geography Dimension would have columns like country, state/province, city, state and postal code. In the source system, this hierarchy would probably be normalized with multiple tables with one-to-many relationships.
A snowflake schema does not flatten a hierarchy dimension into a single table. It would, instead, have two or more tables with a one-to-many relationship. This is a more normalized structure. For example, one table may have state/province and country columns and a second table would have city and postal code. The table with city and postal code would have a many-to-one relationship to the table with the state/province columns.
There are some good for reasons snowflake dimension tables. One example is a company that has many types of products. Some products have a few attributes, others have many, many. The products are very different from each other. The thing to do here is to create a core Product dimension that has common attributes for all the products such as product type, manufacturer, brand, product group, etc. Create a separate sub-dimension table for each distinct group of products where each group shares common attributes. The sub-product tables must contain a foreign key of the core Product dimension table.
One of the criticisms of using snowflake dimensions is that it is difficult for some of the multidimensional front-end presentation tools to generate a query on a snowflake dimension. However, you can create a view for each combination of the core product/sub-product dimension tables and give the view a suitably description name (Frozen Food Product, Hardware Product, etc.) and then these tools will have no problem.
Chuck Kelley is an internationally known expert in database and data warehousing technology. He has 30 years of experience in designing and implementing operational/production systems and data warehouses. Kelley has worked in some facet of the design and implementation phase of more than 50 data warehouses and data marts. He also teaches seminars, co-authored four books on data warehousing and has been published in many trade magazines on database technology, data warehousing and enterprise data strategies. He can be contacted at firstname.lastname@example.org.
Joe Oates is an internationally known speaker, author and consultant on data warehousing. Oates has more than 30 years of experience in the successful management and technical development of business, real-time and data warehouse applications for industry and government clients. He has designed or helped design and implement more than 30 successful data warehouse projects.