I'm designing a star schema for a client for reporting on the use of several systems. Every single usage of a system is logged and modeled as events in the fact-table QUERIED with an attribute called query=1. In combination with two dimensions User and System we can report on the number of queries a user fired on a system. Here's my question: I also want to report on the answers the user got from the system he queried. So, the star schema has to incorporate both event data and all the answers related to the events. I currently modeled it this way:
QUESTION and ANSWER are related to the fact table QUERIED and must provide the actual questions and answers causing the event to happen. Is this the way to model it or are there any other ways?
Scott Howards Answer: Proper design here must be based on a good understanding of the user query patterns and their scope of analysis, neither of which I possess. Now that Ive established that Im not qualified to answer your question, lets try; as that hasnt stopped me before ;-). Youve designed a factless fact table, which should be avoided unless the typical scope of analysis is very random or indeterminate. Thats a generalization based on experience, so let me be more specific. If you have data that will be returned for the majority of your queries or joining from the dimension to the fact is a performance problem, that data should exist only in the fact table. The curious thing about your example is your maintaining a history of the answer sets. I assume that these answer sets can be large, so it may not make sense to keep them in the fact table even if they are frequently required for the typical query. Youll have to base your decision on your own experiences with the BI tools and RDBMS with which you deal. However, your QUESTION dimension appears to be a major scope of analysis and probably can be incorporated into your fact table as even large queries can be easily stored as a single attribute. Doing so would eliminate one dimension. I recommend this based on the assumption that the most likely query will be what question(s) did a specific user ask on a specific system or a specific day. A secondary query may be what was the answer set. I dont know if a query of who asked a question that returned X would be too common (just an assumption based on not being intimate with your application). If my assumptions were wrong, use the following advice to help construct the proper star: the dimensions should represent the questions that you ask of the fact. In your case; who, the USER; when, TIME; where, the SYSTEM; and finally what which can be both QUESTION or ANSWER. Notice that you have two whats here, the basis of our quandary. The fact should represent the major measurement or the reason for analysis. Is that reason the QUESTION or the ANSWER? This will depend on whether your system allows ad-hoc QUESTIONS or users are restricted to a finite set of canned or predetermined queries. I recommend that you keep your dimensions to a minimum and your reason for analysis in the fact table.
Chuck Kelleys Answer: I am not clear from your question what QUERY=1 means. From my best guess at what you are trying to do, you could create a new dimension that would be the equivalent of the QUERY=1 with all the data associated with it and tie that back to your fact table.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access