I have a data modeling question regarding patient diagnosis and hospital procedures that I can’t get to work. Each patient can have up to 10 diagnoses and 10 procedures per stay. I need to build a data mart that allows the user to count the number of times a procedure occurred for all patients within any of the 10 recordings for any diagnosis combination. I tried a schema that has a Time Dim, Hospital Dim, Dimensions for each diagnosis, Dimensions for each procedure and a count in the measure table for each diagnosis and each procedure.
The patient record has:
What suggestions would you give me for the design?
Les Barbusinski’s Answer: The problem you describe is a variation on the market basket analysis theme (e.g., what percentage of time does a customer purchase beer at the same time they purchase diapers and laundry detergent?). You can derive the desired metrics from the star schema you describe, but you’d have to use a high-end BI tool like MicroStrategy. If that’s not an option for you, you may want to try substituting a diagnosis combination dimension for your diagnosis dimension. Each row on the combo dimension would have a unique identifier and consist of up to 10 diagnosis IDs. Your ETL would have to insure that each combination is unique say, by insuring that diagnosis IDs are always stored in ascending sequence in the combo dimension and creating a unique index on all the DIAG_TYPE_ID columns. Your star schema would then look something Figure 1:
Figure 1 From this structure it should be easy enough to compute the aggregate metrics you described. Hope this helps.
Chuck Kelley’s Answer: Design on the fly is never a good thing. Too many questions are outstanding. For example, your paragraph takes of procedures, but the patient record has nothing, so I am not sure how to go about that. But, I shall try with what I can see. That way, maybe you can get a picture.
I would create time dim, hospital dim, diagnosis dim, patient dim, diagnosis # dim and Stay Dim The diagnosis number dim has 10 rows (1,2,3,4,5,6,7,8,9,10). It is only there since a patient can only have 10 diagnosis per stay. The Stay dim would just be a sequential number. Then I would create a Factless Fact Table (as Ralph Kimball calls them) which has a key for each dimension and one column (DoesItExist) which equal 1. Only put a row in the fact table for a patient on a day in a hospital had a diagnosis for stay for diagnosis # of that stay. For example, if there was not a diagnosis #10 for Stay 1 of Patient Chuck Kelley at hospital University Hospital on 10/19/2003, then I would not have a row in my factless fact table. The factless fact table is no different than a fact table, except it shows the existence or non-existence of an event.
With this, you can do all kinds of analysis, including the one you want.
Adrienne Tannenbaum’s Answer: This issue has characteristic logical data modeling issues. Before you start with a dimensional design, it is essential that you put together a normalized design. As you can see, it is not clear in your requirements whether or not there is a relationship between diagnosis and procedure. Is there? I think maybe there is if each patient can have many diagnoses and each diagnosis can have many procedures your design needs to be re-evaluated the dimensions diagnosis and procedure are not substantial enough because you have to account for the fact that they are relatively dependent upon each other...if they are.
Your design would be more of a snowflake in that case diagnoses for patients, as you have and then procedures per patient/diagnosis combination. Hope this helps
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access