Entity Relationship Modeling with UML
If you don’t know your destination, any road will lead you there. The goal of entity relationship (ER) modeling is to create a valid representation of the entities, their attributes and their relationships that will fulfill the needs of the business. While most of the available modeling notations can serve this purpose, the Unified Modeling Language (UML) gives you more flexibility to achieve this goal, especially when dealing with the business side of the equation.
You don’t need to understand every aspect of the UML to benefit from using it for your logical ER models. When starting out, you can build your ER diagram from just three elements: classes (a class with the stereotype "entity" behaves just as any entity would), their attributes, and their associations (i.e., relationships) to other entities. This is how many data analysts get their feet wet using UML for logical data modeling. As Figures 1 and 2 show, the visual difference between an ER model drawn in the UML and one drawn in another notation is inconsequential.
Figure 1: ER Diagram Using UML Notation
Figure 2: ER Diagram Using Crow's Feet Notation
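To make this starting point concrete, here is a minimal Python sketch, assuming hypothetical Customer and Order entities that are not taken from the figures. Each dataclass stands in for a class stereotyped as "entity", its fields are the entity's attributes, and the list of orders models a one-to-many association (one customer places zero or more orders).

```python
from dataclasses import dataclass, field
from typing import List

# Hypothetical entities for illustration only; each class plays the role
# of a UML class with the stereotype "entity".

@dataclass
class Order:
    order_id: int     # identifying attribute (would become the primary key)
    order_date: str
    customer_id: int  # reference to the owning Customer (would become a foreign key)

@dataclass
class Customer:
    customer_id: int
    name: str
    # One-to-many association: one Customer places 0..* Orders
    orders: List[Order] = field(default_factory=list)

    def place_order(self, order_id: int, order_date: str) -> Order:
        """Create an Order linked to this Customer and record the association."""
        order = Order(order_id, order_date, self.customer_id)
        self.orders.append(order)
        return order

alice = Customer(1, "Alice")
alice.place_order(100, "2024-01-15")
alice.place_order(101, "2024-02-03")
print(len(alice.orders))  # 2
```

In a UML diagram the same information would appear as two entity classes joined by an association with multiplicity 1 at the Customer end and 0..* at the Order end.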
However, UML also provides a host of additional capabilities. For example, you can directly represent a many-to-many association in your data model together with its association (a.k.a. intersection) entity, which the UML calls an association class. The association entity belongs to the relationship, not to either of the two participating entities. This gives you a true picture of all the model elements, including any additional attributes defined within the association entity.
Figure 3: Many-to-Many Relationship Using UML Notation
This representation makes clear to all team members that the relationship needs to be examined further. What business needs are being realized by this many-to-many relationship? Does the association entity contain additional attributes (besides the primary keys of the participating entities) that support those needs? Are there other attributes that could be added to the association entity to simplify queries? Should this really be a many-to-many relationship at all? Making such relationships explicit enables all stakeholders to understand these entities and to resolve them into a sound database structure that is responsive to the business requirements.
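The association entity idea can also be sketched in code. This is a minimal Python illustration, assuming hypothetical Student, Course and Enrollment entities (not taken from Figure 3): Enrollment carries the keys of both participating entities plus an attribute of its own, grade, which describes the relationship rather than either the student or the course.

```python
from dataclasses import dataclass
from typing import Dict, List

@dataclass(frozen=True)
class Student:
    student_id: int
    name: str

@dataclass(frozen=True)
class Course:
    course_id: str
    title: str

@dataclass(frozen=True)
class Enrollment:
    # Association (intersection) entity: it belongs to the Student-Course
    # relationship, not to either participating entity.
    student_id: int  # part of the composite key; refers to Student
    course_id: str   # part of the composite key; refers to Course
    grade: str       # attribute of the relationship itself

students: Dict[int, Student] = {1: Student(1, "Ana"), 2: Student(2, "Ben")}
courses: Dict[str, Course] = {
    "DB101": Course("DB101", "Databases"),
    "UML201": Course("UML201", "UML Modeling"),
}
enrollments: List[Enrollment] = [
    Enrollment(1, "DB101", "A"),
    Enrollment(1, "UML201", "B"),
    Enrollment(2, "DB101", "B"),
]

def courses_for(student_id: int) -> List[str]:
    """Navigate Student -> Enrollment -> Course."""
    return sorted(courses[e.course_id].title for e in enrollments if e.student_id == student_id)

def students_in(course_id: str) -> List[str]:
    """Navigate Course -> Enrollment -> Student."""
    return sorted(students[e.student_id].name for e in enrollments if e.course_id == course_id)

# The composite key (student_id, course_id) must be unique.
assert len({(e.student_id, e.course_id) for e in enrollments}) == len(enrollments)

print(courses_for(1))        # ['Databases', 'UML Modeling']
print(students_in("DB101"))  # ['Ana', 'Ben']
```

Because each Enrollment holds exactly one student key and one course key, the many-to-many relationship is resolved into two one-to-many relationships, which is how it would typically be implemented as tables.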
The Bigger Picture
With all the discussion of notation, the bigger picture is often forgotten. Where do all these entities and relationships originate? They spring from the business requirements they are meant to serve. For years, studies have shown that the most frequent causes of project failure are business people not participating enough in the development of the system, developers not understanding the business or its objectives, and unclear or constantly changing business requirements. This is where the UML provides valuable capabilities beyond other traditional notations. Consider questions such as: Who will use the database, and how? Which data security restrictions need to be enforced, and which of them will be managed through different user views? A traditional logical data model describes the data that is needed, but it does not provide a business or system description of why and how that data will be used.
UML helps address these questions earlier in the development lifecycle, when you develop your conceptual model. You can use the UML’s use cases to model the system’s existing and/or desired functions. Because use case diagrams are simple, the business, the database team, and the development team can all easily understand them. Use case modeling is a simple way to a) understand the current business, b) elicit the desired requirements for the new system you are creating, and c) establish who will interact with the system and how.
In this way, you reach a clear understanding of what is expected of the system you must build and gain agreement from everyone involved.
Figure 4: Use Case Diagram
As you develop your use case models, you will discover the important business entities early in the lifecycle and reach a common agreement on their definitions. How many times have you seen disagreement over key domain entities in your business, such as Customer, Account or Agent? How many times has this gone unnoticed until it was too late, forcing you to compromise your database implementation?
Taking this approach to conceptual modeling, you may also choose to use UML sequence diagrams to show how the users of the system interact with the important entities you have defined. This not only elaborates the system design further but also begins to define the user transactions against the database. For example, Figure 5 shows which entities (e.g., Order, Treatment) the medical supply vendor needs to access.
Figure 5: Sequence Diagram
Only the Beginning
Of course, there is much more you can do with the UML when designing databases (see UML for Database Design by Naiburg and Maksimchuk). Simply by using the UML to create a valid conceptual model of the system, and then a logical model that realizes those business needs, you gain an excellent foundation for automatically transforming the logical model into an initial physical data model, which you can then elaborate to build your databases.
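As a rough illustration of what such a transformation produces, here is a Python sketch that turns one hypothetical logical entity into a CREATE TABLE statement. The Entity structure, the TYPE_MAP type mapping and the to_ddl function are assumptions made for this example; they are not the API of any particular modeling tool, and a real tool would handle much more (naming rules, indexes, DBMS-specific types).

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical, much-simplified description of one logical entity.
@dataclass
class Entity:
    name: str
    attributes: Dict[str, str]  # attribute name -> logical type (insertion order kept)
    primary_key: List[str]
    # column -> (referenced table, referenced column)
    foreign_keys: Dict[str, Tuple[str, str]] = field(default_factory=dict)

# Assumed mapping from logical types to generic SQL column types.
TYPE_MAP = {"integer": "INTEGER", "string": "VARCHAR(255)", "date": "DATE"}

def to_ddl(entity: Entity) -> str:
    """Emit a starting-point CREATE TABLE statement for one entity."""
    clauses = []
    for col, logical_type in entity.attributes.items():
        clause = f"  {col} {TYPE_MAP[logical_type]}"
        if col in entity.primary_key:
            clause += " NOT NULL"  # key columns may not be null
        clauses.append(clause)
    clauses.append(f"  PRIMARY KEY ({', '.join(entity.primary_key)})")
    for col, (ref_table, ref_col) in entity.foreign_keys.items():
        clauses.append(f"  FOREIGN KEY ({col}) REFERENCES {ref_table} ({ref_col})")
    return f"CREATE TABLE {entity.name} (\n" + ",\n".join(clauses) + "\n);"

# A hypothetical association entity linking Student and Course.
enrollment = Entity(
    name="Enrollment",
    attributes={"student_id": "integer", "course_id": "string", "grade": "string"},
    primary_key=["student_id", "course_id"],
    foreign_keys={
        "student_id": ("Student", "student_id"),
        "course_id": ("Course", "course_id"),
    },
)
print(to_ddl(enrollment))
```

The output is only an initial physical model: a designer would still choose DBMS-specific types, indexes and further constraints when elaborating it.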
Taking just these first steps empowers you, the data analyst, to become a true data steward. Because you now share a common language with business people and application developers, you can protect your company’s data assets from architectural decay, redundancy and other data degradation that causes additional maintenance, increased defects and even project failure.
By starting with the UML at the beginning of the development life cycle, all stakeholders (e.g., business sponsors, application developers, database developers) can have a common understanding of the project goals. Your UML models are the maps to your final destination. Once everyone understands where they are going and how to get there, your chances of successfully arriving at that destination are much greater.