Increasingly, large enterprises are recognizing the value of an enterprise data warehouse (EDW) in their information and knowledge strategies. The potential benefits include cost-effective consolidation of data for a single view of the business and creation of a powerful platform for everything from predictive analysis to near real-time strategic and tactical decision support throughout the organization.
Too often, a company's struggles with designing and implementing a logical data model (LDM) are the source of the problem. The LDM is of vital importance, largely because other key components of enterprise data management rely on it. It is especially disturbing that the frustrations around LDM implementations are often preventable, having more to do with people, process and organizational concerns than with the technology itself. If companies can better understand what is tripping them up when they create and implement LDMs, they will realize the full potential of their data warehousing projects much more quickly.
Why the LDM Matters
An LDM is a representation of business concepts laid out in a visual format that clearly shows these concepts and their various relationships. It is independent of the underlying database implementation. Designing an LDM that fits the needs of the business is crucial, not just because it reflects the commitment to treat data as a true enterprise asset but also because it enables efficient and effective storage of data that businesses can readily access to create various information and knowledge products. Conversely, a poorly designed LDM negatively affects many EDW components, making rework quite expensive. The LDM affects:
- The design of data movement techniques and standards, including extract, transform and load (ETL) and enterprise application integration (EAI), degree of normalization, use of surrogate keys and cardinality;
- Reference data;
- Business and technical metadata and the metadata repository; and
- Packaged applications, such as enterprise resource planning (ERP) or HR systems, which have their own LDMs that, in some instances, must work together with the overall enterprise model.
LDMs also serve as a foundation for data quality. Models that don't follow first normal form or have the wrong relationships often store duplicate data, resulting in loss of data quality. Proper modeling of items such as domains and data types helps validate data quality checks. LDMs also must comply with data governance guidelines and any overall data standards in the enterprise. The amount of history stored in a warehouse depends on the design of LDMs and physical data models (PDMs), thus influencing the storage strategy.
The Data Modeling Process
Building an EDW environment is not unlike building a house, which often includes a three-step modeling process that begins with a sketch showing the various rooms. This initial plan drawing does not include detailed construction-specific information such as the type of materials needed, but does include details such as room dimensions, plumbing details and electrical details. Think of this first sketch as the LDM.
When building a house, the architect hands the sketch over to a construction engineer, who creates detailed plans to construct the house. The construction engineer must apply his knowledge of materials and other factors to narrow down the design choices. The end product is an engineer's blueprint that specifies the various materials. For example, if he is using copper tubing for hot water plumbing, this drawing would specify the gauge and diameter of the pipe, among other things.
Finally, there may be many people living in the house with their own expectations about features and functionalities. Also, the same underlying feature may have different applications across the house. For example, even though water is used throughout the house, in the kitchen you need a standard sink and faucet with hot and cold water, in the bathroom you need a shower and sink, and in the swimming pool you only need cold water with an additional filtration/chlorination system. Just as you don't build a plumbing system to satisfy one room or person, you don't build a data infrastructure to satisfy the needs of just one business user. The semantic layer acts as the interface to the specific uses and needs of a business and uses the language/semantics that are understood by specific business areas.
Actual Modeling Processes
Logical data models: In an ideal scenario, the development of an LDM begins with a broad set of data requirements for the EDW. Using the requirements along with interviews with subject matter experts (SMEs) creates a conceptual model showing the relationships between key entities.
The next stage is detailed attribution, where you assign primary keys and verify the cardinality, optionality and identifiability of the various relationships. Detailed attribution is created based on the data available in the enterprise for specific applications. This involves mapping data elements in the source environment to the LDM. Further, with the help of SMEs, you also develop metadata (including domains, data types, definitions, comments and notes). Following this, peers and SMEs typically review the model. After review and, if necessary, revision, the model becomes available for the next step in the modeling continuum: the PDM.
Physical data models: For PDMs, data architects create index structures, perform selected denormalizations, create aggregations/summary tables and conduct performance tuning. They also handle staging area tables as part of physical data modeling. Because architects optimize PDMs for the underlying database architecture, PDMs are not easily interchangeable across database platforms.
Semantic data models (SDMs): Designed for specific end-user applications, SDMs give business users a view into the database that reflects specific business area semantics or terminology. SDMs are a layer on top of the PDM that uses database views as well as additional summary or other minor tables. If companies use reporting or OLAP tools, the semantic layer sometimes reflects those tools' characteristics.
Five Key Challenges to Designing and Implementing an LDM
While creating these models may seem straightforward in theory, in practice, a number of roadblocks can emerge that have complicated this critical stage of an EDW implementation. The five most significant roadblocks are:
- Lack of a common vision for the data warehouse: This has a cascading effect resulting in unclear requirements and a badly designed data warehouse. Added to these factors, unrealistic time frames and limited availability of SMEs can create the perfect storm. Some organizations try to get around these problems by assigning an application expert to oversee the process, but more often than not, this simply creates another layer of bureaucracy where unanswered questions often stall.
- Organizational muddle: Many large corporations result from M&As, with the parent company inheriting a smorgasbord of business and IT resources that may not fit efficiently into the larger organization. For example, if an IT group has typically worked in a data mart environment and now must create an EDW without clear guidance, the resulting model will likely comprise a series of disconnected and duplicate subject areas. An often-related problem is that some EDW projects begin without a clear delineation of responsibilities. This creates conflicting priorities and dilution of scope. For example, some groups may try to push pet projects, thus overburdening resources. Other times department heads may not see the value of the EDW - or feel threatened by it and so sabotage the process.
- Adapting to existing models and migrations: Many EDW implementations include a "forklift" of existing data from the data mart into the EDW. In these situations, companies do not usually model the data. Rather, they transfer it as is so its tables and columns are familiar to end users, eliminating the need for training. The problem is this creates unnecessary silos within the warehouse, thus undermining the warehouse's key advantages. Ultimately, the company will need to model the data anyway. While forklifts can be a necessary tactical decision to ease migration, they are not a final design solution.
- Source system-based design: If requirements are not clear or SMEs are unavailable, the easiest approach is to look at the existing system for an answer. Moreover, improvement in data profiling technologies has resulted in inferring primary key and foreign key relationships from source systems (relational or mainframe databases). Consequently, some companies use only source-based logical models. However, the result is that companies merely replicate the design of the source system application - an inefficient model that carries redundant information or modeling only those concepts that are represented in the source system. By not modeling business concepts that the enterprise needs, companies fail to infuse the logical model with a true enterprise focus. Another problem with source system reliance is that architects can mistakenly assume that processes where source system data elements are mapped to target physical models are part of the logical data modeling phase; this creates another type of scope creep.
- Trying to create an all-in-one model: For those who don't understand the value of LDMs, creating hybrid models that combine logical, physical and semantic information in a single model can also be a tempting trap. Such models are difficult to use and implement. In our blueprint analogy, imagine crowding parts and pieces of appliances, plumbing, electrical, HVAC, masonry, carpentry and cable information for one house in one blueprint. It would be a tangled mess that would only sow confusion, rather than provide direction.
Before proposing any specific solution for overcoming these challenges, organizations should conduct an assessment of existing tools, processes and organizational behaviors; the assessment can be formal if you have the necessary backing or informal if you don't. Either way, use the assessment to discover if any of the following solutions apply to your specific LDM challenges.
People and Organizational Solutions
Develop a common vision for the enterprise around the purpose and usage of the data warehouse. Along with the vision, a roadmap that shows the implementation schedule for various subject areas will be useful for all stakeholders.
Ensure there is a business and IT champion who can support the LDM development process. This champion should believe in the value of an EDW and have enough authority to overcome roadblocks.
Identify key stakeholders from the business and technology sides who are directly affected by the LDM development and identify their positions (support, neutral, oppose). Use the business champion to defuse opposition, create buy-in and align all key stakeholders. Create a process for keeping everyone informed of all key developments.
Use the business champion to ensure that only the relevant business and technology people are involved in the modeling effort and that SMEs are available to support the modeling effort.
Train key stakeholders. Bring the modeling team up to speed with logical data modeling methodology, especially those who have a data mart or application-based modeling approach. Modelers with domain expertise are ideal, but depending on team skills, training should include sections on topics such as normalization, subtype/supertype, standard modeling patterns and model management. Ensure business users and IT managers understand the value of having an enterprise logical model. Use samples to demonstrate how the LDM can help answer cross-business and ad hoc questions.
Develop a standardized logical modeling methodology with reusable design patterns. Because logical modeling is a fairly mature process, there are a number of white papers, presentations and textbooks that present reusable design patterns.
Develop a model management process. When multiple modelers work on the same model, there should be a process for change control, revisions, authentication and commitment of work.
Develop accelerators to speed up the JAD (joint application design) sessions and reduce business user time. In addition to making sure SMEs are available for validating the model, accelerators such as a straw model with key concepts and definitions help engender discussion and focus the JAD session. You can construct the accelerators based on prior experience, industry standards or other reference models.
Create a process for addressing and validating the lifecycle of an LDM, including requirements, design, development and testing. Design metrics to evaluate the efficacy of the lifecycle and the overall process. These metrics can address budget, usability, completeness and efficiency.
Accept the fact that in many cases, requirements are not available. If domain expertise is available in the modeling team, use that to come up with a set of skeleton requirements and validate these with the business users. You can also use source system design as a starting point to create a straw-man model that can serve as an accelerator.
Develop periodic status reports for all stakeholders to inform them of the progress and list any concerns.
There are a select number of tools available to develop logical data models, and it is essential to translate some of the processes described here into tool-specific templates. This could include such things as a formalization of the modeling conventions, naming standards and model management procedures.
Use a data profiling tool to arrive at standard domains and valid values, among other things. Store metadata required for the enterprise (such as definitions, data steward information, domain values, valid range, allowable values, data type information and relationship phrases) in the tool; make sure that other data management components, such as data quality and data movement, can extract and use that metadata. Most tools will also allow you to separate logical and physical modeling information.
Data warehousing is, at least in part, recognition that data is an extremely valuable enterprise asset that requires rigorous management. Logical data modeling is a key data management discipline, the success of which is critical to ensuring efficient and effective EDW implementation. Understanding and effectively facing the challenges in creating LDMs by using a combination of people, process and technology solutions will help enterprises successfully develop large EDWs.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access