A typical information management initiative starts out with a plan, but for a plan to provide true value to an organization, it needs to be flexible and able to evolve with the organization's constantly changing business policies and dynamics. This is especially true of design in the information management space. One may start out with a project blueprint that reflects what was relevant at conceptualization, but the design needs to stay flexible and correctly represent the evolving landscape over time.
For our purposes here, information design refers to the design of various structures required to store information in the information management space. The design of analytic components will be addressed later. This design phase of the lifecycle is critical, because this is where the business needs and rules are captured and implemented. As a result, the design phase will determine the success or failure of an initiative. If the business needs are not completely met at this phase, the analytics phase will not be able to deliver the accurate information required to make business decisions.
The nature of the requirements and business functions will determine the design methodology. Several popular design themes can be adopted based on the business requirements. If the solution addresses enterprise-wide requirements, where multiple functions are combined within a single system and the goal is analytics and reporting, a data warehouse may be the appropriate solution. If the solution is intended to address a single business function, where appropriate capture of business requirements and storage is the priority, then a normalized structure might be more appropriate.
Either way, the priority should always be the appropriate capture of business rules and the creation of a base layer, on top of which structures can be created that are suitable for analytical and reporting purposes, if required. This principle helps in determining whether to create normalized versus denormalized designs or third normal form versus data warehouse designs.
Examining these two approaches in greater detail will reveal the underlying characteristics of each. An understanding of the principles of relational database design is assumed here. A normalized, or third normal form, design lays out the relationships between the entities in a manner that reduces redundancy and eliminates repetitive information. It is the purest form of information design, one that accurately captures business rules and implements them. Hence it should always be the first choice of design. There may, however, be pressing reasons for not using this approach.
Most of these reasons are business-driven, where time and cost may be of greater importance than adhering to principles of design. Another reason may be that the intended goal of the initiative is to provide analytics and reporting to enable decision-making. Thus, ultimately and unfortunately, the reasons behind the choice of design are beyond the control of the people designing the system.
It is important to consult with the senior members of the design team when adopting an approach. This helps ensure that the basic tenets of design are adhered to and that corporate goals align with and account for the principles of good design. It also helps ensure appropriate implementation of the system and establishes a sound foundation to build upon.
An About Face
Having discussed a normalized or third normal form approach, I am going to contradict my earlier statement about the existence of a second approach: There should be no second approach.
The right approach is to always design a normalized, or third normal form, system first. This creates a solid foundation. Once this normalized design is in place, you can build a layer on top of it that accounts for analytical and reporting needs.
This second layer can be the data warehouse, operational data store, data marts or whatever you may want to call it. Of course, each of these analytical approaches comes with its own design nuances.
In building a system and sourcing information from multiple areas, information needs to be staged in a landing area. If an enterprise data warehouse is the ultimate goal, in the interest of time and cost, this landing area or an intermediary staging area can be designed in the third normal form. This ensures that whatever the ultimate goal may be, basic design principles are adhered to, and from that point an analytical system can be designed and built.
The Analytical Layer
Moving on to the design of the analytical layer: as mentioned earlier, this layer could take the form of an operational data store (ODS), a data warehouse or data marts. The business requirements and information processing realities will help determine the appropriate approach. Examples of such needs include information processing, information load times, real-time integration and real-time analytics.
These are some instances where an ODS style of design may be appropriate. At this point, let us consider an example. Take a simple, abbreviated product hierarchy of three levels: product line, product and product package.
Product line is the highest level in the hierarchy. Each product line consists of several products. In turn, each product comes in different packaging units in terms of quantities. Once a sale is made, we want to track the sale of these products. The analytic requirement is to analyze the sale of all product lines across all customers and all regions. There are two scenarios to consider here: one is getting the information into the structures designed, and the other is providing the information for analytical purposes.
To get the information into the structures, that is, to load the information, we need to consider processing speed and load times. Each independent entity (product line, product and product package) will have related entities that store numerical or transactional information such as quantities. Information is first loaded into the qualitative, or descriptive, structures, and then transactions are loaded into their respective structures.
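The hierarchy and load order described above can be sketched with Python's built-in sqlite3 module. This is a minimal illustration, not a prescribed design: the table and column names, the sample rows, and the choice to keep customer and region as plain text columns on the sale are all assumptions made for brevity. A fuller design would model customers and regions as entities of their own.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

# Hypothetical normalized structures: one table per level of the hierarchy,
# plus a transactional table that references the lowest level.
conn.executescript("""
CREATE TABLE product_line (
    product_line_id INTEGER PRIMARY KEY,
    name            TEXT NOT NULL UNIQUE
);
CREATE TABLE product (
    product_id      INTEGER PRIMARY KEY,
    product_line_id INTEGER NOT NULL REFERENCES product_line(product_line_id),
    name            TEXT NOT NULL
);
CREATE TABLE product_package (
    package_id INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    units      INTEGER NOT NULL
);
CREATE TABLE sale (
    sale_id    INTEGER PRIMARY KEY,
    package_id INTEGER NOT NULL REFERENCES product_package(package_id),
    customer   TEXT NOT NULL,
    region     TEXT NOT NULL,
    quantity   INTEGER NOT NULL
);
""")

# Step 1: load the descriptive structures, from the top of the hierarchy down.
conn.executemany("INSERT INTO product_line VALUES (?, ?)", [(1, "Beverages")])
conn.executemany("INSERT INTO product VALUES (?, ?, ?)",
                 [(10, 1, "Cola"), (11, 1, "Lemonade")])
conn.executemany("INSERT INTO product_package VALUES (?, ?, ?)",
                 [(100, 10, 6), (101, 10, 12), (102, 11, 6)])

# Step 2: load the transactions that reference those descriptive rows.
conn.executemany(
    "INSERT INTO sale (package_id, customer, region, quantity) VALUES (?, ?, ?, ?)",
    [(100, "Acme", "East", 5), (101, "Acme", "East", 2), (102, "Globex", "West", 3)],
)
conn.commit()

# A transaction that arrives before its descriptive row is rejected.
try:
    conn.execute(
        "INSERT INTO sale (package_id, customer, region, quantity) "
        "VALUES (999, 'Acme', 'East', 1)"
    )
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True
    conn.rollback()

sale_count = conn.execute("SELECT COUNT(*) FROM sale").fetchone()[0]
```

The foreign keys are what make the load order matter: a sale that references a package not yet loaded fails instead of silently creating an orphaned transaction.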
If the analytical requirement is to analyze information by product line, product and product package, then the structures need to be able to provide this information easily, with minimal logic. On the one hand, the normalized structures can be queried directly, which means joining the descriptive entities to the transactional ones each time the question is asked.
On the other hand, information can be denormalized, aggregated and stored in appropriate structures, in which case there is no need for expensive database joins at query time.
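To make the trade-off concrete, here is a small sketch that answers the same question both ways: once by joining normalized tables, and once by reading a pre-aggregated, denormalized table. The schema, the sample data and the name sales_by_line_region are hypothetical, and constraints are omitted to keep the example short.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE product_line (product_line_id INTEGER PRIMARY KEY, name TEXT);
CREATE TABLE product (product_id INTEGER PRIMARY KEY, product_line_id INTEGER, name TEXT);
CREATE TABLE product_package (package_id INTEGER PRIMARY KEY, product_id INTEGER, units INTEGER);
CREATE TABLE sale (sale_id INTEGER PRIMARY KEY, package_id INTEGER, region TEXT, quantity INTEGER);

INSERT INTO product_line VALUES (1, 'Beverages'), (2, 'Snacks');
INSERT INTO product VALUES (10, 1, 'Cola'), (20, 2, 'Chips');
INSERT INTO product_package VALUES (100, 10, 6), (101, 10, 12), (200, 20, 1);
INSERT INTO sale (package_id, region, quantity) VALUES
    (100, 'East', 5), (101, 'East', 2), (101, 'West', 3), (200, 'East', 10);
""")

# Normalized route: three joins are needed to roll packages up to product lines.
JOIN_QUERY = """
SELECT pl.name AS product_line,
       s.region AS region,
       SUM(s.quantity * pp.units) AS units_sold
FROM sale s
JOIN product_package pp ON pp.package_id = s.package_id
JOIN product p          ON p.product_id = pp.product_id
JOIN product_line pl    ON pl.product_line_id = p.product_line_id
GROUP BY pl.name, s.region
ORDER BY pl.name, s.region
"""
from_base = conn.execute(JOIN_QUERY).fetchall()

# Denormalized route: run the joins once at load time and store the result.
conn.execute("CREATE TABLE sales_by_line_region AS " + JOIN_QUERY)

# The analytical query is now a plain read of a single table.
from_summary = conn.execute(
    "SELECT product_line, region, units_sold "
    "FROM sales_by_line_region ORDER BY product_line, region"
).fetchall()
```

Both routes return the same rows here. The cost of the second route is that the stored table is a copy: it has to be rebuilt or refreshed whenever the underlying structures change.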
For these reasons, the design should support both business requirements and analytical needs. It makes sense to create the base layer and then build the requisite layers on top of it to support analytic needs and quick response times. This way the design does not compromise the support of business rules and business logic, which are captured in the base layer. All analytic and reporting needs can be provided for in abstractions created on top of the foundation or base layer.
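One lightweight form such an abstraction can take is a database view defined over the base layer. The sketch below is illustrative only: the names and data are hypothetical, and the package level is left out to keep it short. It shows a business rule changing in the base layer, where a product moves to a different product line, and the reporting abstraction following along without being redesigned.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
-- Base layer: the business rule "each product belongs to one product line"
-- lives here, as a foreign key on product.
CREATE TABLE product_line (product_line_id INTEGER PRIMARY KEY, name TEXT NOT NULL);
CREATE TABLE product (
    product_id      INTEGER PRIMARY KEY,
    product_line_id INTEGER NOT NULL REFERENCES product_line(product_line_id),
    name            TEXT NOT NULL
);
CREATE TABLE sale (
    sale_id    INTEGER PRIMARY KEY,
    product_id INTEGER NOT NULL REFERENCES product(product_id),
    quantity   INTEGER NOT NULL
);

INSERT INTO product_line VALUES (1, 'Beverages'), (2, 'Health');
INSERT INTO product VALUES (10, 1, 'Cola'), (11, 1, 'Vitamin Water');
INSERT INTO sale (product_id, quantity) VALUES (10, 4), (11, 6);

-- Reporting abstraction: defined on top of the base layer, stores no data itself.
CREATE VIEW sales_by_product_line AS
SELECT pl.name AS product_line, SUM(s.quantity) AS quantity
FROM sale s
JOIN product p       ON p.product_id = s.product_id
JOIN product_line pl ON pl.product_line_id = p.product_line_id
GROUP BY pl.name;
""")

def report():
    """Return {product_line: total quantity} as seen through the view."""
    return dict(conn.execute("SELECT product_line, quantity FROM sales_by_product_line"))

before = report()

# The business reclassifies a product; only the base layer is touched.
conn.execute("UPDATE product SET product_line_id = 2 WHERE name = 'Vitamin Water'")

after = report()
```

A view recomputes its joins on every read, so it trades response time for freshness. Where quick response times matter more, a stored aggregate built from the same base layer serves the same purpose.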
The discussion continues in Part 5, to be published on December 16.