Advances in software and computing technologies have led to great efficiencies by automating and streamlining manual processes while amassing large volumes of data. Entity-relationship modeling, the database design technique often used in these applications, organizes data by reducing data redundancy in a manner called data normalization. While this technique works well for transactional applications, performance and functionality issues arise as these databases are accessed for reporting, analysis and decision-making purposes because entities have indexes to optimize data entry and processing as opposed to data extraction. Because redundancy is eliminated, multiple entities must be accessed to obtain the desired information. As the number of relationships increases, so does the complexity of those relationships, which also impacts performance. Functionality issues occur because data needed for reporting and analysis is often stored in disparate databases, thereby causing individuals to access multiple databases to obtain the data they need. These issues spawned the movement toward the creation of data structures and information repositories.
Data needed for reporting and analysis can be arranged and stored in numerous structures including data files, denormalized tables, OLAP/multidimensional cubes, operational data stores (ODSs), data marts and data warehouses (DWs). Each is vastly different, and each serves a specific purpose. Data files are typically provided to external parties requesting a set of detailed information or are used internally for analysis by another application. Denormalized tables are designed to contain redundant information and do not adhere to the rules of data normalization. They are often created as a quick solution to satisfying multiple reporting and analysis needs for the same set of data. Similar to a denormalized table, an OLAP cube contains highly redundant data. It also contains dimensions, which are used for trend or comparative analysis. ODSs, data marts and data warehouses are data structures that use relational database management systems. Through their database schema, they provide broader capabilities by containing both detailed and summarized data while providing extensive reporting and analysis possibilities. To create an organized and comprehensive information repository for your organization, an understanding of the role of each data structure is needed.
Information Repository Architectures
The organizational structure of your company often influences whether the information repository architecture is centralized or decentralized. In a centralized architecture, an enterprise perspective can be created, thereby providing management with a broad range of analytics for monitoring, assessing and decision-making purposes. In a decentralized architecture, information repositories are created for each sponsoring business unit or group. This architecture is often reflective of standalone operations and satisfies the information needs of that respective group.
Within an information repository, there are several approaches to organizing data structures and designing technical architectures. The three common approaches presented here hold both benefits and drawbacks.
Federated Data Warehousing
The federated approach to data warehousing enables virtual access to disparate data sources as if they were one. In disparate information environments, individuals often seek an architecture that seamlessly integrates information without needing to tackle internal issues of data stewardship, nomenclature and standardization.
However, the practical realities of this approach highlight the issues of performance, integration and functionality required for an information repository. Current data integration technologies have made advances but have not yet become enterprise platforms for information. In addition, meta data standards among transactional systems and business intelligence (BI) applications hamper the effectiveness of federated data warehousing.
The technique for designing a star schema or dimensional model is the foundation of dimensional warehousing. It is an elegant approach to achieving the performance and functionality required for reporting, analysis and decision-making purposes. Conforming dimensions are incorporated into several star schemas, thereby reducing the number of similar data structures.
As a rule of thumb, dimensional warehousing is best suited for less than three unique source systems. This is due to the increased complexities of integrating data on the same subject matter from multiple unique source systems. For example, integrating customer data by matching customer name from two unique source systems is challenging but feasible in a dimensional model. As the number of unique source systems increases beyond two, the data integration challenges grow exponentially. To address the complex data integration challenges from multiple unique source systems, we have found that the corporate information factory (CIF) architecture works best.
Corporate Information Factory
The CIF is the most comprehensive and effective information repository architecture. It provides a framework for organizing data in a manner that supports the integration of subject areas and data supplied by various transactional systems. In a CIF, the data warehouse and/or operational data store are designed using entity-relationship modeling techniques. Data marts are then created from the DW or ODS. The design used to create the data marts is dimensional modeling, which provides the rapid extraction and functionality needed by information consumers.
For organizations with less than three unique source systems, the CIF architecture may be more work than is necessary to support the information needs of the organization.
An information repository contains different types of data structures to satisfy the needs of information consumers. Organizing and cataloging these data structures creates an information repository architecture that fosters efficiencies and data consistencies needed for reporting, analysis and decision-making purposes. The CIF and dimensional warehousing are the two best architectures to consider.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access