EIS effort in the industry today is increasingly facing the challenge of providing a robust, yet flexible, enterprise architecture for data mart delivery to end users within a short period of time and with limited resources. This mart delivery architecture further needs to support a diverse set of end users ranging from power users having good knowledge of SQL to the true end users wanting to look at canned reports or perform OLAP queries. In addition, this architecture has to be flexible enough to support data delivery in ROLAP, DOLAP, Excel spreadsheets or simply flat file formats. This article proposes an enterprise architecture for delivering dependent data marts that will satisfy varying end-user DSS needs. One of the main features of the proposed architecture is its ability to allow the reuse of dimensions across the enterprise. The focus of this article is on the layer between a data warehouse and data marts. We call this layer the "Enterprise Mart Cooker."
FIGURE 1: Context-level view of proposed Enterprise Mart Cooker
There are certain key assumptions necessary for a successful implementation of the proposed architecture. The assumptions are as follows:
- The Enterprise Mart Cooker is part of a dependent data mart environment. It is assumed, therefore, that a data warehouse will feed this layer with data that has been extracted, cleansed and transformed.
- The concept of a data mart in our context transcends its traditional definition to include broad data formats ranging from relational to dimensional cubes to flat files such as Excel spreadsheets. This architecture will accommodate the broad definition of a data mart and yet provide cost effective means to deliver them.
Description of System Components
1. Mart Cooker Input Staging Layer
This is an area where the data warehouse extracts are initially stored. Extracted data is not in a dimensional form since the data warehouse is mostly relational. This layer allows de-coupling of the data warehouse and the data mart environments that have different data organizations--relational versus dimensional. Advantages of the input staging layer are threefold. First, this layer will allow a very simple restart and recovery mechanism for mart cooker processes by buffering the extracted warehouse data. Second, the source and target specification between the data warehouse and the input staging layer will be very simple due to the intermediate, non-dimensional data organization of the input staging layer which eliminates the relational-to-dimensional transformation process detail; and third, since the data is staged within an RDBMS, database features such as stored procedures, triggers, etc., can be effectively used to simplify the programming effort and hence simplify the data mart building process.
2. Mart Cooker Process Layer
This layer consists of a set of processes that create facts and dimensions using data from the input staging layer and then storing the newly created mart in an output staging layer for delivery to the end-user environment by the mart delivery layer. While creating a data mart, these processes can reuse pre-existing dimensions from the corporate dimension repository instead of recreating them, thus eliminating the redundancy and duplication of effort. New dimensions created during the mart building process, that have reuse potential, can be added by the cooker process to the corporate dimension repository while the existing dimensions can be refreshed. Reuse will greatly reduce the subsequent mart building time since less code has to be written to create a new one. One of the key features of this layer is its ability to allow the generation of data marts having structurally varying data formats. Furthermore, this architecture lends well to automating the mart creation process itself using various configuration techniques. For example, a user interface can be used to input configurable parameters such as source and target mapping specifications between the input and output staging layer, information as to whether a dimension should be created from scratch or can be reused from the corporate dimension repository and transformation rules necessary to create the dimensional model, followed by automatic code generation. We feel that the automation aspect of this architecture has a huge potential for return on investment by reducing the cost and level of staffing. Therefore, we recommend investing a good amount of time and effort in developing such a process.
FIGURE 2: Drill- down view of the Enterprise Mart Cooker detailing its major components
3. Corporate Dimension Repository Layer
A corporate dimension repository is a store for commonly used business dimensions within a company. The mart cooker processes store newly created dimensions that can be reused into this repository and retrieve existing dimensions during mart creation. This is a vital component of the overall architecture since it promotes knowledge reuse and enforces a consistent view and common vocabulary across the enterprise. For example, once a customer dimension and its hierarchy have been established over a period of time, this dimension can be reused by any organization within the company without having to reinvest time and resources. As a result, every organization using the customer dimension will have the same view of their customers, thus promoting a common vocabulary. Also, the reuse of a corporate dimension will help reduce time and cost involved in building new data marts, which is critical to the success of an enterprise effort. Technical meta data (specific to vendor implementation) can coexist with business dimensions to provide reusability downstream.
4. Mart Cooker Output Staging Layer
Newly created marts (by the mart cooker processes) are stored in this area before delivery to the end-user environments. This layer provides easy restart and recovery for the mart delivery layer. Furthermore, since the mart data is staged for a certain period of time, delivery to multiple destinations is easily accomplished. This layer helps in pulling computation away from end-user environments, thus reducing end-user hardware capacity needs and resource requirement. Additionally, by facilitating the creation and staging of marts in a common area, this layer eliminates the redundant effort required for creating the same data mart by multiple end users. Finally, this layer also provides end users the option of not needing a production-supported environment since the mart can be easily sent within the specified mart retention time period.
5. Mart Delivery Layer
This layer supports and facilitates the transportation of data marts to end-user environments. Since heterogeneous mart environments (such as relational, flat files or dimensional) have to be supported by this layer, different mart loading techniques will have to be used. Middleware, such as publish and subscribe, may be used for incremental or complete loading of data across the network. The delivery mechanism can be as simple as FTP of flat files or as complicated as the use of a middleware tool that will invoke a loader in the target database.
It is very important to implement a business process that will effectively utilize key features of this architecture on an enterprise level. We believe that the steps listed below can serve as a template to formulate and refine the enterprise mart delivery process.
1 User submits initial request for building a new data mart. This request includes a statement of intended use (objective, etc.).
2 Mart team conducts interviews with the user group and gathers requirements.
3 Business analyst on the mart team develops a dimensional model.
4 Initial extraction request for data from the warehouse is prepared by the data mart team. This specification consists of high-level item descriptions needed for creating the data mart. The mart team is assumed to have no knowledge of warehouse data structure. This specification is forwarded to the warehouse team.
5 The warehouse team investigates specifications submitted by the mart team and creates source mapping specifications. If a field is not found in the warehouse, the warehouse team initiates an ETL process for this data and communicates the expected date of delivery to the mart team. This source specification document is then submitted back to the mart team.
6 Based on the detailed source specification provided by the warehouse team, the mart team designs the target specification for the input staging area. The layout and the loading frequency are added to the target specification document by the mart team. This specification is then handed back to the warehouse team.
7 The warehouse team develops the ETL necessary to populate the input staging area.
8 Using the source specification document from the warehouse team, the business dimension model developed by the mart team and the corporate dimension repository data model, the mart team determines whether existing dimensions can be reused from the corporate dimension repository. Dimensional transformation needed to create the mart from the input staging area is also specified.
9 The mart team designs and develops mart cooker processes for building the new mart. As an example, the building of a cooker process can be as simple as entering the dimensional transformation specification via a user interface followed by automatic code generation.
10 Delivery specification is created by the mart team with help from the user group.
11 Using the delivery specification, the delivery process is built to transport the newly created data mart to its destination. The delivery mechanism of data mart can be as simple as an FTP of flat file (in case of Excel format) or as sophisticated as the use of middleware that will transport data across the network and invoke the loader for the target database.
There are several advantages of the proposed Enterprise Mart Cooker architecture. They are as follows:
- This architecture is simple, scalable, multitiered and easy to implement.
- It allows creation of marts in any data format such as relational, dimensional or flat files.
- It allows the reuse of dimensions and technical meta data across the enterprise which ultimately reduces the cost and mart creation cycle time.
- It promotes corporate-wide common business vocabulary by sharing common business dimensions.
- Modular nature of the architecture allows a simple interface specification between the data warehouse, input staging layer, mart cooker processes, output staging layer and the mart delivery layer.
- Marts can be maintained with or without the production support staff since it is easy to resend newly built marts from the output staging layer.
- Architecture allows for the automation of the data mart creation process.
- Mart cooker layer pulls the number crunching involved in mart creation away from the end users into an IT function so that the end-user DSS server can remain simple, inexpensive and small.
- The architecture does not tie any specific DSS vendor or database methodology to its design and is truly open for the enterprise.
- Centralized location of information ensures data consistency across the corporation.
Enterprise decision support efforts are very costly to implement. In order for these efforts to succeed, they have to be cost effective. Cost effectiveness can only be achieved by automation and reuse of information. This architecture provides the means to achieve both of these objectives. It also emphasizes the need for having a back- end data warehouse that will feed clean and transformed data to the Enterprise Mart Cooker. In this article we have introduced two important concepts (corporate dimension repository and mart cooker process) that will be critical to the success of any enterprise EIS effort.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access