With the front-end focus of many BI projects (after all, this is what sells BI engagements and what the business users see at the end of the day), it is not surprising that organizations often lose sight of the workhorse of the BI effort - the back end of the production line where data is acquired, cleansed and transformed into useful information for downstream consumption. This effort often accounts for at least 70 percent of the overall development effort of a typical BI project and, if not architected correctly, renders the presentation of data useless. Garbage in is garbage out, whether the data is coming from the source systems or an enterprise data warehouse.
Typical high-level enterprise data warehouse architectures contain several noticeable key components. We've all seen them, and most are usually cookie cutter in nature, as illustrated in Figure 1. However, the most important component is often talked about briefly prior to the beginning of an engagement and given the lowest priority once development has started. I am talking about quality control, a component that is not solely concerned with the quality of the data as it moves through the different layers of the architecture (as important as that is), but one that focuses on the quality of the ETL deliverables and processes. Implementing a quality solution from a back-end perspective is not an easy task. It should be no less complex than implementing a quality OLTP or ERP solution. In many cases, it may be more complex. However, it is not an impossible task, and when architected correctly, it can be leveraged for future phases and/or future BI efforts.

Figure 1: EDW Architecture
The key to building a quality ETL solution is to define and develop standard processes and templates throughout the architecture, design and development phases of the project and ensure that those standards are leveraged correctly. These standards encompass more than just corporate naming standards for database development; they also include process standards, standards for all ETL processes and data movement activities across the entire data architecture. Developing a consistent set of standards ensures conformity across the components, reduces development and testing times, simplifies maintenance and reduces total cost of ownership. Also, because many BI efforts are now developed using a combination of local and contractor resources, including offshore development and support models, it is increasingly important that these solutions are designed and built with solid common standards and processes in place that have been tried and tested.
Developing Standards
As different as most BI projects are, from an enterprise perspective the architectures are very similar, and standards can be developed for many of the components of the back-end architecture. The minimum standards that should be defined include:
- Database naming standards for the enterprise data warehouse (EDW) and/or data marts, including intermediate staging areas;
- ETL standards for all the ETL components, whether using an ETL tool or other methods for ETL; and
- Operating system and standards for security where appropriate.
These standards, however, only scrape the surface when it comes to developing a quality solution that should also include:
- Defining and/or developing processes for all common components of the architecture;
- Defining and implementing procedures and templates for ensuring that standards are followed and implemented correctly; and
- Defining appropriate levels of testing and oversight to ensure that all deliverables have been tested thoroughly and implemented consistently.
Common Process Standards
Standards should be applied throughout the back-end architecture, leveraging existing internal and vendor standards where appropriate. However, to be truly effective it is important to standardize all back-end processes. Defining standards for these processes up front is one which will certainly pay off during other development initiatives, as future phases and projects can be implemented more quickly and new resources can get up to speed in a shorter time frame. From an enterprise's perspective, activities that can and should be standardized include:
- Data acquisition and staging activities,
- Model and ETL standards for maintaining history in the different layers of the architecture,
- Data quality and exception handling activities,
- Conformed dimension strategies and standard methodologies for maintaining dimension and fact tables with considerations for maintenance and performance, and
- Metadata management and integration with ETL metadata.
Data Acquisition and Staging Standards
Data acquisition and staging processes should be defined with considerations for efficient data extraction and to minimize intrusion against upstream source systems. Activities that can be standardized include:
- Defining the formats and content for source system extracts and acquisition methods (including file naming standards, formats and delivery mechanisms);
- Defining additional staging metadata (used to support common downstream processes such as auditing); and
- Defining common processes for all data staging activities.










Be the first to comment on this post using the section below.