JUL 1, 2007 1:00am ET

Related Links

When Fast is Not Enough
July 18, 2008
TopQuadrant Software Imports Email MetaData into Semantic Applications
March 26, 2008
An Open Challenge to the Open Source Community
November 30, 2007

Web Seminars

6 Key Things to Fast Track your Mobility Strategy
February 23, 2012
Why Getting Started in MDM Doesn't Have to Be Difficult
February 29, 2012
Dashboards: How's Business? Ask your Data!
March 15, 2012

Building a Quality BI Framework Solution Starts with a Quality ETL Solution

Print
Reprints
Email

Over the past 10 years, the business intelligence (BI) community has been bombarded with an ever-increasing number of emerging vendors and products, touting new and better functionality than before with more buzzwords and acronyms than even your most savvy IT professional can keep up with. The world of reporting, database and extract, transform and load (ETL) tools as separate commodities is starting to fade as vendors are offering one-stop shopping for all your BI needs. In particular, many database and reporting tool vendors now provide ETL functionality as part of their BI suite, either through enhanced functionality or via acquisition of third-party software. However, even with the tremendous availability of software and functionalities on the market, there is still no shortcut to building a quality BI solution, even with the best tools available. In fact, the task often becomes more difficult as enterprise solutions are often built using best-of-breed technologies from different database, ETL and BI/reporting tool vendors. Developing a quality solution requires quality within and across all components of the BI architecture.

With the front-end focus of many BI projects (after all, this is what sells BI engagements and what the business users see at the end of the day), it is not surprising that organizations often lose sight of the workhorse of the BI effort - the back end of the production line where data is acquired, cleansed and transformed into useful information for  downstream consumption. This effort often accounts for at least 70 percent of the overall development effort of a typical BI project and, if not architected correctly, renders the presentation of data useless. Garbage in is garbage out, whether the data is coming from the source systems or an enterprise data warehouse.

Typical high-level enterprise data warehouse architectures contain several noticeable key components. We've all seen them, and most are usually cookie cutter in nature, as illustrated in Figure 1. However, the most important component is often talked about briefly prior to the beginning of an engagement and given the lowest priority once development has started. I am talking about quality control, a component that is not solely concerned with the quality of the data as it moves through the different layers of the architecture (as important as that is), but one that focuses on the quality of the ETL deliverables and processes. Implementing a quality solution from a back-end perspective is not an easy task. It should be no less complex than implementing a quality OLTP or ERP solution. In many cases, it may be more complex. However, it is not an impossible task, and when architected correctly, it can be leveraged for future phases and/or future BI efforts.


Figure 1: EDW Architecture

The key to building a quality ETL solution is to define and develop standard processes and templates throughout the architecture, design and development phases of the project and ensure that those standards are leveraged correctly. These standards encompass more than just corporate naming standards for database development; they also include process standards, standards for all ETL processes and data movement activities across the entire data architecture. Developing a consistent set of standards ensures conformity across the components, reduces development and testing times, simplifies maintenance and reduces total cost of ownership. Also, because many BI efforts are now developed using a combination of local and contractor resources, including offshore development and support models, it is increasingly important that these solutions are designed and built with solid common standards and processes in place that have been tried and tested.

Developing Standards

As different as most BI projects are, from an enterprise perspective the architectures are very similar, and standards can be developed for many of the components of the back-end architecture. The minimum standards that should be defined include:

  • Database naming standards for the enterprise data warehouse (EDW) and/or data marts, including intermediate staging areas;
  • ETL standards for all the ETL components, whether using an ETL tool or other methods for ETL; and
  • Operating system and standards for security where appropriate.

These standards, however, only scrape the surface when it comes to developing a quality solution that should also include:

  • Defining and/or developing processes for all common components of the architecture;
  • Defining and implementing procedures and templates for ensuring that standards are followed and implemented correctly; and
  • Defining appropriate levels of testing and oversight to ensure that all deliverables have been tested thoroughly and implemented consistently.

Common Process Standards

Standards should be applied throughout the back-end architecture, leveraging existing internal and vendor standards where appropriate. However, to be truly effective it is important to standardize all back-end processes. Defining standards for these processes up front is one which will certainly pay off during other development initiatives, as future phases and projects can be implemented more quickly and new resources can get up to speed in a shorter time frame. From an enterprise's perspective, activities that can and should be standardized include:

  • Data acquisition and staging activities,
  • Model and ETL standards for maintaining history in the different layers of the architecture,
  • Data quality and exception handling activities,
  • Conformed dimension strategies and standard methodologies for maintaining dimension and fact tables with considerations for maintenance and performance, and
  • Metadata management and integration with ETL metadata.

Data Acquisition and Staging Standards

Data acquisition and staging processes should be defined with considerations for efficient data extraction and to minimize intrusion against upstream source systems. Activities that can be standardized include:

  • Defining the formats and content for source system extracts and acquisition methods (including file naming standards, formats and delivery mechanisms);
  • Defining additional staging metadata (used to support common downstream processes such as auditing); and
  • Defining common processes for all data staging activities.

Data Quality and Process Standards

Filed under:

Advertisement

Comments (0)

Be the first to comment on this post using the section below.

Add Your Comments:
You must be registered to post a comment.
Not Registered?
You must be registered to post a comment. Click here to register.
Already registered? Log in here
Please note you must now log in with your email address and password.
Twitter
Facebook
LinkedIn
Login  |  My Account  |  White Papers  |  Web Seminars  |  Events |  Newsletters |  eBooks
FOLLOW US
Please note you must now log in with your email address and password.