One important trend in the business intelligence (BI) marketplace is toward DBMS vendors (such as IBM, Microsoft and Oracle) integrating ETL, OLAP and data mining features into their database products. Given that many of these capabilities are available for free or at little cost, organizations are asking, "Why should we purchase a third-party BI tool when we can get the BI capabilities for free from our DBMS vendor?" In the next two columns, I will try to answer this question.

To evaluate database ETL processing, organizations need to understand the level and type of ETL capabilities being offered by DBMS vendors. These fall into three basic groups:

  • Coexistent ETL facilities that can be directly exploited by ETL tools and user-coded applications to improve productivity and performance.
  • Substitute ETL facilities that replace certain types of ETL tool processing by equivalent capabilities in the DBMS, with the objective of reducing ETL development time and eliminating the need for certain ETL tool components.
  • Competitive ETL tools that directly compete with third-party BI products.

Two examples of coexistent ETL facilities are DBMS improvements to the manipulative and analytical power of the SQL language for data transformation and aggregation, and the ability for developers to code and maintain database stored procedures written in either a scripting language or a programming language such as Java. User-coded ETL applications and vendor ETL tools can both make use of these capabilities. The primary benefits of doing this are that the DBMS can exploit its optimization, parallel processing and scalability features to provide good performance when executing these SQL statements and stored procedures. The ability to maintain stored procedures in a common recoverable data store is also an advantage.

One example of a substitute ETL facility is the ability of DBMS products to use SQL or stored procedures to access foreign data stores and application middleware. This facility could, in some situations, eliminate the need for third-party ETL tool adapters that access source information during extract processing. DBMS vendors, for example, are now beginning to provide the ability to issue SQL statements that can interact with Web services-based applications and the queues of messaging products. These latter facilities are particularly useful for building and maintaining a near real-time data warehouse. One key advantage of the third-party ETL products here, however, is that they support a wider range of data sources and usually have better meta data facilities in this area than DBMSs.

Another example of a substitute ETL facility is a materialized view.1 These view definitions can be used to define an aggregated summary table that is automatically updated whenever one or more associated tables of detailed data are modified. This means that as detailed data flows in a data warehouse, summary tables are automatically maintained by the DBMS. It is important to point out that summary tables created using materialized views may not be as flexible as independent summary tables created by ETL tools because of restrictions in materialized view implementation of some database products.

Most DBMS vendors offer graphical ETL development tools that exploit the ETL capabilities of their database systems. These competitive ETL tools provide many of the features offered by third-party ETL tools but at zero cost, or for a license fee that is a fraction of that of independent tools. Examples here include the IBM Data Warehouse Center (which is provided with DB2) and DB2 Warehouse Manager, Microsoft Data Transformation Services (which is included in SQL Server), and Oracle9i Data Warehouse Builder (which is a component of the Oracle9i Developer Suite).

At present, DBMS ETL development tools vary considerably in the functionality they provide; and independent ETL tools still offer important additional benefits in terms of the data sources supported, transformation power and administration capabilities. The DBMS ETL tools, however, are very useful for building departmental data marts.

I expect DBMS vendors to continue to enhance their ETL capabilities and compete aggressively with the third-party ETL tool suppliers to increase their share of the BI market. The recent release of Oracle9i Data Warehouse Builder and Microsoft's plans for the next release of SQL Server (code name Yukon) are a clear indication of this. IBM is also continuing to improve DB2's ETL capabilities, but is not as aggressive in its development plans or marketing as Microsoft and Oracle.

In summary, the ETL capabilities being offered by DBMS vendors both enhance and compete with independent ETL tools. It is likely in many installations that a mix of DBMSs, third-party software and in-house developed ETL capabilities will be used, and careful determination of requirements and evaluation of products is therefore required.

Reference:
1. An excellent book on the use of SQL extensions and materialized views for BI can be found in the IBM Redbook entitled "DB2 UDB's High Function Business Intelligence in e-business," IBM document number SG24-6546. This book can be found online at ibm.com/redbooks.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access