The Internal Revenue Service (IRS) revised its mission statement in September 1998, to reflect its renewed commitment to America's taxpayers. Our new mission statement reads as follows: "Provide America's taxpayers top quality service by helping them understand and meet their tax responsibilities and by applying the tax law with integrity and fairness to all."
Achieving our new business vision and objectives is the highest priority of the IRS. I firmly believe that a large degree of our success will hinge on our ability to align information technology (IT) with this business vision. Alignment is a fundamental strategic concept and identifies the ways in which multiple groups cooperate and collaborate to achieve a common agenda.

We determined that an analytical processing (decision support) architecture was necessary to build the foundation which would enable IRS research and business communities to analyze and respond creatively and proactively to their business environment.

As part of the analytical processing architecture, we've developed a Compliance Data Warehouse (CDW) to leverage new and emerging decision support technologies. The CDW implementation has enabled the IRS, for the first time, to provide the research and business communities with a decision support system that can model a myriad of factors which affect the decision making process.

The IRS' Compliance Data Warehouse

The CDW application manages three terabytes of storage with approximately 1.2 terabytes of raw data and provides a mechanism for users to query this data using a wide range of analytical tools.

CDW comprises the following components: Sybase Adaptive Server IQ; Sybase PowerBuilder; dual Sun Enterprise 6000s running Solaris 2.6, with 32GB of RAM; Sybase Open Client and Open Database Connectivity (ODBC) APIs; and EMC data disk arrays.

Building a data warehouse of this size and complexity posed many challenges. The primary technology challenge was to build a system that could manage multiple terabytes of data and yet was sufficiently open to facilitate queries from a variety of different off-the-shelf products. We selected Sybase IQ as the data management server based on its strength with decision support type queries. The IQ server can process queries from any application built with the open client or open database connectivity (ODBC) application programming interfaces (APIs). This provides sufficient flexibility to connect users who want to use a wide variety of off-the-shelf tools. This capability minimized the need to retrain users on new tools and helped gain user acceptance for this system.

In addition, we selected Sun Microsystems' hardware as our server architecture. We currently have dual Enterprise 6000s running Solaris as our server platform. The Sun architecture provides an open, scalable computing environment that can grow with the business needs of the organization. Also, Sybase IQ and Solaris are a well-established high-performance combination.

Sybase Professional Services has played a major role in the project. Sybase was responsible for designing the complex data models, doing data staging and transformation, and populating the data warehouse. In addition, it played a significant role in the development of our CDW Web site and was also responsible for developing a front-end application for our accounts-receivable data.

The CDW Web site is also a key part of the system architecture. This site publishes all meta data for the warehouse. This meta data includes entity relationship diagrams, detailed table and file descriptions, established rules for querying columns and tables and listings of information currently in the warehouse. This site manages meta data for the IQ warehouse and for other flat file, SAS and SPSS data sets. This site is extremely important to our users since they all have different data requirements and need a detailed understanding of the available data to properly develop queries. This site also distributes connectivity software and useful templates to the user community.


The IRS' Compliance Data Warehouse (CDW) represents the most innovative uses of new and emerging decision support technologies for enabling Compliance business units to manage information. CDW has achieved what has been difficult for most IT efforts: the alignment of information technology with the business strategies of the organization. Our return on investment (ROI) is already approximately 200:1. As additional objectives are achieved through CDW, we anticipate that the ROI will rise further.


We found three major kinds of challenges: technical, organizational and resource. A critical aspect of building a data warehouse is the transformation of legacy data to the analytical environment. With the many tax-law changes over the years, the data structure of IRS data sets is constantly changing. Integrating the various data sets of multiple years proved to be a significant challenge to the project team.
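To illustrate the multi-year integration problem, here is a minimal Python sketch that maps records from year-specific layouts onto one common set of columns. The layouts, column names, and the `to_common` helper are all hypothetical; the article does not describe the actual IRS data structures or the tooling the project team used.

```python
# Hypothetical per-year layouts: source column name -> common column name.
# These names are illustrative only and do not reflect real IRS data sets.
LAYOUTS = {
    1996: {"id": "taxpayer_id", "agi": "adjusted_gross_income"},
    1997: {
        "taxpayer_key": "taxpayer_id",
        "agi_amt": "adjusted_gross_income",
        "credit_amt": "new_credit",  # a column that exists only from 1997 on
    },
}

COMMON_COLUMNS = ["tax_year", "taxpayer_id", "adjusted_gross_income", "new_credit"]


def to_common(tax_year, record):
    """Rename one source record's columns to the common layout.

    Columns missing from a given year's layout are left as None, and
    source columns with no mapping are dropped.
    """
    mapping = LAYOUTS[tax_year]
    row = {column: None for column in COMMON_COLUMNS}
    row["tax_year"] = tax_year
    for source_name, value in record.items():
        target = mapping.get(source_name)
        if target is not None:
            row[target] = value
    return row
```

Keeping the year-to-year differences in a lookup table rather than in code means that a new tax year with a changed layout only requires a new table entry.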

In addition, the data volumes were larger than anything we'd previously implemented on our server products. We accepted some risk in this area, but found that the server environment scaled to handle this load.

The source-data formats also presented technical difficulties for the team. Many of these data sets were available separately but had never been integrated into a single consolidated set. Some of the source data originated from hierarchical sources and had to be reorganized into an easily understandable relational database structure. Finally, this data had to be converted from a variety of different formats into an ASCII format compatible with the load process.
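The reorganization described above can be sketched in Python as follows. This is only an illustration under stated assumptions: the nested record shape, the field names (`account_id`, `transactions`, and so on), the pipe delimiter, and both helper functions are hypothetical, and the article does not say what load format or tools the team actually used.

```python
import csv
import io


def flatten(parents):
    """Split nested parent/child records into two flat row lists.

    Each child row carries the parent's key so the two lists can be
    loaded as related tables.
    """
    parent_rows, child_rows = [], []
    for parent in parents:
        parent_rows.append(
            {"account_id": parent["account_id"], "name": parent["name"]}
        )
        for child in parent.get("transactions", []):
            child_rows.append(
                {
                    "account_id": parent["account_id"],
                    "code": child["code"],
                    "amount": child["amount"],
                }
            )
    return parent_rows, child_rows


def to_ascii_delimited(rows, columns, delimiter="|"):
    """Render rows as delimited text containing only ASCII characters.

    Non-ASCII characters are replaced with '?' (an assumed policy).
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter=delimiter, lineterminator="\n")
    for row in rows:
        writer.writerow(
            [str(row[c]).encode("ascii", "replace").decode("ascii") for c in columns]
        )
    return buffer.getvalue()
```

Calling `flatten` on a list of nested records yields one row list per target table, and `to_ascii_delimited` turns each list into text that a bulk loader could read.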

The most difficult organizational challenges were educating the organization on the differences between operational (OLTP) and analytical processing architectures, and on the benefits of each, as well as convincing it to accept the risks associated with the investment.


Although there are many factors to consider when building a data warehouse, several come to mind that may be of assistance.

First, it is important to gain an understanding of the "big picture" for your analytical processing environment and then build it one step at a time. Don't try to conquer all of your organization's needs and problems at once. Rather, adopt an iterative approach to the development of your warehouse, not only because it is the best way to build such an application but also because it lets you show return on investment early on. Remember that the data warehouse is data-driven and not the automation of business processes.

Second, deliver value: return on investment. A data warehouse is no good unless you deliver value to the users. Then get ready. Once the business units start to gain access to strategic information, their appetite for more information will be enormous.

Third, leverage your existing network and technology as much as possible to reduce costs and schedule.

Fourth, make sure you research your target technology thoroughly, and don't be afraid to make vendors prove their claims, because you must build an open, scalable computing architecture that can grow with the needs of the organization.

Fifth, make sure that the business drives the evolution of the data warehouse, because your goal is to provide the business functions with the information necessary for them to respond creatively and proactively to their environment.

Sixth, don't underestimate training and the costs associated with data transformation.

Finally, find an executive in your organization who has the vision to support your effort and then stay the course. You'll probably encounter a lot of opposition within your organization when it comes to investing in a data warehouse, especially if your company is rooted in mainframe technology.
