If you work for a major U.S. corporation and participate in a 401(k) and health benefits plan, the data and bookkeeping for your plans will likely pass through the massive "retirement warehouse" of Hewitt Associates, a global management consulting and benefits delivery firm.

Hewitt manages benefits and retirement programs for more than 2,400 companies worldwide, including two-thirds of the Fortune 100. Headquartered in Lincolnshire, Illinois, we have offices in 34 countries.

Client plans may have 100,000 or even a million accounts which must be updated monthly with income, investment and historical information. In addition, we publish a widely used index of 401(k) investment activity. Designed to provide insight into 401(k) plan participants' investing behavior, the Hewitt 401(k) Index tracks investment transfer activity of nearly 1.5 million employees in large employer plans with approximately $68 billion in collective assets. Data management is vital to producing the index and to serving our clients and their millions of employees.

Data Warehouse Created

Warehousing is the key to our data management strategy. Our current data warehouse structure goes back to 1996. As the company grew rapidly, the volume and variety of information requests from clients was becoming overwhelming. Moreover, the data to respond to those requests was nonuniform and scattered throughout the company and among our many clients.

We needed to improve our ability to share and analyze information across a highly distributed organization. Our existing tools couldn't meet the increasing level of client-specific analysis we required. To spearhead the company's response, Hewitt formed a data warehouse unit.

Our warehouse group made a fundamental decision to move from a mainframe (MVS) to a UNIX platform to take advantage of the flexibility offered by UNIX. For software to serve as the workhorse of the system, we chose SyncSort UNIX (from Syncsort Inc., Woodcliff Lake, New Jersey).

Flat File Philosophy

Our philosophy is that flat file processing is a faster, more efficient way to perform data transformations on source files and to position data to be loaded into the warehouse, as opposed to doing everything within the database itself. We want to be independent of the database engine.

In addition to providing speed, processing outside the database insulates processing from possible changes. If we were to change versions of the database or change the database entirely, that would not affect the extract, transformation, load (ETL) process.

We have used the sort utility on flat files in a variety of ways to save time and achieve our goal of database engine independence. A dramatic example is that by using SyncSort to reformat and choose only the data we need from the mainframe, we cut transfer time over the network from eight hours to 10 minutes.

Mapping to Standard Values

Creating the retirement warehouse presented immense challenges. Retirement plans and data models differ from client to client. The new warehouse had to communicate with clients, all with different-looking data; at the same time, the warehouse data had to conform to a generic model to facilitate analysis.

The warehouse uses only five employment codes. However, Hewitt has customers who use 20 different employment categories, and different clients use different codes. Thus, it was necessary to develop a method of mapping the many client-specific values to the five standard warehouse codes. The solution was to use SyncSort to do the mapping during the ETL process for several fields.

The result is that both the client and standard codes are available in the warehouse, giving Hewitt the analytic flexibility it needs. Fields containing client-specific values are used to generate client reports, while fields containing the warehouse standard values generate cross-customer reports.

We had a related mapping problem where we used SyncSort to replace a slow C program and reduced what would have been a 10-hour job to 10 minutes. One table required that we convert 10 codes, and our tests indicated the C program would need one hour per code. Using SyncSort, we created a data conversion application that included if-then logic. Then we dynamically linked the ETL process to SyncSort. With SyncSort doing the mapping and batching, it took only about 10 minutes to run all 10 codes.

Financial Warehouse

We utilized the experience gained from the retirement warehouse to build a financial warehouse based primarily on source data from our ERP system. We wanted to create a flexible, scalable architecture that can grow and change as Hewitt's business intelligence needs grow and change. Hewitt's strong data warehousing infrastructure team has become a key element of our ability to service our customers. We are able to implement projects that support strategic initiatives and meet specific business needs ­ and we are able to do it with speed and efficiency.

SyncSort is a high-performance data manipulation product that speeds extract, transform, load (ETL) applications. It performs extract routines and cleansing functions; sorts, merges, aggregates and converts data; and resizes records to produce multiple load files. Its companion product, Visual SyncSort, allows ETL applications to be created through a Windows-based graphical interface.

Register or login for access to this item and much more

All Information Management content is archived after seven days.

Community members receive:
  • All recent and archived articles
  • Conference offers and updates
  • A full menu of enewsletter options
  • Web seminars, white papers, ebooks

Don't have an account? Register for Free Unlimited Access