The U.S. Census Bureau's Foreign Trade Division recently rolled out a SAS data warehousing solution to aid in the complex job of analyzing and disseminating timely data about United States foreign trade. The division processes some 5 million records of import/export data each month from the U.S. Customs Service, analyzes it and distributes it to government agencies, economists, analysts, students and other interested parties. The new data warehouse, based on flexible and robust software from data warehousing leader SAS Institute and running on powerful Intel processor-based servers, has enabled the division to cut analysis time from days to minutes, increased analytical depth and flexibility and decreased costs.
Ten years ago, the Foreign Trade Division migrated from the Bureau's Unisys mainframe to microcomputers running DOS, the Microsoft LAN Manager network operating system and the Clipper database. The microcomputers gave the division more self- reliance and a greater ability to deal interactively with its data update master files, produce reports, maintain data and the like. More immediate access to trade data allowed analysts to fill data requests faster and perform richer types of analyses. They were free from COBOL programming, batch processing and lengthy waits for data.
In the late 1990s, the Foreign Trade Division upgraded its computer infrastructure further by moving to Microsoft Windows, Microsoft Win-dows NT Server, Novell NetWare 4.1, SAS software and Intel Pentium II Xeon processor-based servers. At the same time, we upgraded our network communications from 10Mbps to 100Mbps. The goal was to prepare for things such as Web-enabled information access, an increase in the volume of trade data due to expanded U.S. trade relations and an increased number of customers interested in this type of data.
The new SAS data warehouse allows the Foreign Trade Division to perform online analytical processing (OLAP) on the 5 million data records coming in each month from the U.S. Customs Service some 60 million records a year. This data currently covers 18,000 import commodities and 9,000 export commodities traveling between 250 countries and 50 regions within the U.S. SAS software gives us the ability to perform multidimensional analysis on these vast sums of data, and the Intel processor-based servers give us the power required for compute-intensive processing. Prior to this new data warehouse, if someone wanted a year's worth of monthly data for imports and exports on 20 different commodities, a programmer would have to write a program that pulled the data from 24 CD-ROMs. That data then needed to be consolidated into one file, which was then used to create a report or exported to a spreadsheet a task requiring several days. Today, using the warehouse and SAS software OLAP capabilities, a clerk can access, transform, analyze and provide a Web-based or paper report on the data in a matter of two hours.
For another example, we get requests for data on one or more specific customs service districts. To fill these requests, a programmer would write a dBase program. Depending on the results, the programmer might then be asked to write subsequent programs to calculate commodity totals for particular districts. Today, using the SAS data warehouse, when a clerk satisfies the initial request, the group totals are already computed.
Profile of the Data Warehouse
When the raw import/export data arrives from the U.S. Customs Service, it's passed through a server that extracts, transforms, cleans and loads the data into the warehouse. The warehouse itself is currently 16GB in size and is a straight index file that feeds several data marts and direct Internet requests. Data marts contain subsets of data on specific countries and commodities as well as two years of historical data and are used to answer questions on popular topics much faster than would be possible sending requests through the entire data warehouse.
SAS software runs on 12 servers: two 4-way Intel Pentium II Xeon processor-based servers (Micron 6200 Series), eight 2-way Intel Pentium II Xeon processor-based servers (also Micron 6200 Series) and two 2-way Intel Pentium Pro processor-based servers (COMPAQ ProLiant 5000). The two servers that run SAS/IntrNet software run Windows NT Server, and the others run Novell NetWare. The division has approximately one terabyte of storage among its servers. The data warehouse is used by approximately 100 clerks and analysts, 15-30 of whom are accessing it concurrently.
The total system roll out took about two and a half years and cost approximately $250,000 including hardware and software. Implemen-tation was without incident or need for major organizational changes.
Harvey Monk, Chief, Foreign Trade Division for U.S. Census Bureau
More Done at Less Cost to Taxpayers
The SAS data warehousing solution running on fast Intel processor-based servers has yielded several significant benefits to the division. SAS software gives analysts far more powerful tools for performing rich, OLAP analysis. They can discover trends and fill information requests that just weren't possible with our previous database software. SAS software also allows our non-programmer staff to fill many information requests that were handled by database programmers. Everything happens much faster, too. A request that took three to four days to process in Clipper takes 10-20 minutes with SAS software. Plus, much of the needed information is precalculated in the SAS applications, freeing programmers from churning out multiple programs so they can concentrate on filling more complex requests. This speedup allows our programmers and analysts to be far more productive and allows the division to respond to many more information requests.
Overall, SAS software and the SAS data warehouse have lowered our production costs by approximately 30 percent and reduced our database administration costs by giving us a consistent data format across all our databases. Moving from Clipper to SAS software cut our code base by a whopping 80 percent.
The Intel architecture servers give us unprecedented price/performance and scalability, delivering mainframe-caliber performance without the mainframe price tag. The Pentium II Xeon processor has a number of features that contribute significant performance gains in computational-intensive data mining tasks. We can upgrade performance incrementally as our needs expand, rather than paying for big, expensive outlays of computing power every few years. Availability has been outstanding: our uptime runs better than 99.5 percent; any downtime is due to interruptions in power rather than servers.
In the near future, the Foreign Trade Division plans to move many of its data fulfillment services to the Web, allowing customers to service their own information needs. SAS software supports Web-enabled data distribution; in fact, today we use this capability to allow staff in remote offices to tap into our data warehouse. The Intel and SAS data warehouse is a powerful, scalable solution on which to build a future that includes broader Web access, higher data volumes, more information consumers and more sophisticated types of analysis.
Register or login for access to this item and much more
All Information Management content is archived after seven days.
Community members receive:
- All recent and archived articles
- Conference offers and updates
- A full menu of enewsletter options
- Web seminars, white papers, ebooks
Already have an account? Log In
Don't have an account? Register for Free Unlimited Access