CATEGORY: Data Integration
REVIEWER: Julie Filbrun, research associate at MDRC.
BACKGROUND: How have welfare recipients and other low-income urban residents fared in the new welfare environment? Which groups are better off and which worse off? How have social and economic conditions in big cities changed since the 1996 federal welfare reform? These are just a few of the questions that MDRC seeks to address as part of its Project on Devolution and Urban Change. Launched in 1997, Urban Change is a multifaceted study of the implementation of welfare reform and its effects on families, neighborhoods and institutions in Cleveland, Los Angeles, Miami and Philadelphia. It integrates data from surveys, administrative records, ethnographic research, neighborhood-level statistical data and other sources to present a coherent picture of welfare reform in each county.
PLATFORMS: Our platform is a Compaq ProLiant 8000 server running SAS 8.2 on Windows 2000. It has four Pentium III Xeon processors and four gigabytes of memory. The server is configured with 14 separate SAS workspaces for running SAS jobs and a storage area network (SAN) with more than half a terabyte of disk space for storing data.
PROBLEM SOLVED: We use SAS to clean, restructure and analyze our data. Files from various sources had to be ordered and merged prior to analysis. Most files were between five and ten gigabytes, but some were as large as 20 gigabytes. When we tried ordering these large files, the SAS program would occasionally crash. We then ran tests with SyncSort. By using SyncSort in combination with the SAS options that control memory size, we were able to halve the ordering time for many files. Because SyncSort's ordering routine was more efficient, SAS also stopped crashing during these jobs. Our server is now set up to use SyncSort whenever data is ordered by SAS.
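The order-then-merge step described above can be illustrated outside SAS. The following is a minimal Python sketch, not our SAS code: the field names (case_id, employed, benefit) and the sample records are hypothetical, and it assumes each key appears at most once per file.

```python
from operator import itemgetter


def sort_by_key(records, key):
    """Return the records ordered by the given key field."""
    return sorted(records, key=itemgetter(key))


def match_merge(left, right, key):
    """Merge two lists that are already ordered by `key`.

    Records with the same key are combined into one record (fields from
    `right` win on a name clash); unmatched records are kept as they are.
    Assumes each key value occurs at most once in each list.
    """
    merged = []
    i = j = 0
    while i < len(left) and j < len(right):
        left_key, right_key = left[i][key], right[j][key]
        if left_key == right_key:
            merged.append({**left[i], **right[j]})
            i += 1
            j += 1
        elif left_key < right_key:
            merged.append(dict(left[i]))
            i += 1
        else:
            merged.append(dict(right[j]))
            j += 1
    # Whatever is left over on either side has no match.
    merged.extend(dict(r) for r in left[i:])
    merged.extend(dict(r) for r in right[j:])
    return merged


# Hypothetical records from two sources, deliberately out of order.
survey = [{"case_id": 3, "employed": True}, {"case_id": 1, "employed": False}]
admin = [{"case_id": 2, "benefit": 180}, {"case_id": 1, "benefit": 250}]

combined = match_merge(
    sort_by_key(survey, "case_id"),
    sort_by_key(admin, "case_id"),
    "case_id",
)
for record in combined:
    print(record)
```

The merge walks both inputs once, which is why each file has to be ordered on the key first. With multi-gigabyte files, that ordering step is where most of the elapsed time goes.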
PRODUCT FUNCTIONALITY: Prior to using SyncSort, ordering a 6.5GB data set took 34 minutes. Using SAS's SORTSIZE option brought the processing time down to 21.5 minutes. Using SyncSort together with SAS's SORTSIZE option reduced it further, to 12.5 minutes. We also used SyncSort in combination with SAS compression. Another test was run with 11.8GB of data. With SAS alone, the process took one hour and 15 minutes. Running SAS with SyncSort, the elapsed time was reduced to 26 minutes. By using SAS compression along with SyncSort, the elapsed time was cut to 22 minutes, a reduction of 65 percent versus the original processing time.
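For readers who want to compare the runs on a common scale, the short Python sketch below converts the elapsed times quoted above into percentage reductions against each test's baseline. The function name and labels are illustrative only. The times are the rounded figures given in this review, so the computed percentages are approximate and may not match a quoted percentage exactly.

```python
def percent_reduction(before_minutes, after_minutes):
    """Percentage by which elapsed time dropped relative to the baseline."""
    return 100.0 * (before_minutes - after_minutes) / before_minutes


# (baseline minutes, tuned minutes), taken from the timings in the text.
runs = {
    "6.5GB, SAS with SORTSIZE": (34.0, 21.5),
    "6.5GB, SyncSort with SORTSIZE": (34.0, 12.5),
    "11.8GB, SyncSort": (75.0, 26.0),
    "11.8GB, SyncSort with SAS compression": (75.0, 22.0),
}

reductions = {
    label: percent_reduction(before, after)
    for label, (before, after) in runs.items()
}

for label, pct in reductions.items():
    print(f"{label}: {pct:.1f}% less elapsed time")
```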
STRENGTHS: It was very easy to integrate SyncSort with SAS software. SAS has an option that lets you configure it to always use SyncSort for ordering, so SyncSort is called each time PROC SORT is run in a SAS program. Because most of our programmers work with much smaller data sets, we made sure that SyncSort enhanced performance for these data sets as well. Even for the much smaller data sets, processing times were always equal or better. Based on our test results, we configured SAS so that the default ordering utility is SyncSort.
WEAKNESSES: SyncSort offers performance-tuning options that can be set when SyncSort is run from the Visual Syncsort Interface or called from the command line. It would be nice to be able to take advantage of these settings when SyncSort is initiated from SAS.
SELECTION CRITERIA: I began searching for a way to relieve the bottleneck and avoid the crashes that resulted when ordering large SAS data sets. SyncSort was an appealing option because it was easy to tie into our SAS system. Because SyncSort could be called automatically from SAS, our SAS programmers would not have to learn to use a new application. Syncsort allowed us to test the product prior to purchasing it. The dramatic improvement in processing times ultimately made our decision for us.
DELIVERABLES: After implementing this new process with SyncSort, MDRC can now focus on analyzing the data rather than on just processing it. MDRC has already released an in-depth look at the welfare reform experience in Cleveland and is currently producing reports on welfare reform in Philadelphia, Los Angeles and Miami. We now have the tools we need to break through the bottleneck and analyze data for the next studies. These reports provide legislators and practitioners with the information they need to develop more effective policies and programs.
VENDOR SUPPORT: When we first started running tests on the largest data sets, we received a system error. SyncSort's technical support team was very responsive. We provided them with information about our server configuration, and they hunted down a fix for us on the Microsoft Web site. They also provided advice about how to use SAS options with SyncSort to maximize the performance benefits on our system. The team was very helpful throughout the process.
DOCUMENTATION: The documentation is good and has met our needs.