Medicare Data, SAS, WPS and R, Part 1
I've recently written on Medicare utilization data that's been made available to the public for analysis. Along with the csv file, the Centers for Medicare and Medicaid Services (CMS) provide a text script housing the SAS “infile” code to load the csv into a SAS data set for further analysis.
A newsletter from SAS has led me into deeper involvement with Medicare data. Its Books & Training section mentions a new monograph entitled “SAS Programming with Medicare Administrative Data”, by Matthew Gillingham. The blurb bills it as the “most comprehensive resource available for using Medicare data with SAS. This book teaches you how to access Medicare data and, more importantly, how to apply this data to your research.” Intrigued, I purchased the book and visited its website. Between the two, I found lots of goodies.
SAS Programming is a quick 140-page read, quite useful for learning about the Medicare program and the types, functions and locations of Medicare administrative data. In addition to the CMS website, which has some data files available for download, there are other agencies like the Research Data Assistance Center (RESDAC), whose site provides even more comprehensive Medicare/Medicaid data for a fee. As with CMS, RESDAC data appears to be closely tethered to SAS.
SAS Programming works through a comprehensive research example, providing background, design and soup-to-nuts SAS code. The illustration is pretty hefty, supported by a 172 MB zip file accessible from the bottom of this page.
It was SAS deja vu for me exploring the book's code and data. One folder of the uncompressed zip contains all the SAS data sets used in the book's analyses, several of which are in excess of 1M records. A second holds source code pertaining to chapters 5 through 10 of the book, covering ETL, Enrollment, Utilization, Cost, Conditions and Output.
I could pretty much understand everything that was being done. The code represents data programming SAS style: an amalgam of data steps, macros and everyday procs such as SORT, MEANS, SUMMARY, SQL, FORMAT, PRINT etc. This pattern of using SAS to assemble and present data without deploying the product's advanced statistical functions was quite common when I was a SAS consultant back in the day, and I'm sure similar SAS data programs are in production today. Indeed, in the early 90s, my then-company used SAS as an ETL platform to populate the new-fangled decision support relational database which became known as the data warehouse.
One snippet of SAS Programming code that particularly hit home pivoted claim lines into a “wide” data set using proc SORT along with data step code that deployed “by” groups, SAS arrays, do loops, retain statements and the first. and last. by-group variables. Seems like just yesterday....
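For readers who don't speak SAS, here is a minimal Python sketch of that long-to-wide pivot. It is not the book's code: the field names (claim_id, line_num, hcpcs, paid), the sample rows and the three-slot limit are all made up for illustration. The sort stands in for proc SORT, the grouping for the “by” statement, and the numbered slots for a retained SAS array.

```python
from itertools import groupby
from operator import itemgetter

MAX_LINES = 3  # assumed number of array slots per claim (hypothetical)


def pivot_claim_lines(rows, max_lines=MAX_LINES):
    """Collapse one-row-per-claim-line data into one wide row per claim."""
    # Stand-in for proc SORT: order by claim, then by line number.
    ordered = sorted(rows, key=itemgetter("claim_id", "line_num"))
    wide = []
    # Stand-in for the "by" group: one iteration per claim_id.
    for claim_id, lines in groupby(ordered, key=itemgetter("claim_id")):
        # Start of the by-group: a fresh record with empty slots.
        record = {"claim_id": claim_id}
        for i in range(1, max_lines + 1):
            record[f"hcpcs{i}"] = None
            record[f"paid{i}"] = None
        # Fill the slots as the claim's lines go by (the retained array).
        for slot, line in enumerate(lines, start=1):
            if slot > max_lines:
                break  # extra lines beyond the slot limit are dropped
            record[f"hcpcs{slot}"] = line["hcpcs"]
            record[f"paid{slot}"] = line["paid"]
        # End of the by-group: output the completed wide record.
        wide.append(record)
    return wide


# Made-up sample data, deliberately out of order.
claim_lines = [
    {"claim_id": "C2", "line_num": 1, "hcpcs": "99213", "paid": 50.0},
    {"claim_id": "C1", "line_num": 2, "hcpcs": "36415", "paid": 3.0},
    {"claim_id": "C1", "line_num": 1, "hcpcs": "99214", "paid": 80.0},
]
wide_claims = pivot_claim_lines(claim_lines)
```

Each element of wide_claims is a single record per claim, with hcpcs1..hcpcs3 and paid1..paid3 filled in line order and unused slots left empty, which is roughly what the data step emits at the end of each by-group.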
Statistical data programming is also the grist for SAS-clone WPS, a product from U.K. vendor WPL that provides SAS functionality at a fraction of the price. As I examined the code and SAS data from SAS Programming's download, I thought it'd be nice to determine if WPS was up to the challenge of reproducing the SAS results. So I decided to put it to the test.
Much to my gratification, the migration of SAS Programming's scripts to WPS was transparent. Once I created the folder structures and copied the data to my PC, I pretty much had only to change the SAS “libnames” and supporting file names in the scripts from the author's structure to mine. The one minor glitch was with a picayune feature of the ODS (Output Delivery System) statement that wasn't recognized by WPS. I simply commented out the offending code and the compiler was happy.
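I made the libname changes by hand, but with more scripts the edit could be automated. The following is only a sketch under stated assumptions: the folder roots (C:/book, D:/medicare) and the sample script text are hypothetical, and the pattern handles just the simple form libname libref 'path'; with no engine name between the libref and the path.

```python
import re

# Matches: libname <libref> '<path>'  (single or double quotes, any case).
LIBNAME_RE = re.compile(r"(\blibname\s+\w+\s+)(['\"])(.*?)\2", re.IGNORECASE)


def retarget_libnames(sas_source, old_root, new_root):
    """Swap old_root for new_root at the start of each libname path."""

    def swap(match):
        prefix, quote, path = match.groups()
        if path.startswith(old_root):
            path = new_root + path[len(old_root):]
        return f"{prefix}{quote}{path}{quote}"

    return LIBNAME_RE.sub(swap, sas_source)


# Hypothetical script text and folder roots, for illustration only.
script = "libname src 'C:/book/data';\nproc print data=src.claims; run;\n"
updated = retarget_libnames(script, "C:/book", "D:/medicare")
```

Only the quoted path in the libname statement changes; the rest of the script passes through untouched. File names in infile or filename statements would need their own pattern.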
SAS Programming's scripts were designed to run in batch console mode, as is the norm for code from SAS publications. Had they been developed in a productive environment such as the spiffy WPS Workbench, however, the code would be much simpler. The ODS statements would disappear, output instead managed by the workbench. And the proc PRINT statements to display records from newly-created SAS data sets similarly would be redundant, the data accessible to browse in the environment. All told, WPS Workbench is a delight for SAS development and made my migration/testing a breeze.
That SAS is so pervasive in commerce, government and health presents an analytics challenge for the data science world. Even if SAS is not the analyst's statistical platform of choice, whatever platform is must certainly be able to read and operate on SAS data sets. Like Excel, SAS is now a significant legacy data source.
WPS provides help here too. Its new proc R is capable of moving data in both directions between SAS data sets and R data frames, as well as executing R code in WPS scripts. In an upcoming blog, I'll discuss using proc R to export SAS Programming's data to R, along with my subsequent experiences adapting the algorithms between the two statistical platforms.