SAS continued fortifying its data management capabilities in the 1980s and 1990s, providing access to just about any type of file format (ASCII or EBCDIC; mainframe, mini or PC) as well as relational databases. Its introduction of a powerful SQL procedure further distanced SAS from competitors. With an extensive programming language and function libraries, file and database access features, and data set handling capabilities, SAS could not only analyze your data but also programmatically build your intelligence data stores, as either SAS data sets or relational databases. I started using SAS more and more for data migration and reporting tasks, often in situations that had little to do with its statistical lineage.
As an executive with a fast-growing business intelligence consultancy in the 1990s, I made it one of my strategic goals to help SAS become as successful in other business environments as it was in pharmaceutical and financial services research, to the mutual benefit of SAS and my company. Data warehousing was becoming increasingly popular, but ETL (Extract, Transform, Load) tools like Informatica and Ascential were not yet pervasive in the market. Already ubiquitous in financial services and pharma, SAS became in many instances the second-generation language (COBOL was the first) for populating the data warehouse. Companies were using SAS data step programming to load data and produce reports.
I left my then-BI company at the end of 2001, joining a small consultancy in early 2002. I immediately called the local SAS office in search of my sales contact but found instead a new regime, one much less supportive of partnering with small companies like mine. Much to my chagrin, I was unable to negotiate a demo license deal. I was offered only a software discount, and even the discounted price was prohibitively expensive for my new employer. I had to settle for plan B: an inexpensive but limited-capability student license. After a few futile attempts to work with this neutered product, I gave up and decided instead to find non-SAS alternatives. After 21 years, I was getting a statistical divorce.
To replace SAS, I needed to find tools that offered both statistical and programming/data management capabilities. Fortunately, I wasn't starting from square one. I'd been working with Perl for a few years and had grown to appreciate its capabilities. The more I experimented, the easier it became to do things in Perl that I'd historically done in SAS. A year or so later, I started using Perl competitors Python and then Ruby, which are like Perl but object-oriented as well. By that time, I was pretty adept at programming with agile languages. They had powerful but simple programming and data constructs, and could access the Internet as well as a variety of file types and relational databases. Because the languages were object-oriented, they were better suited than SAS data step programming to larger tasks.
On the statistical side, I started using S+ from Insightful. I'll be the first to admit that the learning curve in moving from SAS to S+ was steep and sometimes frustrating. But after several months, I started to “get it,” thinking in S+ rather than in SAS. The big step up in graphics capabilities alone was worth the short-term angst. A bit later I also discovered S+'s academic stepbrother R, and finally had all the pieces I needed: open source R alongside Python or Ruby, with the MySQL and Postgres databases for good measure. There was even an interface that allowed Python to embed R. Much as I liked SAS back in the day, I preferred this statistical platform in 2002.
In 2009, statistical packages are simply components of a larger information factory, co-existing with relational databases, ETL tools, query and reporting packages, OLAP, visualization software and dashboards. In my meanderings across the BI and statistical markets for OpenBI, however, I continue to find companies shackled to legacy SAS code that has worked well for a long time but carries onerous licensing annuities. The sunk development costs of these applications paralyze the firms, keeping them from considering a move to more modern and inexpensive ETL and reporting technologies like open source Pentaho and Talend.
A few weeks ago, I discovered the World Programming System (WPS), software “that provides a code interpreter that can execute programs written in the language of SAS.” Intrigued, I downloaded a 30-day trial and started constructing simple scripts. It was like seeing an ex after eight years of separation. A couple of hours into the work, I started feeling comfortable with the forgotten SAS syntax, though I repeatedly omitted terminating semicolons and RUN statements. I then submitted more sophisticated programs from yesteryear's deep storage, continually increasing the size of the data sets I was loading and ultimately finishing with one of several million records. An 800,000-case, 40-attribute SAS data set took less than a minute to create with WPS on my Vista notebook. All told, WPS has more than met my expectations: every line of data programming syntax I've tried to date has worked, with excellent performance.
After reviewing the WPS website, chatting with a very helpful marketing manager, and doing research that confirmed WPS licensing costs to be a fraction of those for SAS, I've concluded that WPS could provide substantial savings to companies with a significant commitment to SAS data step programming. One caveat: WPS has at this time implemented just a few SAS statistical procs, so it wouldn't be feasible to migrate applications that rely heavily on statistical procs. But for companies with a legacy commitment to SAS programming that can partition their code into data management and statistics, I'd enthusiastically recommend a look at WPS. Do a 30-day trial and put the software through its paces with your SAS data and programs. Assess the ROI versus the status quo. You might just find savings you can use in other BI areas.
Steve Miller's blog can also be found at miller.openbi.com.