I’d like to thank my colleague Josie Pettygrove for writing this week’s blog on her experience at the recent Strata Rx 2013 conference.

Better medical outcomes at lower cost -- these are the goals of modern health care. Data and analytics can play a major role in helping to achieve these objectives by enabling evidence-based decision making in the industry. Alas, with but a few exceptions, that promise has not yet been delivered. Why?

Health care is a risk-averse industry. With decisions that can have life and death consequences, new business models and processes are adopted cautiously. Unfortunately, health care has not widely taken to many technologies now pervasive in the commercial world. And the data that are common -- claims, clinical, electronic health records, genetic, and personal health device-generated -- exist primarily in non-integrated silos.

The good news is that the landscape is starting to change as health information exchanges become repositories of clinical data from multiple providers, and accountable care organizations begin to integrate payment and quality measures. Yet while insurance companies, researchers and clinicians have made significant contributions to the industry through analysis of their respective data, the ultimate promise lies in integrating these disparate data sets to gain new knowledge.

Readers with experience in the health care industry might now be thinking “Privacy!” “HIPAA!” -- incredulous as they consider the challenges of sharing and integrating health data for analyses. Indeed, privacy regulations are a big obstacle blocking the use of all the data in the industry to meet health and financial goals. 

But help may be on the way. Last week at the Strata Rx 2013 conference in Boston, I learned of two exceptional cases that illustrated how researchers are leveraging big data technologies to revolutionize modern health care.

In the first presentation, Dr. Isaac Kohane of Harvard Medical School married clinical and biomedical data to identify new sub-classes of autism spectrum disorder (ASD). Kohane’s team sent distributed queries to multiple electronic health record systems to assemble an integrated data set with more extensive information than previous investigations. The researchers used natural language processing to glean findings from physician notes, which often contain richer, more detailed information than claim diagnosis codes. Then, with statistics driven by R statistical software, the team identified multiple clusters of ASD, including epileptic, bowel disorder, and schizophrenic autism – each of which has different treatment indications.

In related research, Dr. Kohane developed a test for autism. By comparing blood samples from autistic patients to non-autistic samples, he identified a set of genes that predict autism. The test increases the chance of early detection, which promotes early intervention.

The net result of Kohane’s traditional genetic research and data wrangling/analytics exercise:  physicians are now better able to identify and treat pediatric autistic patients.

Tellingly, Kohane’s study respected patient privacy by sending distributed queries to multiple data sources and receiving aggregate, de-identified results in return. Had the researchers simply collected the underlying data for aggregation, they would potentially have violated privacy regulations. ‘Sending the query to the data’ may prove to be a key strategy for privacy-respecting data integration in health care.
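To make the ‘send the query to the data’ idea concrete, here is a minimal Python sketch. It is not the system Kohane’s team used; the record fields, the function names and the small-cell suppression threshold are all hypothetical, chosen only to show that each site can answer a question with counts while patient-level rows never leave the site.

```python
from collections import Counter

# Hypothetical threshold: groups smaller than this are withheld by the site,
# since very small counts can make individuals easier to re-identify.
MIN_CELL_SIZE = 5


def run_local_query(records, predicate, group_key):
    """Runs inside one site: returns per-group counts only, never raw records."""
    counts = Counter(r[group_key] for r in records if predicate(r))
    return {group: n for group, n in counts.items() if n >= MIN_CELL_SIZE}


def federated_counts(sites, predicate, group_key):
    """Coordinator: sends the same query to every site and sums the counts."""
    total = Counter()
    for site_records in sites:
        total.update(run_local_query(site_records, predicate, group_key))
    return dict(total)
```

In a real deployment each call to `run_local_query` would be a network request to a separate institution, and the coordinator would only ever see the aggregated dictionaries.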

In another session, Jeff Hammerbacher (of Facebook and Cloudera data science fame) described the adoption of big data thinking and technologies in a major health care environment. The Mount Sinai Medical Center has a legacy data warehouse of clinical data; an electronic medical record system; a well-established high performance computing cluster (with over 7000 cores, 120 servers, and a petabyte of storage); telemetry data from its 1100-plus bed facility; publicly-available health data; an EMR-linked biobank of tissue samples and genomics; and data from its imaging facility. Just as important, they also have Hammerbacher to apply his data science and technical analytic skills.

Hammerbacher’s team runs Cloudera software on a large cluster of servers, with a mission to make a large, integrated health data set available to the Mount Sinai community. His group is building data management and analytics libraries, tools for easy programming, and embedded DSLs -- all with the intent to promote data-driven therapeutics that improve outcomes.

Several consistent themes emerged in other Strata Rx 2013 sessions I attended. Today, individual health data is decentralized and stored in disparate formats. Electronic health records are owned by facilities or health information exchanges, but do not include information from medical encounters outside those groups. Personal health devices and apps (e.g., Fitbit, RunKeeper, and food intake apps) create valuable data that is unfortunately isolated in silos. Though there was general agreement that mobile platforms can facilitate the collection of personal health data, the challenge remains to integrate those data with existing clinical, claims and genetic data sets.

Health data is becoming more accessible, but true data availability is still a way off. The Department of Health and Human Services has made more than 1,000 data sets public, while the Centers for Medicare & Medicaid Services has released charge data. But even as hospitals implement expensive EMR systems, they sometimes can’t extract data for analytics. And it’s still difficult for individuals to obtain their own health care data from providers. Blue Button, a tool deployed by HHS to make personal health data available to VA patients, is a promising start, but a similar private sector tool does not as yet exist.

The purview of health care data is expanding. New devices and health-related apps appear on the market frequently, and each generates health-related data. An example is the Personal Analytics Companion (PACO) innovation from Google. PACO is a mobile survey tool for conducting health and science experiments; its major contribution is the collection of new types of data. Users can create experiments and invite participants. Data from the experiments are secure: participants can only see their own information. An example use case is an allergy survey, in which patients are queried at random intervals about their allergy symptoms. At the end of the experiment, all response data can be retrieved in JSON format and integrated with external air quality data for subsequent analyses. The beauty of PACO is that it was developed with data integration in mind: the JSON extract format makes it easy to combine results with other data sets.
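As a rough illustration of why a JSON extract is convenient, the Python sketch below joins survey responses to air quality readings by date. The field names (`participant`, `date`, `symptom_score`) and the air quality lookup are assumptions made for this example; they are not PACO’s actual export schema.

```python
import json


def join_responses_with_air_quality(responses_json, air_quality_by_date):
    """Attach an air quality reading to each survey response from the same day.

    responses_json: JSON array of objects, each with a "date" field (assumed).
    air_quality_by_date: dict mapping an ISO date string to an index value.
    Responses with no matching air quality reading are dropped.
    """
    joined = []
    for response in json.loads(responses_json):
        aqi = air_quality_by_date.get(response["date"])
        if aqi is None:
            continue
        joined.append({**response, "aqi": aqi})
    return joined
```

Because the responses arrive as plain JSON objects, the join needs nothing beyond the standard library; the same rows could just as easily be loaded into a statistics package for modeling.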

Information latency remains a challenge in health care. Practitioners want actionable data at the point of care to prescribe treatment during the patient encounter. Alas, most health research findings have yet to be translated into action items and are not available in real time (or even near real time).

Several Strata Rx 2013 presenters gave informative instruction on the analysis of data in health care. In one of my favorite sessions, Wendy Hou and Roxy Cramer of Rogue Wave made statistical modeling fun, providing practical advice on how to develop statistical models, overcome data deficiencies, choose the right algorithm for data, and avoid model overfitting. Good stuff!

Khaled El Emam, author of Anonymizing Health Data, presented key strategies for de-identifying health data sets. He employs date-shifting methods that maintain the order of services and the intervals between them, both of which are critical information in health care. He also recommended removing public figures and high-cost and high-visit outliers from health data sets.
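One simple way to shift dates while keeping order and intervals intact is to apply a single random offset to every date belonging to the same patient. The Python sketch below shows that idea only; it is not necessarily the method El Emam described, and the function name, the `(patient_id, service_date)` layout and the 30-day window are assumptions.

```python
import random
from datetime import date, timedelta


def shift_dates(visits, max_shift_days=30, seed=None):
    """Shift each patient's service dates by one per-patient random offset.

    visits: list of (patient_id, service_date) tuples.
    Every date for a given patient moves by the same number of days, so the
    order of services and the gaps between them are unchanged.
    """
    rng = random.Random(seed)
    offsets = {}
    shifted = []
    for patient_id, service_date in visits:
        if patient_id not in offsets:
            days = rng.randint(-max_shift_days, max_shift_days)
            offsets[patient_id] = timedelta(days=days)
        shifted.append((patient_id, service_date + offsets[patient_id]))
    return shifted
```

The offsets differ from patient to patient, so calendar dates no longer line up across patients, while each patient’s own timeline keeps its shape.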

All in all, the Strata Rx 2013 conference presented a picture of an industry with brilliant clinical and analytic talent and sophisticated legacy systems. Yet, the industry faces integration challenges and has only sporadically adopted the latest technologies and thinking. I’m already looking forward to the April Strata Rx 2014 conference. I anticipate seeing more cases of big data technologies and analytics augmenting traditional research in a way that propels the health care industry forward.

Josie Pettygrove is a senior BI and analytics practitioner with health care industry focus for OpenBI.