Few organizations have embraced the challenge -- and promise -- of big data with as much commitment and resources as the University of Pittsburgh Medical Center. One of the leading academic medical centers in the country, UPMC is finding ways to gather, assimilate and analyze disparate data feeds that previously were difficult to access and aggregate at the enterprise level.

Today, the $12 billion health care system is using big data to advance breast cancer research, cardiac care and more.

But it hasn’t been easy.

“It has been challenging to aggregate and harmonize clinical data from multiple information systems covering 7 million patients,” says Ed McCallister, chief information officer at the Pittsburgh-based organization.

UPMC is both a health care insurer and a provider, and its sheer size leaves the organization awash in data that must be controlled and adapted for use. “We have over 2 million health plan members, 22 hospitals and 3,500 physicians,” McCallister says. UPMC provides researchers and physicians access to structured and unstructured data, including x-rays, CT scans, MRIs, pathology reports, physician notes, histopathology reports and post-op notes. UPMC has a total of 8 petabytes of data that is readily accessible online, plus an additional 14 petabytes in long-term storage.

“The great opportunity we have is the abundance of data sources,” McCallister says, “including from our insurance services, health services and international division.”



For more than a decade, UPMC has had a data warehouse and analytics for the insurance services division. It contained primarily claims information.

But other parts of the organization weren’t as advanced in collecting and utilizing data sources. “There were some haves and have-nots,” McCallister says. “Some areas were stronger in the use of business intelligence tools than others.”

However, over the last few years UPMC has accelerated its effort to extend data analysis capabilities throughout the organization. In 2012, UPMC began a $100 million, five-year enterprise analytics effort that includes the application of big data as part of an organization-wide data warehouse effort.

An initial step was to define exactly what data was useful to doctors and researchers. “We did this through a combination of identifying business needs and applying analytics to produce an ‘information layer’ for our user community,” McCallister says.  

A second step was to create the enterprise-wide data warehouse and data analytics. In doing so, UPMC also sought to leverage its existing IT investments where possible, McCallister says. “We went with the tools we had in place,” he adds.

Today, whether using new or existing BI tools, researchers and clinicians have access to the same information contained in the enterprise data store.

Big Data Foundation

UPMC’s enterprise data scheme encompasses a host of technologies, built on a virtualized server and storage environment. UPMC manages more than 8,000 virtual servers -- all of its Unix/Linux servers are virtualized and 99% of Windows servers are virtualized.

UPMC has placed on that hardware foundation its big data software system, which consists of:

  • An Informatica suite that extracts, transforms, and loads aggregated data from various source systems into the UPMC data warehouse.
  • An Oracle Healthcare Data Warehouse Foundation, which is deployed on Oracle’s Exadata database platform.
  • IBM’s Netezza database appliance, which provides access to data via a dimensional model tuned for use by UPMC’s various end-user business intelligence tools.
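The extract-transform-load step in the first bullet can be pictured with a minimal sketch. It is not Informatica code and does not reflect UPMC's actual schemas: the two source systems, their field names and the sample rows below are all hypothetical, chosen only to show how records that describe the same patient under different field names might be harmonized into one common layout before loading.

```python
# Hypothetical rows from two source systems that name the same fields differently.
CLINICAL_ROWS = [
    {"mrn": "A100", "proc": "Stent", "date": "2013-02-01"},
    {"mrn": "A200", "proc": "Stent", "date": "2013-02-03"},
]
CLAIMS_ROWS = [
    {"member_id": "A100", "procedure_code": "STENT", "paid_usd": "12000.50"},
    {"member_id": "A200", "procedure_code": "STENT", "paid_usd": "9800.00"},
]


def transform_clinical(row):
    """Map a clinical-system row onto the common (assumed) schema."""
    return {
        "patient_id": row["mrn"],
        "procedure": row["proc"].lower(),
        "service_date": row["date"],
        "cost_usd": None,  # the clinical system carries no cost data
    }


def transform_claim(row):
    """Map a claims-system row onto the same common schema."""
    return {
        "patient_id": row["member_id"],
        "procedure": row["procedure_code"].lower(),
        "service_date": None,  # this toy claims feed carries no service date
        "cost_usd": float(row["paid_usd"]),
    }


def load(rows):
    """Merge harmonized rows into one record per (patient, procedure)."""
    warehouse = {}
    for row in rows:
        key = (row["patient_id"], row["procedure"])
        record = warehouse.setdefault(key, {})
        for field, value in row.items():
            if value is not None:  # keep whichever source supplied the value
                record[field] = value
    return warehouse


def run_etl():
    """Extract from both sources, transform, then load."""
    harmonized = [transform_clinical(r) for r in CLINICAL_ROWS]
    harmonized += [transform_claim(r) for r in CLAIMS_ROWS]
    return load(harmonized)
```

Calling `run_etl()` returns a dictionary in which each patient's clinical and financial fields sit in a single record, which is the kind of harmonized view the article describes analysts querying.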

For business intelligence and data analysis, UPMC uses products from IBM Cognos, IBM SPSS and Oracle, along with advanced statistical tools from SAS and the R Project. In addition, UPMC is following a “bring your own tools” approach, permitting physicians and researchers to continue to use many of the BI tools they already have to obtain information from the data.

Big Data Takes On Big C

UPMC is already making inroads with big data in the area of breast cancer.

For cancer research, UPMC is using data from The Cancer Genome Atlas (TCGA) specific to several types of cancer. The organization is using Oracle’s Translational Research Center platform to analyze this data. 

By integrating structured clinical data and genomic data from a population of UPMC patients, researchers discovered molecular differences between the cancers in pre-menopausal and post-menopausal women.

More research will be necessary to fully understand these differences, but the results could eventually provide a roadmap for creating more targeted, personalized therapies.

Previously, this kind of analysis was difficult and time-consuming, if not impossible, McCallister says. “Now, with the integrated UPMC enterprise data warehouse, clinical and genomics data are made more easily accessible to enable big data analytics.”

Toward Better Cardiac Care

In addition, the ability to analyze data that previously was difficult to access and aggregate, because of unlinked databases and differing data definitions, is paying off in cardiac care.

By studying data coming from different areas (clinical, financial, and quality/outcome) for heart attack patients who had received cardiac stents, UPMC researchers found that one group of cardiologists was spending more money to care for heart attack patients and having better outcomes six months following the stent procedure. This group exhibited lower mortality rates and less need for additional procedures.

The two patient populations -- those who received the more expensive care and those who didn’t -- were similar, offering no clue to the reason for the difference in outcomes. But by drilling into the data, researchers found that the doctors with the better outcomes were using a more costly type of catheter to treat patients.
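The drill-down described here, comparing cost and outcomes first by physician group and then by device, can be sketched in a few lines. The rows below are made up purely to show the mechanics; they are not UPMC data or results, and the group names, device labels and column layout are assumptions.

```python
from collections import defaultdict

# Made-up rows for illustration only; not UPMC data or results.
# Each row: (physician_group, catheter_type, cost_usd, died_within_6_months)
CASES = [
    ("group_a", "premium", 15000, False),
    ("group_a", "premium", 16000, False),
    ("group_a", "standard", 11000, False),
    ("group_b", "standard", 10000, True),
    ("group_b", "standard", 12000, False),
    ("group_b", "premium", 14000, False),
]


def summarize(cases, key_index):
    """Return {key: (mean cost, mortality rate)} grouped by the chosen column."""
    buckets = defaultdict(list)
    for case in cases:
        buckets[case[key_index]].append(case)
    summary = {}
    for key, rows in buckets.items():
        mean_cost = sum(r[2] for r in rows) / len(rows)
        mortality = sum(1 for r in rows if r[3]) / len(rows)
        summary[key] = (mean_cost, mortality)
    return summary


# First cut: compare physician groups on cost and six-month mortality.
by_group = summarize(CASES, 0)
# Drill-down: regroup the same cases by the device used.
by_device = summarize(CASES, 1)
```

Regrouping the same cases by a different column is what lets a cost and outcome gap between physician groups be traced to a specific device. A real analysis would also need many more cases and adjustment for patient risk.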

“Now we’re applying more sophisticated analyses to further identify which patients truly benefit from these devices,” McCallister says. “All of this is part of our effort to deliver high-quality, cost-effective care. As the health care industry moves from ‘volume to value,’ we think such big data efforts will be critical.”

UPMC has an integrated health care model: because it combines the health care provider and the insurer, the economic incentives of the two are aligned toward providing the best and most cost-effective care. “Our integrated model allows physicians to understand more about that person,” McCallister points out.

Indeed, personalized medicine is yet another goal. UPMC is developing a personalized Cancer Medicine Information Management System to serve as a foundation for efforts to develop and deliver individualized cancer care. The idea is to bring together clinical, genomic, consumer, and financial data in a data warehouse to enable researchers and clinicians to test hypotheses through visualization and interactive tools.

“We want to enable the best, safest patient experience possible,” sums up McCallister. “The patient is at the center of everything we do.”
