United Healthcare is big into big data. The nation’s largest health insurer is using big data and advanced analytics for financial analysis, fraud and waste monitoring, cost management, pharmacy benefit management, clinical improvements and more.
“The data is so interrelated across business groups, I would be hard-pressed to find an area that is not directly or indirectly touched by these initiatives,” says Ravi Shanbhag, director of data science, solutions and strategy at United Healthcare.
And this deep infusion of big data is providing the insurer with a bonus. It’s making the organization not just smarter, but more nimble, able to respond quickly with the right data tools for the right job.
And for an organization the size of United Healthcare, that’s impressive. The insurer and its Optum health services unit, which provides health management, care delivery and healthcare technology services, together serve more than 85 million members across the U.S. and some 125 other countries. The company offers a range of health benefit programs for individuals, employers, military service members, retirees and their families, and contracts directly with more than 800,000 physicians and care professionals and some 6,000 hospitals and other care facilities nationwide.
Like many large enterprises, United Healthcare has a variety of relational data stores that serve as the backbone of its operational processes, Shanbhag says.
“Typically, these tend to be coupled with business intelligence [BI] and traditional reporting tools to serve up static and semi-customizable end-user analytics,” Shanbhag says. “We also rely heavily on several SAS technologies ranging from data integration, OLAP [online analytical processing] and data mining capabilities. The key driver for using SAS usually tends to be the seamless integration that SAS provides to all our data assets; the ability to push the query processing from SAS into these data stores and, of course, the statistical analysis and modeling that [users] can perform with the results.”
A growing volume of the company’s data is being processed inside a Hadoop big data framework and in NoSQL distributed database technologies such as HBase, which provides fault-tolerant, low-latency access to data in Hadoop. “We are realizing a need to orchestrate hybrid data pipelines that blend several technologies together, where the right tool gets used for the right kind of workload,” Shanbhag says.
For instance, a particular data flow might necessitate structured claims data being layered with unstructured clinical information and then being micro-batched several times a day into Hadoop. The big data framework can then process both types of data in the same environment. “Moving data around is one of the costliest operations as it involves a lot of I/O,” says Shanbhag, “so being able to process different kinds of data together without having to hop around is key to our utilization of big data technologies like Hadoop.
“Hadoop is not just a framework but an entire ecosystem of tools that support different data processing patterns ranging from batch, micro-batch, streaming, real-time, graphing etc.,” Shanbhag says.
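The micro-batched flow described above, in which structured claims are layered with unstructured clinical information before processing, can be illustrated with a minimal Python sketch. This is not United Healthcare's pipeline. The field names (`claim_id`, `member_id`, `text`), the sample records and the helper functions are all hypothetical, and plain in-memory lists stand in for Hadoop.

```python
from collections import defaultdict
from itertools import islice


def micro_batches(records, batch_size):
    """Yield successive lists of at most batch_size records."""
    iterator = iter(records)
    while True:
        batch = list(islice(iterator, batch_size))
        if not batch:
            return
        yield batch


def layer_claims_with_notes(claims, notes):
    """Attach free-text clinical notes to each structured claim by member_id."""
    notes_by_member = defaultdict(list)
    for note in notes:
        notes_by_member[note["member_id"]].append(note["text"])
    return [
        # Copy the note list so each layered claim owns its own list.
        {**claim, "clinical_notes": list(notes_by_member.get(claim["member_id"], []))}
        for claim in claims
    ]


# Hypothetical sample data, invented for illustration only.
claims = [
    {"claim_id": "C1", "member_id": "M1", "amount": 120.0},
    {"claim_id": "C2", "member_id": "M2", "amount": 75.5},
    {"claim_id": "C3", "member_id": "M1", "amount": 310.0},
]
notes = [
    {"member_id": "M1", "text": "Follow-up visit, no complications."},
]

# Layer both kinds of data once, then hand them on in small batches.
layered = layer_claims_with_notes(claims, notes)
batches = list(micro_batches(layered, batch_size=2))
```

Because the two kinds of data are combined before batching, each batch can be processed in one place, which is the point Shanbhag makes about avoiding costly data movement.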
Data might get processed partly using Hadoop tools such as Pig, a high-level scripting platform for creating MapReduce programs that can be processed across distributed servers, and Hive, a warehousing infrastructure built on top of Hadoop for data analysis. Python and R—open source programming languages for data analysis—are used to build custom scripts that process the data natively in Hadoop, Shanbhag says.
“The goal is to be able to use the right technology and processing pattern for the right task, and these custom scripts and hybrid pipelines allow us to do just that,” Shanbhag says. “That way we don’t end up having a one-size-fits-all approach to solving our analytic problems.”
From here, “the machine learning can happen either in Hadoop or in SAS, which a lot of end users and statisticians prefer,” Shanbhag says. Machine learning and predictive modeling allow the company to systematically assess patterns from its data assets without the need to explicitly create pre-programmed, rule-based assessments, which can be time-consuming to maintain.
“You again have more choices depending on the visualizations you want to draw or if further explorations are needed. SAS Visual Analytics, Tableau Visualization, other BI tools or even custom applications can be built to serve up the customer-facing modules,” he says.
Among the main business drivers for adopting big data technologies was the ability to both manage and leverage the organization’s growing data stores, which now stand at several hundred terabytes of structured and unstructured data. By managing these assets better, the company was looking to achieve operational efficiencies and better insights from its data as close to real-time as possible.
“It really arose from an enterprise-wide vision for simplification of our data landscape, refreshing warehouse architectures and a need to be able to deliver insights into our data before it has aged,” Shanbhag says. “We’ve been working with big data technologies for several years now, and I would like to think that we have progressed quite well over a reasonably short period of time. We have grown from being able to absorb structured and unstructured data with ease to [being] able to stream data real-time while using sophisticated modeling and visualization techniques to serve up analyses across all levels of management.”
Even though United Healthcare is a large organization, “the ultimate goal is to stay nimble, be able to rapidly prototype new concepts and continue to innovate aggressively, as we believe that will be a key driving force behind our ability to serve our members well each time, every time,” Shanbhag says.
In one example of an innovative effort, United Healthcare is exploring use cases where social network data as well as unstructured data from its internal applications can be processed through several text mining models, to derive “a signal from all the noise and hopefully be able to tailor better products and address the customer’s needs promptly,” Shanbhag says.
United Healthcare has realized other big data benefits as well, including the ability to ingest and analyze huge amounts of data without disrupting normal systems operations.
Because Hadoop uses lower-cost commodity hardware and is fault-tolerant at scale, it allows United Healthcare to horizontally add more storage and computational horsepower as its data grows, says Shanbhag. “On top of that, the fact that it can, indeed, process that expanding data universe effectively is what gives us this ability to continue to ingest more data without holding up anything upstream and downstream,” he says.
“Imagine that the data landscape at any company is an ecosystem with several constituents, some of which are ‘producers’ of data while others are ‘refineries’ of data and the rest are ‘consumers’ of data,” Shanbhag says. “External or internal events can cause such an ecosystem to be prone to shocks or ripples that permeate through the topology. For example, a producer might create a huge wave of new data that the refiners struggle to process and that sets off a cascading effect, where the consumers don’t get what they want.”
United Healthcare’s plan: Using big data as a “shock absorber” within this ecosystem, which can eliminate or reduce the ripple effect of any upstream or downstream changes. “So if the producers generate more data, the refiners can rely on big data technologies to bear the brunt of shock while they ramp up their throughput,” Shanbhag says.
“We do not hold up the upstream producers [such as systems, warehouses and processes that are generating data for other consumers] of our data just because we are not ready to analyze it,” Shanbhag says. “Instead, we are in a position where we can adapt much quicker to new data while the business problem is still relevant. It also helps us avoid several cost-prohibitive and time-prohibitive warehousing changes.”
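The “shock absorber” idea can be sketched as a buffer that sits between a bursty producer and a slower refiner. The sketch below is a simplified illustration, not the company's architecture. The `ShockAbsorber` class, its method names and the throughput figure are hypothetical, and an in-memory queue stands in for a big data platform.

```python
from collections import deque


class ShockAbsorber:
    """Hypothetical buffer that decouples a bursty producer from a slower refiner."""

    def __init__(self):
        self._buffer = deque()

    def ingest(self, records):
        """Accept every record immediately, so the producer never waits."""
        self._buffer.extend(records)

    def drain(self, throughput):
        """Hand the refiner at most `throughput` records for one cycle."""
        count = min(throughput, len(self._buffer))
        return [self._buffer.popleft() for _ in range(count)]

    def backlog(self):
        """Number of records still waiting to be refined."""
        return len(self._buffer)


absorber = ShockAbsorber()
absorber.ingest(range(10))  # a sudden wave of 10 records arrives at once

# The refiner works through the wave at its own pace, 4 records per cycle.
backlog_per_cycle = []
while absorber.backlog():
    absorber.drain(throughput=4)
    backlog_per_cycle.append(absorber.backlog())
```

The producer's burst is accepted in full, and the backlog shrinks cycle by cycle as the refiner catches up, so neither side has to be held up for the other.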
Meeting the Challenge
The big data/analytics efforts have not come without challenges.
“Healthcare data is complex and requires a lot of domain expertise,” such as consumer experience management, bioinformatics, medical treatment programs, pharmacy benefit management, network management, regulatory frameworks and financial analyses, Shanbhag says. Data processing patterns that might apply to other industries such as manufacturing or retail might not apply out of the box in highly regulated environments, he says, “because domain expertise is one of the key components of being able to convert data into information. Knowing the right questions to ask is sometimes as important as having the right answers. Hiring data scientists with the right mix of technology, healthcare and statistical background is one of the biggest challenges.”
Keeping up with constantly evolving distributed data processing architectures is important, too. “Data scientists need to continue to learn new techniques, architectures and design patterns,” he says.
To address these challenges, the company continues to experiment and “aggressively try new mixes of technical componentry until something gives us the desired results,” Shanbhag says. “I cannot overstate how important it is to incubate new ideas and [let] smart, passionate people with the right tools work away at them. Great things happen when we have the right mix of talent, tools and tenacity.”
Clearly, big data and analytics will retain their importance at United Healthcare in the coming months and years. “Our focus on staying a customer-centric enterprise and several aspects of the Affordable Care Act require us to have a 360-degree view of the member,” Shanbhag says. “We definitely have the wherewithal to do that, considering all the touchpoints we have with our members and the investments we have made in both skills and technology across the enterprise.”
The company will continue to refine the initiatives it has underway to use advanced analytics tools across the board for processing data from NoSQL technologies, Hadoop and structured warehouses seamlessly.
“We are one of the most diversified healthcare companies in the world, serving 85 million individuals worldwide,” Shanbhag says. “So we have data that touches every aspect of the healthcare industry: member, claims, hospital, provider, clinical, operational, financial etc., and at a scale that is unparalleled within the healthcare industry. I would say that analytics is at the core of everything we do. It helps us understand our data better, get to root causes of problems quickly, build innovative solutions and actively adapt to the changing healthcare landscape.”