Finding new actionable insights in old data research

Managing data across scientific disciplines poses a common problem. As scientific discoveries rapidly expand the stock of information, data professionals face a major challenge in tapping prior research findings.

Current methods to aggregate quantitative findings (meta-analysis) have limitations. They assume that prior studies share similar designs and substantive factors. They rarely do.
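To make that limitation concrete, here is a minimal sketch of conventional fixed-effect (inverse-variance) pooling, one of the standard meta-analysis calculations. The function name and the numbers are illustrative assumptions, not taken from any study. Note that the calculation only makes sense when every study reports an estimate of the same quantity.

```python
def pool_fixed_effect(estimates, std_errors):
    """Inverse-variance weighted average of study estimates.

    Every entry in `estimates` must measure the SAME quantity; this is the
    assumption that breaks down when study designs differ.
    """
    if len(estimates) != len(std_errors) or not estimates:
        raise ValueError("need one standard error per estimate")
    # More precise studies (smaller standard error) get more weight.
    weights = [1.0 / se ** 2 for se in std_errors]
    total_weight = sum(weights)
    pooled = sum(w * e for w, e in zip(weights, estimates)) / total_weight
    pooled_se = (1.0 / total_weight) ** 0.5
    return pooled, pooled_se


# Hypothetical numbers: two equally precise studies of the same effect.
pooled, pooled_se = pool_fixed_effect([10.0, 12.0], [1.0, 1.0])
print(pooled, round(pooled_se, 4))
```

With two equally precise studies the pooled estimate is simply their midpoint. If one study had measured the effect of weight and the other the effect of height, there would be no common quantity to average.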

Take, for example, studies estimating basal metabolic rate (BMR), a measure of the energy the human body expends at rest. Study results can have important implications for understanding human metabolism and for developing obesity and malnutrition interventions.

More than 47 studies have estimated BMR. But these calculations are based on different body measures, such as fat mass, weight, age, and height, to name a few. How do we combine those studies into a single equation to get usable insights?

To address this issue, my colleagues and I designed a new method, called “generalized model aggregation” (GMA), for aggregating prior work into a meta-model. Building on advances in data analytics and computational power, this method makes it possible to combine previous studies even when they have heterogeneous designs and measures.

We used the BMR problem as an empirical case for GMA. Using only the models available from the literature, we estimated a new model that takes into account all the different body measures considered in prior studies for estimating BMR. Then, on a separate dataset, we compared our equation’s predictive power against older equations as well as state-of-the-art equations used by the World Health Organization and the Institute of Medicine.

Our equation outperformed all other equations available, including the more recent ones.

Alongside the theoretical proofs we provide, this application and several others on synthetic data illustrate the potential value of the method.

The beauty is that we estimated the equation without any new data collection. We didn’t even need the raw data from those prior studies. Using only the published prior research, we were able to extract an equation that works better than the equations from all of those prior studies.
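One way to see why published coefficients alone can sometimes be enough is a toy linear case. This is an illustration of the general idea, not the published GMA procedure, and every name and number in it is hypothetical. Suppose the outcome depends on two standardized predictors, y = b1*x1 + b2*x2, but study A only regressed y on x1 and study B only regressed y on x2. If we assume the correlation between the predictors is known, each published slope is a known mix of b1 and b2, and the two can be untangled.

```python
def recover_full_model(coef_study_a, coef_study_b, corr):
    """Recover (b1, b2) in y = b1*x1 + b2*x2 from two one-predictor studies.

    Assumptions (all hypothetical, for illustration only):
      - x1 and x2 are standardized to unit variance,
      - their correlation `corr` is known,
      - study A regressed y on x1 only, so its slope equals b1 + corr * b2,
      - study B regressed y on x2 only, so its slope equals b2 + corr * b1.
    """
    det = 1.0 - corr ** 2
    if det == 0:
        raise ValueError("perfectly correlated predictors cannot be separated")
    # Solve the 2x2 linear system [[1, corr], [corr, 1]] @ [b1, b2] = slopes.
    b1 = (coef_study_a - corr * coef_study_b) / det
    b2 = (coef_study_b - corr * coef_study_a) / det
    return b1, b2


# Made-up published slopes: 2.5 from study A, 2.0 from study B, corr = 0.5.
b1, b2 = recover_full_model(2.5, 2.0, 0.5)
print(b1, b2)  # prints 2.0 1.0
```

Neither study reported the combined equation, yet the two slopes together pin it down. Real studies differ in many more ways than this, which is why a general-purpose method is needed.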

With an increasing amount of research globally, we need better data management methods that build on advances in data analytics to quantitatively combine, contrast, and build on prior work. This is evident in the exponential growth of meta-analysis papers in the last decade. Across nine major databases, we found that papers with the term “meta-analysis” in the title have experienced over 25-fold growth in that time.

A broader method for quantitative aggregation of prior research like GMA could be enormously beneficial in several areas. Here are just a few examples:

Environmental science - Over 125 studies have analyzed the impact of the pesticide atrazine on freshwater vertebrates, but no quantitative conclusion can be drawn without a method to combine them.

Climate change - A meta-regression study combined 60 prior estimates of the impact of climate change on human violence. However, its findings were questioned because the method mixes different variables and does not account for cross-study correlations.

Occupational health - Studies have estimated the effectiveness of workplace return-to-work interventions after injury or illness. Yet the variability in study design and methods has prevented aggregation of the findings.

Energy - Multiple methods exist to estimate diffuse solar energy in a location using data from distant sensors, but there is no model that aggregates these methods into a single estimating equation.

Marketing - Is it better to give consumers a choice of 10 brands of soup or 20? Choice overload studies provide conflicting results. Some say that fewer options lead to higher customer satisfaction, while others dispute the importance of choice overload. A single model that combines all the variables of prior studies could resolve this issue.

For problems that multiple researchers have investigated without reaching a clear consensus, this method offers a quantitative way to resolve the uncertainty.

By enabling more complex meta-analyses, GMA could also allow researchers to leverage previous findings to compare alternative theories and advance new models in diverse domains.
