Even medical scientists have trouble with logic


(Bloomberg View) -- Statistics professor Sander Greenland had just finished lecturing a group of students and doctors at Harvard Medical School about errors researchers make when interpreting evidence. Chatting with me in the hallway afterwards, he brought up an unlikely topic -- a celebrity-driven campaign called the “thimerosal challenge.” As I quickly learned, actor Robert De Niro and lawyer Robert Kennedy Jr. had offered $100,000 to anyone who could demonstrate the safety of this mercury-containing preservative used in flu vaccines.

Greenland mentioned this not to shame non-scientists for meddling in medical affairs. He wanted to point out the errors of logic and reasoning on the side of the professionals reacting to the challenge -- the people who should know better.

The challenge itself is based on a misunderstanding. Medical researchers don’t prove drugs or additives are absolutely safe; rather, they try to establish a reasonable risk-benefit ratio. But scientists, he said, often wrongly interpret their data to make unjustifiably absolute claims -- a logical leap he diagnosed in his lecture as “dichotomania.” He made a case that at the heart of many of the mathematical and statistical errors in medical research are logical errors -- false dichotomies, unjustified assumptions, and the failure to distinguish between evidence of absence and absence of evidence.

Doctors and science journalists often reassure people that “there’s no evidence” a given treatment is dangerous. This statement “makes it sound like you have something you’ve observed regarding safety,” he said. But the statement might mean only that there’s very little evidence pointing one way or the other.

That same logical error shows up in the way scientists sometimes misuse the concept known as statistical significance. “Statistical significance” is a great marketing tool because it sounds like a mathematical seal of approval. But statisticians complain that many scientists don’t understand what it really means.

Computing statistical significance helps scientists avoid being fooled by randomness, since people's behavior, performance on tests, experience of headaches, or even deaths vary unpredictably in ways that may have nothing to do with whatever drug, food or other intervention is being studied. Statistical significance is usually expressed as a number between 0 and 1 known as a p-value, with a lower value indicating greater statistical significance.

People wrongly think the p-value is the probability that the effect they're testing does not exist, Greenland said. But it's really something more subtle: a statement about how unlikely a given set of data would be, assuming the effect under study doesn't exist.

Let’s say a drug is tested among a group of 200 patients for five years, and 20 of them die. Does that mean your drug is a killer? The p-value can’t tell you, but it offers a clue. It indicates how likely it would be for at least 20 people to die in a group of that size over the same time period, assuming the drug had no effect. The smaller the p-value, the less likely it is that so many deaths would occur by chance alone, and the bigger the red flag.

While statistical significance is a continuum, medical research by convention has turned it into a yes-or-no question: Journals have informally decided that results should be considered statistically significant only if the p-value is 5 percent or lower. In the above example, this would correspond to a 5 percent chance of at least that many deaths occurring, assuming the drug posed no danger. But what if a study gave a p-value of 6 percent? Greenland’s concern is that scientists might wrongly interpret that result to mean the deaths weren’t worth noting.

He’s not alone in his concern. Last year, the blog Retraction Watch claimed “We’re using a common statistical test all wrong.” Researchers from psychology, economics and biomedical research are now reconsidering a rash of dubious claims whose apparent statistical significance evaporated when others tried similar experiments.

But so far most of the focus has been on false positives -- researchers overselling low p-values as proof their findings are real. Greenland worries about false negatives -- overselling a high p-value to declare there’s no danger to an intervention. Results can be statistically significant and turn out to be wrong, or statistically insignificant and right.

To use a real-life example from Greenland’s talk, a 2013 study found a statistically significant association between statins, a class of drugs used to lower cholesterol, and a type of cancer called glioma. In this case, the association was beneficial -- people taking statins had fewer cases of glioma than those in a control group. A study attempting to replicate the 2013 finding showed the same association, but it wasn’t statistically significant -- the p-value was higher.

The authors of the second study claimed they had refuted the first one, Greenland said, but the opposite was actually true. The second study bolstered the first, since researchers had once again observed the same effect.
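This situation -- the same observed effect crossing the 5 percent line in one study but not in a smaller one -- is easy to reproduce numerically. The counts below are made up for illustration (they are not the actual statin study data); the sketch uses a standard two-proportion z-test with a normal approximation.

```python
from math import sqrt, erf

def one_sided_p(x_treat, n_treat, x_ctrl, n_ctrl):
    """One-sided p-value that the treated group's event rate is lower,
    via a pooled two-proportion z-test (normal approximation)."""
    p_t, p_c = x_treat / n_treat, x_ctrl / n_ctrl
    pooled = (x_treat + x_ctrl) / (n_treat + n_ctrl)
    se = sqrt(pooled * (1 - pooled) * (1 / n_treat + 1 / n_ctrl))
    z = (p_c - p_t) / se
    return 0.5 * (1 - erf(z / sqrt(2)))  # P(Z >= z)

# Identical observed rates (1% vs. 2%), different sample sizes -- all invented:
p_big = one_sided_p(20, 2000, 40, 2000)   # large study: crosses the 5% line
p_small = one_sided_p(5, 500, 10, 500)    # small study: same effect, higher p
print(p_big, p_small)
```

Both studies point the same way; only the sample size differs. Calling the second one a refutation of the first, as Greenland notes, gets the logic backwards.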

This comes back to Greenland’s “dichotomania,” which has made it difficult to regain public trust regarding the safety of vaccines. The best medical researchers can do is consider risk-benefit ratios, as another statistician, Steven Goodman, has explained regarding the MMR vaccine. There have been dozens of studies -- enough to show that the risks associated with the MMR vaccine are far lower than the risks unvaccinated children face from the diseases.

Greenland says Kennedy and De Niro should indeed be asking for a risk-benefit ratio for thimerosal, which in the U.S. was removed from childhood vaccines but is still used in flu shots. Their critics have erred, too, by simply pointing out that people still die from the flu. That doesn’t speak to the risks of taking away the preservative, since flu vaccines can still be made without it.

Those on the side of science probably can’t win the $100,000 no matter what, since the Hollywood side failed to pose the right question. But those who believe in the value of thimerosal should still do the correct risk-benefit calculations. It’s good practice, it will engender public trust, and people’s lives are at stake.

This column does not necessarily reflect the opinion of the editorial board or Bloomberg LP and its owners.

(About the author: Faye Flam is a Bloomberg View columnist. She was a staff writer for Science magazine and a columnist for the Philadelphia Inquirer, and she is the author of “The Score: How the Quest for Sex Has Shaped the Modern Man.”)
