Researchers urge caution in selecting, using algorithms to support analytics

A new study of predictive analytics in healthcare asks a provocative question: How can we know it works?

Current interest in predictive analytics for improving healthcare is reflected in a surge of long-term investment in new technologies that use artificial intelligence and machine learning to forecast future events and improve individuals’ health, contend author Ben Van Calster and colleagues in the Department of Biomedical Data Sciences at Leiden University Medical Center in the Netherlands.

Predictive algorithms help identify individuals at increased likelihood of disease, the researchers note. In the era of personalized medicine, predictive algorithms are used to make clinical management decisions based on individual patient characteristics and to counsel patients. With medical imaging data now routinely collected alongside electronic health records and national registry data, there is no sign that the flood of new algorithms will slow.

The researchers acknowledge that scientists are making efforts to improve data sharing, increasing study registration beyond clinical trials and making reporting transparent. What’s needed now is a discussion in the healthcare industry of the importance of transparency in the context of medical predictive analytics.

Before recommending a predictive algorithm for clinical practice, it is important to know whether and for whom it works well, the authors say. To start, predictions should discriminate between individuals with and without the disease: predictions should be higher in those who have the disease than in those who do not.

Algorithm development may suffer from overfitting, which results in poorer discrimination (the ability to rank diseased patients above healthy ones) and poorer calibration (agreement between predicted risks and observed outcomes) when the algorithm is evaluated on new data. “Although the clinical literature tends to focus on discrimination, calibration is clearly crucial,” the authors caution. “Inaccurate risk predictions can lead to inappropriate decisions or expectations, even when the discrimination is good.”
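The distinction between discrimination and calibration can be made concrete with a small sketch using hypothetical data (the patient values below are illustrative, not from the study): an algorithm can rank patients perfectly while still overstating everyone's risk.

```python
# Hypothetical example: perfect discrimination, poor calibration.
y_true = [0, 0, 0, 0, 1, 1]                    # observed disease status
y_pred = [0.60, 0.65, 0.70, 0.75, 0.85, 0.90]  # predicted risks

# Discrimination: the c-statistic (AUC) is the fraction of
# (diseased, healthy) pairs where the diseased patient got the
# higher predicted risk.
pairs = [(y_pred[i], y_pred[j])
         for i in range(len(y_true)) if y_true[i] == 1
         for j in range(len(y_true)) if y_true[j] == 0]
auc = sum(p1 > p0 for p1, p0 in pairs) / len(pairs)

# Calibration-in-the-large: mean predicted risk vs. observed event rate.
mean_pred = sum(y_pred) / len(y_pred)
event_rate = sum(y_true) / len(y_true)

print(auc)        # 1.0 — every diseased patient outranks every healthy one
print(mean_pred)  # ~0.74 average predicted risk...
print(event_rate) # ...but only ~0.33 of patients actually have the disease
```

Here the model would look excellent by AUC alone, yet its risk estimates are roughly double the true event rate, exactly the kind of miscalibration the authors warn can mislead decisions.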

The authors also have concerns about machine learning algorithms. “Machine learning methods are becoming increasingly popular to develop predictive algorithms. The architecture of these algorithms is often too complex to fully disentangle and report the relation between a set of predictors and the outcome. This is the commonly addressed problem when discussing transparency of predictive analytics based on machine learning. We argue that algorithm availability is at least as important.”

Author Ben Van Calster and colleagues also address the commercialization of proprietary algorithms. Developers may choose not to disclose an algorithm and offer it on a fee-for-service basis. For example, a biomarker-based algorithm to diagnose ovarian cancer may have a cost of $897 per patient.

“Assume we want to validate this algorithm in a center that has 20 percent malignancies in the target population,” they explain. “If we want to recruit at least 100 patients in each outcome group, following current recommendations for validation studies, the study needs at least 500 patients.”

This implies a minimum cost of $448,500 to obtain useful information about whether this algorithm works in this particular center. “It is important to emphasize this is just the cost required to judge whether the algorithm has any validity in this setting,” they caution. “There is no guarantee that it will be clinically useful.”
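The authors' cost arithmetic can be reproduced directly: at a 20 percent malignancy rate, recruiting 100 patients in the smaller outcome group requires 100 / 0.20 = 500 patients in total, and at $897 per prediction the validation study alone costs $448,500.

```python
# Reproducing the validation-cost arithmetic from the study.
cost_per_patient = 897    # fee per prediction, in dollars
malignancy_rate = 0.20    # prevalence in the target population
min_per_group = 100       # recommended minimum patients per outcome group

# Malignancy is the rarer outcome, so it drives the total sample size;
# 500 patients also yields ~400 benign cases, covering the other group.
n_patients = int(min_per_group / malignancy_rate)
total_cost = n_patients * cost_per_patient

print(n_patients)  # 500
print(total_cost)  # 448500
```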

The authors conclude on a somber note. “We believe selling predictions from an undisclosed algorithm is unethical. This article does not touch on legal consequences of using predictive algorithms, where issues such as algorithm availability or black-box predictions cannot be easily ignored. When journals consider manuscripts introducing a predictive algorithm, its availability should be a minimum requirement before acceptance. Clinical guideline documents should focus on publicly available algorithms that have been independently validated.”

The complete study is available in the July issue of the Journal of the American Medical Informatics Association.
