The University of Southern California has made available one of the largest open-source datasets of brain scans from stroke patients in a push to spur the development of machine-learning algorithms that automatically process MRI scans and identify lesions.

The Anatomical Tracings of Lesion After Stroke (ATLAS) dataset, which contains 304 manually segmented MRI scans that took more than 500 hours to create, is now available for download to researchers around the world.

“The unique thing is that we have manually traced the lesions on all of these brains—304 brains in total,” says Sook-Lei Liew, assistant professor with joint appointments at the USC Mark and Mary Stevens Neuroimaging and Informatics Institute, the Chan Division of Occupational Science and Occupational Therapy, the Division of Biokinesiology and Physical Therapy, and the USC Viterbi School of Engineering.

According to Liew, manually traced lesions are currently the gold standard for lesion segmentation on T1-weighted MRIs, but they are time-consuming to produce and require neuroanatomy expertise. And while algorithms that employ machine-learning techniques hold promise for automating the process, they require large training datasets to optimize performance, she contends.

Data is stored by the International Neuroimaging Data-Sharing Initiative (INDI), housed at the Child Mind Institute in New York, and by the Inter-University Consortium for Political and Social Research (ICPSR), housed at the University of Michigan. Stroke researchers who wish to access the data can download a normalized subset from INDI or the full dataset from ICPSR.

“They receive an encryption key so they can download the data online and then de-encrypt it,” notes Liew, who says the compiling, archiving and sharing of the ATLAS dataset is supported by the National Institutes of Health.

“The goal of ATLAS is to generate a dataset that machine learning and computer scientists could use to develop better automated algorithms to identify the lesions” and provide a standardized dataset for comparing the performance of different segmentation methods, she adds.

Ultimately, Liew says the aim is to identify biomarkers that can predict which stroke patients will respond to different rehabilitation therapies and to personalize their treatment plans.

“Even if people aren’t interested in stroke, it’s also an interesting dataset to train any sort of computer vision algorithm because it’s a challenging problem,” she concludes.

Liew and her colleagues describe the ATLAS dataset in an article published February 20 in the journal Scientific Data.

Going forward, she says that her team is working on creating a separate dataset that will be available for researchers to test their algorithms—not to train them. “In machine learning, you always need a training dataset and a testing dataset,” Liew observes.
