Previous studies have shown that medical algorithms have caused biases in care delivery, and that image algorithms may perform unequally for different demographic groups. In 2019, a widely used algorithm for prioritizing care for the sickest patients was found to disadvantage Black people. In 2020, researchers at the University of Toronto and MIT showed that algorithms trained to flag conditions such as pneumonia on chest x-rays sometimes performed differently for people of different sexes, ages, races, and types of medical insurance.
Paul Yi, director of the University of Maryland’s Intelligent Imaging Center, who was not involved in the new study showing algorithms can detect race, describes some of its findings as “eye opening,” even “crazy.”
Radiologists like him don’t typically think about race when interpreting scans, or even know how a patient self-identifies. “Race is a social construct and not in itself a biological phenotype, even though it can be associated with differences in anatomy,” Yi says.
Frustratingly, the authors of the new study could not figure out exactly how their models could so accurately detect a patient’s self-reported race. They say that uncertainty will likely make it harder to pick up biases in such algorithms.
Follow-on experiments showed that the algorithms were not making predictions based on particular patches of anatomy, or visual features that might be associated with race due to social and environmental factors such as body mass index or bone density. Nor did age, sex, or specific diagnoses that are associated with certain demographic groups appear to be functioning as clues.
The fact that algorithms trained on images from a hospital in one part of the US could accurately identify race in images from institutions in other regions appears to rule out the possibility that the software is picking up on factors unrelated to a patient’s body, says Yi, such as differences in imaging equipment or processes.
Whatever the algorithms were seeing, they saw it clearly. The software could still predict patient race with high accuracy when x-rays were degraded so that they were unreadable to even a trained eye, or blurred to remove fine detail.
Luke Oakden-Rayner, a coauthor on the new study and director of medical imaging research at Royal Adelaide Hospital, Australia, calls the AI capability the collaborators uncovered “the worst superpower.” He says that even though the mechanism is unknown, it demands an immediate response from people developing or selling AI systems to analyze medical scans.
A database of AI algorithms maintained by the American College of Radiology lists dozens for analyzing chest imagery that have been approved by the Food and Drug Administration. Many were developed using the same standard data sets that the new study used to train algorithms to predict race. Although the FDA recommends that companies measure and report performance on different demographic groups, such data is rarely released.
Oakden-Rayner says that such checks and disclosures should become standard. “Commercial models can almost certainly identify the race of patients, so companies need to ensure that their models are not utilizing that information to produce unequal outcomes,” he says.
Yi agrees, saying the study is a reminder that while machine-learning algorithms can help human experts with practical problems in the clinic, they work differently than people. “Neural networks are sort of like savants, they’re singularly efficient at one task,” he says. “If you train a model to detect pneumonia, it’s going to find one way or another to get that correct answer, leveraging whatever it can find in the data.”