Google Research Shows How AI Can Make Ophthalmologists More Effective

As artificial intelligence continues to evolve, diagnosing disease faster and potentially more accurately than physicians, some have suggested that the technology may soon take over tasks that physicians currently perform. But a new study from the Google AI research group shows that physicians and algorithms working together are more effective than either alone. It is one of the first studies to examine how AI can improve physicians’ diagnostic accuracy. The new research will be published in the April issue of Ophthalmology.

This study expands on previous work from Google AI showing that its algorithm performs roughly as well as human experts in screening patients for a common diabetic eye disease called diabetic retinopathy. For their latest study, the researchers wanted to see whether their algorithm could do more than simply diagnose disease: they set out to create a new computer-assisted system that could “explain” the algorithm’s diagnosis. They found that this system not only improved the ophthalmologists’ diagnostic accuracy but also improved the algorithm’s accuracy.

More than 29 million Americans have diabetes and are at risk for diabetic retinopathy, a potentially blinding eye disease. People typically don’t notice changes in their vision in the disease’s early stages. But as it progresses, diabetic retinopathy usually causes vision loss that in many cases cannot be reversed. That is why it’s so important that people with diabetes have yearly screenings.

Unfortunately, the accuracy of screenings can vary significantly. One study found a 49 percent error rate among internists, diabetologists, and medical residents.

Recent advances in AI promise to improve both access to diabetic retinopathy screening and its accuracy. But it is less clear how AI will work in the physician’s office or other clinical settings. Previous attempts at computer-assisted diagnosis show that some screeners over-rely on the machine and repeat its errors, while others under-rely on it and ignore its accurate predictions. Researchers at Google AI believe some of these pitfalls may be avoided if the computer can “explain” its predictions.

To test this theory, the researchers developed two types of assistance to help physicians read the algorithm’s predictions.

Grades: A set of five scores that represent the strength of evidence for the algorithm’s prediction.
Grades + heatmap: The same grades, supplemented with a heatmap that shows how much each pixel in the image contributed to the algorithm’s prediction.

Ten ophthalmologists (four general ophthalmologists, one ophthalmologist trained outside the US, four retina specialists, and one retina specialist in training) were asked to read each image once under one of three conditions: unassisted, grades only, or grades + heatmap.

Both types of assistance improved physicians’ diagnostic accuracy. They also increased physicians’ confidence in their diagnoses. But the degree of improvement depended on the physician’s level of expertise.

Without assistance, general ophthalmologists were significantly less accurate than the algorithm, while retina specialists were not significantly more accurate than it. With assistance, general ophthalmologists matched but did not exceed the algorithm’s accuracy, while retina specialists began to exceed it.

“What we found is that AI can do more than simply automate eye screening; it can assist physicians in more accurately diagnosing diabetic retinopathy,” said lead researcher Rory Sayres, PhD. “AI and physicians working together can be more accurate than either alone.”

Sayres said that AI, like the medical technologies that preceded it, is another tool that will make the knowledge, skill, and judgment of physicians even more central to quality care.

“There’s an analogy in driving,” Sayres explained. “There are self-driving vehicles, and there are tools to help drivers, like Android Auto. The first is automation, the second is augmentation. The findings of our study indicate that there may be space for augmentation in classifying medical images like retinal fundus images. When the combination of clinician and assistant outperforms either alone, this provides an argument for up-leveling clinicians with intelligent tools.”