An interdisciplinary team at Duke University has developed a proof-of-concept machine learning model capable of detecting symptomatic Alzheimer disease using multimodal retinal imaging data.
A common neurodegenerative disorder, Alzheimer disease (AD) is typically diagnosed symptomatically because of the high cost and invasiveness of alternative methods like positron emission tomography scans and lumbar punctures. With the identification of microvascular and neurosensory structural alterations in the retinas of individuals with AD, non-invasive retinal imaging may be an adjunctive or even an alternative means of detecting AD early.1
To test this possibility, the Eye Multimodal Imaging in Neurodegenerative Disease (iMIND) lab at Duke University applied modern machine learning techniques to retinal images and clinical patient data, as shared in their recent publication in the British Journal of Ophthalmology.2 The model was able to distinguish those with symptomatic AD from cognitively normal controls, serving as a proof of concept for using the retina as a “window to the brain” and exhibiting promise for artificial intelligence (AI) as a powerful tool in the diagnosis of AD.
For this study, the iMIND team collected and used the following types of retinal images: ganglion cell-inner plexiform layer (GC-IPL) thickness colour maps from optical coherence tomography (OCT), superficial capillary plexus en face 3 × 3 mm and 6 × 6 mm OCT angiography (OCTA) images centred on the fovea, ultra-widefield (UWF) scanning laser ophthalmoscopy (SLO) colour fundus photos, and UWF SLO fundus autofluorescence images.
In addition to the raw images, subjects’ age, sex and visual acuity as well as quantitative metrics of retinal microvasculature and structure from the OCT and OCTA scans were included.
The ground truth (AD or cognitively normal) was determined by a team of expert neurologists; participants with underlying conditions that could bias detection accuracy were excluded from the study.
The images were manually assessed for quality prior to inclusion in the input dataset. Poor-quality images were excluded, and the UWF images were cropped to a standardised region of interest to avoid or remove artifacts, yielding high-fidelity data.
When imaging data are the primary input for a machine learning model, one of the most powerful and commonly used subtypes of machine learning is the convolutional neural network (CNN), which employs hierarchical learning to capture the complexities of images, including features whose position varies from image to image.
CNNs decompose image inputs into more basic visual features such as lines and edges, which can themselves combine to form shapes that in turn can combine to form groups of shapes. Each of these steps is referred to as a “layer” of the CNN.
By simplifying a complex problem into several simple problems, the model can efficiently learn the proper way to extract the useful features and biomarkers from images.
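The idea of a convolutional layer extracting a basic visual feature such as an edge can be sketched with a toy example. This is illustrative only, not the iMIND model: a single hand-written vertical-edge filter slid across a tiny image, the kind of primitive feature a CNN's early layers learn automatically.

```python
import numpy as np

def conv2d(image, kernel):
    """Valid-mode 2D cross-correlation, the core operation of a CNN layer."""
    kh, kw = kernel.shape
    h, w = image.shape
    out = np.zeros((h - kh + 1, w - kw + 1))
    for i in range(out.shape[0]):
        for j in range(out.shape[1]):
            out[i, j] = np.sum(image[i:i + kh, j:j + kw] * kernel)
    return out

# A tiny image with a sharp vertical edge (dark left half, bright right half).
image = np.array([
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
    [0, 0, 1, 1],
], dtype=float)

# Sobel-style vertical-edge kernel.
kernel = np.array([
    [-1, 0, 1],
    [-2, 0, 2],
    [-1, 0, 1],
], dtype=float)

response = conv2d(image, kernel)
print(response)  # strong, uniform response: the edge falls in every window
```

In a trained CNN, the kernels are not hand-written but learned, and later layers combine such edge responses into progressively more complex shapes.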
Training, validation, testing
With a dataset of 222 eyes from cognitively normal patients and 62 eyes from AD patients, the team divided the data into training, validation and testing sets for the model.
The training set included 34 AD and 152 cognitively normal eyes (18.3% AD) whereas the validation set included 6 AD and 24 cognitively normal eyes (20.0% AD).
The test set included 22 AD and 46 cognitively normal eyes (32.3% AD) that were accrued a year after the training and validation images by different imagers using the same imaging machines. A model should be capable of accurate predictions using images collected by photographers different from those who supplied its training dataset.
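As an illustration only (the study does not describe its assignment procedure at this level of detail), a stratified hold-out split that preserves class prevalence might look like the sketch below. The class counts mirror the reported training and validation sets; the real test set was accrued separately a year later rather than split off the same pool.

```python
import numpy as np

rng = np.random.default_rng(0)

def stratified_split(labels, n_val_per_class):
    """Return train/validation index arrays, holding out
    n_val_per_class[c] randomly chosen examples of each class c."""
    labels = np.asarray(labels)
    val_idx = []
    for c, n_val in n_val_per_class.items():
        idx = np.flatnonzero(labels == c)
        rng.shuffle(idx)
        val_idx.extend(idx[:n_val])
    val_idx = np.array(sorted(val_idx))
    train_idx = np.setdiff1d(np.arange(len(labels)), val_idx)
    return train_idx, val_idx

# 1 = AD eye, 0 = cognitively normal eye (training + validation pool:
# 40 AD and 176 cognitively normal eyes, as in the paper).
labels = np.array([1] * 40 + [0] * 176)
train_idx, val_idx = stratified_split(labels, {1: 6, 0: 24})
print(len(train_idx), len(val_idx))  # 186 30
```

Holding out a fixed number per class reproduces the reported composition: 34 AD / 152 normal for training and 6 AD / 24 normal for validation.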
The model was initially trained on the training set and had its parameters adjusted by feedback from the area under the receiver operating characteristic curve (AUC) and accuracy metrics when tested against the validation set.
After adjustments were finalised, the model was trained on both the training and validation data and tested against the test set (data not seen during any part of the training or parameter adjustment process). Eleven models differing in their combinations of inputs were trained and tested, and their scores reported with the AUC metric.
The model using GC-IPL maps, quantitative OCT and OCTA data, and patient data performed best, achieving average AUCs of 0.861 (95% confidence interval [CI] 0.727, 0.995) on the validation set and 0.841 (95% CI 0.739, 0.943) on the test set.
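AUC values with 95% confidence intervals of this kind can be computed with a rank-based AUC and a percentile bootstrap. The sketch below is a generic illustration on made-up scores, not the paper's exact statistical procedure, and it assumes no tied prediction scores.

```python
import numpy as np

def auc(y_true, scores):
    """AUC via the Mann-Whitney rank-sum formula (assumes no tied scores)."""
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    order = np.argsort(scores)
    ranks = np.empty(len(scores))
    ranks[order] = np.arange(1, len(scores) + 1)
    n_pos = y_true.sum()
    n_neg = len(y_true) - n_pos
    return (ranks[y_true == 1].sum() - n_pos * (n_pos + 1) / 2) / (n_pos * n_neg)

def bootstrap_ci(y_true, scores, n_boot=2000, alpha=0.05, seed=0):
    """Percentile bootstrap confidence interval for the AUC."""
    rng = np.random.default_rng(seed)
    y_true, scores = np.asarray(y_true), np.asarray(scores)
    n, aucs = len(y_true), []
    while len(aucs) < n_boot:
        idx = rng.integers(0, n, size=n)
        if 0 < y_true[idx].sum() < n:  # resample must contain both classes
            aucs.append(auc(y_true[idx], scores[idx]))
    return np.percentile(aucs, [100 * alpha / 2, 100 * (1 - alpha / 2)])

# Hypothetical model scores for 10 eyes (1 = AD, 0 = cognitively normal).
y = np.array([0, 0, 0, 0, 0, 0, 1, 1, 1, 1])
s = np.array([0.1, 0.3, 0.2, 0.4, 0.6, 0.5, 0.55, 0.7, 0.8, 0.9])
lo, hi = bootstrap_ci(y, s)
print(f"AUC {auc(y, s):.3f} (95% CI {lo:.3f}, {hi:.3f})")
```

The bootstrap resamples eyes with replacement and recomputes the AUC each time, so the resulting interval reflects the uncertainty introduced by the small sample size.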
This performance was impressive, especially when considering it had been trained on a small dataset. Increasing the size of the dataset could yield even better performance for predicting AD using retinal images and related metrics.
Successes and limitations
The CNN model accurately predicted the presence of symptomatic AD from retinal imaging and clinical data, providing an excellent proof of concept. The project illuminated the capabilities of using the retina to detect AD, showing promise for AI detection that, if developed, could serve as a non-invasive means of screening and expand patient access to care.
Although the data were curated meticulously and filtered for quality and ocular artifacts, the dataset was relatively small and did not assess racial diversity.
The AD model was limited to data collected from AD patients enrolled from the Duke Memory Disorders Clinic and cognitively normal participants from the Duke Alzheimer Disease Research Center or community volunteers aged 50 years or older. As a result, 284 eyes were used: 222 from cognitively normal participants and 62 from participants with symptomatic AD.
Having significantly more distinct data points would further improve the model’s ability to generalise and make accurate predictions. The dataset, though high quality, was understandably small, leaving room for improvement in future models. Obtaining robust data is the central challenge of any machine learning venture, and image and data inputs from diverse populations will need to be incorporated in future AD detection efforts.
The iMIND team took several measures to ensure that a primary challenge in machine learning, overfitting, was avoided.
Overfitting occurs when a model learns to associate ungeneralisable details with the output: rather than learning which features of the inputs across the entire dataset genuinely predict the output, the model associates a particular participant’s images, quantitative data and patient data with that participant’s label.
The team’s model was especially prone to overfitting because of the small dataset, which could cause it to memorise the images associated with a given output rather than learning the input features predictive of that output (AD or not AD). To mitigate this, instead of creating a full CNN feature extractor for each imaging modality, a modified, pared-down ResNet18 was used, reducing the trainable weights to one-third of what they would otherwise have been.
Having fewer parameters reduces the possibility of overfitting when paired with techniques such as regularisation (penalises the learning of complex, noisy features), which the team employed. More data would enable the exploration of a larger model to assess whether a slight increase in complexity would improve overall accuracy in detecting AD without the issue of rapid overfitting on small datasets.
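The effect of such a penalty can be seen in a minimal, self-contained sketch using ridge (L2-regularised) regression. This illustrates the principle only, not the team's actual model or code: the penalty term lambda * ||w||^2 discourages large weights that memorise noise in a small dataset.

```python
import numpy as np

rng = np.random.default_rng(0)

def ridge_fit(X, y, lam):
    """Closed-form ridge solution: w = (X^T X + lam*I)^-1 X^T y."""
    d = X.shape[1]
    return np.linalg.solve(X.T @ X + lam * np.eye(d), X.T @ y)

# Few samples, many features: the classic overfitting regime.
n, d = 20, 15
X = rng.normal(size=(n, d))
true_w = np.zeros(d)
true_w[0] = 1.0                                # one informative feature
y = X @ true_w + 0.3 * rng.normal(size=n)      # noisy observations

w_unreg = ridge_fit(X, y, lam=0.0)             # ordinary least squares
w_reg = ridge_fit(X, y, lam=5.0)               # L2-regularised fit

# Regularisation shrinks the weights, pulling the noisy,
# uninformative coefficients towards zero.
print(np.linalg.norm(w_unreg), np.linalg.norm(w_reg))
```

The same logic motivates weight decay when training neural networks: with few examples and many parameters, the penalty keeps the model from fitting noise.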
Future steps for this CNN model would be to acquire a larger, more diverse population to ensure a greater density of sample points for the model to learn from, improving the model’s generalisability. Increasing the dataset with more high-quality images would yield the highest return on prediction accuracy and would additionally offer the ability to assess the benefits of an augmented model size.
Future efforts should focus on AD detection in patients with underlying conditions that also impact retinal imaging, such as diabetes and glaucoma, as well as using different imaging formats because image standardisation is still evolving across imaging platforms in ophthalmology.
Notably, automating the quality assessment procedures is crucial to facilitating future endeavours, as manually determining image quality and filtering for high-quality images are time- and resource-intensive tasks.
Duke University’s interdisciplinary iMIND lab team established the potential of machine learning to detect symptomatic AD using retinal imaging paired with numerical and categorical data. The effort yielded an important proof-of-concept model, demonstrating promise for AI applications in this exciting field.