MICCAI 2022 Challenge
Quality augmentation in diffusion MRI for clinical studies:
Validation in migraine
AND COMPLETE INSTRUCTIONS
Deep Learning (DL) techniques have been used in medical imaging to improve quality and generate new images from reduced medical imaging acquisitions. They have brought about a true revolution in the medical field, with myriad new applications emerging every year. We cannot deny the excellent outcomes these applications produce, with high-quality images and compelling results. However, when applied to medical images, most of the validation of these techniques has been done visually and/or qualitatively, and the techniques have not necessarily been adequately assessed in clinical studies. There is a key question that may affect many of the DL applications in medical studies: “are we losing relevant quantitative clinical information when generating high-quality images with artificial intelligence techniques?”. The question is related to the validity of traditional quality measures such as the Peak Signal-to-Noise Ratio (PSNR), Structural Similarity Index (SSIM) or Normalized Root Mean Squared Error (NRMSE), commonly used in medical image analysis. Strictly speaking, it is not enough that the images look alike; they must also preserve all the relevant clinical information.
In this challenge, we try to answer the question about the validity of reconstructed images in a real clinical study. To that end, we will focus on a real diffusion magnetic resonance imaging (MRI) study on migraine. Data were acquired for a clinical study carried out in a local hospital (Hospital Clinico Universitario, Valladolid, Spain) by a group of neurologists.
Migraine is a primary disabling disorder characterized by recurrent episodes of headache that usually last 4-72 hours. It is more widespread among young and middle-aged women. Despite the high prevalence of migraine, its pathophysiological mechanisms are not well known, and there are currently no biomarkers. Two types of migraine are currently distinguished: episodic migraine (EM) and chronic migraine (CM). This classification criterion is based exclusively on the number of headache days per month (15 or more days with headache per month for chronic migraine patients). The only relevant radiological findings in migraine are white matter hyperintensities observed in T2-weighted images, and their role is unclear. The advantage of migraine in a challenge like the present one is that, according to previous studies, the diffusion MRI differences with respect to healthy controls are subtle. In severe disorders such as Alzheimer’s disease or schizophrenia, it is relatively easy to find statistically significant results with classic methods (i.e. Diffusion Tensor Imaging, T1-, T2-weighted MR imaging), and thus it is difficult to tell which techniques or parameters better capture pathophysiological properties. There are some diffusion MRI studies assessing migraine. Diffusion Tensor Imaging (DTI) has been the most employed technique to evaluate microstructural properties, with differences found between controls and migraine patients (MP) and between EM and CM patients for DTI-related scalars like fractional anisotropy (FA), mean diffusivity (MD) and axial diffusivity (AD).
With these features in mind, the principal purpose of our challenge is to validate whether those DTI-based parameters, generated either directly from low-quality data or from DL-based augmented data, are able to replicate the statistical findings that appear when using standard-quality data (i.e., whether part of the relevant quantitative information is missing). Thus, we can validate the usefulness of DL-based reconstruction techniques in real clinical studies.
We have selected the migraine problem due to the following reasons:
Currently, we are working with 160 volumes, all sharing the same q-space coverage scheme, which enables us to easily subsample the data by merely selecting 21 appropriate gradient directions out of 61, without the need for interpolation algorithms. With this database, we can control the preprocessing pipeline and compare the quantitative measures obtained from the DTI under a reduced acquisition scheme to those estimated from fully-sampled data.
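The subsampling idea can be sketched on the gradient table alone. The snippet below is only an illustration, assuming FSL-style text tables (one row of b-values, three rows of gradient components); the helper names, the toy table and the `keep` indices are hypothetical and are not the selection defined by the organizers. The same indices would also be used to pick the corresponding volumes of the 4D image.

```python
def parse_fsl_table(text):
    """Parse an FSL-style bval/bvec text table into rows of floats."""
    return [[float(v) for v in line.split()] for line in text.strip().splitlines()]


def subsample_gradients(bvals, bvecs, keep):
    """Keep only the volumes listed in `keep` (0-based indices).

    bvals: list of N b-values.
    bvecs: three lists (x, y, z), each with N gradient components.
    """
    sub_bvals = [bvals[i] for i in keep]
    sub_bvecs = [[row[i] for i in keep] for row in bvecs]
    return sub_bvals, sub_bvecs


# Toy table: one b=0 volume followed by four diffusion-weighted volumes.
bvals = parse_fsl_table("0 1000 1000 1000 1000")[0]
bvecs = parse_fsl_table("0 1 0 0 0.7071\n0 0 1 0 0.7071\n0 0 0 1 0")

# Hypothetical selection: the b=0 volume plus two of the directions.
keep = [0, 1, 3]
sub_bvals, sub_bvecs = subsample_gradients(bvals, bvecs, keep)
print(sub_bvals)
print(sub_bvecs)
```

Because the selection is a pure index lookup, no interpolation of the signal is involved, which is the property the paragraph above relies on.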
diffusion-weighted MRI, diffusion tensor MRI, data augmentation, deep learning, migraine, neurology, neuroimaging
We plan to publish a paper in a top-tier journal (e.g. Medical Image Analysis, Neuroimage, or Neuroimage: Clinical) presenting the results from the Challenge and further recommendations in diffusion MRI data augmentation techniques for clinical studies.
Up to three members per team indicated by the team leader qualify as the paper's authors from the challenge. The top 6 ranked teams qualify as the authors following the criteria:
The angular resolution (i.e. the parameter that is proportional to the reciprocal number of diffusion sensitizing gradient directions) is one of the crucial design parameters used in a diffusion MRI experiment. Depending on the method employed to represent the diffusion MRI signal, a different number of gradient directions are required to fit the basis starting from six gradients in DTI to several dozens or hundreds in High Angular Resolution Diffusion Imaging (HARDI) techniques. In clinical studies, we are generally interested in optimizing the number of gradient directions to limit the acquisition duration and guarantee the patient's comfort during the examination. However, reducing the number of gradient directions may lead to a loss of subtle changes in angular characteristics of diffusion MRI data, which translates then to quantitative measures retrieved from a fitted model.
Although several angular diffusion MRI data augmentation techniques have been proposed, and the MICCAI MUDI 2019 Challenge was even organized for diffusion-relaxation MRI signal prediction, the evaluation is mainly limited to numerical measures such as the aforementioned PSNR or SSIM indices. In other words, no matter how powerful the algorithm is, the authors' main interest remains in quantitative parameters reflecting the method's accuracy.
However, recent studies have suggested that decreasing the number of gradients leads to clinical information loss, and it becomes impossible to detect differences in various types of medical conditions. A reported key factor influencing the values of diffusion/DTI descriptors is the number of diffusion gradient orientations, which impacts the results of their statistical comparison between clinical groups.
The presented proposal is not a conventional challenge involving training and validation/testing sets that asks the participants for the best solution in terms of a numerical parameter showing the robustness of the methods. Instead, we share two datasets:
The task aims to estimate three DTI-based parameters, namely FA, MD and AD, from the MP dataset acquired with 21 diffusion gradient directions at b=1000 s/mm2. Before that, however, the participants angularly augment the diffusion MRI data from 21 to 61 gradient directions to provide the most faithful representation of the signal and, consequently, of the quantitative parameters, including FA, MD and AD. The participants can evaluate their algorithm using the HC dataset in terms of any measure they think reflects the algorithm's power, for instance the above-mentioned NRMSE or SSIM.
The participants submit three volumes (FA, MD, AD) for each of the 100 MP subjects. However, the evaluation procedure on the organizers' side is carried out in terms of a statistical test of whether significant differences between CM and EM patients can be detected. In other words, instead of evaluating the methods using the PSNR or SSIM, our goal is to assess the clinical impact of the algorithms. With this challenge, we want to find out whether it is possible to show the same or a comparable level of significant differences between CM and EM from diffusion MRI data acquired in a reduced scenario (i.e. 21 gradient directions) as one could find with fully-sampled data obtained with 61 gradient directions.
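For self-evaluation on the HC dataset, image-similarity measures of this kind can be computed in a few lines. The sketch below is not part of the official evaluation; it assumes the voxel values inside the brain mask have been flattened into plain lists, that NRMSE is normalized by the range of the reference (other normalizations exist), and that PSNR uses the maximum reference value as the peak. The sample values are made up.

```python
import math


def mean_squared_error(reference, estimate):
    """Mean squared difference between two equally long value lists."""
    return sum((r - e) ** 2 for r, e in zip(reference, estimate)) / len(reference)


def nrmse(reference, estimate):
    """RMSE divided by the range (max - min) of the reference values."""
    value_range = max(reference) - min(reference)
    return math.sqrt(mean_squared_error(reference, estimate)) / value_range


def psnr(reference, estimate):
    """PSNR in dB, taking the maximum reference value as the peak."""
    mse = mean_squared_error(reference, estimate)
    if mse == 0:
        return math.inf  # identical inputs
    return 10.0 * math.log10(max(reference) ** 2 / mse)


# Made-up FA values from a fully-sampled map and an augmented one.
reference = [0.2, 0.4, 0.6, 0.8]
estimate = [0.25, 0.35, 0.65, 0.75]
print(round(nrmse(reference, estimate), 4))
print(round(psnr(reference, estimate), 2))
```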
The local Ethics Committee of Hospital Clínico Universitario de Valladolid (Valladolid, Spain) approved the study regarding the MRI acquisitions (PI: 14-197). All participants read and signed a written consent form prior to their participation.
External data is not allowed to be used in the training process.
Source: All the data were acquired on a Philips Achieva 3 T MRI unit (Philips Healthcare, Best, The Netherlands) equipped with a 32-channel head coil (Laboratorio de Técnicas Instrumentales, Universidad de Valladolid) with patients from Hospital Clínico Universitario (Valladolid, Spain).
Acquisition protocol: Diffusion-weighted imaging MRI data were acquired under a unified protocol with the parameters defined as follows: b=1000 s/mm2, 61 non-collinear diffusion-sensitizing gradient directions, one baseline acquisition at b=0, voxel size of 2x2x2 mm3, matrix size 128x128, 66 axial slices that cover the whole brain, flip angle 90°, repetition time (TR) 9000 ms, and echo time (TE) 86 ms.
Two data sets will be released:
All datasets were anonymized.
How to subsample the 61 gradient directions:
For each subject, a folder with the following files is provided:
Each subject will be stored in a different folder with the following names:
Healthy controls were recruited by convenience sampling and snowball sampling. Controls with a history of migraine, other headache disorders other than infrequent tension-type headache (fewer than one attack per month), or a history of other neurological or psychiatric disorders were excluded. Healthy controls were aged between 18 and 65 years. Additionally, a questionnaire was provided to the controls to assess whether they suffered from headaches with migraine features.
Migraine patients were recruited by a neurologist specialized in headache disorders at their first visit. Due to migraine, these patients had been referred to the Headache Unit at the Hospital Clínico Universitario de Valladolid (Valladolid, Spain). Patients were included after a definite diagnosis of episodic migraine or chronic migraine according to the third edition of the International Classification of Headache Disorders (ICHD-3) [7, 8], version ICHD-3 beta for the first patients, and version ICHD-3 for the last patients (the ICHD-3 was updated during the recruitment period). No methods or tests other than the anamnesis were applied to diagnose migraine, considering that there are no actual migraine biomarkers and that the current diagnosis of migraine is based exclusively on clinical symptoms.
In order to avoid any influence of the preprocessing steps on the results of the challenge, all the data volumes underwent the same preprocessing pipeline, i.e. denoising using the MP-PCA approach, correction of eddy currents and motion artifacts, and correction of B1 field inhomogeneity, all done with the MRtrix3 software. Specifically:
MRtrix, release 3.0 (version available in November 2017), was used for diffusion preprocessing and brain mask extraction. It is worth noting that some commands shown in this section have changed for the latest version of MRtrix. The following steps were carried out:
The commands employed for these steps were:
dwidenoise dwi_original.nii.gz dwi_den.nii.gz
dwipreproc -fslgrad dwi_original.bvec dwi.bval -export_grad_fsl dwi.bvec dwi_updated.bval -rpe_none PA dwi_den.nii.gz dwi_corr.nii.gz
dwibiascorrect dwi_corr.nii.gz dwi.nii.gz -fsl -fslgrad dwi.bvec dwi_updated.bval
The brain mask was extracted with the following command:
dwi2mask dwi.nii.gz dwi_mask.nii.gz -fslgrad dwi.bvec dwi_updated.bval
For the analysis carried out in this challenge, only three DTI-derived metrics are considered:
These metrics were selected for being the ones detecting significant differences in the preliminary clinical study with migraine patients.
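For reference, the three scalars follow directly from the eigenvalues of the diffusion tensor. The sketch below uses the standard textbook definitions (AD as the largest eigenvalue, MD as the mean of the three, FA as the normalized dispersion of the eigenvalues); the function name is hypothetical, and this is not the code used by the organizers.

```python
import math


def dti_scalars(eigenvalues):
    """Return (FA, MD, AD) from the three diffusion tensor eigenvalues."""
    l1, l2, l3 = sorted(eigenvalues, reverse=True)
    md = (l1 + l2 + l3) / 3.0   # mean diffusivity
    ad = l1                     # axial diffusivity: principal eigenvalue
    num = (l1 - md) ** 2 + (l2 - md) ** 2 + (l3 - md) ** 2
    den = l1 ** 2 + l2 ** 2 + l3 ** 2
    # FA is 0 for isotropic diffusion and 1 for diffusion along a single axis.
    fa = math.sqrt(1.5 * num / den) if den > 0 else 0.0
    return fa, md, ad


# Illustrative eigenvalues in mm2/s (made-up, white-matter-like values).
fa, md, ad = dti_scalars([1.7e-3, 0.3e-3, 0.2e-3])
print(fa, md, ad)
```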
For training, the participants are asked to obtain the three DTI-based parameters using the following FSL command (inside the folder of each subject):
dtifit -k dwi.nii.gz -o dti_files/dwi -m dwi_mask.nii.gz -r dwi.bvec -b dwi.bval
With this command, a set of compressed NIfTI files is stored in the folder specified by the user. Following the notation employed in this example, the files that would be assessed in relation to this challenge are dwi_FA.nii.gz, dwi_L1.nii.gz (AD or first eigenvalue), and dwi_MD.nii.gz, all of them stored in subject_path/dti_files/.
Note that we will also use this command to calculate the metrics for the 21- and 61-gradient volumes of the MP dataset. These values will serve as references in the clinical studies, so we recommend using the same command to calculate the final metrics.
For the evaluation of the results, all datasets will be non-linearly registered to the common template (i.e. the FMRIB58_FA template, a high-resolution FA image averaged over 58 subjects) using the FSL FNIRT tool in Montreal Neurological Institute (MNI) space.
The FNIRT tool uses a b-spline representation of the registration warp field. After the registration, a mean FA image was generated and thinned to create a mean FA skeleton of white matter tracts using a FA value of 0.2 as a threshold to distinguish white from gray matter. Then, each subject’s aligned FA images were projected onto the mean FA skeleton. Similarly, the same process was repeated for MD and AD using the protocol devoted to non-FA images. The Johns Hopkins University ICBM-DTI-81 White-Matter Labels Atlas provided in the FSL toolbox was used to identify the white matter tracts. We executed group-wise comparisons of CM vs EM.
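To give an intuition of what a group-wise comparison at a single skeleton point involves, the sketch below runs a simple two-sample permutation test on the difference of group means. It is an illustrative stand-in only: the exact test and settings used by the organizers are not specified here, the function name is hypothetical, and the FA values are made up.

```python
import random


def permutation_p_value(group_a, group_b, n_perm=2000, seed=0):
    """Two-sided permutation test on the absolute difference of means."""
    rng = random.Random(seed)
    n_a = len(group_a)
    observed = abs(sum(group_a) / n_a - sum(group_b) / len(group_b))
    pooled = list(group_a) + list(group_b)
    hits = 0
    for _ in range(n_perm):
        # Randomly reassign the group labels and recompute the statistic.
        rng.shuffle(pooled)
        perm_a, perm_b = pooled[:n_a], pooled[n_a:]
        diff = abs(sum(perm_a) / len(perm_a) - sum(perm_b) / len(perm_b))
        if diff >= observed:
            hits += 1
    # The +1 terms count the observed labelling as one of the permutations.
    return (hits + 1) / (n_perm + 1)


# Made-up FA values at one skeleton point for two groups of six subjects.
group_cm = [0.50, 0.52, 0.51, 0.53, 0.49, 0.50]
group_em = [0.40, 0.42, 0.41, 0.39, 0.43, 0.40]
p_value = permutation_p_value(group_cm, group_em)
print(p_value)
```

In a whole-skeleton analysis the same test is repeated at every point and corrected for multiple comparisons, which this sketch does not attempt.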
DTI-BASED MEASURES ESTIMATION FROM REDUCED ACQUISITIONS
Participants are expected to estimate three DTI-based parameters (FA, MD and AD) from the migraine dataset acquired with 21 diffusion gradient directions at b=1000 s/mm2, but with a quality similar to the parameters estimated from 61 gradient directions. To that end:
The data submitted by each group must meet the following requirements:
A method to submit the data will be provided in the following months.
The purpose of the challenge is to validate methods in a real clinical study. So, the results of the challenge will be evaluated using the aggregation of two metrics, based on tools typically used for clinical studies with DTI data:
1.- Skeleton Metric: The quality of the results is measured based on a statistical test carried out with FSL TBSS:
FP - false positives of the submission, defined as those points with p < 0.05 in the participant’s study and p > 0.05 in the original study,
FN - false negatives of the submission, defined as those points with p > 0.05 in the participant’s study and p < 0.05 in the original study,
NP - number of points where the evaluation takes place.
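The three counts defined above can be obtained by comparing two p-value maps point by point. The sketch below assumes both maps have been flattened to lists over the same evaluation points; the function name and sample values are hypothetical. Points with p exactly equal to 0.05 are treated as non-significant, which is an assumption since the definitions above leave that case open. How the counts are combined into the final score is not reproduced here.

```python
def count_errors(p_submission, p_original, alpha=0.05):
    """Return (FP, FN, NP) for two p-value maps over the same points."""
    fp = 0
    fn = 0
    for p_sub, p_ref in zip(p_submission, p_original):
        significant_sub = p_sub < alpha
        significant_ref = p_ref < alpha
        if significant_sub and not significant_ref:
            fp += 1  # detected by the submission only
        elif significant_ref and not significant_sub:
            fn += 1  # detected by the original study only
    return fp, fn, len(p_original)


# Made-up p-values at four evaluation points.
p_submission = [0.01, 0.20, 0.03, 0.60]
p_original = [0.02, 0.01, 0.30, 0.70]
fp, fn, n_points = count_errors(p_submission, p_original)
print(fp, fn, n_points)
```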
Since three metrics will be obtained (one for each measure: FA, MD and AD), the final skeleton metric will be the average of the three.
2.- Region-of-interest metric: the second metric is based on a region-oriented analysis.
Since three metrics will be obtained (one for each measure: FA, MD and AD), the final region-of-interest metric will be the average of the three.
The purpose of the challenge is to measure how well the DL-reconstructed images mimic the original ones in a clinical study. The best method is the one for which the study with the reconstructed data shows statistical differences in exactly the same areas of the brain as the study with the original data.
To that end, the metrics proposed are based on the following considerations:
On the other hand, it is vital to measure the false positive ratio. It will give an idea of the reliability of the method.
We decided to use two different metrics, one skeleton oriented and one ROI oriented, in order to replicate the usual studies typically carried out with diffusion MRI data in clinical environments.
Both metrics will be used to compute the ranking following the formula:
This metric gives a number between 0 and 1. The higher the value of the general metric, the better the solution. In the case of two groups obtaining the same general metric, the following procedure will be followed:
Laboratorio de Procesado de Imagen
Campus Miguel Delibes
Universidad de Valladolid
47011, Valladolid, Spain