Malignant tumors of the head and neck (HN) region include a diverse group of cancers in the oral cavity, nasopharynx, oropharynx, hypopharynx, larynx, and paranasal sinuses; although salivary and thyroid carcinomas are also located within the HN region, they are typically considered separate tumors (1). HN tumors are heterogeneous and arise within complex anatomy ranging from the oral cavity to the hypopharynx (2, 3). Accurate detection and delineation of tumor extent is critical to optimize treatment planning; patients therefore routinely undergo noninvasive imaging for careful assessment of this complex anatomy by an experienced neuroradiologist (4). Noninvasive magnetic resonance imaging (MRI) has served an important role as a diagnostic test for initial staging and follow-up of tumors in the HN region (5–8).
The quantitative MRI (qMRI) technique, diffusion-weighted imaging (DWI), assesses the Brownian motion of water molecules at a cellular level (9). Apparent diffusion coefficient (ADC), derived by fitting DWI data to a monoexponential model using ≥2 b-values (ie, diffusion-weighting factor), reflects tumor cellularity (10, 11). Repeatability of ADC has been tested in both phantoms and solid tumors (12–15). In previous studies, ADC has exhibited promise as a quantitative imaging biomarker (QIB) of treatment response in HN cancer (16–20). The use of ADC is helpful in differentiation between malignant and benign solitary thyroid nodules and assessing tumor aggressiveness in papillary thyroid cancer (PTC) (21, 22).
Recent literature reflects interest in acquisition of DWI data using multiple b-values, which allows the separate measurement of water diffusion at higher b-values (>200 s/mm2) and vascular perfusion fraction at lower b-values without contrast agent injection (23, 24). Le Bihan et al. developed a biexponential model using multiple b-value DWI data and termed it “intra-voxel incoherent motion” (IVIM) (25, 26), which has shown utility for the assessment of treatment response in various cancers, including HN cancer (27, 28). Test–retest studies of IVIM-DWI metrics in normal liver and metastases tend to show better repeatability for the true diffusion coefficient (D), whereas the perfusion fraction (f) and pseudo-diffusion coefficient (D*) remain exploratory in nature (23, 29).
Underlying biological structures can cause water diffusion to deviate from the Gaussian distribution assumed in IVIM, making it non-Gaussian (NG) in nature (30). This NG behavior has been incorporated in the non-monoexponential diffusion kurtosis imaging (DKI) model, which provides the kurtosis coefficient (K) metric, a surrogate QIB of tissue microstructure, in addition to the diffusion coefficient (31–33). Lu et al. incorporated NG diffusion into the IVIM-DWI model (NG IVIM-DWI) and provided estimates for all the aforementioned quantitative imaging metrics (f, D, D*, and K) (34).
QIBs are being used in oncology clinical trials to monitor the effects of treatments, to identify subjects likely to benefit from treatment, and to serve as trial endpoints. As compared with other modalities and endpoints, QIBs have the advantage of being noninvasive and requiring little or no subjective interpretation. Furthermore, for disease conditions with multiple treatment options, early detection of nonresponders enables physicians to counsel patients about other treatment options earlier, to potentially improve outcomes and limit adverse effects of ineffective treatments.
Before QIBs can be used in clinical trials, their technical performance must be assessed, similarly to how sensitivity and specificity must be established for diagnostic tests (35). Technical performance includes precision, bias, and the property of linearity. Perhaps the most important QIB performance metric is precision, that is, the ability to provide the same, or nearly the same, measurement value on repeated observations (36). Once precision and performance metrics are established, they may be used to formulate a clinical trial's eligibility criteria, to determine the cut-point for defining true change over time, and to compute the sample size required for the trial (37).
There is currently a paucity of repeatability literature for DWI measurements in the clinical setting, particularly for HN cancers and PTC. Hence, it is critical to perform test–retest studies as the fundamental building blocks for QIB discovery and clinical application of these more advanced quantitative imaging methods. The objective of this study was to establish the repeatability measures of quantitative Gaussian and NG diffusion metrics using data from phantoms and from patients with HN cancers and PTC.
Materials and Methods
Quantitative DWI Phantom
The quantitative diffusion phantom (High Precision Devices, Inc, Boulder, CO) developed by National Institute of Standards and Technology (NIST)/Radiological Society of North America (RSNA)-Quantitative Imaging Biomarker Alliance (QIBA) consists of 13 vials filled with varying concentrations of polyvinylpyrrolidone (PVP) in aqueous solution (38). The phantom was specifically designed for quantitatively mapping isotropic Gaussian diffusion of water molecules and generating physiologically relevant ADC values. The distribution of PVP concentrations in the phantom is as follows: 0% (vials 1–3), 10% (vials 4–5), 20% (vials 6–7), 30% (vials 8–9), 40% (vials 10–11), and 50% (vials 12–13). The space between the vials within the phantom was filled with an ice-water bath at 0°C to eliminate thermal variability across scanner locations and timepoints in ADC measurements. In this study, we will focus on the measurements obtained from 2 vials, that is, (1) water-only and (2) PVP-20%, as they relate to data from the novel isotropic diffusion kurtosis imaging (iDKI) phantom. Details of the NIST/QIBA DWI phantom have been published previously (38, 39).
The newly developed iDKI phantom used in this study was designed and fabricated by coauthors at the University of Michigan (40). The phantom captures a range of in vivo kurtosis values (Kapp range, 0.4–1.7) (31). Here we report data from 2 vials in the iDKI phantom: 1 vial containing ceteryl alcohol and behentrimonium (CA-BTAC), a vesicular suspension formed by a water solution of 2% CA-BTAC with other (minor) stabilizing ingredients (vial #2 [V2]), and a negative control consisting of a 20% solution of PVP in water (vial #4 [V4]), similar to the vial in the NIST/QIBA DWI phantom (41). The iDKI phantom has been detailed in the poster presented at the NCI/Quantitative Imaging Network (QIN) meeting (40), and its full repeatability and long-term stability study is summarized in a research paper by Malyarenko D et al. submitted to this issue of Tomography.
The above 2 phantoms were studied to assess the technical performance of the quantitative imaging metrics among the 3 participating sites. There was a need to compare the vials with similar chemical composition for both the standard NIST/QIBA DWI and novel iDKI phantoms to emphasize the differences between the quantitative imaging metrics values for both diffusion and kurtosis coefficients.
The institutional review board of Site 1 (Memorial Sloan Kettering Cancer Center [MSKCC]) approved this prospective study of patients with head and neck squamous cell carcinoma (HNSCC) and PTC, which was compliant with the Health Insurance Portability and Accountability Act. We obtained written informed consent from all eligible patients. A total of 14 patients were enrolled in the study between December 2016 and August 2017. In total, 30 MRI examinations were performed for these 14 patients, which comprised 60 test–retest MRI data sets. Nine patients with HNSCC were enrolled. All had metastatic nodes (M/F: 7/2, mean age: 59 years, range = 55–68 years) and underwent standard chemoradiation therapy (dose, 70 Gy). For patients with HNSCC, MRI examinations were performed before initiation of the standard chemoradiation treatment (pre-TX) and during treatment (intra-TX weeks 1 and 2). One patient with pre-TX MRI did not participate in MRI examinations during treatment. Five patients with PTC who underwent surgery (M/F: 4/4, mean age: 47 years, range = 37–61 years) were studied. All patient characteristics are summarized in Table 1.
|Patient||Age (years)||Gender||Primary Cancer|
DWI Data Acquisition
Quantitative DWI Phantom.
Diffusion studies were performed using the NIST/QIBA DWI phantom at 0°C on 1.5T and 3T scanners using a 16-channel head coil at all 3 sites (Site 1 [MSKCC], Site 2 [Columbia University Irving Cancer Center; CUMC] and Site 3 [University of Michigan; UMich]). Localizer images were acquired for accurate positioning of the phantom. DWI images were acquired using a single-shot echo planar imaging sequence with 4 b-values (ie, b = 0, 500, 900, 2000 s/mm2) and the following parameters: repetition time (TR) = 15 000 milliseconds, echo time (TE) = minimum (109–110 milliseconds), number of averages (NA) = 1, acquisition matrix = 128 × 128, field of view (FOV) = 220 mm, number of slices (NS) = 36, slice thickness = 4 mm, all 3 orthogonal directions at both 1.5T and 3.0T scanners. The total acquisition time for the multiple b-value DWI data acquisition was ∼2 minutes 30 seconds.
The iDKI phantom, designed and fabricated by Site 3 (UMich), was imaged by all 3 sites at different field strengths of 1.5T and/or 3T MRI scanners using a 16-channel head coil at ambient temperature. Localizer images were acquired for accurate positioning of the phantom. DWI images were acquired using a single-shot spin-echo echo planar imaging (SS-SE-EPI) sequence with 11 b-values (ie, b = 0, 50, 100, 200, 500, 800, 1000, 1500, 2000, 2500, 3000 s/mm2) and parameters on both 1.5T and 3T scanners were kept similar as follows: TR = 10 000 milliseconds, TE = minimum (93–107 milliseconds), NA = 1, matrix = 128 × 128, FOV = 220 mm, NS = 5, slice thickness = 5 mm, all 3 orthogonal directions. The total acquisition time for the multiple b-value DWI data acquisition was ∼5 minutes 20 seconds.
Four repeatability experiments were performed for the NIST/QIBA DWI phantom and 2 test–retest experiments were performed for the iDKI phantom, with physical repositioning of the phantom after each diffusion acquisition.
MRI examinations were performed at Site 1 for patients with HNSCC on a Philips 3T MRI scanner (Ingenia, Philips Healthcare, The Netherlands) with a neurovascular phased-array coil (maximum number of channels: 20). Standard T1W and T2W imaging was followed by a multiple b-value DWI sequence (28). The DWI data were acquired using a SS-SE-EPI sequence with 10 b-values (ie, b = 0, 20, 50, 80, 200, 300, 500, 800, 1500, 2000 s/mm2) with TR = 4000 milliseconds, TE = 80 (minimum) milliseconds, NA = 2, matrix = 128 × 128, FOV = 200–240 mm, NS = 8–10, and slice thickness = 5 mm. For patients with HNSCC, DWI was acquired with full field of view as part of the standard clinical imaging protocol. The total acquisition time for the multiple b-value DWI data acquisition was ∼5 min. Two multi b-value DWI data sets were acquired at the same MR examination for each patient with HNSCC to test for the repeatability of the derived quantitative imaging metrics. Eighteen multiple b-value DWI data sets were acquired at pre-TX (week 0). In addition, 32 multiple b-value DWI data sets were acquired at intra-TX week 1 and week 2 (during chemoradiation therapy). A total of 50 multiple b-value DWI examinations (pre-TX [9 patients], intra-TX week 1 [8 patients], and intra-TX week 2 [8 patients]) were performed (2 MR examinations at each session). As a note, these DWI data sets were acquired with full FOV (phase FOV factor = 1.0).
MRI examinations were performed at Site 1 for patients with PTC (n = 5) on a 1.5T (n = 2) or 3T (n = 3) scanner (General Electric, Milwaukee, WI) with a neurovascular phased-array coil; the examinations consisted of standard T1W and T2W imaging scans followed by multiple b-value DWI data acquisition. This was a feasibility test for the MRI of patients with PTC, which was performed as part of an ongoing research imaging protocol. Data were acquired with a reduced field of view (rFOV) DWI technique, using a 2-dimensional spatially selective excitation (42). The acquisition parameters of rFOV DWI scans with the SS-SE-EPI sequence were as follows: 10 b-values (ie, b = 0, 20, 50, 80, 200, 300, 500, 800, 1500, 2000 s/mm2), TR = 4000 milliseconds, TE = 80 (minimum) milliseconds, NA = 2, matrix = 128 × 64, FOV = 200–240 mm, NS = 8–10, slice thickness = 5 mm, and phase FOV factor = 0.5. The total time for rFOV DWI data acquisition was ∼5 min.
Repeatability measures were tested on the multiple b-value DWI data sets obtained from patients with HNSCC at pre-TX, and during intra-TX weeks 1 and 2 of standard chemoradiation therapy. Pretreatment DWI repeatability data were obtained for patients with PTC who underwent surgery.
DWI Data Analysis
All DWI data postprocessing and quantitative metrics map generation, detailed below, were performed using in-house–developed software entitled MRI-QAMPER (MRI Quantitative Analysis of Multi-Parametric Evaluation Routines). The MRI-QAMPER package includes the algorithm routines for DWI data analyses (ADC, diffusion kurtosis, IVIM, and NG-IVIM), implemented in MATLAB (The MathWorks, Natick, MA). The MRI-QAMPER tool is approved by National Cancer Institute/Quantitative Imaging Network (QIN) with pre-benchmark status, which facilitates its use by other QIN site colleagues for analysis of multiple b-value DWI data.
For NIST/QIBA DWI phantom data analysis, 3 distinct circular regions of interest (ROIs) were manually placed (9 mm in diameter) on the selected vials, with water only and PVP-20%, in ADC maps avoiding boundaries; the mean pixel value across the ROIs in each vial was used to measure repeatability.
For iDKI phantom data analysis, 2 distinct circular ROIs (12 mm in diameter, single-plane) were placed on vials with CA-BTAC solution and PVP-20% in the phantom images; the mean pixel value across the ROIs in each vial was used for the test–retest study. To guarantee model convergence, a bmax constraint value for fitting the kurtosis expression in the CA-BTAC phantom vial was set to 1500 s/mm2 (bmax × Dapp × Kapp <3) (43).
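The convergence constraint quoted above (bmax × Dapp × Kapp < 3) can be read as an upper bound on the usable b-values: under the usual DKI signal expression, the fitted curve stops decreasing at b = 3/(Dapp × Kapp). The following is a minimal sketch of that check, not the MRI-QAMPER routine. The function names are hypothetical, and the Dapp and Kapp values in the usage lines are illustrative assumptions rather than measurements from this study; only the b-value list is taken from the iDKI protocol.

```python
def max_b_value(d_app, k_app):
    """Upper bound on b (s/mm2) implied by b * Dapp * Kapp < 3.

    d_app is in mm2/s; k_app is dimensionless.
    """
    if d_app <= 0 or k_app <= 0:
        raise ValueError("Dapp and Kapp must be positive for the bound to apply")
    return 3.0 / (d_app * k_app)


def usable_b_values(b_values, d_app, k_app):
    """Keep only the b-values that strictly satisfy the constraint."""
    limit = max_b_value(d_app, k_app)
    return [b for b in b_values if b < limit]


# b-values of the iDKI phantom protocol (s/mm2).
b_protocol = [0, 50, 100, 200, 500, 800, 1000, 1500, 2000, 2500, 3000]

# Hypothetical illustration values, NOT measurements from this study.
kept = usable_b_values(b_protocol, d_app=1.2e-3, k_app=1.5)
print(kept)
```

With these assumed values the bound falls between 1500 and 2000 s/mm2, so every b-value above 1500 s/mm2 would be excluded from the kurtosis fit.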
For DWI patient data, ROIs were manually delineated on the DWI images (b = 0 s/mm2) on the metastatic neck node in HNSCC, normal thyroid gland, and PTC. ROIs were placed on thyroid glands avoiding obvious cystic, hemorrhagic, or calcified portions, whereas for normal thyroid tissue, ROIs were placed on the selected contralateral side to the PTC. ROIs were contoured by an experienced neuroradiologist based on the clinical information and T1W/T2W images using ImageJ (44).
Multiple b-value DWI data sets were analyzed using the following models:
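Mono-exponential (ADC): All b-value DWI signal intensity data obtained from each voxel in the ROI were fitted to a monoexponential model to calculate ADC (mm2/s) as follows (45):
$$S(b) = S_0 \, \exp(-b \cdot \mathrm{ADC}) \qquad (1)$$
where S(b) is the signal intensity at a given b-value and S0 is the signal intensity at b = 0 s/mm2.
DKI: The signal intensity versus b-value DWI data were fitted to the non-monoexponential diffusion kurtosis imaging (DKI) model of the following form (43):
$$S(b) = S_0 \, \exp\left(-b \, D_{app} + \tfrac{1}{6} \, b^{2} D_{app}^{2} K_{app}\right) \qquad (2)$$
NG-IVIM: The DWI data were also fitted to the extended NG-IVIM model, which combines the IVIM perfusion compartment with the kurtosis term (34):
$$S(b) = S_0 \left[ f \, \exp(-b \, D^{*}) + (1 - f) \, \exp\left(-b \, D + \tfrac{1}{6} \, b^{2} D^{2} K\right) \right] \qquad (3)$$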
The NIST/QIBA DWI phantom was analyzed using the monoexponential diffusion model [equation (1)], the iDKI phantom using the DKI model [equation (2)], and HNSCC (tumor) and PTC (tumor and normal) using the DKI model [equation (2)] and the extended NG-IVIM model [equation (3)]. Mean metric values of ADC, DKI-derived metrics (Dapp and Kapp), and NG-IVIM-derived metrics (D, D*, f, and K) calculated from each ROI were compared between repeated measurements.
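As an illustration of how the monoexponential and DKI metrics can be estimated from multiple b-value data, the sketch below fits the logarithm of the signal by ordinary least squares. It assumes the standard forms S(b) = S0·exp(−b·ADC) and S(b) = S0·exp(−b·Dapp + b²·Dapp²·Kapp/6). The fitting algorithm used in MRI-QAMPER (a MATLAB package) is not described in the text, so the log-linear approach and all function names here are assumptions made for illustration only.

```python
import math


def fit_adc(b_values, signals):
    """Log-linear least-squares fit of S(b) = S0 * exp(-b * ADC).

    Returns ADC in mm2/s when b is given in s/mm2.
    """
    y = [math.log(s) for s in signals]
    n = len(b_values)
    mean_b = sum(b_values) / n
    mean_y = sum(y) / n
    sxy = sum((b - mean_b) * (v - mean_y) for b, v in zip(b_values, y))
    sxx = sum((b - mean_b) ** 2 for b in b_values)
    return -sxy / sxx


def solve_3x3(a, rhs):
    """Gaussian elimination with partial pivoting for a 3x3 linear system."""
    m = [row[:] + [r] for row, r in zip(a, rhs)]
    for col in range(3):
        pivot = max(range(col, 3), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, 3):
            factor = m[r][col] / m[col][col]
            for c in range(col, 4):
                m[r][c] -= factor * m[col][c]
    x = [0.0, 0.0, 0.0]
    for r in (2, 1, 0):
        x[r] = (m[r][3] - sum(m[r][c] * x[c] for c in range(r + 1, 3))) / m[r][r]
    return x


def fit_dki(b_values, signals):
    """Quadratic fit of ln S = ln S0 - b*Dapp + (b^2 * Dapp^2 * Kapp) / 6.

    b is rescaled to ms/um2 (b / 1000) to keep the normal equations
    well conditioned. Returns (Dapp in mm2/s, Kapp).
    """
    x = [b / 1000.0 for b in b_values]
    y = [math.log(s) for s in signals]
    # Normal equations for y = c0 + c1*x + c2*x^2.
    p = [sum(v ** k for v in x) for k in range(5)]
    a = [[p[0], p[1], p[2]], [p[1], p[2], p[3]], [p[2], p[3], p[4]]]
    rhs = [sum(yv * xv ** k for xv, yv in zip(x, y)) for k in range(3)]
    _, c1, c2 = solve_3x3(a, rhs)
    d_scaled = -c1                      # um2/ms, numerically 1e3 * mm2/s
    k_app = 6.0 * c2 / d_scaled ** 2
    return d_scaled * 1e-3, k_app


# Synthetic, noise-free signals with assumed (hypothetical) parameters.
b = [0, 200, 500, 800, 1500]
s_mono = [500.0 * math.exp(-bi * 1.5e-3) for bi in b]
s_dki = [1000.0 * math.exp(-bi * 1.0e-3 + (bi * 1.0e-3) ** 2 * 0.8 / 6.0) for bi in b]
adc = fit_adc(b, s_mono)
d_app, k_app = fit_dki(b, s_dki)
```

A log-linear fit is simple and has a closed-form solution, but it weights low-signal (high b-value) points heavily in the presence of noise; a nonlinear least-squares fit on the raw signal is a common alternative.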
Technical precision of QIBs was evaluated based on the framework proposed by the RSNA/QIBA (https://www.rsna.org/uploadedFiles/RSNA/Content/Science_and_Education/QIBA/QIBA_Process_05Jan2015.pdf). The within-subject coefficient of variation (wCV, %) was used as the measure of precision; it was estimated from the phantom and clinical data as follows (22, 47–49):
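For readers who want to reproduce a wCV from test–retest pairs, the sketch below uses one commonly used estimator: the root mean square, across subjects, of the within-subject standard deviation divided by the within-subject mean. This is an assumption about the form of the estimator and may differ in detail from the expression used in this study; the function name and the example values are hypothetical.

```python
import math


def within_subject_cv(pairs):
    """wCV (%) from test-retest pairs [(y1, y2), ...].

    For each subject, the within-subject variance of two measurements is
    (y1 - y2)^2 / 2; it is normalized by the squared subject mean, averaged
    over subjects, and the square root is expressed as a percentage.
    """
    if not pairs:
        raise ValueError("need at least one test-retest pair")
    ratios = []
    for y1, y2 in pairs:
        mean = (y1 + y2) / 2.0
        variance = (y1 - y2) ** 2 / 2.0
        ratios.append(variance / mean ** 2)
    return 100.0 * math.sqrt(sum(ratios) / len(ratios))


# Hypothetical test-retest ADC values (x 10^-3 mm2/s), not study data.
example_pairs = [(1.10, 1.14), (0.95, 0.93), (1.30, 1.25)]
wcv_percent = within_subject_cv(example_pairs)
```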
Statistical analysis for the data was conducted in R (50) and MATLAB (The MathWorks, Inc., Natick, MA).
Quantitative DWI Phantom
Mean ADC values obtained from the NIST/QIBA DWI phantom (scanned at 0°C) at all 3 sites on 1.5T and 3T MRI scanners are displayed in a box-and-whisker plot (Figure 1). ADC values are reported for 2 vials only (water-only and PVP-20%). The mean wCV (%) for the water-only vial was ≤1.07% and ≤0.84%, and that for the PVP-20% vial was ≤0.71% and ≤3.19%, at 1.5T and 3T MRI, respectively, across the 3 sites. Results of ADC wCV and 95% CIs are summarized in Table 2.
Figure 2A shows the representative plot of the DWI logarithmic signal intensity versus b-value, fitted by both monoexponential and DKI models obtained from the iDKI phantom ROI for the vials with CA-BTAC (V2) and PVP-20% (V4). The box-and-whisker plots show the mean values of Dapp × 10−3 mm2/s (Figure 2B) and Kapp (no unit) (Figure 2C) obtained from V2 (captures both in vivo tumor cellularity and tissue microstructure) and V4 (captures in vivo tumor cellularity but negative control for kurtosis).
The wCV (%) mean values of Dapp and Kapp for V2 were ≤1.41% and ≤0.43% on both 1.5T and 3T MRI. The wCV (%) mean values of Dapp and Kapp for V4 were ≤1.01% and ≤25.06% respectively, on both 1.5T and 3T MRI. Table 3 summarizes the Dapp and Kapp mean wCV and 95% CIs values for vials with CA-BTAC and PVP-20%. The absolute Kapp < 0.05 value observed for ROI in vial with PVP-20% samples indicates minor bias of the NG model for this monoexponential material.
The pre-TX tumor volumes (mean ± SD) in patients with HNSCC and PTC were 9.13 ± 6.22 cm3 and 0.35 ± 0.39 cm3, respectively.
Figure 3, A–D shows a representative DWI (b = 0 s/mm2) image, ADC × 10−3 mm2/s, D × 10−3 mm2/s, and K metric maps for a patient with HNSCC. Figure 3E depicts a representative logarithmic DWI signal as a function of the b-value obtained from the metastatic node of the HNSCC patient. The DWI signal was fitted to the monoexponential and NG IVIM model. Figure 3, F–H also displays the box-and-whisker plots for pre-TX test–retest mean values of the same quantitative imaging metrics detailed above.
The wCV (%) mean values of Dapp and Kapp at Pre-TX were 5.62% and 5.18%, respectively. Table 4 summarizes the mean wCV (%) and 95% CIs for Dapp and Kapp at pre-TX and intra-TX weeks in patients with HNSCC.
The mean wCV (%) values for pre-TX ADC, D, D*, K, and f were 2.38%, 3.55%, 3.88%, 8.0%, and 9.92%, respectively. Table 5 summarizes mean wCV (%) and 95% CIs for ADC- and NG-IVIM-derived metrics (D, D*, K, and f) at pre-TX and intra-TX weeks in patients with HNSCC.
Bland–Altman plots are shown for selective quantitative imaging metrics, ADC, D, and K, obtained from the pre-TX neck nodal metastases of patients with HNSCC (Figure 4). In each panel, the differences in mean values of ADC, D, and K were plotted between the repeated MRI examinations against the combined mean values of ADC, D, and K.
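The quantities behind a Bland–Altman plot can be computed as in the following sketch: per-subject means and differences of the repeated measurements, the mean difference (bias), and 95% limits of agreement. The limits of agreement are the conventional Bland–Altman summary and are included here as an assumption, since the text describes only the plotted differences and means. The function name and example values are hypothetical.

```python
import math


def bland_altman(test, retest):
    """Return (means, differences, bias, (lower_loa, upper_loa)).

    The limits of agreement are bias +/- 1.96 * SD of the differences.
    """
    if len(test) != len(retest) or len(test) < 2:
        raise ValueError("need two equally sized series with at least 2 subjects")
    means = [(a + b) / 2.0 for a, b in zip(test, retest)]
    diffs = [a - b for a, b in zip(test, retest)]
    bias = sum(diffs) / len(diffs)
    sd = math.sqrt(sum((d - bias) ** 2 for d in diffs) / (len(diffs) - 1))
    return means, diffs, bias, (bias - 1.96 * sd, bias + 1.96 * sd)


# Hypothetical test and retest ADC values (x 10^-3 mm2/s), not study data.
scan_1 = [1.0, 2.0, 3.0]
scan_2 = [1.1, 1.9, 3.0]
means, diffs, bias, limits = bland_altman(scan_1, scan_2)
```

Plotting `diffs` against `means`, with horizontal lines at `bias` and at the two limits, gives the usual Bland–Altman display.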
The results from patients with PTC are part of ongoing feasibility testing in the research setting for thyroid MRI imaging using rFOV multiple b-value DWI. Figure 5, A–D displays a representative DWI (b = 0 s/mm2) image, ADC × 10−3 mm2/s, D × 10−3 mm2/s, and K metric maps for a patient with PTC. Figure 5E shows a representative logarithmic DWI signal as a function of the b-value obtained from the normal thyroid tissue and tumor of the patient with PTC.
The wCV (%) mean values of Dapp and Kapp for normal tissue were 12.87% and 17.46%, respectively, whereas these metric values in tumor tissue were 22.42% and 25.94% in patients with PTC. Table 6 summarizes mean Dapp and Kapp wCV (%) and 95% CIs for normal and tumor region in patients with PTC.
|Region||Statistic||Dapp||Kapp|
|Normal||Mean||(2.51 ± 0.32) × 10−3 mm2/s||1.08 ± 0.19|
|Normal||wCV (%) (95% CI)||12.87 (7.71, 37.00)||17.46 (10.46, 50.19)|
|Tumor||Mean||(2.52 ± 0.57) × 10−3 mm2/s||1.14 ± 0.29|
|Tumor||wCV (%) (95% CI)||22.42 (13.43, 64.46)||25.94 (15.54, 74.57)|
The mean ADC wCV (%) was 11.86% and 10.04% for tumor and normal thyroid tissue ROIs, respectively. The wCV (%) values for NG IVIM-derived metrics (D, D*, K, and f) from tumors were 14.98%, 4.31%, 11.09%, and 13.31%, respectively. Preliminary mean values for ADC, D, D*, K, and f are summarized in Table 7 for normal and tumor tissue in patients with PTC.
Bland–Altman plots are shown for ADC, D, and K, obtained from normal and tumor regions in the PTC patients (Figure 6).
In this preliminary study, we measured the repeatability of the quantitative diffusion imaging metrics for Gaussian and NG models using 2 phantoms (the temperature-controlled NIST/QIBA DWI phantom and a novel iDKI phantom at ambient temperature) in a multisite setting, as well as for a small cohort of patients with HNSCC and PTC using the DKI model and the extended NG IVIM model.
For the NIST/QIBA DWI phantom, repeatability of mean ADC wCV (%) and 95% CIs values was excellent for the studied phantom vials with water-only and PVP-20% (≤3.19% and ≤4.0% respectively), for all 3 sites. The results reported herein are comparable to results from similar test–retest repeatability studies (51, 52). The Dapp and Kapp wCVs (%) and 95% CIs from all 3 sites were comparable at both 1.5T and 3T MRI. The novel iDKI phantom has been designed and fabricated with the purpose to better understand the performance of the quantitative diffusion metric kurtosis (K) as a surrogate of tissue microstructure and the stability of K over time. Performing appropriate phantom testing is a prerequisite for the QIB pipeline for clinical trials that use quantitative NG diffusion imaging metrics (37). Our phantom results confirmed adequate baseline technical performance of the MRI scanner systems and multiple b-value DWI protocols used for the quantitative DWI studies for patients with HNSCC and PTC.
There is currently a paucity of repeatability measures for quantitative Gaussian and NG DWI in cancers of the HN region, despite the availability of ADC test–retest data for organs such as brain (wCV = 3.97%), liver (wCV = 9.38%), and prostate (wCV = 16.97%) (13–15, 53). Only a few studies have reported test–retest data for IVIM in organs such as liver (23).
The preliminary findings for test–retest data in HNSCC showed that the mean wCV (%) for ADC-derived metric, DKI-derived metric (Dapp), and NG IVIM-derived metrics (D and D*) were ≤6% for pre-TX, intra-TX weeks 1 and 2. For f, Kapp, and K, the mean wCV(%) were ≤10%. Both ADC and D, quantitative imaging metrics, are surrogate biomarkers of tumor cellularity, while f and D* are still exploratory in nature (23, 29, 54). There is keen interest in furthering description of tissue microstructure using the quantitative imaging metric K (31, 34, 55, 56). The uncertainties from clinical HNSCC data slightly exceeded baseline repeatability achieved for phantoms due to additional patient-related variability.
Our clinical repeatability measurements for normal thyroid tissue and PTC are preliminary findings. Lu et al. reported that the ADC mean wCV (%) for the normal thyroid tissue in healthy volunteers is ≤10% using rFOV DWI at 3T (42). The present study found consistent results for the normal thyroid tissue (ADC mean wCV ≤10%) acquired with rFOV DWI at 1.5T and 3T MRI (42). Kim et al. reported that mean ADC values obtained at 2 different MRI field strengths (1.5T and 3T) were not significantly different (19). A relatively high wCV was observed for DKI- and NG IVIM-derived metrics, which is likely related to the limited sample size and the biology of the tumors in the thyroid gland.
Establishing the technical performance of a QIB allows us to better understand a patient's measurement at a single time point, and especially the changes in measurements over time, by constructing a CI for the true value or the true change. For example, suppose we measure an ADC of 1.22 × 10−3 mm2/s for PTC. If we know from our technical performance studies that the measurements are made with negligible bias and a precision of wCV = 11.86%, then a 95% CI for the patient's true ADC value is [1.22 ± 1.96 × (0.1186 × 1.22)] × 10−3 mm2/s, or (0.94 to 1.50) × 10−3 mm2/s. The CI helps differentiate a true change in the parameter value from measurement uncertainty. Now suppose that on a second visit, the patient's tumor has an ADC of 1.31 × 10−3 mm2/s. Has the ADC value truly increased, or is the observed change attributable to measurement error? The 95% CI for the true change is [(1.31 − 1.22) ± 1.96 × √((0.1186 × 1.22)² + (0.1186 × 1.31)²)] × 10−3 mm2/s, or [−0.32, 0.50] × 10−3 mm2/s. Because this interval includes zero, we cannot conclude with 95% confidence that a true change has occurred, given the known imprecision in the ADC measurements.
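The worked example above can be sketched in a few lines. The code assumes negligible bias, a wCV expressed as a fraction, and that the measurement standard deviation at each visit is wCV times the measured value, with the two visits treated as independent. The form of the change interval is an assumption based on the usual QIB framework, and the function names are hypothetical.

```python
import math

Z_95 = 1.96  # two-sided 95% normal quantile


def ci_true_value(measured, wcv):
    """95% CI for the true value of a single measurement (wcv as a fraction)."""
    half_width = Z_95 * wcv * measured
    return measured - half_width, measured + half_width


def ci_true_change(first, second, wcv):
    """95% CI for the true change between two independent measurements."""
    half_width = Z_95 * math.sqrt((wcv * first) ** 2 + (wcv * second) ** 2)
    change = second - first
    return change - half_width, change + half_width


# ADC values in units of 10^-3 mm2/s, taken from the example in the text.
low, high = ci_true_value(1.22, 0.1186)
change_low, change_high = ci_true_change(1.22, 1.31, 0.1186)
print(f"true value: {low:.2f} to {high:.2f}")
print("change interval includes zero:", change_low < 0.0 < change_high)
```

Under these assumptions the single-visit interval reproduces the 0.94 to 1.50 range quoted in the text, and the change interval straddles zero, which is why no true change can be claimed.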
Once the technical performance of a QIB is known, investigators are better able to design their clinical trials effectively. For example, a measured change in a patient's quantitative imaging metrics (eg, D or K) must exceed 10% (ie, 2.77 × wCV, where 2.77 = 1.96 × √2) to be 95% confident that a true change has occurred (37). Thus, in a drug trial using changes in D or K (QIBs) as a measure of therapeutic effect, a ≥10% cut-point should be used to define treatment success and to determine when a change in treatment is warranted. Similarly, in planning a clinical trial where D or K values will be compared across treatment arms, the imprecision in the QIB values increases the required sample size, by an amount that depends on its magnitude and on the magnitude of the between-subject variability.
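The cut-point rule can be expressed as a small helper. The sketch below assumes the percent change is computed relative to the baseline measurement and uses the pre-TX wCV of 3.55% reported for D in HNSCC; the function names and the baseline and follow-up values are hypothetical.

```python
def percent_repeatability_coefficient(wcv_percent):
    """Percent change threshold for a true change: 2.77 * wCV (2.77 = 1.96 * sqrt(2))."""
    return 2.77 * wcv_percent


def is_true_change(baseline, follow_up, wcv_percent):
    """True when the observed percent change exceeds the threshold."""
    percent_change = 100.0 * abs(follow_up - baseline) / baseline
    return percent_change > percent_repeatability_coefficient(wcv_percent)


# wCV of D at pre-TX in HNSCC, as reported in the Results.
threshold = percent_repeatability_coefficient(3.55)

# Hypothetical D values (x 10^-3 mm2/s), not study data.
small_change = is_true_change(1.00, 1.05, 3.55)   # 5% observed change
large_change = is_true_change(1.00, 1.12, 3.55)   # 12% observed change
```

With a wCV of 3.55% the threshold is about 9.8%, which is the origin of the roughly 10% cut-point discussed above.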
There are a few known limitations to this study. This is the first feasibility test–retest study of Gaussian- and NG diffusion-derived metrics from multisite phantom and single-site clinical data testing. A larger cohort of patients (>30) is necessary to confirm statistical significance of the preliminary findings (57). Susceptibility artifacts caused by SS-SE-EPI, as well as voluntary and involuntary bulk motion, are still an issue in the HN region and limit repeatability. rFOV DWI, which excites only a limited FOV and not the surrounding regions that potentially cause interference, may therefore offer an incremental improvement (42). For a test–retest data set, the patients should ideally be scanned, removed from the scanner for a few minutes, and scanned again, which is referred to as a “coffee break” study. Here, the patients were repositioned on the table between scans but not removed from the scanner, owing to practical reasons relating to patient comfort and workflow at the MR scanner. The results reported here provide insights into what is needed and must be paid attention to in test–retest studies in clinical oncology trials. For example, test–retest studies of ADC in brain tumors derived from monoexponential modeling of DWI data report a wCV of 3.97% (53, 58, 59). A smaller wCV value (<5%) indicates less variation in repeatability measurements.
In conclusion, we have shown repeatability of measurements for quantitative Gaussian and NG diffusion imaging metrics using multiple b-value acquisitions for NIST/QIBA DWI phantom and iDKI phantom, across multisite MRI systems, and used in HNSCC and PTC clinical trials. The preliminary results for the repeatability measurement of NG IVIM-derived metrics in HNSCC and PTC show promise and need additional validation with a larger subject cohort. In short, the precision of QIBs must be established for oncology clinical trials to noninvasively monitor the effects of treatment, to identify subjects likely to benefit from treatment and define trial endpoints.