Neoadjuvant chemotherapy (NAC) plays an important role in the treatment of locally advanced breast cancer (1, 2). NAC enables down-sizing of the tumor to allow for breast conserving surgery and shows equivalent disease-free and overall survival when compared with adjuvant therapy (2). Breast magnetic resonance imaging (MRI) during NAC facilitates the evaluation of disease extent in the breast and the monitoring of tumor response to systemic therapy (3–8). Functional tumor volume (FTV) is a quantitative measure of tumor burden derived from dynamic contrast-enhanced (DCE) breast MRI. The I-SPY 1 (ACRIN 6657) TRIAL showed strong associations between FTV and pathological complete response (pCR) and recurrence-free survival after NAC (6, 7). The ability of FTV to predict pCR has the potential to advance personalized medicine by informing treatment redirection or deescalation based on patient response during NAC.
The I-SPY 2 TRIAL (Investigation of Serial Studies to Predict Your Therapeutic Response through Imaging and Molecular Analysis 2) is an ongoing phase II trial with adaptive randomization to multiple therapy arms. The aim of I-SPY 2 is to assess the efficacy of experimental agents alone or in combination with standard NAC in patients with breast cancer who are at a high risk for early recurrence (9–11). Patients have 4 MRI visits during neoadjuvant treatment, and FTV derived from each MRI examination is used to adjust patient randomization and estimate predictive probabilities for pCR. This multicenter clinical trial includes a total of 25 participating sites and >1600 patients randomized to treatment over 9 years. While there is a standardized MRI protocol distributed to all participating sites with instruction to use the same scan parameters for all sequential MRI visits for a single patient, adherence to protocol-specified scan parameters can be logistically and technically challenging given different magnetic resonance scanner vendors, models, configurations, and sequences across all participating sites and all sequential visits.
This study investigated the impact of MRI protocol adherence on the ability of FTV to predict pCR in the I-SPY 2 TRIAL.
This retrospective study is based on the review of MRI data from 990 patients with breast cancer randomized to 1 of 9 experimental drug arms completed by November 2016 or to a control arm (standard of care) in the I-SPY 2 TRIAL. We conducted this study in compliance with the Health Insurance Portability and Accountability Act. All participating sites received approval from their institutional review board, and all patients provided written informed consent to participate in the I-SPY 2 TRIAL.
Women aged 18 years or older who were diagnosed with locally advanced breast cancer (tumor size ≥ 2.5 cm) without distant metastasis were eligible to enroll in I-SPY 2. Eligible patients were classified according to breast cancer subtype, defined by hormone receptor (HR; estrogen/progesterone) status, human epidermal growth factor receptor 2 (HER2) status, and MammaPrint 70-gene signature (MammaPrint, Agendia, Amsterdam, The Netherlands). Patients with HR+/HER2− tumors that were assessed as low risk by the MammaPrint 70-gene signature were screened out of the trial. As neoadjuvant therapy, participants received 12 cycles of weekly paclitaxel alone or in combination with 1 of the 9 experimental agents, followed by 4 cycles of anthracycline–cyclophosphamide prior to surgery. Patients with HER2+ cancer also received trastuzumab. Each patient had 4 MRI examination visits: pretreatment (T0), early-treatment (3 weeks after treatment initiation, T1), inter-regimen (T2), and presurgery (T3). Figure 1 shows the study schema.
MRI Acquisition Protocol
Each site performed MRI examinations including DCE series by using a bilateral 3-dimensional, T1-weighted sequence with fat suppression on a 3.0 T or 1.5 T MRI scanner with a dedicated breast coil. The standardized acquisition protocol in the I-SPY 2 TRIAL, distributed to sites prospectively before study initiation, was as follows: repetition time of 4–10 milliseconds with minimum echo time; flip angle = 10–20°; field of view (FOV) = 26–36 cm; acquired frequency or read matrix = 384–512; acquired phase encoding matrix ≥ 256; in-plane resolution ≤ 1.4 × 1.4 mm; slice thickness ≤ 2.5 mm; temporal resolution = 80–100 seconds; axial orientation; and prone position. The standardized contrast injection rate was 2 cc/s with a 20-cc saline flush. DCE MRI was performed once before and multiple times after contrast injection using identical sequences, with scanning continuing for at least 8 minutes after contrast injection. Early and delayed postcontrast phases were selected from the postcontrast series at the time of analysis based on temporal sampling of the center of k-space closest to 2 minutes 30 seconds for the early phase and 7 minutes 30 seconds for the late phase. All sites submitted test cases showing their ability to comply with the standardized imaging protocol before study accrual began.
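The phase selection rule described above (choose the postcontrast phase whose k-space-center time lies closest to 2 minutes 30 seconds and to 7 minutes 30 seconds) can be sketched as follows. This is a minimal illustration in Python, not the Imaging Core Lab software; the function name `select_phase` and the example timings are hypothetical.

```python
# Sketch of early/late phase selection by proximity to a target time.
# Assumption: each postcontrast phase is summarized by the time (seconds after
# contrast injection) at which the center of k-space was sampled.

EARLY_TARGET_S = 150.0  # 2 minutes 30 seconds
LATE_TARGET_S = 450.0   # 7 minutes 30 seconds


def select_phase(phase_times_s, target_s):
    """Return the index of the phase whose k-space-center time is closest to target_s."""
    if not phase_times_s:
        raise ValueError("no postcontrast phases available")
    return min(range(len(phase_times_s)),
               key=lambda i: abs(phase_times_s[i] - target_s))


# Hypothetical timings for six postcontrast phases acquired 90 seconds apart.
times = [60.0, 150.0, 240.0, 330.0, 420.0, 510.0]
early_idx = select_phase(times, EARLY_TARGET_S)
late_idx = select_phase(times, LATE_TARGET_S)
```

If two phases are equally close to the target, this sketch returns the earlier one; the source does not state how such ties were handled.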
Protocol Adherence of Image Quality Factors
We defined image quality factors to investigate the impact of protocol adherence on the prediction of pCR. Image quality factors were categorized as confirmable or not confirmable based on whether adherence could be confirmed by metadata stored in the DICOM header.
DICOM confirmable factors (DC factors) included acquisition duration, early phase timing, FOV, and spatial resolution (in-plane resolution and slice thickness). Each DC factor was defined as adherent if it fulfilled the standardized acquisition protocol described above and remained within the defined ranges for change from baseline: acquisition duration ≤3-second change; early phase timing ≤5-second change; FOV ≤50-mm change; in-plane resolution ≤10% change; slice thickness ≤10% change (Table 1). The allowable ranges for change were not included in the standardized acquisition protocol, and thus these were defined based on the observation of the data distribution and clinical perspectives in this study.
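The change-from-baseline tolerances listed above can be expressed as a simple per-factor check. The sketch below is illustrative only: the dictionary keys are hypothetical field names (not DICOM tag names), and it covers only the change-from-baseline criteria, not the absolute protocol ranges, which would need to be checked separately.

```python
# Sketch of the change-from-baseline adherence check for DC factors.
# Assumption: each examination is summarized as a dict of values already
# extracted from the DICOM header; the key names are hypothetical.

ABSOLUTE_TOLERANCE = {
    "acquisition_duration_s": 3.0,  # <= 3-second change
    "early_phase_time_s": 5.0,      # <= 5-second change
    "fov_mm": 50.0,                 # <= 50-mm change
}
RELATIVE_TOLERANCE = {
    "in_plane_res_mm": 0.10,        # <= 10% change
    "slice_thickness_mm": 0.10,     # <= 10% change
}


def change_adherence(baseline, followup):
    """Return {factor: True/False} for change from baseline within tolerance."""
    flags = {}
    for key, tol in ABSOLUTE_TOLERANCE.items():
        flags[key] = abs(followup[key] - baseline[key]) <= tol
    for key, tol in RELATIVE_TOLERANCE.items():
        flags[key] = abs(followup[key] - baseline[key]) <= tol * baseline[key]
    return flags


# Hypothetical baseline (T0) and follow-up (T1) examinations.
t0 = {"acquisition_duration_s": 90.0, "early_phase_time_s": 150.0,
      "fov_mm": 320.0, "in_plane_res_mm": 1.0, "slice_thickness_mm": 2.0}
t1 = {"acquisition_duration_s": 92.0, "early_phase_time_s": 158.0,
      "fov_mm": 340.0, "in_plane_res_mm": 1.0, "slice_thickness_mm": 2.5}
flags = change_adherence(t0, t1)
```

In this example the follow-up examination is within tolerance for acquisition duration, FOV, and in-plane resolution, but not for early phase timing (8-second change) or slice thickness (25% change).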
Not DICOM-confirmable factors (nDC factors) included contralateral image quality (signal homogeneity, adequacy of fat suppression, presence of signal flare near breast coil or other artifacts), patient motion during scanning, and contrast administration errors (off-protocol saline flush volume or injection rate). The contralateral image quality was visually assessed and ranked by consensus of 3 trained readers in the I-SPY 2 Imaging Core Lab as part of the measurement of background parenchymal enhancement in a separate study (12). Readers reviewed 3 axial slices selected using an automated computer algorithm corresponding to the first, third, and fifth of the central 5 slices of the whole breast. Image quality of each examination was ranked using a 3-point scoring system (2, good; 1, adequate; 0, poor). The contralateral image quality was defined as adherent if the score was 2 or 1 and non-adherent if the score was 0. Patient motion was noted during the FTV calculation process, and cases with minimal or no motion were defined as adherent. Contrast administration errors were documented on case report forms completed by sites when the examinations were submitted to the Core Lab for FTV analysis. Examinations without reported errors were defined as adherent.
Functional Tumor Volume (FTV)
In this study, we focused on FTV at T0, T1, and T2 and disregarded T3, because prediction of pCR at T3, the presurgery time point after completion of NAC, does not influence treatment redirection during NAC. Using in-house software developed with IDL (Exelis Visual Information Solutions, Boulder, CO), FTV was calculated by summing all voxels with early percent enhancement (∼2.5 minutes after contrast injection) above 70% and signal enhancement ratio above zero within a manually delineated 3-dimensional region of interest, as described in the literature (13). We defined FTV at T0, T1, and T2 as FTV0, FTV1, and FTV2, respectively. Percent change of FTV from baseline (T0, pretreatment) to T1 and T2 was defined as %ΔFTV0_1 and %ΔFTV0_2, respectively. FTV changes (%ΔFTV0_1 and %ΔFTV0_2) were then stratified into adherent or non-adherent subsets. For an FTV change to be stratified as adherent at a given time point, both examinations had to be adherent. For example, for %ΔFTV0_1 to be adherent, FTV0 and FTV1 both had to be adherent. This stratification was done for each image quality factor (DICOM-confirmable and not DICOM-confirmable) at each treatment time point. FTV changes (%ΔFTV0_1 and %ΔFTV0_2) that were adherent for all image quality factors were stratified as adherent in “combined factors.”
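The FTV definition and the adherence stratification can be sketched in a few lines of Python. This is not the authors' IDL software. The formulas used for percent enhancement, PE = 100 × (S1 − S0)/S0, and signal enhancement ratio, SER = (S1 − S0)/(S2 − S0), where S0, S1, and S2 are the precontrast, early, and late signal intensities, are assumptions based on common usage and are not stated in this text. All function names and example values are hypothetical.

```python
# Sketch of FTV, percent change from baseline, and adherence stratification.
# Assumption: images are given as flat, voxel-aligned lists of signal intensity.

def functional_tumor_volume(pre, early, late, in_roi, voxel_volume_cc,
                            pe_threshold=70.0):
    """Sum the volume of ROI voxels with PE above threshold and SER above zero."""
    count = 0
    for s0, s1, s2, inside in zip(pre, early, late, in_roi):
        if not inside or s0 <= 0 or s2 == s0:
            continue  # outside the ROI, or PE/SER undefined for this voxel
        pe = 100.0 * (s1 - s0) / s0    # assumed percent enhancement formula
        ser = (s1 - s0) / (s2 - s0)    # assumed signal enhancement ratio formula
        if pe > pe_threshold and ser > 0:
            count += 1
    return count * voxel_volume_cc


def percent_change(ftv_baseline, ftv_followup):
    """Percent change of FTV from baseline, e.g. %ΔFTV0_1."""
    if ftv_baseline == 0:
        raise ValueError("baseline FTV is zero; percent change is undefined")
    return 100.0 * (ftv_followup - ftv_baseline) / ftv_baseline


def pair_adherent(baseline_adherent, followup_adherent):
    """An FTV change is adherent only if both examinations are adherent."""
    return baseline_adherent and followup_adherent


def combined_adherent(flags_by_factor):
    """Combined factors: adherent only if adherent for every image quality factor."""
    return all(flags_by_factor.values())


# Hypothetical five-voxel example with a voxel volume of 0.002 cc.
pre = [100, 100, 100, 100, 100]
early = [200, 150, 180, 190, 190]
late = [220, 160, 190, 90, 200]
in_roi = [True, True, True, True, False]
ftv = functional_tumor_volume(pre, early, late, in_roi, voxel_volume_cc=0.002)
```

In the example, two voxels meet both criteria: the second voxel fails the 70% enhancement threshold, the fourth has a negative SER under the assumed formula, and the fifth lies outside the region of interest.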
In the I-SPY 2 TRIAL, pCR was defined as the absence of residual invasive disease in the breast and axillary lymph nodes on surgical specimen after NAC.
Statistical analyses were performed using R Version 3.5.1 (R Foundation for Statistical Computing, Vienna, Austria) along with the R packages “pROC” (14) and “boot” (15, 16). P values <.05 were considered statistically significant.
The area under the receiver operating characteristic (ROC) curve (AUC) was used to evaluate the performance of FTV change for predicting pCR. For each image quality factor, the AUC was estimated independently for adherent and nonadherent FTV change subsets at early treatment (T1) and inter-regimen (T2) time points. In addition, associated 95% confidence intervals (CIs) for the AUCs were calculated based on 2000 bootstrap replications. For the statistical comparison of the AUCs between the adherent and nonadherent subsets for each image quality factor and each time point, the 95% CI and P value for the difference between 2 AUCs (AUC of the adherent subset minus AUC of the nonadherent subset) were estimated based on 2000 bootstrap replications.
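To make the analysis concrete, the sketch below computes an AUC and percentile bootstrap confidence intervals in plain Python. It is not the authors' R code (which used the pROC and boot packages), and several choices are assumptions: the AUC is computed with the Mann-Whitney formulation, the score is the negated FTV change so that larger shrinkage ranks higher, resampling is unstratified, and the interval is a simple percentile interval. The bootstrap P value is omitted because its construction is not described here. All names and data are hypothetical.

```python
import random


def auc(scores, labels):
    """Mann-Whitney AUC: chance that a pCR case outscores a non-pCR case (ties count 0.5)."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one pCR and one non-pCR case")
    wins = sum(1.0 if p > n else 0.5 if p == n else 0.0 for p in pos for n in neg)
    return wins / (len(pos) * len(neg))


def _resampled_auc(scores, labels, rng):
    """AUC of one bootstrap resample; redraw if a class is missing."""
    n = len(scores)
    while True:
        idx = [rng.randrange(n) for _ in range(n)]
        ys = [labels[i] for i in idx]
        if 0 < sum(ys) < n:
            return auc([scores[i] for i in idx], ys)


def _percentile_ci(values, alpha=0.05):
    """Simple percentile interval from a list of bootstrap statistics."""
    ordered = sorted(values)
    lo = ordered[int((alpha / 2) * len(ordered))]
    hi = ordered[int((1 - alpha / 2) * len(ordered)) - 1]
    return lo, hi


def bootstrap_auc_ci(scores, labels, n_boot=2000, seed=0):
    """95% percentile bootstrap CI for a single AUC."""
    auc(scores, labels)  # fails early if a class is missing
    rng = random.Random(seed)
    return _percentile_ci([_resampled_auc(scores, labels, rng) for _ in range(n_boot)])


def bootstrap_auc_difference_ci(adherent, nonadherent, n_boot=2000, seed=0):
    """95% CI for AUC(adherent) minus AUC(nonadherent); each argument is (scores, labels)."""
    auc(*adherent)
    auc(*nonadherent)
    rng = random.Random(seed)
    diffs = [_resampled_auc(*adherent, rng) - _resampled_auc(*nonadherent, rng)
             for _ in range(n_boot)]
    return _percentile_ci(diffs)


# Hypothetical data: percent FTV change and pCR status (1 = pCR) for two subsets.
adh_change = [-90, -80, -75, -40, -30, -10, -60, -20]
adh_pcr = [1, 1, 0, 1, 0, 0, 0, 1]
non_change = [-85, -50, -45, -15, -70, -5]
non_pcr = [1, 0, 1, 0, 0, 0]

adh_scores = [-c for c in adh_change]  # negate so larger shrinkage scores higher
non_scores = [-c for c in non_change]
adh_auc = auc(adh_scores, adh_pcr)
adh_ci = bootstrap_auc_ci(adh_scores, adh_pcr)
diff_ci = bootstrap_auc_difference_ci((adh_scores, adh_pcr), (non_scores, non_pcr))
```

The two subsets are resampled independently because they contain different FTV changes; an interval for the difference that includes zero corresponds to a nonsignificant difference at the 5% level.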
Patient characteristics including age, menopausal status, race, molecular subtype, assigned chemotherapy, and treatment response for the 990 patients are listed in Table 2. The mean age was 48.8 years (range, 23–77 years), excluding 1 patient with no record available. The whole cohort included 464 patients (47%) with premenopausal status, 33 patients (3%) with perimenopausal status, 291 patients (29%) with postmenopausal status, and 202 patients (20%) with no record available. The cohort comprised 784 (79%) white, 121 (12%) African American, 68 (7%) Asian, 5 (1%) Native Hawaiian or Pacific Islander, and 4 (0.4%) American Indian or Alaska Native patients, 7 (1%) patients of mixed race, and 1 (0.1%) patient with no record available. Of the 990 breast cancers observed in this study, 380 (38%) were HR+ HER2− subtype, 156 (16%) were HR+ HER2+ subtype, 89 (9%) were HR− HER2+ subtype, 363 (37%) were HR− HER2− subtype, and 2 (0.2%) had no record available. Standard NAC was assigned to 211 patients (21%), experimental drug arms were assigned to 777 (78%), and no record was available for 2 (0.2%). In total, 324 (33%) of the 990 patients achieved pCR.
Of a possible total of 2970 MRI exams (990 patients × 3 time points: T0, T1, and T2), 113 exams were not completed or were rejected by the I-SPY 2 TRIAL owing to errors in case report forms (n = 1), patient withdrawal of treatment consent (n = 43), patient illness (n = 13), MRI scanner or contrast injection issues (n = 4), early surgery after early discontinuation of NAC (n = 3), or missed patient appointments for unknown reasons (n = 49). In addition, 18 exams were excluded by the I-SPY 2 Imaging Core Lab owing to severe imaging issues such as use of different scanners between visits, scanning in the sagittal or coronal orientation, repositioning after contrast injection, and corrupted data. In total, 2839 MRI examinations (T0, 989; T1, 952; T2, 898) were analyzed, and FTV and percent changes of FTV were calculated: %ΔFTV0_1, 952; %ΔFTV0_2, 898. Figure 2 shows the study flow chart.
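The examination counts can be checked with simple arithmetic. The sketch below assumes the possible total is 990 patients × 3 time points (T0 to T2), which matches the 2970 examinations cited in the discussion; the variable names are illustrative.

```python
# Consistency check of the examination accounting reported in the text.
possible_exams = 990 * 3  # assumed: 990 patients x 3 time points (T0, T1, T2)

not_completed_or_rejected = {
    "case report form errors": 1,
    "withdrawal of treatment consent": 43,
    "patient illness": 13,
    "scanner or contrast injection issues": 4,
    "early surgery": 3,
    "missed appointments": 49,
}
excluded_by_core_lab = 18

analyzed = (possible_exams
            - sum(not_completed_or_rejected.values())
            - excluded_by_core_lab)

analyzed_per_visit = {"T0": 989, "T1": 952, "T2": 898}
```

Under this assumption, the listed reasons sum to 113, and the remaining 2839 examinations equal the sum of the per-visit counts.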
Predictive Performance of pCR and Image Quality Factors
In the whole data set, FTV change (%ΔFTV0_1 and %ΔFTV0_2) led to estimated AUC values of 0.68 and 0.69, respectively (Figure 3, A and B). For each image quality factor, the AUC and associated 95% CI for adherent and non-adherent subsets of FTV change at T1 and T2 are shown with the number of pCR or non-pCR patients in Figure 3A and Figure 3B, respectively. Table 3 shows the 95% CI of the AUC difference between adherent and non-adherent subsets and the associated P value based on the bootstrap test.
| Image Quality Factor | T1: Difference of AUCs (Estimate) | T1: 95% CI (LL, UL) | T1: P Value | T2: Difference of AUCs (Estimate) | T2: 95% CI (LL, UL) | T2: P Value |
|---|---|---|---|---|---|---|
| Acquisition Duration | 0.00 | (−0.09, 0.10) | .966 | 0.05 | (−0.05, 0.15) | .347 |
| Early Phase Timing | 0.04 | (−0.05, 0.15) | .407 | 0.04 | (−0.05, 0.13) | .444 |
| FOV | −0.01 | (−0.12, 0.11) | .904 | 0.06 | (−0.05, 0.17) | .251 |
| Resolution | 0.03 | (−0.06, 0.11) | .505 | 0.01 | (−0.06, 0.10) | .725 |
| Contralateral Image Quality | 0.06 | (−0.02, 0.14) | .146 | 0.04 | (−0.04, 0.11) | .389 |
| Patient Motion | 0.08 | (−0.12, 0.25) | .462 | −0.18 | (−0.30, −0.01) | .008* |
| Contrast Administration | −0.15 | (−0.31, 0.10) | .135 | −0.10 | (−0.32, 0.18) | .493 |
| Combined Factors | 0.06 | (−0.02, 0.13) | .149 | 0.04 | (−0.04, 0.11) | .295 |
Abbreviations: AUC, area under the curve; CI, confidence interval; FOV, field of view; FTV, functional tumor volume; LL, lower limit; UL, upper limit.
Difference of AUCs for FTV change between adherent and nonadherent subsets was calculated as AUC of adherent subset minus AUC of nonadherent subset.
* P < .05.
DICOM-Confirmable Factors (DC Factors).
FTV changes with adherent image quality factors tended to have higher estimated AUC values than those with non-adherent image quality factors, although the difference between the AUCs did not reach statistical significance (Figure 3 and Table 3): AUC values for adherent vs. non-adherent subsets at T1 (acquisition duration, 0.68 vs 0.67; early phase timing, 0.68 vs 0.64; spatial resolution, 0.68 vs 0.65) and at T2 (acquisition duration, 0.70 vs 0.66; early phase timing, 0.70 vs 0.66; FOV, 0.70 vs 0.64; spatial resolution, 0.70 vs 0.68).
Not DICOM-Confirmable Factors (nDC Factors).
In terms of contralateral image quality, FTV changes with adherent image quality factor had higher estimated AUC values than those with non-adherent image quality factor, although the difference between the AUCs did not reach statistical significance (Figure 3 and Table 3): AUC values for adherent vs non-adherent subsets were 0.69 vs 0.63 at T1 and 0.70 vs 0.67 at T2. In terms of other nDC factors (patient motion and contrast administration), the non-adherent subsets had a relatively small sample size (T1 patient motion, n = 34; contrast administration, n = 21; T2 patient motion n = 29; contrast administration, n = 27) and AUC with a wide range of 95% CI (Table 3). In these 2 factors, the majority of the FTV changes with adherent image quality factors had lower estimated AUC values than those with non-adherent image quality factors, although the differences between the AUCs did not reach statistical significance except for patient motion at T2 (Figure 3 and Table 3): AUC values for adherent vs non-adherent subsets were 0.67 vs 0.83 for contrast administration at T1, 0.69 vs 0.87 for patient motion at T2, and 0.69 vs 0.79 for contrast administration at T2.
FTV changes that were adherent for all image quality factors (“combined factors”) showed higher AUC values than those that were non-adherent (T1, 0.71 vs 0.66; T2, 0.72 vs 0.68), although the difference between the AUCs did not reach statistical significance (Figure 3 and Table 3). Figure 4 shows the ROC curves of adherent or non-adherent FTV change subsets for combined factors.
In this study, we showed that MRI protocol adherence has an impact on the performance of FTV as a predictor of neoadjuvant treatment response. Importantly, these effects were found in the set of MRI examinations already meeting acceptance criteria for I-SPY2. The current MRI acceptance rate in I-SPY2 is >95%. These study results suggest that stricter protocol adherence requirements have the potential to increase predictive performance.
Neoadjuvant chemotherapy (NAC) enables the monitoring of tumor response in vivo, in addition to the downsizing of tumors before surgery. Because pCR is strongly associated with better outcomes, especially in more aggressive subtypes (17, 18), the ability to reliably predict non-pCR at the early-treatment (T1) and inter-regimen (T2) time points could impact treatment decision-making and allow for personalized redirection to more effective therapy. In addition, accurate early prediction of pCR could facilitate deescalation in patients who are responding well to treatment.
Breast MRI has shown greater accuracy than clinical examination or conventional imaging in predicting residual disease after NAC (19–25). A previous analysis of patients with breast cancer undergoing NAC also showed that FTV derived from DCE MRI was a stronger predictor of pCR after NAC than clinical assessment (6). Thus, MRI would be a reliable and clinically relevant modality to monitor breast cancer tumor response. Our study suggests that MRI protocol adherence is important to improve the prediction of pCR at the early-treatment (T1) and inter-regimen (T2) time points, when treatment redirection is possible. These analyses are continuing and will be used to refine quantitative imaging requirements in the ongoing I-SPY 2 TRIAL.
In multicenter trials involving MR quantitative imaging, there are many challenges to standardizing imaging sequences for different scanner manufacturers, field strengths, and coil configurations. In I-SPY 2, patients have 4 MRI examinations over the course of NAC, and using similar scan parameters for a single patient's visits over a 6-month period is logistically and technically challenging. Although a standardized MR imaging protocol with specific image acquisition parameter ranges was prospectively distributed to all participating sites, differences in clinical workflow and institutional infrastructure made protocol adherence challenging across sites. Also, while similar scan parameters for each patient's 4 MRI visits are crucial for accurate FTV calculation and assessment of FTV change from baseline, it was not always feasible to obtain adherence to baseline values across all 4 MRI visits. Our study showed that FTV change with adherent image quality in all factors showed higher AUC value than that with non-adherent image quality, although the difference did not reach statistical significance. Conversely, the nonsignificant difference may suggest that our FTV calculation method is somewhat robust in the context of minor protocol non-adherence. Future versions of the MRI protocol and standard operating procedure documents will emphasize the importance of protocol adherence for accurate calculation of FTV, and real-time remediation for sites with non-adherent data will be implemented.
DICOM-confirmable factors included DCE parameters related to scan timing and voxel size. FTV calculation is based on predefined criteria for signal enhancement, which is dynamic and sensitive to timing in DCE MRI (6). Thus, it is reasonable that FTV change had a higher estimated AUC in the adherent subset than in the non-adherent subset in relation to acquisition duration and early phase timing, although the differences did not reach statistical significance. FTV calculation is also dependent on voxel size because it is calculated by multiplying the voxel size by the number of voxels fulfilling the criteria for signal enhancement. Therefore, voxel size–related parameters including FOV and spatial resolution may affect FTV calculation and its predictive performance.
Although DICOM-confirmable factors were relatively easy to verify, the not DICOM-confirmable factors were much more challenging to assess. We analyzed contralateral image quality, which was originally assessed in the process of analyzing contralateral background parenchymal enhancement in a separate study (12), as one of the not DICOM-confirmable factors. An automated method was used to select and review 3 representative slices, but it was still a time-consuming process to review and score all 2970 examinations. Thus, we hypothesized that the contralateral image quality might reflect the overall imaging quality of MRI. Although a scoring method reviewing just 3 slices of the contralateral breast is one limitation of this study, our results show that contralateral image quality impacts the performance of FTV change for predicting pCR. This result suggests the importance of verifying quality factors such as magnetic shimming, breast position, fat suppression, and artifacts in both breasts at the time of scanning (26). It may also indicate the technical challenges of FTV calculation for ptotic or large breasts, which are difficult to optimally position in the coil and prone to poor fat suppression or signal flare near the breast coil.
The other not DICOM-confirmable factors, patient motion and contrast administration, had counterintuitive estimated results, with lower AUC for the adherent subset than for the nonadherent subset. We can think of 2 possible explanations for these results. First, owing to the challenges in assessing and documenting these image quality factors retrospectively during FTV calculation, the adherent subsets might have included unknown or undetected patient motion or contrast administration errors. Identification of patient motion was subjective, and the presence of motion was not recorded accurately in real time. To the best of our knowledge, there is also no reliable method to quantify and correct for motion on breast MRI, particularly in the neoadjuvant biomarker setting. Changes to Imaging Core Lab postprocessing routines for image registration and recent modifications to the study database used to document the presence of motion will hopefully improve the available data on patient motion in the future. Verification of protocol-adherent contrast administration rate and saline flush volume relied on self-report from participating sites, and data entry errors compromised the completeness and reliability of this information. Recent improvements to the contrast injection case report form submitted by sites and modifications to the database used to store this information have made tracking of this important factor more robust.
Second, the small sample size in the non-adherent subsets leads to considerable uncertainty in the estimated AUCs. It is unclear which, if any, of these issues are responsible for the counterintuitive results. Given the limitations of the data collected on motion and contrast administration error and the small sample size in the nonadherent subsets, definitive statements regarding the causes of the counterintuitive results, or whether they are in fact genuine, cannot be made at this time.
This study has limitations. First, it was a retrospective study, and collecting data for not DICOM confirmable factors was particularly challenging. Second, confounding factors might affect our results. Possible confounding factors include 4 tumor subtypes, 10 drug arms, and 25 participating sites. Considering the statistical uncertainty owing to the small sample size when the whole cohort was divided into each subcohort, we could not adequately assess the impact of these confounding factors.
In conclusion, our results highlighted that MRI protocol adherence has an impact on the performance of FTV for predicting pathological complete response. This was a retrospective study of 2970 MRI examinations that were accepted for FTV calculation in the I-SPY 2 TRIAL and had therefore already met the trial's prospective acceptance standards. Thus, our results reflect the incremental improvement achievable beyond those standards. The results of this study will be used to inform higher imaging standards in support of personalized treatment strategies.