For women receiving neoadjuvant chemotherapy (NAC) for locally advanced breast cancer, achieving pathologic complete response (pCR) after treatment indicates a high likelihood of an excellent outcome (1, 2). Dynamic contrast-enhanced magnetic resonance imaging (DCE-MRI) can be used to determine the extent of disease before, during, or after NAC for breast cancer. Functional tumor volume, calculated using contrast kinetic enhancement thresholds, was found to be predictive of both pCR and event-free survival (3, 4). However, tumor volume has limited ability to capture tumor heterogeneity, which can be an important prognostic factor and is associated with NAC response (5, 6). DCE-MRI can also be used to assess tumor heterogeneity, which may reflect the underlying tumor biology and potential treatment response (7–9). Previous studies have shown that distinct imaging patterns of tumor morphology predict response to NAC (10, 11). However, manual assessment of MRI morphological patterns (MPs) is time-consuming and subject to reader variability.
With recent technological advances in medical imaging, artificial intelligence has shown promise in cancer diagnosis, assessment of treatment response, and prediction of disease progression (12). In particular, radiomics has been used to extract quantitative imaging features, such as tumor size, shape, and texture, to study tumor characteristics in breast cancer (13). According to the Image Biomarker Standardisation Initiative, sphericity (SPH) is one of the morphologic features that measure tumor shape (14). Among all morphologic features, SPH may be used to differentiate tumor phenotypes (i.e., diffuse versus solid) independent of tumor size. Tumor SPH has been studied as a prognostic factor in oral (15) and lung cancers (16). To the best of our knowledge, no study of SPH in breast cancer based on MRI has been reported. In this study, we propose using an automatically derived SPH to quantify MRI MPs (17) and assess the relationship between SPH and reader-assessed MP. We also investigated SPH as an imaging predictor of pCR, the endpoint of the I-SPY 2 (Investigation of Serial Studies to Predict your Therapeutic Response with Imaging and Molecular Analysis 2) Trial.
A cohort of 990 women from completed or graduated treatment arms of I-SPY 2 was considered in this study. Subjects gave written informed consent before enrollment and again after randomization to treatment. All participating sites received institutional review board approval.
Treatment and Pathologic Outcome
Patients were treated for 12 weeks with 1 of 9 experimental drugs whose I-SPY 2 arms had completed by November 2016, or with standard care, followed by anthracycline–cyclophosphamide (18, 19). pCR, the trial endpoint, was defined as the absence of residual cancer in the breast and lymph nodes at the time of surgery. pCR was treated as a binary outcome in our analysis (1: pCR; 0: non-pCR).
The primary tumor was categorized by hormone receptor (HR) and human epidermal growth factor receptor 2 (HER2) status (positive or absent/negative) before treatment, as assessed in the core biopsy specimen by immunohistochemistry and Allred scores. Together, HR and HER2 status define 4 breast tumor subtypes: HR+/HER2−, HR+/HER2+, HR−/HER2+, and HR−/HER2− (triple-negative).
MRI examinations were performed on 1.5 T or 3 T scanners from a variety of vendors and institutions using a prospectively defined protocol. DCE-MRI was performed at multiple time points relative to treatment: before NAC (T0), after 3 weeks of NAC (T1), inter-regimen (T2), and presurgery (T3). Image acquisition parameters were as follows: repetition time 4–10 milliseconds; minimum echo time (≤4.8 milliseconds); flip angle 10–20°; field of view 260–360 mm, to achieve full bilateral coverage; acquisition matrix 384–512, with in-plane resolution ≤1.4 mm; and slice thickness ≤2.5 mm. Three-dimensional fat-suppressed T1-weighted images were acquired before and after injection of a gadolinium contrast agent. Postcontrast imaging started simultaneously with injection. Phase duration was 80–100 seconds, with a minimum of 8 minutes of imaging after injection. The gadolinium contrast agent was administered intravenously at a dose of 0.1 mmol/kg body weight and a rate of 2 mL/s, followed by a 20-mL saline flush.
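To make the stated protocol ranges concrete, the minimal Python sketch below checks one series' acquisition parameters against them and derives a nominal in-plane pixel size as field of view divided by matrix size. The `Acquisition` class and `protocol_violations` function are hypothetical helpers for illustration, not part of any I-SPY 2 software, and treating FOV/matrix as the in-plane resolution is a simplifying assumption.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Acquisition:
    """Hypothetical record of one DCE-MRI series' acquisition parameters."""
    tr_ms: float      # repetition time (ms)
    te_ms: float      # echo time (ms)
    flip_deg: float   # flip angle (degrees)
    fov_mm: float     # field of view (mm)
    matrix: int       # in-plane acquisition matrix
    slice_mm: float   # slice thickness (mm)


def in_plane_resolution_mm(acq: Acquisition) -> float:
    # Nominal in-plane pixel size: field of view divided by matrix size.
    return acq.fov_mm / acq.matrix


def protocol_violations(acq: Acquisition) -> list[str]:
    """Return the names of the stated protocol ranges that this series violates."""
    checks = {
        "TR 4-10 ms": 4 <= acq.tr_ms <= 10,
        "TE <= 4.8 ms": acq.te_ms <= 4.8,
        "flip angle 10-20 deg": 10 <= acq.flip_deg <= 20,
        "FOV 260-360 mm": 260 <= acq.fov_mm <= 360,
        "matrix 384-512": 384 <= acq.matrix <= 512,
        "in-plane <= 1.4 mm": in_plane_resolution_mm(acq) <= 1.4,
        "slice <= 2.5 mm": acq.slice_mm <= 2.5,
    }
    return [name for name, ok in checks.items() if not ok]


series = Acquisition(tr_ms=6.0, te_ms=2.5, flip_deg=15, fov_mm=340, matrix=448, slice_mm=1.6)
print(round(in_plane_resolution_mm(series), 2), protocol_violations(series))  # 0.76 []
```

For example, a 340-mm field of view with a 448 matrix gives a nominal pixel size of about 0.76 mm, well inside the 1.4-mm limit.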
Morphological Pattern Assessment
Pretreatment DCE-MRI MP was visually assessed by a radiologist blinded to clinical and histopathological findings and graded on a 1–5 scale corresponding to decreasing degree of tumor containment (see online supplemental Figure 1). MP definitions were as follows: 1 = well-defined single mass, 2 = well-defined multilobulated mass, 3 = area enhancement with nodularity, 4 = area enhancement without nodularity, and 5 = septal spreading (11).
SPH was derived from the existing tumor mask created by the functional tumor volume calculation from DCE-MRI, as previously described (20). In brief, the segmentation method involved calculating early percent enhancement (PE) and signal enhancement ratio (SER) maps using the following equations (3, 20):

$$\mathrm{PE} = \frac{S_1 - S_0}{S_0} \times 100\%, \qquad \mathrm{SER} = \frac{S_1 - S_0}{S_2 - S_0},$$

where $S_0$, $S_1$, and $S_2$ are the signal intensities in the precontrast, early postcontrast, and late postcontrast images, respectively. For the SPH calculation, the tumor mask was further filtered using an SER > 0.9 constraint to focus on voxels with plateau and washout enhancement kinetics. Surface area (SA) and volume ($V$) of the resulting segmentation were found using a surface meshing analysis. SPH was calculated to quantify the similarity of the tumor morphology to a sphere (14):

$$\mathrm{SPH} = \frac{\mathrm{SA}_0}{\mathrm{SA}_{\mathrm{tumor}}}, \qquad \mathrm{SA}_0 = \left(36\pi V^2\right)^{1/3},$$

where $\mathrm{SA}_{\mathrm{tumor}}$ is the surface area of the tumor mask and $\mathrm{SA}_0$ is the surface area of a sphere with the same volume. By definition, for tumors of similar volume, a scattered or multifocal tumor should have a smaller SPH value than a single solid mass. Processing was done using locally written software in IDL (Harris Geospatial, Broomfield, CO). SPH at T0 and T1 was calculated and tested for the prediction of pCR.
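The PE and SER maps and the SER > 0.9 filter can be sketched voxelwise with NumPy as below. This is a minimal illustration under stated assumptions, not the published functional tumor volume pipeline (20): the 70% PE threshold is an assumed value (the text does not state it), the optional `roi` stands in for the manually placed volume-of-interest box, any other segmentation steps are omitted, and `enhancement_maps` and `sph_input_mask` are hypothetical names.

```python
import numpy as np

# Illustrative values only: the PE threshold is an assumption, since the text
# does not state it; SER > 0.9 is the filter described for the SPH mask.
PE_THRESHOLD_PCT = 70.0
SER_MIN = 0.9


def enhancement_maps(s0, s1, s2):
    """Voxelwise PE (%) and SER from precontrast (s0), early (s1) and late (s2) images."""
    s0, s1, s2 = (np.asarray(a, dtype=float) for a in (s0, s1, s2))
    early = s1 - s0
    late = s2 - s0
    # np.divide with `where` leaves 0 wherever the denominator is not usable.
    pe = np.divide(100.0 * early, s0, out=np.zeros_like(early), where=s0 > 0)
    ser = np.divide(early, late, out=np.zeros_like(early), where=late != 0)
    return pe, ser


def sph_input_mask(s0, s1, s2, roi=None):
    """Hypothetical mask: PE above threshold, inside an optional ROI, and SER > 0.9."""
    pe, ser = enhancement_maps(s0, s1, s2)
    mask = pe >= PE_THRESHOLD_PCT
    if roi is not None:
        mask &= np.asarray(roi, dtype=bool)
    return mask & (ser > SER_MIN)


# Five voxels: plateau, washout, weak enhancement, none, persistent enhancement.
s0 = [100, 100, 100, 100, 100]
s1 = [200, 200, 150, 100, 200]
s2 = [200, 150, 250, 100, 300]
print(sph_input_mask(s0, s1, s2).tolist())  # [True, True, False, False, False]
```

The persistent-enhancement voxel (SER = 0.5) passes the PE threshold but is dropped by the SER filter, which is the intended effect of restricting the SPH mask to plateau and washout kinetics.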
Mean and standard deviation (SD) of SPH were calculated within each MP and by cancer subtype. Means were compared between all pairs of MPs using the Tukey honestly significant difference (HSD) test. The Wilcoxon rank sum test was used to assess the difference in SPH between pCR and non-pCR groups. The predictive performance of SPH was assessed by the area under the receiver operating characteristic (ROC) curve (AUC). Statistical analyses were performed using R version 3.4.1 (R Foundation for Statistical Computing, Vienna, Austria), with the pROC package used to analyze ROC curves and calculate AUCs (21). Statistical significance was defined as P < .05.
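The Wilcoxon rank sum test and the AUC are closely linked: the empirical (trapezoidal) ROC AUC equals the Mann–Whitney U statistic (the rank sum of one group minus n_pos(n_pos + 1)/2) divided by n_pos × n_neg. The pure-Python sketch below makes that link explicit, assuming that higher SPH indicates pCR; `empirical_auc` is a hypothetical helper, not a pROC function, and the values are made up.

```python
def empirical_auc(pos, neg):
    """Empirical ROC AUC for 'higher score means positive'.

    Counts, over all (positive, negative) pairs, how often the positive scores
    higher, with ties counted as one half. This is the Mann-Whitney U statistic
    divided by len(pos) * len(neg).
    """
    if len(pos) == 0 or len(neg) == 0:
        raise ValueError("need at least one value in each group")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Hypothetical SPH values, not study data.
sph_pcr = [0.30, 0.25, 0.20]
sph_non_pcr = [0.20, 0.15, 0.10]
print(empirical_auc(sph_pcr, sph_non_pcr))  # 8.5 / 9, about 0.944
```

The double loop is O(n_pos × n_neg), which is adequate for cohorts of a few hundred patients; a rank-based formula gives the same value faster.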
From the entire cohort of 990 patients, 2 subcohorts are referenced in the results of this study; Figure 1 shows patient selection for these cohorts. The first cohort consists of 220 patients with pretreatment MP assessment, in which only SPH at T0 was analyzed. The mean and standard deviation of age in this cohort were 49 and 10 years, respectively. The second cohort consists of 935 patients with known pCR outcome, SPH measured at T0 and T1, and available HR and HER2 status (positive or negative). The mean and standard deviation of age in this cohort were 49 and 11 years, respectively. This cohort was used to analyze the association between SPH and pCR in the full cohort and by HR/HER2 subtype. The 2 cohorts overlapped (N = 210). Demographics of both cohorts are listed in Table 1. There was no statistically significant difference between the analysis cohort and the cohort with MP assessment in age (P = .890), the frequency distribution of HR/HER2 status (P = .650), pCR outcome (P = .630), or SPH at T0 (P = .890). A difference was found in menopausal status (P = .044); however, menopausal status was not considered in this analysis.
| Characteristic | Cohort with Sphericity (n = 935) | Cohort with Morphologic Pattern (n = 220) | P-Value |
|---|---|---|---|
| Age (y), mean ± SD | 49 ± 10 | 49 ± 11 | .890 |
| HR/HER2 subtype | | | .650ᵇ |
| HR+/HER2− | 365 (39%) | 77 (35.0%) | |
| HR+/HER2+ | 148 (16%) | 40 (18.2%) | |
| HR−/HER2+ | 80 (9%) | 18 (8.2%) | |
| HR−/HER2− | 342 (37%) | 85 (38.6%) | |
| Menopausal status | | | .044ᵇ |
| Premenopausal | 442 (47%) | 105 (47.7%) | |
| Perimenopausal | 32 (3%) | 1 (0.5%) | |
| Postmenopausal | 271 (29%) | 74 (33.6%) | |
| Not applicable | 125 (13%) | 22 (10%) | |
| Unknown | 65 (7%) | 18 (8.2%) | |
| SPH at T0, mean ± SD | 0.22 ± 0.099 | 0.22 ± 0.095 | .890 |
| pCR outcome | | | .630ᵇ |
| 0: non-pCR | 625 (67%) | 143 (65%) | |
| 1: pCR | 310 (33%) | 77 (35%) | |
Examples of MRI morphologic reading and SPH values for 2 patients are shown in Figure 2. These 2 cases illustrate differences in SPH (0.32 vs 0.13) resulting from differences in tumor morphology in patients with very similar tumor volumes (14.5 cc vs 14.0 cc). Figure 3 shows boxplots of SPH values by MP for the full cohort (N = 220); boxplots for the individual HR/HER2 subcohorts are shown in online supplemental Figure 2 (HR+/HER2−: N = 77; HR+/HER2+: N = 40; HR−/HER2+: N = 18; HR−/HER2−: N = 85). Means and standard deviations are listed in online supplemental Table 1. Overall, mean and median SPH values decreased with increasing MP, that is, going from a single solid mass to more diffuse tumors. The 95% confidence intervals (CIs) for the differences in means between each pair of MPs are plotted in Figure 4, and the numerical values are listed in online supplemental Table 2. Statistically significant differences were found among patterns 1, 2, and 3. Pattern 1 also differed from patterns 4 and 5 (P = .0017 and P = .001, respectively).
In the cohort of 935 patients, 310 (33%) achieved pCR after NAC and 625 (67%) did not (non-pCR). pCR rates varied among HR/HER2 subtypes: 18% (66 of 365) in HR+/HER2−, 40% (59 of 148) in HR+/HER2+, 61% (49 of 80) in HR−/HER2+, and 41% (140 of 342) in HR−/HER2− (triple-negative). Table 2 shows the differences in median SPH between pCR and non-pCR groups at T0 and T1. Overall, SPH was higher in patients who achieved pCR than in those who did not. The difference in SPH was statistically significant in the full cohort (P = .001 at both time points) and in the HR− subtypes (HR−/HER2+: P = .011 at T0 and P = .047 at T1; triple-negative: P = .001 at both time points). In HR+/HER2−, the difference was statistically significant at T0 (P = .008) but not at T1 (P = .054). No significant difference was found in HR+/HER2+ at either time point (P = .120 at T0, P = .550 at T1).
| Cohort | N | pCR Rate (N)ᵃ | pCR, median SPH | Non-pCR, median SPHᵃ | Differenceᵇ | P-Value |
|---|---|---|---|---|---|---|
| T0 (pretreatment) | | | | | | |
| Full | 935 | 33% (309) | 0.23 (0.16, 0.30) | 0.18 (0.13, 0.26) | 0.04 (0.02, 0.05) | .001 |
| HR+/HER2− | 365 | 18% (66) | 0.22 (0.16, 0.31) | 0.20 (0.14, 0.26) | 0.04 (0.01, 0.06) | .008 |
| HR+/HER2+ | 148 | 40% (59) | 0.24 (0.16, 0.33) | 0.19 (0.14, 0.29) | 0.03 (−0.007, 0.06) | .120 |
| HR−/HER2+ | 80 | 61% (49) | 0.24 (0.17, 0.28) | 0.16 (0.10, 0.26) | 0.06 (0.01, 0.10) | .011 |
| HR−/HER2− | 342 | 41% (140) | 0.23 (0.17, 0.29) | 0.18 (0.13, 0.24) | 0.04 (0.02, 0.06) | .001 |
| T1 (3 weeks after NAC initiation) | | | | | | |
| Full | 935 | 33% (309) | 0.22 (0.16, 0.32) | 0.20 (0.14, 0.27) | 0.03 (0.02, 0.04) | .001 |
| HR+/HER2− | 365 | 18% (66) | 0.25 (0.15, 0.33) | 0.21 (0.14, 0.27) | 0.03 (−0.0005, 0.06) | .054 |
| HR+/HER2+ | 148 | 40% (59) | 0.21 (0.16, 0.32) | 0.20 (0.15, 0.31) | 0.01 (−0.03, 0.04) | .550 |
| HR−/HER2+ | 80 | 61% (49) | 0.23 (0.16, 0.30) | 0.18 (0.12, 0.25) | 0.05 (0.0007, 0.09) | .047 |
| HR−/HER2− | 342 | 41% (140) | 0.23 (0.16, 0.32) | 0.19 (0.13, 0.25) | 0.04 (0.02, 0.06) | .001 |
Abbreviations: AUC, area under the curve; pCR, pathologic complete response; HR, hormone receptor; HER2, human epidermal growth factor receptor 2.
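In Table 2, the Difference column need not equal the difference between the two group medians (for the full cohort at T0, 0.23 − 0.18 = 0.05, whereas 0.04 is reported). One location-shift estimator that behaves this way is the two-sample Hodges–Lehmann estimator, the median of all pairwise differences; R's `wilcox.test(..., conf.int = TRUE)` reports a closely related "difference in location" estimate with a CI. That Table 2 was computed this way is an assumption, and `hodges_lehmann_shift` is a hypothetical helper shown only to illustrate the distinction.

```python
from statistics import median


def hodges_lehmann_shift(x, y):
    """Two-sample Hodges-Lehmann estimate: median of all pairwise differences x_i - y_j."""
    return median(xi - yj for xi in x for yj in y)


# Hypothetical SPH values, not study data.
pcr = [0.20, 0.25, 0.60]
non_pcr = [0.10, 0.18, 0.19]
diff_of_medians = median(pcr) - median(non_pcr)
print(round(diff_of_medians, 3), round(hodges_lehmann_shift(pcr, non_pcr), 3))  # 0.07 0.1
```

The two numbers answer different questions: the first compares group centers, while the second estimates a typical shift between a pCR value and a non-pCR value, which is the quantity the rank sum test is built around.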
AUCs for using SPH to predict pCR in the full cohort and in individual subtypes are listed in Table 3, and the corresponding ROC curves are shown in Figure 5. Overall, AUCs were lower after treatment started (T1) than before treatment (T0). In the full cohort, SPH at T0 and T1 was predictive of pCR, with estimated AUCs of 0.61 and 0.58, respectively. Among the subtypes, the highest AUCs were observed in HR−/HER2+ (0.67 at T0 and 0.63 at T1) and the lowest in HR+/HER2+ (0.58 at T0 and 0.53 at T1). For HR+/HER2+ at both time points, and for HR+/HER2− at T1, the CIs did not exclude 0.5 (a nondiscriminatory test), so these AUCs were not significantly greater than 0.5.
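The text does not state how the AUC CIs were computed; pROC's `ci.auc()` defaults to the DeLong method for a full AUC, so the sketch below assumes it. It computes the empirical AUC and a DeLong asymptotic 95% CI in pure Python; `delong_auc_ci` is a hypothetical helper and the values are illustrative, not study data.

```python
import math
from statistics import mean, variance


def _psi(p, n):
    # Mann-Whitney kernel: 1 if the positive case scores higher, 0.5 on ties.
    return 1.0 if p > n else (0.5 if p == n else 0.0)


def delong_auc_ci(pos, neg, z=1.959964):
    """Empirical AUC with a DeLong asymptotic CI (z = 1.959964 gives ~95%)."""
    m, n = len(pos), len(neg)
    if m < 2 or n < 2:
        raise ValueError("need at least two values in each group")
    # Structural components: each case's average kernel against the other group.
    v10 = [sum(_psi(p, q) for q in neg) / n for p in pos]
    v01 = [sum(_psi(p, q) for p in pos) / m for q in neg]
    auc = mean(v10)
    # DeLong variance: sample variance of each component set over its group size.
    se = math.sqrt(variance(v10) / m + variance(v01) / n)
    # Clip to [0, 1] for display.
    return auc, max(0.0, auc - z * se), min(1.0, auc + z * se)


# Hypothetical SPH values, not study data.
auc, lo, hi = delong_auc_ci([0.30, 0.25, 0.20, 0.22], [0.20, 0.15, 0.10, 0.24])
print(f"AUC {auc:.3f} (95% CI {lo:.3f}, {hi:.3f})")
```

An interval whose lower bound does not exceed 0.5, as reported for HR+/HER2+ at both time points, indicates discrimination that is not significantly better than chance at the 5% level.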
In this study, we investigated using tumor SPH to quantify MRI MPs and to predict pCR. It was previously noted that different tumor phenotypes on MRI responded differently to NAC (10), and further evidence of this dependence was shown in a study using I-SPY 1/ACRIN 6657 data (11). In the current study, we tested whether quantitatively measured SPH, a morphological measure derivable from the tumor mask generated by the DCE-MRI tumor volume calculation, can distinguish tumor phenotypes by comparing it with the MRI MP assessed by a radiologist. Mathematically, SPH measures how closely the shape of an object resembles a perfect sphere. Theoretical SPH values range from just above 0 for a very diffuse object to 1 for a perfect sphere, but this range is reduced for finite, digitized 3D images such as MRI. Solid tumors with roughly spherical shapes have high SPH values, whereas scattered or multifocal tumors have low SPH values. Our results showed that SPH values decreased significantly from MP 1 to MP 2 to MP 3, but the differences among MPs 3, 4, and 5 were less pronounced. A similar trend was observed in the HR/HER2 subtype subgroups. Unlike manual MP reading, however, SPH is calculated automatically (aside from manual selection of a bounding volume-of-interest box) and is highly reproducible. The lack of differentiation among the higher-order patterns may be partially due to the limited number of cases. In addition, the descriptions of the 5 MRI MPs can be ambiguous and difficult to reproduce, particularly for patterns 3, 4, and 5 (11). Thus, the ability of any quantitative morphological metric to differentiate among these higher-order, more diffuse tumors may be severely limited.
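The effect of digitization on the SPH range can be illustrated with a deliberately simple surface estimate. The sketch below, assuming unit voxel spacing unless `spacing` is given, computes SPH for a binary mask by counting exposed voxel faces. This is a swapped-in technique, not the surface meshing used in the study; a mesh-based estimate could instead use scikit-image's `measure.marching_cubes` and `measure.mesh_surface_area`. Function names are hypothetical.

```python
import math

import numpy as np


def sphericity(volume, surface_area):
    """SPH: surface area of an equal-volume sphere divided by the measured area."""
    return (36.0 * math.pi * volume**2) ** (1.0 / 3.0) / surface_area


def voxel_face_sphericity(mask, spacing=(1.0, 1.0, 1.0)):
    """SPH of a 3D binary mask using exposed voxel faces as the surface.

    Face counting is a simpler stand-in for surface meshing and overestimates
    the area of curved surfaces.
    """
    m = np.pad(np.asarray(mask, dtype=bool), 1)  # pad so border voxels expose faces
    dz, dy, dx = spacing
    volume = m.sum() * dz * dy * dx
    # np.diff on booleans is True where inside/outside changes: one face each.
    area = (np.count_nonzero(np.diff(m, axis=0)) * dy * dx
            + np.count_nonzero(np.diff(m, axis=1)) * dz * dx
            + np.count_nonzero(np.diff(m, axis=2)) * dz * dy)
    return sphericity(volume, area)


# Digitized sphere of radius 10 voxels.
z, y, x = np.ogrid[-12:13, -12:13, -12:13]
ball = x**2 + y**2 + z**2 <= 10**2

# Same volume (1000 voxels) as one compact block or as two separated blocks.
one_block = np.ones((10, 10, 10), dtype=bool)
two_blocks = np.zeros((10, 10, 15), dtype=bool)
two_blocks[:, :, :5] = True
two_blocks[:, :, 10:] = True

for name, mask in [("ball", ball), ("one block", one_block), ("two blocks", two_blocks)]:
    print(name, round(voxel_face_sphericity(mask), 3))
```

Splitting the same volume into two separate blocks lowers SPH, matching the expected behavior of multifocal tumors. Face counting also shows why the attainable range shrinks: the digitized sphere scores well below 1, and an axis-aligned block even outscores it, one reason a smoother mesh-based surface estimate is preferable.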
We also tested SPH as a quantitative predictor of treatment response in patients who received NAC. SPH at pretreatment was higher in patients who achieved pCR than in those who did not, and the same trend was observed at T1, 3 weeks after NAC initiation. This result is consistent with previous findings that more circumscribed lesions have a higher chance of pCR (10, 11). AUCs also showed that SPH can differentiate pCR from non-pCR at pretreatment and early-treatment time points. This finding may help inform clinical decisions, especially for the HR+/HER2− subtype, which has the lowest pCR rate among all subtypes in patients who undergo NAC: alternative therapy could be considered for patients who are unlikely to benefit from NAC. The decrease in AUC from the pretreatment to the early-treatment time point suggests that SPH is most predictive before NAC begins. Compared with other imaging modalities (mainly mammography and ultrasonography), MRI has shown the highest accuracy in predicting pCR (22). However, very few studies have reported that MRI features measured at pretreatment are predictive (23). Radiomics is an emerging research tool in cancer imaging that extracts a large number of imaging features and leverages machine learning algorithms to reveal underlying biological heterogeneity and predict treatment response. A recently published multicenter study showed that radiomics may be a useful tool for predicting pCR in patients with breast cancer who underwent NAC (24). The current study focused on a single feature that can be calculated automatically from DCE-MRI as a byproduct of functional tumor volume, an established biomarker that has been used in NAC clinical trials for decades (3, 25).
Our study has several limitations. First, the radiologist read only a subset of the full cohort, which may not represent the entire population; however, the composition of the two cohorts was similar in terms of receptor subtype and pCR rate. Second, only one radiologist performed the reading. Because MP assessment has subjective components, inter-reader variability, which could not be assessed with a single reader, could affect the relationship between SPH and phenotype. Third, SPH measurement requires a minimum number of voxels in the tumor mask. When the tumor resolves or shrinks to minimal residual disease, the remaining tumor may not contain enough voxels, and SPH cannot be calculated. With our current processing, this could severely bias any analysis by excluding patients with rapid treatment effects; thus, SPH may be best applied to the index disease rather than to tracking the tumor as it responds.
In conclusion, this study showed that an automated SPH measure can capture breast tumor phenotype on DCE-MRI, and that SPH at pretreatment alone predicted pCR. In future studies, other radiomics features that can serve as quantitative measures of tumor phenotype will be explored, and combinations of selected features may further improve prediction of treatment response.