Chronic obstructive pulmonary disease (COPD) is a leading cause of morbidity, mortality, and burden on the world's health and financial systems (1, 2). Advances in the clinical management of patients with COPD have led to an improved understanding of the multitude COPD phenotypes. It has been postulated that a spectrum of pathological processes may result in unique progression patterns among these patients. Extensive research has been devoted toward identifying surrogate biomarkers of disease progression with a strong emphasis on noninvasive imaging techniques and analytical approaches (3).
Parametric response mapping (PRM) is an analytical approach that, when applied to spatially aligned high-resolution computed tomography (HRCT) scans, allows both visualization and quantification of lung parenchyma affected by small airways disease (SAD), even when only emphysema is visibly observed (4). This technique quantifies a previously occult component of COPD and can be applied to retrospective HRCT data. Included in various NIH-funded clinical trials on COPD (5, 6), PRM of functional SAD (fSAD) has been demonstrated as an independent indicator of clinically relevant outcome measures (7). More recent studies have identified PRM as a surrogate of spirometric decline in COPD (7) and also a means for identifying and monitoring the onset of bronchiolitis obliterans syndrome in bone marrow and lung transplant recipients (8–10). In a preliminary study, PRM was evaluated as a marker for monitoring change in disease classification (ie, normal, fSAD, and emphysema) from subjects accrued as part of SPIROMICS (5). In this study, “voxel-based tracking,” a method for evaluating longitudinal changes in PRM classification at the voxel level, has been used. Although this approach when applied to PRM shows promise at providing local disease progression, variability in Hounsfield unit (HU) values from uncontrollable sources (eg, scanner noise, patient breathing level, and image registration) may result in shifts in voxel PRM classification that are not related to alterations in disease state (6).
Various studies have demonstrated the efficacy of PRM as a diagnostic and prognostic indicator of decline in pulmonary function and COPD severity. Nevertheless, the use of PRM to monitor COPD progression has shown a high sensitivity of voxel classification to HU variability between longitudinal CT examinations resulting in erroneous results. The purpose of this study was to present a strategy to mitigate the effects of nonpathological HU variability on voxel classification for analyzing COPD progression using PRM.
All clinical procedures were conducted under an institutional review board-approved protocol, and all subjects involved provided written informed consent. In total, 90 subjects (age range at baseline, 40–80 years), with paired volumetric inspiratory and expiratory HRCT scans and clinical examinations at a 30-day interval, were prospectively accrued as part of the Repeatability and Replicate Substudy of SPIROMICS (11). Subjects evaluated included smokers with a smoking history of ≥20 pack-years and GOLD (Global Initiative for Chronic Obstructive Lung Disease) scores across the scale including 0, 1, 2, 3 and 4 (11–13) and never-smokers (79 smokers and 11 never-smokers, respectively). Postbronchodilator forced expiratory volume at 1 second (FEV1) was determined by spirometry at each time point. In our set of subjects, exclusion criteria included intolerance to bronchodilators, body mass index (BMI) >40 kg/m2 at baseline, presence of non-COPD obstructive lung disease, diagnosis of unstable cardiovascular disease, lung surgery, or presence of metal in the chest that might affect chest CT interpretation. Further, 13 subjects from this cohort have been previously used to define thresholds that indicate disease-provoked changes in PRM metrics (5).
Whole-lung volumetric multidetector HRCT scans were acquired for all 90 subjects using the SPIROMICS imaging protocol (13). The current of 120 kVP was adjusted to meet the CT dose index volume targets for inspiration and expiration by making use of 3 settings—large (BMI > 30 kg/m2), medium (BMI, 20–30 kg/m2), and small (BMI < 20 kg/m2)—with vendor-specific reconstruction kernels (Standard, B, B35, FC03) (11). In this study, HRCT data reconstructed using the “GE standard” kernel were analyzed. Quantitative HRCT data were presented in HU values, in which stability of CT measurements for each scanner was monitored on a monthly basis by use of the COPDGene phantom (14). For reference, ambient air and water attenuation values should be −1000 and 0 HU, respectively. Of the 90 subjects, rescanning using a different scanner was conducted among 11 subjects and that using a different field of view (ΔFOV> 5%) from their original scan was conducted among 22 subjects. To reduce scanner noise, a 33 median filter was applied to all CT scans before processing and analysis.
Parametric Response Mapping
Lung segmentation and image registration to a single geometric frame (ie, baseline expiration CT scan) were performed on all paired CT data using Lung Density Analysis (LDA) software (Imbio, LLC, Minneapolis, MN). Classification of individual voxels was performed using in-house algorithms developed using MATLAB version 2015b (MathWorks, Inc, Natick, MA). Details on the PRM analysis have been previously reported (4). The nomenclature of these measures for normal lung parenchyma, fSAD, and emphysema includes PRMNormal, PRMfSAD, and PRMEmph, respectively. Additional details are provided in the online supplemental Methods.
Correction Strategy for Longitudinal PRM
After establishing that the difference in HU between interval examinations has a quasi-normal distribution (online supplemental Methods), we determined the variance using the serial inspiration and expiration voxel data. This approach is analogous to previous works on voxel-to-voxel therapeutic response assessment in cancer (15, 16). These data were plotted on a Cartesian coordinate system with the x and y axes denoted as the baseline and follow-up, respectively. Using principal component analysis, the data were transformed to the axes of primary and secondary variance (principal and secondary eigenvectors, respectively). Next, a linear fit along the principal eigenvector was performed with the subsequent residuals mapped into the second eigenvector axis. Residuals were used to calculate the 95% confidence interval of the fit (95% CI). The value of the confidence interval was transformed back to the original image space and defined as the index of measurement variability (IMV). This procedure was performed among all subjects and CT breath-hold examinations at both inspiration and expiration. To account for HU dependence on IMV, a cumulative exponential model was applied to all voxels in the CT data:
Figure 1 shows an illustration of the correction strategy. In step 1, we calculated the difference and average maps, ie, ΔHU and
Differences in subject age, height, weight, BMI, FEV1 (percent predicted), lung volumes, and percent lung volumes of PRM metrics at both interval examinations were assessed using the 1-way ANOVA controlled for multiple comparisons (Bonferroni post hoc test). Association between gender population and GOLD was assessed using the log likelihood ratio test. Temporal changes in lung volumes, relative volume of PRM metrics, and FEV1 were assessed using the Wilcoxon signed rank test. Differences in the percent agreement, ie, sum of voxels with like-PRM class normalized to total lung voxels, between uncorrected and corrected PRM at follow-up were also analyzed using the Wilcoxon signed rank test. Differences in percent agreement were also analyzed over GOLD stratums using the Kruskal–Wallis test. Percent agreement was computed using MATLAB version 2015b (MathWorks, Inc.). Statistical analyses were conducted with SPSS version 2.1 (IBM, Armonk, NY). All the results were considered statistically significant at the .05 level.
Study cohort population characteristics are provided in Table 1, and PRM results are displayed in Table 2. No significant differences in BMI, height, and weight were observed between strata. Never-smokers were found to be significantly younger than GOLD 1 subjects. As expected, lung volumes, FEV1, and PRM metrics at both interval examinations were found to be dependent on GOLD. No significant correlations were observed between FEV1 and PRM classifications within the stratum and at individual examinations. In addition, no significant relationships were obtained between subject gender and GOLD status. Finally, change in PRMNormal, PRMfSAD, and PRMEmph for each GOLD stratum was found to be insignificant over the 30-day interval (all cases with P > .07). The 95% confidence intervals in changes in PRM metrics over the 30-day interval are presented in the online supplemental Results.
i] Subject characteristics by GOLD stage. Values are mean (standard deviation) (except for gender). Characteristics not temporally disaggregated were recorded at 0 day.
HU Variability for Interval CT Scans
All subjects' generated histograms for ΔExp and ΔIns were similar to normal distributions. Of the 90 subjects, 90% were found to have histograms with a t-location scale probability distribution, whereas in the remaining 10%, histograms were generated with a logistic distribution. These functions, like Student t, are symmetric about its mean value with a leptokurtic shape. We found that the means of the fitted distribution functions over all 90 subjects and breath-holds were negligibly different from the expected value of 0.
Our procedure for applying a linear regression to the principal component analysis-transformed serial CT data at inspiration and expiration was found to generate consistent results irrespective of GOLD status or ventilation. To illustrate this point, Figure 2 presents voxel HU scatter plots at inspiration and expiration of subjects diagnosed with GOLD 1 and GOLD 3 COPD. Although the location of the voxel distribution varied between the 2 cases, the regression fits (red lines in Figure 2) generated consistent slopes and Y-intercepts (inserts in Figure 2). This consistency was also verified over the entire population (online supplemental Results). As expected, IMV was found to decrease with increasing COPD severity (Figure 2). As the disease progresses, lung parenchymal density approaches ambient air.
To address limitations in HU variability at low density, IMV was modeled as a function of HU. We assumed that the serial pair of modes was an adequate approximation of the centroid (center of mass) for the density distribution observed in the scatter plots. The dependence of IMV on HU values is particularly clear in Figure 3, in which IMV values show a nonlinear drop in value with decreasing HU average of modes. To strengthen the fit of IMV model to the data, expiration (blue markers in Figure 3) and inspiration (red markers in Figure 3) data were pooled. The optimal parameters obtained for the IMV model were V = 66.5 and r = −0.014 with a goodness of fit of NRMSE = 0.87 as defined as the normalized root mean squared error (online supplemental Results).
Application of Correction Approach on Follow-up PRM
Figure 4 shows representative sections from subjects diagnosed as GOLD 1 and 3 COPD at baseline; these are the same subjects shown in Figure 2. In the first case, PRMfSAD was found to drop by 2.8% at follow-up (Figure 4, top row fourth column) with no correction strategy implemented. Processing the serial CT scans using the strategy outlined in Figure 1 generated a corrected PRMfSAD that resulted in a drop from baseline of only 0.3% (Figure 4, top row fifth column). Consequent to the low relative volume of emphysema as determined by PRM, negligible benefits were observed when correcting the follow-up PRM. Nevertheless, the effect of our correction is evident locally near the apex of the left lung for subject GOLD 1 (Figure 4 top row black arrow fourth and fifth columns). The subject diagnosed with GOLD 3 COPD showed a decrease in emphysematous lung voxels, going from 28.7% at 0-day to 26.4% at the 30-day CT scan, yielding a short-term drop of 2.3% in emphysema (Figure 4, bottom row fourth column). Applying our correction scheme on the follow-up PRM resulted in a difference in PRMEmph of only 0.3%. A similar result was observed for PRMfSAD, with a change of 2.8% for the original PRMfSAD and 0.4% for the corrected PRMfSAD. Upon closer inspection of the bottom right lung, PRM voxel classification (w/o correction) at follow-up varies substantially from the baseline PRM for this subject (Figure 4 bottom row blue arrows).
The distributions of ΔPRMNorm, ΔPRMfSAD, and ΔPRMEmph separated by GOLD, for both uncorrected and corrected models, are presented in Figure 5. As expected, never-smokers and GOLD 0 subjects showed prevalence of voxels classified as PRMNorm with an interquartile range (IQR) of <2.4% for ΔPRMNorm and ΔPRMfSAD in both never-smokers and GOLD 0. As the disease severity increased toward more fSAD and emphysema, erroneous shifts in PRM classifications were more prevalent. We observed maximum classification variability on GOLD 2 and GOLD 3 subjects, yielding an IQR of >8.4% for ΔPRMNorm and ΔPRMfSAD. In the case of GOLD 3, we found the largest PRMEmph mismatch with IQR = 2.3%. Applying our correction strategy relieves the level of noise in PRM classification to more uniform ΔPRM distributions, with an IQR of <1.7% in all cases.
To assess the agreement in PRM classification, overall percent agreement scores were determined between baseline PRM and the 30-day CT scans either with or without corrections of PRM voxel classification. Figure 6 displays the distribution of the percentage agreement (%agreement) scores for follow-up PRM results over GOLD stages. We found significant differences between the uncorrected and corrected models, both globally (P < .0001) and within GOLD (P < .01 in all cases). Uncorrected follow-up PRM values were found to generate %agreement that significantly varied with increasing GOLD (P < .0001). This trend was not observed for %agreement using the corrected follow-up PRM (Figure 6).
We propose a strategy that addresses voxel-level measurement variability in serially aligned inspiratory–expiratory paired CT scans that affect voxel classification by PRM. CT data acquired at inspiration and expiration at a 30-day interval were used to quantify the variability of HU values owing to system noise and alignment imperfections during postprocessing. After verifying that the change in voxel HU values from baseline and at follow-up preserves the properties of a normal distribution, we defined an IMV that adjusts as a function of HU measurements. Our correction strategy allows voxel-level classification shifts to occur only when changes in voxel HU values exceed IMV(
Even with detailed spatial information present in CT imaging, the standard approach for assessing and monitoring disease by this modality has often been limited to calculating whole-lung or large tissue (ie, lobes) measures. Typically based on summary statistics (ie, mean), such as mean airway wall measurements (17) or percent air trapping using only an expiratory CT scan acquisition, variability in longitudinal CT scans has led to erroneous conclusions (18). For example, recently, Smith et al. (19) showed in a comparison of spatially matched airway segments of nonsmokers with a COPD population, that COPD subjects were found to have thinner walls, contrary to the data relying on a more lumped approach. The same whole-lung approach has also been applied to spatially aligned data, such as PRM, where individual classifications are presented as percentages of the entire lung volume (4, 20). In response to the findings from McDonough et al. where they postulated that SAD is an intermediate step toward emphysema (21); Boes et al proposed an approach for assessing voxel-level changes in PRM classification, which was referred to as “voxel-based tracking” (5). One-year interval paired CT data from SPIROMICS were spatially aligned to a single geometric frame, in this case the baseline expiration CT scan. As described in this study, each voxel in the lung parenchyma consisted on 2 temporally resolved PRM classifications. In a subject diagnosed with GOLD 2 COPD and found to have a decline in absolute FEV1 from 2.34 L to 2.12 L over 1 year, 48% of all voxels identified by PRM as emphysema at follow-up were fSAD at baseline. The ability to monitor COPD at the voxel-level could provide clinicians with a means to monitor local progression. Although promising, the effect of measurement variability on PRM classification was evident in the whole-lung measures analyzed in that study, where PRMEmph was found to decrease in a small number of subjects over the 1-year period (5), similar to our finding in Figure 4 for the GOLD 3 subject. Our observations concluded that fSAD measurements from uncorrected follow-up PRM, on average, showed moderate agreement to baseline PRM (Figure 5), even within the relatively short time frame of 30 days. The deterioration in PRMfSAD agreement with increasing GOLD status is attributed to the large number of voxels with HU values near the –950 HU and –856 HU thresholds for GOLD 1-3 subjects. Even a small deviation in the HU value between longitudinal CT scans ( Limitations in our approach deserve further attention. Although CT scans from 90 subjects were available for development of the approach, the composition of the population significantly varied across GOLD (Table 1). Quantitative CT values are highly dependent on scanner vendor, scanner type, acquisition parameters (eg, kV, mA, and FOV) and reconstruction kernels. In our study population, a subset of subjects underwent serial CT scans on different scanner types and acquired at different FOVs, with both limitations producing negligible differences in %agreement (details in online supplemental Results). In addition, we did not correct for inadequate ventilation at serial CT examinations (discussion in online supplemental Results). The effect of HU variability, consequent of CT acquisition, processing, and inadequate ventilation, on PRM quantification, has been previously reported, and techniques for alleviating their effect has been discussed (6). Although this approach does not address all errors associated with evaluating serial CT scans, our strategy for correcting PRM classification shifts is highly adaptable, allowing additional techniques that resolve more specific sources of error to be easily integrated in our workflow. Consequent to the impact of COPD to health systems worldwide, extensive research is being devoted to the development and evaluation of novel biomarkers. PRM has been shown in multiple studies to serve as an objective and quantitative measure of disease. Large-scale multicenter observational studies such as COPDGene and SPIROMICS provided temporally resolved HRCT to evaluate metrics, such as PRM, for monitoring COPD progression and, ideally, for therapeutic response assessment. Our methodology for correcting shifts in PRM classification due to variability in longitudinal HRCT scans may improve the clinical management of patients through more accurate monitoring of COPD subtypes.
Limitations in our approach deserve further attention. Although CT scans from 90 subjects were available for development of the approach, the composition of the population significantly varied across GOLD (Table 1). Quantitative CT values are highly dependent on scanner vendor, scanner type, acquisition parameters (eg, kV, mA, and FOV) and reconstruction kernels. In our study population, a subset of subjects underwent serial CT scans on different scanner types and acquired at different FOVs, with both limitations producing negligible differences in %agreement (details in online supplemental Results). In addition, we did not correct for inadequate ventilation at serial CT examinations (discussion in online supplemental Results). The effect of HU variability, consequent of CT acquisition, processing, and inadequate ventilation, on PRM quantification, has been previously reported, and techniques for alleviating their effect has been discussed (6). Although this approach does not address all errors associated with evaluating serial CT scans, our strategy for correcting PRM classification shifts is highly adaptable, allowing additional techniques that resolve more specific sources of error to be easily integrated in our workflow.
Consequent to the impact of COPD to health systems worldwide, extensive research is being devoted to the development and evaluation of novel biomarkers. PRM has been shown in multiple studies to serve as an objective and quantitative measure of disease. Large-scale multicenter observational studies such as COPDGene and SPIROMICS provided temporally resolved HRCT to evaluate metrics, such as PRM, for monitoring COPD progression and, ideally, for therapeutic response assessment. Our methodology for correcting shifts in PRM classification due to variability in longitudinal HRCT scans may improve the clinical management of patients through more accurate monitoring of COPD subtypes.