Lung densitometry by x-ray computed tomography (CT) is sensitive to the alterations in lung parenchyma as a result of the onset and progression of chronic obstructive pulmonary disease (COPD). Quantitative CT-based measures of lung disease are well characterized (1) yet have not fully transitioned into routine clinical use, where physiological assessment and clinical patient-reported parameters remain the standard of care for COPD. However, large clinical studies such as COPDGene (2) and SPIROMICS (3) have been undertaken using a standardized chest CT image acquisition protocol for disease phenotyping and for assessing progression. With longitudinal CT data collection in process, quantitative CT metrics must be fully characterized to minimize measurement variability between serial examinations.
Emphysema and air trapping are 2 components of COPD that are measured on CT images. Emphysema is measured as the percentage of lung with voxels below −950 Hounsfield units (HUs) (4, 5). Air-trapping measures, which quantify the extent of both emphysematous- and nonemphysematous-diseased tissue, include the percentage of lung with voxels below −856 HU on expiration scans (2) and the ratio of mean lung density on inspiration-to-expiration CT scans (6). In 2012, a voxel-based technique called parametric response mapping (PRM) was shown to discriminate between these 2 forms of air trapping when applied to paired and spatially registered inspiration and expiration CT scans (7). PRM applies previously defined thresholds for the emphysema index (< −950 HUs) and air trapping (< −856 HUs) on spatially aligned inspiration-to-expiration CT scans (8, 9). This allowed individual voxels within the lung parenchyma to be classified as normal (PRMNormal), emphysematous (PRMEmph), and nonemphysematous air trapping, referred to as functional small airways disease (PRMfSAD). Whereas PRMEmph is analogous to earlier density-based emphysema indices, PRMfSAD represented regions of air trapping that were nonemphysematous (7, 10). Like other quantitative CT measures, PRM may be influenced by differences among scanner types, patient compliance, scan acquisition protocols, scanner HU drift, and the selection of reconstruction kernels. Therefore, reducing CT-based metric variability by identifying, quantifying, and eliminating (when possible) the sources of variability (11, 12) is required to determine the detection limits for assessing true changes in COPD patient CT scans.
This study evaluates the sensitivity of PRM to sources of variability present in a standardized CT clinical acquisition and processing protocol that may affect quantitative CT measurements (Figure 1). Although many of the parameters investigated had a minimal impact on expected PRM values, this work demonstrates the effect of lung inspiration/expiration volume at CT acquisition on PRM measurements. These results reveal that establishing the correct lung inhalation and exhalation volumes during image acquisition is the most important quality control measure for longitudinal CT imaging.
Data used in this study were obtained under an institutional review board–approved protocol, and all participants involved provided written consent as part of the COPDGene clinical trial (2). Participants were separated into groups of 4 based on Global Initiative for Chronic Obstructive Lung Disease (GOLD) (13) stages 0 to 4 (Table 1). Baseline clinical and CT data were used from a total of 20 randomly selected participants along with participant characteristics and pulmonary function test results.
CT Data Acquisition
Quantitative CT data, measured in HUs, were acquired at full inspiration (total lung capacity [TLC]) and at relaxed expiration (functional residual capacity [FRC]) as defined in the COPDGene protocol (2). Scans used were obtained from different scanners involved with image acquisition in the COPDGene trial (Table 2). CT data were reconstructed using both sharp and standard kernels. All CT scans were linearly corrected using predefined mean HU values for blood (50 HUs) and air (−1000 HUs) (14, 15).
CT Data Analysis
The mean HU of air and aortic blood was measured from the paired CT data acquired from different vendor systems and reconstruction kernels. Volumes of interest (VOIs) were manually contoured at approximately a 10-mm radius or smaller as needed within the ambient air outside the patient, the blood within the aorta, and the air inside the trachea.
PRM was applied to paired inspiration/expiration CT scans as previously reported (7). In brief, the lung parenchyma was segmented from the thoracic cavity and airways. Inspiration and expiration image volumes were spatially aligned to a single geometric frame using a thin-plate spline with mutual information as an object function. Each parenchymal voxel (ie, the smallest unit of volume in an image data set) was classified using the emphysema index and air-trapping thresholds, defined by COPDGene as −950 HUs on the inspiration scan and −856 HUs on the expiration scan, respectively, and used for generating joint-density histograms of paired parenchymal HU values of the voxels. This allowed each voxel to be classified as normal (green), emphysematous (red), or nonemphysematous airflow obstruction (also referred to as functional small airways disease) (yellow). Relative volumes for each class were calculated by summing all voxels within a classification and normalized to the total lung volume.
Generation of Simulated CT Data
The impact of inadequate lung ventilation during CT acquisition on PRM measurements was evaluated using simulated CT data from 5 subjects, each representing GOLD 0 through 4. This was performed by spatially deforming the expiration lung scans using a mass-preserving, diffeomorphic transform to model a series of exhalation lung volumes in 10 steps from expiration to full-inspiration volumes acquired at TLC (16). Mass was preserved by adjusting the HU values for volume changes by multiplying each voxel by the local Jacobian of the warping transform. Simulations used linear control point trajectories between ∼50 control points per lung identified on both the expiration and inspiration lungs. PRM was applied to the paired original inspiration and simulated expiration CT data as previously described.
Simulated CT data were also generated to evaluate the impact of spaced thin-slice expiration CT data on PRM measurements. Gapped CT data were generated from contiguous whole-lung CT scans by subsampling the more widely spaced thin-slice images, resulting in an axial spacing of 0.625, 1.25, 2.5, 5, and 10 mm or 0.75, 1.5, 3.0, 6.0, and 12 mm depending on the original reconstruction. PRM was applied to the paired simulated spaced expiration and original inspiration CT data as previously described.
Simulations were performed to test the influence of misregistration on PRM metrics. A known high-quality registration solution was defined by 113 feature points that served as basis points for thin-plate spline transforms. Random perturbations of this feature set were generated and used to compute transforms that created misregistered solutions. Simulated target deformations ranged from 0 to 30 mm (the approximate distance the diaphragm moves between inspiration and expiration; a mean of ∼15 mm would be no registration), where any simulations with folding were removed. Included deformations had a mean feature displacement of 0 to 6.6 mm, resulting in features that moved up to a mean of approximately 10 voxels with a 40-voxel regional deformation maximum. PRM was calculated at each registration solution, and the results were analyzed to determine how local misregistrations contributed to PRM metric variability.
Ambient air, tracheal air, and aortic blood HU values between vendor systems were compared using an unpaired Student t test. A paired Student t test was used to compare differences in HU values in the 3 VOIs between reconstruction kernels. Bland–Altman analysis was performed to illustrate differences in the individual PRM metrics between sharp and standard reconstructed CT data. All statistical analyses were performed using IBM SPSS Statistics version 21.
Impact of Lung Ventilation Variability
Simulated expiration lung volume increase (Figure 2C–E) resulted in a drop in the mean HU density of the lung (Figure 2B). PRM analysis of the original inspiration and expiration at various simulated volumes resulted in a decrease in PRMNormal and increase in PRMfSAD. As shown in Figure 2, this trend resulted from a shift of the joint-density histogram toward less attenuation along the expiration axis (x-axis). Those cases with a large dynamic range between inspiration and expiration volumes, that is, GOLD stages 0 and 1, demonstrated the most sensitivity to insufficient expiration ventilation in PRM metrics (Figure 2F). Nevertheless, realistic variability in expiration volumes from FRC is typically around 20% (Figure 2F, solid lines). Deviation in expiration volumes within this range (0%–20% of FRC) resulted in only subtle changes in PRM metrics that were mainly observed in GOLD stage 1 through 3 cases. PRMEmph, classified primarily by the −950-HU threshold on inspiration scans, varies only slightly with exhalation volume changes (Figure 2F). Simulations that resulted in inaccurate PRMEmph measurements were observed from the anticipated effect of insufficient inspiration during a TLC maneuver (data not shown).
Impact of Reconstruction Kernel and Scanner Manufacturer
The impact of reconstruction kernel and scanner type on HU values is demonstrated in ambient and tracheal air and aortic blood. As shown in Figure 3, mean values of ambient air differed significantly between reconstructed kernels in both inspiration (vendor 1, P = .001; vendor 2, P = .001) and expiration (vendor 1, P = .001; vendor 2, P < .0001) CT scans. In addition, expiration tracheal air mean HU values were also found to vary between reconstruction kernels (vendor 1, P < .0001; vendor 2, P = .01). Negligible differences in mean HU values were observed for aortic blood irrespective of vendor. Differences in reconstruction kernels were evident in the PRM measurements from the same cases (Figure 4). Soft-tissue reconstructions (ie, standard) resulted in a tighter cluster of lung HU voxel-paired values (Figure 4A, C), whereas sharp bone reconstructions contained more noise, resulting in a broader distribution of the voxel joint-density histogram (Figure 4B, D). Bland–Altman analysis (Figure 4E) of the data revealed that PRM metric variability derived from standard and sharp kernels. Elevated levels in PRMNormal resulted in differences as high as 15% relative lung volume between reconstruction kernels, with the standard kernel generating larger PRM values than the sharp kernel as indicated by positive differences. In contrast, PRMfSAD and PRMEmph were found to have both positive and negative differences between reconstruction kernels. These results illustrate the complexity of the reconstruction kernel's impact on PRM measurements.
Impact of Slice Interval
The impact of noncontiguous CT scans with a 10-mm gap spacing on PRM were evaluated next (Figure 5). Figure 5A shows a schematic of the simulated distribution of CT slices with 10-mm gaps superimposed on a full-inspiration lung scan. Overall, mean differences in PRM values between gapped and full-resolution (ie, contiguous slices) CT data were generally small, with moderate-to-severe emphysema (<1%) being the least affected. Differences were found to be slightly elevated in GOLD stage 0 (PRMNormal = <3%) and GOLD stage 1 (PRMNormal = <2%; PRMfSAD = <1.5%) participants (Figure 5B). Nevertheless, differences in all PRM metrics were relatively low and likely resulted from the diffuse nature of COPD in the cases analyzed.
Impact of Image Registration
Simulations were performed to assess how sensitive the PRM metrics were to the misregistration of paired whole-lung CT data. Widespread local misregistration showed progressive variability of PRM metrics as a function of increasing deformations (Figure 6). The largest differences in PRM were observed in GOLD stage 3 and 4 participants, where all 3 classifications provide significant contributions to the lung volume. Overall, mean misregistrations of 2.5 mm (maximum deformation of 7.5 mm) yielded PRM differences of up to 1% of the lung. Furthermore, we also investigated and quantified the impact of registration direction on PRM because this can affect overall PRM results. Registering the inspiration to the expiration images resulted in higher percentages of tissue identified as air trapping (Figure 7). Because tissue compresses upon expiration, tissue with functional small airways disease compressed less and remained a larger percentage of the total lung volume (mean: PRMfSAD = 3.1; 95% CI: 0.8, 5.4) than if measured on the inspiration volume. Differences of PRMNormal and PRMEmph were close to 0, but PRMNormal had a slightly larger 95% CI (mean: PRMNormal = 0.1; 95% CI: −4.1, 4.2) (mean: PRMEmph = −0.4; 95% CI: −1.6, 0.8).
The consistency of PRM measurements for longitudinal application was evaluated for several prevalent sources of variability in the PRM-processing pipeline. Most of these steps introduced negligible variability in PRM. Nevertheless, significant effects were found as a result of inadequate lung ventilation during image acquisition. The clinical protocol for CT acquisition introduces various sources for HU variability that include proper lung ventilation and thin-sliced gapped CT acquisitions during expiration. The more significant of these 2 effects was the volume of lungs at image acquisition (TLC, FRC). PRM, with its use of inspiration and expiration CT scans, is susceptible to both volumes. A recent study has found that at lung volumes above 90% of vital capacity, emphysema Perc1 measures (first percentile of inspiratory attenuation distribution) varied negligibly (17). These results are in agreement with our PRMEmph values generated from simulations with slightly varying inspiration lung volumes from TLC (data not shown). Moreover, previous studies have delineated the interscan measurement variability caused by inspiration differences among scans (18) along with a proposal for using optimal protocols for CT surveillance of emphysema in a lung cancer screening environment (19). However, for expiration lung volumes, small corrections have been applied to low-attenuation analysis with a threshold of −856 HUs on the expiratory image (LAA−856E) by adjusting the thresholds based on the degree of deviation from the normal population in the difference of 90th percentile attenuation between inspiration and expiration images. However, this correction was sufficient only when volume differences were much less than the difference between residual volume and FRC (20). Spirometrically gated air-trapping measures have been shown to be more sensitive than FEV1 in response to treatment in a cystic fibrosis cohort (21). Nevertheless, a recent study has shown that air trapping in heavy smokers with 3-month repeat exams have variability that is incompletely explained by breathing level (22). It has also been found that spirometry monitored with biofeedback aids in scanning children with cystic fibrosis at consistent lung levels for measuring air trapping (23). Although clinical CT protocols tend to undertake shorter scans to minimize x-ray exposure to the patient, simulations of reduced-dose scans by more widely spaced thin-slice images showed increased variability, but no more than typical of emphysema measured from such scans (24). We would expect PRM to show a similar responsiveness to low-dose CT scans or those reconstructed with iterative reconstruction methods, but this will require confirmation.
Longitudinal quantitative CT measures are also challenged by the normal fluctuations found in CT scanners as well as the lack of consensus on the appropriate reconstruction kernel. The trade-off is well known between CT image noise and resolution, and the variety of smoothing/denoising reconstruction kernels used in clinical practice has a confounding effect on histogram-quantified metrics such as PRM. Variations in calibration, reconstruction, and changes over time in scanners can shift reconstructed HU values, resulting in changes in the histograms of voxels within the lung VOI and therefore in the joint-density histogram of paired inspiration/expiration voxels as well. Such shifts lead to inconsistent classifications of voxels as normal, emphysematous, and air-trapping lung tissues. A previous study has shown that when torso HU values were tracked over time, tracheal air measures were variable, whereas air outside the abdomen was more stable (25). Gradual drifts in HUs showed up in phantom-based monitoring (26) with occasionally larger (>10 HU) shifts observed for both air and blood HU values (11). Adjustments for air and water values have been proposed based on actual measured HU values of blood (14) and/or air (27) but can be difficult because of the axial variability of air measures, particularly in the trachea. Tracheal air adjustments were useful for correcting some CT attenuation-based measures across scanners in a normal cohort, but possibly not for expiration-based measures (28). Different CT reconstruction kernels have been found to result in significant differences in mean LAA−950I and LAA−856E values (12) and have different noise levels (29, 30), and we found similar results with PRM metrics. Although comparing metrics from different reconstruction kernels has been attempted (31), it is not advisable for PRM. This variability caused by differing reconstruction methodology, particularly for quantifying small airways disease, remains a significant problem (11).
Because PRM depends on paired voxels from coregistered inspiration and expiration scans, the final source of variability in PRM is image registration. Advances in imaging software and registration methods have provided multiple robust methods for registering lung data sets, with accuracy typically at the subvoxel level (32, 33). When a lung registration succeeds, remaining misregistration instances are primarily local and small. Our simulations show that, possibly unexpectedly, such localized misregistrations introduce a smaller (< ±2%) variability than contributions from other scan acquisition factors. We observed that PRM robustness may be the most challenged when faced with more difficult registrations that are found in both healthier patients and those with a highly heterogeneous disease resulting from unusual deformations of the lung that are hard to recover. We have also confirmed that PRMs from different registration directions (ie, register inspiration to expiration vs register expiration to inspiration) have a small nonzero bias in PRMfSAD values.
Some limitations to this study should be noted. The overall data presented for bias and noise characterization in the PRM CT methodology were acquired using a limited data set of 20 patients obtained from the COPDGene cohort. As such, future studies involving the use of larger longitudinal data sets (including zero-change/test-retest data) rather than the simulated data presented herein should be undertaken to demonstrate that corrective calibrations can be used to minimize nondisease-related variation. Furthermore, all potential sources of variation were investigated in isolation, whereas a multivariable analysis would be useful for evaluating their joint impact on PRM metrics.
As large COPD studies such as COPDGene and SPIROMICS begin to acquire longitudinal data, quantitative CT-based metrics must become robust enough to be able to analyze these data. To compare quantitative values, the sources of variability present in these measurements must be identified, and their effects on the quantitative measure must be further understood. Several factors that introduce variability into attenuation-based CT metrics can generally be avoided by standardizing CT protocols. Using the same CT parameters and reconstructions and registering these scans in a consistent direction each time lead to metrics with less variability (34). In summary, HU adjustment might not add sensitivity, standard reconstructions lead to less noisy PRM values, and registration direction should be consistent. Scans with fewer image slices do add noise, albeit small, to PRM metrics. However, PRM as a metric was shown to be most sensitive to lung volume, which may be influenced by changes in lung function as a result of disease progression as well as patient compliance. As such, care must be taken to ensure accurate ventilation during CT scanning because variability in lung volume remains difficult to correct.