Approximately 80 million computed tomography (CT) examinations are performed annually in the United States, and ∼10% are conducted in pediatric patients (1). The use of CT has increased about eightfold since 1980 because of its diagnostic value in patient management (2). Further, ∼49% of the US population's collective ionizing radiation dose is caused by exposure obtained during diagnostic CT (3). Highly variable radiation exposure, varying by as much as a factor of 10 between institutions for comparable scans, has raised major concerns regarding the risk of radiation-induced cancer, particularly in pediatric populations (4, 5). Therefore, reducing radiation exposure is of major importance. Dose reduction has been accomplished with teamwork, stewardship (6), and a combination of device- and case-specific measures, such as flat and bowtie filters that attenuate X-ray beams at angles deviating from the perpendicular, and additional scanning units (7–9).
Depending on diagnostic goals, radiation exposure may be reduced through variations in acquisition time, patient size, voltage, section thickness, window, reconstruction algorithms used, and filter kernel (7, 8). Specific body mass index (BMI) guidelines are commonly used in reducing radiation dose so that the dose is increased in proportion to the BMI (8, 10–12).
Most recent studies on image quality assessment of reduced-dose CT scans have reported qualitative rather than quantitative findings, or they have focused on image noise and/or spatial resolution as the sole indicators of image quality (1, 4, 13–17). Here, we quantitatively analyze the image quality and diagnostic differences between reduced- and standard-dose CT scans. We compared organ- and tissue-level similarities of the target subjects, who underwent both standard- and reduced-dose CT scanning. We applied a comprehensive and complementary set of quantitative metrics to capture quality differences that can affect radiologists' diagnostic interpretations. Figure 1 illustrates the quantitative comparison framework that was used in our experiments.
Materials and Methods
We retrospectively compared 50 reduced-exposure CT exams (reduced as part of an overall program to reduce radiation exposures) with 50 previous standard-exposure exams. In total, 50 patients (male, 22; female, 28; mean age 46.5 years [mean age male, 43.9 years; mean age female, 48.5 years]) underwent this CT imaging protocol (1 standard-dose and 1 reduced-dose CT examination) between June 2011 and August 2014 (average time between the 2 scans, 16 months). Table 1 details patient demographic and diagnostic information. The dose reduction included image reconstruction and elimination of the noncontrast phase, which was replaced by a virtual noncontrast-enhanced (VNC) phase.
Multidetector CT Image Acquisition
The standard-dose CT consisted of precontrast phase, arterial phase, and nephrographic phase scans. The low-dose CT consisted of the following 2 phases: arterial and nephrographic phases, with the VNC processed from the nephrographic phase. The inherent noise for reduced-dose CT scans, induced by tube current reduction from 240 to 150 mA, was mitigated with iterative reconstruction (SAFIRE, Siemens Medical, Malvern, PA) with an iteration strength of 2 (out of 5). All VNC examinations were performed on Siemens Flash (Siemens Medical) in dual-energy mode and processed on Siemens PACS (Syngo.via, Siemens Healthcare, Erlangen, Germany). Imaging reconstruction parameters of both VNC (ie, low-dose) and standard-dose CT scanning procedures are listed in Table 2. Size-specific dose estimate and dose length product for standard triple- and dual-phase VNC CT scans are illustrated in Figure 2. In both measurements, size-specific dose estimate and dose length product rates were significantly higher in standard-dose CT scans, with a similar percentage as reported by Hara et al. (18).
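The dose quantities mentioned above follow simple arithmetic, which can be sketched as below. This is a minimal illustration, not the scanner's dose report: dose length product is taken as CTDIvol multiplied by scan length, and the size-specific dose estimate as CTDIvol multiplied by a size-dependent conversion factor. The function names and the example CTDIvol and scan length are hypothetical; only the 240 and 150 mA tube currents come from the text.

```python
def relative_reduction(standard: float, reduced: float) -> float:
    """Fractional reduction of a quantity relative to its standard value."""
    return (standard - reduced) / standard


def dose_length_product(ctdi_vol_mgy: float, scan_length_cm: float) -> float:
    """DLP in mGy*cm, taken as CTDIvol (mGy) times scan length (cm)."""
    return ctdi_vol_mgy * scan_length_cm


def size_specific_dose_estimate(ctdi_vol_mgy: float, size_factor: float) -> float:
    """SSDE in mGy: CTDIvol scaled by a patient-size conversion factor.

    The factor must be looked up for the patient's effective diameter;
    it is passed in here rather than computed.
    """
    return ctdi_vol_mgy * size_factor


# Tube currents from the text: 240 mA (standard) and 150 mA (reduced).
current_cut = relative_reduction(240.0, 150.0)

# Hypothetical single-phase values, for illustration only.
example_dlp = dose_length_product(ctdi_vol_mgy=10.0, scan_length_cm=40.0)
```

With all other acquisition settings fixed, the tube current change alone corresponds to a 37.5% reduction per phase; removing the precontrast phase reduces the total further.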
Quantitative Image Features and Visual Evaluation
We used the following robust and powerful quantitative metrics describing image quality between standard- and reduced-dose CT images: structural similarity index (SSIM), gradient magnitude similarity deviation (GMSD), Hausdorff distance (HD), weighted spectral distance (WESD), and Dice similarity coefficient (DSC). We applied these similarity indexes at both tissue and organ levels. In addition, 3 expert radiologists (board-certified and with >10, >15, and >20 years of CT experience) evaluated standard- and reduced-dose CT scans in terms of qualitative (visual) comparisons. For this evaluation, inter- and intraobserver agreement rates were calculated. Appropriate statistical significance and comparison tests were conducted to assess local and global CT density differences and uncertainties in the segmentation of tissues and organs.
Computation of Similarities at the Tissue and Organ Levels
All standard-dose precontrast CT scans were coregistered to the VNC on the reduced-dose scans using an affine image registration method (19) to provide one-to-one mapping of each voxel for an unbiased comparison. Bone and fat tissues were segmented first using an appropriate (and fixed) Hounsfield unit (HU) interval for both types of scans (eg, −190 to −30 HU for fat). Segmented tissue volumes were compared through spatial overlaps. For organ-based comparisons, heart, liver, and spleen were segmented by 2 expert readers (blindly and independently) using semiautomated software tools, Amira (20) and 3D-Slicer (21). The 2 expert readers who performed the segmentation were independent of the 3 radiologists who did the qualitative evaluation. No binary union/intersection was applied across segmentations from different readers; instead, each pair of segmentations for standard- and reduced-dose scans was considered as an independent case in later statistical analysis. We also tested a publicly available, robust, and highly accurate generic pathological lung segmentation software for delineating lungs from both VNC and standard-dose CT scans (22). Our aim was to minimize human-induced errors when evaluating organ volume differences observed in those scans. In case of failures in semiautomated or fully automated segmentation methods, expert readers performed final refinements using interactive segmentation tools. Similarities of shape, intensity, and imaging patterns such as structure and texture were compared within and across the segmented organs and tissues. Quantitative metrics for tissue- and organ-based objective comparisons are listed in Table 3.
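The fixed-interval tissue segmentation described above can be sketched as a simple threshold on HU values. The sketch assumes the two volumes are already coregistered and stored as NumPy arrays in HU; the function names and the tiny example arrays are hypothetical, and only the fat interval (−190 to −30 HU) is taken from the text.

```python
import numpy as np

# Fat interval quoted in the text; the bone interval is not given there.
FAT_HU_RANGE = (-190.0, -30.0)


def tissue_mask(volume_hu: np.ndarray, hu_range: tuple) -> np.ndarray:
    """Boolean mask of voxels whose HU value lies inside the closed interval."""
    low, high = hu_range
    return (volume_hu >= low) & (volume_hu <= high)


def overlap_counts(mask_a: np.ndarray, mask_b: np.ndarray) -> tuple:
    """Number of voxels shared by both masks and present in either mask."""
    shared = int(np.logical_and(mask_a, mask_b).sum())
    either = int(np.logical_or(mask_a, mask_b).sum())
    return shared, either


# Hypothetical 2 x 2 "volumes" standing in for coregistered scans.
standard = np.array([[-100.0, 40.0], [-50.0, -200.0]])
reduced = np.array([[-90.0, -60.0], [-45.0, 300.0]])

fat_standard = tissue_mask(standard, FAT_HU_RANGE)
fat_reduced = tissue_mask(reduced, FAT_HU_RANGE)
shared, either = overlap_counts(fat_standard, fat_reduced)
```

Using the same interval for both scan types keeps the comparison from being biased by a scan-specific threshold choice.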
In particular, 5 metrics, namely, DSC, HD, WESD, SSIM, and GMSD, were used for evaluation. The first 3 metrics measure structural similarity based on experts' organ delineations from low- and standard-dose scans: DSC measures the overlap of 2 segmented regions based on area; HD measures the largest distance from a point on one boundary to the nearest point on the other boundary; and WESD measures the shape similarity between 2 boundaries according to overall geometrical information. Although DSC and HD are widely used in radiological image analysis, their weaknesses were only recently addressed, to some extent, by the WESD. The binary volumes and shapes of the segmented organs (from low- and standard-dose scans) are compared through these metrics. Higher DSC and lower HD imply high segmentation accuracy, indicating a close match between the appearance of standard- and low-dose CT scans.
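As a rough illustration of the first 2 shape metrics, the sketch below computes DSC and a symmetric HD for a pair of binary masks using NumPy and SciPy. It is not the implementation used in the study: the function names, the voxel spacing argument, and the toy masks are assumptions, HD is computed here over all mask voxels rather than extracted boundaries, and WESD is omitted because it needs a spectral decomposition of the shapes.

```python
import numpy as np
from scipy.spatial.distance import directed_hausdorff


def dice_coefficient(mask_a: np.ndarray, mask_b: np.ndarray) -> float:
    """DSC = 2|A and B| / (|A| + |B|) for two binary masks."""
    a = mask_a.astype(bool)
    b = mask_b.astype(bool)
    total = a.sum() + b.sum()
    if total == 0:
        return 1.0  # two empty masks are treated as identical
    return 2.0 * np.logical_and(a, b).sum() / total


def hausdorff_distance(mask_a, mask_b, spacing=(1.0, 1.0)) -> float:
    """Symmetric HD between two non-empty masks, in units of `spacing`."""
    pts_a = np.argwhere(mask_a) * np.asarray(spacing)
    pts_b = np.argwhere(mask_b) * np.asarray(spacing)
    forward = directed_hausdorff(pts_a, pts_b)[0]
    backward = directed_hausdorff(pts_b, pts_a)[0]
    return max(forward, backward)


# Toy example: a 4 x 4 square and the same square shifted by one row.
organ_standard = np.zeros((10, 10), dtype=bool)
organ_standard[2:6, 2:6] = True
organ_reduced = np.zeros((10, 10), dtype=bool)
organ_reduced[3:7, 2:6] = True

dsc = dice_coefficient(organ_standard, organ_reduced)
hd = hausdorff_distance(organ_standard, organ_reduced)
```

For the shifted squares, 12 of the 16 voxels in each mask overlap, so DSC is 0.75, and every voxel lies within 1 voxel of the other mask, so HD is 1.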
The remaining 2 metrics for comparing low- and standard-dose CT scans are based on texture similarities. In computer vision, texture refers to image analysis methods that derive intensity statistics and density patterns of images and their interrelationships, both locally and globally. Often, perception-based quantification parameters are used to describe the meaning of texture. In radiological image analysis, texture is generally used to describe the appearance of objects of interest (tumor, tissues, organs, etc.) such as dense/heterogeneous tumor regions. Texture-based image analysis in radiology is generally applied in computer-aided diagnosis systems, and many studies have shown that texture also helps in identifying object boundaries. Herein, these appearance patterns (ie, texture) are described with statistical metrics such as covariance, standard deviation, and average intensity, and the relationships of those intensity patterns are used to describe similarities of low- and standard-dose CT scans. Texture-based analysis helps measure similarities and distinctions of imaging patterns quantitatively without the need for a segmentation operation. In this regard, we used SSIM and GMSD for texture pattern analysis in this work. Whereas SSIM evaluates texture similarity based on the mean, variance, and covariance of the voxels within smaller (local) windows in the 2 regions (of interest), GMSD compares the edge details based on the gradient magnitudes of the images. These 2 metrics complement each other such that SSIM provides a global statistical similarity over intensity distribution, whereas GMSD captures local information that is often important for perception such as gradient and edge information (see online Supplemental Appendix for more detailed description of each method and their effect in radiographic image analysis).
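The 2 texture metrics can be sketched for a pair of 2D slices as follows. This is a simplified illustration under stated assumptions, not the study's implementation: the window size, the SSIM constants (k1, k2), the data range, the GMSD stabilizing constant c, and the use of SciPy's unnormalized Prewitt operator are all choices made here for the example, and the images are synthetic.

```python
import numpy as np
from scipy import ndimage


def ssim(x, y, data_range, win=7, k1=0.01, k2=0.03) -> float:
    """Mean SSIM from local means, variances, and covariance (uniform windows)."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    c1 = (k1 * data_range) ** 2
    c2 = (k2 * data_range) ** 2
    mu_x = ndimage.uniform_filter(x, win)
    mu_y = ndimage.uniform_filter(y, win)
    var_x = ndimage.uniform_filter(x * x, win) - mu_x * mu_x
    var_y = ndimage.uniform_filter(y * y, win) - mu_y * mu_y
    cov = ndimage.uniform_filter(x * y, win) - mu_x * mu_y
    num = (2 * mu_x * mu_y + c1) * (2 * cov + c2)
    den = (mu_x * mu_x + mu_y * mu_y + c1) * (var_x + var_y + c2)
    return float((num / den).mean())


def gradient_magnitude(img) -> np.ndarray:
    """Prewitt gradient magnitude of a 2D image."""
    img = np.asarray(img, dtype=float)
    return np.hypot(ndimage.prewitt(img, axis=0), ndimage.prewitt(img, axis=1))


def gmsd(x, y, c) -> float:
    """Standard deviation of the gradient magnitude similarity map."""
    gx = gradient_magnitude(x)
    gy = gradient_magnitude(y)
    gms_map = (2 * gx * gy + c) / (gx * gx + gy * gy + c)
    return float(gms_map.std())


# Synthetic slices: a reference and a noisier copy (hypothetical HU-like values).
rng = np.random.default_rng(0)
reference = rng.normal(40.0, 20.0, size=(64, 64))
noisy = reference + rng.normal(0.0, 10.0, size=(64, 64))

ssim_same = ssim(reference, reference, data_range=400.0)
ssim_noisy = ssim(reference, noisy, data_range=400.0)
gmsd_same = gmsd(reference, reference, c=170.0)
gmsd_noisy = gmsd(reference, noisy, c=170.0)
```

Identical inputs give an SSIM of 1 and a GMSD of 0; added noise lowers SSIM and raises GMSD, which matches the direction of the interpretation used in this work (high SSIM and low GMSD indicate similar appearance).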
Statistical Analysis and Visual Scoring
We used R (CRAN, Version 2.3) software to conduct statistical tests for quantitative and qualitative (visual) assessment of the scans, segmentation evaluations at organ and tissue levels, volumetric- and texture/density-based similarities of the organs and tissues, and image quality similarities between standard- and reduced-dose CT scans in the following ways:
Descriptive analysis was performed by calculating the means and standard deviations of similarity indexes on organs (lung/liver/spleen/heart) across different doses and observers.
Welch 2-sample t tests with a significance level of P = .05 and Pearson correlation coefficient were used to compare the intensity distributions of organs.
Two-sided Kendall Tau test was used to assess the agreements of image quality evaluation between standard- and reduced-dose CT from the same observers, as well as the agreements between multiple observers.
Expert radiologists were asked to evaluate each scan visually (blinded to their label of VNC and standard dose) on a 4-point scale. Score 1 was defined as substantial artifacts, excessive image noise, poor sharpness of anatomical structures, and inferior diagnostic acceptability. Score 2 was defined as obvious image noise and artifacts, suboptimal sharpness of anatomical structures, and average diagnostic acceptability. Score 3 was defined as moderate image noise and minor artifacts, good sharpness of anatomical structures, and above-average diagnostic acceptability. Score 4 was defined as minimal image noise and artifacts, improved sharpness of anatomical structures, and superior diagnostic acceptability.
Visual comparisons were made 1 week after the radiologists' scoring, and surface and volume rendering and axial section-by-section comparisons were conducted.
Results
Three radiologists (with >10, >15, and >20 years of experience) independently and blindly scored the VNC and standard-dose CTs for quality and similarity. The order of scans presented to the radiologists for rating was randomized to avoid the bias of presenting the VNC and standard-dose scans of the same case within a short period. Different observers' image quality assessments were similar (P < .050) using the Kendall Tau test (23), giving paired scores of τ = 0.563 (P < 1e-16), τ = 0.193 (P < .048), and τ = 0.194 (P = .047). For VNC and standard-dose scans, we obtained mean visual scores of 3.137 (std = 0.193) and 3.470 (std = 0.501), respectively, indicating that no “significant” visual difference was observed between VNC and standard-dose scans evaluated by the same observer.
Fat and bone density histograms (HU) were first normalized for all patients to obtain mean and standard deviation HU, indicating level of radiation absorption (Figure 3). Welch 2-sample t test on normalized fat intensity distribution showed no significant differences (P = .17) between reduced- and standard-dose scans. In contrast, we found statistically significant differences between (dense) normalized bone HU density distributions (P < .01), but the 2 curves remained highly correlated (R = 0.92). This can be explained by the high attenuation of dense bone structures, which could be most sensitive to dose change among materials because of the discrepancy in absorption characteristics under different dose settings.
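The tests used in this analysis (Welch 2-sample t test, Pearson correlation, and Kendall Tau) can be sketched as follows. The study used R; this is an assumed Python equivalent based on SciPy, with hypothetical wrapper names and synthetic data, so the numbers it produces have no relation to the reported results.

```python
import numpy as np
from scipy import stats


def welch_t_test(sample_a, sample_b):
    """Two-sided Welch t test (unequal variances); returns (t, p)."""
    t_stat, p_value = stats.ttest_ind(sample_a, sample_b, equal_var=False)
    return float(t_stat), float(p_value)


def pearson_r(curve_a, curve_b):
    """Pearson correlation between two equally sampled curves; returns (r, p)."""
    r_value, p_value = stats.pearsonr(curve_a, curve_b)
    return float(r_value), float(p_value)


def kendall_tau(scores_a, scores_b):
    """Two-sided Kendall Tau agreement between paired ordinal scores."""
    tau, p_value = stats.kendalltau(scores_a, scores_b)
    return float(tau), float(p_value)


# Synthetic stand-ins for normalized HU samples from the two scan types.
rng = np.random.default_rng(0)
hu_standard = rng.normal(0.0, 1.0, size=200)
hu_reduced = rng.normal(0.1, 1.2, size=200)
t_stat, p_welch = welch_t_test(hu_standard, hu_reduced)

# Hypothetical 4-point visual scores from one observer for paired scans.
scores_vnc = [3, 3, 4, 2, 3, 4]
scores_standard = [3, 4, 4, 2, 3, 4]
tau, p_tau = kendall_tau(scores_vnc, scores_standard)
```

A nonsignificant Welch test together with a high Pearson correlation, as reported for fat, suggests that the 2 distributions differ little in both location and shape.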
Figure 4 illustrates the results of shape metrics including DSC, HD, and WESD. High DSC (mean, 93.91%; standard deviation, 2.925 percentage points), low HD (18.30 mm, about 13 voxels; standard deviation, 10.66 mm), and low WESD (1.21 mm, about 1 voxel; standard deviation, 0.9194 mm) scores indicate higher agreement of the manually identified organ boundaries between low- and standard-dose images. Figure 4 also shows the statistics of intensity metrics, including SSIM and GMSD; high SSIM (mean, 0.9497; standard deviation, 0.0316) and low GMSD (mean, 0.1900; standard deviation, 0.06844) indicate higher appearance similarities between reduced- and standard-dose CT scans for each pair of segmented organs. Figure 5, in contrast, shows the same quantification metrics with respect to patients' BMI stratification. Because effective radiation dose is calculated in relation to patients' BMI, we explored VNC and standard CT image quality differences with respect to varying fat volume inside the body. Neither the presented image quality metrics nor the radiologists' readings revealed significant differences; hence, we concluded that radiation dose reduction does not have significant effects on image quality when BMI is considered as a control variable. Note also that there was significant correlation between the radiation dose reduction and BMI with R = 0.363 (P = .00957) for standard-dose scans and R = 0.821 (P < .005) for reduced-dose scans. This validates the previous research on associating BMI with the radiation dose as in the study by Mulkens et al. (24).
Volume agreements of segmented organs (lungs, liver, spleen) from VNC and standard-dose CT scans are shown in Figure 6. All agreement rates were found to be at least R = 0.89. Figure 7 illustrates surface rendering of segmented organs along with volume renditions for both VNC and standard-dose CT scans. Independent visual inspections of rendered surfaces by 3 expert radiologists did not reveal any significant differences.
Discussion
In this study, we performed a comparative image quality assessment between VNC and standard multiphase chest, abdomen, and pelvis CT examinations. We showed that significant dose reductions on multiphase CTs could be achieved by replacing the noncontrast phase with a VNC scan, which did not cause significant changes in qualitative and quantitative evaluation of the images. Our assertion was supported by the radiologists' qualitative ratings, which, both overall and individually, did not yield statistically significant differences of image quality ratings between standard- and reduced-dose scans.
Differences between interobserver variability and dose-induced variability were not statistically significant, implying that there was no significant difference between standard- and reduced-dose CT scans in their ability to depict organ boundaries. Furthermore, the relative disparities between interobserver variability and dose-induced variability of structure-based metrics (DSC, HD, and WESD) and density-based metrics (SSIM and GMSD) show that the main differences between standard and VNC scans were the differences in appearance rather than shape information, and these differences were not significant.
Our choice of quantitative image analysis metrics was motivated by the incorporation of several relevant approaches to density and/or shape distribution. We sought to incorporate both well-established metrics (DSC, HD, histograms) and more advanced metrics for comparing targets analogous to our CT structures (WESD, SSIM, GMSD). There are numerous texture features that could be extracted from scans and be used for quantitative comparisons. However, exploring textural features for image quality assessment is outside the scope of this research. In contrast, we conducted in-depth analysis of certain tissue types. Because radiation affects different tissues differently, each organ and tissue was analyzed separately and systematically in our work. Bone and fat were selected as 2 particular tissue types in our evaluation framework because of their different radiation exposure rates. We intended to see how radiation dose reduction could affect CT density in fat and bone tissue. Although dense bone regions were found to be considerably similar in VNC and standard-dose CT scans, additional care may be needed when tumors or abnormalities exist in those regions. Because our qualitative and quantitative judgments did not reveal anatomical differences in dense bones, small differences between bone CT density distributions may be related to the type of devices and materials in either one of the scans.
Two limitations of this study should be noted. First, the evaluations were performed on healthy organs with 5 different measurements over noncontrast CT images. Second, there may be some biases from the effect of VNC images being obtained an average of 16 months after the standard CT scanning. All quantitative metrics required 2 samples for comparison, making the comparison relative rather than absolute. A quality metric agreed upon to be specifically indicative of a single CT image's quality would not only facilitate analysis of image quality between doses but also allow evaluation of images that did not necessarily have corresponding scans in a short time frame. In addition, our population was specific to surveillance of renal cancer, rather than initial detection, so our experience may not be directly generalizable.
It should also be considered that low-, standard-, and high-dose definitions in Europe and USA may differ slightly (25). Although there is no constraint in our quantitative analysis regarding the amount of volume CT dose index, it may be necessary to extend the data set to explore the proposed methodology's robustness and feasibility in an additional validation framework.
In the literature, there have been many issues raised with regard to CT dose reduction studies spanning from size-specific dose estimation to CT dosimetry, and from the use of contrast material to the varying definition of low dose terminology (26). Herein, we confined the definition of “low dose” into “reduced dose” to follow the recent guidance on this terminology and avoid ambiguities (3).
In our experimental study, we did not compare VNC or quality differences on arterial versus nephrographic phases; perhaps this is an area of future study. In particular, in our experience and that of others (27), the arterial phase had exaggerated visual differences between real noncontrast and VNC, and we therefore elected to process VNC on the nephrographic phase.
In summary, minor differences in image quality between standard- and reduced-dose chest, abdomen, and pelvis multiphase CT examinations were shown to be nonsignificant. Both quantitatively and qualitatively, we have shown that density/intensity, shape, and textural patterns do not change from reduced-dose to standard-dose CT scans. We also showed that delineations of the organs (shape and volume) were obtained successfully from both dose types and that there were no significant differences in quantifiable information, suggesting that the routine use of reduced-dose CT scanning could be viable. As sophisticated CT dose reduction techniques are increasingly available, a reduction in the administered radiation dose is possible for many CT procedures without jeopardizing clinical diagnostic value. We believe that iterative reconstruction technology will continue to advance and improve acquisition, and that these advances will lead to further exposure reductions in the near future. For example, at the NIH Clinical Center, we have begun to explore photon scanning and have shown, for the first time, that exposure reductions are possible with decreased noise (28). In addition to reduced dose and noise, photon scanning adds 2 additional energy levels for improved material characterization (29).