In oncology clinical trials and clinical practice, estimation of standardized uptake values (SUVs) of malignant lesions in positron emission tomography (PET) images can be used to assess response to therapy (1–6). Evaluation of response on a per-patient basis is central to the concept of precision medicine, which encompasses prevention and treatment strategies that take individual variability into account (7). However, measured SUVs have a large degree of variability owing to physical and biological sources of error, as well as variations in image acquisition, processing, and analysis (8–10).
Important sources of variability are global shifts in SUVs due to scanner calibrations, operator error, or other reasons. During calibration, scanner sensitivity is typically measured by computing the number that scales PET images in arbitrary scanner units to match the known radiotracer concentration. SUV bias due to this scale factor, or calibration bias, is unstable even when measurements are repeated at a single site (11, 12). Further, biases of key factors in the computation of SUVs, from PET scanners and dose calibrators, are not correlated and thus do not cancel out (11, 13).
A second important source of variability in PET SUVs is a size-dependent bias caused by resolution loss, often called the partial volume error (or effect) (14, 15). This is due to a combination of the intrinsic resolution of the PET acquisition (typically leading to 5-mm full-width half-maximum [FWHM] image resolution) and smoothing applied during image reconstruction to suppress noise. In addition, this bias increases as object size decreases, leading to the well-known recovery coefficient curves (14).
Many methods have been proposed for correction of partial volume effects (15), but attempts to recover signal lost in the imaging process are often constrained either by noise amplification (if they aim to restore high spatial frequencies) or by the requirement that the exact lesion geometry and the scanner's resolution be known, so that the fraction of the lost signal can be determined. In practice, resolution is often unknown because of its complicated dependence on both user-selected parameters, which vary widely (16–18), and variations in the image reconstruction methods, which are both proprietary and scanner-specific.
Although best-case PET image resolution is on the order of 5-mm FWHM, the final image resolution in practice is typically on the order of 10-mm FWHM or more. This means that a homogeneous spherical lesion would need to be larger than roughly 30 mm in diameter to avoid SUV bias at the lesion's center. For objects <30 mm in diameter, it is not possible to tell if a measured bias is caused by resolution effects, global calibration effects, or a combination of the 2. These effects are illustrated in Figure 1 for the 20-mm sphere. This confounding mix of biases has likely hindered the use of small calibration sources in PET scanning, even though the idea has been proposed anecdotally for several decades.
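The confounding described above can be illustrated with a short calculation. The sketch below assumes an idealized case that is not taken from the study: a uniform sphere with no background activity, blurred by an isotropic 3D Gaussian PSF. Under those assumptions the signal recovered at the sphere center has a closed form; the function name `center_recovery` is hypothetical.

```python
import math

# Conversion between FWHM and Gaussian standard deviation.
FWHM_TO_SIGMA = 1.0 / (2.0 * math.sqrt(2.0 * math.log(2.0)))


def center_recovery(diameter_mm, fwhm_mm):
    """Fraction of the true activity measured at the center of a uniform
    sphere after blurring with an isotropic 3D Gaussian PSF.

    Assumes zero background activity (an idealization).
    """
    sigma = fwhm_mm * FWHM_TO_SIGMA
    t = (diameter_mm / 2.0) / sigma
    # Integral of a normalized 3D Gaussian over a ball of radius t*sigma.
    return math.erf(t / math.sqrt(2.0)) - math.sqrt(2.0 / math.pi) * t * math.exp(-t * t / 2.0)


for d in (10, 15, 20, 30):
    print(f"{d:2d}-mm sphere, 10-mm FWHM: recovery = {center_recovery(d, 10.0):.3f}")
```

Under these assumptions, a 30-mm sphere imaged at 10-mm FWHM recovers about 0.99 of the true signal at its center, whereas a 20-mm sphere recovers only about 0.86. From the sphere signal alone, that loss could not be told apart from a calibration bias of similar size.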
These biases are important, as both scanner bias and image resolution are prone to vary, particularly in multicenter studies. Both will contribute to increased SUV variance if they are not carefully monitored. This can reduce study power in clinical trials that use SUVs as biomarkers (19).
In this study, we develop and evaluate a “pocket phantom” system using a source small enough to be imaged with a patient, which provides simultaneous estimation of the global bias and the final resolution of the image. “Pocket” connotes the compactness of the phantom—small enough to fit in one's pocket—compared with current quality control phantoms. First we describe the algorithm used to estimate global bias and resolution. We then use simulation and phantom studies to optimize the design and construction of the PET/computed tomography (CT) pocket phantom. We then evaluate the performance of the PET/CT pocket phantom in practice when imaged alongside an anthropomorphic phantom and propose a method for the correction of SUVs in biased images.
Pocket Phantom Estimator Method
The pocket phantom estimation process is summarized in Figure 2. The prototype, shown in Figure 3, borrows some features, including overall geometry, from a CT-specific phantom that also used spherical inclusions to estimate image properties (20). The phantom contains spherical radioactive regions of known size and activity. The phantom active regions contain solid epoxy infused with 68Germanium/68Gallium (68Ge/68Ga). This provides 2 advantages. First, the half-life of 68Ge, which decays to 68Ga, is 271 days. In turn, the 68Ga decays by positron emission with a half-life of 68 minutes. This decay scheme makes 68Ge/68Ga a useful long-lived reference source, with replacement typically needed every 1–2 years. The phantom was manufactured with accurately specified radiotracer quantities that are National Institute of Standards and Technology (NIST)-traceable (21). Second, although a small phantom could be readily filled with 18F in solution, there would be an additional variance added by difficulties of accurate calibration and operator variability in filling the phantom.
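Because the reference activity decays over the life of the phantom, the known concentration must be decay-corrected to the scan date before it is compared with the image. The following is a minimal sketch of that arithmetic using the 271-day half-life quoted above; the function names are hypothetical, and the sketch assumes 68Ga is in equilibrium with 68Ge so that the parent half-life governs the decay.

```python
GE68_HALF_LIFE_DAYS = 271.0  # half-life of 68Ge as quoted in the text


def remaining_fraction(elapsed_days, half_life_days=GE68_HALF_LIFE_DAYS):
    """Fraction of the original activity left after elapsed_days."""
    return 0.5 ** (elapsed_days / half_life_days)


def decay_corrected_concentration(initial_kbq_per_ml, elapsed_days):
    """Known sphere concentration decayed to the scan date (kBq/mL)."""
    return initial_kbq_per_ml * remaining_fraction(elapsed_days)


for years in (1, 2):
    frac = remaining_fraction(365.0 * years)
    print(f"after {years} year(s): {100.0 * frac:.1f}% of the original activity remains")
```

Roughly 39% of the activity remains after 1 year and about 15% after 2 years, which is consistent with the replacement interval mentioned above.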
The overall algorithm models bias and resolution effects to produce a synthetic PET image from the known phantom geometry. The parameters of this model are then adjusted iteratively to match measured images of the pocket phantom. For the present study, images were converted from scanner-generated Digital Imaging and Communications in Medicine (DICOM) files or variables in MATLAB (MathWorks Inc., Natick, MA) to the Meta-Image format (22). Analysis was performed in VolView (23) and MATLAB.
The imaging system model used here is expressed in equation (1):
In words, the system produces an image that is a blurred and scaled version of the true PET tracer distribution in the phantom. If we can estimate the scale factor and the PSF, we can check for consistency between different imaging centers and in test–retest studies.
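One way to write the model described in words above is given below. The notation is ours and is an assumption: $I$ is the measured image, $f$ the true tracer distribution, $g$ the global scale factor, and $h$ the PSF, taken here to be a Gaussian with widths $(\sigma_x, \sigma_y, \sigma_z)$, in keeping with the Gaussian model and resolution parameters referred to elsewhere in the text.

```latex
I(\mathbf{x}) = g \,\bigl(h * f\bigr)(\mathbf{x}),
\qquad
h(\mathbf{x}) = \frac{1}{(2\pi)^{3/2}\,\sigma_x \sigma_y \sigma_z}
\exp\!\left(-\frac{x^2}{2\sigma_x^2} - \frac{y^2}{2\sigma_y^2} - \frac{z^2}{2\sigma_z^2}\right)
```

Here $*$ denotes 3D convolution. Because $h$ integrates to 1, the blur redistributes signal without changing its total, so $g$ alone carries the calibration bias.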
The estimator algorithm uses a synthetic sphere image generator function of the parameters (xi, ri, ρi), where i is the index of a specific sphere; for example, a single pocket phantom contains 3 spheres. The location, radius, and activity of the i-th sphere are given by xi, ri, and ρi, respectively. The radius and activity of each sphere are known a priori, and an initial estimate of the location of each sphere is obtained by segmenting the spheres from the CT image. Using the sphere image generator function, the predicted PET image is generated according to equation (2). Fitting the predicted image to the measured image yields a scaling factor gi for each sphere. The global scale factor of equation (1) is then estimated as the average of the individual sphere scaling factors gi.
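The fitting idea can be sketched in a few lines. This is an illustration under stated assumptions, not the authors' implementation: it uses a single isotropic Gaussian width instead of separate (σx, σy, σz), a grid search over that width instead of an iterative optimizer, a fixed known sphere location, and noise-free data. All function names are hypothetical.

```python
import numpy as np


def sphere_image(shape, voxel_mm, center_mm, radius_mm, activity):
    """Synthetic image of one uniform sphere on a regular voxel grid."""
    axes = [np.arange(n) * voxel_mm for n in shape]
    z, y, x = np.meshgrid(*axes, indexing="ij")
    r2 = (z - center_mm[0]) ** 2 + (y - center_mm[1]) ** 2 + (x - center_mm[2]) ** 2
    return np.where(r2 <= radius_mm ** 2, activity, 0.0)


def gaussian_blur(img, voxel_mm, sigma_mm):
    """Isotropic Gaussian blur applied in Fourier space (periodic edges)."""
    freqs = [np.fft.fftfreq(n, d=voxel_mm) for n in img.shape]
    fz, fy, fx = np.meshgrid(*freqs, indexing="ij")
    # Fourier transform of a unit-area Gaussian with standard deviation sigma.
    transfer = np.exp(-2.0 * np.pi ** 2 * sigma_mm ** 2 * (fz ** 2 + fy ** 2 + fx ** 2))
    return np.real(np.fft.ifftn(np.fft.fftn(img) * transfer))


def fit_scale_and_sigma(measured, truth, voxel_mm, sigmas_mm):
    """Grid search over sigma; for each sigma the best scale g is the
    closed-form least-squares solution of measured ~ g * blurred(truth)."""
    best = None
    for sigma in sigmas_mm:
        model = gaussian_blur(truth, voxel_mm, sigma)
        g = np.sum(measured * model) / np.sum(model * model)
        residual = np.sum((measured - g * model) ** 2)
        if best is None or residual < best[2]:
            best = (g, sigma, residual)
    return best[0], best[1]


# Demo: a 15-mm sphere at 5 kBq/mL, "measured" with g = 0.8 and sigma = 3.4 mm.
truth = sphere_image((32, 32, 32), 2.0, (31.0, 31.0, 31.0), 7.5, 5.0)
measured = 0.8 * gaussian_blur(truth, 2.0, 3.4)
g_hat, sigma_hat = fit_scale_and_sigma(measured, truth, 2.0, np.linspace(1.0, 6.0, 26))
print(f"estimated g = {g_hat:.3f}, estimated sigma = {sigma_hat:.1f} mm")
```

Solving for g in closed form at each candidate width separates the two unknowns: the shape of the blurred sphere constrains the width, and the amplitude constrains the scale.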
Pocket Phantom Design Study
Simulated and measured PET data were used to evaluate the performance of the algorithm. For a range of activity levels and sphere sizes, real and simulated phantom images were multiplied by scalars and smoothed with different filters to simulate variable scanner calibration and reconstruction settings. These tests led to the selection of design parameters for prototype pocket phantoms.
Design Study Using Simulated Data.
As a first test of the estimator algorithm, a synthetic test object containing 2 spheres 15 mm in diameter having an activity concentration of 5 kBq/mL was simulated. Noise-free emission data sets (sinograms) for this object were generated using the University of Washington's ASIM package (24). The detector configuration was modeled after a General Electric Discovery STE PET/CT scanner (General Electric Healthcare, Waukesha, Wisconsin). In MATLAB, the effects of detector parallax (25) and Poisson noise were added. Total detected coincidences were $4.8 \times 10^5$ (high noise) and $8.7 \times 10^6$ (low noise). Images were reconstructed with a fully 3D ordered-subsets expectation-maximization (OSEM) algorithm (26) or 3D filtered-backprojection (FBP) (27). For all OSEM reconstructions in this work, 4 iterations with 28 subsets were used. Voxel dimensions were 2.73 × 2.73 × 3.27 mm for both OSEM and FBP. Further, 3 Gaussian postreconstruction smoothing filters (transaxial FWHM of 4, 8, and 12 mm) were also applied to the OSEM images. The axial filter FWHM for all OSEM images was 4.6 mm.
The resulting images were then rescaled such that the maximum signal was the same in each, creating images in which a calibration bias and resolution effects were mixed in ways unknown to the algorithm. The algorithm was then used to determine the image resolution parameters (σx, σy, σz). As a check of the algorithm's accuracy, the width of the user-specified postreconstruction filter was calculated by comparison with the PSF from an unfiltered image. This was done by assuming that the intrinsic PSF and filter width added in quadrature, such that $\sigma_{\mathrm{additional}}^2 = \sigma_{\mathrm{filtered}}^2 - \sigma_{\mathrm{unfiltered}}^2$.
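The quadrature relation above can be worked through numerically. The sketch below assumes Gaussian widths throughout; the function names are hypothetical. Because FWHM is proportional to σ for a Gaussian, the same subtraction holds whether widths are expressed as σ or as FWHM.

```python
import math

FWHM_PER_SIGMA = 2.0 * math.sqrt(2.0 * math.log(2.0))  # about 2.355


def added_filter_width(width_filtered, width_unfiltered):
    """Width of the extra smoothing, given the total and intrinsic widths.

    Inputs may both be sigmas or both be FWHMs (same units, same kind).
    """
    diff = width_filtered ** 2 - width_unfiltered ** 2
    if diff < 0.0:
        raise ValueError("filtered width must not be smaller than unfiltered width")
    return math.sqrt(diff)


def sigma_to_fwhm(sigma):
    """Convert a Gaussian standard deviation to FWHM."""
    return sigma * FWHM_PER_SIGMA


# Illustrative numbers only: a 6-mm intrinsic FWHM and a 10-mm final FWHM
# imply an 8-mm postreconstruction filter.
print(added_filter_width(10.0, 6.0))
```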
Fillable Testbed Phantom.
To estimate the effect of sphere diameter, a cast urethane disc with fillable spheres was constructed (Figure 4). The disc contained 3 spheres at each of 3 diameters (10, 15, and 30 mm) and was scanned on a General Electric Discovery STE PET/CT scanner. 18F-fluorodeoxyglucose (18F-FDG) was used as the radiotracer.
A single solution of 18F-FDG was used to fill all spheres, and the phantom was scanned with all sphere centers in a single transaxial plane. CT-based attenuation correction was performed using a 120-kV CT scan. Acquisitions and reconstructions varied as shown in Table 1. OSEM images were not filtered, while the FBP reconstruction used an 8.2-mm Hanning window. The axial voxel dimension, or slice width, was 3.27 mm for all images. The estimated scale factors gi for all 9 spheres were recorded without averaging, and the bias and variance as a function of size across all 24 parameter sets were evaluated.
|Parameter|Values|
|---|---|
|Reconstruction algorithm|OSEM, FBP|
|Transaxial voxel dimension (mm)|2.73, 5.56|
|Detected events (millions)|0.5, 0.8, 1.6|
|Activity concentrations (kBq/mL)|6.0, 32.0|
|Sphere diameter (mm)|10, 15, 30|
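The 24 parameter sets mentioned above follow from the first four rows of the table, since all three sphere diameters are present in every scan. A minimal sketch of the enumeration, with hypothetical variable names:

```python
from itertools import product

reconstructions = ("OSEM", "FBP")
transaxial_voxel_mm = (2.73, 5.56)
detected_events_millions = (0.5, 0.8, 1.6)
activity_kbq_per_ml = (6.0, 32.0)

# Every combination of acquisition and reconstruction settings.
parameter_sets = list(
    product(reconstructions, transaxial_voxel_mm, detected_events_millions, activity_kbq_per_ml)
)
print(len(parameter_sets))
```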
Testing of Pocket Phantom Prototypes
Based on the results from the simulated data and fillable phantom, 2 long-lived prototype pocket phantoms were constructed using epoxy infused with 68Ge/68Ga. Each phantom had 3 spheres of 15 mm diameter (Figure 3) in a rectangular 3- × 3- × 12-cm cast urethane block. The activity concentrations of the 3 spheres in the first phantom were 30, 74, and 118 kBq/mL. For the second phantom, the concentrations were 47, 109, and 190 kBq/mL.
The prototype long-lived pocket phantoms were measured alongside an anthropomorphic phantom that contained 3 different concentrations of 18F-FDG radiotracer in 3 regions corresponding to liver, lung, and background. Scan parameters are shown in Table 2. The duration was 5 min and the voxel size was 2.73 × 2.73 × 3.27 mm. The mean signal intensity was measured in regions of interest (ROIs) in the anthropomorphic phantom.
|Parameter|Values|
|---|---|
|Postreconstruction transaxial smoothing FWHM (mm)|3, 6, 12|
|Postreconstruction axial smoothing FWHM (mm)|4.6|
|Simulated global scale factor g|0.6, 0.8, 1.0, 1.2, 1.4|
Addition of Known Bias and Smoothing.
To simulate multicenter clinical variability of scanner calibration and image resolution, we systematically varied the global scalar bias and postreconstruction filtering of our measured prototype phantom images. As shown in Table 2, the images had 3 levels of smoothing and 5 scale factors applied. These scale factors were applied after the PET/CT scanner had applied all physical corrections to the data to generate correctly calibrated images. We denote the image for the scan with the j-th applied scale factor and k-th filter width as Ijk(x, y, z), and the estimated scale factor, after averaging over spheres, as gjk.
For each image in our test-space of reconstructions, we generated a bias-corrected image, Icjk, according to equation (3).
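The correction step can be sketched as follows. The form of the correction is an assumption on our part: each image is divided by its estimated scale factor, so that the corrected image is Ijk / gjk. The function name `bias_correct` and the toy image are hypothetical.

```python
import numpy as np


def bias_correct(image, g_est):
    """Divide an image by its estimated global scale factor (assumed form)."""
    if g_est <= 0.0:
        raise ValueError("estimated scale factor must be positive")
    return np.asarray(image, dtype=float) / g_est


# Toy example: a uniform 10 kBq/mL region scaled by each simulated factor.
# Here the estimate is taken to equal the applied factor (ideal case).
true_image = np.full((4, 4, 4), 10.0)
for g_applied in (0.6, 0.8, 1.0, 1.2, 1.4):
    corrected = bias_correct(g_applied * true_image, g_applied)
    print(g_applied, float(corrected.mean()))
```

In practice the estimate differs slightly from the applied factor, so the corrected images agree only to within the accuracy of the estimator.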
Pocket Phantom Data Rescaling.
It is known that scatter and attenuation correction can lead to bias in some solid phantoms (28). Our calculation of the scale factor g was therefore modified to use premeasured pocket phantom image data as a reference. As a test case, the reconstruction with a scale factor of 1.0 and a 12-mm post filter was used as a reference image. Scale factors gi from the spheres in this scan were used as normalization factors to calculate rescaled estimates of the scale factors for the corresponding spheres in all images.
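The normalization described above amounts to dividing each sphere's scale factor by the value measured for the same sphere in the reference scan, then averaging. A minimal sketch with a hypothetical function name and made-up reference values:

```python
def rescaled_scale_factor(g_spheres, g_reference):
    """Average of per-sphere scale factors after normalization by the
    corresponding spheres in a reference scan."""
    if len(g_spheres) != len(g_reference):
        raise ValueError("sphere lists must have the same length")
    ratios = [g / g_ref for g, g_ref in zip(g_spheres, g_reference)]
    return sum(ratios) / len(ratios)


# Illustrative values only: reference factors below 1 mimic a phantom-specific
# bias; the test image has an additional applied scale factor of 1.2.
g_reference = [0.87, 0.86, 0.88]
g_test = [1.2 * g for g in g_reference]
print(rescaled_scale_factor(g_test, g_reference))
```

Because the reference image was reconstructed with a scale factor of 1.0, the normalized estimate reports the applied bias directly, independent of the phantom-specific offset.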
Design Study Results
Profiles through a subset of simulated phantom spheres are shown in Figure 5. The profiles confirm that a given bias cannot be uniquely attributed to either resolution losses or global scaling. In other words, the same recovery coefficient can result from different combinations of global bias and resolution bias.
Table 3 shows the true and estimated values of the applied scale factor and applied transaxial filter width [equation (1)]. Estimates of the filter width in the axial direction, which was 4.6 mm, had a distribution of 4.57 (0.16) mm over all simulated images. The performance of the pocket phantom system was similar over all simulated parameters, including variations in sphere size (data not shown). In other words, the estimator algorithm accurately predicted the applied global scale factor and image smoothing.
Note to Table 3: The “high noise” data correspond to the profile in Figure 5.
Fillable Testbed Phantom.
Figure 6 shows the distribution of gi scale factor estimates for all spheres in the reconstructions listed in Table 1. In some cases, the algorithm returned anomalously low gi values for the 10-mm sphere, indicating algorithm failure for this sphere size. For the 15-mm spheres, gi had a mean of 0.868 (0.025) across all reconstructions. Performance of the algorithm with 30-mm spheres was similar. Bias estimates were stable as the reconstruction method changed. For the 15-mm sphere, gi values were 0.872 (0.036) for OSEM images and 0.869 (0.014) for FBP.
For the 15- and 30-mm spheres, the distributions of resolution estimates are shown in Table 4. Here, reported statistics are over variations in image noise and activity concentration (rows 3 and 4 of Table 1). Changing the transaxial voxel sizes in OSEM images led to changes in transaxial resolution estimates. In the axial direction, for which voxel dimensions were the same for all reconstructions (3.27 mm), the agreement was better, with average estimates from OSEM images agreeing to within 0.8 mm as sphere size and voxel size varied. Resolution estimates from FBP images showed better agreement than those from OSEM images.
Our testing indicated that the 15-mm sphere size was optimal based on its acceptable performance in simulated and physical testing and the ease of manufacturing versus 30-mm spheres in the final phantom.
Pocket Phantom Results
Figure 7 shows the scan configuration and representative data from the pocket phantom prototype measurements acquired with the anthropomorphic chest phantom. This scan roughly represents the intended clinical scan configuration with the pocket phantoms below the patient. The PET images and profile show that the pocket phantom images have excellent signal-to-noise properties and match the magnitude of signal in the anthropomorphic phantom.
Table 5 shows the PET signal measured in images created with the parameters of Table 2 before and after correction by equation (3). Expressed as a percentage of the range midpoint, ranges of mean ROI signal were reduced from 80% in uncorrected images to <5% for corrected ones, indicating that the pocket phantom system successfully compensated for the simulated scanner miscalibration in our test image set.
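The range metric quoted above is the spread of values divided by the midpoint of that spread. A minimal sketch with a hypothetical function name:

```python
def range_percent_of_midpoint(values):
    """Range of the values expressed as a percentage of the range midpoint."""
    lo, hi = min(values), max(values)
    midpoint = (lo + hi) / 2.0
    return 100.0 * (hi - lo) / midpoint


# The applied scale factors in Table 2 alone span 0.6 to 1.4 around 1.0.
print(range_percent_of_midpoint([0.6, 0.8, 1.0, 1.2, 1.4]))
```

For a fixed ROI, the applied scale factors alone give a range of about 80% of the midpoint, which matches the uncorrected figure reported above.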
Figure 8 shows the measured ROI values (AROI) for the pocket phantom spheres after division by known activity concentration. The differing slopes for AROI show the dependence of partial volume effects on the variable image resolution. The square ACal markers represent the ratio of the applied scale factor (Table 2) to the estimated scale factor g. A value of 1 for ACal therefore corresponds to the accurate estimation of bias. After averaging over the 6 pocket phantom spheres in the images, ACal values ranged from 0.95 to 1.06, indicating that the bias-corrected images were accurate to within 6% regardless of the changes in image filtering or global image bias.
As the reconstruction postfilter width varied between 3, 6, and 12 mm, estimates of final transaxial, or transverse, resolution varied as in Table 6. These estimates of final image resolution include effects of both the postfilter and intrinsic PSF. The small standard deviations demonstrate that transverse resolution estimates are stable as global scaling varies. In addition, axial resolution estimates are stable as transverse resolution and global scaling vary.
We have tested and evaluated design parameters for small phantoms that allow the simultaneous estimation of scanner global calibration bias and reconstructed image resolution. We have constructed and tested a prototype phantom on the basis of these results, and have demonstrated the ability of the phantom and software to detect changes in the bias and resolution of measured images. For the prototype phantom, the 15-mm spheres were chosen because they provided performance similar to that of the 30-mm spheres while allowing the phantom itself to be smaller.
The algorithm succeeded in estimating global bias independently of resolution. In particular, Table 3 shows that the variations of parameters shown in Figure 5 have been successfully separated. Table 5 shows that the range of signal biases in our set of test images was reduced to <5% using the pocket phantom correction factors regardless of changes in the applied postreconstruction smoothing. Further, bias estimates did not show any dependence on the image reconstruction method. The global scale factor for the 15-mm sphere had a coefficient of variation of <3% over all instances of parameter variations shown in Table 1. The agreement of bias estimates for these very different reconstructions suggests that the Gaussian model used by the estimator algorithm can accommodate a range of resolutions and reconstruction methods.
The absolute accuracy of bias estimates is more difficult to evaluate. In the simulated data, for which bias was known, the pocket phantom system found the global scale factor to within 3% of the true value for all resolutions tested (Table 3). In PET/CT measurements of epoxy-based solid phantoms, the PET image value is known to be biased owing to attenuation correction that is not correct for synthetic materials (28). Although our scanner was carefully calibrated, Figure 6 shows that global scale factor estimates were generally less than one. ROI measurements of activity in the centers of the largest spheres in the urethane fillable testbed phantom, which were not subject to partial volume effects, showed that this bias was real and not a failure of the algorithm. This prevents us from computing scanner calibration bias directly from the known radiotracer concentration. To correct this problem in our solid prototypes and future work, we have proposed and tested the use of a calibration prescan (see “Pocket Phantom Data Rescaling”), in which the algorithm is precalibrated to compensate for biases in the pocket phantom signal from physical effects such as attenuation and scatter correction. With this method, the impact of scatter and attenuation correction on the pocket phantom is assumed to be constant for a given scanner. The ACal data in Figure 8 show that for our initial tests, the precalibration led to accurate correction of our simulated global image bias.
Unlike calibration bias, resolution effects cannot be easily corrected. Partial volume correction methods have been proposed, but these have been shown to add bias and variance (15, 29). However, if changes in resolution can be detected, this information can help with quality control either for clinical practice or clinical trials in which the quantitative accuracy of PET images is relied upon. For example, in clinical trials, the removal of data with uncontrolled biases, including those due to resolution, can increase the study power even if the sample size decreases (19). In our measured data (Table 6), the pocket phantom system returned estimates that were well separated when resolution was varied, with standard deviations of 0.01 and 0.09 mm for the 3- and 6-mm postreconstruction filtering, respectively. Importantly, these results were stable even when global scaling was varied by up to ±40% (Figure 8).
Currently, efforts to reduce variability in PET mainly consist of accreditation procedures (30) and consensus documents on best practices (31–33). Scanner accreditation often involves “cross calibration,” in which dose calibrator and scanner measurements are required to concur, but this process may not ensure biases are stable over time (13).
Resolution may be addressed by specifying a range of acceptable signal bias for a range of lesion sizes (34) or by requiring visibility of specific features of a given size (30). Methods for quantifying resolution in the literature vary and may involve profiles through FBP images of point sources near the scanner's center (35), ROI signal from multiple sphere sizes in a large calibration phantom (36), or solving for the radial PSF in Fourier space (37). However, we note that none of these methods is compatible with a clinical scan with a patient in the field of view.
With its unique combination of software and manufacturing, the pocket phantom system aims to provide new capabilities in PET quality control. The long-lived phantoms provide a more stable signal than the manually filled phantoms used in cross calibration. The spherical symmetry of the active regions allows estimates of resolution along 3 independent directions, regardless of the phantom orientation. In particular, the spherical design offers an advantage over line sources, from which axial resolution cannot be estimated. In addition, the software modeling allows the phantoms to be small enough to be scanned with patients, enabling quality control during patient scans.
Future work will address the practical requirements for translating our initial results into a more widely usable quality control system. We have already published preliminary results on our user-facing software that will make the algorithm available to off-site imagers (38). In addition, a more detailed subsequent analysis of the phantom performance, including the dependence on scan configuration and radiotracer concentrations, will allow us to optimize the protocol for phantom scanning and finalize the manufacturing parameters.
Our study has some limitations. The global bias due to CT-based attenuation correction of the epoxy-based phantom, and the precalibration workaround, have already been discussed. The dependence of resolution estimates on voxel size seen in Table 4 is likely due to the way the model images are downsampled before the smoothing of equation (3). In cases where voxel dimensions approach the resolution, the effect of downsampling may become significant and lead to unreliable resolution estimates. We note that for the more heavily smoothed FBP images, this problem did not occur. Our initial evaluation of the pocket phantom system was limited to a single scanner. Future work will include repeated measurements on different makes and models of scanners.
The pocket phantom system can estimate and correct changes in calibration bias in measured PET images, and it can simultaneously detect changes in the reconstructed image resolution. Over the imaging scenarios tested, the system returned stable estimates of both bias and resolution, as long as voxel size was not too large. This suggests that the pocket phantom system is a viable method for quality assurance in PET, particularly in clinical trials. However, the robustness of the imaging model should be further investigated for multiple imaging systems.