In total, 1,688,780 new cancer cases and 600,920 deaths from cancer are estimated for the United States in 2017 (1). Positron emission tomography (PET) combined with x-ray computed tomography (CT) is a standard component of oncology diagnosis and staging (2–5). Quantitative PET/CT is a valuable tool for assessment of an individual's response to therapy and for clinical trials of novel cancer therapies, because it can measure metabolic changes, which are a better indicator of response than anatomical size changes (6). Success with this approach has been shown using the glucose analogue 18F-fluorodeoxyglucose (FDG) for evaluation of therapy-induced changes in metabolic activity in several studies, including lung cancer (7) and gastrointestinal tumors (8). The thymidine analogue 18F-fluorothymidine (FLT) provides accurate measurements of tumor proliferative activity and can be used to monitor tumor responses to treatment (9). In these cases, PET imaging provides a reliable predictor for treatment responses and patient outcome. This suggests that quantitative PET imaging has an enormous potential to boost the efficiency of evaluating clinical trials of new therapies (10).
However, the use of quantitative PET imaging in clinical trials is hampered by the large degree of variability arising from inconsistent and nonoptimized image acquisition, processing, and analysis (11–14). There are also sources of biological variability, but these are well characterized by test–retest studies to be ∼10% (15–17). In contrast, the additional variability introduced by inconsistent and nonoptimized practice has ranged from 18% (18) to >40% (19–21). Two of the largest PET scanner-dependent errors are calibration (20–22) and variable resolution losses (23). For clinical trials using quantitative PET imaging, the effect of this additional variability is dramatic. For example, as standardized uptake value measurement error increases from 10% to 40%, the number of needed patient scans increases by >10-fold for an effect size of 20% and a study power of 0.8(21) (Table 1).
|Trial Scenario||SUV Error (%)||Sample Size|
|Single Site (Good Calibration)||10||12|
|Multicenter (Good Calibration)||20||42|
|Multicenter (Poor Calibration)||40||158|
Despite this known variability of imaging parameters, there are currently no robust quality assurance techniques for clinical PET images. In this project, we use a PET/CT pocket phantom and associated analysis tools for quality assurance for 2 key image characteristics: the reconstructed image resolution and PET/CT scanner calibration. In our previous publication we introduced the pocket phantom (24) and briefly described web deployment (25).
The purpose of this work is to develop the technology infrastructure for enabling wider deployment and adoption of our calibration phantoms. We introduce an automated phantom detection module, an improved optimization module, and a web deployment module. We have rigorously tested this algorithm. The overarching goal of the project is to offer this automated analysis service as a Software-as-a-Service cloud application.
The patient scans were approved by the Institutional Review Board at the University of Washington Medical Center. Informed consent was obtained from all patients.
The overall imaging workflow is as follows:
A patient or phantom scan is acquired with the pocket phantom(s) in the field of view, but not in the patient's pocket.
Images are uploaded to the analysis software.
Pocket phantoms are automatically detected and their location determined.
Algorithm is run to generate estimates of bias and resolution.
The pocket phantoms used in this workflow, shown in Figure 1, contain spherical radioactive regions. The spheres are 15 mm in diameter and varied in their activity concentrations. To conduct physical experiments, we acquired data with water-filled phantoms containing Fluorine-18 (18F, 110-minute half-life) as well as solid epoxy-filled phantoms that contained germanium-68/gallium-68 (68Ge/68Ga, 271-day half-life).
Phantom Detector Algorithm
We implemented an image analysis algorithm to automatically detect the pocket phantoms in PET scans. The algorithm flowchart is shown in Figure 2. First, we threshold the input image using a minimum activity level. We then reduce the number of disjoint objects using connected component analysis. We then calculate the volume and center of area of the connected components. Based on the known size of the phantom spheres, we filter the connected components based on minimum and maximum threshold radii. Using radius-filtered blobs (typically numbering in dozens, rarely in hundreds), all possible 3-blob combinations are constructed. Calculated centers enable pruning the 3-blob combinations by inappropriate sphere distances and usage of noncollinearity as a combination cost function. The lowest cost combinations are declared “positive detections,” and other combinations that incorporate the same blobs (ie, conflicting combinations) are eliminated.
Imaging Parameter Estimation
We implemented an optimization algorithm to estimate the imaging parameters. The optimization step involves estimating parameters in a simple model of the PET scanner that is given in equation 1. We assumed that the scanner-generated image I(x, y, z) differs from actual activity distribution p(x, y, z) by a global scale factor g, by convolution with the 3-dimensional Gaussian kernel k(σx, σy, σz), and additive noise n(x, y, z). The constant g represents the global scanner sensitivity and is influenced by several physical factors, including the periodic scanner recalibrations. The function k represents the resolution, or point spread function (PSF), attributable to the acquisition and reconstruction, as well as any applied smoothing. It is assumed to be stationary and therefore fully characterized by its widths, namely, σx, σy, σz, which may vary independently.Figure 3). The mean-squared difference between the scanned phantom and the predicted phantom is computed as a cost function. This cost function is dependent on resolution estimates σX, σY, σZ, global scaling coefficients, and noise present in the scanner image σX, σY, σZ.
Our software allows determining σX, σY, σZ separately or a single σ for all axes. Similarly, global scaling can be estimated for each sphere independently, or as a single scaling (Figure 4, left). Internally, the software estimates the bias, which is expressed in kilobecquerels per milliliter instead of scaling factor expressed in percentages, simply because it requires less computation. All the parameters (σX, σY, σZ, biases, sphere centers) that characterize the imaging process are estimated jointly by minimizing the cost function using Powell's conjugate direction optimization method (26) with Brent's line search (27) and golden section search (28). This method minimizes the cost function by doing a line search for each scalar parameter in turn. We made this modification because Powell's method has shown better convergence properties than the Nelder–Mead simplex method (29), which we had used previously (24). The image analysis algorithm was implemented using the 3D Slicer platform (30, 31).
With the intention of supporting multicenter clinical trials, we deployed our image analysis pipeline as a Software-as-a-Service (SaaS) application. In the SaaS model, software is installed centrally and accessed as needed by (geographically) distributed users. This allows easy software improvements and bug fixes, because no intervention by the users is generally needed. Additional benefits include centralized data management, decoupled client-side software, and ease of access through a web browser.
Girder is a Python-based framework for building web applications that store, aggregate, and process scientific data. It is built on CherryPy and MongoDB and exposes data stored on a variety of backend storage engines (eg, Native file system, Amazon S3, GridFS, HDFS) through a unified RESTful API. Girder provides all essential data management functionality such as user/group authentication; fine-grained access control to data; custom metadata association; data provenance; intuitive UI to upload/browse/organize/download data and an extensible plugin framework for building web-based analytics applications. Girder consists of 2 key components, namely, the API layer and the single-page web application that serves as an example of that API's usage. Applications can use and extend the single-page app, while others may simply use the API and write customized user-facing frontends. The Girder framework is tightly integrated with Kitware's open-source tools for configuration and building [CTest, CDash, CMake (34–36)], visualization [VTK (37)] and image analysis [ITK (38, 39)]. Girder provides a unified interface to many distributed storage systems along with access control and extensible plugins.
In the Girder instantiation for this project, we implemented the following 3 core modules (Figure 5):
Data upload module: Quantitative imaging and clinical researchers and other users of the platform will upload their data sets using a DICOM transfer, and a review process will ensure that the data sets have been correctly uploaded (Figure 6).
Server-side image analysis module: The automated phantom detection and localization algorithms are encapsulated in this module.
Result reporting module: This module presents key PET image characteristic findings.
We separated our end-to-end architecture into several modules, each of which can scale independently of the others across any number of required physical machines. There are 2 major decoupled services in the deployment:
The web service for data management, user authentication and management, analysis setup, and displaying of results. This is provided by Girder, which itself can defer to scalable third-party data storage systems such as Amazon S3 or HDFS to securely persist the files it maintains.
The processing service that executes the analysis pipeline on the PET/CT data. Because this is often a computationally intensive task, it is critical that it can be run in parallel on other machines besides the ones serving the web front-end and communicating with the database.
Girder contains plugins that provide support for visualization, metadata extraction, and fast querying of the DICOM file format. The DICOM visualizer plugin allows users to navigate to a DICOM data set and view it section by section, including window and level controls, as well as showing the table of tags for each section. The DICOM metadata extractor plugin automatically inspects DICOM files as they are uploaded into the system and reads the DICOM tags from each file, recording them as structured metadata on the data set. Storing and indexing these metadata fields in Girder's database management system enables users to quickly search among a large collection of DICOM datasets based on the values of specific tags.
The processing service maintains a job queue which can be run in parallel across multiple machines to support load-balancing and minimize the response time of the jobs. The system is designed to be general enough to execute almost any task in a distributed environment by exposing several execution modes, including python scripting, R scripting, and even running arbitrary Docker containers. In our case, each job involves the execution of a Docker container with the analysis modules that can be maintained independently of the other parts of the system, so long as it conforms to a well-defined command-line interface. When finished, the resulting data is uploaded back into Girder for users to view.
Dockerized Slicer Modules
Docker is an open-source project that automates the deployment of applications inside software containers similar to virtual machines. That enables such a container to run on any Linux server, thus eliminating the issues of software libraries and their versioning. To build a dockerized container, one starts with one of the known Linux distributions and installs within it the required libraries. Then the custom software is added and compiled using the compiler suite contained within the container. For ease of use of the container, an entry point (a default program within the container to be invoked) can be specified. Our phantom detection and parameter estimation pipeline in Slicer were implemented in Docker containers.
Using image domain processing, we created test data with 5 combinations of σX, σY, σZ, 5 levels of bias (−30% to +30%) for each of the 3 spheres, 5 levels of noise-to-signal ratio (0% to 40%), and 3 repetitions totaling 9375 experiments. Separately, to characterize the dependence of parameter estimates on mismatches between the algorithm's model and the physical sphere sizes, systematic variations were introduced to the software's expected sphere radii. These tests were conducted with the measured images of water-filled pocket phantoms and with a 20-cm flood phantom in the center of the field of view (Figure 7, left). The scan duration was 15 minutes, and the activity concentration was ∼6 kBq/mL in the pocket phantoms and 4.3 kBq/mL in the flood phantom. The data were acquired on a General Electric Discovery STE PET/CT scanner (Waukesha, WI) and reconstructed in MATLAB (The MathWorks, Natick, MA) using reconstruction code equivalent to the manufacturer's.
Two patients underwent scanning with long-lived prototype pocket phantoms in the GE PET/CT scanner. Four iterations with 28 subsets per iteration were performed during the reconstruction. An 8-mm postreconstruction filter was applied. Time-of-flight information was not used. The scans were acquired separately from the clinically indicated PET scans that the patients were undergoing. The pocket phantoms were placed on the underside of the patient bed and a clinical-duration scan was performed (Figure 7, right). The activity concentrations were ∼13.0 kBq/mL and 3.1 kBq/mL in the 2 phantoms. Images were reconstructed with varying filter widths (3 mm to 9 mm in 1-mm increments). These images were then rescaled to have varying global scale factor between 0.85 and 1.15 in increments of 0.05. The equivalent bias is ±15% in 5% increments. This resulted in a 2-dimensional test-space of image parameters for the optimizer to estimate. The analysis of measured images was done with command-line invocations of detector and parameter optimizer modules generated automatically in MATLAB.
We conducted experiments using synthetic phantom data, physical phantom data and patient data with pocket phantoms. The results are described below.
Detection Algorithm Performance
In 9375 synthetic experiments, the phantom detector had no failures. The parameter estimator had an average relative error below 1% for each parameter. In reality, the spheres would all have the same level of bias and the noise would not approach 40%.
The detector module was tested on real images. All spheres were correctly detected in the test images, including patient images. The phantom detector takes <1 second to run on a typical PET image (matrix size, 256 × 256 × 47).
The phantom detector module can be invoked via Girder's web interface (Figure 6) or from within iMIQ (a customized 3D Slicer application), as shown in Figure 8. Figure 4 shows the parameter optimizer module that implements the algorithm shown in Figure 3. The estimation algorithm can be run individually for each detected sphere, and the corresponding imaging parameters exported and presented to the user in iMIQ.
Imaging Parameter Estimation Algorithm Performance
The parameter estimator takes some 10–30 seconds per 3-sphere phantom. These tests were conducted on an Intel Xeon E3-1220 processor (4 cores @ 3.1 GHz).
Figure 9 plots the cost function on y-axes with varying individual parameters in synthetic and real images. For a given subplot, 1 parameter is varied, while all the others are kept fixed at their minimum value. The y-axes express average voxel difference, in kilobecquerels per milliliter. This average is per-voxel root-mean square error between a realistic predicted image and a scanned PET phantom (Figure 3).
Figure 10 (top) shows that the iterative updates of bias, resolution, and sphere locations cause fluctuations in the value of the cost function. However, the software retains the lowest value achieved in the entire optimization (Figure 10 bottom), which shows good stability well before the algorithm stops.
Figure 11 shows that the mismatches spanning ±5% of the input sphere diameter led to modest changes in the bias and resolution estimates. Resolution estimates varied by <0.5 mm for any single postreconstruction smoothing filter size. Scale factor estimates varied by <0.015 (1.5%) for a single filter and by <0.031 (3.1%) across all filters.
Feasibility Study on Patients
Figure 12 (left) shows that the algorithm successfully ran and accurately estimated the bias in our patient data. Estimates of global scale factor were divided by the known scale factor to show stability (i.e., flatness) of the algorithm's performance as the applied bias and image resolution varied. Data were similar for the second set of patient images (data not shown). Figure 12 (right) shows the estimated resolution over the same test space of images. In contrast to the bias estimates, there is not a simple normalization that can make the plot more easily interpreted visually, but we note that the data show the expected independence of bias factor (with some wobble) and a smooth dependence on filter width as expected.
We have created a working software platform for the pocket phantom system and further tested its use in real-world conditions. By implementing our algorithm with the Girder infrastructure and dockerized containers, we have extended our previous work to create a server-ready automated analysis pipeline that makes the pocket phantom system available to geographically distributed users. The optimizer showed good tolerance of the real-world conditions under which it was tested.
We note that the resolution measurements discussed here may differ from those obtained by other measurement methods that use different phantom geometries. In particular, we expect that our resolution estimates depend on the source positioning in the field of view, the energy of positrons emitted by 68Ge, and the lack of background activity around our spherical sources. These constraints have been previously noted for methods involving solid uniform sources (40). However, we feel that for quality assurance, our measurements may still have value in their ability to detect changes in image resolution, which may indicate unstable biases in the image-based metrics.
The current phantom detection module differs from the previous version in that it operates on the PET image instead of the CT image. During early research, the detection sometimes required user intervention to successfully locate the phantoms. This should not be an undue burden on users, as the effort required was minimal in our experience. We note that because the optimization step in the algorithm refines the estimates of sphere centers, the detection step will not affect the overall effectiveness of the pocket phantom system except in cases of complete failure, which the user can detect visually in the interface.
Figure 9 shows that the cost function has global minima as a function of each variable returned by the algorithm. While this does not guarantee that the algorithm is convergent in the full multidimensional space, Figure 10 suggests that updates to the parameters estimates are small well before the algorithm terminates. The final value of the cost function is dependent upon several factors. Noise in the image means that even for a perfect physical model, the cost function will have some finite value for optimal parameters. In addition, the assumption that the PSF is isotropic in the transaxial plane may not hold due to detector parallax (41), meaning that even a noise-free image would not perfectly match the algorithm's optimal model image.
The algorithm performed with reasonable stability as sphere size mismatch varied as far as ±5% of the known radius. We note that while this test was intended to gauge the algorithm's tolerance to manufacturing variability with fixed “known” sphere size in the algorithm, we have instead used a fixed physical sphere size and varied the value in the software to avoid the need to manufacture 11 phantoms in a precisely wrong fashion. The modest dependence of the estimates on sphere mismatch suggests that the algorithm performs with acceptable accuracy if the spheres are manufactured to within 5% of their nominal radius.
The primary purpose of the patient measurements was to show that the detection and optimization could still run successfully with the additional constraints and challenges that clinical imaging presents. The system performed well on the clinical data, showing stability of global scaling (bias) estimates and smooth dependence on filter width as expected (Figure 12).
The pocket phantom software can also be run locally without the cloud-based architecture described above. This is advantageous where it is not desirable to send patient data off-site, for example, due to legal regulations.
Concluding Remarks and Future Work
The PET pocket phantom system has the potential to reduce PET measurement errors. The phantom and estimation software performed adequately in the tested scenarios and can be made available to sites by our software.
We are currently concluding an investigation into the numerous mathematical and physical effects of the imaging process that may affect the pocket phantom system, and further investigating how the physical phantoms may be reliably manufactured, which is a requisite to the deployment of our system.