Accurate correction for photon attenuation remains a challenge for quantitative positron emission tomography (PET)/magnetic resonance (MR) imaging. Owing to the absence of transmission imaging during PET/MR imaging, either through computed tomography (CT) or a transmission source, MR-based attenuation correction (MRAC) methods are needed to estimate the pixel-wise photon attenuation coefficients for quantitative PET reconstruction. Many MRAC techniques have been developed over the past decade (1, 2), with dual-echo chemical shift-encoded (2-point Dixon) imaging being used in most commercial PET/MR scanners. In Dixon-based MRAC, a single acquisition yields images that are separated into water and fat components and then assigned Hounsfield units (HU) for air, fat, lung, and water (3, 4). With Dixon-based methods, however, bone is not identified owing to the lack of signal contrast between bone and air (5). While ignoring bone in MRAC appears to have little impact on the diagnostic accuracy of PET/MR imaging (6), it can lead to quantitative PET errors exceeding 20%, depending on the location (7).
Numerous approaches have recently been proposed for localizing bone in MR images, including ultrashort echo time (UTE)–based methods, zero echo time (ZTE)–based methods, atlas-based methods, and PET-only methods (8). In general, these advanced methods have allowed for bone localization with acceptable accuracy, reducing quantitative PET errors to within 5%. However, most advanced MRAC methods have been developed specifically for brain imaging. MRAC outside of the brain, especially in the pelvis, is made much more difficult by the greater variety of tissue types, shapes, and tissue deformation present in the images. The few methods accounting for bone that have been tested outside of the brain include atlas methods (9–11), a UTE-based method tested in the neck (12), and PET-only methods tested in the upper body (13, 14). Despite encouraging results, each method has drawbacks, such as the need for additional dedicated MRAC scanning or reliance on image registration, and only a few methods have been tested in the pelvis. Further developments in MRAC methods for the pelvis are therefore needed.
Deep learning methods, particularly convolutional neural networks (CNNs), have recently achieved remarkable success in performing complex computer vision tasks and are now being adapted into a broad range of medical imaging applications (15, 16). We previously used a convolutional encoder-decoder (CED) network for PET/MR attenuation correction in the brain, where contrast-enhanced T1-weighted MR images were used as network inputs to achieve reconstructed PET errors of ∼1% in the brain (17). We also evaluated the same CED network but with a UTE image as input and using transfer learning to initialize the network weights (18). This method achieved even better results, with PET errors in the brain generally <1%. Recently, Leynes et al. used deep learning to synthesize CT images for PET/MR imaging in the pelvis. The model was trained with both ZTE and Dixon-based images from 10 subjects as inputs with coregistered CT images as ground truth, and significantly improved accuracy was achieved (19). A drawback of this method is the additional scanning time needed for ZTE imaging, which can substantially lengthen the overall scan time for a multibed position acquisition yet has little diagnostic utility. In addition, bowel gas was ignored (filled in with soft tissue HU) owing to the challenge in coregistering reference CT images to MR images. We hypothesize that clinically relevant MR images can be used as input to a deep learning model, eliminating the need for dedicated MRAC sequences such as UTE and ZTE. These sequences can then be used for both diagnostic and attenuation-correction purposes, improving clinical workflow for whole-body imaging, and may be easier to harmonize across scanners and body regions than complex UTE/ZTE imaging.
In this work, we assessed a deep learning–based attenuation-correction method (deepMRAC) in PET/MR imaging of the pelvis that uses diagnostically useful MR images as input and therefore does not require dedicated MRAC imaging. Also, owing to deficiencies that we observed in using CT-to-MR registration to generate reference attenuation maps, we developed a novel method for creating reference attenuation maps for training our network that allows for the segmentation of bowel gas. We evaluated our method's reconstructed PET error relative to the reference attenuation map and the scanner's standard MRAC method in subjects who underwent pelvic PET/MR examinations.
The retrospective study was approved by the institutional review board, and the need for written informed consent was waived. All research was conducted in accordance with the Health Insurance Portability and Accountability Act.
Subjects and Imaging
Subjects were eligible for inclusion in our study if they underwent a clinical 18F-fluorodeoxyglucose (FDG) PET/CT examination and an FDG PET/MR examination at our institution on a 3T Signa PET/MR scanner (GE Healthcare, Waukesha, WI) for evaluation of disease in the pelvis. The PET/MR scan must have been conducted immediately following the PET/CT examination so that a reference CT image was available for analysis. Subjects must also have had T2 MRI, T1 LAVA Flex, and system MRAC images acquired during the PET/MR scan with sufficient transaxial field of view (FOV) to cover the entire pelvis. In total, 18 female subjects (mean age, 63 years; range, 39–89 years), all with cervical cancer, met these criteria and were included in the study. Subjects were injected with 0.14 mCi/kg of FDG according to our institution's PET imaging protocol. Because PET/MR imaging was conducted after a PET/CT scan, PET imaging was conducted an average of 135 ± 25 minutes after radiotracer injection.
For each bed position of the whole-body PET/MR scan, the following images were acquired: PET (3 min/bed position), the system MRAC acquisition using the body transmit–receive coil, axial T2 fast recovery fast spin echo, and T1 LAVA Flex using whole-body array coils. In all subjects, gadolinium contrast was administered (gadobenate dimeglumine [MultiHance; Bracco Diagnostics, Princeton, NJ] at a dosage of 0.1 mmol/kg) during a dedicated single bed-position acquisition 17 ± 4 minutes before whole-body imaging, so the contrast was largely diluted by the time whole-body PET/MR imaging was performed. Each image was inspected, and no fat/water swapping was observed in the MRAC acquisition. The T2 acquisition used the following parameters: FOV = 500 mm, pixel size = 0.94 × 0.94 mm2, echo time = 87–100 milliseconds, repetition time = 4500 milliseconds, and section thickness = 6 mm. The T1 LAVA Flex acquisition used the following parameters: FOV = 500 mm, pixel size = 0.94 × 0.94 mm2, TE1/TE2 = 1.35/2.04 milliseconds, repetition time = 5.4 milliseconds, and section thickness = 2 mm.
CT images used in this study were acquired as part of the PET/CT scan occurring immediately before the PET/MR examination. PET/CT scans were acquired on either a GE Discovery 710 or a Discovery VCT PET/CT scanner. CTs were acquired with the following parameters: voltage = 140 kVp, automatic exposure control with a noise index = 25, rotation time = 0.5 second, pitch = 0.516, and section thickness = 5 mm with intersection spacing = 3.27 mm. Note that these CT images were of higher image quality than typical low-dose attenuation-correction CT images, as they are read separately for diagnostic purposes at our institution. It is unclear whether typical low-dose attenuation-correction CT images could have been used in the study with equivalent results.
Reference CT Generation
The validation of MRAC methods requires the availability of ground truth attenuation maps representing the anatomy at the time of PET/MR imaging. Nearly all previous studies evaluating MRAC methods have used coregistered CT images as ground truth (9, 19). The pelvis images used in our study posed a challenge for existing multimodal registration algorithms, as there were differences in body positioning (eg, from straight to bent legs, curved to flat couch), bowel gas location (eg, pockets of bowel gas), arm location, and organ location/shape (eg, bladder filling or bowel movement) between the times of MR acquisition and CT. We tested multiple commercial and open-source deformable image registration packages on our pelvis data, including algorithms that were used in previous studies. We found that although some algorithms performed better than others, all had minor and sometimes large soft tissue and bone misregistrations, and none performed well at registering bowel gas (Figure 1). Previous studies have ignored bowel gas by filling in air bubbles with tissue-equivalent CT numbers (19), although the effect of this technique on PET reconstruction error is unknown. We felt these misregistrations could compromise the deep learning network's ability to learn pixel-to-pixel MR-to-CT mapping. For these reasons, we chose to not use image registration as the sole means of obtaining ground truth attenuation maps, but to instead synthesize a reference CT by using a combination of different techniques for different tissue types (fat/water, bone, and air), as illustrated in Figure 2.
Fat and water localization was derived from the subject's MR images acquired in the same session. The subject's fat-only (F) and water-only (W) images from the LAVA Flex acquisition were converted into a fat fraction image, FF = F/(F + W) (20). The fat fraction image was then converted into Hounsfield units (HU) through a linear relationship between fat fraction and HU, interpolating between representative fat and water values: HU = FF × HU_fat + (1 − FF) × HU_water.
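A minimal sketch of this fat/water-to-HU conversion is shown below. The function name, parameter names, and the representative HU endpoints (−100 for fat, 42 for water, the values measured for CTsub later in this study) are our illustrative assumptions, not the authors' exact implementation.

```python
import numpy as np

def fat_water_to_hu(fat_img, water_img, hu_fat=-100.0, hu_water=42.0, eps=1e-6):
    """Convert LAVA Flex fat-only/water-only images to an approximate HU map.

    Fat fraction: FF = F / (F + W). HU are then linearly interpolated
    between representative fat and water values (illustrative endpoints).
    """
    fat = np.asarray(fat_img, dtype=np.float64)
    water = np.asarray(water_img, dtype=np.float64)
    ff = fat / np.maximum(fat + water, eps)      # fat fraction in [0, 1]
    return ff * hu_fat + (1.0 - ff) * hu_water   # voxel-wise HU estimate
```

A pure-fat voxel maps to hu_fat, a pure-water voxel to hu_water, and mixtures fall linearly in between.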
Because of the rigidity of bone, we used image registration for bone localization. A subject's CT image was registered to the T2 MR image, first using manual registration followed by deformable registration (to account for differences in rotation/bending and MR spatial distortions). Registrations were performed using the CT-MR registration algorithm in Mirada XD (Mirada, Oxford, UK), which is based on mutual information with radial basis functions. In several cases, automatic registration resulted in inadequate bone registration. In these cases, the degree of deformation in the registration was reduced/smoothed (an option in the software) or the images were manually aligned in the software on the basis of visual assessment. For a few cases in which the previously described steps were deemed insufficient, bone-by-bone registration was performed, where bones were individually segmented and independently registered in Mirada. Following registration, the bones in the pelvic region (pelvis, spine, sacrum, coccyx, and femurs) were then extracted from the registered CT images using an in-house atlas-based CT segmentation algorithm (21). The bones were then transplanted to the reference CT image, resulting in a combined fat, water, and bone synthetic CT image.
Air, including bowel gas, was localized by using an intensity-based threshold on the axial T2 image followed by morphological closing and manual adjustment. The threshold was defined as 2 standard deviations below the mean intensity value of at least 3 regions of interest placed in different muscles. Morphological closing was performed in MATLAB 2016a (MathWorks, Natick, MA) using a 4-voxel-wide structuring element to remove noisy voxels that fell below the threshold. We corrected for differences in MRI intensity occurring between different bed positions by normalizing the images at each bed position by the mean intensity value in the muscle. Additional low-signal tissues that were below the intensity threshold (eg, large blood vessels, certain muscles) and consequently labeled as air were visually identified and manually labeled as tissue-equivalent. Bowel gas regions of interest were then translated to the reference CT image and assigned an HU of −1000. The resulting continuously valued substitute CT image, representing the anatomy at the time of MR acquisition, served as the reference CT (CTref) and was used as the ground truth to evaluate PET reconstruction error.
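The threshold-plus-closing step above can be sketched as follows. This is a simplified illustration (ROI handling, bed-position normalization, and the manual adjustment step are omitted; the function name and the SciPy-based closing are our assumptions):

```python
import numpy as np
from scipy import ndimage

def segment_air(t2_img, muscle_rois, closing_width=4):
    """Intensity-threshold air/bowel-gas segmentation (simplified sketch).

    Voxels more than 2 SD below the mean muscle intensity are labeled air.
    Morphological closing of the tissue mask (4-voxel-wide structuring
    element) removes isolated noisy voxels that fell below the threshold.
    """
    muscle_vals = np.concatenate([t2_img[roi] for roi in muscle_rois])
    thresh = muscle_vals.mean() - 2.0 * muscle_vals.std()
    air = t2_img < thresh
    structure = np.ones((closing_width,) * t2_img.ndim, dtype=bool)
    # closing the tissue mask fills small dark speckles inside tissue
    tissue = ndimage.binary_closing(~air, structure=structure)
    return ~tissue
```

Large gas pockets survive the closing step, while single dark voxels caused by noise are reassigned to tissue.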
In this study, we used the open-source multiscale 3D CNN developed by Kamnitsas et al. (22) (Figure 3) as the core of our deepMRAC method. The network, known as DeepMedic, was originally developed to segment lesions in brain MR images. It achieved high rankings in the BraTS 2015 brain tumor segmentation challenge (23) and in the ISLES 2016 ischemic stroke lesion segmentation challenge (24). The network processes an image using an efficient patch-based method on the basis of image segments, but uses two 3D segments per voxel—a large-scale segment for contextual awareness and a small-scale segment for fine detail. The 2 segments are simultaneously processed by 2 independent CNNs, which get combined at the end via fully connected layers. The model has been made available at https://biomedia.doc.ic.ac.uk/software/deepmedic/.
Model Training and Testing
The 18 subjects were randomly split into 12 training subjects and 6 testing subjects. For each subject, the input to the model was the paired T2 and T1 LAVA Flex water-only images stacked as image channels, both of which are acquired for diagnostic use at our institution. The model was trained to produce a discretized version of the CTref (CTref-discrete), where the continuously valued CTref image was converted into a 4-class mask of air, fat, water, and cortical bone for training. The continuously valued CTref was discretized using the following thresholds: air <−200 HU, −200 HU ≤ fat <−20 HU, −20 HU ≤ water < 125 HU, cortical bone ≥ 125 HU. These thresholds were selected because they resulted in discretized classes with visual patterns that were consistent across subjects, which made them easier for the model to learn. Once trained, the model uses the LAVA Flex and T2 images to produce 4-class probability maps with the same dimensions as the input images. Tissue masks are created by assigning each voxel to the tissue with the highest class probability. All images were resampled to 256 × 256 (voxel dimensions = 1.95 × 1.95 × 3 mm3) before training/testing, with the axial FOV cropped to include only the pelvis. Training was performed on batches of 1500 3D image segments of 25 × 25 × 25 using the Theano library. Optimization was performed using RMSprop optimization with a learning rate of 0.001 and a momentum of 0.6 with a multiclass cross entropy loss function. Training was run for 35 epochs on an NVIDIA 1080 Ti graphics processing unit (GPU).
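The HU-to-class discretization described above maps directly onto a binning operation. A minimal sketch using the paper's thresholds (function and constant names are ours):

```python
import numpy as np

# Thresholds from the text: air < -200 HU, -200 <= fat < -20,
# -20 <= water < 125, cortical bone >= 125
HU_EDGES = [-200.0, -20.0, 125.0]
AIR, FAT, WATER, BONE = 0, 1, 2, 3

def discretize_ct(ct_hu):
    """Map a continuously valued CT (HU) to the 4-class CTref-discrete mask."""
    return np.digitize(ct_hu, HU_EDGES)  # 0=air, 1=fat, 2=water, 3=bone
```

With `right=False` (the default), `np.digitize` places each boundary value in the upper class, matching the "≤" side of the thresholds in the text.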
We experimented with various data augmentation techniques, such as enlarging the image by 10% (ie, making the voxel sizes smaller), flipping the images left to right, and rotating the images in various directions by 5°. Overall, results were improved by only flipping the images left to right. Rotation and image zoom were therefore not used during training. We also tested if an ensemble of 3 DeepMedic models trained using the same data and same network but with randomized training order and initial model weights could improve test results. Ensemble methods have gained in popularity and recently placed first in the 2017 BraTS brain segmentation challenge (25). The probability maps for each tissue class were summed across the 3 models, and the voxel was assigned to the class with the highest summed probability. In addition, we tested if training with a single input series (ie, T2-only or LAVA Flex-only) was as successful as training with both input series.
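The ensemble combination rule (sum the per-model class probabilities, then take the most probable class per voxel) can be sketched in a few lines; the function name is our illustration:

```python
import numpy as np

def ensemble_predict(prob_maps):
    """Combine class-probability maps from independently trained models.

    prob_maps: list of arrays shaped (n_classes, ...) -- here, one per
    DeepMedic instance. Probabilities are summed across models and each
    voxel is assigned the class with the highest summed probability.
    """
    summed = np.sum(prob_maps, axis=0)
    return np.argmax(summed, axis=0)
```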
Testing was performed on images from 6 subjects not used in the training phase. The model's output for each of the 6 subjects was compared with the CTref-discrete images using Dice similarity coefficients (DSC) for each tissue class.
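For reference, the Dice similarity coefficient for a single tissue class is computed as twice the overlap divided by the total mask volumes (a standard formulation; the edge-case convention for two empty masks is our choice):

```python
import numpy as np

def dice(mask_a, mask_b):
    """Dice similarity coefficient between two binary masks."""
    a = np.asarray(mask_a, dtype=bool)
    b = np.asarray(mask_b, dtype=bool)
    denom = a.sum() + b.sum()
    if denom == 0:
        return 1.0  # both masks empty: treat as perfect agreement
    return 2.0 * np.logical_and(a, b).sum() / denom
```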
PET Reconstruction and Analysis
For the 6 testing subjects, we converted the model's output (4-class mask) into a substitute CT image (CTsub) so it could be used for attenuation correction in PET/MR image reconstruction. This was accomplished by assigning an HU of −1000 to air, −100 to fat, 42 to water, and 300 to cortical bone. Fat and water HU were determined by measuring their mean HU values in acquired CT images. Because of the large variation in HU of cortical bone and its nonlinear impact on attenuation, bone HU was empirically determined by testing several HU values (200, 300, 400, and 500) during reconstruction and finding the value that minimized PET reconstruction error.
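The class-to-HU assignment is a simple lookup. A sketch using the HU values stated above (names are ours):

```python
import numpy as np

# HU assignments from the text: air, fat, water, cortical bone
CLASS_TO_HU = np.array([-1000.0, -100.0, 42.0, 300.0])

def mask_to_ctsub(class_mask):
    """Convert the 4-class mask (0=air, 1=fat, 2=water, 3=bone) to a
    substitute CT image (CTsub) in Hounsfield units."""
    return CLASS_TO_HU[np.asarray(class_mask, dtype=int)]
```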
PET images were reconstructed using both the continuously valued CTref (ground truth) and CTsub for attenuation and scatter correction. In addition, PET images were reconstructed using the system's default MRAC method on the basis of 2-point Dixon—a method that does not account for bone. Reconstructions were performed using an offline reconstruction toolbox (PET Toolbox, GE Healthcare) with the following PET reconstruction settings: iterations = 4, subsets = 28, transaxial postfilter = 4 mm, FOV = 600 mm, and voxel dimensions = 2.34 × 2.34 × 2.78 mm3. Two PET bed positions were reconstructed, which sufficiently covered the pelvis because of the Signa PET/MR's large axial FOV.
Sixteen FDG-avid soft tissue lesions were identified in the PET images of 5 subjects on the basis of radiology reports (1 subject had no PET-avid disease), consisting of 7 retroperitoneal lymph nodes, 5 lesions along the vaginal cuff, 1 cervical lesion, 1 ovarian lesion, 1 periaortic lymph node, and 1 pelvic sidewall lymph node. Lesions were contoured using PET Edge in MIM (MIM Software Inc, Cleveland, OH), from which maximum standardized uptake values (SUVmax) and mean SUVs (SUVmean) were measured. Errors in PET SUVs for the deepMRAC method and for the system MRAC were determined by comparing SUVs against those reconstructed with CTref. Because PET SUV errors were both positive and negative, centered around 0, the variances of the distributions of SUV errors were compared using the Brown–Forsythe test, with a P < .05 significance level. We also computed the root mean square error (RMSE) of the entire PET image (excluding low-signal voxels < 500 Bq/mL) according to the methods described by Ouyang et al. (26), in which the voxel-wise percent difference relative to the CTref reconstruction is root-mean-squared over the N included voxels: RMSE = sqrt[(1/N) Σ_i (100 × (PET_MRAC,i − PET_CTref,i)/PET_CTref,i)²].
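A sketch of this whole-image metric follows. The exact formula is as described by Ouyang et al. (26); the implementation below (function name, threshold handling) is our hedged interpretation: voxel-wise percent errors relative to the reference, root-mean-squared over voxels above the low-signal cutoff.

```python
import numpy as np

def pet_rmse_percent(pet_test, pet_ref, min_activity=500.0):
    """Whole-image PET RMSE (%) over voxels above a low-signal cutoff."""
    ref = np.asarray(pet_ref, dtype=np.float64)
    test = np.asarray(pet_test, dtype=np.float64)
    valid = ref > min_activity  # exclude low-signal voxels (< 500 Bq/mL)
    pct_err = 100.0 * (test[valid] - ref[valid]) / ref[valid]
    return np.sqrt(np.mean(pct_err ** 2))
```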
Because deepMRAC produces a discretized CT instead of a continuously valued CT for attenuation correction, we evaluated the effect of using a discrete CT instead of a continuously valued CT for PET attenuation correction. We reconstructed the 6 test subjects' PET images using CTref-discrete and calculated the PET error (RMSEdiscrete) and lesion SUV error relative to using CTref.
We also evaluated the impact of ignoring bowel gas on attenuation correction. We filled in bowel gas with tissue equivalence (HU = 42) in the CTref, and we calculated the PET error (RMSEgas) and lesion SUV error relative to using the original CTref (note that RMSE is calculated only for voxels of >500 Bq/mL and therefore ignored the error within the gas pockets).
PET/MR attenuation correction with CTref was also compared against a reference CT generated using deformable registration alone, with the methods and results reported in the online Supplemental Material.
Training of the network took approximately 8 hours on a single GPU, while inference on a single subject took 30 seconds. Table 1 shows the DSC comparing the tissue masks segmented via DeepMedic and those of CTref-discrete in the testing subjects. When using both T2 and LAVA Flex images as model input, the DSC for fat (0.94 ± 0.01) and water (0.88 ± 0.02) was high. The DSC for cortical bone was slightly lower at 0.79 ± 0.03. The DSC for bowel gas was substantially lower at 0.49 ± 0.17. Using both MRI series as inputs to the model resulted in better DSC for soft tissue, gas, and bone than using only 1 MRI series as input. The ensemble of 3 DeepMedic networks produced, on average, nearly identical DSCs to those of a single network and, subjectively, did not reduce the presence of small islands of misplaced bone or gas. Figure 4 shows the generated CTsub in comparison to CTref-discrete, the input T2 series, and the system MRAC attenuation-correction map for an example subject.
Although a large majority of the segmented tissue masks closely resembled the ground truth tissue masks, there were a few apparent errors committed by DeepMedic. The most common error was overlooking very thin bones, which were often only 1–2 voxels in width after resampling. This sometimes resulted in discontinuous bone masks. Other errors were more noticeable, including misplacement of bone (occurred in 3 cases, as shown in Figure 5), mistaking parts of the bladder for air (occurred in 1 case), and mistaking MRI ghosting artifacts for tissue (occurred in 1 case). Most of these mistakes were minor and resulted in only 3–15 voxels being incorrectly classified, and had a negligible impact on the resulting PET images (Figure 5).
Figure 6 shows the distribution of SUV errors introduced by using either deepMRAC or the system MRAC in the 16 lesions. The variance of SUV errors in the lesions was significantly smaller when using deepMRAC than when using the system MRAC, for both SUVmax (P = .003) and SUVmean (P = .01). PET images and error maps are shown for an example subject in Figure 7.
The whole-image PET RMSE was found to be 4.9% using deepMRAC and 11.6% when using the system MRAC. This is illustrated for an example subject in Figure 8, where substitute CT images are shown next to their corresponding voxel-wise PET error maps.
The PET error (RMSEdiscrete) introduced by using the discrete CTref-discrete instead of the continuously valued CTref for attenuation correction was 4.4%. Lesion SUV errors were −3.0 ± 1.3%. When filling bowel gas with tissue equivalence, the resulting PET RMSEgas was 5.9%, and lesion SUV errors were 0.4 ± 3.5%.
PET/MR imaging is a relatively new yet quickly evolving field. The recent surge in PET/MR research has yielded a number of MR-based attenuation-correction methods that have produced impressive results, particularly for brain imaging (8). However, deep learning approaches have certain advantages over other advanced methods. A common criticism of atlas-based methods is the underlying assumption that a subject has a similar overall anatomy as a population model (or similar to one of the atlases for multiatlas approaches). Deep learning only requires that the local image features in a subject resemble features that were observed during training, thus not requiring global similarity in anatomy. This characteristic of deep learning methods also can result in unexpected artifacts (Figure 5), which can likely be mitigated with more and better training data.
Few MRAC methods have been evaluated in PET/MR imaging of the pelvis. Leynes et al. reported on a method based on hybrid ZTE/Dixon imaging, which they evaluated in pelvic images of 6 patients undergoing PET/MR imaging (27). They reported a PET RMSE of 4.18%, which is comparable to deepMRAC's RMSE of 4.9%. In their follow-up study using deep learning with ZTE/Dixon imaging, they reported a PET RMSE of 2.85% (19). A task closely related to MRAC is the use of MR images for radiotherapy treatment planning (MRRT), both requiring the derivation of photon attenuation maps from MR images. Several MRRT planning methods for the pelvis have been reported, including atlas-based methods (28), voxel regression methods (29, 30), and traditional machine learning methods (31). Many of these, however, are not fully automated and require user input (eg, manual bone segmentation) (29, 30). Furthermore, comparison across methods is challenging owing to the different evaluation techniques and definitions. For example, the tissue class of bone consisted of only cortical bone in our method (because bone marrow has attenuation coefficients similar to water), whereas MRRT studies often consider the total bone volume during evaluation (28).
A primary benefit of deepMRAC is that clinically relevant MRI images can be used as input to the model. This eliminates the need for dedicated MRAC acquisitions such as UTE and ZTE that have limited or no diagnostic value and yet can substantially lengthen scan times in multibed acquisitions. For example, a 2-minute ZTE sequence acquired over 6 bed positions (typical for whole-body PET/MR) would add 12 minutes to the overall scan time (27). Furthermore, more conventional MR acquisition sequences are likely to be more compatible between vendors and different scanner types, as proprietary algorithms for postprocessing of Dixon sequences and computationally expensive regridding algorithms necessary for non-Cartesian sequences like ZTE would no longer be required. Although we used clinically acquired T2 and LAVA Flex images as input to our model, it is likely that other diagnostic sequences could be equally effective in such a framework. T2 and LAVA T1 images were selected for our model because they were retrospectively available in all subjects and had a sufficiently large FOV. However, future studies are likely necessary to determine which MR acquisitions are optimal for use in deep learning–based MRAC.
Our deepMRAC method used discretized tissue classes instead of continuous voxel-wise HU estimation. Although we have found continuous mapping effective in MRAC for the brain (32), our initial attempts at training a network for continuous HU prediction in the pelvis resulted in poor performance (not shown). This may be because of the greater complexity in learning a continuous mapping function, or perhaps because of our limited data set. It may also be a limitation of using diagnostic MR images as inputs where bone and air have similar signal intensity, unlike in dedicated ZTE or UTE sequences (19). Whatever the cause, we found networks were less prone to error when learning discrete tissue classes; therefore, we used discrete MRACs in this study. Of course, a limitation of using discrete tissue classes is the assumption of uniform electron density in each tissue class, both within and across subjects. This approximation is clearly not factual, especially in bone, but may be sufficiently accurate for use in MRAC given that our SUV errors were <4% and our PET RMSEdiscrete was 4.4% when using CTref-discrete for attenuation correction instead of CTref. And while we determined that assigning cortical bone an HU of 300 minimized the PET errors in our data, different models or body regions may require different HU assignments. In future work, we intend to further explore networks that produce continuously valued outputs, such as we used in a previous study.
A primary limitation of our study is the small number of subjects. On the other hand, this may also be considered a strength of such an approach: surprisingly few training subjects were needed to achieve impressive results, a phenomenon commonly reported when using deep learning for medical imaging applications (18, 19, 33). It is always uncertain how many training subjects are needed for a given deep learning application, but unlike natural image applications, intersubject variability is often low for medical imaging applications, and impressive results can be realized with few training subjects.
A further limitation of our method is the semiautomatic method we used to define bowel gas in the CTref images. We observed that pockets of bowel gas had a range of intensity values (between −1000 and −400 HU) and sizes, which made it challenging to assign the various air pockets to the appropriate class of tissue. Because of this uncertainty, we segmented only the largest and lowest-intensity air pockets. Owing to different degrees of air–tissue contrast in different subjects (and even different bed positions), our bowel gas segmentations likely suffered from inconsistencies, hence the comparatively low DSC achieved for bowel gas. Previous PET/MR studies have generally ignored the impact of bowel gas on attenuation correction because of these difficulties (also because their methods relied on image registration for ground truth), which led some to replace gas with tissue-equivalent CT numbers (19). Given that the volume of gas present in the abdomen and pelvis can be sizeable, we aimed to include bowel gas. For example, in the 6 test subjects of this study, we found that the volume of bowel gas in the lower abdomen and pelvis (extending from the bottom of the liver to the bottom of the pelvis) was between 5% and 12% (50–130 mL) of the volume of bone in the same region, where we defined air as ≤ −200 HU and bone as >200 HU. In our sensitivity tests, we showed that although ignoring bowel gas did not create large systematic PET errors (RMSEgas = 5.9%), its effect was actually greater than that of using a discrete CTAC instead of a continuously valued CTAC (RMSEdiscrete = 4.4%). In future work, we plan to designate a separate tissue class (or classes) for bowel gas and assess whether different MRI sequences can provide better and more consistent contrast for detecting air cavities.
In addition, future work should include a direct comparison of our method with that of a UTE/ZTE method and validation of our method using an independent cohort at a separate institution, both of which were not possible with this data set.
We have shown that deep learning–based MRAC in the pelvis using only diagnostic MRI sequences is feasible and improves upon the current commercial solution.
Supplemental Material: http://dx.doi.org/10.18383/j.tom.2018.00016.sup.01