Prostate cancer is the most frequently diagnosed noncutaneous cancer in men in the United States, accounting for ∼1 in 5 new cancer diagnoses (1). Increased screening efforts, early aggressive therapy for high-risk disease, and the relatively indolent nature of the disease in most patients have resulted in an overall 5-year survival rate of 99% for organ-confined prostate cancer (1). The current paradigm for prostate cancer diagnosis centers on obtaining tissue diagnosis before definitive therapy, either through conventional 12-core systematic transrectal ultrasound-guided biopsy systems or newer magnetic resonance imaging (MRI)-fusion targeted biopsies. These data are typically combined with clinical information (ie, prostate-specific antigen [PSA], PSA density, and clinical T stage) and implemented into various nomograms to predict disease “risk” status.
While PSA screening has been shown to reduce mortality (2–4), PSA alone has relatively low specificity for prostate cancer diagnosis and is insufficient in stratifying disease risk status, leading to an abundance of low-risk patients undergoing an invasive biopsy (5). The conventional paradigm for prostate cancer diagnosis and staging has been challenged in recent years, with data showing that nontargeted biopsies can lead to under-sampling, inaccurate risk stratification, or missing the target cancer all together (6, 7). As a result, noninvasive imaging with multiparametric MRI (MP-MRI) of the prostate is increasingly being used as a tool for prostate cancer detection, preoperative staging, active surveillance, targeted biopsy, and guidance for definitive focal therapy. Several recent prospective trials have shown that using MP-MRI in the prebiopsy setting to identify target lesions for targeted biopsy outperforms systematic 12-core biopsy, leading to a higher rate of diagnosis for clinically significant cancers and a fewer clinically insignificant cancers (8, 9). The incorporation of MR-guidance, however, requires a radiologist to identify and label potential targets. With the clinical standard of care shifting toward image-guided biopsies, an increased burden will be placed on radiologists to correctly and efficiently identify prostate tumors for biopsy.
A typical clinical prostate MRI protocol includes T2-weighted imaging to delineate structure and zone anatomy, multi b-value diffusion-weighted imaging to identify areas of diffusion restriction, and dynamic contrast-enhanced (DCE) imaging to identify early or contemporaneous focal enhancement. The exams are interpreted by a radiologist according to the PI-RADS v2 (10) that assesses the probability of clinically significant cancer. The lack of specificity inherent to PSA screening and MP-MRI means that a definitive diagnosis still requires a biopsy procedure which often leads to the overtreatment of low-risk prostate cancer (11). Patients who do not necessarily have cancer undergo invasive biopsy procedures to mitigate the uncertainty in the screening tests. The clinical barrier to overcome is to appropriately stratify patients near the boundary between intervention and active surveillance before biopsy. The line between active surveillance and treatment is histologically Gleason 3 vs Gleason 4, which roughly correlates to PI-RADS 3 vs PI-RADS 4 on MP-MRI.
Computer-aided diagnostic tools informed by postoperative tissue may provide an opportunity to address this clinical barrier (12–14). Predictive models made from aligned whole mount tissue and in vivo imaging provide opportunity to bring additional information into image space, increasing the overall specificity of a nonspecific test. Radiomics and machine-learning based approaches have been a great success over the past decade (15–18) and improved sensitivity yet decreasing the specificity. However, there is a need for further improvement of this technology.
Radiomics provides a framework for quantifying tumor microenvironment by analyzing images as a mineable database. In addition, by creating a database of aligned pathology or genetics with clinical imaging, it becomes feasible to find radiologic patterns of tumor phenotype which may provide critical predictive information (19–21). Radiomics-based approaches have seen success over the past decade, proving successful across modalities (22, 23) and organ systems, (24–26) by providing a useful means for engineering features amenable to machine learning approaches.
This study uses an aligned rad-path data set to determine whether unique imaging signatures predict the presence of pathologist-identified prostate cancer. We present a method which learns a distribution of unique image characteristics associated with histologic annotations to create voxel-wise predictive maps on a naïve test set.
Forty-eight patients were recruited prospectively for this institutional review board (IRB)–approved study between June 2014 and February 2017. Written consent was obtained from all patients. Patients' age ranged from 45 to 71 years (mean, 60 years). Inclusion criteria for this study included a scheduled radical prostatectomy and clinical imaging with additional high b value DWI 2 weeks before surgery.
MP-MRI was acquired on a 3 T MRI scanner (General Electric, Waukesha, WI) using an endorectal coil. MP-MRI included field of view (FOV)–optimized and –constrained undistorted single shot (FOCUS) DWI, DCE imaging, and T2-weighted imaging. T2 acquisition parameters were as follows: repetition time (TR) = 3370 milliseconds, FOV = 120 mm, voxel dimensions = 0.23 × 0.23 × 3 mm, acquisition matrix = 512, and slices = 26. Diffusion images were collected with 10 b-values (b = 0, 10, 25, 50, 80, 100, 200, 500, 1000, 2000). The DCE images were collected during injection of a gadolinium contrast agent with acquisition matrix = 256, slices = 25, and FOV = 120 mm. All image contrasts used in this study were acquired axially.
The T2-weighted images were intensity-normalized to the standard deviation within a manually drawn prostate mask (26–29). The B = 0 image was aligned to the T2 using FLIRT (30, 31) and corrected manually if necessary using a freesurfer tool, tkregister2 (surfer.nmr.mgh.harvard.edu). ADC was calculated from 2 combinations of b-value for the purposes of this study, 0 and 1000 and 50 and 2000 (32). The DCE volume with maximal contrast influx was identified using a custom algorithm programmed in Matlab (MathWorks Inc., Natick, MA) and manually aligned to the T2-weighted image using tkregister2. The DCE was intensity-normalized as described above for the T2-weighted images.
Following surgery, prostate samples were sectioned using patient-specific tissue slicing molds created from the presurgical T2 images, as previously published (29, 32). Surface models were created from the manually drawn prostate mask using 3D slicer and subtracted from a template-slicing mold matching the T2 slice spacing using Blender. The slicing molds were then 3D-printed using a fifth-generation Makerbot. Tissue sections were paraffin-embedded, whole-mounted, and stained with hematoxylin-eosin. Slides were digitally scanned using a microscope equipped with an automated stage (Nikon Metrology, Brighton MI). The digitized histology was annotated by a fellowship-trained urologic pathologist (KAI) using color codes corresponding to the Gleason grading system. Annotations were drawn on a Microsoft Surface Pro 4 using a predefined color palette. The annotations were saved as a mask overlaid on the high-resolution histology.
Digitized samples were aligned to the T2-weighted images using custom software previously published (29, 32). Control points were manually placed on analogous points on both the histology and the MRI. A nonlinear transform was then calculated to warp the histology into T2 space using the imwarp command in the Matlab image processing toolbox (MathWorks Inc.). The annotations from our pathologist were likewise transformed into T2 space using the same transform and a nearest-neighbor interpolation. The pathologist annotations in MRI space are referred to as “deep annotations” throughout the manuscript.
Stratification by Tumor Volume
A 3-fold cross-validation approach was chosen to test generalizability. A custom sampling algorithm was required to distribute patients into cohorts with balanced tumor burden. Patient-wise tumor burden was calculated as the sum of the number of pathologist-annotated high grade (Gleason 4-5) and low grade (Gleason 3) voxels. Stratification by high grade and low grade was chosen in lieu of individual Gleason grade owing to the relatively limited amount of grade 5 tumor in the data set. The ideal cohort tumor burden was calculated as the total tumor burden divided by the number of cohorts. Random permutations of patients were created, and the difference between the randomly generated tumor burden and the ideal tumor burden was calculated. A separate error metric is calculated for each cohort (n) and then summed to produce a single error score for that permutation as seen in equation 1:
The cost function was empirically set at 3:1, favoring high-grade tumors; this results in a much larger error score if the high-grade tumor volume is unbalanced. There were a larger number of low-grade annotations than high-grade ones in the data set, and balancing high-grade tumor volume was deemed more important than balancing low grade. The permutation producing the lowest error in 10,000 iterations was used as the group assignment for this study. There are approximately 1021 possible combinations, thus sampling 10,000 limits bias but provides relatively balanced cohorts.
Global Thresholding and Segmentation
Each of the 3 cohorts was used as a test set for an algorithm trained on the other 2 cohorts. Three sets of global thresholds were established; for each cohort a global threshold was established using the 2 unused cohorts (32 patients). The contrasts used in this study were ADC (b = 0 and b = 1000), ADC (b = 50 and b = 2000), T2, and DCE. Global thresholds were created using Otsu's method calculated on all voxels in the manually drawn prostate masks for the entire training cohort of 32 patients; thresholds applied to the test set were not generated from these data (33). Global thresholding tests the assumption that all cohorts represent the same probability density function and additionally removes the constraint that each patient expresses each unique image characteristic. Images were segmented using the calculated thresholds into dark, intermediate, and bright intensities represented by values of 1, 2, and 3 respectively.
Generation of Radiomic Profiles
A unique code was created for each voxel by linearly combining the segmented image contrasts. Radiomic profiles were created by multiplying the segmented contrasts by ascending powers of 10 such that each digit represents the segmentation value of that individual contrast. With 4 image contrasts with 3 color values each a total of 81 radiomic profiles are possible; a voxel encoded with 1133 contains dark ADCshort, dark ADClong, bright T2, and bright DCE. A schematic demonstrating the generation of the radiomic profiles is shown in Figure 1 (26).
Training Set Independence
To determine the training set independence, imaging from an additional 5 patients not included in the previous analysis was processed. Three sets of radiomic profiles were generated using each of the 3 sets of thresholds calculated prior. Each pixel within the prostate mask was analyzed to quantify the overlap among the 3 sets of images. Pixels where the radiomic profile was identical across all images were labelled 3, and pixels where only 2 images matched were labelled as 2. A high overlap score (a large percentage of voxels with a value of 2 or 3) would indicate stable performance regardless of training set.
Gleason Probability Maps Generation
A probability table was generated by analyzing the distribution of each unique image signature within the pathologist-annotated regions. The distribution of each profile in the training set is recorded for low grade, high grade, and benign atrophy (ie, profile 1111 contains benign atrophy 75% of the time). The probability distribution is then propagated to the test set, where each profile is replaced by its respective percentage value, creating 4 maps depicting low grade, high grade, benign, and cancer likelihood. Figure 2 shows the generation of the probability table and Gleason probability maps.
The imaging signatures of prostate cancer are known to be zone-dependent. Lesions are evaluated via the PI-RADS scale using primarily the T2-weighted images in the transition zone and DWI in the peripheral zone. To test the robustness of the Gleason probability maps to tumor location, additional probability tables were created stratified by zone (ie, profile 3333 contains high-grade tumor in 3% of the voxels located in the peripheral zone and 0% of the voxels in the transition zone) and applied to the test set. These new maps were evaluated using a receiver operator characteristic (ROC) and compared to the nonzone-dependent maps.
Evaluation of Gleason Probability Maps and Comparison to Clinical Imaging
The Gleason probability maps were evaluated lesion-wise using a ROC analysis. High grade was compared to all other tissue, cancer to all other tissue, and high grade to low grade. In addition, the intensity normalized image contrasts were evaluated for their ability to distinguish high-grade cancer from all other tissue and high grade from low grade.
Patients were stratified pseudorandomly, attempting to match an empirically determined ideal tumor burden using a custom sampling algorithm. The high-grade tumor and low-grade tumor burdens were, on average, 7.3% and 19.4% off the calculated ideal tumor burden.
Training Set Independence
Radiomic profile maps generated using thresholds from 3 patients cohorts were compared for overlap on data from 5 patients not included in the study. The individual radiomic profile maps and the overlap map can be seen in Figure 3. At least 2 cohorts had identical radiomic profile values on 98.6% of the pixels in the additional subject (red and yellow areas), and all 3 cohorts agreed on 76.3% of voxels (yellow only).
The zone dependence of the technique was tested using a ROC. The resulting AUCs can be seen in Table 1. The algorithms' performance was nearly identical when zone information was included; however, the addition of zone information requires manually drawn zone masks. Figure 4 shows the high-grade Gleason probability map on the same patient with and without zone information included.
Receiver Operator Characteristic
The resulting ROCs can be seen in Table 2 along with the difference between the 2 conditions. The AUC of the clinical contrasts alone ranged from 0.55 to 0.78. The Gleason probability maps achieved an AUC of 0.79, distinguishing cancer from benign atrophy. Figure 5 shows ROCs comparing the Gleason probability maps to the raw image contrasts used in the analysis. Figure 6 shows the resulting Gleason probability maps on 3 true-positive and 1 true-negative case.
|Cancer vs Benign||High Grade vs Low Grade|
|Gleason Probability Map||0.79||0.56|
Image Signatures Unique to Low- and High-Grade Tumors
The 10 most common profiles by volume seen in both high- and low-grade cancer can be seen in Table 3. Seven of the top 10 most frequently seen profiles in high-grade cancer also frequently appear in low-grade cancer. Approximately 13,000 voxels containing high-grade tumor are explained by profile 1122, making it the most frequently seen high-grade image signature. That profile is the fifth most common low-grade profile, but the volume is nearly identical at 12,000 voxels, resulting in a similar probability that a voxel containing 1122 in the test set is high grade or low grade.
|Low Grade||High Grade|
|22 179||2222||13 572||1122|
|17 900||1123||12 146||1112|
The similarity in image characteristics between high- and low-grade tumors occurs independent of lesion size. Normal and benign regions, on average, exhibit image signature 2222. Lesions less than 200 contiguous voxels (<12.5 mm2 in-plane) exhibit an identical image signature—these lesions are indistinguishable from normal tissue. Large lesions exhibit profile 1122 regardless of final Gleason grade.
This study translated a technique developed as a risk stratification model in glioblastoma (26) to identify unique image signatures associated with prostate cancer. The technique outperforms the diagnostic capacity of each of the clinical images individually (Figure 5) and brings histologic data in the form of a learned probability distribution of unique image signatures into image space on naive data. Patients were successfully stratified pseudorandomly into cohorts with roughly equivalent volumes of high- and low-grade prostate cancer. Gleason probability mapping produces nearly identical results independent of training cohort and functions without requiring zone information.
Other prostate radiomics approaches have seen success in detecting prostate cancer using MP-MRI (34, 35). These tools rely either on confirmed pathologically radiologist ROIs or aligned, annotated whole mount slides and frequently combine first-order histogram features with texture and volume features to create a single risk score rather than applying global thresholds. Many volume- and histogram-based features require an a priori ROI in the test set, whereas pixel-wise approaches can provide both disease severity and detection. These tools have trended toward aligned whole mount histology which allows voxel-wise predictions and eliminates the need for a radiologist to manually draw ROIs. Reported AUC varies by technique and ranges from 0.76 to 0.99; however, all published techniques distinguish cancer (G3+) from benign or healthy tissue. Notably, to the best of our knowledge, no technique has been published to date that is capable of distinguishing G3 and G4 patterned lesions, which is where the clinical decision is often made, as it pertains to choosing active surveillance versus definitive therapy. The method introduced in this study, Gleason probability mapping, likewise performs poorly at distinguishing high-grade tumor from low-grade tumor, likely because of the limitations of the imaging techniques themselves. Future studies focused specifically on differentiating G3 from G4+ need to occur.
Transition zone lesions are identified primarily by the T2-weighted images in PI-RADS v2 because benign prostatic hyperplasia nodules often exhibit diffusion restriction and early/contemporaneous enhancement (mimicking cancer) but appear morphologically different from significant cancers, which may not be captured fully by a segmentation-based method. Currently, no prostate CAD tool reads a prostate exam like a radiologist—techniques are either contrast-based and well suited to identify peripheral zone lesions or texture-based and well suited to identify transition zone lesions. It is plausible that the most effective method of identifying prostate tumors distinguished by zone and uses vastly different techniques depending on the lesions location.
There are known sources of error associated with rad-path studies that may reduce accuracy. Our sample includes patients with cancer: other confounding diseases are unlabeled and may thus contribute to error. While we have previously validated our control point–warping technique, there is still error involved in the process. This technique used global thresholds generated from 1 set of acquisitions on similar magnets. Future studies should validate these thresholds on acquisitions from magnets by other manufacturers. This study is limited to endorectal coil images, but future studies may quantify the generalizability to a population imaged without an endorectal coil.
Gleason probability mapping stratifies cancer tissue from normal prostate tissue independent of zone and training set. The technique performs better than traditional image contrasts alone and provides a voxelwise map which may be potentially useful for biopsy guidance and reading clinical scans. Additional research is necessary to further classify regions of tumor among the different Gleason patterns.