Radiomics approaches provide quantitative image features computed from medical images and hold promise for improved computer-aided diagnosis, treatment selection, and response prediction (1–6). Radiomics features belong to 5 broad classes of descriptors, namely, size, shape, intensity, texture, and margin sharpness (7, 8). Recent findings strongly suggest that these image features may hold diagnostic and predictive information, some of which may not be visible to the human eye (3, 9, 10).
Several institutions have independently developed software packages for the generation of radiomics features (7, 11, 12). When different pipelines are run on the same imaging data, features may vary significantly across institutions and pipelines owing to differences in feature definition, software implementations, and/or parameter settings (13). This raises concerns about the reproducibility and repeatability of both the feature computation itself and the subsequent model building (2, 4). Phantoms with known characteristics should prove helpful for standardization across institutions.
Multiple physical phantoms have been developed for analyzing the effect of scanner variation on the reproducibility of quantitative image features (14–16) and for analyzing the computation of specific radiomics features (eg, shape phantoms) (17). However, physical phantoms must be designed for specific experimental questions, are difficult to share, and are subject to the variations introduced by the physical scanning and reconstruction process (eg, different intensity values across different devices).
Several radiomics standardization initiatives have contributed patient cohorts as digital radiomics “phantoms.” Some project teams asked several institutions to calculate radiomics features on a small number of patient images and then compared the computed values across institutions (13, 18, 19). Although results could be compared across pipelines, the underlying ground truth values of these features were unknown. One study used a digital reference object (DRO) with a known standardized uptake value to analyze variations in PET standardized uptake value computation across institutions (20). This study showed the utility of a “ground-truth” value in standardization across institutions.
In addition, while recent work has already shown that radiomics features can be dependent on voxel size, scanner model and acquisition/reconstruction settings, image rotation, and translation (15, 21–25), some features might be dependent on other radiomics features themselves (like object size and shape). One recent phantom design allowed for testing the effect of controlled changes in object size on the calculation of shape features (17). This work revealed that a number of shape features were unstable with respect to changes in volume. Because many classes of radiomics features (intensity, texture, margin, etc.) may also show these interdependencies, phantoms are needed to address these relationships in a controlled, hypothesis-driven fashion.
This paper presents a toolkit for the creation of DROs and a sample collection of DROs made using it for radiomics experiments and illustration. The DROs are mathematically defined and are output as DICOM image stacks with segmentations formatted as DICOM segmentation objects (DSOs). The DROs' mathematical definitions allow the derivation of “theoretical” radiomics values for some radiomics features, thereby allowing the accuracy of radiomics algorithms to be verified. They can be altered along 10 axes of variation to allow for investigations into the stability and robustness of extracted features to controlled variation of object construction. We present the calculation of several theoretical radiomics values for some DROs in our sample collection and compare them to the corresponding features extracted using a particular radiomics pipeline.
Definition of Objects and Software
Definition of Objects.
The toolkit creates DROs of chosen size, shape, intensity, texture, and margin sharpness, sampled and embedded in stacks of 300 512- × 512-Gy value images. As currently implemented, the images have a pixel spacing of 1 mm in the image (x.y) plane, a slice thickness of 1 mm, and a slice spacing of 1 mm. Therefore, voxels are 1 mm3. These image stacks are then saved as DICOM images and segmentation objects (26). These DROs do not attempt to simulate computed tomography (CT), magnetic resonance, or any specific scanner or modality behavior. Instead, we define these objects with known, continuous functions and then sample these functions as images. Each DRO is a variation on a sphere as defined by 10 parameters divided into 5 categories, as described in the following sections.
Shape (5 Parameters).
XYZ Deformation: The scaling of the object along each Cartesian dimension [x,y,z] defined as a scalar multiple of the radius. Therefore, [1,1,1] would be a perfect sphere, while [2,1,1] would be an ellipsoid with twice the radius in the x direction.
Surface Frequency and Amplitude: The frequency (ω) and amplitude (α) of a sinusoidal variation along the surface of the sphere, with frequency unitless and amplitude as a multiple of the radius. Together, in spherical coordinates, the shape parameters can be represented as:
Intensity (1 Parameter).
Mean Intensity: The mean internal intensity () of the object's gray values. Note that the intensity values can be converted to specific intensity values (eg, Hounsfield unit, or HU, for CT) when desired. For the provided objects, intensity values have been scaled to HU. That is, intensities are scaled linearly with air at −1024 and water at 0.
Texture (2 Parameters).
We conceptualize the texture as a 3-dimensional sinusoidal variation of the voxel value. Therefore, 2 texture parameters control aspects of this sinusoidal model.
Texture Wavelength and Amplitude: The wavelength (λ) and amplitude (α) of the sinusoidal variation of the intensity of the image, wavelength in (mm), and amplitude in intensity units. In the case of these objects, these units are scaled to HU. Together, in 3D Cartesian coordinates, we model the texture variation as:
Where μ is the intensity at any given coordinate. Note, as currently implemented, the wavelength is identical in all 3 dimensions. In addition, when α is set to zero, the object will have a uniform internal value of , the mean internal intensity.
Margin Sharpness (1 Parameter).
We conceptualize margin sharpness as the transition from the internal intensity value to the external intensity value. To smoothly transition the internal intensity to the background, we apply a Gaussian blur, with a single parameter, to the object.
Gaussian Standard Deviation: The standard deviation (σ) of the uniform, 3-dimensional Gaussian image blur applied to the image. The larger the standard deviation, the greater the blurring effect. A standard deviation of zero defaults to no blurring. This blur can be modeled continuously as:
Because the function is applied as a filter to the image, it is centered on each pixel. Therefore, the filter blurs not only the edges but also all internal intensity values. To repair the now-blurred internal intensities, we replace all voxels within the original segmentation map with their original values. Therefore, the region outside the object is the filtered version of the image while the interior of the object has the prefiltered intensity values.
Description of Code.
We have implemented a command-line-driven software package that allows the creation of objects exhibiting the parameters as described above. Using a YAML configuration file, all 10 parameters from the 5 categories described above can be set for user-driven generation of new DROs. Users can specify a range of values for each parameter, by providing a minimum, maximum, and number of values. The program will divide the range between the minimum and maximum into the number of requested values at equal intervals (eg, minimum radius of 20, maximum radius of 40, and 3 values produces: radii 20, 30, and 40). If the user desires just 1 value for a given parameter, the minimum and maximum should be equal and the number of values should be 1. If the user provides ranges for n different parameters, the program generates an n-dimensional matrix of DROs for each combination of parameter values. Each point in this matrix corresponds to a unique object produced by the code. For example, if the user requests 3 values for 3 parameters, (33), the toolkit will generate 27 DROs.
To provide unique and relevant names for every generated DRO, the DRO name (saved in the DICOM header “Patient Name” and “Patient ID” tags and as the enclosing folder name) is a “-” separated list of the values of all 10 settable parameters in the same order as listed in the “Definition of Objects” subsection. For example, if a DRO has a radius of 100, an x-, y-, and z-deformation of 1, a shape frequency of 9, a shape amplitude of 0.2, a mean intensity of 100, a texture wavelength of 10, a texture amplitude of 50.0, and a margin sharpness Gaussian standard deviation of 10, then the unique name of the generated object will be Phantom-100.0- 1.0- 1.0- 1.0- 9.0-0.2- 100.0- 10.0- 50.0- 10.0. Each number corresponds to each parameter in order.
Once the DICOM series has been generated, the software then produces a corresponding DSO file for each DICOM series. The DSO file is named with the unique DRO name. The software finally returns 2 zip files, namely, DICOMs and DSOs. The DICOM zip file is a folder of subfolders. Each subfolder is named after a specific DRO and contains the DICOM series for that DRO. The DSO zip file is a folder of files. Each file is the DSO for a specific DRO.
The command line tool is open-source and available for use from this GitHub repository: https://github.com/riipl/dro_cli.
Illustrative Use Cases
Generation of Sample Collection.
To offer out-of-the-box, ready-to-use DROs for immediate studies, we generated a collection of DROs; each DRO consists of a DICOM image series with accompanying DSO. For distribution, we also provide this specific collection of DROs in Neuroimaging Informatics Technology Initiative (NIfTI) format segmentations (27).
For each of the 5 classes of image features (size, shape, intensity, texture, and margin sharpness), we have selected 1 parameter from each class to have 2 values. We chose mean radius, shape variation amplitude, mean intensity, texture variation amplitude, and Gaussian standard deviation. All other parameters are held constant. Table 1 specifies the values we used for all 10 parameters and highlights the 5 parameters that have 2 values. Because each chosen parameter has 2 values, we generated 25 or 32 unique objects. Note that, while all objects have a texture wavelength of 10 mm, Table 1 specifies that the texture amplitude is zero for half of the 32 objects, resulting in 16 objects with uniform intensity.
Calculation of Theoretical Radiomic Values.
To show the use of the theoretical definitions of these phantoms, we generated the theoretical “ground truth” values for a subset of radiomics features. Acknowledging that there are many potential radiomic values to compute, we chose 8 features as defined by the Image Biomarker Standardisation Initiative (IBSI) (8, 28). These features are volume, surface area, 2D diameter, 3D diameter, sphericity, intensity mean, intensity standard deviation, and intensity kurtosis. Online supplemental Appendix 1 specifies their IBSI definitions. All of the theoretical values of the features in our DROs were computed by applying the IBSI definitions of the features to the mathematical definition of the continuous object. We provide the value of each of these features for 3 objects from the collection: uniform sphere with uniform intensity (Table 2 first bold, hereafter named “Uniform” for convenience), uniform sphere with texture variation (Table 2 second bold, hereafter named “Texture Variation”), and nonuniform sphere with uniform intensity (Table 2 third bold, hereafter named “Shape Variation”).
i] While every DRO has 10 parameters, we present just the 5 parameters we varied between 2 values in the sample collection. The unique name of each DRO is generated as described in “Description of Code” subsection. Each parameter is colored by value. Bolded objects: (1) referred to as “Uniform,” (2) Referred to as “Texture Variation,” (3) referred to as “Shape Variation,” in Calculation of Theoretical Radiomics Values.
Comparison of Theoretical Radiomics Values with Output of a Pipeline.
To show the utility of the theoretical values in comparison with pipeline output, we compared the theoretical radiomics values defined above against radiomics values produced by the Stanford Quantitative Image Feature Engine (QIFE) on the 3 DROs described above (7). Online supplemental Appendix 2 gives the configuration file parameters we used for running the Stanford QIFE. See Echegaray et al. (7) for definitions and implementation of all QIFE features.
Description of Sample Collection
The 32 unique objects generated for the sample collection have every combination of the 2 values for the 5 chosen parameters (Table 2). As the colors indicate, there are 16 objects with each value for each parameter. This diversity of objects allows users to explore each parameter in isolation and in the context of other parameter changes. Figure 1 presents 8 objects sampled from the collection. The entire collection is available in zipped folders in the project GitHub repository: https://github.com/riipl/dro_cli. The collection is also available in The Cancer Imaging Archive (https://doi.org/10.7937/t062-8262).
Comparison of Theoretical Computation of “Ground Truth” Radiomics Features to QIFE Calculated Values
We derived the theoretical values for the 8 IBSI-defined radiomics features described above for the Uniform, Texture Variation, and Shape Variation DROs defined above. These derivations serve as a model to researchers interested in comparing their radiomics pipelines to “ground-truth” values. Table 3 compares the derived theoretical radiomics values to those produced by Stanford QIFE. Across all values, there is <10% difference. Excluding surface area and sphericity, there is a <1% difference. Note that for the DROs with uniform intensity, QIFE appropriately returns NaN for kurtosis, as it is undefined.
Furthermore, as the Uniform DRO only differs from the Texture Variation and Shape Variation DROs by 1 parameter each, Table 3 also presents a simple demonstration of how QIFE output reflects changes in individual DRO parameters. For example, increasing the texture amplitude from 0 to 50 HU produces a change in intensity standard deviation and kurtosis but not intensity mean or any shape feature (Table 3). Similarly, increasing the shape variation amplitude from 0% to 20% of the radius leads to changes in all size and shape features but no change in any intensity features (Table 3).
Customizable DROs allow for valuable comparisons of existing pipelines. As presented in McNitt-Gray et al. (19), DROs can be used for large-scale, multi-institutional studies as a tool for comparing radiomics features across many pipelines. We narrow this work by focusing on the benefits of the DRO's theoretical values as a benchmark for features when no additional pipelines are available for comparison. Unlike MD Anderson's Credence Cartridge Radiomics (CCR) phantom (15, 16) or the American College of Radiology CT Phantom (ACR CT) (14, 29), our DROs have theoretically defined values that match closely to computed values from an existing pipeline (Table 3).
Notably, the QIFE calculated surface area at 8% higher than the theoretical value (with sphericity 8% lower as a result) compared with the theoretical value. Limkin et al. analytically computed shape radiomics features from the surface of mathematically specified objects and compared these values to the radiomics features computed from the images of CT scans of their 3D-printed versions (17). In their experiments, the surface area of the images also showed between a 5% and 15% difference from the value computed from the mathematical description. However, many more factors can influence radiomics values for a scanned object, including noise, partial volume, reconstruction parameters including voxel size, etc., which could cause differences even in repeated scans of the same object. Although these results are important, our DROs allow investigation of the accuracy of radiomic pipelines before considering these confounding effects.
One limitation of the theoretical values for our DROs is that they are computed from the continuous definitions of the objects and not the discretized version embedded in image space. For instance, the volume, surface area, and sphericity of a continuous sphere are different from the corresponding values computed for a discretized sphere with the same dimensions. Further work could investigate the impact of discretization (number and size of voxels) on the difference between the theoretical values and the pipeline-computed values. Nonetheless, the general agreement between the theoretical values and the QIFE values for the 8 features we studied confirms the theoretical values' accuracy and utility in checking the output of a radiomics pipeline.
Another limitation of this work is that, although we have scaled the image gray values in HU, these DROs do not attempt to simulate CT noise or any of its known artifacts. Therefore, these DROs could be argued as being “hyperidealized” objects with no relevance to feature computation in clinical contexts. However, their idealized nature is also their benefit because it allows for controlled studies in developing and/or fine-tuning radiomics pipelines and comparing their performance across institutions. In addition, not all radiomics features can be trivially computed from the mathematical definitions of the objects. Many second-order texture features are the main source of disagreement between radiomics implementations (13, 18, 19) but must be computed from the voxel-based embedding of the objects rather than their continuous definitions. Further work could consider theoretical estimations of these kinds of features from the object definitions.
This work presents the implementations of just 10 user-definable image features. We provide the code in this GitHub repository (https://github.com/riipl/dro_cli) with the hope that users will modify and/or implement new features to resolve specific questions of interest. There are many possible extensions of this feature set. For example, as implemented, the sinusoidal texture has equal wavelength in all 3 Cartesian dimensions. Future developers could easily extend the current code to implement different wavelengths in all dimensions allowing for a greater diversity in object textures. Similarly, the sinusoidal shape variation has equal frequency in both angular dimensions. This limitation could easily be removed to develop more complex object shapes. In addition, margin blurring is currently computed by blurring the object and then resetting all intensity values within the original segmentation. More sophisticated methods could attempt to create a smooth transition between the interior and exterior of the object.
We provide a collection of 32 DROs with a large combination of image features of interest. These objects allow for investigation of image features in isolation and in the context of other changes (eg, the impact of object shape on GLCM texture features). Although we only analyzed 3 DROs from this collection, we hope this collection of DROs can serve a diversity of experimental questions and inspire the generation of new DROs using the available command line tool.
The presented DROs address 3 major needs in the radiomics literature. These objects are the first digital, customizable reference objects with some mathematically derivable radiomics features. Not relying on physical phantoms or patient images democratizes experimental design and allows for faster image dissemination and project collaboration. Theoretical values for radiomics features provide pipeline-independent reference values. Finally, the multiparametric customizability of the objects allows for controlled studies of individual radiomics features and their stability to changes in other image features.