Renal angiomyolipoma (AML), which contains smooth muscle cells, dysmorphic blood vessels, and adipose tissue (1, 2), is the most common benign solid renal tumor observed in clinical practice (3). Most AMLs can be readily diagnosed with conventional computed tomography (CT) or magnetic resonance imaging (MRI), which detect abundant macroscopic intralesional fat (4–6). However, approximately 5% of renal AMLs contain too little fat to be detected with either CT or MRI. This subtype is termed fat-poor angiomyolipoma (fp-AML) or AML without visible fat, and it is difficult to differentiate from renal cell carcinoma (RCC) (7–9). Fp-AMLs are the most frequent benign renal masses subjected to unnecessary surgery (10, 11). A variety of imaging signs have been proposed to differentiate fp-AML from RCC, such as an angular interface, high lesion attenuation at unenhanced CT, and strongly prolonged enhancement (12–14). However, these findings have shown insufficient specificity, inconsistent reproducibility, or inadequate prospective reliability (3, 15). Differentiating benign from malignant tumors is essential for choosing the proper treatment, but the diagnosis of fp-AML is challenging, time-consuming, and heavily dependent on the experience of individual radiologists.
As a branch of radiomics, quantitative texture analysis is an emerging technique that extracts and evaluates features from digital images. By analyzing the intensity, distribution, and relationships of pixel gray levels within an image, it can detect subtle changes and heterogeneity beyond human vision and provides an objective method of assessing lesions (16). Texture analysis quantifies tumor heterogeneity and may reflect tissue characteristics (17, 18); a number of studies have shown it to be a potentially useful biomarker for the diagnosis, therapeutic response, and prognosis of colorectal, lung, esophageal, hepatic, and head and neck cancers (19–24). In recent years, several studies have used CT texture analysis (CTTA) and machine learning as noninvasive methods to differentiate fp-AML from RCC, reporting high accuracy, sensitivity, and specificity. However, although CTTA is a promising biomarker, clinical urologists remain unfamiliar with the value of quantitative CTTA and machine learning for this task. In this paper, we describe the basic concept of texture analysis, its workflow, and its application to differentiating fp-AML from RCC, and we discuss its current challenges and future development. This technique may help avoid unnecessary surgical resection.
Texture Analysis: Basic Concept and Methodology
In general, the basic workflow of CTTA includes image acquisition, segmentation, feature extraction, feature selection, statistical analysis, and classification (Figure 1).
CT Technique and Image Acquisition
Patient images can be retrieved through computerized searches of picture archiving and communication systems (PACS). Patient characteristics are listed in Table 1, and study characteristics are listed in Table 2. No restrictions were placed on CT image acquisition protocols. Acquisition protocols are typically not standardized across imaging centers, although this does not hinder the conventional identification of radiologic features in clinical practice (25). Most of the studies of interest stored and analyzed images in Digital Imaging and Communications in Medicine (DICOM) format, which most available software can handle. CT is the first-line imaging examination for characterizing renal masses because of its good sensitivity and specificity (26, 27); as a result, CTTA is the most convenient form of texture analysis for renal masses in clinical practice. Note, however, that images from different scanners may increase the variability of feature values calculated from CT images, so a balance must be struck between a sufficient number of patients and data homogeneity (28).
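Because DICOM CT files store raw integer pixel values, a typical first preprocessing step is to convert them to Hounsfield units (HU) before any texture features are computed. The minimal sketch below assumes the two rescale values have already been read from the file header (with pydicom, for instance, as `ds.RescaleSlope` and `ds.RescaleIntercept`); the function name and example values are illustrative and are not taken from the reviewed studies.

```python
import numpy as np


def to_hounsfield(stored_values, rescale_slope, rescale_intercept):
    """Map stored CT pixel values to Hounsfield units (HU).

    DICOM defines a linear mapping: HU = stored * RescaleSlope + RescaleIntercept,
    with the two values stored per image in tags (0028,1053) and (0028,1052).
    They can differ between scanners and series, so read them from every file
    rather than hard-coding them.
    """
    stored = np.asarray(stored_values, dtype=np.float64)
    return stored * rescale_slope + rescale_intercept


# Example only: a slope of 1 and an intercept of -1024 are common, not universal.
hu = to_hounsfield([0, 1024, 1064], rescale_slope=1.0, rescale_intercept=-1024.0)
```

This conversion only puts all images on the same physical scale; harmonizing feature values across different scanners is a separate and harder problem.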
Table 1. Patient characteristics
|Study||Patients||Renal Masses||Age (Year)||Tumor Size (mm)|
|No. of Masses||No. of Fp-AML||Renal Cell Carcinoma||AML||RCC||AML||RCC|
|Hodgdon et al. (5)||100||100||16||51||13||20||0||53 ± 12||59 ± 13||18 ± 13||24 ± 9|
|Takahashi et al. (49)||153||172||24||98||36||14a||53 ± 14||60 ± 12||15 ± 7||21 ± 8|
|Feng et al. (31)||58||58||17||31||2||6||2||48.7 ± 10.8||56.2 ± 12.3||28 ± 9||32 ± 7|
|Cui et al. (30)||168||171||41||82||22||26||0||48.56 ± 12.90||55.27 ± 11.56 (cc); 49.27 ± 12.99 (p); 55.00 ± 11.80 (ch)||<40||<40|
|You et al. (33)||67||67||17||50||0||0||0||47.53 ± 2.76||53.32 ± 1.62||21.06 ± 11.32||24.66 ± 1.14|
|Deng et al. (51)||377||385||31||249||49||56||0||NM||59 ± 13||NM||45 ± 35|
|Varghese et al. (50)||147||147||18||85||23||21||0||NM||NM||NM||NM|
|Yan et al. (3)||48||50||18||18||14||0||0||44.5, range 26–61||53.9, range 36–79 (cc); 57.6, range 34–77 (p)||28.47, range 8–51||33.22, range 15–49 (cc); 33.09, range 14–51 (p)|
|Yang G. et al. (53)||58||58||32||0||0||24||0||50.38 ± 8.66||52.88 ± 10.86||NM||NM|
|Yang R. et al. (32)||163||163||45||95||10||13||0||48.6 ± 13.7||52.9 ± 13.1||25, range 21–33||29, range 24–33|
Table 2. Study characteristics
|Study||Year||Study Period||Institution||Pathology Method||Processors and Readers|
|Hodgdon et al. (5)||2015||January 2002–August 2013||The Ottawa Hospital||Surgical resection||2||3/8||NM|
|Takahashi et al. (49)||2015||January 2003–January 2011||Mayo Clinic||Surgical resection||1||13||Yes|
|Feng et al. (31)||2017||June 2013–September 2016||The Third Xiangya Hospital||Surgical resection||2||7/8||NM|
|Cui et al. (30)||2019||January 2008–September 2017||Jiangmen Central Hospital||Surgical resection||2||NM||NM|
|You et al. (33)||2019||November 2008–December 2010||Asan Medical Center||NM||1||6||NM|
|Varghese et al. (50)||2018||June 2009–June 2015||University of Southern California||Surgical resection||1||NM||NM|
|Yan et al. (3)||2015||January 2008–April 2014||Guangdong General Hospital||Biopsy or surgical resection||2||16/36||NM|
|Yang G. et al. (53)||2019||June 2009–January 2018||The Affiliated Hospital of Qingdao University||Surgical resection||2||8/20||NM|
|Deng et al. (51)||2019||October 2005–October 2016||Mayo Clinic||NM||1||15||Yes|
|Yang R. et al. (32)||2019||January 2012–December 2018||Guangzhou First People’s Hospital||NM||2||3/14||Yes|
Tumor Segmentation and Feature Extraction
Segmentation is critical because all subsequent feature data are generated from the region of interest (ROI) segmented from the surrounding tissues (25). Various open-source software tools have been developed to delineate ROIs. Although manual segmentation is time-consuming and a semiautomated approach has been shown to be faster and to reduce interobserver variability (29), most studies used manual segmentation performed by experienced radiologists. Notably, ROIs should be drawn 2–3 mm inside the tumor margin to minimize partial volume effects from the adjacent renal parenchyma and perinephric fat (3, 30, 31). Some studies delineated lesions on contrast-enhanced CT images and then applied and adjusted these ROIs on the other phases to obtain an accurate ROI for each phase, or used the enhanced images as a reference (5, 32).
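One way to implement the 2–3 mm inner margin is to erode a manually drawn mask by the corresponding number of pixels. The minimal sketch below assumes a 2D mask with isotropic in-plane pixel spacing and a disk-shaped structuring element; `shrink_roi` is a hypothetical helper, and established libraries offer the same operation (for example, `scipy.ndimage.binary_erosion`).

```python
import numpy as np


def shrink_roi(mask, margin_mm, pixel_spacing_mm):
    """Erode a 2D binary tumor mask inward by about `margin_mm`.

    Assumes isotropic in-plane spacing (mm per pixel). A pixel survives only if
    every pixel within a disk of radius r = margin / spacing is inside the
    original mask (binary erosion with a disk-shaped structuring element).
    """
    mask = np.asarray(mask, dtype=bool)
    r = int(round(margin_mm / pixel_spacing_mm))
    if r <= 0:
        return mask.copy()
    h, w = mask.shape
    # Pad with False so pixels near the image border cannot survive erosion.
    padded = np.pad(mask, r, mode="constant", constant_values=False)
    eroded = np.ones_like(mask)
    for dy in range(-r, r + 1):
        for dx in range(-r, r + 1):
            if dy * dy + dx * dx <= r * r:
                # Shifted view: eroded[i, j] &= mask[i + dy, j + dx]
                eroded &= padded[r + dy:r + dy + h, r + dx:r + dx + w]
    return eroded


# Toy example: a 20 x 20-pixel "tumor" on a 1 mm grid, shrunk by a 2 mm margin.
tumor = np.zeros((30, 30), dtype=bool)
tumor[5:25, 5:25] = True
core = shrink_roi(tumor, margin_mm=2.0, pixel_spacing_mm=1.0)
```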
The core of radiomics is the extraction of features that quantitatively describe the attributes of ROIs. A variety of commercial and open-source software packages are available for extracting features from delineated images: MaZda (3, 5), PyRadiomics (30, 32), in-house MATLAB software (33, 34), IBEX (35, 36), and TexRAD (37–39) have been used to compute quantitative texture parameters in ROIs. Texture analysis methods can be statistical-, model-, or transform-based; statistical methods, which describe the relationships among gray-level values within an image, are the most commonly used (40).
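Before statistical texture features are computed, the HU values inside an ROI are usually discretized into a limited number of gray levels, and this choice changes the resulting feature values, which is one reason results can differ between software packages. The sketch below shows two common schemes, a fixed bin width and a fixed number of bins; the 25 HU width and 32 bins are example values only, and the function names are hypothetical (PyRadiomics, for example, exposes a `binWidth` setting for the first scheme).

```python
import numpy as np


def discretize_fixed_bin_width(hu_values, bin_width=25.0):
    """Fixed-bin-width discretization: bin edges at multiples of `bin_width` HU.

    Gray levels are renumbered so that the lowest occupied bin is level 1.
    """
    bins = np.floor(np.asarray(hu_values, dtype=np.float64) / bin_width)
    return (bins - bins.min() + 1).astype(int)


def discretize_fixed_bin_count(hu_values, n_bins=32):
    """Fixed-bin-count discretization: split the ROI's own min-max range into `n_bins`."""
    hu = np.asarray(hu_values, dtype=np.float64)
    lo, hi = hu.min(), hu.max()
    if hi == lo:  # a perfectly flat ROI maps to a single level
        return np.ones(hu.shape, dtype=int)
    levels = np.floor(n_bins * (hu - lo) / (hi - lo)).astype(int) + 1
    return np.minimum(levels, n_bins)  # the maximum value belongs to the top bin
```

Roughly speaking, a fixed bin width keeps each gray level tied to the same HU interval across patients, whereas a fixed bin count adapts to each ROI's intensity range.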
Statistical features are subdivided into 3 types: first-, second-, and higher-order features. First-order features describe the gray-level distribution from the pixel intensity histogram of an ROI and include mean intensity, skewness, entropy, uniformity, threshold, kurtosis, and standard deviation (16). Second-order features describe the spatial relationship, or co-occurrence, of pixel values in the ROI and include entropy, contrast, energy, and homogeneity; the gray-level co-occurrence matrix and the gray-level run-length matrix are the 2 most common methods (40). Higher-order features, which analyze relationships among 3 or more pixels, are used less frequently.
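The first- and second-order features named above can be computed from a discretized ROI in a few lines. The sketch below assumes a 2D array of integer gray levels 1..`levels` that is treated as the whole ROI, a single pixel offset, and one common definition of each feature. Software packages differ (some report energy as the square root of the angular second moment, and some average over several directions), so this illustrates what the numbers measure rather than reproducing any particular package.

```python
import numpy as np


def first_order_features(levels):
    """Histogram-based (first-order) features of discretized gray levels."""
    x = np.asarray(levels, dtype=np.float64).ravel()
    mean, sd = x.mean(), x.std()
    _, counts = np.unique(x, return_counts=True)
    p = counts / counts.sum()  # normalized gray-level histogram
    return {
        "mean": float(mean),
        "sd": float(sd),
        "skewness": float(((x - mean) ** 3).mean() / sd ** 3) if sd > 0 else 0.0,
        "kurtosis": float(((x - mean) ** 4).mean() / sd ** 4) if sd > 0 else 0.0,  # non-excess
        "entropy": float(-(p * np.log2(p)).sum()),
        "uniformity": float((p ** 2).sum()),
    }


def glcm(image, levels, dy=0, dx=1):
    """Symmetric, normalized gray-level co-occurrence matrix for one offset.

    `image` holds integer gray levels 1..levels. Only pixel pairs
    (i, j) and (i + dy, j + dx) that both lie inside the image are counted.
    """
    img = np.asarray(image, dtype=int)
    h, w = img.shape
    m = np.zeros((levels, levels))
    for i in range(max(0, -dy), min(h, h - dy)):
        for j in range(max(0, -dx), min(w, w - dx)):
            a, b = img[i, j] - 1, img[i + dy, j + dx] - 1
            m[a, b] += 1
            m[b, a] += 1  # count each pair in both directions
    return m / m.sum()


def glcm_features(p):
    """Second-order features from a normalized GLCM `p`."""
    i, j = np.indices(p.shape)
    nz = p[p > 0]  # skip empty cells so log2 is defined
    return {
        "contrast": float((p * (i - j) ** 2).sum()),
        "energy": float((p ** 2).sum()),  # angular second moment
        "homogeneity": float((p / (1.0 + np.abs(i - j))).sum()),
        "entropy": float(-(nz * np.log2(nz)).sum()),
    }


# Toy ROIs: a flat patch versus a maximally alternating checkerboard.
flat = np.ones((4, 4), dtype=int)
checker = (np.indices((4, 4)).sum(axis=0) % 2) + 1
```

The flat patch yields zero entropy and contrast, while the checkerboard yields maximal two-level entropy and contrast, which is the kind of difference the entropy-based findings in these studies rely on.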
Feature Selection, Statistical Analysis, Modeling, and Classification
The number of features extracted from ROIs can be large, and the features do not contribute equally; some are not even relevant to differentiating fp-AML from RCC. Feature selection is therefore essential for choosing the optimal features and avoiding overfitting, which degrades performance on new data. Because the number of texture features calculated from images is generally much larger than the number of patients, dimensionality reduction is important to reduce the risk of type I errors and overfitting (41).
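A simple first step in dimensionality reduction is to discard near-duplicate features, because many texture features are strongly correlated with one another. The sketch below is a greedy correlation filter under assumed settings (Pearson correlation, a 0.9 cutoff); it is one of many possible selection strategies, not the method of any study reviewed here, and it should be fitted on training data only so that no information leaks from the test set.

```python
import numpy as np


def drop_correlated(X, names, threshold=0.9):
    """Greedy redundancy filter over feature columns.

    A column is kept only if its absolute Pearson correlation with every column
    kept so far is below `threshold`. Columns are visited in the given order, so
    sort them by relevance first if the more informative member of a correlated
    pair should survive. Remove zero-variance columns beforehand (their
    correlations are undefined).
    """
    X = np.asarray(X, dtype=float)
    corr = np.abs(np.corrcoef(X, rowvar=False))
    kept = []
    for k in range(X.shape[1]):
        if all(corr[k, m] < threshold for m in kept):
            kept.append(k)
    return [names[k] for k in kept]


# Synthetic demo: "b" is a rescaled near-copy of "a"; "c" is independent.
rng = np.random.default_rng(0)
a = rng.normal(size=200)
b = 2.0 * a + 0.01 * rng.normal(size=200)
c = rng.normal(size=200)
selected = drop_correlated(np.column_stack([a, b, c]), ["a", "b", "c"])
```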
In contrast to traditional statistical methods, machine learning classifiers were used to process the data. Machine learning can be defined as a set of methods that enable computers to make predictions based on past experience. As a branch of artificial intelligence, it has advanced rapidly in the past decade with the growth of computational resources. In fields other than medicine, such as natural language processing and traffic volume forecasting, machine learning plays a central role, and it has also been applied to difficult medical tasks such as diagnosis, prognosis, and prediction of therapeutic response (42–47). This objective technique is free of reader subjectivity and can process very large volumes of medical data. Although it helps radiologists deal with complex problems, it has attracted little attention from urologists. The final goal of machine learning in these studies is to obtain an effective diagnostic model that combines multiple relevant parameters to differentiate fp-AML from RCC with high accuracy. Support vector machine (SVM) is the most common method in these studies, along with logistic regression (LR), k-nearest neighbors, and random forest. Regardless of the method, model performance was generally evaluated by receiver operating characteristic (ROC) curve analysis and accuracy (48).
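The evaluation metrics mentioned above are straightforward to compute. The sketch below estimates the area under the ROC curve (AUC) through its rank-based (Mann–Whitney) interpretation, the probability that a randomly chosen RCC receives a higher score than a randomly chosen fp-AML, and derives sensitivity, specificity, and accuracy from binary predictions. Coding RCC as the positive class (label 1) is an assumption made for illustration, and the pairwise loop is quadratic but adequate for cohorts of the size shown in Table 1.

```python
def roc_auc(scores, labels):
    """AUC as P(score of a random positive > score of a random negative).

    labels: 1 = RCC (positive), 0 = fp-AML (negative); ties count one half.
    This rank-based estimate equals the area under the empirical ROC curve.
    """
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    wins = sum(1.0 if p > q else 0.5 if p == q else 0.0 for p in pos for q in neg)
    return wins / (len(pos) * len(neg))


def classification_metrics(predicted, labels):
    """Sensitivity, specificity, and accuracy for binary predictions (1 = RCC)."""
    pairs = list(zip(predicted, labels))
    tp = sum(1 for p, y in pairs if p == 1 and y == 1)
    tn = sum(1 for p, y in pairs if p == 0 and y == 0)
    fp = sum(1 for p, y in pairs if p == 1 and y == 0)
    fn = sum(1 for p, y in pairs if p == 0 and y == 1)
    return {
        "sensitivity": tp / (tp + fn),
        "specificity": tn / (tn + fp),
        "accuracy": (tp + tn) / len(pairs),
    }
```

Accuracy depends on class balance: in Deng et al.'s cohort (31 fp-AMLs among 385 masses in Table 1), labeling every mass as RCC would already give about 92% accuracy (354/385) with 0% specificity, which is why sensitivity, specificity, and AUC are reported alongside it.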
The results of univariate and multivariate analyses are listed in Table 3.
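Table 3. Results of univariate and multivariate analyses, with the best performance of models (SEN, SPE, ACC, AUC)
|Study||Phases||Segmentation||Extraction||Machine Learning||Discriminative Features||SEN||SPE||ACC||AUC|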
|Hodgdon et al. (5)||UN||Manually||MaZda, version 4.6||||Mean gray-level, angular second moment, gray-level entropy, sum entropy, and sum average||88% (LR)||75% (LR)|
|Takahashi et al. (49)||UN CE-CT||NM||Matlab (MathWorks)||LR||Entropy||50%||98%||NM||0.943|
|Feng et al. (31)||UN CMP NP||Manually||||SVM||Skewness, mean, median, 10th, 25th, 75th, and 90th percentiles (UP), energy and entropy (UN, CMP, and NP)||87.8%||100%||93.9%||0.955|
|Cui et al. (30)||UN CMP NP||Manually||PyRadiomics (version 3.6.5)||SVM||NM||89.23%||96.15%||92.69%||0.96|
|You et al. (33)||UN CMP NP EP||Manually||Matlab (MathWorks)||SVM||Mean (UN), SD, homogeneity, dissimilarity, energy, and entropy (CMP)||82%||76%||85%||0.85|
|Varghese et al. (50)||UN CMP NP EP||Manually||Matlab (MathWorks)||LR||NM||NM||NM||NM||0.95−0.98|
|Yan et al. (3)||UN CMP NP||Manually||MaZda, version 4.6||kNN, artificial neural network classifier||NM||NM||NM||90.7%–100%||NM|
|Yang G. et al. (53)||CMP NP EP||Manually||Radiomics cloud platform V2.1.2||LASSO||NM||93.75%||79.17%||87.5%||0.915|
|Deng et al. (51)||Portal venous phase||Manually||TexRAD, version 3.9||LR||Entropy, maximum positive pixel||33%||97%||NM||0.658|
|Yang R. et al. (32)||UN CMP NP EP||Manually||PyRadiomics||SVM, LR, Random forest, Bagging||90th percentile, mean, median, root mean squared, skewness, IMC1, IMC2, GLN, and SZN||0.83||0.82||0.82||0.90|
Abbreviations: NM, not mentioned; UN, unenhanced; CE, contrast-enhanced; CMP, corticomedullary phase; NP, nephrographic phase; EP, excretory phase; SD, standard deviation; LR, logistic regression; SVM, support vector machine; kNN, k-nearest neighbor; LASSO, least absolute shrinkage and selection operator; SEN, sensitivity; SPE, specificity; ACC, accuracy; AUC, area under the curve; GLCM, gray-level co-occurrence matrix; GLSZM, gray-level size zone matrix; IMC1 and IMC2, informational measures of correlation 1 and 2 of the GLCM texture feature; GLN, gray-level nonuniformity of the GLSZM texture feature; SZN, size zone nonuniformity of the GLSZM texture feature.
Univariate analysis of both first- and second-order features was performed with traditional statistical methods. Despite the variability in texture analysis methods, entropy showed promising results in differentiating fp-AML from RCC (5, 31, 33, 49–51). Hodgdon et al. analyzed unenhanced CT images and reported that RCC had higher entropy than fp-AML (P ≤ .01) (5). You et al. reported a similar result, finding higher entropy and dissimilarity and lower energy and homogeneity in clear cell RCC in the corticomedullary phase (33). Deng et al. observed that entropy >5.62 had a high specificity of 85.7% but a sensitivity of only 31.3% for predicting RCC (51). These studies suggest that higher lesion entropy is a strong predictor of RCC, as greater entropy was consistently observed in RCC than in fp-AML. Entropy measures the complexity or disorder of an image and reflects tumor heterogeneity (31). In addition to higher entropy, RCC showed higher dissimilarity and lower homogeneity.
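Single-feature cutoffs such as the entropy threshold above are often chosen by maximizing Youden's J (sensitivity + specificity − 1). The sketch below scans the observed values as candidate cutoffs for the rule "value > cutoff → RCC"; the entropy values in the example are invented for illustration, and a cutoff that is both chosen and evaluated on the same data gives optimistic estimates.

```python
def best_cutoff_youden(values, labels):
    """Return the cutoff maximizing Youden's J for the rule 'value > cutoff -> RCC'.

    labels: 1 = RCC (positive), 0 = fp-AML (negative). When several cutoffs tie
    on J, the lowest one encountered is kept.
    """
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    best = None
    for cutoff in sorted(set(values)):
        tp = sum(1 for v, y in zip(values, labels) if v > cutoff and y == 1)
        tn = sum(1 for v, y in zip(values, labels) if v <= cutoff and y == 0)
        sens, spec = tp / n_pos, tn / n_neg
        j = sens + spec - 1
        if best is None or j > best["youden_j"]:
            best = {"cutoff": cutoff, "youden_j": j,
                    "sensitivity": sens, "specificity": spec}
    return best


# Hypothetical entropy values (not from any study): 4 fp-AMLs, then 4 RCCs.
entropy = [4.8, 5.0, 5.1, 5.3, 5.2, 5.6, 5.8, 6.0]
label = [0, 0, 0, 0, 1, 1, 1, 1]
result = best_cutoff_youden(entropy, label)
```

With a prespecified cutoff, such as one taken from a previous study, sensitivity and specificity are simply computed at that value instead of searched for.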
Multivariate analysis was performed to differentiate RCC from fp-AML more accurately. Machine learning, the core of artificial intelligence, is widely applied for this purpose, with the goal of obtaining a classifier or model with high diagnostic accuracy. LR is a popular classifier among these models because of its simplicity and wide use by researchers (52). Hodgdon et al. established an LR model combining the top 3 texture features per session, yielding an AUC of 0.89 ± 0.04, which was significantly greater than 0.5; the sensitivity and specificity for identifying RCC ranged from 87% to 93% and from 63% to 75%, respectively (5). Yang G. et al. applied least absolute shrinkage and selection operator (LASSO) LR to develop a 2D texture model (AUC, 0.811; 95% CI, 0.695–0.927) and a 3D texture model (AUC, 0.915; 95% CI, 0.838–0.993), which showed good discrimination and calibration in distinguishing fp-AML from clear cell RCC (53). Takahashi et al. built an LR model with entropy, demographic data, shape features, and subjective heterogeneity and differentiated small fp-AMLs from RCC with a sensitivity of 50% and a specificity of 98% (49). In addition to LR, SVM was widely used in the studies included in this review. Feng et al. developed an SVM classifier with 11 features selected by the SVM-recursive feature elimination (SVM-RFE) method and achieved the highest accuracy, sensitivity, specificity, and AUC (93.9%, 87.8%, 100%, and 0.955, respectively) in differentiating fp-AML from RCC (31). Lee et al. showed that a model combining Relief feature selection and an SVM classifier achieved an accuracy, sensitivity, specificity, and AUC of 72.1% ± 4.2%, 71.0% ± 5.1%, 73.2% ± 6.1%, and 0.717 ± 0.045, respectively (54). k-nearest neighbors, random forest, and nonlinear discriminant analysis were also applied and reported to be reliable methods for differentiating fp-AML from RCC (3, 54).
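LASSO LR combines a logistic model with an L1 penalty that shrinks uninformative coefficients to exactly zero, so feature selection and model fitting happen together. The sketch below is a generic NumPy implementation using proximal gradient descent on synthetic data, with assumed settings for the penalty `lam`, step size, and iteration count; it is not the pipeline of Yang G. et al. or any other reviewed study. In practice the penalty would be tuned by cross-validation, typically with an established library such as scikit-learn's `LogisticRegression(penalty="l1", solver="liblinear")`.

```python
import numpy as np


def standardize(X):
    """Z-score each column (in practice, reuse the training-set mean and SD)."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0)


def lasso_logistic(X, y, lam=0.05, step=0.1, n_iter=2000):
    """L1-penalized (LASSO) logistic regression via proximal gradient descent.

    Minimizes mean log-loss + lam * sum(|w|); the intercept b is not penalized.
    The soft-thresholding step sets weak coefficients exactly to zero, which is
    what lets LASSO act as a feature selector.
    """
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    n, d = X.shape
    w, b = np.zeros(d), 0.0
    for _ in range(n_iter):
        p = 1.0 / (1.0 + np.exp(-(X @ w + b)))  # predicted probability of RCC
        w = w - step * (X.T @ (p - y)) / n       # gradient step on the log-loss
        b = b - step * float(np.mean(p - y))
        w = np.sign(w) * np.maximum(np.abs(w) - step * lam, 0.0)  # proximal L1 step
    return w, b


def predict_proba(X, w, b):
    """Predicted probability of the positive class (RCC) for each row of X."""
    return 1.0 / (1.0 + np.exp(-(np.asarray(X, dtype=float) @ w + b)))


# Synthetic demo: only feature 0 carries signal; features 1-4 are noise.
rng = np.random.default_rng(1)
X = standardize(rng.normal(size=(200, 5)))
y = (X[:, 0] + 0.3 * rng.normal(size=200) > 0).astype(int)
w, b = lasso_logistic(X, y)
accuracy = np.mean((predict_proba(X, w, b) > 0.5) == y)
```

Because the L1 penalty depends on feature scale, standardizing before fitting matters, and reported performance should come from data used neither for fitting nor for choosing the penalty.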
Both unenhanced and enhanced CT images were used in these studies. Hodgdon et al. restricted their analysis to unenhanced CT images because few studies had examined the effect of iodinated contrast material on texture analysis (5); texture differences extracted from unenhanced images are independent of contrast effects. Similarly, Cui et al. found that unenhanced images performed best in single-phase texture analysis for differentiating fp-AML from RCC and made significant contributions in the 3-phase group (30). In addition, some studies (31–33) found significant differences in univariate analysis of the unenhanced phase (P < .05). Models based on unenhanced CT images alone could reduce radiation exposure and benefit patients with renal insufficiency, who may not tolerate contrast material.
Tumor heterogeneity, which is difficult to quantify with traditional imaging methods, has been shown to be greater in malignant than in benign tumors (55). Although the heterogeneity of a renal mass is a crucial feature for differentiating RCC from fp-AML (7), subjective assessment of heterogeneity depends heavily on reader experience and lacks reproducibility. Recent studies suggested that objective quantification of heterogeneity with standard deviation, entropy, and uniformity helps differentiate AML from RCC (56), which is consistent with the results of the univariate analyses included in this review. Hodgdon et al. reported that lower lesion homogeneity and higher lesion entropy were biomarkers of RCC (5). Yang R. et al. reported the top 3 ranked texture features extracted from ROIs; 2 of them, gray-level nonuniformity and size zone nonuniformity, are markers of tumor heterogeneity and indicated that fp-AML was more homogeneous than RCC (32). Similar results were obtained in other studies (31, 33, 50). Tumor heterogeneity is a feature of malignancy, and increased lesion heterogeneity is likely related to tumor angiogenesis, cellular infiltration, and areas of necrosis (5, 55). These findings are supported by histological evidence that the internal components of fp-AML are more regular than those of RCC, with less cell proliferation and lower invasive potential (32).
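Gray-level nonuniformity (GLN) and size zone nonuniformity (SZN) are computed from the gray-level size zone matrix (GLSZM), which counts connected zones of equal gray level by level and by zone size. The sketch below builds the zone list for a single discretized 2D slice with 8-connectivity and evaluates the non-normalized forms GLN = Σᵢ(Σⱼ P(i, j))² / N_z and SZN = Σⱼ(Σᵢ P(i, j))² / N_z, where N_z is the number of zones; to our understanding these match the PyRadiomics definitions. Masking, 3D zones, and normalized variants are left out, and the function names are hypothetical.

```python
from collections import Counter, deque

import numpy as np


def glszm_zones(image):
    """Return (gray_level, zone_size) for each 8-connected zone of equal level in 2D."""
    img = np.asarray(image)
    h, w = img.shape
    seen = np.zeros((h, w), dtype=bool)
    zones = []
    for si in range(h):
        for sj in range(w):
            if seen[si, sj]:
                continue
            level, size = img[si, sj], 0
            seen[si, sj] = True
            queue = deque([(si, sj)])
            while queue:  # breadth-first flood fill of one zone
                i, j = queue.popleft()
                size += 1
                for di in (-1, 0, 1):
                    for dj in (-1, 0, 1):
                        ni, nj = i + di, j + dj
                        if (0 <= ni < h and 0 <= nj < w
                                and not seen[ni, nj] and img[ni, nj] == level):
                            seen[ni, nj] = True
                            queue.append((ni, nj))
            zones.append((int(level), size))
    return zones


def gln_szn(zones):
    """Non-normalized gray-level and size-zone nonuniformity from a zone list."""
    n_zones = len(zones)
    zones_per_level = Counter(level for level, _ in zones)  # GLSZM row sums
    zones_per_size = Counter(size for _, size in zones)     # GLSZM column sums
    gln = sum(c * c for c in zones_per_level.values()) / n_zones
    szn = sum(c * c for c in zones_per_size.values()) / n_zones
    return gln, szn


example = [[1, 1, 2],
           [1, 2, 2],
           [3, 3, 3]]
zones = glszm_zones(example)
```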
Intralesional fat on CT or MRI is the typical characteristic of AML. However, some tumors contain too little fat to be detected, which makes them difficult to differentiate from RCC (57). In the past decade, texture analysis, an emerging branch of radiomics, has shown promising potential for distinguishing malignant from benign tumors. The technique has developed rapidly with the increasing digitalization of hospitals, progress in image acquisition protocols, and easier access to picture archiving and communication systems. It provides objective, automated extraction of quantitative features from images, unlike traditional radiologic assessment, which depends heavily on subjective visual interpretation and the expertise of radiologists and urologists (58). In addition, texture analysis can help diagnose both common and rare tumors and can even differentiate benign from malignant lymph nodes in patients with primary lung cancer (58–63). In the past 5 years, attention has turned to applying this technique to differentiate fp-AML from RCC. The articles included in this review used multiple feature extraction and classification methods, and all achieved relatively satisfactory results (AUC > 0.5).
Despite their promising future, texture analysis and machine learning face problems that must be solved before they can support clinical decision-making (25, 64). Clinical implementation is limited mostly by the lack of standards and reproducibility. Reproducibility is one of the foundations of scientific research, and nonreproducible results waste researchers' time and money (65). The variety of CT scanners, the different methods of delineating ROIs (manual, semiautomatic, or automatic), and the heterogeneity of the software (commercial, open-source, or developed in-house) used to extract and process features may all contribute to this problem. Every step of the texture analysis workflow should be standardized to achieve more reliable and convincing results. Recent efforts have aimed to standardize feature definitions and workflows, and studies have assessed the reliability and stability of features to improve reproducibility (66–69).
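Feature reproducibility between readers or repeated segmentations is commonly summarized with the intraclass correlation coefficient (ICC). The sketch below implements the two-way random-effects, absolute-agreement, single-rater form, ICC(2,1) in Shrout and Fleiss's notation, from ANOVA mean squares. Which ICC form is appropriate depends on the study design, and the example ratings are invented.

```python
import numpy as np


def icc_2_1(ratings):
    """ICC(2,1): two-way random effects, absolute agreement, single rater.

    `ratings` is an (n_subjects, k_raters) array, e.g. one texture feature
    computed from ROIs that k readers drew independently on the same n lesions.
    """
    x = np.asarray(ratings, dtype=float)
    n, k = x.shape
    grand = x.mean()
    ss_rows = k * ((x.mean(axis=1) - grand) ** 2).sum()   # between lesions
    ss_cols = n * ((x.mean(axis=0) - grand) ** 2).sum()   # between readers
    ss_err = ((x - grand) ** 2).sum() - ss_rows - ss_cols  # residual
    ms_rows = ss_rows / (n - 1)
    ms_cols = ss_cols / (k - 1)
    ms_err = ss_err / ((n - 1) * (k - 1))
    return (ms_rows - ms_err) / (ms_rows + (k - 1) * ms_err + k * (ms_cols - ms_err) / n)


# Invented example: in `offset`, reader 2 is systematically 1 unit higher.
same = icc_2_1([[1, 1], [2, 2], [3, 3], [4, 4]])
offset = icc_2_1([[1, 2], [2, 3], [3, 4], [4, 5]])
```

A consistency-type ICC such as ICC(3,1) would equal 1.0 for the offset example, which is why the agreement form matters when readers, scanners, or software shift feature values systematically.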
Another limitation is the relatively small sample size. Without universally standardized workflows, large centralized data repositories, or image data-sharing methods, research groups work in isolation, which limits the data available to each. Moreover, type I errors and overfitting may be difficult to avoid with small samples. It has been suggested that statistical corrections such as the Holm–Bonferroni sequential correction be applied and that the sample size be 5–10 times the number of texture features analyzed to reduce these problems (40, 70).
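The Holm–Bonferroni correction mentioned above is a short step-down procedure: sort the p-values, compare the smallest with α/m, the next with α/(m − 1), and so on, stopping at the first comparison that fails. The sketch below is a plain implementation of that procedure; the p-values in the example are invented.

```python
def holm_bonferroni(p_values, alpha=0.05):
    """Holm's step-down procedure controlling the family-wise error rate.

    Returns one reject flag per hypothesis, in the original order.
    """
    m = len(p_values)
    order = sorted(range(m), key=lambda i: p_values[i])  # smallest p first
    reject = [False] * m
    for rank, i in enumerate(order):
        if p_values[i] <= alpha / (m - rank):
            reject[i] = True
        else:
            break  # stop at the first failure; all larger p-values are retained
    return reject


# Invented p-values for four texture features.
flags = holm_bonferroni([0.01, 0.04, 0.03, 0.005])
```

Under the 5–10 times rule of thumb cited above, a cohort of 58 lesions (the size of the Feng et al. and Yang G. et al. cohorts in Table 1) would support roughly 5 to 11 features (58/10 ≈ 5.8 and 58/5 ≈ 11.6).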
Most of the included studies were retrospective (case–control) studies, a design known to sometimes overestimate diagnostic sensitivity and specificity and to introduce bias (71). Severe bias can lead to erroneous results and incorrect conclusions. Hence, well-designed prospective studies are needed to confirm the results of studies differentiating fp-AML from RCC. Moreover, most texture analysis studies report only correlations between features and outcomes, which do not imply causation (72).
Despite the limitations discussed above, we can tentatively conclude that CTTA can be useful for differentiating fp-AML from RCC on both unenhanced and enhanced CT. Texture features that showed promising potential, such as entropy, may serve as quantitative, noninvasive, and effective imaging biomarkers. The high accuracy of models built with machine learning methods using open-source software or algorithms is encouraging for future imaging studies. However, limitations remain, and universally accepted standards need to be established. Before widespread clinical implementation, this technology requires further validation on a larger scale.