Classification of Colon and Lung Cancer Through Analysis of Histopathology Images Using Deep Learning Models

Mallela Siva Naga Raju, Battula Srinivasa Rao


Corresponding Author Email: msivanaga996@gmail.com

Page: 967-971 | DOI: https://doi.org/10.18280/isi.270613

Received: 8 August 2022 | Revised: 1 October 2022 | Accepted: 13 October 2022 | Available online: 31 December 2022

© 2022 IIETA. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

Abstract: 

In the last four decades, medicine and healthcare have made revolutionary advances: the true causes of many diseases have been discovered, new diagnostic procedures have been devised, and new remedies have been invented. Globally, cancer has become one of the most serious and widespread medical issues. Credible early detection is especially important to reduce the risk of death, yet it is a difficult task that relies on the expertise of histopathologists; an unprepared histopathologist may put a patient's life in danger. Deep learning has recently attracted considerable attention and is widely applied to medical image analysis, and Artificial Intelligence (AI) can be used to automate cancer detection. To improve classification and the visual interpretation of histopathology images, the GradCam and SmoothGrad visualization techniques are applied. This objective is achieved by evaluating histopathological images of five types of colon and lung tissues using the MobileNetV2 and InceptionResnetV2 models. The proposed models identify cancer tissues with an accuracy of up to 99.95%. These models can assist medical professionals in developing an automated and reliable system for detecting different types of colon and lung cancers.

Keywords: 

histopathology images, visualization techniques, deep learning models, image processing techniques

1. Introduction

Cancer is caused by the formation of abnormal cells in the body as a result of random mutations. These abnormal cells divide rapidly and spread throughout the body. Cancer is the second leading cause of death worldwide, after cardiovascular disease. In 2020, approximately 18 million new cancer cases and 9.55 million deaths were reported worldwide [1-3]. The brain, lungs, breasts, liver, rectum, colon, stomach, skin, and prostate are the organs most commonly affected in both men and women [4-6]. Behavioral factors such as high BMI, cigarette and alcohol use, physical carcinogens such as UV rays and radiation, and biological and genetic carcinogens are all known to cause cancer [7]. Remarkably, solutions to this problem lie within the health sciences, a domain that has advanced considerably in recent years compared with other scientific and engineering disciplines. Machine learning has numerous applications in pathology, ranging from disease diagnosis to intelligent systems that can prescribe conventional drugs based on a patient's symptoms [8]. The latter field is still in its infancy, and much more research is required before such applications can be used in clinical settings. Nevertheless, it demonstrates the potential of AI in the medical field in the future. Machine learning methods are used for the classification and prediction of a variety of biological signals, and Deep Learning algorithms (DLs) are used to process images and videos [9]. Deep Learning implements artificial neural networks to improve pattern recognition, and medical diagnosis has clearly taken on a new dimension as a result of AI. This research describes the results of such an effort: Convolutional Neural Network (CNN) pretrained models are used to classify five different types of colon and lung tissues. The results show that the models can correctly classify lung and colon cancer.

2. Literature Review

In 2018, a method for predicting lung cancer from images gathered from numerous sources was proposed based on glowworm swarm optimization (GSO) [10], achieving a maximum accuracy of 97% with a Recurrent Neural Network (RNN). Using a dataset of over 50,500 CT scan images, a CNN-based method for lung cancer detection was developed and tested [11]. Cancerous nodules in the lung were detected using ResNet50, a CNN-based learning system [12]. Among learning methods such as transfer learning, ImageNet-pretrained models, MobileNet, Xception, and InceptionV3, a computer-aided diagnosis (CAD) method was used for detecting and categorizing lung cancer stages [13]. The researchers tested CNN and DFCNet models on six different datasets and developed an RF-based classification algorithm to predict colon cancer from histopathological images [14]. After converting RGB images to the HSV plane, wavelet decomposition was used to extract features, which achieved an 85.4% classification accuracy. A Faster R-CNN based system was used to detect colon cancer [15]. For classification and regression losses, an approximate joint optimization technique was employed, achieving 96% accuracy in detecting polyps in colonoscopy images [16]. An automatic cancer screening of CT scan images in 2019 [17] used smoothing normalization and Wolf heuristic feature selection to de-noise the images; using a Discrete AdaBoost optimized Ensemble Learning Generalized Neural Network (DAELGNN), a classification accuracy of 97% was achieved. FTIR spectroscopic data was used to classify the risk of colon cancer in one study [18], which achieved a classification accuracy of 95.71%. In 2020, a colon cancer diagnosis technique based on CNN and ROI feature learning was used. To increase the sample size, Generative Adversarial Networks (GANs) were used to create new CT scan images from the LIDC and IDRI databases, resulting in 93.9% accuracy [19] with CNN-based classification algorithms. A lung nodule detection algorithm based on a light CNN architecture was implemented for CT scan images [20]. Building on the above references, a system is proposed that uses a larger set of histopathology images, extracts a greater number of features, and enhances image quality, with MobileNetV2 and InceptionResnetV2 models for the effective classification of colon and lung cancer.

3. Visualization Techniques

Activation maps are useful for highlighting the important areas of an image where there is a lot of activity, and they give users confidence that the model is working as intended. Class activation and saliency maps are visualized using the GradCam and SmoothGrad techniques; saliency maps include SmoothGrad and Vanilla gradients [21]. Class activation maps use gradients of the output with respect to the input image to show how the output value changes with its corresponding inputs [22-24]. Methods such as GradCam and ScoreCam rely on the penultimate (last convolutional) layer to recover the spatial information that is lost in dense layers. To classify cancerous and non-cancerous colon and lung images in the LC25000 dataset using the MobileNetV2 and InceptionResnetV2 models, Figure 1 depicts the visualization of colon and lung cancer images using GradCam and SmoothGrad gradient-based class activation and saliency maps.

3.1 GradCam

GradCam makes use of the gradients of a target concept that flow into the CNN's final convolutional layer.

Calculating the gradient of the class score $y^c$ with respect to the feature maps $A^k$ of a convolutional layer, i.e. $\frac{\partial y^c}{\partial A^k}$, results in a localization map $L^c_{\text{Grad-CAM}} \in \mathbb{R}^{u \times v}$ of width $u$ and height $v$ for any class $c$ [25].

$\alpha_k^c=\frac{1}{Z} \sum_i \sum_j \frac{\partial y^c}{\partial A_{ij}^k}$       (1)

$L^c_{\text{Grad-CAM}}=\mathrm{ReLU}\left(\sum_k \alpha_k^c A^k\right)$     (2)
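To make Eqs. (1) and (2) concrete, the following is a minimal Grad-CAM sketch for a tf.keras classifier. It is an illustrative implementation rather than the authors' code; the name of the final convolutional layer differs between MobileNetV2 and InceptionResnetV2 and is assumed to be supplied by the caller (it can be read from `model.summary()`).

```python
import tensorflow as tf

def grad_cam(model, image, last_conv_layer_name, class_index=None):
    """Compute a Grad-CAM heatmap (Eqs. 1-2) for one preprocessed image.

    model: a tf.keras classification model.
    image: array of shape (H, W, 3), already preprocessed for the model.
    last_conv_layer_name: name of the final convolutional layer (assumed known).
    """
    # Model that maps the input to the last conv feature maps A^k and the class scores y^c.
    grad_model = tf.keras.models.Model(
        model.inputs,
        [model.get_layer(last_conv_layer_name).output, model.output],
    )

    x = tf.expand_dims(tf.convert_to_tensor(image, dtype=tf.float32), axis=0)
    with tf.GradientTape() as tape:
        conv_maps, predictions = grad_model(x)
        if class_index is None:
            class_index = int(tf.argmax(predictions[0]))
        class_score = predictions[:, class_index]          # y^c

    # dy^c / dA^k, global-average-pooled over spatial positions -> alpha_k^c (Eq. 1)
    grads = tape.gradient(class_score, conv_maps)
    alphas = tf.reduce_mean(grads, axis=(0, 1, 2))         # one weight per channel k

    # Weighted sum of feature maps followed by ReLU (Eq. 2)
    heatmap = tf.reduce_sum(conv_maps[0] * alphas, axis=-1)
    heatmap = tf.nn.relu(heatmap)
    heatmap /= (tf.reduce_max(heatmap) + 1e-8)             # normalise to [0, 1]
    return heatmap.numpy()
```

The resulting heatmap can then be resized to the input resolution and overlaid on the histopathology image, as in Figure 1.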

3.2 SmoothGrad

Saliency maps can be visualized using Vanilla gradients or SmoothGrad. SmoothGrad is used to reduce noise in the maps: it adds noise to copies of the input image and averages the resulting gradients in order to improve the saliency maps.

The final classification class($x$) is derived after computing a class score $S_c(x)$ for each class $c \in C$. For an input image $x$, the predicted class class($x$) is:

$\operatorname{class}(x)=\operatorname{argmax}_{c \in C} S_c(x)$     (3)

Differentiating $S_c$ w.r.t. input $x$ yields the sensitivity map $M_c(x)$.

$M_c(x)=\frac{\partial S_c(x)}{\partial x}$     (4)

The gradient of $S_c$ has been shown to fluctuate rapidly. Sensitivity maps are therefore improved by smoothing $\partial S_c$ with a Gaussian kernel, i.e. by taking a neighbourhood average of gradient values. For an input image $x$, the smoothed gradient $\hat{M}_c(x)$ is represented by:

$\hat{M}_c(x)=\frac{1}{n} \sum_{i=1}^{n} M_c\left(x+\mathcal{N}\left(0, \sigma^2\right)\right)$    (5)

where $n$ denotes the number of samples, $\mathcal{N}(0, \sigma^2)$ denotes Gaussian noise with standard deviation $\sigma$, and $M_c$ denotes the unsmoothed sensitivity map.
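The averaging in Eq. (5) can be sketched as follows for a tf.keras model; this is an illustrative implementation rather than the authors' code, and the number of samples `n` and the noise level `sigma` are assumed values.

```python
import tensorflow as tf

def smoothgrad(model, image, class_index, n=25, sigma=0.15):
    """Average sensitivity maps M_c over n noisy copies of the input (Eq. 5)."""
    image = tf.convert_to_tensor(image, dtype=tf.float32)
    # Scale sigma to the dynamic range of the image (an assumption of this sketch).
    noise_std = sigma * float(tf.reduce_max(image) - tf.reduce_min(image))

    accumulated = tf.zeros_like(image)
    for _ in range(n):
        noisy = image + tf.random.normal(tf.shape(image), stddev=noise_std)
        noisy = tf.expand_dims(noisy, axis=0)
        with tf.GradientTape() as tape:
            tape.watch(noisy)
            score = model(noisy)[:, class_index]          # S_c(x + N(0, sigma^2))
        accumulated += tape.gradient(score, noisy)[0]     # M_c, Eq. (4)

    smoothed = accumulated / n                            # Eq. (5)
    # Collapse the colour channels to one saliency value per pixel.
    return tf.reduce_max(tf.abs(smoothed), axis=-1).numpy()
```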

Figure 1. Visualization of colon and lung cancer histopathology images using GradCam and SmoothGrad

4. Proposed Methodology and Results

The Convolutional Neural Network (CNN) is a Deep Learning architecture that assigns importance to various aspects of an input image and is used to differentiate one image from another based on its characteristics. In this system, two convolutional 2D layers with ReLU activation are used, followed by two dense (fully connected) layers: the first dense layer uses ReLU activation and the second uses sigmoid activation. Figure 2 depicts the architecture of the proposed methodology. Here, MobileNetV2 and InceptionResnetV2 are used to classify colon and lung cancer histopathology images; these two pretrained models are applied to the 25,000 images of the colon and lung cancer dataset and achieve the best classification accuracy. The InceptionResnetV2 model attains 99.56% precision, 99.56% recall, a 99.65% F1-score, and 99.86% overall accuracy, while the MobileNetV2 model attains 99.75% precision, 99.76% recall, a 99.75% F1-score, and 99.96% overall accuracy. Compared with existing methods, these two pretrained models generate the best results.

Figure 2. Basic architecture of the proposed methodology
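As a rough illustration of the head described above (two convolutional 2D layers with ReLU, followed by a ReLU dense layer and a sigmoid dense layer), a minimal Keras sketch is shown below. The filter counts, kernel sizes, pooling layers, and the 224×224 input size are illustrative assumptions, not values reported in this paper.

```python
import tensorflow as tf
from tensorflow.keras import layers

NUM_CLASSES = 5  # five colon/lung tissue types in LC25000

# Illustrative classification head following the description above; layer sizes
# are assumptions for the sketch rather than the authors' configuration.
head = tf.keras.Sequential([
    tf.keras.Input(shape=(224, 224, 3)),
    layers.Conv2D(32, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(64, (3, 3), activation="relu"),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(128, activation="relu"),
    layers.Dense(NUM_CLASSES, activation="sigmoid"),  # sigmoid output, as described
])
head.summary()
```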

The MobileNetV2 classifier is used to categorize the image data. In the max-pooling layer, the pooling operation selects the largest element from each region of the filtered feature map. Max-pooling thus reduces image dimensionality by reducing the number of pixels in the output. Figure 3 depicts the max pooling layer used in our research model.

Figure 3. Architecture for max pooling

The average-pooling method instead takes the average of the elements of the feature map in each covered region: every value contributes to the result that is passed to the next layer, so the computation uses all values. The average pooling layer, whose architecture is depicted in Figure 4, is also a foundation of the proposed research model.

Figure 4. Architecture for average pooling layer
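A small numerical example contrasts the two pooling operations described above; the 4×4 feature map values are arbitrary.

```python
import numpy as np
import tensorflow as tf

# A 4x4 single-channel feature map, reshaped to (batch, height, width, channels).
feature_map = np.array([[1., 3., 2., 4.],
                        [5., 6., 1., 2.],
                        [7., 2., 9., 0.],
                        [3., 4., 1., 8.]], dtype=np.float32).reshape(1, 4, 4, 1)

max_pool = tf.keras.layers.MaxPooling2D(pool_size=2)(feature_map)
avg_pool = tf.keras.layers.AveragePooling2D(pool_size=2)(feature_map)

print(max_pool[0, :, :, 0].numpy())  # [[6. 4.] [7. 9.]]   - largest value in each 2x2 region
print(avg_pool[0, :, :, 0].numpy())  # [[3.75 2.25] [4. 4.5]] - mean of all values in each region
```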

4.1 MobileNetV2 classifier

The first fully convolutional layer of the MobileNetV2 model has 32 filters, followed by 19 bottleneck (inverted residual) layers; the model is used for classifying images. Two types of blocks are used in MobileNetV2: (i) a downsizing block with a stride of 2 and (ii) a residual block with a stride of 1. Each block has three layers. The first layer is a 1×1 convolution with ReLU activation, the second layer is a depthwise convolution, and the third layer is another 1×1 convolution without a nonlinearity (a linear bottleneck). The model's architecture is depicted in Figure 5.

Figure 5. Basic architecture for MobilenetV2
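The stride-1 residual block described above can be sketched as follows. This is a schematic of a MobileNetV2-style inverted residual block (using the usual expansion factor of 6 and ReLU6 activations), not an excerpt from the Keras implementation, and the channel counts in the example input are arbitrary.

```python
import tensorflow as tf
from tensorflow.keras import layers

def inverted_residual_block(x, out_channels, stride=1, expansion=6):
    """MobileNetV2-style block: 1x1 expansion (ReLU6), depthwise conv (ReLU6),
    1x1 linear projection, with a shortcut when stride == 1 and channels match."""
    in_channels = x.shape[-1]

    h = layers.Conv2D(expansion * in_channels, 1, padding="same", use_bias=False)(x)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(max_value=6.0)(h)             # first layer: 1x1 conv + ReLU6

    h = layers.DepthwiseConv2D(3, strides=stride, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)
    h = layers.ReLU(max_value=6.0)(h)             # second layer: depthwise conv

    h = layers.Conv2D(out_channels, 1, padding="same", use_bias=False)(h)
    h = layers.BatchNormalization()(h)            # third layer: linear bottleneck (no ReLU)

    if stride == 1 and in_channels == out_channels:
        h = layers.Add()([x, h])                  # residual connection
    return h

inputs = tf.keras.Input(shape=(56, 56, 24))       # illustrative feature-map size
outputs = inverted_residual_block(inputs, out_channels=24, stride=1)
block = tf.keras.Model(inputs, outputs)
```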

4.2 InceptionResNetV2

The InceptionResNetV2 convolutional neural network was trained on the ImageNet database. Using a 164-layer network, images can be classified into 1,000 different object categories, so the network learns a rich set of feature representations for a wide range of images. The network takes a 224×224 image as input and produces a list of estimated class probabilities as output; it is based on the Inception structure combined with residual connections. The Inception-ResNet block combines convolutional filters of varying sizes with residual connections. The use of residual connections eliminates the degradation problem caused by very deep structures and also roughly halves the training time. The InceptionResnetV2 network architecture is depicted in Figure 6.

Figure 6. Architecture for InceptionResnetV2
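A transfer-learning setup of the kind described here could be sketched as follows. The frozen backbone, global average pooling head, softmax output, optimizer, and loss are assumptions made for illustration and are not claimed to be the authors' exact training configuration; the same pattern applies when the backbone is swapped for MobileNetV2.

```python
import tensorflow as tf
from tensorflow.keras import layers

# Reuse the ImageNet-pretrained InceptionResNetV2 backbone for the five
# LC25000 classes. Input size, frozen weights and optimizer are assumptions.
backbone = tf.keras.applications.InceptionResNetV2(
    include_top=False, weights="imagenet", input_shape=(224, 224, 3))
backbone.trainable = False                        # keep the pretrained features fixed

inputs = tf.keras.Input(shape=(224, 224, 3))
x = tf.keras.applications.inception_resnet_v2.preprocess_input(inputs)
x = backbone(x, training=False)
x = layers.GlobalAveragePooling2D()(x)
outputs = layers.Dense(5, activation="softmax")(x)  # five colon/lung tissue classes

model = tf.keras.Model(inputs, outputs)
model.compile(optimizer="adam",
              loss="categorical_crossentropy",
              metrics=["accuracy"])
```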

The classification task in this experiment, which includes lung cancer subtype classification and the identification of malignant and benign colon histological images, is performed using the pre-trained CNN models MobileNetV2 and InceptionResnetV2. Predictions on a held-out test set comprising 30% of the total data are used to assess the models' performance in terms of Precision, Recall, F1-score, and Accuracy. Training loss, validation loss, training accuracy, and validation accuracy are plotted against epochs to visually analyse model performance. The evaluation metrics for the MobileNetV2 and InceptionResnetV2 models are summarized in Table 1. According to the observed results, the two models achieve nearly 100 percent precision, recall, F1-score, and accuracy.
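A minimal sketch of computing the reported metrics from test-set predictions with scikit-learn is shown below; the macro averaging over the five classes and the variable names `y_true` and `y_pred` are assumptions of this sketch.

```python
from sklearn.metrics import accuracy_score, precision_recall_fscore_support

def report_metrics(y_true, y_pred):
    """y_true: ground-truth class indices for the held-out 30% test split.
    y_pred: argmax of the model's predicted class probabilities."""
    precision, recall, f1, _ = precision_recall_fscore_support(
        y_true, y_pred, average="macro")          # macro-average over the 5 classes
    accuracy = accuracy_score(y_true, y_pred)
    return {"precision": precision, "recall": recall,
            "f1_score": f1, "accuracy": accuracy}

# Example usage (assumed names):
# metrics = report_metrics(y_test, model.predict(x_test).argmax(axis=1))
```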

Figure 7 and Figure 8 depict the performance of the different approaches in terms of the evaluation metrics and accuracy, comparing the existing and proposed strategies on the histopathology image dataset. As indicated in these figures, the proposed MobileNetV2 and InceptionResnetV2 models achieve a higher accuracy than the existing approaches. Overall, the proposed MobileNetV2 and InceptionResnetV2 approach performs more efficiently than the existing approaches, obtaining the highest accuracy value of 99.95%.

Figure 7. Performance of evaluation metrics

Figure 8. Accuracy comparison of proposed MobileNetV2 and InceptionResnetV2 models with existing approaches

Table 1. Comparative results with other methods

Cancer type | Models | Precision | Recall | F1-score | Accuracy
Colon | CNN | 90.25 | 74.32 | 68.52 | 71.3
Lung | ResNet50 + SVM (RBF) | 93.2 | 73.5 | 85.4 | 79.2
Colon | SC-CNN | 82.2 | 77.6 | 83.2 | 80.4
Lung | MSRC | 88.2 | 85.2 | 91.2 | 87.3
Colon | ResNet50 | 94.1 | 95.65 | 96.22 | 96.37
Lung & colon | CNN | 95.89 | 96.29 | 96.32 | 96.42
Lung | CNN | 96.8 | 96.9 | 97.02 | 97.05
Lung | CNN | 96.9 | 97.22 | 97.22 | 97.22
Lung | DAELGNN | 98.3 | 97.32 | 97.32 | 97.6
Lung | EM | 95.2 | 96.8 | 97.6 | 97.9
Colon | Faster R-CNN | 96.5 | 97.8 | 96.9 | 98.32
Colon & lung (proposed) | MobileNetV2 and InceptionResnetV2 | 99.95 | 99.8 | 99.9 | 99.95

5. Conclusions

In recent years, cancer has become an increasingly prevalent disease around the world, with rising cancer-related mortality rates. According to various studies, colon and lung cancers have among the lowest survival rates, so early cancer detection and treatment are critical. This experiment proposed the MobileNetV2 and InceptionResnetV2 models for the image-based early detection of lung and colon cancers, and the GradCam and SmoothGrad visualization techniques gave the proposed approach a new perspective. As a result, the accuracies for colon and lung cancers were 99.95% and 99.86% respectively, and the proposed method achieved an overall accuracy of 99.95%. In the classification of the cancer images, the complementary rule in the set was used, and the GradCam and SmoothGrad visualization techniques contributed to the overall improvement of the performance of the proposed system. The proposed models obtained better results than the existing approaches. The proposed approach will be improved in the future by incorporating structuring techniques along with optimization algorithms on various datasets.

References

[1] Rawla, P., Sunkara, T., Barsouk, A. (2019). Epidemiology of colorectal cancer: incidence, mortality, survival, and risk factors. Gastroenterology Review/Przegląd Gastroenterologiczny, 14(2): 89-103. https://doi.org/10.5114/pg.2018.81072

[2] Selvaraju, R.R., Cogswell, M., Das, A., Vedantam, R., Parikh, D., Batra, D. (2017). Grad-cam: Visual explanations from deep networks via gradient-based localization. In Proceedings of the IEEE international conference on computer vision, pp. 618-626.

[3] Smilkov, D., Thorat, N., Kim, B., Viégas, F., Wattenberg, M. (2017). Smoothgrad: removing noise by adding noise. arXiv preprint arXiv:1706.03825. https://arxiv.org/abs/1706.03825

[4] Das, S., Biswas, S., Paul, A., Dey, A. (2018). AI Doctor: An intelligent approach for medical diagnosis. In Industry Interactive Innovations in Science, Engineering and Technology, pp. 173-183. https://doi.org/10.1007/978-981-10-3953-9_17

[5] Selvanambi, R., Natarajan, J., Karuppiah, M., Islam, S.K., Hassan, M.M., Fortino, G. (2020). Lung cancer prediction using higher-order recurrent neural network based on glowworm swarm optimization. Neural Computing and Applications, 32(9): 4373-4386. https://doi.org/10.1007/s00521-018-3824-3

[6] de Carvalho Filho, A.O., Silva, A.C., de Paiva, A.C., Nunes, R.A., Gattass, M. (2018). Classification of patterns of benignity and malignancy based on CT using topology-based phylogenetic diversity index and convolutional neural network. Pattern Recognition, 81: 200-212. https://doi.org/10.1016/j.patcog.2018.03.032

[7] da Nóbrega, R.V.M., Rebouças Filho, P.P., Rodrigues, M.B., da Silva, S.P., Dourado Júnior, C.M., de Albuquerque, V.H.C. (2020). Lung nodule malignancy classification in chest computed tomography images using transfer learning and convolutional neural networks. Neural Computing and Applications, 32(15): 11065-11082. https://doi.org/10.1007/s00521-018-3895-1

[8] Masood, A., Sheng, B., Li, P., Hou, X., Wei, X., Qin, J., Feng, D. (2018). Computer-assisted decision support system in pulmonary cancer detection and stage classification on CT images. Journal of Biomedical Informatics, 79: 117-128. https://doi.org/10.1016/j.jbi.2018.01.005

[9] Babu, T., Gupta, D., Singh, T., Hameed, S. (2018). Colon cancer prediction on different magnified colon biopsy images. In 2018 Tenth International Conference on Advanced Computing (ICoAC), pp. 277-280. https://doi.org/10.1109/ICoAC44903.2018.8939067

[10] Mo, X., Tao, K., Wang, Q., Wang, G. (2018). An efficient approach for polyps detection in endoscopic videos based on faster R-CNN. In 2018 24th International Conference on Pattern Recognition (ICPR), pp. 3929-3934. https://doi.org/10.1109/ICPR.2018.8545174

[11] Urban, G., Tripathi, P., Alkayali, T., Mittal, M., Jalali, F., Karnes, W., Baldi, P. (2018). Deep learning localizes and identifies polyps in real time with 96% accuracy in screening colonoscopy. Gastroenterology, 155(4): 1069-1078. https://doi.org/10.1053/j.gastro.2018.06.037

[12] Akbari, M., Mohrekesh, M., Rafiei, S., Soroushmehr, S. R., Karimi, N., Samavi, S., Najarian, K. (2018). Classification of informative frames in colonoscopy videos using convolutional neural networks with binarized weights. In 2018 40th Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), pp. 65-68. https://doi.org/10.1109/EMBC.2018.8512226

[13] Shakeel, P.M., Tolba, A., Al-Makhadmeh, Z., Jaber, M.M. (2020). Automatic detection of lung cancer from biomedical data set using discrete AdaBoost optimized ensemble learning generalized neural networks. Neural Computing and Applications, 32(3): 777-790. https://doi.org/10.1007/s00521-018-03972-2

[14] Toraman, S., Girgin, M., Üstündağ, B., Türkoğlu, İ. (2019). Classification of the likelihood of colon cancer with machine learning techniques using FTIR signals obtained from plasma. Turkish Journal of Electrical Engineering and Computer Sciences, 27(3): 1765-1779. https://doi.org/10.3906/elk-1801-259

[15] Suresh, S., Mohan, S. (2020). ROI-based feature learning for efficient true positive prediction using convolutional neural network for lung cancer diagnosis. Neural Computing and Applications, 32(20): 15989-16009. https://doi.org/10.1007/s00521-020-04787-w

[16] Masud, M., Muhammad, G., Hossain, M.S., Alhumyani, H., Alshamrani, S.S., Cheikhrouhou, O., Ibrahim, S. (2020). Light deep model for pulmonary nodule detection from CT scan images for mobile devices. Wireless Communications and Mobile Computing, 2020: 8893494. https://doi.org/10.1155/2020/8893494

[17] Shakeel, P.M., Burhanuddin, M.A., Desa, M.I. (2020). Automatic lung cancer detection from CT image using improved deep neural network and ensemble classifier. Neural Computing and Applications, 1-14. https://doi.org/10.1007/s00521-020-04842-6

[18] Borkowski, A.A., Bui, M.M., Thomas, L.B., Wilson, C.P., DeLand, L.A., Mastorides, S.M. (2019). Lung and colon cancer histopathological image dataset (LC25000). arXiv preprint arXiv:1912.12142. https://arxiv.org/abs/1912.12142

[19] Deppen, S.A., Blume, J.D., Kensinger, C.D., Morgan, A.M., Aldrich, M.C., Massion, P.P., Grogan, E.L. (2014). Accuracy of FDG-PET to diagnose lung cancer in areas with infectious lung disease: A meta-analysis. Jama, 312(12): 1227-1236. https://doi.org/10.1001/jama.2014.11488

[20] Sirinukunwattana, K., Raza, S.E.A., Tsang, Y.W., Snead, D.R., Cree, I.A., Rajpoot, N.M. (2016). Locality sensitive deep learning for detection and classification of nuclei in routine colon cancer histology images. IEEE Transactions on Medical Imaging, 35(5): 1196-1206. https://doi.org/10.1109/TMI.2016.2525803

[21] Kuepper, C., Großerueschkamp, F., Kallenbach-Thieltges, A., Mosig, A., Tannapfel, A., Gerwert, K. (2016). Label-free classification of colon cancer grading using infrared spectral histopathology. Faraday Discussions, 187: 105-118.

[22] Challa, R., Rao, K.S. (2021). Hybrid Approach for detection of objects from images using fisher vector and PSO based CNN. Ingénierie des Systèmes d'Information, 26(5): 483-489. https://doi.org/10.18280/isi.260508

[23] Saeed, R.S., Oleiwi, B.K. (2022). A survey of deep learning applications for COVID-19 detection techniques based on medical images. Ingénierie des Systèmes d'Information, 27(3): 399-408. https://doi.org/10.18280/isi.270305

[24] Mangal, S., Chaurasia, A., Khajanchi, A. (2020). Convolution neural networks for diagnosing colon and lung cancer histopathological images. arXiv preprint arXiv:2009.03878. https://arxiv.org/abs/2009.03878

[25] Hatuwal, B.K., Thapa, H.C. (2020). Lung cancer detection using convolutional neural network on histopathological images. Int. J. Comput. Trends Technol, 68(10): 21-24.