ScaledDenseNet: An Efficient Deep Learning Architecture for Skin Lesion Identification

Bhavana Kanawade*, Revati M. Wahul, Archana P. Kale, Jayshree R. Pansare, Parth Patil, Mayur Tungar, Nikita Verma, Anand Tarte

Department of Information Technology, International Institute of Information Technology, S.P. Pune University, Pune 411057, India

Department of Computer Engineering, M. E. S. College of Engineering, S.P. Pune University, Pune 411001, India

Corresponding Author Email: bhavkanawade@gmail.com
Page: 977-983 | DOI: https://doi.org/10.18280/ria.370419
Received: 20 April 2023 | Revised: 15 May 2023 | Accepted: 20 May 2023 | Available online: 31 August 2023

© 2023 IIETA. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

OPEN ACCESS

Abstract: 

This research introduces ScaledDenseNet, an efficient deep learning architecture for precise identification of skin lesions. The model combines DenseNet with the compound scaling method of EfficientNet, thereby improving performance without reducing inference speed. A grid search was conducted to optimize the hyperparameters α, β, and γ, which control how the network dimensions are scaled. The HAM10000 dataset was used for training and testing; it covers seven disease categories: actinic keratoses and intraepithelial carcinoma, basal cell carcinoma, benign keratosis-like lesions, dermatofibroma, melanoma, melanocytic nevi, and vascular lesions. ScaledDenseNet achieved a top-3 accuracy of 94.59%. Images were resized to the input resolution determined by ϕ for each architecture; the dataset also exhibits pronounced class imbalance. In a comparative analysis, ScaledDenseNet outperformed DenseNet-121 and EfficientNet-B0, which achieved top-3 accuracies of 93.618% and 92.078%, respectively. The methodology, a grid search for hyperparameter optimization and an explicit labeling scheme for the disease categories, underpins the study's validity and repeatability. By combining DenseNet's dense connectivity with the efficiency of compound scaling, ScaledDenseNet emerges as a promising tool for automated identification of skin lesions. Its performance underscores its potential for the early detection and diagnosis of diverse skin conditions and its contribution to dermatological image analysis.

Keywords: 

dermatology, convolutional neural network, ScaledDenseNet, skin disease prediction

1. Introduction

Skin lesions can arise from various causes, including allergies, fungi, parasites, bacteria, viruses, genetic factors, a weakened immune system, and microorganisms residing on the skin. They can be categorized into short-term skin conditions and chronic skin diseases, with examples ranging from vitiligo and impetigo to melanoma and psoriasis. Early detection of skin lesions is crucial for limiting their severity and preventing complications such as skin cancer. However, laypeople often lack knowledge about different skin lesions and their severity, which makes timely and accurate diagnosis and treatment challenging.

Automated methods for detecting and predicting skin lesions from dermatoscopic images can improve prediction accuracy and help dermatologists make informed decisions. Techniques such as decision trees, artificial neural networks, support vector machines, and ensemble classifiers have been used for this purpose, relying on feature extraction followed by a trained classifier. However, these approaches are limited in terms of interpretability, dataset requirements, feature engineering, sensitivity to noise, and generalization. To address these limitations, this study introduces ScaledDenseNet, a deep convolutional neural network that automatically identifies skin lesions from dermatoscopic images. ScaledDenseNet is inspired by the EfficientNet and DenseNet architectures and combines their key ideas: compound scaling and feature reuse. EfficientNet's compound scaling scales the network's dimensions efficiently while maintaining performance, and DenseNet's feature reuse connects layers densely so that deep networks can be trained while mitigating the vanishing gradient problem.

By combining these ideas, ScaledDenseNet aims to classify skin lesions effectively. The proposed architecture scales the number of layers in the dense blocks, the number of kernels, and the input resolution of DenseNet, using values of the hyperparameters α, β, and γ obtained by grid search. Nevertheless, network-based approaches to skin disease classification still face limitations such as limited interpretability, large dataset requirements, feature engineering, noise sensitivity, computational complexity, and limited generalization.

Overcoming these challenges necessitates ongoing research and development of techniques like transfer learning, active learning, model explainability, and uncertainty estimation. By addressing these limitations, ScaledDenseNet and future advancements in network-based approaches can enhance the reliability and practicality of skin lesion detection and prediction, ultimately reducing treatment time, financial burden, and patient suffering associated with late-stage diagnoses.

2. Related Work

CNNs have been at the forefront of object recognition and image classification since the introduction of AlexNet. AlexNet extracts features with several convolutional layers and downsamples the resulting feature maps via max pooling. It also uses overlapping pooling for better feature extraction during downsampling and ReLU nonlinearity for efficient training [1]. There have been numerous attempts to improve CNN accuracy by making networks deeper [2-6]. The VGG network prioritized increasing network depth: six network configurations were proposed, and accuracy clearly improved as depth was scaled [2].

The inception concept has been used by multiple image classification networks. GoogLeNet uses inception to approximate a sparse connectivity structure. Inception also helps reduce training complexity, as only part of the network is trained and reused. The GoogLeNet authors also acknowledge that increasing network depth is the most straightforward way to increase accuracy. Inception networks use 1x1 convolutions for dimensionality reduction, which lowers training complexity, and are built from inception modules that run several convolutional layers in parallel [3].

Simply increasing network depth leads to training inefficiencies and to a major roadblock known as the vanishing gradient problem. During backpropagation, the gradient that reaches an early layer is the product of the local derivatives of all the layers after it, so when these derivatives are small, deeper networks pass ever smaller gradients back to the earlier layers. As the gradient approaches zero, those layers effectively stop learning. Architectures with shortcut connections have been proposed to overcome this problem [4, 5].
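
A toy calculation makes the shrinking-gradient argument concrete. The Python sketch below chains scalar sigmoid layers (an assumption chosen because the sigmoid derivative never exceeds 0.25) and applies the chain rule; it is an illustration only, not a model of any network discussed here, and `gradient_at_first_layer` is a hypothetical helper.

```python
import math

def sigmoid_derivative(z: float) -> float:
    """Derivative of the logistic sigmoid; its maximum is 0.25 at z = 0."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def gradient_at_first_layer(depth: int, weight: float = 1.0, z: float = 0.0) -> float:
    """Backpropagate a unit gradient through `depth` scalar sigmoid layers.

    Toy assumption: each layer computes sigmoid(weight * x) and every layer
    sees the same pre-activation z, so the chain rule reduces to a product
    of identical local derivatives.
    """
    grad = 1.0
    for _ in range(depth):
        grad *= weight * sigmoid_derivative(z)
    return grad

for depth in (1, 5, 10, 20):
    print(f"depth={depth:2d}  gradient={gradient_at_first_layer(depth):.3e}")
```

Under these assumptions the gradient equals 0.25 raised to the depth, so with 10 layers it is already below 1e-6.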

ResNet uses identity mappings to overcome vanishing gradients, which allows the network to become very deep. These identity (residual) mappings are also called skip connections or shortcut connections; in ResNet, a shortcut connection simply adds the input of a block to its output. Going deeper increased accuracy on the classification task [4].
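
Extending the same toy setting shows why the identity shortcut helps. If a layer computes y = x + F(x), its local derivative is 1 + F'(x), so the chain-rule product no longer shrinks toward zero. In the Python sketch below, the residual branch F is a scalar sigmoid purely by assumption, and `first_layer_gradient` is a hypothetical helper.

```python
import math

def branch_derivative(z: float) -> float:
    """Derivative of a toy residual branch F(z) = sigmoid(z)."""
    s = 1.0 / (1.0 + math.exp(-z))
    return s * (1.0 - s)

def first_layer_gradient(depth: int, residual: bool, z: float = 0.0) -> float:
    """Chain-rule product through `depth` toy layers.

    Plain layer:    y = F(x)      so dy/dx = F'(x)
    Residual layer: y = x + F(x)  so dy/dx = 1 + F'(x)  (identity shortcut)
    Toy assumption: every layer sees the same pre-activation z.
    """
    grad = 1.0
    for _ in range(depth):
        local = branch_derivative(z)
        grad *= (1.0 + local) if residual else local
    return grad

for depth in (5, 20):
    plain = first_layer_gradient(depth, residual=False)
    res = first_layer_gradient(depth, residual=True)
    print(f"depth={depth:2d}  plain={plain:.2e}  residual={res:.2e}")
```

Because the shortcut contributes a constant 1 to every local derivative, the product along the residual path cannot shrink toward zero, whereas the plain chain decays geometrically.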

Like ResNet, DenseNet connects earlier layers directly to later ones, but it reuses features by concatenating them rather than adding them. DenseNet consists of multiple dense blocks with transition layers between them. In a dense block, the output of each layer is fed as input to every subsequent layer in the block. To keep the feature-map dimensions consistent, downsampling is done only between dense blocks. This allows the network to go deeper without suffering from vanishing gradients, and Huang et al. report that feature reuse achieves better accuracy than residual mapping [5].

Increasing network depth, width, and resolution each help increase classification accuracy, but there had been no clear method for scaling these three factors together. Google's EfficientNet introduces compound scaling to increase network depth, width, and resolution simultaneously. Scaling only one of these dimensions quickly saturates, after which accuracy no longer increases; compound scaling overcomes this by balancing all three dimensions, achieving higher accuracy for a given computational cost [6].

There have also been attempts to use CNNs to predict skin diseases, particularly skin cancer. Höhn et al. [7] (using histologic images) and Li et al. [8] (using dermoscopy images) combined patient metadata, such as gender, lesion location, and age, with images to predict the chance that a patient has a particular type of disease. For imbalanced data, the metadata increased accuracy. Both works compared two methods of combining metadata: concatenating the metadata vector to the image feature vector before the classification layer, and scaling the image feature vector with the metadata vector. The latter performed better. Höhn et al. [7] used ResNeXt50 as the feature-extraction CNN, whereas Li et al. [8] tested the metadata combination methods on multiple state-of-the-art CNNs.

Kaur et al. [9] proposed a new CNN architecture that uses multiple parallel blocks of convolutional layers. The network has 11 primary blocks; each block uses normalisation to improve accuracy and leaky ReLU to mitigate vanishing gradients. The dataset was obtained from the ISIC 2016 and ISIC 2017 challenges, and the network classifies melanoma versus benign lesions. Data augmentation techniques such as rotation and scaling were used to counter the imbalanced training set. Raza et al. [10] classified skin cancer with a stacking-based ensemble instead of traditional ensembling approaches. The member networks, VGG16, Xception, InceptionResNetV2, DenseNet121, DenseNet169, and DenseNet201, are first fine-tuned for melanoma classification via transfer learning and then ensembled. Nithya Anoo et al. [11] proposed a new CNN architecture with alternating convolutional and max-pooling layers. ALEnezi [12] used a pretrained AlexNet for feature extraction and an SVM to classify four types of skin disease. Kumar and Kumar [13] took a classical machine learning approach: the input images are first segmented using the active contour method, features describing brightness, colour, lesion size, etc. are computed with hand-crafted formulas, and classification is done with an ANN. Mahbod et al. [14] applied pre-processing techniques such as colour standardisation and normalisation for better feature extraction, fine-tuned several state-of-the-art neural networks for melanoma classification, and used an ensemble of these networks to classify skin cancer images. Other machine learning techniques have also been used for classification. Pathan et al. [15] extracted 15 colour features from each dermatoscopic image and classified the images with a decision tree algorithm.

The methods described above use either an ensemble of networks or a single baseline architecture to classify dermatoscopic images. However, ensemble methods are several times more computationally intensive than single baseline networks, and their accuracy gains are marginal. Most baseline networks are also limited by scaling only a single dimension, network depth, so accuracy saturates quickly as depth increases. The proposed architecture combines efficient compound scaling with DenseNet's feature reuse to obtain a classification model that is both efficient and effective.

3. Datasets

3.1 Overview

The HAM10000 dataset consists of a total of 10015 dermatoscopic images. All images have a resolution of 600x450 pixels. The dataset consists of 7 classes of pigmented skin lesions. The lesions are largely confirmed through histopathology, expert consensus, or follow-up examination [16].

3.2 Data transformation

The images in the HAM10000 dataset are 600x450 pixels. They are resized to the input resolution required for training, which depends on the value of ϕ for each of the proposed architectures. The HAM10000 dataset also exhibits high class imbalance (Table 1).
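
The resizing step can be sketched as follows. Only γ = 1.25 comes from this paper (Section 5.1); the base side length of 224 pixels, rounding to the nearest integer, and nearest-neighbour interpolation are assumptions made for illustration, and `input_resolution` and `resize_nearest` are hypothetical helper names.

```python
import numpy as np

def input_resolution(phi: int, gamma: float = 1.25, base: int = 224) -> int:
    """Side length base * gamma**phi, i.e. r * base from Eq. (3).

    gamma = 1.25 is the value chosen in Section 5.1; base = 224 and
    rounding to the nearest integer are assumptions.
    """
    return int(round(base * gamma ** phi))

def resize_nearest(image: np.ndarray, size: int) -> np.ndarray:
    """Nearest-neighbour resize of an (H, W, C) image to (size, size, C).

    Nearest-neighbour is a stand-in; the paper does not state which
    interpolation was used.
    """
    h, w = image.shape[:2]
    rows = np.arange(size) * h // size   # source row for each output row
    cols = np.arange(size) * w // size   # source column for each output column
    return image[rows[:, None], cols[None, :]]

# A dummy RGB array standing in for one dermatoscopic image.
dummy = np.zeros((450, 600, 3), dtype=np.uint8)
for phi in (1, 2, 3):
    r = input_resolution(phi)
    print(phi, r, resize_nearest(dummy, r).shape)
```

Resizing to a square input changes the aspect ratio; whether the original pipeline cropped or padded instead is not stated, so this is only one plausible choice.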

3.3 Dataset description

The distribution of classes within the dataset is presented in Table 1.

Table 1. Disease identification: Images and names of common diseases

| Type of Lesion | Number of Images |
|---|---|
| benign keratosis-like lesions (bkl) | 1099 |
| melanocytic nevi (nv) | 6705 |
| dermatofibroma (df) | 115 |
| melanoma (mel) | 1113 |
| vascular lesions (vasc) | 142 |
| basal cell carcinoma (bcc) | 514 |
| actinic keratoses and intraepithelial carcinoma (akiec) | 327 |
| Total | 10015 |
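
The imbalance can be quantified directly from the counts in Table 1. The short script below uses no data beyond the table; it prints each class's share and the ratio between the largest class (nv) and the smallest (df), showing that nv alone accounts for about two thirds of the images.

```python
# Image counts per class, copied from Table 1.
counts = {
    "akiec": 327, "bcc": 514, "bkl": 1099, "df": 115,
    "mel": 1113, "nv": 6705, "vasc": 142,
}

total = sum(counts.values())
for label, n in sorted(counts.items(), key=lambda kv: -kv[1]):
    print(f"{label:>5}: {n:5d} images ({100 * n / total:5.2f}%)")

# Ratio between the largest and the smallest class.
imbalance_ratio = max(counts.values()) / min(counts.values())
print(f"total = {total}, largest/smallest = {imbalance_ratio:.1f}")
```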

3.4 Label encoding
The dataset is labelled with 7 categories:
  1. actinic keratoses and intraepithelial carcinoma (akiec)
  2. basal cell carcinoma (bcc)
  3. benign keratosis-like lesions (bkl)
  4. dermatofibroma (df)
  5. melanoma (mel)
  6. melanocytic nevi (nv)
  7. vascular lesions (vasc)
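
A minimal label-encoding sketch is shown below. Mapping the seven categories to indices 0 to 6 (in the order listed above) and using one-hot targets for a 7-way softmax classifier are assumptions; the paper lists the categories but not the exact encoding it used, and `encode` and `one_hot` are hypothetical helpers.

```python
import numpy as np

# Class abbreviations in the order of the list above (codes 1-7 there,
# stored here as 0-6 because Python indexing starts at zero).
CLASSES = ["akiec", "bcc", "bkl", "df", "mel", "nv", "vasc"]
LABEL_TO_INDEX = {name: i for i, name in enumerate(CLASSES)}

def encode(labels):
    """Map class abbreviations to integer indices."""
    return np.array([LABEL_TO_INDEX[name] for name in labels], dtype=np.int64)

def one_hot(indices, num_classes=len(CLASSES)):
    """Turn integer indices into one-hot rows (e.g. for a 7-way softmax)."""
    return np.eye(num_classes, dtype=np.float32)[indices]

y = encode(["nv", "mel", "akiec"])
print(y)           # [5 4 0]
print(one_hot(y))  # one-hot matrix of shape (3, 7)
```
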
4. System Architecture

4.1 System design

The dataset is divided into training and testing sets. The training images are then pre-processed by resizing them to the appropriate dimensions and passed to the proposed architecture, which is trained to assign each image to its correct class. After training, the model is evaluated by passing the images of the test set through it and checking whether each image is assigned the correct class. The overall procedure for ScaledDenseNet is presented in Figure 1.
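
The train/test division can be sketched with NumPy. The 70/30 ratio is the one reported in Section 5; the unstratified random shuffle and the fixed seed are assumptions, because the paper does not say how the split was drawn, and `train_test_split_indices` is a hypothetical helper.

```python
import numpy as np

def train_test_split_indices(n: int, train_frac: float = 0.7, seed: int = 0):
    """Randomly split indices 0..n-1 into a training and a testing set.

    The 70/30 ratio follows Section 5; the unstratified shuffle and the
    fixed seed are assumptions, since the paper does not describe them.
    """
    rng = np.random.default_rng(seed)
    order = rng.permutation(n)
    n_train = int(round(train_frac * n))
    return order[:n_train], order[n_train:]

train_idx, test_idx = train_test_split_indices(10015)
print(len(train_idx), len(test_idx))
```

Given the class imbalance in Table 1, a stratified split (shuffling within each class) would keep rare classes such as df represented in both sets; the paper does not say whether one was used.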

Figure 1. Model ScaledDenseNet architecture procedure diagram

4.2 Compound scaling

Compound scaling, introduced with EfficientNet, offers several benefits for skin disease classification. It balances the scaling of network depth, width, and input resolution, which improves classification performance. Scaling the depth lets the network capture richer and more complex features. Increasing the width lets each layer learn more feature maps and thus model more of the variation in the data. Scaling the input resolution lets the network capture finer-grained features and textures. Scaling all three dimensions together with a single compound coefficient ϕ keeps them in balance, which [6] showed to be more effective than scaling any one dimension alone, leading to more accurate and reliable skin disease classification.

depth: $d=\alpha^\phi$    (1)

width: $w=\beta^\phi$    (2)

resolution: $r=\gamma^\phi$    (3)

The values of α, β, and γ are hyperparameters and ϕ is a user-defined scaling factor.

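Because the number of operations (FLOPs) of a convolutional network is proportional to $d \cdot w^2 \cdot r^2$, scaling increases it by a factor of $(\alpha \cdot \beta^2 \cdot \gamma^2)^\phi$ [6].

The hyperparameters α, β, and γ are constrained such that $\alpha \cdot \beta^2 \cdot \gamma^2 \le 2$, so the number of operations grows by at most a factor of $2^\phi$.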
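
Plugging the hyperparameters selected in Section 5.1 (α = 1.15, β = 1.05, γ = 1.25) into Eqs. (1)-(3) gives the multipliers computed below, together with a check of the constraint. The helper names `scaling_factors` and `flops_factor` are illustrative only.

```python
ALPHA, BETA, GAMMA = 1.15, 1.05, 1.25  # values chosen in Section 5.1

def scaling_factors(phi, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Return (d, w, r) from Eqs. (1)-(3)."""
    return alpha ** phi, beta ** phi, gamma ** phi

def flops_factor(phi, alpha=ALPHA, beta=BETA, gamma=GAMMA):
    """Growth in operations relative to phi = 0: (alpha * beta^2 * gamma^2)^phi."""
    return (alpha * beta ** 2 * gamma ** 2) ** phi

base = ALPHA * BETA ** 2 * GAMMA ** 2
print(f"alpha*beta^2*gamma^2 = {base:.4f} (constraint: <= 2)")
for phi in (1, 2, 3):
    d, w, r = scaling_factors(phi)
    print(f"phi={phi}: d={d:.3f}, w={w:.3f}, r={r:.3f}, FLOPs x{flops_factor(phi):.2f}")
```

With these values $\alpha \cdot \beta^2 \cdot \gamma^2 \approx 1.981$, so the constraint holds, and ϕ = 3 costs about 7.8 times the operations of the base network.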

4.3 Dense blocks

The dense block is a concept introduced in the DenseNet architecture to overcome the vanishing gradient problem. Each convolution layer in a dense block takes as input the feature maps of all preceding layers in the block. These direct connections preserve information and gradients, giving efficient gradient flow and improved learning. The dense connections also enable extensive feature sharing, allowing the network to combine low-level and high-level features. Each layer in a dense block has a 1x1 bottleneck convolution before the 3x3 convolution to reduce the number of input feature maps. In addition, DenseNet uses a compression factor θ, with a value between 0 and 1, in the transition layers to reduce the number of feature maps passed from one dense block to the next. Table 2 presents a detailed view of the architecture of ScaledDenseNet when ϕ=3.
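
The connectivity pattern of a dense block can be illustrated with a small NumPy sketch. The channel counts (67 input maps, a 1x1 bottleneck with 134 kernels, a 3x3 convolution with 33 kernels) are taken from Table 2; the 8x8 spatial size, random weights, and omission of batch normalisation are assumptions that keep the example short, and `dense_block` and `conv2d_same` are hypothetical helpers rather than the authors' code.

```python
import numpy as np

def conv2d_same(x: np.ndarray, w: np.ndarray) -> np.ndarray:
    """Stride-1 convolution (cross-correlation) with zero 'same' padding.

    x: (H, W, C_in) feature map; w: (kh, kw, C_in, C_out) kernel, kh and kw odd.
    """
    kh, kw = w.shape[:2]
    ph, pw = kh // 2, kw // 2
    h, wd = x.shape[:2]
    xp = np.pad(x, ((ph, ph), (pw, pw), (0, 0)))
    out = np.zeros((h, wd, w.shape[3]))
    for i in range(kh):
        for j in range(kw):
            out += xp[i:i + h, j:j + wd, :] @ w[i, j]
    return out

def relu(x):
    return np.maximum(x, 0.0)

def dense_block(x, num_layers, growth_rate, bottleneck_channels, rng):
    """DenseNet-style block: each layer sees the concatenation of all earlier maps.

    Layer = ReLU -> 1x1 bottleneck conv -> ReLU -> 3x3 conv (growth_rate maps).
    Batch normalisation is omitted and weights are random (assumptions).
    """
    features = x
    for _ in range(num_layers):
        c_in = features.shape[-1]
        w1 = rng.normal(0, 0.1, (1, 1, c_in, bottleneck_channels))
        w3 = rng.normal(0, 0.1, (3, 3, bottleneck_channels, growth_rate))
        new = conv2d_same(relu(conv2d_same(relu(features), w1)), w3)
        features = np.concatenate([features, new], axis=-1)  # feature reuse
    return features

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 67))
y = dense_block(x, num_layers=3, growth_rate=33, bottleneck_channels=134, rng=rng)
print(y.shape)  # (8, 8, 166)
```

Each layer adds 33 new feature maps, so after three layers the block outputs 67 + 3 × 33 = 166 maps, while the original 67 input maps are passed through unchanged.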

Table 2. Architecture of ScaledDenseNet (ϕ=3)

| Layers | Output Size | Layer Size | Number of Kernels |
|---|---|---|---|
| Input | 160x160 | - | - |
| Convolution | 80x80 | 7x7 conv | 67 |
| Pooling | 40x40 | 3x3 max pool | - |
| Scaled Dense Block × 3 | 40x40 | 1x1 conv, 3x3 conv | 134, 33 |
| Transition Layer I | 20x20 | - | - |
| Scaled Dense Block × 6 | 20x20 | 1x1 conv, 3x3 conv | 134, 33 |
| Transition Layer II | 10x10 | - | - |
| Scaled Dense Block × 13 | 10x10 | 1x1 conv, 3x3 conv | 134, 33 |
| Transition Layer III | 5x5 | - | 303 |
| Scaled Dense Block × 9 | 5x5 | 1x1 conv, 3x3 conv | 134, 33 |
| Global Average Pool | 1x1 | 5x5 global avg pool | - |
| Classification | | 7D fully connected | |

Table 3. Transition layers of ScaledDenseNet (ϕ=3)

| Layers | Layer Size | Number of Kernels |
|---|---|---|
| Transition Layer I | 1x1 conv | 87 |
| | 2x2 avg pool | - |
| Transition Layer II | 1x1 conv | 149 |
| | 2x2 avg pool | - |
| Transition Layer III | 1x1 conv | 303 |
| | 2x2 avg pool | - |

Figure 2. ScaledDenseNet block diagram

4.4 Transition layers

Transition layers are also a concept introduced in DenseNet. Because the feature-map size must stay constant within a dense block so that feature maps can be concatenated, downsampling is done in the transition layers using average pooling. Like the layers of a dense block, each transition layer has a 1x1 bottleneck convolution before the actual downsampling. Table 3 shows the detailed design of ScaledDenseNet's transition layers.
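
A transition layer can be sketched in the same style. The compression factor θ = 0.5 used below is the DenseNet-BC setting from [5] and an assumption here, since this paper does not report θ; batch normalisation is omitted, the weights are random, and `transition_layer` is a hypothetical helper. The 40x40 to 20x20 reduction mirrors Transition Layer I in Table 2.

```python
import numpy as np

def transition_layer(x: np.ndarray, theta: float, rng) -> np.ndarray:
    """1x1 conv to floor(theta * C) maps, then 2x2 average pooling with stride 2.

    x: (H, W, C) with even H and W. Batch normalisation is omitted and the
    1x1 weights are random; only the shapes and operations are illustrated.
    """
    h, w, c = x.shape
    c_out = int(np.floor(theta * c))
    weights = rng.normal(0, 0.1, (c, c_out))
    y = x @ weights  # a 1x1 convolution is a per-pixel matrix multiplication
    # Split each spatial axis into (blocks, 2) and average inside each 2x2 block.
    return y.reshape(h // 2, 2, w // 2, 2, c_out).mean(axis=(1, 3))

rng = np.random.default_rng(0)
x = rng.normal(size=(40, 40, 166))
print(transition_layer(x, theta=0.5, rng=rng).shape)  # (20, 20, 83)
```
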

The proposed architecture combines the compound scaling of EfficientNet with the feature reusability of DenseNet. Balancing the depth, width, and resolution of the network has been shown to be essential for increasing accuracy [6], and EfficientNet provides a way to tie these three factors together. DenseNet, in turn, uses feature reuse to overcome the vanishing gradient problem. Figure 2 shows a block diagram of the ScaledDenseNet model, giving an overview of its building blocks (the convolutional layer, the scaled dense blocks, the transition layers, and the pooling layer) and how they are connected.

Feature reuse allows DenseNet to go deeper with increased accuracy [5]. These properties of EfficientNet and DenseNet are complementary, so we apply the compound scaling technique to DenseNet and call the resulting architecture ScaledDenseNet. As in EfficientNet, the proposed architecture uses d, w, and r as the factors that scale depth, width, and resolution, respectively [6].

where $d=\alpha^\phi$, $w=\beta^\phi$, and $r=\gamma^\phi$.

The values of α, β, and γ are hyperparameters and ϕ is a user-defined compound scaling coefficient. The values of α, β, and γ are found by setting ϕ=1 and performing a grid search. They were restricted such that $\alpha \cdot \beta^2 \cdot \gamma^2 \le 2$, since the number of operations grows in proportion to $(\alpha \cdot \beta^2 \cdot \gamma^2)^\phi$ [6].
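
The search space can be enumerated directly. In the sketch below, the grid step of 0.05, the upper bound of 2.0, and the requirement that each value be at least 1 (as in EfficientNet [6]) are assumptions; the paper states only the constraint $\alpha \cdot \beta^2 \cdot \gamma^2 \le 2$, and `candidate_grid` is a hypothetical helper.

```python
import itertools

def candidate_grid(step=0.05, max_value=2.0, budget=2.0):
    """Yield (alpha, beta, gamma), each in [1, max_value], with alpha*beta^2*gamma^2 <= budget.

    The step size and upper bound are assumptions; the small tolerance guards
    against floating-point error at the boundary.
    """
    n = int(round((max_value - 1.0) / step))
    values = [round(1.0 + i * step, 4) for i in range(n + 1)]
    for a, b, g in itertools.product(values, repeat=3):
        if a * b ** 2 * g ** 2 <= budget + 1e-9:
            yield a, b, g

grid = list(candidate_grid())
print(len(grid), "candidate triples")
print((1.15, 1.05, 1.25) in grid)  # the values selected in Section 5.1
```
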

For the base network, we used the DenseNet-121 [5] architecture. The factor d scales only the number of layers in each dense block, w increases the number of filters in every layer, and r scales only the input resolution.
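
Applying d, w, and r to a DenseNet-121 configuration might look as follows. The baseline numbers (6, 12, 24, and 16 layers per dense block; growth rate 32; 64 stem filters; 224x224 input) are the standard DenseNet-121 settings from [5]; the rounding rules and the `scale_config` helper are assumptions, and the sketch ignores the halving of depth and resolution mentioned in Section 5.2.

```python
import math

# Standard DenseNet-121 settings (Huang et al. [5]).
DENSENET121 = {
    "block_layers": (6, 12, 24, 16),
    "growth_rate": 32,
    "stem_filters": 64,
    "resolution": 224,
}

def scale_config(phi, alpha=1.15, beta=1.05, gamma=1.25, base=DENSENET121):
    """Scale layers per block by d, filter counts by w, input side by r.

    Rounding rules are assumptions: layer counts are rounded up,
    filter counts and resolution to the nearest integer.
    """
    d, w, r = alpha ** phi, beta ** phi, gamma ** phi
    return {
        "block_layers": tuple(math.ceil(n * d) for n in base["block_layers"]),
        "growth_rate": round(base["growth_rate"] * w),
        "stem_filters": round(base["stem_filters"] * w),
        "resolution": round(base["resolution"] * r),
    }

for phi in (1, 2, 3):
    print(phi, scale_config(phi))
```
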

5. Results and Discussion

5.1 Hyperparameters

A grid search was performed to determine the optimal values of the hyperparameters α, β, and γ. The dataset was split into 70% training and 30% testing data, and three-fold cross-validation was performed for every combination of α, β, and γ satisfying $\alpha \cdot \beta^2 \cdot \gamma^2 \le 2$. Accuracy showed a clear trend as the hyperparameter values increased. Based on these results, the values α=1.15, β=1.05, and γ=1.25 were chosen. The detailed results are given in [17].
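
The selection procedure can be outlined as a loop over candidate triples with k-fold cross-validation. In this sketch, `score_fn` is a hypothetical placeholder for building, training, and evaluating the scaled network; the toy score used below is arbitrary, ignores the data, and is not a result from this study.

```python
import numpy as np

def kfold_indices(n, k=3, seed=0):
    """Yield (train_idx, val_idx) pairs for k-fold cross-validation."""
    order = np.random.default_rng(seed).permutation(n)
    folds = np.array_split(order, k)
    for i in range(k):
        train = np.concatenate([folds[j] for j in range(k) if j != i])
        yield train, folds[i]

def grid_search(candidates, n_samples, score_fn, k=3):
    """Return the candidate with the best mean k-fold score, and that score."""
    best, best_score = None, -np.inf
    for cand in candidates:
        scores = [score_fn(cand, tr, va) for tr, va in kfold_indices(n_samples, k)]
        mean = float(np.mean(scores))
        if mean > best_score:
            best, best_score = cand, mean
    return best, best_score

def toy_score(cand, train_idx, val_idx):
    """Hypothetical stand-in for training and validation; ignores the data."""
    a, b, g = cand
    return a * b ** 2 * g ** 2  # arbitrary proxy, NOT a real accuracy

cands = [(1.2, 1.1, 1.15), (1.15, 1.05, 1.25), (1.0, 1.0, 1.4)]
print(grid_search(cands, n_samples=300, score_fn=toy_score))
```
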

5.2 Classification

For the classification of skin diseases, the HAM10000 [16] dataset was used, split into 70% training and 30% testing sets. Each network was trained for 80 epochs using the Adam optimizer, a stochastic gradient-based method, with a constant learning rate of 0.001. Five-fold cross-validation was used to calculate the accuracy of the networks. Due to technical limitations, the proposed network was scaled down to half of its depth, and its input resolution was halved to 128x128. With compound scaling coefficient ϕ=3, the proposed network outperformed both EfficientNet-B0 and DenseNet-121 on the seven-class classification task. The detailed results are given in Table 4, and Figure 3 compares the DenseNet-121, EfficientNet-B0, and ScaledDenseNet architectures on the HAM10000 dataset.
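
Top-3 accuracy, the main metric in Tables 4 and 5, counts a prediction as correct when the true class is among the three highest-scoring classes. The paper does not show its implementation; the NumPy function below is a minimal sketch of this standard definition, applied to made-up scores for three samples, and `top_k_accuracy` is a hypothetical name.

```python
import numpy as np

def top_k_accuracy(probs: np.ndarray, labels: np.ndarray, k: int = 3) -> float:
    """Fraction of samples whose true label is among the k highest-scoring classes.

    probs: (N, C) class scores (e.g. softmax outputs); labels: (N,) integer classes.
    """
    top_k = np.argsort(probs, axis=1)[:, -k:]  # indices of the k largest scores
    hits = (top_k == labels[:, None]).any(axis=1)
    return float(hits.mean())

scores = np.array([
    [0.10, 0.60, 0.05, 0.05, 0.10, 0.05, 0.05],  # true class 1 ranked 1st
    [0.30, 0.05, 0.25, 0.05, 0.20, 0.10, 0.05],  # true class 4 ranked 3rd
    [0.50, 0.20, 0.15, 0.05, 0.05, 0.03, 0.02],  # true class 6 ranked last
])
labels = np.array([1, 4, 6])
print(top_k_accuracy(scores, labels, k=3))  # 2 of 3 hits
print(top_k_accuracy(scores, labels, k=1))  # 1 of 3 hits
```
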

Varying ϕ scales the network effectively: in our experiments, scaling the network improved its classification performance. We tested ϕ values from 1 to 3; Table 5 and Figure 4 compare the accuracy of ScaledDenseNet for these scaling coefficients.

Table 4. Comparison of DenseNet-121, EfficientNet-B0 and our proposed ScaledDenseNet architecture in classification of HAM10000 dataset

| Architecture | Top-3 Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| DenseNet-121 | 93.618 | 72.353 | 73.372 | 71.996 |
| EfficientNet-B0 | 92.078 | 68.302 | 71.221 | 69.299 |
| ScaledDenseNet (ϕ=3) | 94.590 | 74.638 | 73.943 | 73.836 |

Figure 3. Comparison of DenseNet-121, EfficientNet-B0, and ScaledDenseNet architecture in classification of HAM10000 dataset        

Figure 4. Scaling coefficients and accuracy comparison of ScaledDenseNet model for HAM10000 classification

Table 5. Comparison of our ScaledDenseNet architecture accuracy using different scaling coefficients

| ϕ value | Top-3 Accuracy (%) | Precision (%) | Recall (%) | F1-Score (%) |
|---|---|---|---|---|
| 1 | 94.264 | 72.155 | 73.901 | 72.252 |
| 2 | 94.312 | 73.632 | 73.717 | 73.625 |
| 3 | 94.590 | 74.638 | 73.943 | 73.836 |

6. Conclusion

The study introduces ScaledDenseNet, an efficient deep learning architecture for accurate skin lesion identification. ScaledDenseNet combines DenseNet with the compound scaling method from EfficientNet to achieve improved performance without compromising inference speed. The hyperparameters α=1.15, β=1.05, and γ=1.25 are optimized through a grid search to control network scaling.

The HAM10000 dataset, consisting of seven disease categories, is used for training and testing. ScaledDenseNet achieves a top-3 accuracy of 94.59% and outperforms DenseNet-121 and EfficientNet-B0, which achieve top-3 accuracies of 93.618% and 92.078%, respectively. Images are resized to the input resolution determined by the $\phi$ value.

The methodology includes hyperparameter optimization through a grid search and explicit labeling of disease categories to ensure validity and repeatability. ScaledDenseNet offers a promising solution for automatic skin lesion identification, combining DenseNet's dense connectivity with the efficiency of compound scaling.

The results highlight ScaledDenseNet's potential in dermatology for early detection and diagnosis of various skin conditions. Its accurate identification of skin lesions can contribute to advancements in dermatological image analysis, leading to improved healthcare outcomes. By aiding in timely interventions and treatment, ScaledDenseNet has the potential to save lives.

Overall, ScaledDenseNet presents an efficient deep learning architecture that overcomes challenges in skin lesion identification. Its superior performance, achieved with the optimized hyperparameters α=1.15, β=1.05, and γ=1.25, and potential for enhancing dermatological practice make it a valuable tool in improving the accuracy and efficiency of skin disease diagnosis and treatment.

References

[1] Krizhevsky, A., Sutskever, I., Hinton, G.E. (2017). ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6): 84-90. https://doi.org/10.1145/3065386

[2] Simonyan, K., Zisserman, A. (2014). Very deep convolutional networks for large-scale image recognition. https://doi.org/10.48550/arXiv.1409.1556 

[3] Szegedy, C., Liu, W., Jia, Y.Q., Sermanet, P., Reed, S., Anguelov, D., Erhan, D., Vanhoucke, V., Rabinovich, A. (2015). Going deeper with convolutions. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Boston, MA, USA, pp. 1-9. https://doi.org/10.1109/CVPR.2015.7298594

[4] He, K.M., Zhang, X.Y., Ren, S.Q., Sun, J. (2016). Deep residual learning for image recognition. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, Las Vegas, NV, USA, pp. 770-778. https://doi.org/10.1109/CVPR.2016.90

[5] Huang, G., Liu, Z., Van Der Maaten, L., Weinberger, K.Q. (2017). Densely connected convolutional networks. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition, pp. 2261-2269. https://doi.org/10.1109/CVPR.2017.243

[6] Tan, M., Le, Q.V. (2019). EfficientNet: Rethinking model scaling for convolutional neural networks. In Proceedings of the 36th International Conference on Machine Learning (ICML 2019), Long Beach, 9-15 June 2019, pp. 6105-6114.

[7] Höhn, J., Krieghoff-Henning, E., Jutzi, T.B., et al. (2021). Combining CNN-based histologic whole slide image analysis and patient data to improve skin cancer classification. European Journal of Cancer, 149: 94-101. https://doi.org/10.1016/j.ejca.2021.02.032

[8] Li, W.P., Zhuang, J.X., Wang, R.X., Zhang, J.G., Zheng, W.S. (2020). Fusing metadata and dermoscopy images for skin disease diagnosis. In 2020 IEEE 17th International Symposium on Biomedical Imaging (ISBI), Iowa City, IA, USA, pp. 1996-2000. https://doi.org/10.1109/ISBI45749.2020.9098645

[9] Kaur, R., GholamHosseini, H., Sinha, R., Lindén, M. (2022). Melanoma classification using a novel deep convolutional neural network with dermoscopic images. Sensors, 22(3): 1134. https://doi.org/10.3390/s22031134

[10] Raza, R., Zulfiqar, F., Tariq, S., Anwar, G.B., Sargano, A.B., Habib, Z. (2021). Melanoma classification from dermoscopy images using ensemble of convolutional neural networks. Mathematics, 10(1): 26. https://doi.org/10.3390/math10010026

[11] Nithya Anoo, S., Pavithra, A., Poornamala, S., Siamala Devi, S. (2022). An efficient skin cancer classification approach using neural networks. Journal of Algebraic Statistics, 13(3): 4946-4957.

[12] ALEnezi, N.S.A. (2019). A method of skin disease detection using image processing and machine learning. Procedia Computer Science, 163: 85-92. https://doi.org/10.1016/j.procs.2019.12.090

[13] Kumar, M., Kumar, R. (2016). An intelligent system to diagnosis the skin disease. ARPN JEAS, 11(19): 11368-11373.

[14] Mahbod, A., Schaefer, G., Ellinger, I., Ecker, R., Pitiot, A., Wang, C. (2019). Fusing fine-tuned deep features for skin lesion classification. Computerized Medical Imaging and Graphics, 71: 19-29. https://doi.org/10.1016/j.compmedimag.2018.10.007

[15] Pathan, S., Aggarwal, V., Prabhu, K.G., Siddalingaswamy, P.C. (2019). Melanoma detection in dermoscopic images using color features. Biomedical and Pharmacology Journal, 12(1): 107-115. https://doi.org/10.13005/bpj/1619

[16] Tschandl, P., Rosendahl, C., Kittler, H. (2018). The HAM10000 dataset, a large collection of multi-source dermatoscopic images of common pigmented skin lesions. Scientific Data, 5(1): 1-9. https://doi.org/10.1038/sdata.2018.161

[17] https://github.com/Parth1045p/GridSearch_Value, accessed on Apr. 4, 2023