© 2026 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
OPEN ACCESS
Early and accurate skin lesion classification is essential for the timely diagnosis of melanoma and other dermatological diseases. Existing skin lesion classifiers often exhibit limited generalization across heterogeneous clinical datasets and lack interpretability, reducing clinical trust. To address these challenges, this work proposes a privacy-preserving Federated Swin Transformer V2 (FSViTV2) framework integrated with Explainable Artificial Intelligence (XAI) for robust skin lesion classification. The proposed approach enables multiple healthcare institutions to collaboratively train the SViTV2 model by sharing only encrypted model updates, rather than raw dermoscopic images, thereby ensuring data confidentiality and minimizing the risk of data leakage. The hierarchical window-based self-attention mechanism of SViTV2 effectively captures both fine-grained local lesion patterns and global contextual information, improving feature discrimination under federated learning (FL) and non-identically distributed data settings. To enhance transparency and clinical reliability, an XAI module incorporating attention map visualization and gradient-weighted class activation mapping is employed to highlight diagnostically relevant lesion regions and provide human-interpretable explanations. Experimental results demonstrate that the proposed federated framework achieves competitive or superior classification performance compared to existing federated systems, offering enhanced robustness to data heterogeneity, robust privacy preservation, and meaningful visual explanations. This demonstrates its suitability for reliable, deployable skin-lesion diagnosis in real-world clinical settings.
skin lesion classification, federated learning, Swin Transformer V2, privacy preservation, Explainable Artificial Intelligence, medical image analysis, distributed deep learning, clinical decision support
Cancer is a severe and life-threatening disease; more than 100 different types are currently known, affecting people in all parts of the world. Skin cancer is one of the most lethal and rapidly increasing malignancies worldwide. The proliferation of pigmented melanocytes is the main cause of skin cancer. Excessive sunburns, the use of tanning beds, and high exposure to natural or artificial ultraviolet (UV) radiation are major risk factors for the development of malignant skin cancer [1]. According to skin cancer studies, almost 80 percent of skin cancer cases are fatal if the disease is not diagnosed at an early stage. Early detection of melanoma is therefore very important, since it has been shown to increase patient survival rates. Melanoma is a cancerous growth that occurs when melanocytes, the cells that produce melanin in the skin, begin to multiply [2]. Not every melanoma is due to UV radiation, especially those found in areas of the body not exposed to sunlight. Squamous cell carcinoma most often develops as a hard lump on the skin that may have a rough surface, unlike basal cell carcinoma, which appears as smooth, shiny patches. In other instances, the cancer can appear as reddish, scaly patches rather than nodules [3].
Transformer pruning methods can be divided into four groups. The first group removes unnecessary components and filters based on structural similarities in the model architecture. Such techniques rely on architectural redundancy rather than semantic knowledge. The second group ranks the importance of image patches and attention heads using measures such as the L2 norm, entropy, and saliency. In other studies, the L1 norm is used to prune Multilayer Perceptron (MLP) layers; however, these methods fail to account for semantic relevance [17]. The third group exploits explainability methods to preserve the features that are important for prediction. In contrast to the proposed method, which relies on simple statistical metrics such as distortion, these methods are based on more complex interpretability processes [18]. The fourth group considers relevance at the encoder block level rather than on a per-layer basis. Although this methodology also accounts for block-level interactions, it differs in that it uses a block-wise training approach. The present approach also differs markedly from prior research in that it further compresses an already efficiency-optimized SViT architecture, whereas most prior studies start from a highly over-parameterized ViT [19].
Hybrid CNN models with a Support Vector Machine (SVM) classifier have been used to enhance diagnostic accuracy in skin lesion classification. A computerized skin cancer diagnosis framework based on dermoscopic images has been introduced that applies adaptive snake and region-growing segmentation, followed by ANN- and SVM-based classification. Another approach combines AI with DL for skin cancer detection, using the Contourlet Transform to extract features on which state-of-the-art neural networks are then trained [20]. CNN and SVM classifiers involving pre-processing, segmentation, and feature extraction steps have been explored to forecast the progression of skin cancer. A detection method that combines Sand Cat Swarm Optimization with a ResNet50-based model has been proposed to improve feature extraction and classification [21]. The main idea of EfficientNet is compound scaling, which systematically enlarges a baseline CNN to a desired model size while maximizing the accuracy gain. This strategy uniformly scales network width, depth, and input resolution, allowing EfficientNet to achieve high performance with far fewer parameters and fewer floating-point operations (FLOPs). EfficientNet is therefore both an accurate classifier and highly computationally efficient. Convolutional-Deconvolutional Neural Networks (CDNNs) were introduced for segmenting skin lesions in dermoscopic images, producing binary masks through pixel-wise classification of skin and lesion regions [22]. The CDNN architecture comprises 29 layers, each with hyperparameters optimized via a grid search. The up-sampling and deconvolution layers maintain and recover image resolution. An ensemble of CDNNs is used as the main segmentation architecture, which proves more efficient in terms of computational cost and lesion segmentation quality [23].
To produce coarse segmentation maps, two Fully Convolutional Residual Networks (FCRNs) are trained on augmented images, including the original, flipped, and rotated versions of the sample. These crude maps are then refined using distance maps generated by a Locally Invariant Convolutional Unit (LICU), which are upsampled and warped to achieve fine segmentation results. The average of the probabilities from the refined maps is used to obtain the final lesion classification results [24].
Regions of Interest (ROIs) were extracted from lesion images during pre-processing using segmentation, filtering, and lesion enhancement methods. Both DL-based and handcrafted features were obtained: deep features were learned using CNNs, and handcrafted features were derived from the ABCD rule for shape, color, and texture [25]. A linear classifier trained on CNN-extracted features was used to classify skin lesions. A strongly trained multi-scale network for skin cancer prediction and segmentation based on dermoscopic images was also proposed. Supervised DL models have also outperformed unsupervised methods in analyzing skin lesion images [26].
A DL framework based on Restricted Boltzmann Machines (RBMs) was used to learn unsupervised features from images of brain lesions. A Random Forest (RF) classifier was then used to segment the brain lesions, achieving high Dice coefficient values on brain MRI data. Similarly, a DL model based on Deep Belief Networks (DBNs) was used to predict autism spectrum disorders. For classifying histopathological breast cancer images, an RBM-based deep neural network (DNN) architecture was proposed, achieving competitive results on breast cancer image datasets [27]. Transfer Learning (TL) was applied to skin lesion data by fine-tuning CNNs pre-trained on ImageNet, namely NASNet-Large, ResNet-101, and GoogLeNet. An intelligent diagnostic framework for multi-class skin lesion classification based on a hybrid DCNN and SVM model with Error-Correcting Output Codes (ECOC) was proposed [28]. An AlexNet architecture was used to extract the features, and the network was trained. A two-step defense system against poisoning attacks in FL environments was also proposed. The approach minimizes false-positive rates in detecting poisoned models by using two decision thresholds to avoid rejecting borderline models and then assessing model safety from trends in past performance [29].
The majority of currently available studies on skin lesion classification are based on centralized DL models, which require access to large volumes of dermoscopic data. This not only raises severe patient privacy issues but also constrains real-world clinical implementation due to confidentiality and security policies. Although FL has been developed as a privacy-preserving alternative, numerous existing FL methods rely on early versions of ViT or on conventional CNNs, which cannot handle intricate lesion patterns and tend to degrade under non-identically distributed clinical data [30]. Existing federated skin lesion classification studies primarily aim to improve predictive accuracy but fail to address model transparency, leading to black-box decision-making that undermines clinical trust. In particular, the combination of an advanced transformer architecture with XAI, as in the FSViTV2 model, has been little studied for its ability to capture local and global lesion features and provide clinically useful visual rationale. A serious research gap therefore remains in creating a single, privacy-sensitive federated framework that integrates XAI with the SViTV2 architecture to achieve accurate, robust, and clinically reliable skin lesion classification across a variety of healthcare settings.
Let there be $K$ geographically distributed clients (healthcare institutions), each possessing a private dermoscopic dataset $\mathcal{D}_k=\left\{\left(i_x^k, j_x^k\right)\right\}_{x=1}^{n_k}$, where $i_x^k$ denotes a skin lesion image and $j_x^k \in\{1,2, \ldots, C\}$ represents the corresponding lesion class (e.g., melanoma, nevus, carcinoma). Due to privacy regulations, raw datasets $\mathcal{D}_k$ cannot be shared with a central server. The objective is to collaboratively train a global Swin Transformer V2 model with parameters $\theta$ that minimizes the empirical risk across all clients without exposing sensitive data. The federated optimization problem is formulated as:
$\min _\theta L(\theta)=\sum_{k=1}^K \frac{n_k}{N} L_k(\theta)$ where $L_k(\theta)=\frac{1}{n_k} \sum_{x=1}^{n_k} l\left(f_\theta\left(i_x^k\right), j_x^k\right)$ (1)
and $N=\sum_{k=1}^K n_k$ denotes the total number of training samples across all clients, $f_\theta(\cdot)$ is the SViTV2 classifier, and $l(\cdot)$ represents the categorical cross-entropy loss. Each client performs local model updates using stochastic gradient descent:
$\theta_k^{(t+1)}=\theta^{(t)}-\eta \nabla L_k\left(\theta^{(t)}\right)$ (2)
where $\eta$ is the learning rate. To preserve privacy, only encrypted updates $\varepsilon\left(\theta_k^{(t+1)}\right)$ are transmitted to the server. A secure aggregation function $A(\cdot)$ computes the global model update:
$\theta^{(t+1)}=A\left(\left\{\varepsilon\left(\theta_k^{(t+1)}\right)\right\}_{k=1}^K\right)$ (3)
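One communication round of Eqs. (1) to (3) can be illustrated with a minimal sketch. It assumes that the aggregation function $A(\cdot)$ reduces to the sample-size-weighted average used in federated averaging (weights $n_k/N$, as in Eq. (1)) and treats the encryption $\varepsilon(\cdot)$ as the identity; both are simplifications for illustration. The function names `local_sgd_step` and `fedavg` are hypothetical and not taken from the original implementation.

```python
from typing import List, Sequence


def local_sgd_step(theta: Sequence[float], grad: Sequence[float], lr: float) -> List[float]:
    """Eq. (2): one local SGD step, theta_k = theta - lr * grad_k."""
    return [w - lr * g for w, g in zip(theta, grad)]


def fedavg(client_params: Sequence[Sequence[float]], client_sizes: Sequence[int]) -> List[float]:
    """Weighted average of client parameters with weights n_k / N.

    Assumption: this stands in for A(.) in Eq. (3); encryption is omitted.
    """
    total = sum(client_sizes)
    dim = len(client_params[0])
    return [
        sum(n * params[j] for params, n in zip(client_params, client_sizes)) / total
        for j in range(dim)
    ]


if __name__ == "__main__":
    theta = [1.0, -2.0]  # global parameters theta^(t)
    # Hypothetical local gradients for two clients holding 100 and 300 samples.
    client_a = local_sgd_step(theta, [1.0, 0.0], lr=0.1)
    client_b = local_sgd_step(theta, [0.0, 2.0], lr=0.1)
    print(fedavg([client_a, client_b], [100, 300]))
```

Because the weights are proportional to $n_k$, the client with more samples pulls the global parameters further toward its own local solution.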
To ensure explainability, an explainable AI function $\Phi(\cdot)$, such as Integrated Gradients or attention-based attribution, is applied to the trained model. In its simplest gradient-based form, $\Phi(i)=\frac{\partial f_\theta(i)}{\partial i}$, it provides visual explanations that highlight the discriminative lesion regions influencing predictions.
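The gradient form of $\Phi$ can be sketched on a toy model. The example below approximates $\partial f / \partial i$ by central finite differences, which is a stand-in for the automatic differentiation a deep learning framework would use; the linear-softmax `class_score` is a hypothetical placeholder for $f_\theta$, not the SViTV2 classifier.

```python
import math
from typing import Callable, List, Sequence


def softmax(logits: Sequence[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [v / total for v in exps]


def class_score(weights: Sequence[Sequence[float]], x: Sequence[float], c: int) -> float:
    """Toy stand-in for f_theta: probability of class c under a linear-softmax model."""
    logits = [sum(w * xi for w, xi in zip(row, x)) for row in weights]
    return softmax(logits)[c]


def gradient_saliency(f: Callable[[List[float]], float], x: Sequence[float],
                      eps: float = 1e-5) -> List[float]:
    """Central-difference estimate of df/dx_j for every input component j."""
    grads = []
    for j in range(len(x)):
        hi, lo = list(x), list(x)
        hi[j] += eps
        lo[j] -= eps
        grads.append((f(hi) - f(lo)) / (2 * eps))
    return grads


if __name__ == "__main__":
    W = [[2.0, 0.0, -1.0], [0.0, 1.0, 0.5]]  # hypothetical 2-class weights
    pixels = [0.2, 0.7, 0.1]                 # flattened toy "image"
    print(gradient_saliency(lambda v: class_score(W, v, 0), pixels))
```

Components with a large absolute gradient are the inputs to which the class score is most sensitive, which is what a saliency map visualizes per pixel.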
Thus, the core problem is to design a federated optimization framework that jointly minimizes classification loss, preserves data privacy, and generates clinically interpretable explanations, while remaining robust to heterogeneous data distributions across clients.
Dermoscopic images were obtained at various distributed medical facilities, and each client stored and processed its skin lesion images locally to maintain patient confidentiality. Images were pre-processed through resizing, color normalization, and data augmentation to minimize illumination variation and class imbalance. The SViTV2 model was initialized on the central server, and training was performed using the FL framework without exchanging raw data, as shown in Figure 3. Only encrypted model updates were shared, and they were combined through federated averaging. The hierarchical self-attention mechanism captured both global and fine-grained lesion features. Training and evaluation with non-identically distributed data were conducted over several federated rounds, using standard performance metrics.
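How a server can average updates without seeing any individual one can be sketched with pairwise additive masking, the idea underlying secure aggregation protocols. The source does not specify the encryption scheme, so this is an assumed illustration only: the masks are drawn from one seeded generator for brevity, whereas a real protocol derives each pairwise mask from a secret shared between the two clients and works over finite fields. All function names are hypothetical.

```python
import random
from typing import List, Sequence


def pairwise_masks(num_clients: int, dim: int, seed: int = 0) -> List[List[float]]:
    """Build one mask per client such that all masks sum to zero.

    For each pair (a, b) with a < b, a shared random vector is added to
    client a's mask and subtracted from client b's mask.
    """
    rng = random.Random(seed)
    masks = [[0.0] * dim for _ in range(num_clients)]
    for a in range(num_clients):
        for b in range(a + 1, num_clients):
            shared = [rng.uniform(-1.0, 1.0) for _ in range(dim)]
            for j in range(dim):
                masks[a][j] += shared[j]
                masks[b][j] -= shared[j]
    return masks


def mask_update(update: Sequence[float], mask: Sequence[float]) -> List[float]:
    """What a client would transmit: its update hidden by its mask."""
    return [u + m for u, m in zip(update, mask)]


def aggregate(masked_updates: Sequence[Sequence[float]]) -> List[float]:
    """Server-side mean; the pairwise masks cancel in the sum."""
    n = len(masked_updates)
    return [sum(column) / n for column in zip(*masked_updates)]


if __name__ == "__main__":
    updates = [[0.1, 0.2], [0.3, -0.4], [-0.2, 0.6]]  # toy client updates
    masks = pairwise_masks(num_clients=3, dim=2, seed=1)
    masked = [mask_update(u, m) for u, m in zip(updates, masks)]
    print(aggregate(masked))
```

The server only ever receives the masked vectors, yet their mean equals the mean of the true updates up to floating-point error.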
High-performance computing infrastructure and standardized software environments were employed to ensure efficient, stable, and reproducible training of the proposed privacy-preserving XAI-FSViTV2 framework. Each medical institution, modeled as an independent federated client, was equipped with 64 GB RAM, an Intel Xeon 32-core CPU, and an NVIDIA RTX 4090 GPU (24 GB VRAM) to process high-resolution dermoscopic images. The central aggregation server utilized identical hardware specifications to securely aggregate encrypted model updates without becoming a computational bottleneck. All experiments were conducted using a fixed and documented software stack, consisting of Python 3.11, PyTorch 2.1, CUDA 12.1, and Hugging Face Transformers (v4.38) for implementing the Swin Transformer V2 backbone. The Flower FL framework (v1.6) was employed to manage federated averaging, client orchestration, and secure aggregation. Explainability was implemented using Grad-CAM++ and SHAP libraries to generate visual and attribution-based explanations.
To guarantee experimental reproducibility, global random seeds were fixed to 42 for Python, NumPy, and PyTorch, and deterministic CUDA operations were enabled. All training scripts, configuration files, and hyperparameter settings are modularized and documented, enabling exact replication of experiments across institutions. Regarding training duration, each federated experiment was conducted for 100 global communication rounds, with 5 local epochs per client per round and a batch size of 32. On average, one global round required approximately 2.8 minutes, resulting in a total training time of ~4.7 hours for a complete federated run involving 10 clients. Controlled and synchronized network conditions (latency 10–20 ms, no packet loss) were used to simulate realistic multi-institutional FL environments while ensuring consistency across runs.
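The seeding described above could look roughly like the following sketch. The helper name `seed_everything` is hypothetical, and the exact set of determinism flags used by the authors is not given in the text, so the PyTorch settings shown are an assumption about what "deterministic CUDA operations" involved. NumPy and PyTorch are imported lazily so the function also runs where they are not installed.

```python
import os
import random


def seed_everything(seed: int = 42) -> None:
    """Fix the random seeds for Python, NumPy, and PyTorch (if available)."""
    random.seed(seed)

    try:
        import numpy as np
        np.random.seed(seed)
    except ImportError:
        pass  # NumPy not installed; nothing to seed

    try:
        import torch
    except ImportError:
        return  # PyTorch not installed; Python/NumPy seeding still applies

    torch.manual_seed(seed)
    torch.cuda.manual_seed_all(seed)  # safe to call without a GPU
    # Required by cuBLAS for deterministic behaviour on recent CUDA versions.
    os.environ.setdefault("CUBLAS_WORKSPACE_CONFIG", ":4096:8")
    torch.backends.cudnn.deterministic = True
    torch.backends.cudnn.benchmark = False
    torch.use_deterministic_algorithms(True)


if __name__ == "__main__":
    seed_everything(42)
    first = random.random()
    seed_everything(42)
    print(first == random.random())
```

Calling the helper once at the start of every client and server process, before any model or data loader is created, is the usual way to make runs repeatable.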
The proposed XAI-FSViTV2 uses carefully tuned hyperparameters to balance accuracy, efficiency, and privacy, as shown in Table 3. It employs 96-dimensional embeddings with four hierarchical layers, a 4×4 patch size, and a 7×7 window. Training uses AdamW (learning rate 1e-4, batch size 16, weight decay 0.05), five local epochs over 50 rounds, attention dropout 0.1, GELU activation, and privacy noise (σ = 1e-3).
Table 3. Hyper-parameter settings
| Hyperparameter | Value / Setting |
|---|---|
| Model Architecture | Swin Transformer V2 |
| Number of Transformer Layers | 4 |
| Patch Size | 4 × 4 |
| Window Size | 7 × 7 |
| Embedding Dimension | 96 |
| Learning Rate | 1e-4 |
| Optimizer | AdamW |
| Batch Size | 16 |
| Number of Local Epochs | 5 |
| Number of Federated Rounds | 50 |
| Weight Decay | 0.05 |
| Differential Privacy Noise (σ) | 1e-3 |
| Attention Dropout Rate | 0.1 |
| Activation Function | GELU |
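To make the architectural settings concrete, the sketch below works out the token grid, channel width, and number of attention windows at each stage. It assumes that the four "layers" are the four hierarchical Swin stages, that patch merging between stages halves the spatial resolution and doubles the channels (the standard Swin design), and that the input is the 224 × 224 image size stated in the text. The function name `swin_stage_shapes` is hypothetical.

```python
from typing import List, Tuple


def swin_stage_shapes(image_size: int = 224, patch_size: int = 4, embed_dim: int = 96,
                      num_stages: int = 4, window_size: int = 7) -> List[Tuple[int, int, int]]:
    """Return (tokens per side, channels, number of windows) for each stage."""
    side = image_size // patch_size  # 224 / 4 = 56 patch tokens per side
    dim = embed_dim
    shapes = []
    for _ in range(num_stages):
        windows = (side // window_size) ** 2  # non-overlapping 7x7 windows
        shapes.append((side, dim, windows))
        side //= 2  # patch merging halves the resolution ...
        dim *= 2    # ... and doubles the channel width
    return shapes


if __name__ == "__main__":
    for stage, (side, dim, windows) in enumerate(swin_stage_shapes(), start=1):
        print(f"stage {stage}: {side}x{side} tokens, {dim} channels, {windows} windows")
```

Under these assumptions the last stage holds a 7 × 7 token grid, which a single 7 × 7 window covers entirely, so attention there is effectively global.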
The test input image is shown in Figure 11. Figure 12 illustrates the relevance degree (RD) between lesion and non-lesion regions in three cases. Figure 12(a) shows a relevance degree of 0.3, representing a moderate distinction between lesion and non-lesion areas. Figure 12(b) has a low relevance of 0.04, indicating very little differentiation, and Figure 12(c) has a high relevance of 0.72, indicating a strong correspondence between the predicted relevance and the actual lesion regions. This comparison shows that the model captures and quantifies lesion-specific characteristics and can therefore differentiate lesions from normal tissue. All dermoscopic images are resized to 224 × 224 pixels and pre-processed for use with the XAI-FSViTV2 framework, as shown in Figure 13. Pre-processing involves normalization of pixel intensities, contrast enhancement, and, in some cases, artifact detection to minimize noise caused by hair, reflections, or variation in imaging. This uniform format is compatible with the XAI-FSViTV2 input, enabling patch encoding and hierarchical attention. It retains lesion features, including boundaries, pigmentation patterns, and textures, to support dependable classification and feature extraction across client nodes.
Figure 11. Sample input image
Figure 12. Relevance degree (RD) of lesion versus non-lesion regions: (a) 0.3, (b) 0.04, (c) 0.72
Figure 13. FSViTV2 with patch pixels
Data augmentation is applied to the pre-processed images to improve model generalization and compensate for the limited amount of skin lesion data, as shown in Figure 14. Methods such as flipping, resizing, color jittering, and random cropping increase training diversity while preserving the important shape, color, and texture features of lesions, allowing XAI-FSViTV2 to learn robust, invariant representations without extra patient data. Figure 15 shows the training, validation, and testing accuracy of the privacy-preserving XAI-FSViTV2 at various epochs. Accuracy grows quickly in the initial epochs as the model acquires basic skin lesion characteristics and then converges, with validation and test performance roughly in line with training performance. This indicates that learning is stable, overfitting is minimal, and the model generalizes well to unseen, non-identically distributed data. The findings demonstrate the model's successful adaptation, strong performance, and stability in distributed FL settings.
Figure 14. Data augmentation
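The augmentations named above (flipping, random cropping, color jittering) can be sketched in plain Python. For brevity the example operates on a grayscale image stored as nested lists with values in [0, 1] and replaces color jittering with a simple brightness jitter; the crop size, flip probability, and jitter range are illustrative assumptions, not the settings used in the experiments.

```python
import random
from typing import List

Image = List[List[float]]  # H x W grayscale image, values in [0, 1]


def horizontal_flip(img: Image) -> Image:
    """Mirror the image left to right."""
    return [row[::-1] for row in img]


def random_crop(img: Image, size: int, rng: random.Random) -> Image:
    """Cut a random size x size window out of the image."""
    height, width = len(img), len(img[0])
    top = rng.randint(0, height - size)
    left = rng.randint(0, width - size)
    return [row[left:left + size] for row in img[top:top + size]]


def brightness_jitter(img: Image, max_delta: float, rng: random.Random) -> Image:
    """Scale all pixels by a random factor in [1 - max_delta, 1 + max_delta], clamped to [0, 1]."""
    factor = 1.0 + rng.uniform(-max_delta, max_delta)
    return [[min(1.0, max(0.0, v * factor)) for v in row] for row in img]


def augment(img: Image, crop_size: int, rng: random.Random) -> Image:
    """Random crop, flip with probability 0.5, then brightness jitter."""
    out = random_crop(img, crop_size, rng)
    if rng.random() < 0.5:
        out = horizontal_flip(out)
    return brightness_jitter(out, 0.2, rng)


if __name__ == "__main__":
    rng = random.Random(0)
    toy = [[(r * 4 + c) / 15 for c in range(4)] for r in range(4)]  # 4 x 4 gradient
    print(augment(toy, crop_size=3, rng=rng))
```

Each call yields a slightly different view of the same lesion, which is how augmentation enlarges the effective training set without new patient data.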
The privacy-preserving XAI-FSViTV2 is well suited to this field because it addresses the challenge of safe, collaborative skin lesion classification across distributed healthcare facilities. By incorporating differential privacy and secure aggregation, the framework keeps sensitive patient data private while enabling decentralized model training. XAI integration makes model predictions interpretable, allowing clinicians to understand and rely on them when making medical decisions. The experimental results indicate that the proposed framework achieves a classification accuracy of 94.2%, surpassing current FL systems, including CNN-based (88.5%), ResNet-based (90.1%), DenseNet-based (91.3%), and ViT-based (92.5%) systems. The privacy measures indicate the strength of the scheme: the differential privacy budget is ε = 1.0, and secure aggregation tolerates up to four colluding clients. Although the communication overhead (28 MB/round) and the computation overhead (14 seconds/round) are slightly higher than those of most other systems, they are justified by the increased privacy, security, and interpretability. Overall, the XAI-FSViTV2 framework demonstrates a strong balance between model performance, privacy preservation, and explainability, making it a promising approach for practical deployment in sensitive clinical environments where both accuracy and data protection are critical.
[1] Hanum, S.A., Dey, A., Kabir, M.A. (2026). An attention-guided deep learning approach for classifying 39 skin lesion types. International Journal of Imaging Systems and Technology, 36(1): e70269. https://doi.org/10.1002/ima.70269
[2] Larasati, S.S.A., Rahmaniati, A.F., Ms, F.I.S., Utomo, Y.C., Ariadi, F., Utaminingrum, F. (2026). Advancement in deep learning for skin detection: A comprehensive review. Biomedical Signal Processing and Control, 112: 108738. https://doi.org/10.1016/j.bspc.2025.108738
[3] Kuntal, Y.A., Bhat, A. (2026). Advancing skin lesion cancer detection: A systematic literature review. In Artificial Intelligence and Sustainable Innovation, pp. 510-516.
[4] Furqan, M., Katuk, N., Hartama, D. (2026). Multiclass skin lesion classification algorithm using attention-based vision transformer with metadata fusion. Journal of Applied Data Sciences, 7(1): 203-217. https://doi.org/10.47738/jads.v7i1.1017
[5] Singh, J., Gill, J., Kumar, Y. (2026). Automated detection and diagnosis of bacterial skin infections using deep learning with segmentation techniques. Cognitive Computation, 18(1): 5. https://doi.org/10.1007/s12559-025-10538-7
[6] Al-Yousef, A., Al-Shannaq, M.A.A., Al-Shannaq, A., Saifan, A.A., Mohawesh, R. (2026). Enhancing melanoma detection through multiple datasets integration and robust deep learning. Cluster Computing, 29(1): 62. https://doi.org/10.1007/s10586-025-05884-y
[7] Singh, S., Rai, D., Hazela, B., Singh, V., Tiwari, A., Pandey, A. (2026). Leveraging machine learning techniques for enhanced skin cancer detection. In Recent Advances in Computational Methods in Science and Technology, pp. 392-398.
[8] Thaljaoui, A., Yousafzai, S.N., Nasir, I.M., Saidani, O., Fadhal, E., Saidani, T. (2026). Explainable skin cancer diagnosis with parallel attention mechanism for segmentation and classification. Biomedical Signal Processing and Control, 113: 109159. https://doi.org/10.1016/j.bspc.2025.109159
[9] Prabu, P., Ganeshkumar, P., Parikh, S.M., Parhi, M., Murugan, R., Alluhaidan, A.S. (2026). Optimizing deep learning with attention techniques for improved detection of human monkeypox lesions. Biomedical Signal Processing and Control, 113: 108902. https://doi.org/10.1016/j.bspc.2025.108902
[10] Singh, S., Singh, P., Srivastava, A., Srivastava, J.K., Gupta, S., Dwivedi, V.K. (2026). Real-time skin disease diagnosis using image classification techniques. In Artificial Intelligence and Sustainable Innovation, pp. 673-679.
[11] Mubeen, A., Dulhare, U.N. (2025). Enhanced skin lesion classification using deep learning, integrating with sequential data analysis: A multiclass approach. Engineering Proceedings, 78(1): 6. https://doi.org/10.3390/engproc2024078006
[12] Aksoy, S., Demircioglu, P., Bogrekci, I. (2025). Deep learning-based web application for automated skin lesion classification and analysis. Dermato, 5(2): 7. https://doi.org/10.3390/dermato5020007
[13] Dillshad, V., Khan, M.A., Nazir, M., Saidani, O., Alturki, N., Kadry, S. (2025). D2LFS2Net: Multi-class skin lesion diagnosis using deep learning and variance‐controlled Marine Predator optimisation: An application for precision medicine. CAAI Transactions on Intelligence Technology, 10(1): 207-222. https://doi.org/10.1049/cit2.12267
[14] Shakya, M., Patel, R., Joshi, S. (2025). A comprehensive analysis of deep learning and transfer learning techniques for skin cancer classification. Scientific Reports, 15(1): 4633. https://doi.org/10.1038/s41598-024-82241-w
[15] Yang, G., Luo, S., Greer, P. (2025). Advancements in skin cancer classification: A review of machine learning techniques in clinical image analysis. Multimedia Tools and Applications, 84(11): 9837-9864. https://doi.org/10.1007/s11042-024-19298-2
[16] Vuran, S., Ucan, M., Akin, M., Kaya, M. (2025). Multi-classification of skin lesion images including Mpox disease using transformer-based deep learning architectures. Diagnostics, 15(3): 374. https://doi.org/10.3390/diagnostics15030374
[17] Haque, S., Ahmad, F., Singh, V., Mathkor, D.M., Babegi, A. (2025). Skin cancer detection using deep learning approaches. Cancer Biotherapy & Radiopharmaceuticals, 40(5): 301-312. https://doi.org/10.1089/cbr.2024.0161
[18] Muthuraja, M., Shanthi, N., Aravindhraj, N., Nimesh, S.V., Ponsudhan, V., Prateeksha, V. (2025). Skin lesion classification using deep learning technique. In 2025 3rd International Conference on Advancement in Computation & Computer Technologies (InCACCT), Gharuan, India, pp. 237-242. https://doi.org/10.1109/InCACCT65424.2025.11011438
[19] He, F., Wu, R., Zeng, X., Song, H., Li, G., Wei, Z. (2025). Skin lesion classification network based on improved MobileViT. Engineering Applications of Artificial Intelligence, 159: 111726. https://doi.org/10.1016/j.engappai.2025.111726
[20] Nazeeruddin, E., Latif, G., Mohammad, N. (2025). Monkeypox and chickenpox skin lesions classification using hybrid deep learning features. In 2025 International Conference on Inventive Computation Technologies (ICICT), Kirtipur, Nepal, pp. 1005-1010. https://doi.org/10.1109/ICICT64420.2025.11004913
[21] Rey-Barroso, L., Vilaseca, M., Royo, S., Díaz-Doutón, F., Lihacova, I., Bondarenko, A., Burgos-Fernández, F.J. (2025). Training state-of-the-art deep learning algorithms with visible and extended near-infrared multispectral images of skin lesions for the improvement of skin cancer diagnosis. Diagnostics, 15(3): 355. https://doi.org/10.3390/diagnostics15030355
[22] Tran-Van, N.Y., Le, K.H. (2025). A multimodal skin lesion classification through cross-attention fusion and collaborative edge computing. Computerized Medical Imaging and Graphics, 124: 102588. https://doi.org/10.1016/j.compmedimag.2025.102588
[23] Shaik, A., Dutta, S.S., Sawant, I.M., Kumar, S., Balasundaram, A., De, K. (2025). An attention based hybrid approach using CNN and BiLSTM for improved skin lesion classification. Scientific Reports, 15(1): 15680. https://doi.org/10.1038/s41598-025-00025-2
[24] Kumar, V., Shanthi, D.L., Babu, T.R., Kumar, N., Godi, R.K. (2025). Advanced skin lesion segmentation and classification using adaptive contextual GLCM and deep learning hybrid models. Egyptian Informatics Journal, 30: 100706. https://doi.org/10.1016/j.eij.2025.100706
[25] Badr, M., Elkasaby, A., Alrahmawy, M., El-Metwally, S. (2025). A multi-model deep learning architecture for diagnosing multi-class skin diseases. Journal of Imaging Informatics in Medicine, 38(3): 1776-1795. https://doi.org/10.1007/s10278-024-01300-w
[26] Patil, A., Mehto, A., Nalband, S. (2025). Enhancing skin lesion diagnosis with data augmentation techniques: A review of the state-of-the-art. Multimedia Tools and Applications, 84(22): 25325-25364. https://doi.org/10.1007/s11042-024-20145-7
[27] Chu, C.Y., Lin, C.H. (2025). Deep learning-based skin lesion classification with ensemble stacking and data augmentation. In 2025 1st International Conference on Consumer Technology (ICCT-Pacific), Matsue, Shimane, pp. 1-4. https://doi.org/10.1109/ICCT-Pacific63901.2025.11012773
[28] Bhaskar, R.K., Kumaraswamy, B. (2025). Early detection of melanoma through deep learning-based skin lesion classification using VGG16 and inceptionV3. In 2025 International Conference on Automation and Computation (AUTOCOM), Dehradun, India, pp. 1333-1339. https://doi.org/10.1109/AUTOCOM64127.2025.10957419
[29] Zhang, X., Liu, Y., Ouyang, G., Chen, W., Xu, A., Hara, T., Wu, D. (2025). DermViT: Diagnosis-guided vision transformer for robust and efficient skin lesion classification. Bioengineering, 12(4): 421. https://doi.org/10.3390/bioengineering12040421
[30] Ajabani, D., Shaikh, Z.A., Yousef, A., Ali, K., Albahar, M.A. (2025). Enhancing skin lesion classification: A CNN approach with human baseline comparison. PeerJ Computer Science, 11: e2795. https://doi.org/10.7717/peerj-cs.2795