Efficient Segmentation Approach for The Traceability of Breast Cancer Tissues to Improve Diagnostic Accuracy in Ultrasound Images

Prathamesh Suhas Uravane, Vedant Vinay Ganthade, Adityaraj Sanjay Belhe, Mamoon Rashid*, Shakila Basheer, Mariyam Aysha Bivi

Science Academy, College of Computer, Mathematical, and Natural Sciences, University of Maryland, Maryland 20742, USA

Ira A. Fulton Schools of Engineering, Arizona State University, Arizona 85281, United States

Department of AI and Engineering, Wednesday Solutions, Pune 411013, India

School of Information Communication and Technology, Bahrain Polytechnic, Isa Town 33349, Bahrain

Department of Information Systems, College of Computer and Information Science, Princess Nourah bint Abdulrahman University, Riyadh 11671, Saudi Arabia

Department of Computer Science, College of Computer Science, King Khalid University, Abha 61421, Saudi Arabia

Corresponding Author Email: mamoon873@gmail.com

Page: 2913-2922 | DOI: https://doi.org/10.18280/ts.420540

Received: 19 August 2025 | Revised: 26 August 2025 | Accepted: 22 September 2025 | Available online: 31 October 2025

© 2025 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

OPEN ACCESS

Abstract: 

Breast cancer continues to be a leading global health concern across diverse populations, and it requires accurate detection so that intervention can begin early. This is especially true given the complexity of breast tissue analysis and the increasing volume of imaging data. There is therefore an urgent need for rapid, precise interpretation of complex ultrasound images using Artificial Intelligence (AI) to advance diagnosis and therapy. This research provides a new approach to applying segmentation in healthcare for the traceability of breast tissue to improve diagnostic accuracy. The main innovation of this study is a new preprocessing pipeline that combines normalization, CLAHE, Gaussian blur, and augmentation to handle noise, artefacts, and muscle regions that may lead to high false-positive rates. Three state-of-the-art deep learning-based instance segmentation frameworks are used: U-Net, MultiResUNet, and DeepLabV3 with a ResNet-50 encoder-decoder. All three algorithms achieved an overall accuracy of about 96%. Furthermore, the segmentation results showed good agreement with the ground truth, with Jaccard indices consistently reaching about 70%. Integrating the segmentation technique with our preprocessing pipeline can provide better clinical insights, speed up diagnosis, and improve patient care.

Keywords: 

machine learning, feature extraction, deep learning, breast cancer, preprocessing, segmentation

1. Introduction

Breast cancer is a serious global disease that affects many nations and communities, with more than 2.3 million new cases and 685,000 deaths in 2020 alone [1]. Although medical science has made strides that offer hope, the crisis remains urgent, especially in regions where access to healthcare resources is disproportionately scarce. India, for example, is a country of more than a billion people that suffers from an acute shortage of medical professionals. With only a little over 2,000 oncologists serving 10 million patients [2], the skills shortage is conspicuous. Similarly, fewer than 10,000 radiologists for the whole country point to the difficulty of making diagnoses on time and correctly [3]. Diagnosing breast cancer is complex and requires a thorough understanding of the underlying imaging modalities, particularly ultrasound. Ultrasound plays a very important role in breast imaging: a radiologist and a sonographer work together to capture the image, and the reflected waves reveal the anatomy of the breast tissue in detail. This gives a different perspective from X-rays or MRI. Whereas X-rays rely on ionizing radiation and MRI on magnetic fields, ultrasound uses sound waves to create images, with limited risk to patients [4-6]. Oncologists, who are the first line of treatment after a breast cancer diagnosis, often serve as both a source of hope and counsellors for patients fighting this disease. The magnitude of the problem is serious: the number of patients is far out of proportion to the number of oncologists and sonographers. Demand for breast cancer diagnosis and care is very high, yet the availability of specialists is very low. In such a setting, the role of the sonographer, the expert who conducts ultrasound examinations, becomes highly important. The imbalance between patient and workforce numbers underlines how imperative it is to find new ways of bridging the gap.

This research uses ultrasound images together with their corresponding masks, also referred to as annotations or ground truth. Ultrasound is the most common imaging modality for examining breast cancer because it is non-invasive and works in real time. However, these images are open to subjective interpretation, which is where masks come in. Masks, which are produced by fine segmentation, mark the regions of interest (ROI), such as tumors, in ultrasound images. In this work, they serve as the reference or gold standard for developing and validating deep learning algorithms for automated tumor detection and analysis. As a result, the AI model can not only characterize and delimit suspicious areas in ultrasound images but also indicate the exact location of the region of primary concern. Deep learning can automatically reveal important information hidden in an ultrasound image beyond what a human observer would be able to distinguish. The extracted features include complex texture patterns and shapes that indicate malignancy. By exploring these characteristics, the model can provide a comprehensive analysis. The consistency of deep learning-based models is particularly important in medical diagnosis: they analyze images rapidly and reduce inter-radiologist variation, giving oncologists a more accurate diagnosis to act upon promptly.

In this work, we propose a novel data preprocessing pipeline to facilitate segmentation for breast cancer ultrasound imaging. An efficient and accurate data preprocessing pipeline makes it possible to apply different deep learning algorithms effectively, including DeepLabV3 with ResNet-50, U-Net, and MultiResUNet, for segmentation. Our study is based on ultrasound images, as they are a safe and real-time imaging modality. The pipeline involves several key steps: noise reduction using Gaussian blur, contrast enhancement using CLAHE, data augmentation to increase the size of our dataset and improve generalization, and image normalization. From the raw ultrasound images to the format consumed by the algorithms mentioned above, each step in this pipeline addresses one of the difficulties of interpreting ultrasound images. The resulting segmentation masks, which are precise outlines of the tumor edge, help improve diagnostic accuracy and clinical decision-making in breast cancer.

Breast cancer diagnosis relies mainly on the precise interpretation of ultrasound images. However, manual delineation of tumor boundaries is time-consuming, subjective, and prone to inter-observer variability among radiology experts. To address these challenges, our study focuses on the segmentation of breast tissues, which allows automated identification of tumor regions with high accuracy. Precise segmentation helps reduce diagnostic inconsistencies and assists oncologists in planning personalized treatment, including surgery, chemotherapy, and radiotherapy. By connecting the technological advances of our preprocessing pipeline and deep learning models with clinical results, our approach bridges the gap between computational research and real-world medical solutions, ultimately improving diagnostic accuracy and enhancing patient outcomes.

The main contributions of this research are:

(1) A segmentation approach using deep learning algorithms to generate better segmentation masks.

(2) A thorough, step-by-step explanation of every technique used in our novel data preprocessing pipeline.

(3) Integration of our pipeline with custom-tuned DeepLabV3 with ResNet-50, U-Net, and MultiResUNet, and a comparison of their results based on the Jaccard index.

The rest of the paper is organized as follows: Section 2 provides a detailed literature review. Section 3 describes the methodology of the proposed segmentation technique. The experiments and results are presented in Section 4. Finally, the paper concludes in Section 5.

2. Literature Review

This section discusses existing work on ultrasound images and on the AI, segmentation, and computer vision techniques used in medical imaging. Several of the papers reviewed here used advanced medical imaging techniques to handle complex ultrasound images. An end-to-end integrated pipeline for the classification of breast cancer ultrasonography images has been proposed, using K-Means++ and SLIC together with four transfer learning models: VGG16, VGG19, DenseNet121, and ResNet50 [7, 8]. A framework with a stepwise approach to data augmentation has been proposed along with a pre-trained DarkNet-53, transfer learning, the two optimization algorithms RDE and RGW, probability-based methods, and finally machine learning-based classification [9, 10]. The problem of limited labelled ultrasound data has been addressed with a novel asymmetric semi-supervised GAN (ASSGAN) that uses two generators and a discriminator. These generators create reliable segmentation guidance without labels, leveraging unlabeled data for effective training. Compared with fully supervised and semi-supervised methods on diverse datasets, including a new collection, ASSGAN excels with limited labelled images, showing promise in addressing data scarcity in breast ultrasound image segmentation [11].

Another study created a completely automated, multi-layer process for segmenting and classifying breast lesions from ultrasound images. The authors also compared the performance of different convolutional neural network architectures, combined the networks in an ensemble, and presented a unique cyclic mutual optimization step that uses the classification results to improve the segmentation outcomes [12]. Further research focused on the noise and contrast challenges of ultrasound image segmentation. Traditional methods struggle, but local phase-based approaches, such as level set propagation using local phase and orientation, show promise. Cauchy kernels improve feature extraction over log-Gabor filters, and the results confirm noise handling and precise boundary capture. The prevalence of breast ultrasound (BUS) for cancer detection has highlighted the significance of accurate tumor segmentation to assist doctors and AI diagnosis systems. While U-Net is a popular choice, it often produces false-positive mass predictions in normal scans, which is a concern for routine AI-based screening. Current studies center on designing fine-tuned U-Net architectures, fusing multi-modal data, and exploring alternative machine learning techniques such as CNNs and random forests. They aim to increase segmentation accuracy and to minimize false positives in BUS images, especially for automated screening applications. One manuscript introduces an adaptive region segmentation algorithm within a Bayesian framework that processes noisy images. It is based on a multiresolution wavelet approach and is applicable to 2D and 3D data [13].

Another study introduced a geometric model and computational algorithm for ultrasound image segmentation. A partial differential equation-based flow was formulated for maximum likelihood segmentation using grey-level density probability and smoothness constraints. The classic Rayleigh probability distribution models grey-level behavior in ultrasound images, and the steady state of the flow yields the optimal segmentation. A finite difference approximation was developed, validated through numerical experiments, and demonstrated on fetal echography and echocardiography ultrasound images. A further study developed a computer-aided diagnosis (CAD) system for breast mass classification using ultrasonography. The system achieved high classification performance using a CNN ensemble of VGG19 and ResNet152 models. The dataset consisted of 1536 breast masses: 897 malignant and 639 benign. The CNN-based CAD system showed potential for clinical breast cancer diagnosis. Importantly, the CNN's attention was not confined to the masses themselves, which proved crucial for accurate classification [14].

Recent work on breast cancer imaging tends to apply deep learning to segment and classify tumors automatically. Many techniques use CNNs and automated full-image analysis to enhance the interpretation of ultrasound and MRI examinations. Such models aim to improve diagnostic accuracy by effectively segmenting breast masses and supporting their classification, with a focus on real-time and large-scale data processing. In addition, segmentation benchmarks and the development of preoperative assessments indicate that AI systems are increasingly embedded in both diagnostic and surgical planning environments, fostering more personalized medical care [15-19]. The authors of these studies emphasized deep learning and segmentation techniques as a means of enhancing breast cancer imaging and diagnostic accuracy. Many have employed 3D image segmentation, for example in predicting chemotherapy response and in improving the analysis of breast tissue in MRI. Segmentation of ultrasound images makes use of both global and local statistical methods, with current evidence suggesting a shift to more robust multi-resolution techniques. Finally, publicly available deep learning models and datasets advance research on the segmentation of breast tissue, fibroglandular tissue, and vessels and provide critical tools for clinical applications. These studies further underline the increasing reliance on AI in personalized cancer treatment [20-25].

3. Methodology

In our proposed work, we used a segmentation strategy to localize breast cancer tumors in ultrasound images more precisely. We employed three deep learning algorithms, namely U-Net, MultiResUNet, and DeepLabV3 with ResNet-50, each of which is known for capturing complex patterns and sharply outlining boundaries in ultrasound images. This is made possible by an encoder-decoder architecture, which enables both the extraction of high-level features and the precise localization of tumors within the images. The careful design of a novel data preprocessing pipeline that includes Gaussian blur, CLAHE, data augmentation, and normalization is important for producing more accurate segmentation results.

Although Gaussian blur, CLAHE, normalization, and augmentation are individually well-established techniques, the innovation lies in how they are combined, sequenced, and tuned. The preprocessing pipeline starts with Gaussian blur to reduce high-frequency noise, followed by CLAHE to enhance local contrast. Normalization ensures a consistent pixel intensity distribution across the images, and augmentation introduces variation to improve model generalization. Compared with conventional preprocessing techniques, our pipeline is carefully tuned to the characteristics of breast cancer ultrasound, allowing more accurate tumor boundary detection. As presented in Table 1, the proposed preprocessing pipeline improves segmentation accuracy from 35% to 96.7%, demonstrating its effectiveness, novelty, and clinical relevance. The overall process is visualized in Figure 1.

Our research makes use of ultrasound images and their respective segmentation masks, which are also called annotations. Originally, the image and mask data were combined in the same directory for each of the three labels. We built an algorithm that separates the images from the masks, a step that is essential for organizing and optimizing the breast cancer ultrasound dataset. By systematically segregating images and corresponding masks into separate directories, the technique streamlines data access and ensures data consistency. The stepwise data preparation algorithm is given in Algorithm 1.

Table 1. Algorithm performance with and without using pipeline

| Pipeline | Algorithm | Accuracy | F1-Score | Jaccard | Precision | Recall |
|---|---|---|---|---|---|---|
| Without Normalization | DeeplabV3+Resnet50 | 0.35177 | 0.19793 | 0.12234 | 0.12273 | 0.99659 |
| | MultiResUnet | 0.34240 | 0.19607 | 0.12099 | 0.12134 | 0.99674 |
| | Unet | 0.20948 | 0.17227 | 0.10372 | 0.10372 | 1.00000 |
| With Normalization | DeeplabV3+Resnet50 | 0.95761 | 0.77901 | 0.69673 | 0.86729 | 0.78945 |
| | MultiResUnet | 0.95386 | 0.73817 | 0.69673 | 0.86431 | 0.73878 |
| | Unet | 0.95650 | 0.67809 | 0.59712 | 0.78527 | 0.74600 |
| Data+Normalization+Gaussian blur | DeeplabV3+Resnet50 | 0.95892 | 0.76974 | 0.68537 | 0.85848 | 0.77802 |
| | MultiResUnet | 0.95818 | 0.75359 | 0.67176 | 0.83724 | 0.77136 |
| | Unet | 0.95874 | 0.72032 | 0.63576 | 0.83508 | 0.73800 |
| Data+Normalization+Gaussian blur+CLAHE | DeeplabV3+Resnet50 | 0.95929 | 0.77179 | 0.68552 | 0.85275 | 0.77854 |
| | MultiResUnet | 0.96020 | 0.76420 | 0.68558 | 0.88728 | 0.75417 |
| | Unet | 0.96017 | 0.72845 | 0.64810 | 0.84648 | 0.75781 |
| Data+Normalization+Gaussian blur+CLAHE+Augmentation | DeeplabV3+Resnet50 | 0.96454 | 0.79516 | 0.71453 | 0.80500 | 0.85555 |
| | MultiResUnet | 0.96553 | 0.79380 | 0.71990 | 0.79534 | 0.88684 |
| | Unet | 0.96735 | 0.82284 | 0.74445 | 0.82933 | 0.87565 |

Figure 1. Overall system representation diagram

Algorithm 1. Stepwise Data Preparation algorithm

Input: Directory path containing raw image and mask data.

Output: Separated directories for images and masks.

1. Initialize the variable path with the path of the data directory.

2. Initialize the variable counter with a value of 1.

3. While there are images and masks to process, construct image_path from path, class_names, and counter, and mask_path from path, class_names, counter, and the mask suffix:

image_path=path+class_names+counter_value.png

mask_path=path+class_names+counter_mask.png

4. Read the image from image_path and the mask from mask_path.

5. Create two separate directories to store images and masks.

6. Store the image in the images directory and the mask in the masks directory.

7. Increment the counter value.

8. Repeat steps 3 to 7 until all images and masks are processed.
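The steps of Algorithm 1 can be sketched in Python as follows. This is a minimal illustration and not the authors' implementation: the function name, the `images`/`masks` output folder names, and the assumption that every mask file name contains the tag `_mask` are hypothetical, and files are discovered by globbing rather than by incrementing a counter.

```python
import shutil
from pathlib import Path


def separate_images_and_masks(data_dir, out_dir, mask_tag="_mask"):
    """Copy PNG files from data_dir/<class>/ into out_dir/images/<class>/
    and out_dir/masks/<class>/, depending on whether the file name contains
    mask_tag. Returns (number of images, number of masks) copied.

    out_dir is assumed to lie outside data_dir.
    """
    data_dir, out_dir = Path(data_dir), Path(out_dir)
    n_images = n_masks = 0
    # One sub-directory per class label is assumed.
    for class_dir in sorted(p for p in data_dir.iterdir() if p.is_dir()):
        for png in sorted(class_dir.glob("*.png")):
            is_mask = mask_tag in png.stem
            target = out_dir / ("masks" if is_mask else "images") / class_dir.name
            target.mkdir(parents=True, exist_ok=True)
            shutil.copy2(png, target / png.name)
            if is_mask:
                n_masks += 1
            else:
                n_images += 1
    return n_images, n_masks


# Example call (paths are placeholders):
# separate_images_and_masks("raw_dataset", "sorted_dataset")
```

Copying rather than moving keeps the raw directory intact, so the separation can be re-run safely.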

3.1 Noise reduction through Gaussian blur

The next step in the pipeline is Gaussian blur, a filtering operation that convolves the image with a Gaussian kernel, in effect computing a weighted average of the pixel values in a local neighborhood. We used it to mitigate the drawbacks of noisy data. It serves two main purposes. First, it smooths out minor irregularities and fine-grained noise in ultrasound images, which improves image clarity and helps the model focus on the more prominent features relevant to breast cancer diagnosis. Second, it reduces the impact of outliers and extreme intensity variations that persist in the data. The kernel size determines the amount of smoothing: a larger kernel suppresses more high-frequency noise but also blurs more of the image. Here, (5,5) is the size of the neighborhood covered by the Gaussian kernel. This parameter was chosen carefully, because it determines both how much noise is removed and how much information is lost in the process. Applied after normalization, Gaussian blur further improves the quality of the data passed to the later stages, resulting in more precise and noise-robust model performance for breast cancer image analysis relevant to clinical practice.
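As a rough illustration of what a (5,5) Gaussian blur does, the NumPy sketch below builds a 5-tap Gaussian kernel and applies it along rows and then columns, which is equivalent to a 5×5 convolution because the Gaussian is separable. The value `sigma=1.0` and the reflect padding are assumptions, since the text gives only the kernel size; in practice a library routine such as OpenCV's `cv2.GaussianBlur` would normally be used.

```python
import numpy as np


def gaussian_kernel_1d(size=5, sigma=1.0):
    """Return a normalized 1-D Gaussian kernel of odd length `size`."""
    x = np.arange(size) - size // 2
    k = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return k / k.sum()


def gaussian_blur(image, size=5, sigma=1.0):
    """Blur a 2-D image with a size x size Gaussian kernel.

    The kernel is applied separably: first along each row, then along
    each column. Borders are handled with reflect padding (an assumption).
    """
    k = gaussian_kernel_1d(size, sigma)
    pad = size // 2
    padded = np.pad(image.astype(np.float64), pad, mode="reflect")
    rows = np.apply_along_axis(lambda r: np.convolve(r, k, mode="valid"), 1, padded)
    return np.apply_along_axis(lambda c: np.convolve(c, k, mode="valid"), 0, rows)


# Usage: a noisy synthetic image becomes smoother (lower standard deviation).
rng = np.random.default_rng(42)
noisy = 0.5 + 0.1 * rng.standard_normal((64, 64))
blurred = gaussian_blur(noisy, size=5, sigma=1.0)
print(noisy.std() > blurred.std())
```

Increasing `size` or `sigma` removes more noise but also softens tumor boundaries, which is the trade-off described above.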

3.2 Implementing CLAHE

The next step in the pipeline is Contrast Limited Adaptive Histogram Equalization (CLAHE), one of the key techniques for improving ultrasound image quality. CLAHE is a contrast enhancement method based on adaptive histogram equalization, which modifies the image so that its intensity distribution achieves a desired average local contrast [6]. It makes preprocessing more effective when used together with normalization and Gaussian blur. While normalization aligns pixel values and Gaussian blur reduces noise and fine detail, CLAHE addresses the intensity variations caused by the machine and by uneven illumination, which are particularly apparent in ultrasound images. This stage makes both fine and subtle features more prominent by spreading out the pixel values, which allows more efficient image analysis. Together with normalization and Gaussian blur, CLAHE forms a complete method for enhancing ultrasound images by accentuating their salient diagnostic features and making the characteristics of breast cancer easier to identify accurately. CLAHE also compensates for the uniform blurring introduced by Gaussian blur, which can otherwise wash out fine details and edges that are important for segmentation. By enhancing local contrast while limiting noise amplification, it yields a more balanced redistribution of pixel intensities across the image.
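The contrast-limiting idea behind CLAHE can be illustrated with a simplified NumPy sketch that equalizes a single tile with a clipped histogram. This is not full CLAHE: the division of the image into tiles and the interpolation between neighboring tiles are omitted, and `clip_limit=2.0` is an assumed value that does not come from the text. A library implementation such as OpenCV's `cv2.createCLAHE` would normally be used in practice.

```python
import numpy as np


def clipped_equalize(tile, clip_limit=2.0, n_bins=256):
    """Histogram-equalize one uint8 tile with a clipped histogram.

    clip_limit is expressed as a multiple of the mean bin count. Counts
    above the limit are cut off and redistributed evenly over all bins,
    which bounds the slope of the mapping and so limits how strongly
    noise in flat regions is amplified.
    """
    hist = np.bincount(tile.ravel(), minlength=n_bins).astype(np.float64)
    limit = clip_limit * tile.size / n_bins
    excess = np.clip(hist - limit, 0, None).sum()
    hist = np.minimum(hist, limit) + excess / n_bins
    cdf = np.cumsum(hist) / hist.sum()
    lut = np.round(cdf * (n_bins - 1)).astype(np.uint8)  # intensity look-up table
    return lut[tile]


# Usage: a low-contrast tile (intensities 100..120) is stretched.
rng = np.random.default_rng(42)
tile = rng.integers(100, 121, size=(64, 64), dtype=np.uint8)
enhanced = clipped_equalize(tile)
print(int(tile.max()) - int(tile.min()), int(enhanced.max()) - int(enhanced.min()))
```

A lower `clip_limit` gives a gentler contrast boost, while a very high value approaches ordinary histogram equalization.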

3.3 Data augmentation

We integrated data augmentation into the breast cancer ultrasound image segmentation pipeline to overcome the challenges posed by limited datasets and to enhance model performance. Data augmentation introduces controlled variations into the pre-processed ultrasound images through flips and rotations. It served several critical goals.

(1) It gave a better pixel-wise representation of masses, including spiculated masses, beyond their centroids: the augmented images reflect real-life variations during image acquisition, so the model learns from different angles.

(2) The model became less sensitive to such variations because it was trained on images that simulated a range of real-world imaging conditions. It was also protected from overfitting, that is, from failing to generalize to new data because it memorizes individual instances instead of learning general rules from representative examples.

(3) The augmented dataset improved model generalization over different image variations, which is fundamental for reliable breast cancer diagnosis. We therefore built data augmentation into our training strategy so that the model generalizes well when identifying key breast cancer characteristics across a range of ultrasound images.
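A key practical point in augmenting segmentation data is that every geometric transform must be applied identically to the image and to its mask, otherwise the ground truth no longer lines up with the tumor. The NumPy sketch below illustrates this under stated assumptions: the function name is hypothetical, and a 90° rotation via `np.rot90` stands in for an arbitrary-angle rotation, which would require interpolation.

```python
import numpy as np


def augment_pair(image, mask):
    """Return [(image, mask), ...] for the original, a horizontal flip,
    a vertical flip, and a rotation, each applied to both arrays."""
    transforms = [
        lambda a: a,                 # original
        np.fliplr,                   # horizontal flip
        np.flipud,                   # vertical flip
        lambda a: np.rot90(a, k=1),  # stand-in for a small-angle rotation
    ]
    return [(t(image), t(mask)) for t in transforms]


# Usage: one bright "tumor" pixel and its mask stay aligned in every variant.
image = np.zeros((3, 4), dtype=np.float32)
image[0, 1] = 1.0
mask = (image > 0).astype(np.uint8)
pairs = augment_pair(image, mask)
print(len(pairs))
```

Each input pair yields four training pairs, so the dataset grows by a factor of four.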

3.4 Normalizing data

Normalization is an important part of data preprocessing, as it makes the ultrasound image data more useful for further analysis. Pixel values are scaled to a common range between 0 and 1. Achieving uniform pixel values was a major task, because breast cancer ultrasound images have varying intensity levels. Normalization balances the scale of information within each image, whereas outliers with very high intensities could easily skew model training. Removing the differences in pixel range helps models such as DeepLabV3 with ResNet-50, MultiResUNet, and U-Net converge markedly better during training. As a result, the models can identify relevant features in the images more accurately and generalize better. Normalization thus lays a consistent foundation for the subsequent segmentation techniques. This alignment of data characteristics allows the models to focus on meaningful patterns within the images, resulting in more robust and accurate breast cancer diagnostic outcomes.
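A minimal sketch of this scaling is shown below. It assumes 8-bit input images, so dividing by 255 maps intensities to the range 0 to 1; the mask binarization helper and its threshold are additional assumptions that do not come from the text.

```python
import numpy as np


def normalize_image(image, max_value=255.0):
    """Scale pixel intensities to [0, 1]; max_value=255 assumes 8-bit input."""
    return np.clip(image.astype(np.float32) / max_value, 0.0, 1.0)


def binarize_mask(mask, threshold=127):
    """Map a grayscale mask to {0, 1}; the threshold is an assumption."""
    return (mask > threshold).astype(np.float32)


# Usage on a tiny 8-bit example.
pixels = np.array([[0, 128, 255]], dtype=np.uint8)
scaled = normalize_image(pixels)
print(float(scaled.min()), float(scaled.max()))
```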

3.5 Deep learning algorithms for segmentation

In this study, we used three advanced deep learning architectures, U-Net, MultiResUNet, and DeepLabV3+ResNet50, selected for their performance in medical image segmentation and their complementary strengths. U-Net was used as the baseline model because of its widespread adoption in medical imaging and its ability to capture both low-level and high-level features accurately using an encoder-decoder architecture. MultiResUNet, an extension of U-Net, adds multi-resolution convolutional blocks that allow the network to extract fine-grained texture patterns, making it particularly effective at identifying small lesions and subtle breast tissue variations in ultrasound images. The DeepLabV3+ResNet50 architecture allows the detection of tumor regions of differing appearance while maintaining boundary precision.

Together, these three models provide a comprehensive framework for performance comparison. Their diversity allows us to evaluate segmentation performance under different complexities of breast ultrasound images. The encoder-decoder architectures used in these models extract image features through the encoder and reconstruct accurate segmentation maps through the decoder. The selection of these architectures, supported by our optimized preprocessing pipeline, gives reliable segmentation performance, as illustrated by the notable improvement in the Jaccard indices and accuracy metrics reported in Section 4. Figure 2 illustrates how the encoder-decoder architecture works.

Figure 2. Working of encoder and decoder architecture

4. Results

This section shows that the novel preprocessing pipeline, combined with the three deep learning algorithms, achieved an accuracy of up to 96%. We present our findings through graphs, evaluation metrics, and figures that illustrate the differences between the actual and predicted segmentation masks. Gaussian blur, CLAHE, augmentation, and normalization were applied methodically to reveal the pivotal role of the pipeline in enhancing model performance. The impact of each technique is examined systematically, which gives insight into how the techniques collectively improve the results. The analysis shows the improvement in segmentation quality achieved by integrating the pipeline. Overall, this section covers the trend of the results obtained with the various techniques and the progressive refinement brought about by the new data preprocessing pipeline.

4.1 Experimental setup

Experiments were run on a system with an NVIDIA GeForce RTX 3050 GPU (4GB VRAM) and an AMD Ryzen 7 6800H CPU. The dataset was divided into 70% training, 15% validation, and 15% testing, and 5-fold cross-validation was performed to examine robustness. Models were trained using the Adam optimizer with a learning rate of 1e-3, a batch size of 6, and 60 epochs. A combined Binary Cross-Entropy and Dice loss was used, with early stopping (patience=20) and a ReduceLROnPlateau scheduler (factor=0.1, patience=9, min_lr=1e-7) to limit overfitting.
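A combined Binary Cross-Entropy and Dice loss can be sketched in NumPy as follows. The text does not state how the two terms are weighted, so this illustration assumes an unweighted sum; the smoothing constant and the clipping epsilon are likewise assumptions.

```python
import numpy as np


def bce_dice_loss(y_true, y_pred, smooth=1.0, eps=1e-7):
    """Binary cross-entropy plus (1 - Dice coefficient), unweighted sum.

    y_true: binary ground-truth mask; y_pred: predicted probabilities in
    [0, 1], same shape. The equal weighting of the two terms is an assumption.
    """
    t = np.asarray(y_true, dtype=np.float64).ravel()
    p = np.clip(np.asarray(y_pred, dtype=np.float64).ravel(), eps, 1.0 - eps)
    bce = -np.mean(t * np.log(p) + (1.0 - t) * np.log(1.0 - p))
    intersection = np.sum(t * p)
    dice = (2.0 * intersection + smooth) / (t.sum() + p.sum() + smooth)
    return bce + (1.0 - dice)


# Usage: a perfect prediction scores near zero, an inverted one scores high.
truth = np.zeros((8, 8))
truth[2:5, 2:5] = 1.0
print(bce_dice_loss(truth, truth) < bce_dice_loss(truth, 1.0 - truth))
```

The cross-entropy term scores every pixel independently, while the Dice term measures overlap with the tumor region, which helps when tumor pixels are a small fraction of the image.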

To improve generalization, we applied data augmentation: Horizontal Flip (p=1.0), Vertical Flip (p=1.0), and Rotation (limit=±45°, p=1.0). This increased the size of the dataset fourfold and introduced additional variation. For noise reduction, Gaussian blur with a kernel size of (5,5) was applied to suppress high-frequency noise while preserving tumor boundaries. A fixed random seed (42) was used to ensure reproducibility across dataset splitting, augmentation, and training.
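A reproducible 70/15/15 split with a fixed seed can be sketched as below. The function name and the use of Python's `random` module are assumptions for illustration; the text specifies only the proportions and the seed value of 42.

```python
import random


def split_dataset(items, train=0.70, val=0.15, seed=42):
    """Shuffle items with a fixed seed and split into train/val/test lists.

    The test set receives whatever remains after the train and val shares.
    """
    items = list(items)
    random.Random(seed).shuffle(items)  # local RNG, global state untouched
    n_train = int(round(train * len(items)))
    n_val = int(round(val * len(items)))
    return (items[:n_train],
            items[n_train:n_train + n_val],
            items[n_train + n_val:])


# Usage with 100 placeholder sample IDs.
train_ids, val_ids, test_ids = split_dataset(range(100))
print(len(train_ids), len(val_ids), len(test_ids))
```

Splitting before augmentation keeps flipped or rotated copies of a test image out of the training set.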

4.2 Results without pre-processing

Figure 3 shows the DeepLabV3+ResNet50 algorithm applied to the breast cancer image segmentation dataset without any pre-processing. The algorithm therefore works directly on the raw image data, which is reflected in its output: the result is sub-optimal, and both the accuracy and the Jaccard score are very low.

Similarly, the performance of MultiResUNet and U-Net in Figures 4 and 5, respectively, shows that good segmentation results cannot be achieved without the data preprocessing pipeline.

Figure 3. Segmentation using DeepLabV3+ResNet50 without using the pipeline

Figure 4. Segmentation using MultiResUnet without using pipeline

Figure 5. Segmentation using Unet without using pipeline

Figure 6. Training and validation accuracy graph for the DeepLabv3+ResNet50 algorithm

The training accuracy of 35% shown in Figure 6, together with precision and Jaccard index values of only about 0.12, underscores the inadequacy of the initial model performance on this breast cancer ultrasound image segmentation problem.
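For reference, the pixel-wise metrics reported in Table 1 can be computed from binary masks as in the sketch below. The function name is hypothetical, and the averaging used in the paper (per image or over all pixels) is not stated, so this sketch simply pools all pixels. The usage example shows that a prediction labelling every pixel as tumor gives a recall of 1 and a precision equal to the Jaccard index, a pattern that matches the rows without normalization in Table 1.

```python
import numpy as np


def segmentation_metrics(y_true, y_pred):
    """Pixel-wise accuracy, precision, recall, F1, and Jaccard for binary masks."""
    t = np.asarray(y_true).astype(bool).ravel()
    p = np.asarray(y_pred).astype(bool).ravel()
    tp = int(np.sum(t & p))    # tumor pixels predicted as tumor
    fp = int(np.sum(~t & p))   # background predicted as tumor
    fn = int(np.sum(t & ~p))   # tumor pixels missed
    tn = int(np.sum(~t & ~p))  # background predicted as background

    def ratio(num, den):
        return num / den if den else 0.0

    return {
        "accuracy": ratio(tp + tn, tp + tn + fp + fn),
        "precision": ratio(tp, tp + fp),
        "recall": ratio(tp, tp + fn),
        "f1": ratio(2 * tp, 2 * tp + fp + fn),
        "jaccard": ratio(tp, tp + fp + fn),
    }


# Usage: 12 tumor pixels out of 100, and a prediction that marks everything as tumor.
truth = np.zeros(100, dtype=np.uint8)
truth[:12] = 1
all_tumor = np.ones(100, dtype=np.uint8)
scores = segmentation_metrics(truth, all_tumor)
print(scores["recall"], scores["precision"], scores["jaccard"])
```

Because background pixels dominate ultrasound masks, accuracy alone can look high even for mediocre masks, which is why the Jaccard index is reported alongside it.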

The initial segmentation results from the chosen algorithms, obtained without our preprocessing pipeline, were poor. This can be attributed to the inherent complexities of ultrasound images, such as noise, generally poor contrast, and significant variations in both texture and intensity. These complexities make it difficult for the algorithms to demarcate the tumor boundaries accurately. Without preprocessing, the algorithms cannot extract features as well or suppress noise as effectively. The resulting segmentation masks are therefore less accurate, and overall performance, as measured by the Jaccard score, is lower.

The following sections show how the preprocessing pipeline transforms these initial results. We demonstrate the improvement in accuracy and the other evaluation metrics, showing how modest outcomes become refined and more accurate segmentation results for this complex medical image segmentation task.

4.3 Results with pre-processing

We integrated our novel data preprocessing pipeline into our workflow to address the initial challenges and improve our segmentation results. We began by analysing noise reduction, a crucial step in improving the accuracy of our masks. We expect the quality and precision of our breast cancer tumor segmentations to improve progressively as we add each part of the pipeline: noise reduction, contrast enhancement, data augmentation, and normalization. With each stage of this preprocessing pipeline, both the results and the performance of our deep learning algorithms should improve.

Figure 7. Noise reduction using Gaussian blur on an image

Figure 8. CLAHE on the denoised ultrasound image

Figure 7 shows the result of applying Gaussian blur for noise reduction, which improves the segmentation output. Gaussian blur is the first preprocessing step in the pipeline. It begins to suppress the noise in ultrasound images, which is critical for breast cancer ultrasound, where intricate details and subtle changes matter. By smoothing sharp transitions, the blur reduces noise-induced inconsistencies and makes the image more stable and visually coherent. Although this is only the first step of a longer process, it provides the base on which the subsequent stages build to further improve the precision of the segmentation task.
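The smoothing effect of this step can be illustrated with a small separable Gaussian filter. The sketch below is an assumption-laden illustration in plain NumPy: the value of `sigma`, the kernel radius of about three standard deviations, and the reflect padding are choices made here, since the kernel settings used in the study are not stated in this section.

```python
import numpy as np


def gaussian_kernel_1d(sigma: float, radius: int) -> np.ndarray:
    """Normalized 1-D Gaussian kernel with 2 * radius + 1 taps."""
    x = np.arange(-radius, radius + 1, dtype=np.float64)
    kernel = np.exp(-(x ** 2) / (2.0 * sigma ** 2))
    return kernel / kernel.sum()


def gaussian_blur(image: np.ndarray, sigma: float = 1.0) -> np.ndarray:
    """Blur a 2-D grayscale image with a separable Gaussian filter."""
    radius = max(1, int(round(3 * sigma)))  # assumed cutoff at about 3 sigma
    kernel = gaussian_kernel_1d(sigma, radius)
    # Reflect-pad so the output keeps the input shape.
    padded = np.pad(image.astype(np.float64), radius, mode="reflect")
    # Filter along rows, then along columns (the 2-D Gaussian is separable).
    rows = np.apply_along_axis(
        lambda r: np.convolve(r, kernel, mode="valid"), 1, padded
    )
    return np.apply_along_axis(
        lambda c: np.convolve(c, kernel, mode="valid"), 0, rows
    )


# Hypothetical noisy patch: constant intensity plus Gaussian noise.
rng = np.random.default_rng(0)
noisy = 100.0 + rng.normal(0.0, 20.0, size=(32, 32))
smoothed = gaussian_blur(noisy, sigma=1.0)
print(noisy.std(), smoothed.std())  # the smoothed patch varies less
```

A larger `sigma` removes more noise but also blurs tumor boundaries, so the value is a trade-off that would need to be tuned on the data.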

The next element of the preprocessing pipeline is CLAHE, applied to the denoised image as shown in Figure 8. The improvement in the predicted image mask can be attributed to CLAHE preserving and highlighting the features needed for accurate segmentation. Because CLAHE enhances contrast locally, it preserves the relationships between neighboring regions of an image, which yields segmentations that are more faithful to the original ground truth mask. This improvement indicates that CLAHE adapts well to the subtleties of medical images and produces more reliable and accurate segmentations.
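The contrast-limiting idea behind CLAHE can be sketched on a single tile. The code below is a simplified illustration, not the implementation used in this study: full CLAHE divides the image into tiles, applies a mapping like this one to each tile, and interpolates between neighboring tile mappings, a step that is omitted here. The `clip_limit` value is an assumption. In practice a library routine such as OpenCV's `cv2.createCLAHE` would normally be used.

```python
import numpy as np


def clipped_equalize(tile: np.ndarray, clip_limit: float = 2.0) -> np.ndarray:
    """Contrast-limited histogram equalization of one 8-bit tile.

    Histogram bins are clipped at `clip_limit` times the mean bin height and
    the clipped excess is spread evenly over all bins (single pass), which
    caps how steep the intensity mapping can become.
    """
    hist = np.bincount(tile.ravel(), minlength=256).astype(np.float64)
    limit = clip_limit * hist.mean()
    excess = np.clip(hist - limit, 0.0, None).sum()
    hist = np.minimum(hist, limit) + excess / 256.0
    cdf = np.cumsum(hist)
    cdf /= cdf[-1]
    lookup = np.round(255.0 * cdf).astype(np.uint8)  # monotone intensity map
    return lookup[tile]


# Hypothetical low-contrast tile: intensities confined to [100, 120].
rng = np.random.default_rng(0)
tile = rng.integers(100, 121, size=(64, 64), dtype=np.uint8)
enhanced = clipped_equalize(tile)
print(tile.min(), tile.max(), enhanced.min(), enhanced.max())
```

Lowering `clip_limit` flattens the mapping and limits noise amplification, while raising it approaches ordinary histogram equalization.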

The next part of the pipeline is augmentation. Figure 9 shows the result of this phase and how augmentation helps segmentation achieve better results. Variations such as horizontal and vertical flips, among other operations, generate additional samples that are combined with the original data for training. This makes the model more resilient to variations in imaging conditions, patient poses, and probe orientations, and ultimately leads to a more generalized segmentation model. Augmentation also reduces the risk of overfitting, a common issue when working with limited medical imaging datasets. By introducing controlled variations, the model learns to extract and prioritize salient features regardless of minor image alterations. The final stage of the pipeline is normalization; Figure 10 shows how normalizing pixel values helps our model generate better masks.
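For segmentation, each geometric augmentation has to be applied identically to the image and to its mask so that the labels stay aligned. The sketch below illustrates this for the horizontal and vertical flips mentioned above; the function name `flip_pairs` and the toy arrays are assumptions, and the other augmentation operations used in the study are not reproduced here.

```python
import numpy as np


def flip_pairs(image: np.ndarray, mask: np.ndarray):
    """Yield the original pair plus horizontal, vertical, and combined flips.

    The same flip is applied to the image and to the mask, so every output
    mask still labels the same pixels of its output image.
    """
    yield image, mask
    yield np.fliplr(image), np.fliplr(mask)
    yield np.flipud(image), np.flipud(mask)
    yield np.flipud(np.fliplr(image)), np.flipud(np.fliplr(mask))


# Hypothetical toy image; its mask marks pixels brighter than 5.
image = np.arange(12).reshape(3, 4)
mask = image > 5
augmented = list(flip_pairs(image, mask))
print(len(augmented))  # 4 image/mask pairs from one original pair
```

Flipping only the image, or flipping image and mask differently, would silently corrupt the training labels, which is why the pair is transformed together.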

Normalization improves training because pixel values are mapped to approximately the same range, so the pixel intensities of individual images do not dominate training because of differences in illumination. This reduces the effect of varying illumination conditions and improves how well the model generalizes patterns related to breast cancer features across different images.

Taken as a whole, the preprocessing pipeline yields a significant improvement in segmentation. It systematically handles the inherent challenges of breast cancer ultrasound images, from noise and low contrast to subtle variations in tissue appearance. By applying Gaussian blur, CLAHE, augmentation, and normalization in sequence, the pipeline turns raw images into a standardized dataset. This refined dataset is used to train deep learning models such as DeepLabV3 with a ResNet-50 backbone, allowing them to capture the fine features that are fundamental to accurate segmentation. Table 2 highlights how the algorithms perform with the pipeline, and Figures 11-13 compare the numerical results of each algorithm at every stage of the pipeline.
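The normalization step can be illustrated with per-image min-max scaling. This is one common choice and is an assumption here, since this section does not state which normalization formula was used; the function name and toy arrays are likewise illustrative.

```python
import numpy as np


def min_max_normalize(image: np.ndarray) -> np.ndarray:
    """Rescale one image to the range [0, 1] using its own min and max."""
    image = image.astype(np.float32)
    lo, hi = image.min(), image.max()
    if hi == lo:
        return np.zeros_like(image)  # constant image: avoid division by zero
    return (image - lo) / (hi - lo)


# Two hypothetical scans of the same structure at different brightness levels.
dark = np.array([[10, 20], [30, 40]], dtype=np.uint8)
bright = dark + 100
print(np.allclose(min_max_normalize(dark), min_max_normalize(bright)))  # True
```

After scaling, the two toy images become identical, which is the sense in which normalization removes a global brightness offset before training.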

Figure 9. Augmented segmented mask

Figure 10. Final segmentation results after the use of the preprocessing pipeline

Figure 11. Performance of DeepLabV3+ResNet50 with the pipeline

Figure 12. Performance of MultiResUNet with the pipeline

Figure 13. Performance of U-Net with the pipeline

Table 2. Comparison of state-of-the-art segmentation methods and proposed method on breast ultrasound images

| Research Work / Paper Title | Model / Technique | Result | Year |
|---|---|---|---|
| Proposed method | U-Net + Gaussian blur + CLAHE + augmentation | 0.9674 (Accuracy), 0.74 (Jaccard) | 2025 |
| DBU-Net: Dual branch U-Net [26] | U2-MNet | 0.9378 (Accuracy) | 2023 |
| AAU-net [27] | Adaptive Attention U-Net | 0.6910 (Jaccard) | 2022 |
| Attention U-Net [28] | CNN-based Segmentation | 0.9500 (Accuracy) | 2024 |

Table 1 presents the segmentation performance of the three architectures evaluated in this study. U-Net attained the highest accuracy (96.7%) with the complete preprocessing pipeline, indicating that its encoder-decoder structure is robust and efficient for ultrasound image segmentation. MultiResUNet achieved 96.5%, which we attribute to its multi-resolution convolutional blocks that capture subtler structural details. DeepLabV3+ResNet50 achieved 96.4% accuracy, extracting multi-scale contextual features through atrous spatial pyramid pooling (ASPP). Although all three architectures benefited from the preprocessing, the results suggest that U-Net is the best suited to heterogeneous ultrasound data.

The proposed preprocessing pipeline plays a vital role in achieving these results. Before preprocessing, segmentation accuracy was low (35.1% for DeepLabV3+ResNet50, 34.2% for MultiResUNet, and 20.9% for U-Net), primarily because of noise, poor contrast, and complex textures in ultrasound images. Normalization raised accuracy to roughly 95% across all models by standardizing pixel intensities. Gaussian blur further refined accuracy by suppressing high-frequency noise, while CLAHE boosted local contrast and tumor boundary visibility, bringing accuracy to 96.73%. These results confirm that the pipeline significantly enhances segmentation performance across diverse architectures.

Table 2 provides a comparison between existing state-of-the-art segmentation methods for breast ultrasound images and our proposed pre-processing pipeline. This highlights the significant performance improvement achieved through our optimized pipeline.

5. Conclusion

In this study, we developed a novel preprocessing pipeline that combines Gaussian blur, CLAHE, normalization, and augmentation to enhance segmentation accuracy for breast ultrasound images. By integrating these preprocessing techniques with three state-of-the-art deep learning models (U-Net, MultiResUNet, and DeepLabV3+ResNet50), we achieved marked improvements in diagnostic precision. Our approach obtained a segmentation accuracy of 96.7% and a Jaccard index of 0.74, outperforming several existing methods and demonstrating the clinical relevance of our method for tumor traceability.

Despite these promising results, the proposed study has certain limitations. The multi-step preprocessing adds computational cost. In addition, difficult cases such as small tumors, heterogeneous tissue textures, and low-contrast ultrasound images remain hard to segment with high precision.

Future work could explore more efficient preprocessing stages, lightweight deep learning networks that require less computation, and attention-based hybrid models to further improve segmentation accuracy. Overall, our results show that the proposed framework significantly enhances segmentation accuracy and offers a strong foundation for advancing computer-assisted breast cancer diagnosis.

Funding

This research is supported by Princess Nourah bint Abdulrahman University Researchers Supporting Project number (PNURSP2025R195), Princess Nourah bint Abdulrahman University, Riyadh, Saudi Arabia.

References

[1] World Health Organization. Breast cancer. https://www.who.int/news-room/fact-sheets/detail/breast-cancer.

[2] Mehrotra, R., Yadav, K. (2022). Breast cancer in India: Present scenario and the challenges ahead. World Journal of Clinical Oncology, 13(3): 209-218. https://doi.org/10.5306/wjco.v13.i3.209

[3] Kalyanpur, A. (2008). Commentary 3-radiology in India: The next decade. Indian Journal of Radiology and Imaging, 18(3): 191-192. https://doi.org/10.4103/0971-3026.41869

[4] U.S. Food and Drug Administration. Ultrasound imaging. https://www.fda.gov/radiation-emitting-products/medical-imaging/ultrasound-imaging.

[5] Gu, P., Lee, W., Roubidoux, M.A., Yuan, J., Wang, X., Carson, P.L. (2015). Automated 3D ultrasound image segmentation to aid breast cancer image interpretation. Ultrasonics, 65: 51-58. https://doi.org/10.1016/j.ultras.2015.10.023

[6] Hesaraki, S., Mohammed, A.S., Eisaei, M., Mousa, R. (2025). Breast cancer ultrasound image segmentation using improved 3DUnet++. WFUMB Ultrasound Open, 3(1): 100068. https://doi.org/10.1016/j.wfumbo.2024.100068

[7] National Breast Cancer Foundation. Breast cancer ultrasound. https://nbcf.org.au/about-breast-cancer/detection-and-awareness/breast-cancer-ultrasound/.

[8] Wu, G.G., Zhou, L.Q., Xu, J.W., Wang, J.Y., Wei, Q., Deng, Y.B., Cui, X.W., Dietrich, C.F. (2019). Artificial intelligence in breast ultrasound. World Journal of Radiology, 11(2): 19-26. https://doi.org/10.4329/wjr.v11.i2.19

[9] Inan, M.S.K., Alam, F.I., Hasan, R. (2022). Deep integrated pipeline of segmentation guided classification of breast cancer from ultrasound images. Biomedical Signal Processing and Control, 75: 103553. https://doi.org/10.1016/j.bspc.2022.103553

[10] Nasser, M., Yusof, U.K. (2023). Deep learning based methods for breast cancer diagnosis: A systematic review and future direction. Diagnostics, 13(1): 161. https://doi.org/10.3390/diagnostics13010161

[11] Jabeen, K., Khan, M.A., Alhaisoni, M., Tariq, U., Zhang, Y.D., Hamza, A., Mickus, A., Damaševičius, R. (2022). Breast cancer classification from ultrasound images using probability-based optimal deep learning feature fusion. Sensors, 22(3): 807. https://doi.org/10.3390/s22030807

[12] Martinez, R.G., Van Dongen, D.M. (2023). Deep learning algorithms for the early detection of breast cancer: A comparative study with traditional machine learning. Informatics in Medicine Unlocked, 41: 101317. https://doi.org/10.1016/j.imu.2023.101317

[13] Zhai, D., Hu, B., Gong, X., Zou, H., Luo, J. (2022). ASS-GAN: Asymmetric semi-supervised GAN for breast ultrasound image segmentation. Neurocomputing, 493: 204-216. https://doi.org/10.1016/j.neucom.2022.04.021

[14] Abo-El-Rejal, A., Ayman, S., Aymen, F. (2024). Advances in breast cancer segmentation: A comprehensive review. Acadlore Transactions on AI and Machine Learning, 3(2): 70-83. https://doi.org/10.56578/ataiml030201

[15] Podda, A.S., Balia, R., Barra, S., Carta, S., Fenu, G., Piano, L. (2022). Fully-automated deep learning pipeline for segmentation and classification of breast ultrasound images. Journal of Computational Science, 63: 101816. https://doi.org/10.1016/j.jocs.2022.101816

[16] Bilic, A., Chen, C. (2024). BC-MRI-SEG: A breast cancer MRI tumor segmentation benchmark. In 2024 IEEE 12th International Conference on Healthcare Informatics (ICHI), Orlando, USA, pp. 674-678. https://doi.org/10.1109/ICHI61247.2024.00107

[17] Belaid, A., Boukerroui, D., Maingourd, Y., Lerallut, J.F. (2010). Phase-based level set segmentation of ultrasound images. IEEE Transactions on Information Technology in Biomedicine, 15(1): 138-147. https://doi.org/10.1109/TITB.2010.2090889

[18] Chen, M., Xing, J., Guo, L. (2024). MRI-based deep learning models for preoperative breast volume and density assessment assisting breast reconstruction. Aesthetic Plastic Surgery, 48(23): 4994-5006. https://doi.org/10.1007/s00266-024-04074-2

[19] Zhang, S., Liao, M., Wang, J., Zhu, Y., Zhang, Y., Zhang, J., Zheng, R., Lv, L., Zhu, D., Chen, H., Wang, W. (2023). Fully automatic tumor segmentation of breast ultrasound images with deep learning. Journal of Applied Clinical Medical Physics, 24(1): e13863. https://doi.org/10.1002/acm2.13863

[20] Ranjitha, K.V., Pushphavathi, T.P. (2024). Improving prediction accuracy for neo-adjuvant chemotherapy response in breast cancer through 3D image segmentation and deep learning techniques. In Artificial Intelligence in Medicine, 137-162. https://www.taylorfrancis.com/chapters/edit/10.1201/9781003369059-12/improving-prediction-accuracy-neo-adjuvant-chemotherapy-response-breast-cancer-3d-image-segmentation-deep-learning-techniques-ranjitha-pushphavathi.

[21] Boukerroui, D., Baskurt, A., Noble, J.A., Basset, O. (2003). Segmentation of ultrasound images-multiresolution 2D and 3D algorithm based on global and local statistics. Pattern Recognition Letters, 24(4-5): 779-790. https://doi.org/10.1016/S0167-8655(02)00181-2

[22] Forghani, Y., Timotoe, R., Figueiredo, M., Marques, T., Batista, E., Cordoso, F., Cardoso, M.J., Santinha, J., Gouveia, P. (2024). Breast tissue segmentation in MR images using deep-learning. European Journal of Cancer, 200(S1): 113876. https://doi.org/10.1016/j.ejca.2024.113876

[23] Lew, C.O., Harouni, M., Kirksey, E.R., Kang, E.J., Dong, H., Gu, H., Grimm, L.J., Walsh, R., Lowell, D.A., Mazurowski, M.A. (2024). A publicly available deep learning model and dataset for segmentation of breast, fibroglandular tissue, and vessels in breast MRI. Scientific Reports, 14(1): 5383. https://doi.org/10.1038/s41598-024-54048-2

[24] Sarti, A., Corsi, C., Mazzini, E., Lamberti, C. (2005). Maximum likelihood segmentation of ultrasound images with Rayleigh distribution. IEEE Transactions on Ultrasonics, Ferroelectrics, and Frequency Control, 52(6): 947-960. https://doi.org/10.1109/TUFFC.2005.1504017

[25] Tanaka, H., Chiu, S.W., Watanabe, T., Kaoku, S., Yamaguchi, T. (2019). Computer-aided diagnosis system for breast ultrasound images using deep learning. Physics in Medicine & Biology, 64(23): 235013. https://doi.org/10.1088/1361-6560/ab5093

[26] Pramanik, P., Pramanik, R., Schwenker, F., Sarkar, R. (2023). DBU-Net: Dual branch U-Net for tumor segmentation in breast ultrasound images. PLOS ONE, 18(11): e0293615. https://doi.org/10.1371/journal.pone.0293615

[27] Chen, G., Li, L., Dai, Y., Zhang, J., Yap, M.H. (2022). AAU-net: An adaptive attention U-net for breast lesions segmentation in ultrasound images. IEEE Transactions on Medical Imaging, 42(5): 1289-1300. https://doi.org/10.1109/TMI.2022.3226268

[28] Lu, Y.M. (2024). Breast ultrasound image segmentation based on attention U-Net. In Proceedings of the 2nd International Conference on Machine Learning and Automation, CONF-MLA 2024, Adana, Turkey. https://doi.org/10.4108/eai.21-11-2024.2354629