Deep Learning-Based Red Blood Cell Classification for Sickle Cell Anemia Diagnosis Using Hybrid CNN-LSTM Model

ABSTRACT


INTRODUCTION
Sickle cell anemia (SCA) is an inherited blood condition that impairs hemoglobin structure and function [1]. A mutation in the HBB gene is the root cause of SCA. This mutation results in the production of abnormal hemoglobin known as Hemoglobin S [2]. Hemoglobin S tends to form rigid, abnormally shaped red blood cells (RBCs) rather than the typical disc-shaped cells [3, 4]. Sickle cells have a drastically shorter lifespan of just 10 to 20 days, compared with 120 days for normal RBCs. The sub-Saharan region of Africa has the highest prevalence, although people of Middle Eastern, Indian, and Mediterranean origin are also susceptible [5]. Each year, around 300,000 infants are born with sickle cell disease globally, and 5% of the world's population carries the sickle cell trait [6, 7]. Sickle cells can clog blood vessels, which limits blood flow, harms tissue, and causes pain [8]. Depending on the severity of the condition, sickle cell anemia can cause a variety of moderate to severe symptoms, such as anemia, lethargy [9], jaundice, pain crises, delayed growth and development, eyesight issues, and an increased susceptibility to infections [10]. Complications of sickle cell anemia can affect multiple organ systems and include stroke, acute chest syndrome [11, 12], pulmonary hypertension [13], and kidney damage [14].
Manual detection of sickle cell anemia is inefficient: it is time consuming, requires skilled personnel, is not cost effective, and is difficult to carry out where resources are limited. In this paper, an image classification technique is therefore used to overcome these barriers by classifying microscopic blood images. Images are preprocessed with filtering and segmentation techniques such as watershed segmentation [15] and region-based segmentation, and a comparative study of different models is then carried out. Deep learning is well suited to image classification because it can detect complex patterns and relationships in data that may be difficult to identify with conventional machine learning models. We use ELM, the InceptionV3 CNN, and a combined CNN-LSTM model to classify the microscopic blood images into circular, elongated, and others.

LITERATURE REVIEW
This section gives an overview of studies that diagnose sickle cell anemia using various techniques, including image-based classification. In 1910, Dr. James B. Herrick gave the first description of SCA in the published literature on sickle cell disease [16]. To distinguish between circular and elongated blood cells in a processed image, Otsu thresholding with watershed segmentation was performed; random forest (RF), logistic regression, support vector machines, and naive Bayes were then applied, with random forest producing the best results at an accuracy of 92% [17]. Chy and Rahaman [18] applied fuzzy C-means and then used the extracted geometrical and statistical features to train and compare KNN, SVM, and ELM; ELM was superior, with 87.7% accuracy, 83.33% specificity, and 87.5% sensitivity. Otsu thresholding followed by Gaussian and Canny filters was used to enhance images for classification with gradient boosting and random forest classifiers, which reached SDS-scores of 95.6% and 94.4% and performed better than previous models by selecting the best parameters through randomized and grid search [19]. Data augmentation was used to minimize overfitting, where the model combined with a multiclass SVM classifier attained the highest accuracy [20]. In terms of accuracy and processing time, Niblack's method outperformed other adaptive thresholding techniques, with 95.4% accuracy and 1.73 seconds of processing time [21]. According to Patgiri and Ganguly [22], the combination of fuzzy C-means and NICK's thresholding with a KNN classifier was the most effective. Distinct adaptive thresholding methods were tested together with fuzzy C-means, naive Bayes, and KNN classifiers. Machine learning models were outperformed by the multilayer perceptron (MLP), a type of feedforward neural network with many layers of neurons [23]. Faster R-CNN was implemented to detect RBCs, WBCs, and platelets, and achieved greater accuracy in object detection [24]. For classification, Soni et al. [25] proposed an AlexNet model equipped with transfer learning, including model evaluation using data division algorithms. In another study, augmented images were fed to a deep CNN model using pre-trained models such as VGG16, VGG19, ResNet50, ResNet101, and InceptionV3, among which InceptionV3 yielded the highest accuracy of 91% [26]. Seven CNN-based hybrid image classification techniques (CNN with ELM, KNN, GA, MLP, SVM, RNN, and LSTM) were compared to determine which was the most accurate [27]; CNN-LSTM performed best. In the integrated CNN-LSTM, the CNN is used for feature extraction and the LSTM for feature-based classification [28]. A CNN model and an RCNN are two recent techniques that have been compared and combined in video analysis to categorize cell motion [29]. Our paper introduces a hybrid model that yields superior results: while CNNs and their architectures have been explored separately, their integration in this domain remains unexplored.

Image acquisition
The image dataset used in this research was acquired from a local hospital and is part of confidential patient records. The dataset consists of 260 microscopic color blood cell images in JPG format. The images contain normal RBCs and abnormal RBCs such as sickle cells and others.

Data augmentation
Data augmentation artificially enlarges a dataset by generating extra training samples from the original data. It helps prevent overfitting and can improve the performance and generalization of a model [20]. Images are rotated by 0°, 90°, 180°, and 270°, part of the dataset is flipped horizontally and vertically, and a scaling factor between 0.8 and 1.2 is applied to introduce variations in image size.
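The augmentation steps described above can be sketched in plain Python. This is a minimal illustration rather than the pipeline used in the study: it assumes a grayscale image stored as a list of rows, and the function names and the nearest-neighbour resizing are choices made here for brevity.

```python
import random

def rotate90(img):
    """Rotate a 2-D image (list of rows) by 90 degrees clockwise."""
    return [list(row) for row in zip(*img[::-1])]

def flip_horizontal(img):
    """Mirror the image left to right."""
    return [row[::-1] for row in img]

def flip_vertical(img):
    """Mirror the image top to bottom."""
    return [list(row) for row in img[::-1]]

def scale_nearest(img, factor):
    """Resize by `factor` using nearest-neighbour sampling (an assumption)."""
    h, w = len(img), len(img[0])
    new_h, new_w = max(1, round(h * factor)), max(1, round(w * factor))
    return [[img[min(h - 1, int(r / factor))][min(w - 1, int(c / factor))]
             for c in range(new_w)] for r in range(new_h)]

def augment(img, rng):
    """Return rotated, flipped, and randomly scaled variants of one image."""
    variants = []
    rotated = img
    for _ in range(4):                       # 0, 90, 180, 270 degrees
        variants.append([list(row) for row in rotated])
        rotated = rotate90(rotated)
    variants.append(flip_horizontal(img))
    variants.append(flip_vertical(img))
    # Scaling factor drawn from the range 0.8 to 1.2 stated in the text.
    variants.append(scale_nearest(img, rng.uniform(0.8, 1.2)))
    return variants
```

Calling `augment(img, random.Random(0))` yields seven variants per input image; in practice a library resize with interpolation would normally replace `scale_nearest`.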

Image preprocessing
Image preprocessing is used to enhance the quality of the microscopic blood images and prepare them for subsequent processing steps. To reduce complexity and dimensionality, the image is first converted to grayscale; to remove high-frequency noise, a mean filter is then applied, given as:

$$Z(i,j) = \frac{1}{n^2} \sum_{k=-(n-1)/2}^{(n-1)/2} \; \sum_{l=-(n-1)/2}^{(n-1)/2} W(i+k,\, j+l) \tag{1}$$

where W is the input gray image, Z is the output image, and n is the size of the kernel. The resulting value represents the average intensity of the neighborhood surrounding the pixel at position (i, j). A median filter is then applied to smooth the image. Next, the Gaussian filter is applied to remove noise while preserving edges, given as:

$$Z(i,j) = \frac{1}{K} \sum_{k=-(n-1)/2}^{(n-1)/2} \; \sum_{l=-(n-1)/2}^{(n-1)/2} G(k,l)\, W(i+k,\, j+l) \tag{2}$$

where W is the input image, Z is the output image, K is the normalization factor, n is the size of the Gaussian kernel, and G is the Gaussian kernel given below:

$$G(x,y) = \frac{1}{2\pi\sigma^2}\, e^{-\frac{x^2+y^2}{2\sigma^2}} \tag{3}$$

where x and y are the distances from the center of the kernel, $\sigma$ is the standard deviation of the Gaussian distribution, $\pi$ is the mathematical constant, and e is the base of the natural logarithm. Finally, the Canny edge filter is used to locate the cell borders.
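The mean and Gaussian filtering steps can be sketched as follows. This is a minimal illustration under stated assumptions: the image is a grayscale list of rows, the kernel size n is odd, border pixels are handled by repeating the nearest edge value, and the Gaussian kernel is normalized by the sum of its entries (playing the role of the normalization factor K). The function names are hypothetical.

```python
import math

def convolve_replicate(img, kernel):
    """Apply a square kernel; pixels outside the image repeat the nearest edge."""
    h, w, r = len(img), len(img[0]), len(kernel) // 2
    out = [[0.0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            acc = 0.0
            for k in range(-r, r + 1):
                for l in range(-r, r + 1):
                    ii = min(max(i + k, 0), h - 1)   # clamp row index
                    jj = min(max(j + l, 0), w - 1)   # clamp column index
                    acc += kernel[k + r][l + r] * img[ii][jj]
            out[i][j] = acc
    return out

def mean_filter(img, n=3):
    """n x n mean filter: every kernel weight equals 1 / n^2."""
    kernel = [[1.0 / (n * n)] * n for _ in range(n)]
    return convolve_replicate(img, kernel)

def gaussian_kernel(n=3, sigma=1.0):
    """Sample the 2-D Gaussian on an n x n grid and normalize it to sum to 1."""
    r = (n - 1) // 2
    g = [[math.exp(-(x * x + y * y) / (2 * sigma * sigma))
          / (2 * math.pi * sigma * sigma)
          for x in range(-r, r + 1)] for y in range(-r, r + 1)]
    total = sum(sum(row) for row in g)
    return [[v / total for v in row] for row in g]

def gaussian_filter(img, n=3, sigma=1.0):
    """Smooth the image with the normalized Gaussian kernel."""
    return convolve_replicate(img, gaussian_kernel(n, sigma))
```

Because both kernels sum to one, a region of constant intensity is left unchanged, which is a quick sanity check for either filter.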

Segmentation
Segmentation is a technique used to separate objects of interest from the background of an image. First, Otsu thresholding is applied, given by Eq. (4):

$$\sigma^2(t) = w_0(t)\, w_1(t)\, \left[\mu_0(t) - \mu_1(t)\right]^2 \tag{4}$$

Here $\sigma^2(t)$ denotes the variance for a particular threshold value t. The probability of a pixel falling into the background class for threshold t, denoted $w_0(t)$, is calculated as the proportion of pixels with intensity levels below t to all pixels in the image. The probability of a pixel falling into the foreground class for threshold t, denoted $w_1(t)$, is calculated as the ratio of pixels with intensity values greater than or equal to t to all pixels in the image. The average intensity of the background class pixels for threshold t is $\mu_0(t)$, and the average intensity of the foreground class pixels is $\mu_1(t)$. The threshold is chosen as the value of t that maximizes $\sigma^2(t)$.
Watershed segmentation was then applied to refine the segmented regions and separate objects that were touching or overlapping. Finally, further attributes such as texture, shape, and size were extracted from the segmented regions using region-based segmentation. This combination of techniques resulted in accurate and robust segmentation of blood cells in the images.
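The Otsu step can be sketched in a few lines of Python. The sketch assumes the common between-class-variance form of the criterion, w0(t)·w1(t)·(μ0(t) − μ1(t))², integer gray levels in the range 0 to 255, and a flat list of pixel values; the function name is hypothetical.

```python
def otsu_threshold(pixels, levels=256):
    """Return the threshold t that maximizes w0(t) * w1(t) * (mu0(t) - mu1(t))**2."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(level * count for level, count in enumerate(hist))

    best_t, best_var = 0, -1.0
    count0, sum0 = 0, 0                      # running background statistics
    for t in range(1, levels):
        count0 += hist[t - 1]                # pixels with intensity < t
        sum0 += (t - 1) * hist[t - 1]
        count1 = total - count0              # pixels with intensity >= t
        if count0 == 0 or count1 == 0:
            continue                         # one class is empty, skip
        w0, w1 = count0 / total, count1 / total
        mu0, mu1 = sum0 / count0, (sum_all - sum0) / count1
        var = w0 * w1 * (mu0 - mu1) ** 2
        if var > best_var:
            best_t, best_var = t, var
    return best_t
```

A binary mask then follows from `[p >= t for p in pixels]`, with foreground pixels at or above the threshold, matching the class definitions in the text.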

Morphological operations
Morphological operations are applied to enhance features and improve the segmentation process. To extract critical information from the images, we performed dilation and closing to fill gaps and holes in the objects, and erosion and opening to remove small objects and smooth the boundaries.
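The four operations can be illustrated on a binary mask with a short Python sketch. It assumes a square structuring element of odd size and ignores pixels outside the image at the borders; both choices, and the function names, are assumptions made for this example.

```python
def _morph(mask, size, combine):
    """Slide a size x size window over a 0/1 mask and combine its values."""
    h, w, r = len(mask), len(mask[0]), size // 2
    out = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            window = [mask[ii][jj]
                      for ii in range(max(0, i - r), min(h, i + r + 1))
                      for jj in range(max(0, j - r), min(w, j + r + 1))]
            out[i][j] = combine(window)
    return out

def dilate(mask, size=3):
    """A pixel becomes 1 if any pixel in its neighborhood is 1."""
    return _morph(mask, size, max)

def erode(mask, size=3):
    """A pixel stays 1 only if every pixel in its neighborhood is 1."""
    return _morph(mask, size, min)

def closing(mask, size=3):
    """Dilation followed by erosion: fills small gaps and holes."""
    return erode(dilate(mask, size), size)

def opening(mask, size=3):
    """Erosion followed by dilation: removes small isolated objects."""
    return dilate(erode(mask, size), size)
```

For instance, closing fills a single-pixel hole inside a cell mask, while opening deletes an isolated one-pixel speck but leaves a 3x3 block intact.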

Feature extraction
We performed feature extraction to obtain meaningful information from the processed image dataset, as shown in Figure 1.

Figure 1. Block diagram of methodology
Metric Value: It represents the total number of pixels found inside the region of interest (ROI). Mathematically, Metric Value can be represented as follows:

$$M = \sum_{(i,j) \in \mathrm{ROI}} 1$$

Circularity: It measures the roundness of the object. A normal RBC has a circularity close to 1, while an abnormal RBC such as a sickle cell has a circularity tending towards 0. Mathematically, Circularity can be calculated as:

$$C = \frac{4\pi A}{P^2}$$

where A is the area and P is the perimeter of the object.

Standard Deviation: It measures the degree of variation of the pixel values within the region of interest. Mathematically, Standard Deviation can be calculated as:

$$\sigma = \sqrt{\frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2}$$

where $x_i$ are the N pixel values within the ROI and $\mu$ is their mean.

Aspect Ratio: The aspect ratio is calculated by dividing the vertical axis length by the horizontal axis length:

$$AR = \frac{L_v}{L_h}$$

where $L_v$ and $L_h$ are the vertical and horizontal axis lengths of the object.

Eccentricity: It is the measure of how much an object deviates from a perfect circle. Mathematically, Eccentricity can be calculated as:

$$e = \sqrt{1 - \frac{b^2}{a^2}}$$

where a and b are the semi-major and semi-minor axes of the ellipse fitted to the object.

Variance: It measures the degree of spread of the pixel values within the region of interest. Mathematically, Variance can be calculated as:

$$\sigma^2 = \frac{1}{N} \sum_{i=1}^{N} (x_i - \mu)^2$$

Skewness: It measures the degree of asymmetry of the pixel intensity distribution within the region of interest. Mathematically, Skewness can be calculated as:

$$S = \frac{\sum_{i=1}^{N} (x_i - \mu)^3}{N \cdot \sigma^3}$$

These features are used to train and evaluate the ELM classifier used for sickle cell detection.
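The shape and intensity features described above can be computed with a short Python sketch. It assumes the common definitions circularity = 4πA/P² and eccentricity = sqrt(1 − b²/a²), population (divide-by-N) statistics, and that area, perimeter, and axis lengths have already been measured from the segmented region; the function and parameter names are hypothetical.

```python
import math

def shape_features(area, perimeter, height, width, major_axis, minor_axis):
    """Shape descriptors of one segmented cell."""
    return {
        # 1.0 for a perfect circle, tending towards 0 for elongated shapes
        "circularity": 4 * math.pi * area / perimeter ** 2,
        # vertical axis length divided by horizontal axis length
        "aspect_ratio": height / width,
        # 0.0 for a circle, approaching 1.0 for a very elongated ellipse
        "eccentricity": math.sqrt(1 - (minor_axis / major_axis) ** 2),
    }

def intensity_features(pixels):
    """Population statistics of the pixel values inside the ROI."""
    n = len(pixels)
    mean = sum(pixels) / n
    variance = sum((x - mean) ** 2 for x in pixels) / n
    std = math.sqrt(variance)
    skewness = (sum((x - mean) ** 3 for x in pixels) / (n * std ** 3)
                if std > 0 else 0.0)
    return {
        "metric_value": n,      # number of pixels in the ROI
        "variance": variance,
        "std": std,
        "skewness": skewness,
    }
```

A disc of radius 1 (area π, perimeter 2π) gives a circularity of 1, which is why round cells separate cleanly from elongated ones on this feature.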

ELM
The extreme learning machine (ELM) is a simple, computationally efficient single-hidden-layer feedforward neural network [30]. Its input weights are initialized randomly, and its output weights are solved using a closed-form method. Compared with machine learning algorithms such as KNN and SVM, ELM has been shown to perform better in the identification of SCA from microscopic blood images [18]. It is efficient because of its simple architecture and fast training, although it does not leverage the intricate hierarchical features captured by deeper architectures.
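For reference, the closed-form training step can be written in the standard ELM notation from the general literature; this is not a formula stated in this paper. The symbols are assumptions introduced here: X is the matrix of input feature vectors, W and b are the randomly initialized input weights and biases, g is the hidden-layer activation, T is the matrix of target labels, and $H^{\dagger}$ is the Moore-Penrose pseudoinverse of the hidden-layer output matrix H.

```latex
H = g\!\left(XW + \mathbf{1}\,b^{\top}\right), \qquad
\hat{\beta} = H^{\dagger}\,T, \qquad
\hat{Y} = H\,\hat{\beta}
```

Only the output weights $\hat{\beta}$ are fitted, in a single least-squares solve, which is the source of the short training time noted above.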

CNN InceptionV3
Inception-V3 is a CNN architecture used to classify images because of its efficient feature extraction, parallel processing, and high accuracy [31]. Its 48 layers include convolutional, pooling, and activation layers. It also has Inception modules, which combine 1x1, 3x3, and 5x5 convolutions with pooling and concatenation operations to capture intricate hierarchical patterns. These modules allow the network to extract features at various scales and resolutions. Here, batch normalization and regularization are used to enhance performance while preventing overfitting.

CNN-LSTM hybridization
A convolutional neural network (CNN), shown in Figure 2, is a type of deep neural network that includes convolutional layers, pooling layers, and fully connected layers [32], which work together to extract important characteristics from the input data and categorize it as circular, elongated, or other shapes. It uses filters [33], or kernels, to extract specific features:

$$y_{p,q} = f\left(\sum_{i}\sum_{j} w_{i,j}\, x_{p+i,\,q+j} + b\right) \tag{15}$$

where, in Eq. (15), $y_{p,q}$ is the output activation at location (p, q) in the feature map, f is the activation function (ReLU is used), $x_{p+i,q+j}$ is the input activation at position (p+i, q+j) in the input feature map, $w_{i,j}$ is the weight (or kernel) applied to the input feature map, and b is the bias term.
The feature maps created by the convolutional layers are downsampled by pooling layers to decrease their spatial dimensionality [34]. The feature maps from the convolutional and pooling layers are then used by fully connected layers to categorize the input image [35].
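The convolution and pooling operations described above can be sketched for a single channel in plain Python. This is an illustrative sketch only: it assumes no padding, a stride of 1 for the convolution, non-overlapping max pooling, and hypothetical function names.

```python
def conv2d_relu(x, w, b=0.0):
    """Single-channel convolution layer with ReLU:
    y[p][q] = max(0, sum_ij w[i][j] * x[p+i][q+j] + b)."""
    kh, kw = len(w), len(w[0])
    out_h, out_w = len(x) - kh + 1, len(x[0]) - kw + 1
    return [[max(0.0, sum(w[i][j] * x[p + i][q + j]
                          for i in range(kh) for j in range(kw)) + b)
             for q in range(out_w)] for p in range(out_h)]

def max_pool(fmap, size=2):
    """Non-overlapping max pooling: keep the largest value in each window."""
    return [[max(fmap[p + i][q + j] for i in range(size) for j in range(size))
             for q in range(0, len(fmap[0]) - size + 1, size)]
            for p in range(0, len(fmap) - size + 1, size)]
```

A 3x3 input convolved with a 2x2 kernel gives a 2x2 feature map, which 2x2 max pooling reduces to a single value; stacking such layers is what shrinks the spatial dimensions before the fully connected layers.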
Long Short-Term Memory (LSTM), shown in Figure 3, is a form of recurrent neural network (RNN) intended to address the issue of vanishing gradients in conventional RNNs [36]. LSTMs use a memory cell to store data over time [37], and three gates (the input gate, the forget gate, and the output gate) regulate how data enters and leaves the memory cell. At each time step, the input gate, Eq. (16), chooses which data should be stored in the memory cell:

$$i_t = \sigma\left(W_i\,[h_{t-1}, x_t] + b_i\right) \tag{16}$$

The candidate values are passed through a tanh activation function, which creates the new candidate cell state.

Figure 3. LSTM architecture

The forget gate, Eq. (17), determines at each time step how much of the previous memory cell state is retained:

$$f_t = \sigma\left(W_f\,[h_{t-1}, x_t] + b_f\right) \tag{17}$$

The output gate, Eq. (18), determines which data should be output from the memory cell at each time step:

$$o_t = \sigma\left(W_o\,[h_{t-1}, x_t] + b_o\right) \tag{18}$$

The new cell state, Eq. (19), is obtained by adding the input-gated candidate cell state to the previous cell state scaled by the forget gate:

$$c_t = f_t * c_{t-1} + i_t * \tanh\left(W_c\,[h_{t-1}, x_t] + b_c\right) \tag{19}$$

A tanh activation function is then applied to the cell state, and the output gate multiplies the result to create the new hidden state, Eq. (20):

$$h_t = o_t * \tanh(c_t) \tag{20}$$

where $x_t$ is the input at time step t, $h_t$ is the hidden state output at time step t, $c_t$ is the cell state at time step t, * represents element-wise multiplication, W and b are the weight matrices and bias vectors learned during training, and $\sigma$ is the sigmoid activation function.
CNN-LSTM: CNN and LSTM networks are combined to create a hybrid architecture. It is advantageous for sickle cell anemia detection because it combines spatial and temporal information, recognizes sequential patterns, and creates robust feature representations that are critical for accurate image classification. The architecture blends the LSTM's temporal modelling with the CNN's feature extraction capabilities. Each frame of the input sequence is processed by the CNN feature extractor to extract pertinent features, and the LSTM classifier then models the temporal connections between frames and arrives at a final prediction. The CNN feature extractor uses convolutional layers with 32 filters each, kernels of size 3x3, and the ReLU activation function. It then uses 10 convolutional and 5 pooling layers, followed by one fully connected layer. For the final classification into circular, elongated, and other shapes, the LSTM network consists of one LSTM layer with 64 units followed by one fully connected layer, as illustrated in Figure 4. Hyperparameters are tuned further, with a learning rate of 0.001 and a dropout rate of 0.3 for regularization. The CNN and LSTM components are trained jointly using backpropagation through time (BPTT) to enhance the hybrid architecture's overall performance. State-of-the-art results in image classification for identifying sickle cell anemia have been attained using the CNN-LSTM hybrid architecture.
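The gate updates of the LSTM cell described above can be sketched as a single time step in plain Python. This follows the standard LSTM formulation (sigmoid input, forget, and output gates, a tanh candidate state, and an additive cell update) and is not taken from the authors' implementation. The parameter layout is an assumption made here: each gate name maps to a pair (W, b), where W acts on the concatenation of the previous hidden state and the current input.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def _affine(W, v, b):
    """Compute W @ v + b for a list-of-rows matrix W."""
    return [sum(wij * vj for wij, vj in zip(row, v)) + bi
            for row, bi in zip(W, b)]

def lstm_step(x_t, h_prev, c_prev, params):
    """One LSTM time step; params maps 'i', 'f', 'o', 'g' to (W, b) pairs."""
    z = h_prev + x_t                                       # concatenate [h_prev, x_t]
    i_t = [sigmoid(a) for a in _affine(*params["i"][:1], z, params["i"][1])]
    f_t = [sigmoid(a) for a in _affine(*params["f"][:1], z, params["f"][1])]
    o_t = [sigmoid(a) for a in _affine(*params["o"][:1], z, params["o"][1])]
    g_t = [math.tanh(a) for a in _affine(*params["g"][:1], z, params["g"][1])]
    # New cell state: forget-gated old state plus input-gated candidate.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, g_t)]
    # New hidden state: output gate times tanh of the new cell state.
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t
```

In the hybrid model, the per-frame CNN feature vector would take the place of `x_t`, and the final hidden state would feed the fully connected classification layer.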

RESULT
The dataset employed in this study consists of 750 microscopic blood images, split 80-20 into training and test sets. Image augmentation is used to enlarge the dataset and improve performance. In preprocessing (Figure 5), images are converted from RGB to grayscale, after which mean, median, and Gaussian filters are applied to obtain the processed image. The processed images (Figure 6) are then passed through the segmentation step, in which Otsu thresholding (Figures 7 and 8), watershed segmentation, and region-based segmentation are applied, followed by several morphological operations. Table 1 gives the result of feature extraction. ELM has lower accuracy than InceptionV3, and InceptionV3 has lower accuracy than the CNN+LSTM architecture. CNN+LSTM provides better sickle cell detection results than the other models used in this paper, as given in Table 5.

CONCLUSIONS
Sickle cell anemia is a genetically transmitted blood disorder in which hemoglobin molecules are produced abnormally. To avert serious health issues, our research focuses on faster and more accurate identification of sickle cell anemia through image classification with deep learning techniques. The dataset is expanded using data augmentation to prevent overfitting. Different models, namely ELM, InceptionV3, and CNN+LSTM, are used to classify images into circular, elongated, and others. Early detection is crucial for timely intervention and management of the condition. Our research shows that the hybrid CNN+LSTM outperformed ELM and InceptionV3, achieving the highest accuracy of 92.66%. This implies that the performance of deep neural networks for classifying sickle cell anemia in images can be improved by integrating CNN and LSTM components. Accurate and early detection of sickle cell anemia through image classification can contribute to improved patient outcomes and more efficient healthcare processes.

FUTURE WORK
In this section, we list a few prospective areas for further study and advancement. The dataset can be expanded to include a more diverse set of patients, so that the generalizability of our models to other populations can be evaluated. For better detection of echinocytes, hyperparameters can be optimized, and different deep learning architectures with ensembling techniques can be explored. Furthermore, the study can be extended to recommend the optimal treatment based on the severity of sickle cell anemia obtained from the diagnosis; such treatment may include blood transfusions, hydroxyurea dosing to reduce pain and inflammation, and bone marrow or stem cell transplants in severe cases.
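Entropy: It measures the randomness or uncertainty of the pixel intensity distribution within the region of interest. Mathematically, Entropy can be calculated as:

$$H = -\sum_{k} p_k \log_2 p_k$$

where $p_k$ is the proportion of pixels in the ROI with intensity level k.

Energy: It measures the sum of the squared pixel values within the region of interest. Mathematically, Energy can be calculated as:

$$E = \sum_{i=1}^{N} x_i^2$$

Kurtosis: It measures the degree of peakedness of the pixel intensity distribution within the region of interest. Mathematically, Kurtosis can be calculated as:

$$K = \frac{\sum_{i=1}^{N} (x_i - \mu)^4}{N \cdot \sigma^4}$$

where $x_i$ are the N pixel values within the ROI, $\mu$ is their mean, and $\sigma$ is their standard deviation.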

Figure 7. Binary image

The confusion matrices of ELM, InceptionV3, and CNN+LSTM are depicted, where true positive (TP) refers to correct detection of a sickle cell, true negative (TN) refers to correct detection of a normal cell, false positive (FP) refers to a normal cell incorrectly detected as a sickle cell, and false negative (FN) refers to a sickle cell incorrectly detected as a normal cell. The CNN+LSTM model has consistently high TP and TN values and lower FP and FN values, which indicates that CNN+LSTM outperforms InceptionV3 and ELM.

Table 1. Feature extraction

Refined images with relevant features are then input into ELM, InceptionV3, and CNN-LSTM, and the accuracy, precision, F1 score, and recall of these techniques are reported based on the TP, TN, FP, and FN results. Tables 2-4 are examples.
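The reported metrics follow from the four confusion-matrix counts by their standard definitions, as in the Python sketch below. The function name is hypothetical, and the counts in the usage note are made-up values chosen for illustration, not results from this study.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, and F1 score from confusion-matrix counts."""
    accuracy = (tp + tn) / (tp + tn + fp + fn)
    precision = tp / (tp + fp) if (tp + fp) else 0.0   # correct among predicted sickle
    recall = tp / (tp + fn) if (tp + fn) else 0.0      # correct among actual sickle
    f1 = (2 * precision * recall / (precision + recall)
          if (precision + recall) else 0.0)
    return {"accuracy": accuracy, "precision": precision,
            "recall": recall, "f1": f1}
```

With illustrative counts `classification_metrics(tp=40, tn=45, fp=5, fn=10)`, accuracy is 85/100 and recall is 40/50; a model with fewer FP and FN scores higher on all four metrics.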