Multi-class Classification of Alzheimer’s Disease Using Deep Learning and Transfer Learning on 3D MRI Images

Alzheimer's disease (AD) poses a significant challenge for neurologists due to its progressive nature and debilitating impact on cognitive function. Recent advancements in neuroimage analysis have paved the way for innovative machine learning techniques, offering potential for substantial improvements in AD detection, diagnosis, and progression prediction. In this study, we embarked on developing a novel deep learning framework to address this critical need. Traditional manual classification methods for AD are often time-consuming, labor-intensive, and prone to inconsistencies. Given that the brain is the primary organ affected by AD, leveraging a classification system based on brain scans presents a promising avenue for achieving more accurate and reliable results. To effectively capture the spatial information embedded within 3D MRI scans, we extended convolutional techniques to three dimensions. Classification was accomplished by strategically combining features extracted from various layers of the 3D convolutional network, with differential weights assigned to the contributions of each layer. Recognizing the potential of transfer learning to accelerate training time and enhance AD detection efficacy, we incorporated this approach into our methodology. Our proposed framework integrated transfer learning with fine-tuning, harnessing brain MRI images from three distinct classes: Alzheimer's disease (AD), mild cognitive impairment (MCI), and normal control (NC). We explored a range of pre-trained deep learning models, including ResNet50V2 and InceptionResNetV2, for AD classification. ResNet50V2 emerged as the frontrunner, demonstrating superior classification accuracy compared to its counterparts. It achieved a remarkable training accuracy of 92.15%, followed by a sustained high testing accuracy of 91.25%. 
These results convincingly underscore the remarkable capabilities of deep learning methods, particularly transfer learning with ResNet50V2, in accurately detecting Alzheimer's disease using 3D MRI brain scans.


INTRODUCTION
Alzheimer's disease (AD) is a form of dementia that causes gradual mental deterioration and memory loss over time. Individuals with AD suffer permanent brain damage, which ultimately results in death from brain failure [1][2][3]. It casts a long shadow, not just on individual lives but on society. It is a leading cause of death among the aging population and affects nearly 50 million individuals globally, so its impact is widespread and profound. Magnetic resonance imaging (MRI) and other advanced neuroimaging techniques are currently used to diagnose AD. The 3D images that an MRI scan generates are made up of millions of voxels. Most AD lesions can be seen on MRI scans, and their severity is typically assessed with the help of the radiologist's training and experience. The brain, soft tissues, and lesions are identified and measured with the help of digital processing technology. Computers have allowed clinicians to perform both qualitative and quantitative analyses of lesions and other areas of interest. Helping clinicians make more informed decisions about lesions is one of the many applications of AI in medicine [4], and deep learning is the most widely used tool for this purpose. Convolutional neural networks (CNNs), made possible by recent developments in deep learning, perform well in the classification of natural images and offer significant promise in medical image diagnosis. Numerous CNN-based approaches have been proposed for the classification and segmentation of AD [5]. To properly categorize AD, one must first examine its defining characteristics. Figure 1 depicts sample 3D MRI images.
The primary contributions of this research can be summarized as follows. We conducted a comprehensive review of 20 widely used deep neural network (DNN) models to aid in the selection of the most effective DNN-based classifiers for AD classification using 3D MRI images. We analyzed the experimental results of these models in categorizing subjects across the stages of the disease: normal control (NC), mild cognitive impairment (MCI), and AD. This analysis provided insights into the performance of these models in a variety of scenarios. Recognizing the potential impact of age on AD diagnosis, we implemented these models with patient age in mind so that our performance comparisons account for age-related variations. ResNet50V2 demonstrated the best classification performance in our experiments. We improved classification accuracy by replacing all convolution layers in ResNet50V2 with depth-wise convolution layers. The goal of this optimization was to keep accuracy high while reducing computational demands.

LITERATURE SURVEY
Academic interest in AD detection has been growing in recent years, with machine learning (ML) and deep learning (DL) cited as potential methods for automatic detection. A 3D DL model can capture more accurate and detailed spatial and temporal information for AD classification than conventional DL models and radiologists, and its clinical application may further improve the diagnosis of AD.
Klöppel et al. [6] mapped the entire brain's gray matter to a high-dimensional space, where voxels served as coordinates and their values were interpreted as intensity levels. A linear support vector machine (SVM) was then used to categorize the subjects.
Using a wide variety of machine learning techniques, many computer-aided systems have been built to interpret disease states from MR images, as in Lerch et al. [7]. Features derived from voxel intensity, tissue density, or shape descriptors were used to train these algorithms to produce the required result.
Zhang et al. [8] employed support vector machines to classify the deformation vectors on the gray matter of the entire brain, treated as image dissimilarity. Because of the high dimensionality of the features, whole-brain approaches may be computationally expensive. Approaches that focus on regional features therefore typically pick a subset of the brain known to be relevant to AD, or select regions of interest (ROIs) tailored to the cohorts. ROI-based techniques have used properties such as the 3D volume and shape of the hippocampus, parahippocampal gyrus, and entorhinal cortex.
Silveira and Marques [9] segmented brain images into 116 anatomical ROIs using the Mask R-CNN technique and used boosting classification for labeling.
Suk et al. [10] suggested that multi-modal features, such as tissue volumes estimated from 93 ROIs, can be combined using a multi-kernel support vector machine to improve accuracy.
Long and Wyatt [11] provided the first SVM classification investigation based on hippocampus surface shape invariants. Rotationally invariant aspects of spherical harmonics (SPH) served as the basis for the shape invariants. Despite the demonstrated efficacy of ROI-based approaches, region segmentation mistakes or feature volatility may affect classification accuracy.
Liu et al. [12] developed a deep learning system based on a 3D convolutional neural network (CNN). By integrating the MRI gray matter density map and PET intensity values with the 3D CNN features, they were able to perform multi-modal AD discrimination. Recent years have seen the widespread adoption of deep neural networks in the classification procedure.
Gutman et al. [13] trained a 3D CNN model for feature extraction and subsequent classification using sparse autoencoding.
Liu and Shen [14] cropped MRI images based on the predicted locations of AD lesions using the regression forest technique and then fed them into an SVM to make a diagnosis.
Li et al. [15] performed a multi-modal diagnosis using positron emission computed tomography (PET) and magnetic resonance imaging (MRI) scans. They fed patch-based inputs to a cascaded deep neural network to analyze each MRI and PET image.
Using a data-driven approach, Payan and Montana [16] generated numerous patches around discriminative anatomical features in each MRI image. For multi-instance diagnosis learning, these patches were fed into several different classification networks.
Karas et al. [17] used generative adversarial networks (GANs) and image segmentation to generate the missing PET data, which was subsequently used to train a multi-instance neural network.
Xu et al. [18] split multi-modal data into multiple patches before feeding them into pretrained CNN models for fusion diagnosis.

Dataset description
The Alzheimer's Disease Neuroimaging Initiative (ADNI) is a longitudinal multicenter study that aims to identify diagnostic biomarkers (clinical, imaging, genetic, and biochemical) for the early detection and screening of Alzheimer's disease and its subsequent treatment [19]. Since ADNI contains more MRI data than any other publicly available source, we used it to test the efficacy of our method. We used 375 samples with Alzheimer's disease, 378 samples with mild cognitive impairment, and 447 cognitively normal samples, for a total of 1200 images. All samples were preprocessed with Gradwarp (which corrects image geometry distortion caused by the gradient model), B1 correction (which uses B1 calibration scans to even out image brightness), and N3 (which uses a histogram peak sharpening method to even out brightness). The original data dimensions were changed to 113×137×113×3 so that all samples would be uniform in size and shape. Because hippocampus volume is a strong indicator for Alzheimer's disease classification, the data reduction was performed by scaling rather than cropping to preserve hippocampal information. Table 1 presents the dataset description.
MRI scans are collected from a variety of sources and passed on for preprocessing, where the pre-processing layer changes the image dimensions. The model then classifies each scan into one of three categories. Using 3D convolutions and connection-wise attention mechanisms, our densely linked CNN architecture tackles the challenge of extracting meaningful features from complex 3D brain MRI scans. This network provides a novel approach to analyzing these rich datasets, with potential applications in neuroscience and medical diagnosis. The suggested deep learning-based system model used MRI data to detect and classify disorders at an early stage [20][21][22][23]. MRI scans were among the raw training data that was collected. The image was converted from its original 96×120×96×3 dimensions via a pre-processing layer. The study explores applying transfer learning to detect Alzheimer's disease, using pre-trained deep learning models whose final layers are modified to adapt them to this task. The proposed paradigm is illustrated in full in Figure 2.
We adopted ResNet (Residual Network), which was introduced as a solution to the problem of vanishing gradients that occurs during deep convolutional network training. ResNet introduces skip connections, which use an identity function to bypass nonlinear transformations. This architectural feature allows much deeper networks to be trained with less computational effort. The benefit of ResNet is that it avoids the vanishing gradient problem that plagues traditional networks by propagating gradients directly from one layer to the next. Furthermore, densely linked convolutional networks, a novel connectivity pattern, were introduced [24]. This pattern enhances interlayer communication even further, contributing to improved network performance. Dense connectivity capitalizes on the network's potential by reusing features, rather than relying solely on extremely deep or wide architectures to enhance representational power. This approach results in streamlined models that are easier to train and more efficient in terms of parameter usage. Additionally, it has been demonstrated that these feature maps effectively integrate information from previous layers, thereby increasing input variation and enhancing overall network efficiency. Consider a convolutional network applied to a single image, X0. The network consists of L layers, where l indexes the layer and Hl(.) denotes the non-linear transformation implemented by the lth layer. With dense connectivity, the feature maps from all previous layers, X0, X1, …, Xl−1, are input to the lth layer.
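The two connectivity patterns described above can be contrasted in a few lines of code. The following is a minimal pure-Python sketch, not the authors' implementation: feature maps are represented as flat lists of floats, and the names `residual_block`, `dense_forward`, and `toy_layer` are hypothetical helpers introduced only for illustration.

```python
from typing import Callable, List

Vector = List[float]


def relu(v: Vector) -> Vector:
    """Element-wise ReLU non-linearity."""
    return [max(0.0, x) for x in v]


def residual_block(x: Vector, transform: Callable[[Vector], Vector]) -> Vector:
    """Skip connection: the identity path x is added to the transformed path F(x)."""
    fx = transform(x)
    return [a + b for a, b in zip(x, fx)]


def toy_layer(v: Vector) -> Vector:
    """Stand-in for a non-linear transformation H_l (assumption, for illustration only)."""
    return relu([sum(v), max(v)])


def dense_forward(x0: Vector, layers: List[Callable[[Vector], Vector]]) -> List[Vector]:
    """Dense connectivity: layer l receives the concatenation of X0, ..., X(l-1)."""
    features = [x0]
    for h in layers:
        concatenated = [value for fmap in features for value in fmap]
        features.append(h(concatenated))
    return features
```

In the residual case the input bypasses the transformation and is summed with its output, whereas in the dense case every earlier feature map is concatenated and passed forward, which is what makes feature reuse possible.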
This network was built using a time-saving strategy that took visual attention into account. This work employed a convolutional neural network architecture with a connection-wise attention mechanism, allowing for flexible feature map integration through weighted summation. The weights for this summation were learned automatically during training, enabling the model to optimize its feature representation for the task at hand. By focusing on the most useful information, this made the network simpler and more effective [25][26][27][28]. The $i$th layer of the convolutional neural network received a weighting coefficient $W_i$ in accordance with Eq. (2), where $W_i$ is an attention vector composed of $i-1$ elements:
$$W_i = [w_{i-1,i},\, w_{i-2,i},\, \cdots,\, w_{2,i},\, w_{1,i}] \quad (2)$$
In Eq. (3), $x_j$ ($1 \le j \le l-1$) denotes the feature maps from the $j$th layer, and $H_l$ is the non-linear transformation of the $l$th layer, which applies the connection-wise attention.
There were four distinct types of layers in the network. The first was the input layer, through which the network received the image patches. The second was a convolutional layer, which used the input images and the learned filters to generate a feature map for each filter.
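To make the weighted summation of feature maps concrete, the following is a minimal pure-Python sketch. It assumes that the connection-wise attention combines the earlier feature maps as a weighted sum, with one scalar weight per incoming connection, before the layer's non-linear transformation is applied; the exact form used in the study is not reproduced here. The function name `attention_weighted_sum` is hypothetical, and the weights are passed in the same order as the feature maps (Eq. (2) lists them in reverse order).

```python
from typing import List

FeatureMap = List[float]  # a flattened feature map, for illustration only


def attention_weighted_sum(feature_maps: List[FeatureMap],
                           weights: List[float]) -> FeatureMap:
    """Combine earlier feature maps x_1..x_(l-1) using one scalar weight per connection."""
    if len(feature_maps) != len(weights):
        raise ValueError("one weight is required per incoming feature map")
    size = len(feature_maps[0])
    combined = [0.0] * size
    for fmap, w in zip(feature_maps, weights):
        if len(fmap) != size:
            raise ValueError("all feature maps must have the same size")
        for k, value in enumerate(fmap):
            combined[k] += w * value
    return combined
```

In a trained network the weights would be learned parameters; here they are plain inputs so that the arithmetic is easy to follow.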

Transfer learning for classification
Leveraging transfer learning, the proposed approach achieved 3-way AD classification as depicted in Figure 3.
When a dataset is too small for a model to learn all of its parameters from scratch, we can switch to transfer learning. A trained network, such as ResNet50V2, is used as a starting point for learning a new task. After training on ImageNet, the ResNet50V2 model was applied to the ADNI dataset. The frozen fully connected layers produced 890 features and outputs for 3 classes, necessitating a transfer learning approach. To accommodate three-class categorization, the model's architecture was modified: the final layers were replaced with a new fully connected layer, a SoftMax layer, and an output layer designed for multi-class handling. The network was then trained on a dataset of magnetic resonance images with optimized training parameters [29]. After training, the model's accuracy was evaluated to assess its effectiveness in making correct classifications. To measure model performance and guide training, the loss was calculated using the cross-entropy function, which requires the model's output dimension to match the number of classes. Let $X$ denote the feature space and $P(X)$ the associated marginal probability [30][31][32][33][34], where $X = \{x_1, x_2, \ldots, x_n\}$ and $n$ denotes the number of input images. The domain is represented mathematically as

$$\text{Domain} = \{X, P(X)\}$$
The ways in which features are distributed and represented differ significantly between two distinct domains. To formalize a specific task within a domain, a set of potential labels $W$ and a prediction function $f(\cdot)$ were employed:

$$\text{Task} = \{W, f(\cdot)\}$$
The model's prediction function $f(\cdot)$ was trained on features extracted from the data, enabling it to make predictions on unseen test data. The proposed framework involved two domains: a target domain ($\text{Domain}_p$) and a source domain ($\text{Domain}_q$). Data points in the source domain with label $w_{si}$ were designated $x_{si}$, while those in the target domain with label $w_{ti}$ were designated $x_{ti}$. The target domain and the source domain are formulated as follows:
$$\text{Domain}_p = \{(x_{t1}, w_{t1}), (x_{t2}, w_{t2}), \ldots, (x_{tn}, w_{tn})\}$$
$$\text{Domain}_q = \{(x_{s1}, w_{s1}), (x_{s2}, w_{s2}), \ldots, (x_{sn}, w_{sn})\}$$
Transfer learning is a powerful technique for building predictive models $f(\cdot)$. It leverages knowledge gained from past tasks and domains (the source task and source domain) to train the model efficiently and to predict labels accurately for new data points $x$. Mathematically, $f(x)$ is represented as $f(x) = P(w \mid x)$.

End for
➢ Output
The fine-tuned model achieved a high level of accuracy in the classification of test dataset images.
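The cross-entropy loss used in this section to guide training can be sketched as follows. This is a minimal pure-Python illustration under stated assumptions: the model's outputs are taken to be class probabilities that already sum to 1, class indices 0, 1, and 2 stand in for the three diagnostic classes (a labeling chosen only for this example), and the function names and the `eps` clipping constant are hypothetical.

```python
import math
from typing import List


def cross_entropy(probs: List[float], true_class: int, eps: float = 1e-12) -> float:
    """Negative log-probability assigned to the true class of one sample."""
    # Clip to eps so that a predicted probability of 0 does not produce log(0).
    return -math.log(max(probs[true_class], eps))


def mean_cross_entropy(batch_probs: List[List[float]], labels: List[int]) -> float:
    """Average cross-entropy over a batch of predictions."""
    if len(batch_probs) != len(labels):
        raise ValueError("one label is required per prediction")
    losses = [cross_entropy(p, y) for p, y in zip(batch_probs, labels)]
    return sum(losses) / len(losses)
```

The loss is small when the model assigns high probability to the correct class and grows without bound as that probability approaches zero, which is why it is a natural training signal for a multi-class classifier.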

InceptionResNetV2
The Inception-ResNet-V2 framework is built around the Residual Inception Block (RIB), which combines convolutional filters of different sizes with residual connections [35]. Each block is followed by a dimensionality check: a 1×1 convolution filter-expansion layer after each block ensures that the depth matches that of the input before summation. Batch normalization is applied to the conventional layers only. The network accepts 113×137×113 inputs and comprises 164 layers. The architecture takes advantage of residual connections to combat severe degradation in deep networks and to speed up training. Since there were no tuning parameters in this core design, max pooling was performed to reduce overfitting in the convolutional structure by increasing the correlation between feature importance and label category [36,37]. As a result, max pooling is more parameter-efficient than the Flatten method. To guard against overfitting and promote generalization, a Dropout layer with a constant value of 0.8 is added.

$$\sigma(x)_i = \frac{e^{x_i}}{\sum_{j=1}^{K} e^{x_j}} \quad (6)$$
The dense layer was activated using the SoftMax activation function, which transformed its outputs into a probability distribution across K classes, as specified in Eq. (6), where e ≈ 2.718 is the base of the natural logarithm.
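The SoftMax function in Eq. (6) can be written directly in code. The following is a minimal pure-Python sketch; subtracting the maximum logit before exponentiating is a standard numerical-stability step that is added here as an assumption and does not change the result of Eq. (6).

```python
import math
from typing import List


def softmax(logits: List[float]) -> List[float]:
    """Map K real-valued scores to a probability distribution over K classes."""
    # Shifting by the maximum avoids overflow in exp() for large scores.
    shift = max(logits)
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]
```

For a three-class problem the function receives three scores from the dense layer and returns three non-negative values that sum to 1, with the largest score receiving the largest probability.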

Proposed ResNet50V2 with TL
ResNet50v2 is a widely used computer vision model, alongside VGG16, DenseNet121, Xception, and MobileNetV2. Pre-trained on large datasets of diverse images, these models can be reused through transfer learning even when data and resources are limited. In this study, we leverage a large medical image dataset and perform transfer learning with ten distinct sets of pretrained weights derived from ResNet50v2. ResNet50v2, a convolutional neural network (CNN) with 50 layers, forms the backbone of our exploration. Figure 4 shows its architecture, alongside our fine-tuning setup for transfer learning.
The ResNet50v2 architecture features a series of convolutional layers, starting with an initial layer that uses 64 kernels, a stride of 2, and a 7×7 filter, followed by 3×3 pooling. It then stacks multiple sets of convolutional layers, each containing three layers: a 1×1 convolution, a 3×3 convolution, and a final 1×1 convolution that increases channel depth. This pattern repeats with progressively larger kernel numbers and more repetitions through the network. Max pooling and hidden layers combining convolutions, batch normalization, and ReLU activations further refine the features. The original fully connected layer with 1000 out-features is replaced with a group of fully connected layers to enhance the model's performance. To adapt ResNet50v2 for three-class classification (AD, MCI, NC), the original final layer is replaced with a custom head: a dropout layer with a probability of 0.5 applied to the first 2048-feature layer, followed by a ReLU layer and another dropout layer with the same probability. A final fully connected layer with 3 outputs then maps the features to the three classes. This study also explored transfer learning with 10 pre-trained ResNet50v2 models from diverse medical image datasets to optimize performance for this task.

RESULTS
Developing a model for classifying MRI scans and detecting Alzheimer's disease often involves transfer learning, leveraging pre-trained weights from a larger model. This project used TensorFlow, a popular framework for building and training machine learning models. The process involved feeding 3256 MRI images into the network. We employed the Stochastic Gradient Descent with Momentum (SGDM) optimizer, which fine-tuned the model's weights and biases to minimize the loss function and maximize accuracy. Training ran for 50 epochs, with 107 iterations per epoch and a batch size of 512 images. To guard against overfitting, we used an early stopping patience of 4 on the validation set. This allows training to stop when the model is no longer improving on the validation data, preventing it from memorizing the training set without generalizing well to unseen data. Together, these optimization and safeguard measures enabled the model to achieve high accuracy in Alzheimer's detection. Choosing the learning rate plays a crucial role in balancing convergence speed and accuracy. Experiments showed that the model achieved its best performance at a learning rate of 1e-4 while still converging quickly, so a learning rate of 1e-4 was used across all models for consistency. Evaluating the performance of a classification model goes beyond simple accuracy. To gain deeper insights, a confusion matrix was used to assess precision, recall, and other metrics for each class. This analysis provides a more nuanced understanding of the model's strengths and weaknesses.
Overall, this project explored six different models, each trained on a balanced dataset of 1200 MRI scans (400 per category) for 50 epochs. 70% of the data was used for training, with the remaining 30% reserved for testing. By using transfer learning, optimizing training parameters, and employing detailed evaluation techniques, this approach provides a framework for building and refining Alzheimer's disease detection models. Both InceptionResNetV2 and ResNet50V2 achieved strong performance on all evaluation metrics, as detailed in Tables 2 and 3. ResNet50V2 excelled in classification, achieving 91.25% testing accuracy, which surpassed even models such as VGG16 and Xception. As shown in Table 4, with 92.15% training accuracy and 91.25% testing accuracy, ResNet50V2 was the best-performing model for classifying Alzheimer's disease.
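The optimizer and early-stopping settings described above can be illustrated with a minimal pure-Python sketch. It is not the TensorFlow code used in the study: the momentum value of 0.9, the function names, and the validation losses in the usage note are assumptions introduced for illustration, while the learning rate of 1e-4 and the patience of 4 come from the text.

```python
from typing import List, Tuple


def sgdm_step(params: List[float], grads: List[float], velocity: List[float],
              lr: float = 1e-4, momentum: float = 0.9
              ) -> Tuple[List[float], List[float]]:
    """One SGD-with-momentum update: v <- momentum * v - lr * g, then p <- p + v."""
    new_velocity = [momentum * v - lr * g for v, g in zip(velocity, grads)]
    new_params = [p + v for p, v in zip(params, new_velocity)]
    return new_params, new_velocity


def early_stopping_epoch(val_losses: List[float], patience: int = 4) -> int:
    """Return the 1-based epoch at which training stops under early stopping."""
    best = float("inf")
    epochs_without_improvement = 0
    for epoch, loss in enumerate(val_losses, start=1):
        if loss < best:
            best = loss
            epochs_without_improvement = 0
        else:
            epochs_without_improvement += 1
            if epochs_without_improvement >= patience:
                return epoch
    # Validation loss kept improving often enough: train for all epochs.
    return len(val_losses)
```

For example, with hypothetical validation losses that reach their minimum at the third epoch and do not improve over the next four epochs, `early_stopping_epoch` stops training at the seventh epoch instead of running all 50.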
Figure 5 depicts the comparative results graphically. Among all the models tested, ResNet50v2 attains the highest training and testing accuracy in classifying 3D MRI images, with InceptionResNetV2 following closely behind with the second-highest accuracy. ResNet50v2 also outperforms VGG16, DenseNet121, Xception, and MobileNetV2. The figure reinforces the tabulated data on ResNet50v2's effectiveness for Alzheimer's disease detection through MRI analysis.
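Per-class precision and recall of the kind reported in the evaluation are derived from a confusion matrix. The following is a minimal pure-Python sketch of that computation; the function names are hypothetical, and the mapping of class indices 0, 1, and 2 to AD, MCI, and NC is an assumption made only for this example.

```python
from typing import Dict, List


def confusion_matrix(y_true: List[int], y_pred: List[int],
                     num_classes: int) -> List[List[int]]:
    """Rows index the true class, columns index the predicted class."""
    matrix = [[0] * num_classes for _ in range(num_classes)]
    for t, p in zip(y_true, y_pred):
        matrix[t][p] += 1
    return matrix


def per_class_metrics(matrix: List[List[int]]) -> List[Dict[str, float]]:
    """Precision and recall for each class, computed from the confusion matrix."""
    n = len(matrix)
    metrics = []
    for c in range(n):
        tp = matrix[c][c]
        fp = sum(matrix[r][c] for r in range(n)) - tp  # predicted c, true class differs
        fn = sum(matrix[c]) - tp                       # true c, predicted otherwise
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        metrics.append({"precision": precision, "recall": recall})
    return metrics


def accuracy(matrix: List[List[int]]) -> float:
    """Fraction of samples on the diagonal of the confusion matrix."""
    total = sum(sum(row) for row in matrix)
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    return correct / total if total else 0.0
```

Reporting these values per class shows whether a high overall accuracy hides weak performance on one class, for example MCI scans being mistaken for NC.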

Figure 5. Graphical representation for comparative results of Alzheimer's disease classification

CONCLUSIONS
This study demonstrates that deep learning is effective at identifying Alzheimer's disease from MRI scans. Deep neural networks can process large, intricate datasets, offering a data-driven approach to problem-solving in medical research, with the potential to automate tasks for neurologists while minimizing human error. Here, we applied transfer learning to classify MRI images into three categories using InceptionResNetV2 and ResNet50V2, both pre-trained on existing datasets. Both models successfully categorized the data, and the proposed model performed best, with 92.15% training accuracy and 91.25% testing accuracy, surpassing the other models. These results support ResNet50V2 with transfer learning as a strong choice for classifying 3D MRI images. Looking ahead, exploring more advanced deep learning models and hyperparameter tuning on diverse datasets holds promise for even more accurate Alzheimer's disease detection.

Figure 2. Basic architecture of proposed methodology

Figure 3.
➢ Input
P(Y), Y={y1, y2, ..., yn}: Probability distribution of samples in the dataset, where Y represents the collection of samples.
➢ Pre-Training
For each sample in the dataset:
• Utilizing a pre-trained source domain (Ds) network with embedded knowledge.
• Preparing target domain (Dt) training and validation sets for model adaptation.
• Initiating knowledge transfer via training and validation on these sets.
End for
➢ Fine-Tuning
For each feature f(y):
• Perform model customization for the target domain via fine-tuning of designated layers, targeting {Y, P(y)}.
• Optimize target task performance through further fine-tuning utilizing the training dataset (Dt).
• Conduct model evaluation on unseen images using the test dataset (Dt) to gauge categorization efficacy.

Figure 4. Enhanced architecture of the proposed model: Modified ResNet50V2 with 2PTL

Table 1. Analysis of dataset for implementing models

Table 2. Performance evaluation table for InceptionResnetv2

Table 3. Performance evaluation table for Resnet50v2

Table 4. Accuracy comparison of different models on Alzheimer's disease MRI images