© 2026 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
OPEN ACCESS
Cardiovascular disease prediction remains a critical research area owing to the complexity and variability of clinical data. This study proposed a hybrid framework that integrates a Dynamic Improved Owl Search Algorithm (D-IOSA) for adaptive feature optimization with a Cross-Fusion Attention Network (CF-A-Net) for accurate and interpretable classification. The D-IOSA component dynamically regulated the exploration and exploitation phases to identify informative risk indicators while reducing data redundancy. The CF-A-Net employed a multi-branch feature interaction and attention-fusion mechanism to capture nonlinear associations among clinical attributes. The proposed model was evaluated on the Cleveland and Framingham heart disease datasets with a stratified validation strategy. The results demonstrated that the hybrid system consistently outperformed traditional machine learning, ensemble, and attention-based deep learning approaches in terms of predictive accuracy, robustness, and interpretability. The findings highlight the potential of the proposed approach in enabling reliable and resource-efficient cardiovascular risk assessment in real-world healthcare applications.
heart disease prediction, Dynamic Improved Owl Search Algorithm, Cross-Fusion Attention Network, meta-heuristic optimization, attention models, explainable AI, hybrid learning
Cardiovascular diseases (CVDs) remain the leading cause of preventable mortality worldwide, exerting immense pressure on healthcare systems and national economies [1]. Although diagnostic technologies have advanced rapidly, many established procedures, such as angiography and high-resolution imaging, are invasive, costly, and often applied only after clinical symptoms become apparent. Early identification of individuals at risk remains a formidable challenge because the manifestation of heart disease depends on complex and interacting demographic, behavioral, and physiological factors [2].
The growing availability of clinical data has encouraged the use of computational intelligence for preventive screening and risk assessment. Machine learning and statistical models have demonstrated that routinely collected parameters can be transformed into predictive indicators of cardiovascular health [3, 4]. However, conventional algorithms frequently assume linear or independent relationships between features and outcomes. This assumption limits their effectiveness when the data are heterogeneous or contain redundant and nonlinear dependencies [5, 6]. Their performance typically deteriorates on high-dimensional datasets where the number of clinical attributes exceeds the number of patient samples [5].
Deep learning architectures have addressed some of these limitations by automatically extracting hierarchical representations from raw medical data [7]. Convolutional and recurrent networks have achieved notable success in cardiovascular diagnostics, such as arrhythmia classification and risk-factor prediction [8]. However, these architectures are primarily suited for imaging or sequential biomedical signals (e.g., ECG) and do not directly model static tabular clinical datasets such as the Cleveland or Framingham heart disease datasets, where feature ordering has no temporal meaning. Nevertheless, their deployment in tabular clinical domains presents additional challenges in practice. Deep models are computationally demanding, sensitive to overfitting, and often rely on features of limited clinical relevance without reliable feature-selection strategies. Furthermore, the opaque decision boundaries of deep learning architectures hinder interpretability and may discourage clinical adoption.
To enhance generalization and feature relevance, researchers have increasingly combined predictive models with meta-heuristic optimization [9]. Algorithms inspired by natural behaviors such as swarm intelligence, evolutionary learning, and predator-prey dynamics have been applied to select informative feature subsets and avoid local minima [10, 11]. Nevertheless, these methods can still converge prematurely or become inefficient in large search spaces. Simultaneously, practical implementation concerns persist: heavy models are unsuitable for real-time or edge deployment, particularly in community-level health-monitoring systems where computational and energy resources are limited.
These limitations emphasize the need for an integrated framework that selects clinically meaningful features, learns complex relationships with high predictive accuracy, maintains interpretability for decision support, and operates efficiently on modest hardware. Recent studies have shown that attention-based and transformer-inspired models are more suitable for structured/tabular healthcare data, as they capture cross-feature interactions without assuming temporal dependencies among attributes. In response, this study proposed a hybrid solution that combines a Dynamic Improved Owl Search Algorithm (D-IOSA) for adaptive feature optimization with a Cross-Fusion Attention Network (CF-A-Net) designed to model nonlinear dependencies and highlight influential risk factors. The CF-A-Net employs feature-interaction and attention-fusion mechanisms instead of recurrent units, ensuring stable learning over tabular clinical records. The approach was evaluated on two widely recognized cardiovascular datasets, namely the Cleveland and Framingham heart disease datasets, to demonstrate its robustness and potential for generalization across diverse populations.
The remainder of this paper is organized as follows: Section 2 reviews related research in computational cardiovascular risk prediction. Section 3 details the proposed methodology, including the D-IOSA optimization and CF-A-Net network design. Section 4 presents the experimental evaluation, comparative results, and deployment analysis. Section 5 concludes the paper and discusses future research directions.
Numerous computational methods have been applied to CVD prediction over the past decade [1, 12]. Conventional machine-learning methods such as Logistic Regression (LR) [10, 13], Support Vector Machines (SVMs) [14, 15], Random Forest (RF) [16, 17], and Decision Trees (DTs) [18, 19] remain widely used for modeling structured clinical records. As reported by Latha and Jeeva [20], these algorithms demonstrated the potential of data-driven decision support, but their performance depends strongly on handcrafted feature engineering and parameter tuning. These conventional learners typically presume linear or weakly coupled relationships between input variables, which limits their capacity to model the intricate nonlinear dependencies observed in cardiovascular data [6, 21].
More recent studies have shifted attention toward ensemble-based and deep-learning frameworks aimed at boosting diagnostic precision [20]. Ensemble models such as Gradient Boosting [22] and XGBoost [23, 24] combine multiple weak learners to enhance robustness and generalization. Ganie et al. [25] further incorporated explainable ensemble mechanisms using SHAP-based visual attribution to enhance clinician trust in CVD prediction. However, as noted by Almazroi et al. [26], ensemble mechanisms alone struggle to model highly nonlinear dependencies and often compromise interpretability. Deep learning models, including Convolutional Neural Networks (CNNs) [27] and Recurrent Neural Networks (RNNs) [28], have been effectively applied to ECG analysis, medical imaging, and patient-record classification. These architectures perform well on sequential or imaging data but are not directly suited for static tabular datasets such as Cleveland and Framingham, where feature ordering lacks temporal meaning. Recent studies [29-31] demonstrated CNN-based and CNN-LSTM architectures for capturing spatial–temporal cardiac features, though their performance depends on large datasets and high computational resources. These networks automatically learn high-level representations of risk factors and thereby reduce manual preprocessing. However, despite their success, deep architectures remain computationally intensive and data-hungry and frequently suffer from interpretability issues that limit adoption in clinical environments.
To address these challenges, optimization-based and meta-heuristic algorithms have been investigated for feature selection and model enhancement. Methods such as Particle Swarm Optimization (PSO) [15], Genetic Algorithms (GA) [32], Grey Wolf Optimizer (GWO) [33], and the Owl Search Algorithm (OSA) [34] have demonstrated strong search capability in complex parameter spaces. Recent studies [35] further enhanced convergence and feature optimization using Lévy-flight and owl search–based strategies. Such optimization techniques help eliminate redundant inputs and enhance classifier robustness, yet many baseline meta-heuristics continue to suffer from early convergence and poor control between exploratory and exploitative phases, especially when dealing with high-dimensional or noisy clinical data.
Several studies have enhanced classical metaheuristic algorithms to improve optimization performance in medical and engineering domains. Jiang et al. [36] proposed an adaptive PSO that dynamically adjusts the inertia weight, achieving a better balance between exploration and exploitation. Hou et al. [37] introduced a nonlinear convergence-based GWO, resulting in faster and more accurate convergence. Jain et al. [38] developed the OSA, which inspired subsequent works such as Alabdulkreem et al. [39], where an improved OSA was successfully applied for medical feature optimization and disease classification. These algorithmic refinements demonstrate strong potential for adoption in CVD prediction, where robust feature selection and optimized model convergence play critical roles in improving diagnostic reliability.
Recently, attention-based and transformer-inspired architectures have emerged as powerful tools for modeling structured/tabular health data [40-42]. Models such as TabTransformer, FT-Transformer, and other attention-driven tabular networks capture cross-feature interactions without assuming temporal ordering among attributes, making them more suitable than recurrent models for datasets like Cleveland and Framingham [43, 44]. These approaches improve interpretability and achieve strong predictive performance in electronic health records and other structured biomedical datasets.
In parallel, the emergence of attention and transformer-derived neural models has greatly improved both performance and interpretability within medical data analysis. Recent surveys, such as Nerella et al. [41], highlighted the growing role of transformer architectures in healthcare data analysis, where models dynamically adjust feature weighting to emphasize the most informative clinical attributes. Attention layers dynamically assign higher weights to informative attributes while down-weighting less relevant ones, resulting in clearer and more transparent clinical reasoning [45]. Hybrid learning strategies that integrate meta-heuristic optimization with deep-representation models have been introduced to exploit the strengths of both precise search and hierarchical feature learning [46]. However, most existing hybrids remain static and cannot dynamically adjust search parameters or fuse multiple attention pathways to strengthen feature interaction learning.
In view of these limitations, this study explores an adaptive hybrid learning direction that combines meta-heuristic feature optimization with attention-driven deep architectures to achieve both interpretability and computational efficiency. A consolidated overview of representative prior works is presented in Table 1.
Table 1. Summary of representative studies in cardiovascular diseases (CVDs) prediction
| Ref. | Methodology | Dataset | Main Strengths | Limitations/Gaps |
|---|---|---|---|---|
| Latha and Jeeva [20] | Ensemble ML | Cleveland | Enhanced robustness vs. single classifiers | Weak nonlinear learning, interpretability concerns |
| Almazroi et al. [26] | Hybrid ensemble CDSS | Mixed | Handles missing/noisy data | Computational overhead, black-box nature |
| Li et al. [29] | CNN for ECG | CPSC-2018 | Learns signal morphology | Requires large training signals |
| Petmezas et al. [30] | CNN-LSTM | MIT-BIH | Joint spatial-temporal learning | High training cost, risk of overfitting |
| Ullah et al. [31] | Lightweight CNN | ECG | Improved portability | Reduced accuracy in complex data |
| Jain et al. [35] | CNN + Lévy FS | MIT-BIH | Better escape from local minima | No interpretability component |
| Ganie et al. [25] | XAI-based ensemble | Clinical tabular | Improves trust | Heavy models are unsuitable for edge devices |
| Nerella et al. [41] | Transformer-based model (survey) | Multiple healthcare datasets | Strong attention-based feature weighting, improved interpretability | High complexity; limited scalability |
| Fan and Waldmann [47] | Attention-guided tabular deep learning | Genomic biomedical data | Enhances feature-interaction learning with attention; highlights informative predictors | Limited to genomic tasks; may not transfer directly to CVD data |
| Kang et al. [42] | Transformer-based TT-GAN | Healthcare tabular data | Learns complex attribute patterns using attention; suitable for mixed-type clinical variables | Focused on data generation, not CVD prediction; higher processing cost |
| Algül et al. [43] | FT-Transformer, TabTransformer, SAINT comparison | Real-world tabular datasets | Consistently strong accuracy and generalization across benchmarks | No evaluation on CVD datasets; models require careful tuning |
Although deep learning and meta-heuristic strategies have significantly enhanced predictive performance, several challenges remain evident from the summarized literature in Table 1. Most models rely on a single dataset and thus provide an incomplete assessment of robustness and clinical generalization. Feature interactions are often underutilized in the absence of attention mechanisms capable of emphasizing subtle but influential variables. Furthermore, computational demands remain high, particularly for architectures intended for real-time or edge deployment. Motivated by these gaps, the present study introduces an adaptive hybrid framework that integrates a D-IOSA for feature optimization with a Cross-Fusion Attention Network (CF-A-Net) for interpretable and efficient CVD prediction.
CVD prediction presents inherent challenges arising from heterogeneous clinical variables, missing observations, and nonlinear dependencies among risk factors [34]. To address these constraints, this study proposed a hybrid framework that adaptively selects clinically meaningful attributes and learns predictive patterns through an interpretable attention-enhanced deep architecture. The overall workflow, illustrated in Figure 1, integrates feature optimization through the D-IOSA with deep representation learning using a CF-A-Net. This unified design was intended to ensure that model decisions remained transparent and computationally efficient for real-world healthcare deployment.
As depicted in Figure 1, the system consolidates raw health records from the Cleveland and Framingham cohorts. The preprocessing stage included normalization, imputation of missing values, and encoding of categorical attributes to create a consistent numerical input space. The optimized feature subsets identified by D-IOSA were then supplied to the CF-A-Net, which modeled nonlinear relationships among risk factors while quantifying their clinical significance through attention mechanisms.
Figure 1. Overall workflow of the proposed D-IOSA + CF-A-Net framework for cardiovascular disease (CVD) prediction
3.1 Data preprocessing
Clinical datasets frequently contain missing entries due to logistical delays in testing or incomplete diagnostic histories. If left unaddressed, such gaps may cause predictive models to learn spurious or biased associations. To minimize this bias, missing continuous values were imputed using k-nearest neighbor (kNN) estimation computed only on the training data, thereby preventing information leakage during evaluation [34]. Categorical clinical attributes were encoded using one-hot encoding, and class imbalance was addressed using the Synthetic Minority Oversampling Technique (SMOTE), applied only to the training partition of each evaluation split to ensure unbiased evaluation. Clinical variables also differ widely in range and measurement units; for instance, age is measured in years, whereas serum cholesterol is expressed in mg/dL, which can distort gradient updates in neural training. Min–Max normalization was applied to ensure that all clinical features lay within a uniform range of [0, 1], thereby stabilizing optimization and accelerating model convergence. Each continuous variable x was normalized using Min–Max scaling as:
${x}'=\frac{x-\min \left( x \right)}{\max \left( x \right)-\min \left( x \right)}$ (1)
This transformation ensured equal contribution of features during model learning and improved numerical stability. In addition, z-score filtering was applied to detect extreme outliers while retaining medically relevant deviations such as abnormally high resting blood pressure, which served as critical indicators in cardiac risk assessment.
To address class imbalance, particularly in the Framingham dataset, SMOTE was applied on the training folds to generate synthetic minority samples [34]. This balancing step enhanced classifier robustness and ensured that both risk and non-risk classes were adequately represented during model optimization. After these preprocessing operations, the dataset formed a balanced, standardized, and clinically representative input space suitable for subsequent feature optimization using D-IOSA.
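The leakage-free preprocessing steps above can be sketched with scikit-learn. This is a minimal illustration, not the paper's released code: the `preprocess` function name and the choice of k = 5 neighbors are our assumptions, and SMOTE is only indicated by a comment since it belongs after this stage and operates on the training partition alone.

```python
import numpy as np
from sklearn.impute import KNNImputer
from sklearn.preprocessing import MinMaxScaler

def preprocess(X_train, X_test, k=5):
    """Impute missing values and scale features to [0, 1] (Eq. (1)),
    fitting every transform on the training data only."""
    imputer = KNNImputer(n_neighbors=k)        # kNN imputation (Section 3.1)
    X_train = imputer.fit_transform(X_train)   # fit on training data only
    X_test = imputer.transform(X_test)         # no leakage from the test set
    scaler = MinMaxScaler()                    # Min-Max normalization, Eq. (1)
    X_train = scaler.fit_transform(X_train)
    X_test = scaler.transform(X_test)
    # SMOTE (e.g., imbalanced-learn's implementation) would be applied to
    # the resulting *training* partition only, never to the test data.
    return X_train, X_test
```

Fitting the imputer and scaler inside the training partition is what prevents the information leakage discussed above; the test split is only transformed.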
3.2 Feature optimization using D-IOSA
Not all medical attributes contribute equally to disease diagnosis; redundant, noisy, or weakly correlated variables can degrade model generalization [48]. Hence, the proposed D-IOSA performed feature optimization using enhanced search dynamics designed to balance exploration and exploitation. The algorithm was inspired by owls' hunting behavior, in which received sound-intensity cues guide target-focused motion, providing a natural balance between broad search and focused refinement.
Each candidate feature subset was represented as a binary vector ${{x}_{i}}\in {{\left\{ 0,1 \right\}}^{n}}$, where a value of 1 indicates inclusion of the jth feature and 0 its exclusion. To promote exploration in the early stages, when risk-factor relationships remain uncertain, the candidate’s orientation was updated according to [34]:
$\theta_i^{t+1}=\theta_i^t+\alpha(r-0.5), \quad r \sim U(0,1)$ (2)
Here, $\theta_i$ modeled a search direction relative to feature interactions while α controlled adaptive angular drift to prevent premature focus on spurious local correlations.
As learning progressed, the algorithm shifted toward exploiting clinically promising feature sets. This was achieved using a time-varying weight:
$\omega \left( t \right)={{\omega }_{max}}-\frac{\left( {{\omega }_{max}}-{{\omega }_{\text{min}}} \right)t}{{{t}_{max}}}$ (3)
Initially, higher ω(t) encouraged broader investigation of potential indicators such as resting ECG and fasting blood sugar; later, lower $\omega \left( t \right)$enabled fine-grained refinement around strong predictors such as chest pain characteristics.
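This linearly decaying schedule, which moves the weight from its upper to its lower bound over the run, can be checked numerically. The bound values ω_max = 0.9 and ω_min = 0.2 below are illustrative choices, not values taken from the paper:

```python
def omega(t, t_max, w_max=0.9, w_min=0.2):
    """Time-varying exploration weight: linear decay from w_max to w_min."""
    return w_max - (w_max - w_min) * t / t_max

# Early iterations explore broadly; late iterations exploit strong predictors.
print(omega(0, 100))    # → 0.9 (full exploration)
print(omega(100, 100))  # ≈ 0.2 (full exploitation)
```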
Candidates were attracted toward the best-performing medical subset discovered so far:
$X_{i}^{t+1}=X_{i}^{t}+\omega \left( t \right)\text{sin}\left( \theta _{i}^{t+1} \right)\left( X_{best}^{t}-X_{i}^{t} \right)$ (4)
The sine term avoided monotonic convergence by periodically altering search orientation, mirroring the clinical diagnostic process where hypotheses were revised as new evidence emerged.
CVD data frequently contain local optima combinations that appear predictive in a particular dataset fold but fail across populations. To escape stagnation, a Gaussian perturbation performed controlled redirection:
$X_i^{t+1}=X_{\text {best }}^t+\beta \varepsilon, \varepsilon \sim N(0,1)$ (5)
This mechanism helped discover subtle risk interactions overlooked by deterministic search.
Finally, to ensure the trade-off between clinical performance and diagnostic burden, the following fitness objective was minimized:
$F\left( x \right)=\lambda \left( 1-Acc\left( S \right) \right)+\left( 1-\lambda \right)\frac{\left| S \right|}{n}$ (6)
where, $Acc\left( S \right)$ denotes the classification accuracy obtained with the selected feature subset S, $\left| S \right|$ is the number of selected features, n is the total number of features, and $\lambda \in \left[ 0,1 \right]$ weights predictive accuracy against subset compactness.
This ensured high predictive quality using fewer clinical examinations, reducing costs and patient discomfort.
Pseudocode: D-IOSA Feature Selection Process
Initialize owl population {Xi} randomly
repeat
Evaluate fitness using Eq. (6)
Update reference best solution X_best
Adapt orientation using Eq. (2)
Adjust exploration using Eq. (3)
Refine candidate subset via Eq. (4)
If stagnant: apply escape mechanism via Eq. (5)
until convergence criteria are met
Return S* = selected optimal clinical subset
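The search loop above can be sketched in NumPy. This is an illustrative reconstruction under stated assumptions, not the authors' implementation: the classifier accuracy $Acc(S)$ in Eq. (6) is replaced by a synthetic per-feature relevance proxy (in practice a wrapped classifier would be evaluated per subset), and population size, iteration budget, and the stagnation threshold are our choices.

```python
import numpy as np

rng = np.random.default_rng(0)

def fitness(mask, scores, lam=0.9):
    """Eq. (6): trade an accuracy proxy against subset size.
    `scores` stands in for per-subset classifier accuracy."""
    if mask.sum() == 0:
        return 1.0                                  # empty subset: worst case
    acc_proxy = scores[mask.astype(bool)].mean()
    return lam * (1.0 - acc_proxy) + (1.0 - lam) * mask.sum() / len(mask)

def d_iosa(scores, pop=12, iters=60, w_max=0.9, w_min=0.2, alpha=0.3, beta=0.1):
    n = len(scores)
    X = rng.random((pop, n))                        # continuous positions in [0, 1]
    theta = rng.random(pop) * 2 * np.pi             # search orientations
    best, best_f, stall = None, np.inf, 0
    for t in range(iters):
        masks = (X > 0.5).astype(int)               # binarize into feature subsets
        fits = np.array([fitness(m, scores) for m in masks])
        i = fits.argmin()
        if fits[i] < best_f - 1e-12:
            best, best_f, stall = X[i].copy(), fits[i], 0
        else:
            stall += 1
        theta += alpha * (rng.random(pop) - 0.5)              # Eq. (2): orientation
        w = w_max - (w_max - w_min) * t / iters               # Eq. (3): decay
        X += w * np.sin(theta)[:, None] * (best - X)          # Eq. (4): attraction
        if stall > 5:                                         # Eq. (5): escape
            X = best + beta * rng.standard_normal(X.shape)
            stall = 0
        X = np.clip(X, 0.0, 1.0)
    return (best > 0.5).astype(int), best_f
```

With a relevance vector in which a few features dominate, the returned mask tends to retain exactly those features while dropping weak ones, mirroring the reduced-examination objective of Eq. (6).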
3.3 CF-A-Net
The features selected by D-IOSA were embedded into a latent space where interactions among clinical attributes became learnable. To preserve both low-level and high-level diagnostic cues, the data flowed through two parallel paths: a Feedforward Dense Block that captured nonlinear feature combinations and a Cross-Fusion Attention Block that modeled cross-feature interactions without requiring sequential structure. Their outputs were merged using a cross-fusion residual block, which mitigated gradient attenuation and retained complementary risk evidence. This compact hybrid design enhanced learning capacity while remaining suitable for medical devices and other resource-constrained settings. Figure 2 shows the overall CF-A-Net architecture.
Figure 2. Conceptual architecture of the proposed CF-A-Net
As shown in Figure 2, the dual-path architecture of CF-A-Net enabled complementary representation learning. The Feedforward Dense Block captured nonlinear associations among static clinical attributes, while the Cross-Fusion Attention Block emphasized feature-to-feature dependencies without relying on temporal ordering. Their outputs, combined through the residual fusion mechanism [49], formed a unified latent representation that integrated both feature-level diversity and cross-feature relevance. This fused embedding was subsequently refined by the Feature-Importance Attention (FIA) layer, which adaptively highlighted the most influential risk factors before final classification.
3.4 FIA
The FIA module quantified the contribution of each clinical attribute to the final prediction, allowing the model to highlight risk factors and suppress irrelevant variables.
For an embedded representation h, learnable matrices WQ, WK, WV were used to derive the query, key, and value vectors as [50]:
$Q=h{{W}_{q}},~~K=h{{W}_{K}},~~V=h{{W}_{V}}$ (7)
To compute pairwise relevance, a similarity measure was used to determine how strongly one clinical factor relates to others:
$A=softmax\left( \frac{Q{{K}^{T}}}{\sqrt{{{d}_{k}}}} \right)$ (8)
Multiplying by values yields refined risk-encoded representations:
$Attn\left( h \right)=AV$ (9)
A residual fusion balanced the original and emphasized information:
$Z=\gamma h+\left( 1-\gamma \right)Attn\left( h \right),~~~~0\le \gamma \le 1$ (10)
Here, γ controlled how strongly the attention mechanism modified diagnostic cues. When γ was close to 1, the network relied more on the original features; lower values allowed the attention module to emphasize influential risk variables more aggressively.
Through this adaptive weighting, FIA highlighted clinically significant patterns—such as abnormal resting blood pressure or elevated cholesterol—while suppressing redundant or noisy attributes. This yielded interpretable feature contributions for every prediction, strengthening confidence in the model’s diagnostic behavior.
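Eqs. (7)-(10) can be traced in a short NumPy forward pass. This is a minimal sketch: the embedding width, the number of attributes, the small-scale weight initialization, and γ = 0.7 are all illustrative assumptions, and in the trained network W_Q, W_K, W_V are learned rather than sampled.

```python
import numpy as np

rng = np.random.default_rng(1)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))   # numerically stable
    return e / e.sum(axis=axis, keepdims=True)

def fia(h, Wq, Wk, Wv, gamma=0.7):
    """Feature-Importance Attention forward pass, Eqs. (7)-(10)."""
    Q, K, V = h @ Wq, h @ Wk, h @ Wv                  # Eq. (7): projections
    A = softmax(Q @ K.T / np.sqrt(K.shape[-1]))       # Eq. (8): pairwise relevance
    attn = A @ V                                      # Eq. (9): risk-encoded features
    Z = gamma * h + (1.0 - gamma) * attn              # Eq. (10): residual fusion
    return Z, A

d = 8                                     # embedding width (illustrative)
h = rng.standard_normal((5, d))           # 5 embedded clinical attributes
Wq, Wk, Wv = (rng.standard_normal((d, d)) * 0.1 for _ in range(3))
Z, A = fia(h, Wq, Wk, Wv)
```

Each row of A is a probability distribution over attributes, so it can be read directly as the per-feature importance profile that FIA exposes for interpretation.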
Pseudocode: CF-A-Net Learning & Inference
Initialize network parameters
repeat
Forward pass through dual paths and FIA
Compute classification loss
Update weights via backpropagation
until validation stabilization
Predict the probability of CVD and derive feature-importance attention outputs.
3.5 Deployment and computational efficiency
The proposed CF-A-Net model was designed with computational efficiency to enable practical deployment in clinical and portable diagnostic systems. Unlike heavy transformer architectures, it employed compact feedforward and attention-based paths with residual shortcuts to minimize parameters and computation. The D-IOSA further reduced input dimensionality by selecting only the most informative clinical features, ensuring faster convergence and lower memory usage during training.
During inference, the model performed a single forward pass that fused outputs from both learning paths through the cross-fusion mechanism and the FIA layer. This modular configuration minimized redundant operations, supported partial parameter reuse, and enabled low-latency prediction, making it suitable for edge-AI healthcare devices. The efficient architecture thus formed the basis for the subsequent deployment analysis presented in Section 4.
To evaluate the effectiveness of the proposed hybrid model, extensive experiments were conducted on benchmark CVD datasets. The experimental analysis aimed to assess the predictive accuracy, computational efficiency, and interpretability of the CF-A-Net framework in comparison with traditional machine learning, deep learning, and hybrid baselines. Both the Cleveland and Framingham Heart Study datasets were utilized to validate generalization across heterogeneous clinical cohorts. Standard performance metrics, including accuracy, precision, recall, F1-score, and ROC-AUC, were employed for quantitative evaluation. In addition, statistical significance testing and feature-importance visualization were performed to substantiate the reliability of the results. All experiments were executed using Python 3.9 and TensorFlow 2.x with the Adam optimizer (learning rate 0.001), a batch size of 32, and early stopping, on an NVIDIA GPU system. The following subsections describe the dataset characteristics, experimental setup, and comparative performance analysis in detail.
4.1 Dataset description and experimental setup
The experiments were conducted using two open-access CVD datasets: the Cleveland Heart Disease and Framingham Heart Study. Both datasets include structured clinical variables such as age, sex, resting blood pressure, cholesterol level, fasting blood sugar, and chest pain type, along with a binary outcome indicating the presence or absence of CVD [51, 52]. Missing entries were imputed and continuous features normalized as described in Section 3.1 to maintain consistency across datasets.
Because both datasets exhibit moderate class imbalance, SMOTE was applied exclusively to the training partition to avoid information leakage. This ensured balanced representation of positive and negative samples during model learning while preserving the original class proportions in validation and testing.
To obtain a reliable and leakage-free estimate of model performance, all experiments were conducted using stratified 5-fold cross-validation instead of a fixed train–validation–test split.
In each fold, preprocessing operations, including imputation, normalization, SMOTE, and feature selection using the D-IOSA, were performed strictly on the training portion of that fold. The selected features were then used to train the CF-A-Net model, ensuring that no information from the test fold influenced training or feature selection.
All experiments were implemented in Python using TensorFlow on a workstation equipped with an NVIDIA RTX-3060 GPU and 16 GB RAM. Baseline algorithms were trained under identical configurations to ensure fair comparison. All performance values reported in this study represent the mean over stratified 5-fold cross-validation, with SMOTE applied only to the training folds to prevent information leakage.
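The leakage-free evaluation protocol described above can be sketched with scikit-learn. This is an illustrative skeleton, not the paper's pipeline: LogisticRegression stands in for CF-A-Net, and the bundled breast-cancer dataset stands in for the Cleveland/Framingham cohorts (which require separate download); SMOTE and D-IOSA are indicated by a comment at the point where they would be fitted per fold.

```python
import numpy as np
from sklearn.datasets import load_breast_cancer       # stand-in tabular dataset
from sklearn.linear_model import LogisticRegression   # stand-in for CF-A-Net
from sklearn.model_selection import StratifiedKFold
from sklearn.preprocessing import MinMaxScaler

X, y = load_breast_cancer(return_X_y=True)
skf = StratifiedKFold(n_splits=5, shuffle=True, random_state=42)
accs = []
for train_idx, test_idx in skf.split(X, y):
    # All preprocessing is fitted on the training fold only (Section 4.1).
    scaler = MinMaxScaler().fit(X[train_idx])
    X_tr, X_te = scaler.transform(X[train_idx]), scaler.transform(X[test_idx])
    # SMOTE oversampling and D-IOSA feature selection would also be fitted
    # on (X_tr, y[train_idx]) here, so the test fold never leaks into training.
    clf = LogisticRegression(max_iter=1000).fit(X_tr, y[train_idx])
    accs.append(clf.score(X_te, y[test_idx]))
mean_acc = float(np.mean(accs))   # reported metrics are means over the 5 folds
```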
4.2 Performance analysis on the Cleveland dataset
The first phase of evaluation was carried out on the Cleveland Heart Disease dataset, which remains a standard benchmark for cardiovascular prediction research. Table 2 presents the comparative performance of conventional machine-learning models, deep-learning architectures, and the proposed D-IOSA–driven CF-A-Net.
Across the reported metrics, CF-A-Net showed improved mean accuracy, F1-score, and ROC-AUC under stratified 5-fold cross-validation compared with the evaluated baseline models, including LR, RF, SVM, CNN, and attention-based variants.
Table 2. Comparative performance of various models on the Cleveland dataset
| Model | Accuracy (%) | Precision | Recall | Specificity | F1-Score | MCC | ROC-AUC |
|---|---|---|---|---|---|---|---|
| LR | 84.2 | 0.83 | 0.82 | 0.85 | 0.82 | 0.67 | 0.89 |
| SVM (RBF) | 86.0 | 0.85 | 0.84 | 0.87 | 0.84 | 0.70 | 0.90 |
| RF | 88.1 | 0.87 | 0.87 | 0.89 | 0.87 | 0.74 | 0.92 |
| LightGBM | 89.3 | 0.88 | 0.88 | 0.90 | 0.88 | 0.76 | 0.93 |
| MLP | 90.1 | 0.89 | 0.90 | 0.91 | 0.89 | 0.78 | 0.94 |
| TabTransformer | 91.2 | 0.90 | 0.91 | 0.92 | 0.90 | 0.80 | 0.95 |
| Proposed D-IOSA + CF-A-Net | 94.8 | 0.94 | 0.95 | 0.94 | 0.94 | 0.87 | 0.98 |
Note: All values represent the mean over stratified 5-fold cross-validation.
These improvements suggest that the combination of D-IOSA feature optimization and cross-feature attention contributes to more effective discrimination on this dataset, although the relatively small sample size warrants cautious interpretation of the results [34].
Figure 3. Confusion matrix for CF-A-Net on the Cleveland dataset
Figure 4. Performance comparison of competing models on the Cleveland dataset (values averaged over stratified 5-fold cross-validation)
The confusion matrix shown in Figure 3 illustrates the classification behavior of CF-A-Net, showing a reduction in false negatives relative to several baseline models. Such improvements are meaningful in clinical contexts, where failing to identify a positive (disease) case may have significant implications, although these observations should be interpreted cautiously, given the dataset size. The confusion matrix represents the results from a representative test fold of the stratified 5-fold cross-validation, with percentages indicating row-wise normalized values.
Furthermore, the metric-wise performance comparison illustrated in Figure 4 provides a visual overview of key indicators, including accuracy, precision, recall, and F1-score. CF-A-Net shows favorable performance trends across these metrics under stratified 5-fold cross-validation, benefiting from dynamic feature optimization through D-IOSA and the Cross-Fusion Attention mechanism that models feature interactions without assuming temporal structure. The values plotted in Figure 4 correspond to the mean performance across stratified 5-fold cross-validation, with standard deviations omitted from the chart for visual clarity.
These results suggest that adaptive feature selection coupled with the Cross-Fusion Attention mechanism helps the model capture clinically relevant patterns under the evaluated conditions, although additional validation on larger cohorts is required to fully confirm generalization performance. The comparative results are summarized in Figure 4.
Table 3. Comparative performance of various models on the Framingham dataset
| Model | Accuracy (%) | Precision | Recall | Specificity | F1-Score | MCC | ROC-AUC |
|---|---|---|---|---|---|---|---|
| Logistic Regression | 83.0 | 0.81 | 0.80 | 0.84 | 0.80 | 0.64 | 0.88 |
| SVM (RBF) | 84.7 | 0.83 | 0.82 | 0.85 | 0.82 | 0.67 | 0.89 |
| Random Forest | 86.9 | 0.85 | 0.86 | 0.88 | 0.85 | 0.71 | 0.91 |
| LightGBM | 88.2 | 0.87 | 0.87 | 0.89 | 0.87 | 0.74 | 0.92 |
| MLP | 89.1 | 0.88 | 0.89 | 0.90 | 0.88 | 0.76 | 0.93 |
| TabTransformer | 90.4 | 0.89 | 0.90 | 0.91 | 0.89 | 0.78 | 0.94 |
| Proposed D-IOSA + CF-A-Net | 93.7 | 0.93 | 0.94 | 0.93 | 0.93 | 0.84 | 0.97 |
Note: All values represent the mean over stratified 5-fold cross-validation.
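For reference, the MCC column follows from the standard definition over binary confusion counts; a minimal sketch with illustrative counts:

```python
# Sketch: Matthews correlation coefficient (MCC), as reported in Table 3,
# computed from binary confusion counts. The counts below are illustrative,
# not taken from the study's folds.
import math

def mcc(tp, tn, fp, fn):
    """MCC = (TP*TN - FP*FN) / sqrt((TP+FP)(TP+FN)(TN+FP)(TN+FN))."""
    den = math.sqrt((tp + fp) * (tp + fn) * (tn + fp) * (tn + fn))
    return (tp * tn - fp * fn) / den if den else 0.0

score = mcc(tp=42, tn=40, fp=5, fn=3)   # illustrative counts
# MCC ranges from -1 to +1 and remains informative under class imbalance,
# which is why it complements accuracy and F1-score in the tables.
```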
4.3 Performance analysis on Framingham dataset
To further validate the generalization ability of the proposed model, experiments were extended to the Framingham Heart Study dataset, which features broader demographic variability and risk factor diversity. As summarized in Table 3, the D-IOSA + CF-A-Net configuration showed improved mean performance across several evaluation metrics under stratified 5-fold cross-validation compared with the conventional machine-learning and transformer-based baselines, reflecting its ability to model cross-feature relationships in a larger clinical cohort.
Figure 5. Confusion matrix of the proposed CF-A-Net on the Framingham dataset
The confusion matrix shown in Figure 5 provides an overview of the classification behavior of CF-A-Net, indicating a balanced distribution of predictions across classes under the evaluated folds, while Figure 6 presents the corresponding metric-wise performance comparison. The confusion matrix in Figure 5 corresponds to a representative test fold from the stratified 5-fold cross-validation, whereas the values plotted in Figure 6 represent the mean performance across all five folds, with standard deviations omitted for visual clarity. The observed gains in accuracy, recall, and ROC-AUC across stratified 5-fold cross-validation suggest that the proposed approach can adapt to variations in population and feature distributions; however, these trends should be interpreted cautiously and validated on additional external datasets to more conclusively assess generalization performance.
Figure 6. Performance comparison of models on the Framingham dataset
4.4 Comparative performance and deployment analysis
To evaluate the real-world feasibility of the proposed model, a comparative deployment analysis was conducted against representative baseline architectures. The assessment considered key efficiency indicators, including the number of parameters, model size, and inference latency, which play a crucial role in determining suitability for clinical and embedded healthcare environments. The results of this analysis are summarized in Table 4.
Table 4. Deployment metrics comparison of baseline and proposed models

| Model | Parameters (M) | Model Size (MB) | Inference Latency (ms/sample) |
|---|---|---|---|
| LightGBM | 0.45 | 2.1 | 3.2 |
| MLP | 1.2 | 5.4 | 6.8 |
| TabTransformer | 3.5 | 14.8 | 12.4 |
| Tiny-MLP | 0.25 | 1.0 | 2.7 |
| Proposed D-IOSA + CF-A-Net | 1.8 | 6.7 | 5.1 |

Note: All values represent the mean over stratified 5-fold cross-validation.
Although the proposed CF-A-Net contains more parameters than lightweight models such as LightGBM or Tiny-MLP, it maintains a compact footprint of approximately 6.7 MB and an inference latency of about 5 ms per sample on the evaluation hardware. Parameter counts and model sizes were obtained from the serialized models using the framework's native functions, and inference latency was measured on an NVIDIA RTX-3060 GPU with a batch size of 1 after warm-up runs. This balance between representational capacity and computational cost indicates that the framework may be suitable for mid-range or edge-AI healthcare systems, particularly when moderate on-device processing capability is available. The compactness of the model results from the D-IOSA-driven feature reduction, which removes redundant and weakly correlated attributes before training, while the Cross-Fusion Attention mechanism within CF-A-Net promotes efficient reuse of learned representations across branches. Together, these components control computational overhead while maintaining competitive predictive performance.
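The latency protocol described above (warm-up runs, then timed single-sample inferences) can be sketched as follows; `toy_model` is a stand-in callable, not the actual CF-A-Net implementation.

```python
# Sketch of the latency protocol described above: warm-up runs followed by
# timed batch-size-1 inferences. `toy_model` is a stand-in callable, not
# the trained network; real measurements used an NVIDIA RTX-3060 GPU.
import time

def measure_latency_ms(model, sample, warmup=10, runs=200):
    """Mean per-sample inference latency in milliseconds, after warm-up."""
    for _ in range(warmup):              # warm-up: discard one-time costs
        model(sample)
    start = time.perf_counter()
    for _ in range(runs):
        model(sample)
    return (time.perf_counter() - start) / runs * 1e3

toy_model = lambda x: sum(x)             # placeholder inference function
latency_ms = measure_latency_ms(toy_model, [0.1] * 13)
```

Note that for a GPU-resident model, the timed loop would additionally need a device synchronization call before reading the clock, since GPU kernels launch asynchronously.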
Overall, the proposed hybrid architecture provides a practical balance among accuracy, interpretability, and computational efficiency. The deployment analysis suggests that the model is responsive and computationally manageable for real-time or near real-time CVD prediction, while acknowledging that additional optimization and validation on diverse hardware platforms would further strengthen claims of practicality for deployment in resource-constrained settings. These results highlight the potential applicability of the model in next-generation smart healthcare systems and form a natural bridge to the study’s concluding insights.
4.5 Ablation study
To assess the individual contribution of the Cross-Fusion Attention mechanism within CF-A-Net, an ablation experiment was performed using three architectural variants: (i) a baseline model containing only the Dense Block, (ii) a hybrid model combining the Dense Block with a GRU component, and (iii) the proposed CF-A-Net integrating the Dense Block with the Cross-Fusion Attention module. All variants were evaluated under identical training settings using stratified 5-fold cross-validation on the Cleveland dataset to ensure a fair comparison. The objective was to determine whether the attention-driven feature interaction module offers measurable advantages over both the GRU-based design and the simpler dense-only configuration.
As summarized in Table 5, the Dense Block baseline demonstrated reasonable performance but could not capture cross-feature dependencies. The GRU-based variant showed only marginal improvement, highlighting its limited suitability for static tabular clinical data in which no sequential structure exists. In contrast, CF-A-Net achieved the highest accuracy, F1-score, and ROC-AUC by modeling inter-feature relationships through the Cross-Fusion Attention mechanism, confirming the advantage of the attention-based design for tabular cardiovascular prediction tasks.
Table 5. Ablation results on the Cleveland dataset

| Model Variant | Accuracy (%) | F1-Score | ROC-AUC |
|---|---|---|---|
| Dense Block Only | 89.7 | 0.88 | 0.93 |
| Dense + GRU Block | 90.5 | 0.89 | 0.94 |
| CF-A-Net (Proposed) | 94.8 | 0.94 | 0.98 |

Note: All values represent the mean over stratified 5-fold cross-validation.
5. Conclusion

This study proposed an interpretable hybrid framework for CVD prediction that integrates D-IOSA with CF-A-Net. The D-IOSA component optimized feature subsets by balancing exploration and exploitation, thereby retaining clinically relevant attributes. The CF-A-Net classifier employs an attention-driven cross-feature interaction mechanism that is well suited to tabular clinical data and avoids the temporal assumptions associated with recurrent units. Experimental evaluations on the Cleveland and Framingham datasets, together with the ablation study, demonstrated that the proposed framework outperformed conventional machine-learning, deep-learning, and transformer-based models across major evaluation metrics. Deployment analysis further indicated that the model maintains a compact footprint and responsive, near real-time inference capability, making it suitable for edge-AI healthcare systems and portable diagnostic devices. Overall, the proposed design balanced diagnostic precision, computational efficiency, and interpretability, with the FIA layer providing feature-level interpretability that supports transparent clinical decision-making. Future work will extend the evaluation to larger multi-institutional datasets and incorporate additional data modalities to further strengthen predictive generalization and reliability in practical healthcare environments.
References

[1] Netala, V.R., Teertam, S.K., Li, H., Zhang, Z. (2024). A comprehensive review of cardiovascular disease management: Cardiac biomarkers, imaging modalities, pharmacotherapy, surgical interventions, and herbal remedies. Cells, 13(17): 1471. https://doi.org/10.3390/cells13171471
[2] Khan, M.R., Haider, Z.M., Hussain, J., Malik, F.H., Talib, I., Abdullah, S. (2024). Comprehensive analysis of cardiovascular diseases: Symptoms, diagnosis, and AI innovations. Bioengineering, 11(12): 1239. https://doi.org/10.3390/bioengineering11121239
[3] Wan, S., Wan, F., Dai, X. (2025). Machine learning approaches for cardiovascular disease prediction: A review. Archives of Cardiovascular Diseases, 118(10): 554-562. https://doi.org/10.1016/j.acvd.2025.04.055
[4] Khera, R., Oikonomou, E.K., Nadkarni, G.N., Morley, J.R., Wiens, J., Butte, A.J., Topol, E.J. (2024). Transforming cardiovascular care with artificial intelligence: From discovery to practice. Journal of the American College of Cardiology, 84(1): 97-114. https://doi.org/10.1016/j.jacc.2024.05.003
[5] Naser, M.A., Majeed, A.A., Alsabah, M., Al-Shaikhli, T.R., Kaky, K.M. (2024). A review of machine learning’s role in cardiovascular disease prediction: Recent advances and future challenges. Algorithms, 17(2): 78. https://doi.org/10.3390/a17020078
[6] Cai, Y.Q., Gong, D.X., Tang, L.Y., Cai, Y., et al. (2024). Pitfalls in developing machine learning models for predicting cardiovascular diseases: Challenge and solutions. Journal of Medical Internet Research, 26: e47645. https://doi.org/10.2196/47645
[7] Sunilkumar, G., Kumaresan, P. (2024). Deep learning and transfer learning in cardiology: A review of cardiovascular disease prediction models. IEEE Access, 12: 193365-193386. https://doi.org/10.1109/ACCESS.2024.3514093
[8] Atwa, A.E.M., Atlam, E.S., Ahmed, A., Atwa, M.A., Abdelrahim, E.M., Siam, A.I. (2025). Interpretable deep learning models for arrhythmia classification based on ECG signals using PTB-X dataset. Diagnostics, 15(15): 1950. https://doi.org/10.3390/diagnostics15151950
[9] Sowmiya, M., Banu Rekha, B., Malar, E. (2025). Optimized heart disease prediction model using a meta-heuristic feature selection with improved binary salp swarm algorithm and stacking classifier. Computers in Biology and Medicine, 191: 110171. https://doi.org/10.1016/j.compbiomed.2025.110171
[10] Han, Y., Huang, L., Zhou, F. (2021). Zoo: Selecting transcriptomic and methylomic biomarkers by ensembling animal-inspired swarm intelligence feature selection algorithms. Genes, 12(11): 1814. https://doi.org/10.3390/genes12111814
[11] Moslehi, F., Haeri, A. (2020). A novel hybrid wrapper–filter approach based on genetic algorithm, particle swarm optimization for feature subset selection. Journal of Ambient Intelligence and Humanized Computing, 11(3): 1105-1127. https://doi.org/10.1007/s12652-019-01364-5
[12] Sianga, B.E., Mbago, M.C., Msengwa, A.S. (2025). Predicting the prevalence of cardiovascular diseases using machine learning algorithms. Intelligence-Based Medicine, 11: 100199. https://doi.org/10.1016/j.ibmed.2025.100199
[13] Saha, D., Guha, S., Kundu, K., Das, S., et al. (2025). Heart disease prediction using logistic regression. In Proceedings of the 2025 International Conference on Computer, Electrical & Communication Engineering (ICCECE), Kolkata, India, pp. 1-6. https://doi.org/10.1109/ICCECE61355.2025.10940073
[14] Ahmed, H., Younis, E.M.G., Hendawi, A., Ali, A.A. (2020). Heart disease identification from patients’ social posts, machine learning solution on Spark. Future Generation Computer Systems, 111: 714-722. https://doi.org/10.1016/j.future.2019.09.056
[15] Elsedimy, E.I., AboHashish, S.M.M., Algarni, F. (2023). New cardiovascular disease prediction approach using support vector machine and quantum-behaved particle swarm optimization. Multimedia Tools and Applications, 83(8): 23901-23928. https://doi.org/10.1007/s11042-023-16194-z
[16] Pal, M., Parija, S. (2021). Prediction of heart diseases using random forest. Journal of Physics: Conference Series, 1817(1): 012009. https://doi.org/10.1088/1742-6596/1817/1/012009
[17] Yang, L., Wu, H., Jin, X., Zheng, P., et al. (2020). Study of cardiovascular disease prediction model based on random forest in eastern China. Scientific Reports, 10(1): 5245. https://doi.org/10.1038/s41598-020-62133-5
[18] Sai Krishna Reddy, V., Meghana, P., Subba Reddy, N.V., Ashwath Rao, B. (2022). Prediction on cardiovascular disease using decision tree and naïve Bayes classifiers. Journal of Physics: Conference Series, 2161(1): 012015. https://doi.org/10.1088/1742-6596/2161/1/012015
[19] Ozcan, M., Peker, S. (2023). A classification and regression tree algorithm for heart disease modeling and prediction. Healthcare Analytics, 3: 100130. https://doi.org/10.1016/j.health.2022.100130
[20] Latha, C.B.C., Jeeva, S.C. (2019). Improving the accuracy of prediction of heart disease risk based on ensemble classification techniques. Informatics in Medicine Unlocked, 16: 100203. https://doi.org/10.1016/j.imu.2019.100203
[21] Al-Alshaikh, H.A., P, P., Poonia, R.C., Saudagar, A.K.J., Yadav, M., AlSagri, H.S., AlSanad, A.A. (2024). Comprehensive evaluation and performance analysis of machine learning in heart disease prediction. Scientific Reports, 14(1): 7819. https://doi.org/10.1038/s41598-024-58489-7
[22] Theerthagiri, P. (2022). Predictive analysis of cardiovascular disease using gradient boosting based learning and recursive feature elimination technique. Intelligent Systems with Applications, 16: 200121. https://doi.org/10.1016/j.iswa.2022.200121
[23] Doki, S., Devella, S., Tallam, S., Reddy Gangannagari, S.S., Sampathkrishna Reddy, P., Reddy, G.P. (2022). Heart disease prediction using XGBoost. In Proceedings of the 2022 Third International Conference on Intelligent Computing Instrumentation and Control Technologies (ICICICT), Kannur, India, pp. 1317-1320. https://doi.org/10.1109/ICICICT54557.2022.9917678
[24] Jain, A., Singh, A., Doherey, A. (2025). Prediction of cardiovascular disease using XGBoost with OPTUNA. SN Computer Science, 6(5): 421. https://doi.org/10.1007/s42979-025-03954-x
[25] Ganie, S.M., Pramanik, P.K.D., Zhao, Z. (2025). Ensemble learning with explainable AI for improved heart disease prediction based on multiple datasets. Scientific Reports, 15(1): 13912. https://doi.org/10.1038/s41598-025-97547-6
[26] Almazroi, A.A., Aldhahri, E.A., Bashir, S., Ashfaq, S. (2023). A clinical decision support system for heart disease prediction using deep learning. IEEE Access, 11: 61646-61659. https://doi.org/10.1109/ACCESS.2023.3285247
[27] Alkhodari, M., Fraiwan, L. (2021). Convolutional and recurrent neural networks for the detection of valvular heart diseases in phonocardiogram recordings. Computer Methods and Programs in Biomedicine, 200: 105940. https://doi.org/10.1016/j.cmpb.2021.105940
[28] Bollapalli, A., Challa, N.P. (2024). Forecasting the risk of heart disease using recurrent neural network. In Proceedings of the 2024 International Conference on Electronics, Computing, Communication and Control Technology (ICECCC), Bengaluru, India, pp. 1-6. https://doi.org/10.1109/ICECCC61767.2024.10593818
[29] Li, J., Pang, S., Xu, F., Ji, P., Zhou, S., Shu, M. (2022). Two-dimensional ECG-based cardiac arrhythmia classification using DSE-ResNet. Scientific Reports, 12(1): 14485. https://doi.org/10.1038/s41598-022-18664-0
[30] Petmezas, G., Haris, K., Stefanopoulos, L., Kilintzis, V., Tzavelis, A., Rogers, J.A., Katsaggelos, A.K., Maglaveras, N. (2021). Automated atrial fibrillation detection using a hybrid CNN-LSTM network on imbalanced ECG datasets. Biomedical Signal Processing and Control, 63: 102194. https://doi.org/10.1016/j.bspc.2020.102194
[31] Ullah, A., Anwar, S.M., Bilal, M., Mehmood, R.M. (2020). Classification of arrhythmia by using deep learning with 2-D ECG spectral image representation. Remote Sensing, 12(10): 1685. https://doi.org/10.3390/rs12101685
[32] Hidayat, E.Y., Astuti, Y.P., Dewi, I.N., Salam, A., et al. (2024). Genetic algorithm-based convolutional neural network feature engineering for optimizing coronary heart disease prediction performance. Healthcare Informatics Research, 30(3): 234-243. https://doi.org/10.4258/hir.2024.30.3.234
[33] Kumar, L.K., Suma, K.G., Udayaraju, P., Gundu, V., Mantena, S.V., Jagadesh, B.N. (2025). Clustering-based binary Grey Wolf Optimisation model with 6LDCNNet for prediction of heart disease using patient data. Scientific Reports, 15(1): 1270. https://doi.org/10.1038/s41598-025-85561-7
[34] Bhadane, D.Y., Borse, I.S. (2025). An intelligent ensemble deep learning techniques with improved owl search algorithm-aided optimal feature selection for predicting the presence of heart diseases. International Journal of Image and Graphics, 2750021. https://doi.org/10.1142/S0219467827500215
[35] Jain, A., Chandra Sekhara Rao, A., Jain, P.K., Hu, Y.C. (2023). Optimized levy flight model for heart disease prediction using CNN framework in big data application. Expert Systems with Applications, 223: 119859. https://doi.org/10.1016/j.eswa.2023.119859
[36] Jiang, F., Zhang, Y., Zhang, Y., Liu, X., Chen, C. (2019). An adaptive particle swarm optimization algorithm based on guiding strategy and its application in reactive power optimization. Energies, 12(9): 1690. https://doi.org/10.3390/en12091690
[37] Hou, Y., Gao, H., Wang, Z., Du, C. (2022). Improved Grey Wolf Optimization algorithm and application. Sensors, 22(10): 3810. https://doi.org/10.3390/s22103810
[38] Jain, M., Maurya, S., Rani, A., Singh, V. (2018). Owl search algorithm: A novel nature-inspired heuristic paradigm for global optimization. Journal of Intelligent & Fuzzy Systems, 34(3): 1573-1582. https://doi.org/10.3233/JIFS-169452
[39] Alabdulkreem, E., Saeed, M.K., Alotaibi, S.S., Allafi, R., Mohamed, A., Hamza, M.A. (2023). Bone cancer detection and classification using Owl Search Algorithm with deep learning on X-ray images. IEEE Access, 11: 109095-109103. https://doi.org/10.1109/ACCESS.2023.3319293
[40] Badaro, G., Saeed, M., Papotti, P. (2023). Transformers for tabular data representation: A survey of models and applications. Transactions of the Association for Computational Linguistics, 11: 227-249. https://doi.org/10.1162/tacl_a_00544
[41] Nerella, S., Bandyopadhyay, S., Zhang, J., Contreras, M., et al. (2024). Transformers and large language models in healthcare: A review. Artificial Intelligence in Medicine, 154: 102900. https://doi.org/10.1016/j.artmed.2024.102900
[42] Kang, H.Y.J., Ko, M., Ryu, K.S. (2025). Tabular transformer generative adversarial network for heterogeneous distribution in healthcare. Scientific Reports, 15(1): 10254. https://doi.org/10.1038/s41598-025-93077-3
[43] Algül, E., Oyucu, S., Polat, O., Çelik, H., et al. (2025). A comparative study of advanced transformer learning frameworks for water potability analysis using physicochemical parameters. Applied Sciences, 15(13): 7262. https://doi.org/10.3390/app15137262
[44] Borisov, V., Leemann, T., Seßler, K., Haug, J., Pawelczyk, M., Kasneci, G. (2024). Deep neural networks and tabular data: A survey. IEEE Transactions on Neural Networks and Learning Systems, 35(6): 7499-7519. https://doi.org/10.1109/TNNLS.2022.3229161
[45] Chen, P., Dong, W., Wang, J., Lu, X., Kaymak, U., Huang, Z. (2020). Interpretable clinical prediction via attention-based neural network. BMC Medical Informatics and Decision Making, 20(S3): 131. https://doi.org/10.1186/s12911-020-1110-7
[46] Fridgeirsson, E.A., Sontag, D., Rijnbeek, P. (2023). Attention-based neural networks for clinical prediction modelling on electronic health records. BMC Medical Research Methodology, 23(1): 285. https://doi.org/10.1186/s12874-023-02112-2
[47] Fan, Y., Waldmann, P. (2024). Tabular deep learning: A comparative study applied to multi-task genome-wide prediction. BMC Bioinformatics, 25(1): 322. https://doi.org/10.1186/s12859-024-05940-1
[48] Mansouri, N., Khayati, G.R., Hasani Zade, B.M., Khorasani, S.M.J., Hernashki, R.K. (2022). A new feature extraction technique based on improved owl search algorithm: A case study in copper electrorefining plant. Neural Computing and Applications, 34(10): 7749-7814. https://doi.org/10.1007/s00521-021-06881-z
[49] Chang, Y., Zheng, Z., Sun, Y., Zhao, M., Lu, Y., Zhang, Y. (2023). DPAFNet: A residual dual-path attention-fusion convolutional neural network for multimodal brain tumor segmentation. Biomedical Signal Processing and Control, 79: 104037. https://doi.org/10.1016/j.bspc.2022.104037
[50] Yuan, X., Liu, S., Feng, W., Dauphin, G. (2023). Feature importance ranking of random forest-based end-to-end learning algorithm. Remote Sensing, 15(21): 5203. https://doi.org/10.3390/rs15215203
[51] Yuda, E., Kaneko, I., Hirahara, D. (2025). Machine-learning insights from the Framingham Heart Study: Enhancing cardiovascular risk prediction and monitoring. Applied Sciences, 15(15): 8671. https://doi.org/10.3390/app15158671
[52] Rehman, M.U., Naseem, S., Butt, A.U.R., Mahmood, T., et al. (2025). Predicting coronary heart disease with advanced machine learning classifiers for improved cardiovascular risk assessment. Scientific Reports, 15(1): 13361. https://doi.org/10.1038/s41598-025-96437-1