Developing an Interpretable EEG-Based Model for ADHD Diagnosis in Children Using Temporal, Spectral, and Wavelet Features

Sarah Talal Mohammed Taher* Mohammed Sabah Jarjees Muhammad Abul Hasan

Department of Medical Instrumentation Techniques Engineering, Technical Engineering College of Mosul, Northern Technical University, Mosul 41001, Iraq

Department of Biomedical Engineering, NED University of Engineering and Technology, Karachi 74200, Pakistan

Corresponding Author Email: sarah.tm@ntu.edu.iq

Page: 2515-2524 | DOI: https://doi.org/10.18280/jesa.581206

Received: 11 October 2025 | Revised: 26 November 2025 | Accepted: 3 December 2025 | Available online: 31 December 2025

© 2025 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

OPEN ACCESS

Abstract: 

Attention-deficit/hyperactivity disorder (ADHD) poses a clinical challenge because its signs and symptoms overlap with those of other conditions, making accurate and objective diagnosis difficult using behavioral tests alone. This study proposes a multi-view EEG signal classification model for ADHD and interprets its clinical implications. Three kinds of features were extracted: time-domain statistical features, time-frequency features from a four-level wavelet decomposition, and frequency-domain power spectral density (PSD) features. A Random Forest classifier was trained on these features. Additionally, to support anatomical interpretation, the most important channel-related features were projected onto a topographic map. The Random Forest model achieved 95.7% test accuracy, surpassing the other machine learning models tested in this study: the K-nearest Neighbor algorithm (95.1% test accuracy), the Decision Tree model (82%), the Logistic Regression algorithm (69%), and the Support Vector Machine algorithm, which recorded the lowest accuracy at 55.9%. These results were further confirmed through cross-validation, which showed consistent performance and low variability for the Random Forest model. This comparison demonstrated the Random Forest model's ability to handle nonlinear, time-varying data and its generalization capability. The results indicate the potential of the proposed approach as a robust and clinically applicable tool for ADHD detection, laying the groundwork for future investigations involving larger datasets and advanced methodologies.

Keywords: 

machine learning, wavelet, ADHD, PSD, time domain features

1. Introduction

Attention-deficit/hyperactivity disorder (ADHD) is one of the most common neurodevelopmental disorders and remains difficult to diagnose and treat. Children with this disorder are often very active yet easily distracted, whether by what they are currently doing or by what they are thinking about next. As a result, school becomes more difficult for them, and they often fall behind their peers in educational attainment and even life satisfaction. National and international studies have shown that about 6% of children have been diagnosed with ADHD. However, despite its prevalence, accurate diagnosis remains challenging [1]. This is because the symptoms that constitute the basic diagnostic criteria, together with other accompanying features, closely resemble those of behavioural disorders and of anxiety disorders in children. At the same time, traditional diagnosis relies heavily on subjective assessments by parents or teachers who observe the child over a period of time, which makes it difficult to rule out other disorders that may present similarly [2]. Moreover, traditional diagnostic practices, which rely on interviews and questionnaires, often lack objectivity and consistency among clinicians, highlighting the urgent need for new diagnostic tools based on biological markers and data-driven quantitative methods [3].

Due to its high temporal resolution and ability to accurately capture neural oscillations, electroencephalography (EEG) has recently become a key tool for detecting neurological markers associated with ADHD [4]. The use of machine learning algorithms applied to ERP data has enabled the extraction of characteristic patterns that distinguish children with ADHD from healthy controls [5]. Support vector machines (SVMs) and other machine learning algorithms showed some success in early studies in distinguishing the two groups according to their spectral characteristics and power spectral density (PSD) features [6, 7]. However, many of these studies were limited in the diversity of features used and lacked clear neural interpretation, making them difficult to use clinically.

In the last few years, deep learning models, particularly convolutional neural networks (CNNs) and CNN-LSTM hybrid models, have achieved very high classification accuracies (ranging from 97% to 99%) in detecting ADHD from EEG signals. For example, in 2023, Alkahtani et al. [8] combined temporal, spectral, and entropy information with different classification techniques; their CNN-based model achieved an accuracy of 97.75%. In 2024, Chugh et al. [9] adopted a hybrid model combining a CNN and an LSTM, which captured both the spatial and temporal patterns of brain signals and reached an accuracy of 98.86%. In 2025, Hu et al. [10] proposed SCANet, a model built on a selective channel attention mechanism. It reached an accuracy of 99.78%, and the study explained how important this mechanism is for diagnosis. Despite the excellent performance of these models, they are often treated as “black box” models whose outputs are difficult to interpret clinically or to link to underlying neural mechanisms.

Although previous studies have achieved high accuracy in classifying ADHD using EEG signals, most relied on limited feature sets, produced results that were difficult to interpret, or failed to link results to underlying neural mechanisms, highlighting the need for models that combine high accuracy with clinical interpretability. Such interpretability is crucial to link classification outcomes to underlying neural mechanisms, facilitating clinical understanding and practical decision-making.

Based on these challenges, the goal of this work is to develop an interpretable model for diagnosing ADHD from EEG signals by combining temporal, spectral (PSD), and time-frequency (wavelet analysis) characteristics to improve both accuracy and interpretability. Unlike deep models, which are difficult to interpret, the Random Forest model used in this study allows us to determine the importance of each feature and to show, through topographic maps, the brain regions that contribute most to discrimination. By analyzing how different features contribute to the discrimination between children with ADHD and healthy controls, this study seeks to bridge the gap between computational classification and neurophysiological understanding and to provide a clear and clinically meaningful diagnostic framework.

2. Methodology

In this section, we present the approach followed in this study to classify ADHD using EEG signals. First, an overview of the dataset is provided, followed by a description of the preprocessing steps applied to the EEG signals. Next, the extraction of features and the preparation of data for classification are explained. Finally, the model training and evaluation procedures are outlined. Figure 1 illustrates the sequence from data input to the analysis of results.

Figure 1. Block diagram of the workflow

2.1 Dataset

The EEG signals used in the proposed model were obtained from IEEE DataPort (EEG data for ADHD / Control children) [11]. The dataset included 121 children: 61 with ADHD (mean age 9.62 ± 1.75 years) and 60 healthy controls (mean age 9.85 ± 1.77 years). It included boys and girls aged 7 to 12 years. The children in the ADHD group were diagnosed by an experienced psychiatrist according to DSM-IV criteria and had taken Ritalin for six months, while the children in the control group had no history of psychiatric disorders, epilepsy, or any report of high-risk behaviour. The EEG signals were recorded according to the 10-20 system using 19 channels (Fz, Cz, Pz, C3, T3, C4, T4, Fp1, Fp2, F3, F4, F7, F8, P3, P4, T5, T6, O1, O2) at a sampling rate of 128 Hz. The reference electrodes (A1, A2) were placed on the earlobes. The recording was performed during a visual attention task in which the children were shown pictures of cartoon characters and asked to count them. The pictures were large enough for the children to see them clearly and count the characters easily. The duration of each recording depended on the speed of the child's responses [11].

2.2 Pre-processing data

The raw EEG signals were pre-processed using the EEGLAB toolbox for MATLAB [12]. Channel locations were added to the data, and the signals were re-referenced to the average. An FIR filter was applied to limit the signal bandwidth to 0.5–40 Hz. Independent component analysis (ICA) in EEGLAB was then run, and artifact components were removed manually based on the proportions reported by the program for each component type (e.g., brain, eye, or other artifacts). Components with less than 50% brain signal contribution were excluded, as these contained a high proportion of artifacts. After that, the cleaned data for each child were saved in .set format. To obtain more samples for analysis and to extract features more accurately, the cleaned data for each participant were segmented into fixed-length time windows of 150 milliseconds. The number of windows for each recording depended on its duration, and the total number of windows was roughly comparable across the two groups (1864 windows for ADHD and 1466 windows for controls).
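The windowing step can be illustrated with a minimal Python sketch. It assumes non-overlapping windows, a window length truncated to a whole number of samples, and that trailing samples that do not fill a window are dropped; none of these details are stated in the text, and the function name `segment` is hypothetical.

```python
def segment(signal, fs=128, window_ms=150):
    """Split a 1-D signal into non-overlapping fixed-length windows.

    Assumption: the window length is truncated to whole samples
    (128 Hz * 0.150 s = 19.2, so 19 samples per window).
    """
    win = int(fs * window_ms / 1000)
    n_windows = len(signal) // win  # leftover samples at the end are dropped
    return [signal[i * win:(i + 1) * win] for i in range(n_windows)]


# Two seconds of dummy single-channel data sampled at 128 Hz.
recording = [float(i) for i in range(128 * 2)]
windows = segment(recording)
print(len(windows), len(windows[0]))
```

Because the number of windows is proportional to the recording length, longer recordings contribute more samples, which is consistent with the unequal window counts reported for the two groups.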

2.3 Extracted features

Feature extraction is a critical step: it captures the properties present in the data and transforms them into numerical representations that are useful for analysis and classification. This step comes after pre-processing, artifact removal, and improvement of EEG signal quality. In this work, three sets of features were extracted, reflecting the time-domain statistical, spectral, and time-frequency aspects of brain activity.

2.3.1 Spectral feature Power Spectral Density (PSD)

The power spectral density (PSD) was calculated using the fast Fourier transform (FFT) to examine how energy is distributed across four EEG bands: delta (0.5–4 Hz), theta (4–8 Hz), alpha (8–13 Hz), and beta (13–30 Hz). We chose PSD as a key feature because it clearly reflects differences in cortical activation between ADHD and control participants, as shown in the topographic maps in Figure 2.

In the delta band, strong activity appears over the frontal regions in both groups. ADHD participants show slightly higher delta power, which may reflect slower cortical activity and lower arousal.

Theta activity is clearly elevated in ADHD individuals, mainly in frontal and central areas. This pattern is commonly observed in ADHD and suggests reduced alertness and difficulties in attention control. Alpha power is lower in ADHD participants, especially over parietal regions. This indicates reduced cortical inhibition and altered resting-state rhythms. Controls, in contrast, display a more balanced alpha distribution. Beta activity is reduced in ADHD participants at frontal and central electrodes, while controls show stronger beta power. This difference aligns with previous studies linking ADHD to lower cortical activation and challenges in executive functioning. Gamma band activity (> 30 Hz) was not analyzed because it is mainly informative in conditions involving seizures or strong cognitive stimulation. ADHD is a non-epileptic disorder, and resting-state EEG in these participants does not show meaningful gamma-band differences. Including gamma could have introduced noise and unreliable results without providing useful information for distinguishing ADHD from controls.

Figure 2. The power distribution in four bands for children with ADHD and healthy children

Overall, these findings highlight the classic theta/beta imbalance in ADHD. Increased theta and decreased beta activity serve as consistent neurophysiological markers of the disorder, supporting the use of PSD as a reliable feature for distinguishing ADHD from control participants.

Eqs. (1) and (2) illustrate the mathematical method of calculation [13].

$PSD(f)=\frac{|FFT(x)|^2}{N}$              (1)

where x is the signal segment, N is the number of samples in the segment, FFT(x) is the fast Fourier transform of the signal segment, and f is frequency.

Average power in a frequency band [f1, f2]:

$P_{\text{Band}}=\frac{1}{\left|\left\{f: f \in\left[f_1, f_2\right]\right\}\right|} \sum_{f=f_1}^{f_2} PSD(f)$               (2)

where $f_1$ and $f_2$ are the lower and upper frequency bounds, the denominator is the number of frequency bins that fall inside the band, and $P_{\text{Band}}$ is the average power in the frequency band.

PSD is computed using FFT, then the PSD values over the desired frequency range are used to get the band power.
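A minimal Python sketch of Eqs. (1) and (2) is given below. It uses a direct discrete Fourier transform from the standard library instead of an optimized FFT, and the function names `psd` and `band_power` are hypothetical. The example uses a 1 s test segment so that frequency bins are 1 Hz apart; shorter windows give coarser bins. The closed intervals of Eq. (2) are applied literally, so a bin on a band edge (for example 4 Hz) is counted in both neighbouring bands.

```python
import cmath
import math


def psd(x):
    """Eq. (1): |DFT(x)[k]|^2 / N for the non-negative frequency bins."""
    n = len(x)
    out = []
    for k in range(n // 2 + 1):
        coeff = sum(x[t] * cmath.exp(-2j * cmath.pi * k * t / n) for t in range(n))
        out.append(abs(coeff) ** 2 / n)
    return out


def band_power(x, fs, f_lo, f_hi):
    """Eq. (2): average PSD over the bins whose frequency lies in [f_lo, f_hi]."""
    n = len(x)
    p = psd(x)
    in_band = [p[k] for k in range(len(p)) if f_lo <= k * fs / n <= f_hi]
    return sum(in_band) / len(in_band) if in_band else 0.0


fs = 128
# Test signal (illustrative only): a 10 Hz sine lasting one second.
x = [math.sin(2 * math.pi * 10 * t / fs) for t in range(fs)]
bands = {"delta": (0.5, 4), "theta": (4, 8), "alpha": (8, 13), "beta": (13, 30)}
powers = {name: band_power(x, fs, lo, hi) for name, (lo, hi) in bands.items()}
print(max(powers, key=powers.get))
```

For this test signal, essentially all of the power falls in the alpha band, as expected for a 10 Hz oscillation.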

2.3.2 Time–frequency features (wavelet decomposition)

A multilevel discrete wavelet transform is applied to each EEG channel, decomposing the signal into four levels. The first level contains the highest frequencies and the finest time details. The second level contains lower frequencies than the first, with a slightly lower time resolution, covering broader details. The third level contains still lower frequencies and a lower time resolution, covering slow patterns. The fourth level contains the lowest frequencies and the lowest time resolution, representing the long, slow fluctuations in the signal. The extracted features combine information from the time and frequency domains, as well as from the detail coefficients. At each level, four indicators are calculated: power, mean, variance, and Shannon entropy. Power reflects the extent of activity in a given time-frequency range. The mean represents the central value of the coefficients at that level. Variance measures the spread of the coefficients around the mean and thus the degree of irregularity in the signal. Finally, Shannon entropy measures the degree of randomness or complexity of the signal in that range, with higher values indicating a more complex and irregular signal. Eqs. (3), (4), (5), and (6) give the formulas used [14].

$Power=\sum_{j=1}^L\left(c_j\right)^2$                (3)

where, $\mathrm{c}_{\mathrm{j}}$ is the wavelet coefficient at the given decomposition level, L is the total number of coefficients at that level.

$Mean=\frac{\sum_{j=1}^L c_j}{L}$               (4)

where, $\mathrm{c}_{\mathrm{j}}$ is the wavelet coefficient at the given decomposition level, and L is the total number of coefficients at that level.

$Variance=\frac{\sum_{j=1}^L\left(c_j-\bar{c}\right)^2}{L}$               (5)

where, $\mathrm{c}_{\mathrm{j}}$ is the wavelet coefficient at the given decomposition level, $\overline{\mathrm{c}}$ is the mean of the coefficients at that level, and L is the total number of coefficients at that level.

$Entropy=-\sum_{j=1}^L p_j \log _2\left(p_j\right) ; p_j=\frac{\left|c_j\right|^2}{\sum_{k=1}^L\left|c_k\right|^2}$                (6)

where, $\mathrm{c}_{\mathrm{j}}$ is the wavelet coefficient at the given decomposition level, $\mathrm{p}_{\mathrm{j}}$ is the normalized power of the coefficient $c_j$, and L is the total number of coefficients at that level.
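Eqs. (3) to (6) can be sketched in Python as follows. The mother wavelet is not named in the text, so the Haar wavelet is assumed here purely for illustration, and the features are computed from the detail coefficients of each level, which is also an assumption. The function names are hypothetical, and an odd-length input simply loses its last sample at that level.

```python
import math


def haar_step(x):
    """One level of the orthonormal Haar DWT: returns (approximation, detail)."""
    pairs = [(x[i], x[i + 1]) for i in range(0, len(x) - 1, 2)]
    approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    detail = [(a - b) / math.sqrt(2) for a, b in pairs]
    return approx, detail


def level_features(c):
    """Power, mean, variance, and Shannon entropy of coefficients c (Eqs. 3-6)."""
    n_coeffs = len(c)
    power = sum(v * v for v in c)
    mean = sum(c) / n_coeffs
    variance = sum((v - mean) ** 2 for v in c) / n_coeffs
    entropy = 0.0
    if power > 0:
        for v in c:
            p = v * v / power
            if p > 0:  # terms with p = 0 contribute nothing to the entropy
                entropy -= p * math.log2(p)
    return {"power": power, "mean": mean, "variance": variance, "entropy": entropy}


def wavelet_features(x, levels=4):
    """Features of the detail coefficients at each of the first `levels` levels."""
    feats = []
    approx = list(x)
    for _ in range(levels):
        approx, detail = haar_step(approx)
        feats.append(level_features(detail))
    return feats


# Illustrative single-channel segment of 32 samples (not the paper's window length).
x = [math.sin(2 * math.pi * 10 * t / 128) for t in range(32)]
feats = wavelet_features(x)
print(len(feats), sorted(feats[0]))
```

Each channel yields 4 levels × 4 indicators = 16 values, so 19 channels give the 304 wavelet features counted in Table 1.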

2.3.3 Time-domain statistical features

Directly from the signal, four statistical values are calculated (mean, standard deviation, kurtosis, and skewness) because these values help us understand the shape of the signal and reflect the extent of its fluctuations, whether the signal is tilted in a certain direction, and whether its peaks are sharp or flat. After calculating these measures for all channels, the average across all channels is taken to represent the time-domain features for each segment. Eqs. (7), (8), (9), and (10) illustrate the mathematical formula for calculation [15].

  • The mean is the sum of the values divided by their number.

$Mean=\frac{\sum_{i=1}^N x_i}{N}$                (7)

where, $\mathrm{x}_{\mathrm{i}}$ is the value of the $\mathrm{i}^{\text {th}}$ sample, and N is the total number of samples.

  • The standard deviation measures how spread out the values of a dataset are around the mean. A high standard deviation means the values are more dispersed, while a low standard deviation indicates that they are closer to the mean.

$SD=\sqrt{\frac{\sum_{i=1}^N\left(x_i-\text{Mean}\right)^2}{N}}$               (8)

where, $\mathrm{x}_{\mathrm{i}}$ is the value of the $\mathrm{i}^{\text {th}}$ sample, N is the total number of samples, Mean is the average of all samples, and SD is the standard deviation.

  • Skewness measures the asymmetry of a data distribution. A positive skew indicates that the tail on the right side of the distribution is longer or fatter than the left side, meaning more low values. A negative skew indicates that the tail on the left is longer, meaning more high values.

$Skewness=\frac{\frac{1}{N} \sum_{i=1}^N\left(x_i-\text{Mean}\right)^3}{SD^3}$               (9)

where, $\mathrm{x}_{\mathrm{i}}$ is the value of the $\mathrm{i}^{\text {th}}$ sample, N is the total number of samples, Mean is the average of all samples, and SD is the standard deviation.

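  • Kurtosis measures how sharply peaked a distribution is and how heavy its tails are. A high kurtosis indicates a sharper distribution with more values near the mean and long tails, while a low kurtosis indicates a flatter distribution with fewer values near the mean and shorter tails.

$Kurtosis=\frac{\frac{1}{N} \sum_{i=1}^N\left(x_i-\text{Mean}\right)^4}{SD^4}$                (10)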

where, $\mathrm{x}_{\mathrm{i}}$ is the value of the $\mathrm{i}^{\text {th}}$ sample, N is the total number of samples, Mean is the average of all samples, and SD is the standard deviation.
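Eqs. (7) to (10), together with the averaging across channels, can be sketched in Python as follows. The function names are hypothetical, and returning zero skewness and kurtosis for a constant signal is a convention chosen here to avoid division by zero, not something specified in the text.

```python
import math


def time_domain_features(x):
    """Mean, SD, skewness, and kurtosis of one channel (Eqs. 7-10, population form)."""
    n = len(x)
    mean = sum(x) / n
    sd = math.sqrt(sum((v - mean) ** 2 for v in x) / n)
    if sd == 0:
        # Assumed convention for a flat signal; the paper does not cover this case.
        return {"mean": mean, "sd": 0.0, "skewness": 0.0, "kurtosis": 0.0}
    skewness = sum((v - mean) ** 3 for v in x) / n / sd ** 3
    kurtosis = sum((v - mean) ** 4 for v in x) / n / sd ** 4
    return {"mean": mean, "sd": sd, "skewness": skewness, "kurtosis": kurtosis}


def average_over_channels(segment):
    """Average each statistic over all channels of one segment."""
    per_channel = [time_domain_features(ch) for ch in segment]
    return {k: sum(f[k] for f in per_channel) / len(per_channel)
            for k in per_channel[0]}


# Toy segment with two channels of five samples each.
segment = [[1.0, 2.0, 3.0, 4.0, 5.0], [2.0, 2.0, 2.0, 2.0, 7.0]]
single = time_domain_features(segment[0])
averaged = average_over_channels(segment)
print(single)
```

The first toy channel is symmetric around its mean, so its skewness is zero, while the second has a single large value and is positively skewed.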

The three feature-extraction methods described above were applied to the EEG data recorded from 19 channels for each participant, including children with ADHD and healthy children. The process involved calculating wavelet features at four decomposition levels, power in the four main frequency bands using power spectral density (PSD) analysis, and four time-domain statistical features. These three types of features were chosen because they complement each other in characterizing EEG signals: PSD reflects the activity of the cortex in each frequency band, wavelet features capture time-frequency patterns, and time-domain statistics describe the scale and shape of the signal. Together, they give a fuller account of ADHD-related EEG characteristics. In total, 316 distinct features reflecting the temporal, spectral, and time-frequency dimensions of brain activity were extracted, as shown in Table 1, providing a sufficiently rich set of inputs for the subsequent classification step.

Table 1. The number of features extracted for each method and their total

| Feature Type | Calculation Details | Total Features |
|---|---|---|
| Spectral Features (PSD) | 8 frequency bands × mean across 19 channels | 8 |
| Time-Domain Features | 4 statistical measures × mean across 19 channels | 4 |
| Time-Frequency Features (Wavelet) | 4 levels × 4 features × 19 channels | 304 |
| Total | Combination of all features | 316 |

Note: The PSD was calculated for 8 frequency bands because each of the 4 main bands (Delta, Theta, Alpha, Beta) was subdivided into sub-bands (e.g., Delta Lower/Upper, Alpha Lower/Upper, etc.).
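The feature count in Table 1 can be checked by enumerating one name per feature, as in the short Python sketch below. The naming scheme and the ordering of features are hypothetical; only the channel list, the four main bands with their lower/upper split, and the counts come from the text.

```python
CHANNELS = ["Fz", "Cz", "Pz", "C3", "T3", "C4", "T4", "Fp1", "Fp2", "F3",
            "F4", "F7", "F8", "P3", "P4", "T5", "T6", "O1", "O2"]
BANDS = ["delta", "theta", "alpha", "beta"]
TIME_STATS = ["mean", "sd", "skewness", "kurtosis"]
WAVELET_STATS = ["power", "mean", "variance", "entropy"]


def feature_names():
    """Build one (hypothetical) name per feature, following Table 1."""
    # 4 bands x 2 sub-bands, each averaged across channels -> 8 features
    names = [f"psd_{band}_{half}" for band in BANDS for half in ("lower", "upper")]
    # 4 statistics, each averaged across channels -> 4 features
    names += [f"time_{stat}" for stat in TIME_STATS]
    # 19 channels x 4 levels x 4 indicators -> 304 features
    names += [f"wav_{ch}_L{level}_{stat}"
              for ch in CHANNELS
              for level in range(1, 5)
              for stat in WAVELET_STATS]
    return names


names = feature_names()
print(len(names))
```

Keeping the channel name inside each wavelet feature name makes it straightforward to map an important feature back to its electrode when building topographic maps.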

2.4 Data preparation

After extracting the spectral, time-frequency, and time-domain statistical features from the processed EEG signals of both ADHD and healthy individuals, the features are combined into a feature matrix X, with a label vector Y in which the value 1 represents children with ADHD and the value 0 represents healthy children. The data are randomly divided using the cvpartition function in MATLAB, so that the model is not biased towards one group, with 70% of the data used for training and 30% for testing. The Random Forest model is also compared with four other machine learning models (logistic regression, k-nearest neighbors, decision tree, and support vector machine) to assess its effectiveness and robustness in classification. Additionally, k-fold cross-validation was performed to further assess the generalization capability of the models and confirm that the Random Forest model is the most appropriate for distinguishing between children with ADHD and healthy controls.
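The 70/30 hold-out split can be illustrated with the following Python sketch. It is a plain random split over window indices and is not a reimplementation of MATLAB's cvpartition, whose exact behaviour (for example, whether it stratifies by class) is not described in the text. The function name and the fixed seed are hypothetical.

```python
import random


def holdout_split(n_samples, test_fraction=0.3, seed=0):
    """Return (train_indices, test_indices) for a random hold-out split."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    n_test = round(n_samples * test_fraction)
    return indices[n_test:], indices[:n_test]


# Label vector Y as described: 1 for ADHD windows, 0 for control windows.
labels = [1] * 1864 + [0] * 1466
train_idx, test_idx = holdout_split(len(labels))
print(len(train_idx), len(test_idx))
```

With the 3330 windows reported in Section 2.2, this gives 2331 training windows and 999 test windows.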

2.5 Training model

The Random Forest model was chosen for its ability to handle non-linear, high-dimensional data such as EEG signals, which change continuously and non-linearly with time. This makes it suitable for analyzing the features extracted from these signals. The number of trees was set to 200 after experimentation showed that this value gives high classification accuracy and stability while keeping computation time reasonable. The matrix X is used as the input to the model, while the vector Y holds the class label (ADHD or healthy). The out-of-bag (OOB) error is used to calculate the importance of each feature. Using OOB samples makes it possible to evaluate the impact of each feature on the classifier's performance without the need for additional data.
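The out-of-bag idea can be illustrated without training any trees. In a Random Forest, each tree is fitted on a bootstrap sample drawn with replacement, and the samples it never saw are its OOB samples, which can then serve as a built-in validation set. The Python sketch below only does this bookkeeping; the sample count of 1000 and the function name are hypothetical, and the 200 trees follow the text.

```python
import random


def bootstrap_with_oob(n_samples, rng):
    """Draw one bootstrap sample and return (in_bag_indices, oob_indices)."""
    in_bag = [rng.randrange(n_samples) for _ in range(n_samples)]
    oob = sorted(set(range(n_samples)) - set(in_bag))
    return in_bag, oob


rng = random.Random(0)
n_samples, n_trees = 1000, 200  # 200 trees as in the text; 1000 samples is illustrative
oob_sizes = []
for _ in range(n_trees):
    _, oob = bootstrap_with_oob(n_samples, rng)
    oob_sizes.append(len(oob))

mean_oob_fraction = sum(oob_sizes) / (n_trees * n_samples)
print(round(mean_oob_fraction, 3))
```

On average, roughly 37% of the samples are out-of-bag for any given tree (about 1/e), which is why OOB error and OOB-based feature importance can be estimated without setting aside additional data.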

The model's performance is evaluated on an independent test set that is not used during training. Several performance measures are derived from the confusion matrix: overall accuracy, precision, sensitivity, specificity, and F1-score. These indicators measure how accurately and reliably the model distinguishes between the two classes (ADHD and healthy). The training and testing accuracies were also calculated.

  • Accuracy is defined as the percentage of cases that the model correctly classifies (whether ADHD or healthy) out of all cases [16].

$ACC=\frac{TP+TN}{TP+FP+TN+FN}$               (11)

where TP is the number of true positives, TN is the number of true negatives, FP is the number of false positives, and FN is the number of false negatives.

  • Precision (positive predictive value) is the percentage of true positive cases out of all cases classified as positive [17].

$PR=\frac{TP}{TP+FP}$               (12)

  • Sensitivity, or recall, measures the model's ability to identify all actual positive cases [17].

$RE=\frac{TP}{TP+FN}$               (13)

  • The F1-score combines precision and recall into a single value. It gives a balanced view of the model's performance in classifying positive cases [17].

$F1=\frac{2 \times PR \times RE}{PR+RE}$                (14)
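Eqs. (11) to (14) can be computed directly from the four confusion-matrix counts, as in the Python sketch below. Specificity, which is listed among the measures but has no numbered equation, is taken here in its standard form TN / (TN + FP). The counts are hypothetical: they were chosen so that the formulas return values close to the Random Forest row of Table 2 for a test set of 999 windows, and they are not read from the paper's confusion matrix.

```python
def classification_metrics(tp, tn, fp, fn):
    """Accuracy, precision, recall, F1 (Eqs. 11-14) and specificity from counts."""
    accuracy = (tp + tn) / (tp + fp + tn + fn)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    f1 = 2 * precision * recall / (precision + recall)
    specificity = tn / (tn + fp)  # standard definition, not a numbered equation
    return {"accuracy": accuracy, "precision": precision, "recall": recall,
            "f1": f1, "specificity": specificity}


# Hypothetical counts (ADHD = positive class); illustrative only.
metrics = classification_metrics(tp=546, tn=410, fp=30, fn=13)
for name, value in metrics.items():
    print(f"{name}: {100 * value:.2f}%")
```

Working from counts in this way also makes it easy to check that a reported set of metrics is internally consistent, since the F1-score is fully determined by precision and recall.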

2.6 Feature importance analysis

After training the Random Forest model, the five most important features, those with the greatest influence on classification performance, are selected. The importance of each feature is calculated using the OOB error, as described in the model training section, and the features are ranked from most to least important. The standard 10-20 electrode placement system was used to create a topographic map of the five features projected onto their respective channels. These maps support anatomical interpretation by showing which brain regions had the greatest impact on distinguishing children with ADHD from healthy children.

3. Results and Discussion

This part of the study will review and discuss the results obtained using the Random Forest model.

3.1 Results

3.1.1 Random Forest Model Performance

A Random Forest model is used to classify ADHD patients and healthy controls using EEG signals. The dataset is randomly divided into 70% for the training set and 30% for the test set. The model achieved a test accuracy of 95.70%, with a test sensitivity of 0.9767, a specificity of 0.9318, and a precision of 0.9479 (Table 2). All of these metrics reflect the model's accuracy in classifying data that it has not seen before. The confusion matrices for the training and test sets, illustrated in Figure 3(a) and Figure 3(b), indicate how many cases were correctly or incorrectly classified, helping to identify where the model makes mistakes. The ROC curves and AUC values, shown in Figure 4, summarize the model's ability to distinguish ADHD from controls. A higher AUC means better discrimination between the two groups.

Figure 3. The confusion matrix (a) for the training data. (b) for the test data

Figure 4. The ROC curves and corresponding AUC values for the training and test sets

3.1.2 Comparison of Random Forest with other machine learning models

The Random Forest model was chosen after comparing its performance with four other machine learning models: SVM (linear kernel), Decision Tree (Max Num Splits = 200, Min Leaf Size = 1), Logistic Regression (probability estimation for class assignment), and KNN (K = 3, city block distance metric). This comparison aimed to evaluate the robustness of the proposed model against other traditional classification methods on the test set. The results are summarized in Table 2.

According to Table 2, the Random Forest model demonstrates superior performance and high accuracy in discriminating between children with ADHD and healthy controls. It is worth noting that the K-NN model also achieved performance close to Random Forest on the test set before cross-validation, highlighting its potential effectiveness in this particular scenario. The other models' performance was relatively low, and this can be explained as follows:

  • The SVM model: Although the model is robust in handling linearly separable data, the data extracted from EEG signals are high-dimensional and have nonlinear features, which limits the ability of the linear kernel to accurately discriminate between classes.
  • The decision tree is able to handle nonlinear features, but it tends to overfit the training data, which can degrade performance on the test set.
  • Logistic regression performs poorly because of its underlying linear assumption: the EEG data contain complex patterns that cannot be separated by a single linear decision boundary.

Table 2. The performance measures for all models show that the Random Forest model achieved higher performance compared to the other models

| Model | Test Accuracy (%) | Sensitivity (%) | Precision (%) | F1-Score (%) | Specificity (%) |
|---|---|---|---|---|---|
| Random Forest | 95.70 | 97.67 | 94.79 | 96.21 | 93.18 |
| K-Nearest Neighbour | 95.10 | 94.28 | 96.88 | 95.56 | 96.14 |
| Decision Tree | 82.08 | 82.83 | 84.80 | 83.80 | 81.14 |
| Logistic Regression | 69.87 | 89.45 | 67.39 | 76.86 | 45.00 |
| Support Vector Machine | 55.96 | 100.00 | 55.96 | 71.76 | 0.00 |

Note: This comparison also reflects the effectiveness of the model in distinguishing between children with ADHD and healthy children.

The Random Forest model's advantage over the other models stems from its ability to handle non-linear and high-dimensional data, in addition to its ability to evaluate the importance of each feature without the need for additional data, which supports prediction accuracy and reduces overfitting to the training data. The OOB estimate also provides an internal measure of the model's performance on new data, making the model more effective and efficient in classifying children with ADHD compared to the other models: Decision Tree, Logistic Regression, SVM, and K-NN.

3.1.3 Additional Verification (cross-validation)

While our original approach relied on a hold-out split (70% for training and 30% for testing) to evaluate model performance and compare it with other models, an additional validation step was performed to check the robustness and reliability of each model. Cross-validation was performed only within the training set, while the 30% hold-out test set remained completely independent for evaluating performance on unseen data. This step was intended to ensure that the observed performance was not merely the result of a favourable split but reflected how well the models actually distinguish children with ADHD from healthy children. During cross-validation, the data were divided into five folds. Each fold served once as validation data while the remaining folds were used for training. The results are shown in Table 3. The Random Forest model consistently maintained high accuracy across all folds, confirming its stability and robustness. Its low standard deviation (± 0.58) also indicates that its performance was consistent across folds, reinforcing its reliability. In contrast, the K-nearest neighbor model's performance declined under cross-validation. Its higher standard deviation (± 1.28) indicates less stable performance across folds, suggesting that its results in the hold-out split, which were comparable to those of the Random Forest model, were partly a result of the specific data split. The other models (Decision Tree, Logistic Regression, and SVM) also remained well below the Random Forest model under cross-validation. Decision Tree showed relatively high variability (± 3.16), SVM had moderate variability (± 1.56), and Logistic Regression, although stable across folds (± 1.27), was consistently low in accuracy.
This supports the selection of the Random Forest model as the best model for this dataset, as it combines high accuracy with stable performance across different data splits, reflected in its low standard deviation.
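The five-fold procedure and the "mean ± standard deviation" summary can be sketched in Python as follows. The fold assignment is a simple shuffled round-robin, the training-set size of 2331 assumes 70% of the 3330 windows, and the per-fold accuracies are hypothetical placeholders and not results from the paper. The sample standard deviation (n − 1 denominator) is used; the text does not say which form was applied.

```python
import random
import statistics


def k_fold_indices(n_samples, k=5, seed=0):
    """Yield (train_indices, validation_indices) for each of k folds."""
    rng = random.Random(seed)
    indices = list(range(n_samples))
    rng.shuffle(indices)
    folds = [indices[i::k] for i in range(k)]
    for i in range(k):
        train = [j for m, fold in enumerate(folds) if m != i for j in fold]
        yield train, folds[i]


def summarize(fold_accuracies):
    """Mean and sample standard deviation of the per-fold accuracies."""
    return statistics.mean(fold_accuracies), statistics.stdev(fold_accuracies)


n_train = 2331  # assumed size of the 70% training portion
splits = list(k_fold_indices(n_train))

fold_acc = [96.1, 97.2, 96.0, 96.9, 96.3]  # hypothetical values, for illustration
mean_acc, std_acc = summarize(fold_acc)
print(f"{mean_acc:.2f} ± {std_acc:.2f}")
```

Each sample appears in exactly one validation fold, so every training window contributes once to the validation estimate while the hold-out test set stays untouched.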

Table 3. Comparison of classification performance of different machine learning models using mean accuracy and standard deviation obtained via cross-validation

| Model | Mean Accuracy (± Std) (%) |
|---|---|
| Random Forest | 96.49 ± 0.58 |
| SVM | 81.71 ± 1.56 |
| KNN | 75.11 ± 1.28 |
| Logistic Regression | 46.58 ± 1.27 |
| Decision Tree | 82.85 ± 3.16 |

3.1.4 Feature importance analysis

The extracted features were then ranked from most to least important, and the five most important features and their associated channels were identified. Figure 5 shows a topographical distribution of feature importance across all 316 extracted features, highlighting the associated brain regions based on the Random Forest model. These features play a major role in distinguishing children with ADHD. Table 4 summarizes the five most important features and the channels related to them.

Figure 5. A comprehensive topographical map where each EEG channel is color-coded according to the maximum importance among all its features, based on the Random Forest model

Note: The red-to-blue color gradient highlights channels with high to low importance, respectively. This visualization provides a clear overview of which brain regions contribute most to distinguishing ADHD patients from healthy controls.
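The channel-level aggregation described for Figure 5, where each channel takes the maximum importance among its features, can be sketched in Python as follows. The first five entries reuse the channels and importance values reported in Table 4; the last two entries are hypothetical and are included only to show the maximum being taken. The function names are also hypothetical.

```python
def channel_max_importance(features):
    """Map each channel to the largest importance among its features."""
    best = {}
    for channel, _description, importance in features:
        best[channel] = max(importance, best.get(channel, float("-inf")))
    return best


def top_k(features, k=5):
    """Return the k features with the highest importance, highest first."""
    return sorted(features, key=lambda f: f[2], reverse=True)[:k]


features = [
    ("O2", "variance, level 1", 0.6470),
    ("T8", "variance, level 3", 0.6263),
    ("F7", "energy, level 2", 0.6126),
    ("P3", "variance, level 1", 0.5956),
    ("Fp1", "energy, level 2", 0.5890),
    ("O2", "entropy, level 4", 0.2100),  # hypothetical entry
    ("F7", "mean, level 1", 0.1500),     # hypothetical entry
]

channel_map = channel_max_importance(features)
ranked = top_k(features)
print(channel_map)
```

The resulting channel-to-value mapping is what a topographic plotting routine would colour-code at each electrode position.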

Table 4. The five most important EEG features and their corresponding channels

| Rank | Channel (Name) | Feature Description (Level) | Feature Type | Importance |
|---|---|---|---|---|
| 1 | Ch11 (O2) | Variance of wavelet coefficients at level 1 | Variance | 0.6470 |
| 2 | Ch10 (T8) | Variance of wavelet coefficients at level 3 | Variance | 0.6263 |
| 3 | Ch3 (F7) | Energy of wavelet coefficients at level 2 | Energy | 0.6126 |
| 4 | Ch14 (P3) | Variance of wavelet coefficients at level 1 | Variance | 0.5956 |
| 5 | Ch1 (Fp1) | Energy of wavelet coefficients at level 2 | Energy | 0.5890 |
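
The ranking step and the per-channel summary used for the topographic map (each channel colored by the maximum importance among its features, as stated in the Figure 5 caption) can be illustrated with a short sketch. The five leading entries are copied from Table 4; the two extra entries are hypothetical and are included only so that the per-channel maximum has something to choose between. The function names `top_features` and `channel_max` are likewise illustrative and do not come from the study.

```python
from collections import defaultdict

# (channel, feature description) -> importance score.
importances = {
    ("O2", "wavelet variance, level 1"): 0.6470,   # Table 4, rank 1
    ("T8", "wavelet variance, level 3"): 0.6263,   # Table 4, rank 2
    ("F7", "wavelet energy, level 2"): 0.6126,     # Table 4, rank 3
    ("P3", "wavelet variance, level 1"): 0.5956,   # Table 4, rank 4
    ("Fp1", "wavelet energy, level 2"): 0.5890,    # Table 4, rank 5
    # Hypothetical entries (not from the paper), for illustration only.
    ("O2", "wavelet energy, level 2"): 0.3000,
    ("F7", "wavelet variance, level 1"): 0.2500,
}

def top_features(scores, n=5):
    """Return the n (key, importance) pairs with the highest importance."""
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)[:n]

def channel_max(scores):
    """Reduce feature-level importances to one value per channel (the maximum)."""
    best = defaultdict(float)
    for (channel, _feature), value in scores.items():
        best[channel] = max(best[channel], value)
    return dict(best)

top5 = top_features(importances)
per_channel = channel_max(importances)

for rank, ((channel, feature), value) in enumerate(top5, start=1):
    print(rank, channel, feature, value)
print(per_channel)
```

The dictionary returned by `channel_max` holds one value per electrode, which is the quantity a topographic plotting routine would interpolate over the scalp.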

3.2 Discussion

3.2.1 Projection of features onto the topographic map

To show the spatial distribution of the most important features extracted from the EEG signals, the top features identified by the Random Forest model were projected onto a topographic map of the head, as shown in Figure 6(a) and (b). By combining the importance ranking of the features with the actual electrode locations, this projection gives a clear visual representation of the scalp regions that contribute most to discriminating between children with ADHD and healthy controls. It adds a spatial perspective that complements the results presented in Table 4.

Figure 6. Topographic maps: (a) projection of channels related to energy features; (b) projection of channels related to variance features

3.2.2 Neurophysiological insights

The top five features identified by the Random Forest model, which have the greatest impact in distinguishing children with ADHD from healthy controls, are interpreted below in terms of their neurophysiological significance:

  • Channel 11 (O2), variance at level 1: reflects variability of electrical activity in the right occipital region, which is important for visual information processing and attention.
  • Channel 10 (T8), variance at level 3: captures medium-frequency activity in the right temporal region, which is associated with attentional control and lateral brain processing.
  • Channel 3 (F7), energy at level 2: measures electrical activity in the left prefrontal region, which is responsible for executive functions, planning, and attention. These are areas often affected in ADHD.
  • Channel 14 (P3), variance at level 1: reflects variability of activity in the left parietal region, which is important for sensorimotor integration and cognitive organization.
  • Channel 1 (Fp1), energy at level 2: reflects activity in the left prefrontal cortex, which is associated with early executive functions, attentional control, and working memory.
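
To make the level-wise wavelet variance and energy features of Table 4 concrete, the sketch below computes them for a synthetic signal. Several choices here are assumptions made for illustration and are not taken from the study: the Haar wavelet (the mother wavelet is not named in this section), the default of four decomposition levels, the population variance (division by L, the number of coefficients at the level), and the function names `haar_step` and `wavelet_features`.

```python
import math
import statistics

def haar_step(signal):
    """One level of the orthonormal Haar DWT: return (approximation, detail)."""
    pairs = list(zip(signal[0::2], signal[1::2]))
    approx = [(a + b) / math.sqrt(2) for a, b in pairs]
    detail = [(a - b) / math.sqrt(2) for a, b in pairs]
    return approx, detail

def wavelet_features(signal, levels=4):
    """Return ({level: (variance, energy)}, final approximation).

    Variance and energy are computed from the detail coefficients c_j of each
    level. The signal length must be divisible by 2 ** levels.
    """
    if len(signal) % (2 ** levels) != 0:
        raise ValueError("signal length must be divisible by 2 ** levels")
    features = {}
    approx = list(signal)
    for level in range(1, levels + 1):
        approx, detail = haar_step(approx)
        energy = sum(c * c for c in detail)        # sum of squared coefficients
        variance = statistics.pvariance(detail)    # population variance (assumed)
        features[level] = (variance, energy)
    return features, approx

# Synthetic 64-sample segment standing in for one EEG channel.
signal = [math.sin(0.3 * i) + 0.5 * math.sin(2.1 * i) for i in range(64)]
features, residual = wavelet_features(signal)

for level, (variance, energy) in features.items():
    print(f"level {level}: variance={variance:.4f}, energy={energy:.4f}")
```

Because the Haar transform used here is orthonormal, the detail energies of all levels plus the energy of the final approximation add up to the energy of the original segment, which gives a simple sanity check on the decomposition.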

3.2.3 Comparison with previous studies

Our study is part of a line of research that uses EEG signals to diagnose ADHD. Statistical temporal features are combined with spectral (PSD) and wavelet features, and a Random Forest model is used to distinguish patients from healthy controls. Projecting the features onto a topographic map adds anatomical and functional insight and supports the practical use of the model in clinical settings, allowing clinicians to interpret the EEG patterns and brain regions relevant to ADHD diagnosis. The results are consistent with recent studies that used similar methods, some of which are highlighted below.

In 2020, Altınkaynak et al. [18] used EEG signals recorded during an auditory task to diagnose ADHD. After extracting temporal and frequency features, the researchers applied a multi-layer neural network as the classification model, which reached an accuracy of around 91.3%. However, the study was limited to reporting the model's performance and did not provide a graphical representation of the important channels.

In 2023, Maniruzzaman et al. [19] used the t-test and LASSO to select the most informative channels and features from the EEG signals and then evaluated several classification models. The Random Forest model achieved the highest accuracy among them, reaching 97.53%, but the study did not provide a comprehensive integration of spectral and temporal features and did not show the neural importance of the selected channels.

In 2025, Ahire et al. [20] extracted spectral features from EEG signals recorded from participants in an eyes-open state and analyzed them with several models. The K-nearest neighbors and Random Forest models were among those that achieved high accuracy, up to 96%. However, the study was limited to the eyes-open resting state, did not integrate temporal or wavelet features, and did not provide interpretive maps of brain channels.

The studies above show that selecting and integrating several kinds of features, such as spectral and temporal features, together with channel selection, contributes significantly to the diagnostic accuracy of the model. Our study supports this concept and extends it by projecting the most important channel-related features onto a topographic map. These maps provide an anatomical interpretation of the regions that had the greatest impact in distinguishing between patients and healthy children. Our study has limitations, including the small sample size, the specific age range (7-12 years), and the potential effect of medications such as Ritalin. Future studies can address these issues by using deep learning models such as CNNs or a larger dataset, which would help improve the interpretation of the relevant brain activity. Table 5 summarizes the similarities and differences between the previous studies and the current study.

Table 5. Comparison of the current study with previous EEG-based ADHD classification studies

| Authors | Year | Features Extracted | Classifiers Used | Test-Acc (%) | Notes |
|---|---|---|---|---|---|
| Altınkaynak et al. [18] | 2020 | Temporal features, wavelet coefficients, frequency features (ERP) | Multilayer Perceptron (MLP), SVM, Random Forest, etc. | 91.3 | Multi-feature fusion on the auditory oddball EEG task |
| Maniruzzaman et al. [19] | 2023 | Statistical features, channel selection via t-test and SVM, and LASSO for feature selection | Random Forest, Gaussian Process, KNN, etc. | 97.53 | Optimal channel and feature selection, dimensionality reduction |
| Ahire et al. [20] | 2025 | Power Spectral Density (PSD), PCA for dimension reduction | Random Forest, KNN, AdaBoost, Bernoulli Naive Bayes | 96 | Resting state (open-eye) EEG data, multiple classifiers |
| Our Study | 2025 | Statistical temporal features, PSD, wavelet features, and feature projection on a topographic map | Random Forest | 96 | Feature projection enhances anatomical interpretation |

Further consideration should be given to the integration of the proposed Random Forest-based EEG model into existing clinical workflows. The model can support clinicians by providing an objective decision-support tool for ADHD diagnosis through the extraction of temporal, spectral, and wavelet-based EEG features, as well as the generation of topographic maps that highlight the most relevant brain regions. These interpretable outputs can facilitate clinical understanding and contribute to more accurate diagnostic decisions. However, several practical limitations must be acknowledged, including dependency on specific EEG acquisition devices, the requirement for accurate electrode placement, signal quality control, and the operational complexity associated with preprocessing and feature extraction. Addressing these factors is essential for translating the proposed model from a research setting into routine clinical practice.

4. Conclusion

The study found that combining temporal statistical features with spectral parameters (PSD) and wavelet features derived from EEG signals enhanced the accuracy of ADHD classification. The Random Forest model achieved 95.7% test accuracy, outperforming the four other machine learning models, and the most important features were projected onto a topographic map of the scalp, giving a clear anatomical interpretation of the results. This study illustrates the model's potential use in clinical settings, offering medical professionals an objective tool for ADHD diagnosis. However, the study has some limitations, such as a small sample size (121 children), a narrow age range (7-12 years), the possible effects of Ritalin, and the restriction to 19 EEG channels.

Further research should include larger and more diverse datasets, evaluate the model on medication-naïve children, and explore advanced deep learning techniques such as CNN or hybrid CNN-LSTM models to improve classification performance and generalizability. While deep learning models have shown high performance in previous studies, they were not used in the current study due to the relatively small dataset and their “black-box” nature, which prevents identifying which features contributed most to the classification. In contrast, Random Forest provides interpretable results with clear feature importance mapping across brain regions.

Nomenclature

| Symbol | Description |
|---|---|
| ACC | Accuracy |
| $\mathrm{c}_{\mathrm{j}}$ | Wavelet coefficient at the given decomposition level |
| L | Total number of wavelet coefficients at that level |
| N | Total number of samples |
| P_Band | Average power in the frequency band |
| PSD(f) | Power Spectral Density at frequency f |
| SD | Standard deviation |
| TP | True Positives |
| TN | True Negatives |
| FP | False Positives |
| FN | False Negatives |
| X | Feature matrix |
| Y | Classification vector |
| FFT(x) | Fast Fourier Transform of signal segment x |
| P | Power |
| RE | Sensitivity / Recall |
| PR | Positive precision |
| F1 | F1-score |
| $\mathrm{x}_{\mathrm{i}}$ | The value of the $\mathrm{i}^{\text {th }}$ sample |
| x | Signal segment |

Greek symbols

| Symbol | Description |
|---|---|
| $\mathrm{p}_{\mathrm{j}}$ | Normalized power of the coefficient $c_j$ |

Subscripts

| Symbol | Description |
|---|---|
| i | Index of sample in time-domain features |
| j | Index of wavelet coefficient |
| k | Index used in normalization of wavelet coefficients |

References

[1] López, C.Q., Vera, V.D.G., Quintero, M.J.R. (2025). Diagnosis of ADHD in children with EEG and machine learning: Systematic review and meta-analysis. Clinical and Health, 36(2): 109-121. https://doi.org/10.5093/clh2025a16

[2] Elwin, M., Elvin, T., Larsson, J.O. (2020). Symptoms and level of functioning related to comorbidity in children and adolescents with ADHD: A cross-sectional registry study. Child and Adolescent Psychiatry and Mental Health, 14(1): 30. https://doi.org/10.1186/s13034-020-00336-4

[3] Parashar, A., Kalra, N., Singh, J., Goyal, R.K. (2021). Machine learning based framework for classification of children with ADHD and healthy controls. Intelligent Automation & Soft Computing, 28(3): 669-682. https://doi.org/10.32604/iasc.2021.017478

[4] Cortese, S., Aoki, Y.Y., Itahashi, T., Castellanos, F.X., Eickhoff, S.B. (2021). Systematic review and meta-analysis: Resting-state functional magnetic resonance imaging studies of attention-deficit/hyperactivity disorder. Journal of the American Academy of Child & Adolescent Psychiatry, 60(1): 61-75. https://doi.org/10.1016/j.jaac.2020.08.014

[5] Kautzky, A., Vanicek, T., Philippe, C., Kranz, G.S., Wadsak, W., Mitterhauser, M., Hartmann, A., Hahn, A., Hacker, M., Rujescu, D., Kasper, S., Lanzenberger, R. (2020). Machine learning classification of ADHD and HC by multimodal serotonergic data. Translational Psychiatry, 10(1): 104. https://doi.org/10.1038/s41398-020-0781-2

[6] Alam, S., Raja, P., Gulzar, Y. (2022). Investigation of machine learning methods for early prediction of neurodevelopmental disorders in children. Wireless Communications and Mobile Computing, 2022(1): 5766386. https://doi.org/10.1155/2022/5766386

[7] Gabriel, R., Spindola, M.M., Mesquita, A., Neto, A.Z. (2017). Identification of ADHD cognitive pattern disturbances using EEG and wavelets analysis. In 2017 IEEE 17th International Conference on Bioinformatics and Bioengineering (BIBE), Washington, DC, USA, pp. 157-162. https://doi.org/10.1109/BIBE.2017.00-62

[8] Alkahtani, H., Aldhyani, T.H., Ahmed, Z.A., Alqarni, A.A. (2023). Developing system-based artificial intelligence models for detecting the attention deficit hyperactivity disorder. Mathematics, 11(22): 4698. https://doi.org/10.3390/math11224698

[9] Chugh, N., Aggarwal, S., Balyan, A. (2024). The hybrid deep learning model for identification of attention-deficit/hyperactivity disorder using EEG. Clinical EEG and Neuroscience, 55(1): 22-33. https://doi.org/10.1177/15500594231193511

[10] Hu, H., Tong, S., Wang, H., Wu, J., Zhang, R., Jiang, R., Zhao, Y., Ju, Y., Zhang, X. (2025). SCANet: An innovative multiscale selective channel attention network for EEG-based ADHD recognition. IEEE Sensors Journal, 25(11): 20920-20932. https://doi.org/10.1109/JSEN.2025.3560349

[11] Nasrabadi, A.M., Allahverdy, A., Samavati, M., Mohammadi, M.R. (2020). EEG Data for ADHD / Control Children. IEEE Dataport. https://doi.org/10.21227/rzfh-zn36

[12] Vuckovic, A., Gallardo, V.J.F., Jarjees, M., Fraser, M., Purcell, M. (2018). Prediction of central neuropathic pain in spinal cord injury based on EEG classifier. Clinical Neurophysiology, 129(8): 1605-1617. https://doi.org/10.1016/j.clinph.2018.04.750

[13] Bingham, C., Godfrey, M., Tukey, J. (1967). Modern techniques of power spectrum estimation. IEEE Transactions on Audio and Electroacoustics, 15(2): 56-66. https://doi.org/10.1109/TAU.1967.1161895

[14] Mallat, S. (2008). A Wavelet Tour of Signal Processing: The Sparse Way. Academic Press. https://doi.org/10.1016/B978-0-12-374370-1.X0001-8

[15] Siuly, S., Li, Y. (2015). Designing a robust feature extraction method based on optimum allocation and principal component analysis for epileptic EEG signal classification. Computer Methods and Programs in Biomedicine, 119(1): 29-42. https://doi.org/10.1016/j.cmpb.2015.01.002

[16] Mohammed, R.S., Abdulhammed, R. (2025). Comparative analysis of machine learning algorithms for phishing email detection. NTU Journal of Engineering and Technology, 4(3). https://doi.org/10.56286/mdh75h13

[17] Alsayigh, H.K.S., Khidhir, A.S.M. (2024). Deep learning-based mental task classification using a muse 2 EEG headset. AIP Conference Proceedings, 3091(1): 040002. https://doi.org/10.1063/5.0204932

[18] Altınkaynak, M., Dolu, N., Güven, A., Pektaş, F., Özmen, S., Demirci, E., İzzetoğlu, M. (2020). Diagnosis of attention deficit hyperactivity disorder with combined time and frequency features. Biocybernetics and Biomedical Engineering, 40(3): 927-937. https://doi.org/10.1016/j.bbe.2020.04.006

[19] Maniruzzaman, M., Hasan, M.A.M., Asai, N., Shin, J. (2023). Optimal channels and features selection based ADHD detection from EEG signal using statistical and machine learning techniques. IEEE Access, 11: 33570-33583. https://doi.org/10.1109/ACCESS.2023.3264266

[20] Ahire, N., Awale, R.N., Wagh, A. (2025). Electroencephalogram (EEG) based prediction of attention deficit hyperactivity disorder (ADHD) using machine learning. Applied Neuropsychology: Adult, 32(4): 966-977. https://doi.org/10.1080/23279095.2023.2247702