© 2024 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
OPEN ACCESS
A Deep Neural Network (DNN) is an advancing technology that improves our lives by allowing machines to perform complex tasks. The Hybrid Deep Neural Network (HDNN) is widely used for emotion recognition from EEG signals because it performs better than a plain DNN. Among the factors that affect network performance, the activation function is essential: it improves model accuracy by introducing nonlinearity into the DNN. The activation function enables nonlinear learning and lets the network model the complex relationship between the input and output data. The choice of activation function depends on the type of data used for computation. This paper investigates model performance with respect to the ReLU, ELU, and Leaky ReLU activation functions on a hybrid CNN with BiLSTM model and on a CNN model for emotion recognition. The models were tested on the DEAP dataset, an emotion dataset that contains physiological and EEG signals. The experimental results show that model accuracy improves when the ELU function is used.
activation function, ELU, BCI, emotion recognition, ReLU, Leaky ReLU, Deep Neural Network
Affective computing is a branch of Artificial Intelligence (AI) that creates systems and devices able to respond naturally to human emotions, making communication between machine and human more effective and seamless. It is the computational study of emotion, and in healthcare it serves doctors, healthcare educators, medical administrators, and patients [1]. Its applications include developing assistive technology for people with disabilities, enhancing mental health diagnosis and treatment, and improving human-computer interaction. The biggest challenge in affective computing is emotion recognition, since emotion is complex and varies across subjects and cultures. Emotions are usually expressed in different ways, such as facial expressions, speech signals, and physiological signals.
Emotions represent the true state of a person’s thoughts and behavioural responses, and they are closely associated with the nervous system [2]. Emotion recognition through physiological signals is more reliable since the interpretation of emotions starts from the central nervous system [3]. The Brain Computer Interface (BCI) is a technique that acquires the Electroencephalogram (EEG) signal from the brain and is used for various purposes, such as assisting individuals with disabilities, limited motor functions, or neurological injuries, mental health applications [4], and emotion recognition in neuroscience [3]. BCI enables the human to exchange information with the machine and the environment. The EEG signals are obtained from the brain cortex through electrodes placed on the head. EEG gives more precise and consistent results [5]. Researchers commonly prefer it since it is noninvasive, portable, and usable in different environments [3]. It also plays an important role in illustrating the activity of brain regions in different emotional states [6].
Emotion recognition using EEG signals is carried out with either machine learning methods, such as Support Vector Machine (SVM) and Linear Discriminant Analysis (LDA), or deep learning methods. In a machine learning based model, the features are extracted manually and given to classifiers such as SVM, LDA, Random Forest, and Naive Bayes to classify different types of emotions. Since the EEG signal is nonlinear and high dimensional, it is difficult to model with linear algorithms. Deep learning was introduced to overcome this issue [7]. Deep learning is preferred for developing an accurate and automated system that can learn from large amounts of data and recognize different emotional cues. It learns complex features from the raw data automatically, which is essential to quantify different emotions. The performance of a deep learning model can be improved through various techniques such as changing the network architecture, tuning parameters, preventing excessive dropout of neurons, augmentation, and preprocessing of data. The network can be changed in many ways, for example through the number of layers, the learning rate, and the activation function of the layers.
An activation function is a mathematical function that determines whether a neuron should be activated based on the weighted sum of its inputs. It introduces nonlinearity at the output of the neuron, which allows the model to learn complex, nonlinear relationships. Some activation functions, such as Tanh, sigmoid, and ELU, respond to both positive and negative inputs, while others, such as ReLU, pass only positive inputs, as shown in Table 1. Among them, the Tanh function is zero centred, which reduces the number of epochs needed to train the model and helps the back propagation process [8]. However, it is poorly suited to networks with many layers because of the vanishing gradient problem. The sigmoid function also saturates for inputs of large magnitude, where its derivative approaches zero, and therefore suffers from the same vanishing gradient problem. ReLU and ELU do not saturate in the positive region because they are linear there. ReLU also introduces sparsity by setting negative values to zero, which reduces the computational complexity of the model and improves its generalization performance. It performs well in networks with many layers but suffers from dead neurons because its output and gradient are zero in the negative region. ELU addresses this by introducing an exponential nonlinearity in the negative region, which keeps a small nonzero response for negative inputs. Due to this, it gives better performance than ReLU and Leaky ReLU [9]. It performs well for large datasets and converges fast [10].
Therefore, the proposed work carries out an empirical study of different activation functions applied to the HDNN and CNN models and identifies the best fitting activation function.
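The difference between the three functions studied here can be made concrete with a short sketch. The plain-Python functions below implement ReLU, Leaky ReLU and ELU together with their derivatives. The negative slope of 0.01 for Leaky ReLU and α = 1.0 for ELU are common defaults assumed for illustration only; they are not values reported in this paper.

```python
import math

def relu(x):
    """ReLU: passes positive inputs, outputs zero otherwise."""
    return max(0.0, x)

def leaky_relu(x, slope=0.01):
    """Leaky ReLU: small linear response for negative inputs (slope is assumed)."""
    return x if x >= 0 else slope * x

def elu(x, alpha=1.0):
    """ELU: exponential response for negative inputs (alpha is assumed)."""
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)

def relu_grad(x):
    """Derivative of ReLU: zero for negative inputs (the dying neuron case)."""
    return 1.0 if x > 0 else 0.0

def leaky_relu_grad(x, slope=0.01):
    """Derivative of Leaky ReLU: constant small slope for negative inputs."""
    return 1.0 if x >= 0 else slope

def elu_grad(x, alpha=1.0):
    """Derivative of ELU: alpha * exp(x) for negative inputs, never exactly zero."""
    return 1.0 if x >= 0 else alpha * math.exp(x)

if __name__ == "__main__":
    for x in (-2.0, -0.5, 0.0, 1.5):
        print(x, relu(x), leaky_relu(x), elu(x),
              relu_grad(x), leaky_relu_grad(x), elu_grad(x))
```

For a negative input, the ReLU derivative is exactly zero, while Leaky ReLU and ELU keep a nonzero derivative, which is the property the dead neuron discussion refers to.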
Keelawat et al. [11] found that changing the activation function improved the accuracy of their model. Supporting this, Apicella et al. [12] found that the choice of activation function has a great impact on the learning process and on the shape of the error function. Therefore, in the proposed work, different activation functions were tried on the HDNN model and the best fitting one was determined. The ReLU, Leaky ReLU and ELU activation functions were used to evaluate the changes in model accuracy. The effectiveness and efficiency of the selected activation functions are tested on two different architectures with the DEAP EEG dataset for emotion classification. Effectiveness is defined here as the accuracy and the value of the loss function computed on the test set. Efficiency is defined in terms of the prediction time, the number of epochs required to converge, and the time required for a single epoch. The following research question and hypotheses have been devised for the proposed work:
Table 1. Activation functions and their corresponding equation and waveforms
| Activation | Equation |
|---|---|
| Sigmoid | $f(x)=\frac{1}{1+e^{-x}}$ |
| Tanh | $f(x)=\frac{e^x-e^{-x}}{e^x+e^{-x}}$ |
| ReLU | $f(x)=\max (0, x)$ |
| ELU | $f(x)=\left\{\begin{array}{ll}\alpha\left(e^x-1\right) & \text { for } x<0 \\ x & \text { for } x \geq 0\end{array}\right.$ |
Research Question: Does the choice of activation function result in noteworthy changes in the efficacy of the chosen model on the DEAP EEG dataset?
H1. Can the performance of the model be attributed to the mean gradients in each layer?
H2. Do the weights play the major role in model performance?
H3. Can the performance of the model be attributed to results from statistical analysis?
The paper is organized as follows. Section 2 reviews surveys and other research works that discuss the activation function. Section 3 elaborates on the model architecture and the process involved in the experiment. Section 4 describes the experimentation and interprets the results obtained for the various activation functions. Finally, Section 5 concludes the research work.
Most of the papers that focus on emotion recognition using EEG signals have experimented with the DEAP dataset. The research works in the literature that use the DEAP dataset were studied to understand the use of activation functions in DNN models. The most prominent activation functions used in these models were Tanh, sigmoid, and ReLU [13]. The Tanh activation function is preferred for datasets like DEAP that contain negative values. It introduces nonlinearity to the model and helps to stabilize the gradients during training. Very few articles have used Tanh, linear, and sigmoid functions in the middle layers. Recently, ReLU has become the most commonly preferred activation function in the model layers for different applications [14], as it introduces nonlinearity, mitigates vanishing gradients, is computationally efficient, and helps to avoid overfitting. Most CNN models, independent of application, use ReLU or Leaky ReLU, and the works [15, 16] are among them. Mastromichalakis [15] used the Leaky ReLU function on a COVID pneumonia dataset and concluded that it was the best alternative to ReLU for avoiding the dying neuron problem. Jebadurai et al. [16] used Leaky ReLU to improve the generalization of their model.
Among the works on EEG based applications, and emotion recognition in particular, Pan and Zheng [17] used a generative adversarial network for data augmentation and a convolutional network as their model for EEG based emotion classification. In every convolutional layer, they used ReLU after batch normalization. While they reported an increase in accuracy, they did not discuss whether it was due to both ReLU and the data augmentation method or to data augmentation alone. Similarly, multiple works in the literature have used ReLU or Leaky ReLU without attempting to analyse the impact of the activation function. Table 2 provides the details of such works. Although these works do not discuss the actual reason for the increase in accuracy, they suggest that there is room for improving accuracy by changing the activation function, for example between ReLU and Leaky ReLU. This is one of the motivations for the proposed work, which analyses and identifies a suitable activation function for the given application.
Table 2. Literature works available that used ReLU or Leaky ReLU as the activation function
| Authors | Model/Application | Reported Accuracy in % and Inference | Activation Function Used |
|---|---|---|---|
| Chao et al. [18] | DNN model with capsules | Average accuracy of 67 for 2D CNN and 65 for 1D CNN | ReLU activation function in the CNN layer |
| Pandey et al. [19] | Variational mode decomposition feature extraction on a DNN to detect subject dependent emotions | 61 for arousal and 62 for valence; the difference in accuracy is due to class imbalance | ReLU |
| Li et al. [20] | Hybrid DNN with the combination of CNN and LSTM networks | Average accuracy of 75.21 | ReLU |
| Garg and Verma [21] | GoogleNet model for emotion recognition | 83 for valence and 55 for arousal | ReLU |
| Ashgar et al. [22] | Alexnet model with SVM | Average accuracy of 77 for the DEAP dataset | ReLU |
| Xing et al. [23] | Stacked AutoEncoder for feature extraction with LSTM model | 81 for valence and 74 for arousal | Linear and sigmoid activation |
| Ozdemir et al. [24] | CNN | 90 for 2 classes and 88 for 4 classes | ReLU |
| Acharya et al. [25] | LSTM and CNN models | 87 for CNN and 88 for LSTM | ReLU |
| Yang et al. [26] | CNN model | 90 for arousal and 89 for valence | ReLU |
ELU is another activation function that researchers have recently preferred for data with negative inputs. Its use in applications such as emotion recognition from text [27], facial emotion recognition [28] and speech emotion recognition [29] has shown significant improvement in the model. Devi and Deepa [27] compared the results of the ELU and ReLU activation functions for emotion recognition from Twitter data. They found that ELU has an advantage for negative data values because it brings the mean activation nearer to zero, which narrows the gap between the gradients. It also provided better accuracy than ReLU. Bejjagam and Chakradhara [28] used ELU as the activation function for facial emotion recognition, since the inputs are negative and to overcome the dying neuron problem of ReLU. Le et al. [29] used the ELU activation function for speech based emotion recognition, which also has negative data as inputs. The maximum accuracy of their model is reported to be 90.54%. Abelwahab and Busso [30] studied the activation function of a dense network and found no significant change in performance between ELU and ReLU for speech emotion recognition. They therefore suggested increasing the number of layers and the size of the training set, which led to an increase in performance with ELU. A few research works have used ELU as the activation function for EEG signals. Schirrmeister et al. [31] observed a degradation in accuracy across frequency ranges when ReLU was used instead of ELU for motor imagery EEG signals, which was due to the presence of negative values in the data. By experimenting with different batch sizes, epochs and learning rates, Nguyen et al. [32] concluded that ELU and its variants, such as SELU and GELU, perform better by avoiding the vanishing gradient and dead state problems. Liu et al. [33] proposed TSin as the activation function and demonstrated its efficiency in terms of training stability, convergence speed and precision.
León et al. [34] observed that the model got stuck in learning due to dying neurons with ReLU and therefore used ELU as the activation function. Farahat et al. [35] recommended choosing Tanh or ELU over ReLU, since the inability of ReLU to handle negative inputs leads to poor results. Bai [36] observed better performance with ELU since it pushes the average activation value closer to zero, just like Batch Normalization, thus aiding fast learning with reduced bias. Jang et al. [37] used ELU in a graphical model for fast extraction of features. Liang et al. [38] used ELU as the activation function in the middle layers to improve the model fitting. With these observations, different activation functions were tested in the proposed work on two different models, i.e., HDNN and CNN.
The role of the activation function is well understood for simple datasets, such as random data with a normal distribution. However, as pointed out in the literature review, although models use different activation functions and report accuracy, the reason behind the result in a particular scenario is not well explained. Hence, in addition to the regular evaluation accuracy, loss, and computational efficiency metrics, this paper considers the following performance criteria for deciding the effectiveness of an activation function:
1. The change of gradients with respect to each layer both in terms of mean and standard deviation;
2. The change of weights with respect to each layer both in terms of mean and standard deviation.
The gradients were obtained by running the model with a user defined fit function instead of the default fit function available in Keras. The weights were obtained by using the default fit function with callbacks for weight capture. Two different models were tested with three activation functions, i.e., ReLU, Leaky ReLU and ELU. The experimental setup is given in Table 3. To test the above hypotheses, the work considers intra subject variations: if, for example, subjects 1-5 are considered and augmented, the data is split in a 70:20:10 ratio, with 70% used for training, 20% for validation and the remaining 10% for evaluation.
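The two criteria listed above reduce to the same summary once the values have been captured: a mean and a standard deviation per layer. The following is a minimal sketch of that summary step only. The function name `layer_statistics`, the dictionary layout and the numbers are illustrative assumptions; the capture of gradients or weights from Keras is not shown.

```python
from statistics import fmean, pstdev

def layer_statistics(values_by_layer):
    """Return {layer_name: (mean, std)} for flattened per-layer values.

    values_by_layer maps a layer name to a flat list of gradient
    (or weight) values captured for that layer.
    """
    return {
        name: (fmean(values), pstdev(values))
        for name, values in values_by_layer.items()
    }

# Hypothetical flattened gradients for two layers (made-up numbers).
example = {
    "conv1d_1": [0.02, -0.01, 0.03, 0.00],
    "dense_1": [0.10, -0.10, 0.20, -0.20],
}
stats = layer_statistics(example)

for layer, (mean, std) in stats.items():
    print(f"{layer}: mean={mean:.4f}, std={std:.4f}")
```

The same function can be applied once per epoch to follow how the per-layer mean and standard deviation change during training.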
Table 3. Experimental setup
| Name/Description | Version |
|---|---|
| CPU | Intel® Core™ i5 |
| RAM | 8 GB |
| OS | Windows 10 |
| Python | Python 3.11.5 |
| TensorFlow | TensorFlow 2.14.0 |
| Scikit-learn | Scikit-learn 1.3.1 |
| Anaconda | 2021.05 |
The distribution of a few features (from a total of 70 features) after normalization at the input layer is shown in Figure 1; it has both positive and negative values. After the first CNN layer, however, the distribution is unknown, since each layer’s input distribution changes during training. Batch Normalization can be used to bring the distribution back towards a normalized one. Batch Normalization forces the neurons to work in the linear region. Thus, Batch Normalization is very beneficial to neural networks, but it comes with additional parameters to learn. Because the hybrid CNN with LSTM is computationally expensive, only the CNN model was tested with and without Batch Normalization to identify the significance of ELU. Batch Normalization weakens the dependency between the layers by recentering the inputs to each layer.
Figure 1. Distribution of data after normalization
Figure 2. Block diagram of EEG based emotion recognition
In the DEAP dataset, each participant’s response for a single emotion is recorded for 63 seconds. However, considering hardware implementation and augmentation, every 2 seconds of data is taken as one sample, as discussed in the Augmentation section. As the model is tested with only one modality, intra-subject validation is considered in this work, i.e., out of the n subjects’ data available, the data of n-1 or n-2 subjects were considered for training and evaluation. The left-out subject’s data is used for prediction.
The emotion recognition process involves a few steps, and the block diagram of the architecture used to predict the emotion is given in Figure 2. The input EEG signals are used to test the model. They can be either self-recorded signals or a dataset consisting of a large collection of EEG signals. Self-recorded signals are signals that have been recorded or captured by an individual user. The signals are then preprocessed to remove unwanted noise and prepare them for the next step. The most relevant information from the processed EEG signals is given to the DNN model for classification.
3.1 Dataset description
The DEAP dataset is an emotion dataset that contains EEG recordings from different participants. The EEG signals were recorded from 32 participants at a sampling frequency of 512 Hz. Other peripheral signals, such as skin temperature, blood pressure, Electromyogram, and galvanic skin response, were recorded on separate channels. Each subject watched 40 videos, each of which counts as one trial. After each video, the subject rated the emotion in terms of valence, arousal, dominance, and liking on a scale of 1-9.
The original signal was downsampled to 128 Hz to reduce the computational complexity and the high frequency noise in the signal. A band pass filter of 4-45 Hz was then applied to remove artifacts from the signal. The Electrooculogram noise in the DEAP dataset was removed by Independent Component Analysis so that the data could represent the emotions of the subject clearly [23]. Electromyogram artifacts were removed by subtracting the temporal low frequency drift [39].
3.2 Augmentation and feature extraction of EEG
The preprocessed signal was augmented to increase the number of data samples. The data sequence was augmented with a window size of 2 s and an overlap of 0.125 s, corresponding to 16 samples; a detailed description of this process is given in the previous study [40]. The 63 s data sequence therefore generates 488 samples, as illustrated in Figure 3. The augmented data was given as input to the feature extraction module. Five relevant band features, namely theta, alpha, sigma, beta, and gamma [41], were extracted from the EEG data using the Fast Fourier Transform (FFT), which converts the signal from the time domain to the frequency domain. The features were then split into training, testing, and evaluation samples in the ratio 70:20:10 and given to the deep learning models.
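The windowing arithmetic can be checked with a short sketch. It assumes that the 0.125 s figure is the shift between the starts of consecutive 2 s windows, which is the reading that comes closest to the reported sample count; the helper name `sliding_windows` is illustrative, and the exact boundary handling of the original procedure [40] is not reproduced here.

```python
def sliding_windows(n_samples, window, step):
    """Return (start, end) index pairs for every full window in the sequence."""
    return [(start, start + window)
            for start in range(0, n_samples - window + 1, step)]

FS = 128                 # sampling rate in Hz after downsampling
TRIAL_SECONDS = 63       # length of one DEAP trial
WINDOW = 2 * FS          # 2 s window = 256 samples
STEP = int(0.125 * FS)   # 0.125 s = 16 samples (assumed to be the window shift)

windows = sliding_windows(TRIAL_SECONDS * FS, WINDOW, STEP)
print(len(windows), windows[0], windows[-1])
```

Under these assumptions the sketch yields 489 full windows per trial, one more than the 488 reported, so the original procedure presumably treats the first or last window differently.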
Figure 3. Augmentation of EEG data
The hybrid model, whose parameters are given in Table 4, comprises two neural networks, namely CNN and BiLSTM. The extracted data is standardized and normalized to make it more suitable for classification. To improve the input of the BiLSTM network, CNN layers form the first two layers of the model. The CNN processes the signal by smoothing it and reducing the sequence length [42], so that the processed data rather than the original data is fed into the BiLSTM network. The BiLSTM trains on both the forward and the reversed sequence of EEG data [43]. The model classifies 4 classes based on valence and arousal. In both models, the CNN at the beginning of the model extracts spatial and temporal features of the time series data. It can detect essential information at different positions with excellent accuracy [44]. It also converts the univariate data into a multidimensional representation, i.e., it extracts multiple features from the EEG signal.
Table 4. Model Parameters of hybrid BiLSTM and CNN
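Parameters of Hybrid BiLSTM Model

| Layer | Filter/No. of Neurons | Kernel | Activation | Max Pooling | Dropout | Batch Normalization |
|---|---|---|---|---|---|---|
| Conv1D | 128 | 3 | Yes | 2 | 0.2 | - |
| Conv1D | 128 | 3 | Yes | 2 | 0.2 | - |
| BiLSTM | 256 | - | - | - | 0.2 | - |
| BiLSTM | 32 | - | - | - | 0.2 | - |
| Flatten | - | - | - | - | - | - |
| Dense | 128 | - | Yes | - | 0.2 | - |
| Dense | 4 | - | Softmax | - | - | - |

Parameters of CNN Model

| Layer | Filter/No. of Neurons | Kernel | Activation | Max Pooling | Dropout | Batch Normalization |
|---|---|---|---|---|---|---|
| Conv1D | 128 | 3 | Yes | 2 | - | Yes |
| Conv1D | 128 | 3 | Yes | 2 | - | Yes |
| Conv1D | 64 | 3 | Yes | 2 | - | No |
| Flatten | - | - | - | - | - | - |
| Dense | 64 | - | Yes | - | 0.2 | - |
| Dense | 32 | - | Yes | - | 0.2 | - |
| Dense | 16 | - | Yes | - | 0.2 | - |
| Dense | 4 | - | Softmax | - | - | - |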
The pooling layer shrinks large feature maps into smaller ones. To avoid overfitting and to decrease the complexity of the model, dropout layers are added. Dropout is a generalization technique in which neurons are dropped randomly during training. As a result, no neuron is favoured during feature selection, which makes the model learn different independent features [45]. At the output layer, softmax activation generates the output as a probability distribution over the classes [45], so each output probability satisfies $p \in[0,1]$. The corresponding loss is also called the log loss function and is mainly used in multiclass classification problems.
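The relation between the softmax output and the log loss can be illustrated with a small sketch. It assumes the standard definitions of softmax and of the log loss for a single sample with a known true class; the four logits below are made-up values standing in for the outputs of the final Dense layer with 4 neurons.

```python
import math

def softmax(logits):
    """Convert raw scores into probabilities that sum to 1."""
    shift = max(logits)                      # subtract the max for numerical stability
    exps = [math.exp(z - shift) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

def log_loss(probs, true_class):
    """Log loss (cross entropy) for one sample with a known true class index."""
    return -math.log(probs[true_class])

# Hypothetical logits for the four emotion classes (made-up numbers).
probs = softmax([2.0, 1.0, 0.1, -1.0])
loss = log_loss(probs, true_class=0)

print([round(p, 3) for p in probs], round(loss, 3))
```

Each probability lies between 0 and 1, the probabilities sum to 1, and the loss shrinks towards zero as the probability assigned to the true class approaches 1.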
The results are discussed with respect to the hypotheses stated earlier. Accordingly, the experiment was conducted to capture the gradients for the following cases:
1. ELU, ReLU and Leaky ReLU are used with Hybrid Model
2. ELU, ReLU and Leaky ReLU are used with CNN Model and Batch Normalization after each CNN layer
3. ELU, ReLU and Leaky ReLU are used with CNN Model without Batch Normalization
The respective gradients are shown in Figure 4.
Figure 4. Gradient plot of different activation function
Table 5. Performance measure (effectiveness) of different models
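| Model | Activation Function | Evaluation Accuracy (%) | Evaluation Loss |
|---|---|---|---|
| Hybrid | Leaky ReLU | 94.0 | 0.207 |
| Hybrid | ReLU | 92.5 | 0.31 |
| Hybrid | ELU | 94.3 | 0.209 |
| CNN with Batch Normalization | Leaky ReLU | 84.73 | 0.43017 |
| CNN with Batch Normalization | ReLU | 83.22 | 0.4957 |
| CNN with Batch Normalization | ELU | 86.15 | 0.3942 |
| CNN without Batch Normalization | Leaky ReLU | 85.66 | 0.4114 |
| CNN without Batch Normalization | ReLU | 82.30 | 0.48747 |
| CNN without Batch Normalization | ELU | 85.09 | 0.4189 |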
Table 6. Performance measure (efficiency) of different models
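| Model | Activation Function | Training Time in Sec/Epoch | Prediction Time in Sec | Convergence Epoch Number |
|---|---|---|---|---|
| Hybrid CNN + LSTM | Leaky ReLU | 247 | 1.82 | 120 |
| Hybrid CNN + LSTM | ReLU | 233 | 1.45 | 150 |
| Hybrid CNN + LSTM | ELU | 255 | 1.711 | 70 |
| CNN model with Batch Normalization | Leaky ReLU | 59 | 0.393 | 120 |
| CNN model with Batch Normalization | ReLU | 60 | 0.334 | 150 |
| CNN model with Batch Normalization | ELU | 62 | 0.471 | 70 |
| CNN model without Batch Normalization | Leaky ReLU | 32 | 0.301 | 120 |
| CNN model without Batch Normalization | ReLU | 30 | 0.303 | 130 |
| CNN model without Batch Normalization | ELU | 33 | 0.310 | 70 |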
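One way to read Table 6 is to combine the per-epoch training time with the convergence epoch into an approximate total training time. The sketch below does this under the assumption that every epoch takes the same time and that training stops at the listed convergence epoch. The dictionary name `table6` and the short model labels are illustrative; the numbers are copied from Table 6.

```python
# (seconds per epoch, convergence epoch) copied from Table 6
table6 = {
    ("Hybrid", "Leaky ReLU"): (247, 120),
    ("Hybrid", "ReLU"): (233, 150),
    ("Hybrid", "ELU"): (255, 70),
    ("CNN with BN", "Leaky ReLU"): (59, 120),
    ("CNN with BN", "ReLU"): (60, 150),
    ("CNN with BN", "ELU"): (62, 70),
    ("CNN without BN", "Leaky ReLU"): (32, 120),
    ("CNN without BN", "ReLU"): (30, 130),
    ("CNN without BN", "ELU"): (33, 70),
}

# Approximate total time to convergence in seconds (assumes constant epoch time).
total_seconds = {key: sec * epochs for key, (sec, epochs) in table6.items()}

for (model, activation), seconds in total_seconds.items():
    print(f"{model:15s} {activation:11s} {seconds:6d} s")
```

Under these assumptions, ELU needs the least total time to converge for all three models, even though its time per epoch is the highest.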
Figure 5. Accuracy of Hybrid BiLSTM model for various activation functions: (a) Leaky ReLU (b) ReLU (c) ELU
4.1 Hypothesis 1
It can be seen from the gradient plots in Figure 4 that, in the hybrid model, the ELU gradients lie further from zero (0.15 to 0.2), which is desirable, whereas the ReLU gradients lie closer to zero (-0.03 to 0.03), meaning that ReLU may lead to the vanishing gradient problem. Leaky ReLU has in-between values. As the vanishing gradient depends on the number of layers, the experiment was also conducted with the CNN model, which has 7 layers in total, as given in Table 4. The CNN was preferred because the LSTM is computationally more complex. Here, the gradient flow differs between the model with Batch Normalization and the model without it. ReLU performs well with Batch Normalization, i.e., the gradient distance from zero is doubled (0.015 to 0.03), whereas ELU behaves almost the same with and without Batch Normalization. The difference for ReLU is due to the normalization of the input to each layer performed by Batch Normalization. Leaky ReLU, which was designed to reduce the dying neuron problem, did not perform better with Batch Normalization. Also, for each activation function, the gradients are a little closer to zero than in the hybrid model, because of the increase in the number of layers. Even though the differences in gradient flow and other parameters are small, ELU outperforms the other activation functions in most cases, as shown in Table 5 and Table 6, i.e., ELU is the most effective activation function.
The model computation included both training and testing of the samples. The model was trained for 120 epochs for all the activation functions except ReLU, which converges at 150 epochs. Figure 5 depicts the accuracy of emotion classification for the various activation functions. It can be seen from Figure 5(c) that convergence is fast, attained in fewer than 50 epochs, owing to the improved gradient flow facilitated by the ELU activation.
It is observed from Table 5 that ELU reaches an accuracy of about 94%, i.e., ELU performs well for both positive and negative data. The negative values of ELU allow the mean activation to be pushed closer to zero with lower computational complexity. The resulting reduced bias shift brings the normal gradient closer to the unit natural gradient, which speeds up learning [9]. ReLU, on the other hand, becomes inactive and stops learning for negative inputs. However, the data contains fewer negative values than positive values, so the accuracy with ReLU is not affected much and is 92.5%.
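The mean shift argument can be illustrated numerically. The sketch below applies ReLU and ELU to zero mean, unit variance Gaussian inputs and compares the mean of the outputs. The Gaussian inputs, the sample size, the random seed and α = 1.0 are assumptions made for illustration; they are not the DEAP features or the settings used in the experiments.

```python
import math
import random

def relu(x):
    """ReLU: zero for negative inputs."""
    return max(0.0, x)

def elu(x, alpha=1.0):
    """ELU with an assumed alpha of 1.0."""
    return x if x >= 0 else alpha * (math.exp(x) - 1.0)

random.seed(0)  # fixed seed so the run is repeatable
inputs = [random.gauss(0.0, 1.0) for _ in range(10_000)]

mean_relu = sum(relu(x) for x in inputs) / len(inputs)
mean_elu = sum(elu(x) for x in inputs) / len(inputs)

print(f"mean ReLU output: {mean_relu:.3f}")
print(f"mean ELU output:  {mean_elu:.3f}")
```

For standard normal inputs, the expected mean output is about 0.40 for ReLU and about 0.16 for ELU, so the ELU activations sit noticeably closer to zero on average.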
4.2 Hypothesis 2
During model fitting, the learning algorithm updates the weights, which grow in size to capture the features of the training samples. A model with smaller weights is preferred to one with larger weights, as larger weights tend to capture the specifics of the given data rather than generalizing from it [46]. The weight plots in Figure 6 show that ELU performs better than the other two activation functions, whose weights keep growing until the last epoch and whose magnitudes are also larger than the ELU weights. However, the changes are minimal: the ReLU weights change from 0 to 0.075 and the Leaky ReLU weights from 0 to 0.08, whereas the ELU weights stay almost constant around 0.05. Hence ELU performs well compared to the other activation functions considered in this work.
Figure 6. Mean Weight plots of different activation functions
4.3 Hypothesis 3
Of the two DNN models, the hybrid BiLSTM showed better performance, and the statistical analysis was therefore carried out on it, with some additional combinations of activation functions included. This was done because no major difference was observed between plain ELU, ReLU and Leaky ReLU, as discussed earlier. The Friedman chi-square test was used to determine the statistical difference between the models with different activation functions. The results of the test are shown in Table 7; the p-value of 0.001518 is less than 0.05. Hence the null hypothesis can be rejected, and it is concluded that there are statistically significant differences between the models.
Table 7. Statistical test analysis results
| Quantity | Value |
|---|---|
| Friedman Test Statistic | 19.5488 |
| p-value | 0.001518 |
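The reported p-value can be cross-checked from the test statistic. The Friedman statistic is compared against a chi-square distribution with k - 1 degrees of freedom, where k is the number of compared models. The sketch below assumes k = 6 (the six activation configurations of the hybrid BiLSTM listed in Table 8), hence 5 degrees of freedom, and uses the closed form of the chi-square survival function for that case; the function name `chi2_sf_df5` is illustrative.

```python
import math

def chi2_sf_df5(x):
    """Survival function P(X > x) of a chi-square variable with 5 degrees of freedom."""
    tail = math.erfc(math.sqrt(x / 2.0))
    correction = (math.sqrt(2.0 / math.pi) * math.exp(-x / 2.0)
                  * (math.sqrt(x) + x ** 1.5 / 3.0))
    return tail + correction

statistic = 19.5488                 # Friedman test statistic from Table 7
p_value = chi2_sf_df5(statistic)    # assumes 6 compared models, i.e. 5 degrees of freedom

print(f"p-value = {p_value:.6f}")
print("reject null hypothesis at 0.05:", p_value < 0.05)
```

With 5 degrees of freedom the computed value agrees with the tabulated p-value to the reported precision, which supports the assumption that six models entered the test.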
After the null hypothesis was rejected by the Friedman test, the Dunn-Bonferroni test was carried out for pairwise comparisons, as is appropriate when multiple comparisons are made between groups. The test provides a p-value for each pairwise comparison. For Leaky ReLU, the p-values against model 2 and model 3 are 0.0106 and 0.0247, respectively. ELU also differs significantly from model 2 (p = 0.0205) and model 3 (p = 0.0445).
Table 8. Performance measure of the model for different activation functions
| Model No. | Model | Activation Function | Accuracy | Loss | Precision | Recall | F1-Score |
|---|---|---|---|---|---|---|---|
| Model 1 | Hybrid BiLSTM | Tanh + ReLU | 90.3% | 0.353 | 0.89 | 0.89 | 0.89 |
| Model 2 | Hybrid BiLSTM | ReLU + ELU | 89.3% | 0.374 | 0.88 | 0.88 | 0.88 |
| Model 3 | Hybrid BiLSTM | Tanh + ReLU + ELU | 89.6% | 0.384 | 0.89 | 0.88 | 0.88 |
| Model 4 | Hybrid BiLSTM | ReLU | 92.6% | 0.312 | 0.91 | 0.92 | 0.91 |
| Model 5 | Hybrid BiLSTM | Leaky ReLU | 94.0% | 0.207 | 0.93 | 0.93 | 0.94 |
| Model 6 | Hybrid BiLSTM | ELU | 94.3% | 0.209 | 0.94 | 0.93 | 0.93 |
Table 9. Summary results of hypothesis tests
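| Hypothesis | Comparative Performance: ReLU | Comparative Performance: Leaky ReLU | Comparative Performance: ELU |
|---|---|---|---|
| H1 | Fair | Fair | Good |
| H2 | Fair | Fair | Good |
| H3 | Fair | Fair | Good |
| Effectiveness (Accuracy and Loss Function) | Fair | Fair | Good |
| Efficiency | Good | Moderate | Moderate |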
The performance measures, namely accuracy, loss, precision, recall, and F1-score, of the activation functions tested on the hybrid BiLSTM are given in Table 8. Table 8 shows that the ELU activation function gives better emotion recognition than the other activation functions. ELU was therefore judged to be the best of the tested activation functions for EEG based applications. The summarized results of each hypothesis are given in Table 9.
With the hypothesis results and the effectiveness and efficiency criteria discussed above, ELU shows improved performance for most of the model combinations considered. However, it is not clear whether the improvement is due to the mean gradients, the mean weights, or any other parameter considered in this article. Hence, the performance improvement due to the activation functions is not fully explained by the results discussed here. Further experiments need to be conducted with different learning rates and different activation function parameters, such as α in Leaky ReLU and ELU, over a wide range of datasets.
A comparison of three different activation functions for emotion recognition has been carried out on a hybrid BiLSTM model and a CNN model. With the hybrid model, ELU gave an accuracy of 94.3%, compared with 94.0% for Leaky ReLU and 92.5% for ReLU. ELU is a pragmatic choice of activation function for datasets like DEAP, where the data is distributed over both positive and negative values. ELU performs better because of its ability to process negative inputs. It also gives fast convergence and avoids overfitting of data. However, there are no marked differences in the accuracies, and hence a much denser analysis, i.e., the activation function in combination with other tuning parameters, has to be done in order to truly appreciate the role of the activation function in any DNN.
[1] Yannakakis, G.N. (2018). Enhancing health care via affective computing. Malta Journal of Health Sciences, 5(1): 38-42. https://doi.org/10.14614/HEALTHCOMP/9/18
[2] Margaret, M.J., Banu, N.M. (2022). A survey on brain computer interface using EEG signals for emotion recognition. In AIP Conference Proceedings, 2518(1): 040002. https://doi.org/10.1063/5.0103476
[3] Rached, T.S., Perkusich, A. (2013). Emotion recognition based on brain-computer interface systems. Brain-Computer Interface Systems-Recent Progress and Future Prospects, 253-270. https://doi.org/10.5772/56227
[4] Banu, N.M., Sujithra, T., Cherian, S.M. (2021). Performance comparison of BCI speller stimuli design. Materials Today: Proceedings, 45: 2821-2827. https://doi.org/10.1016/j.matpr.2020.11.804
[5] Khanna, A., Gupta, D., Bhattacharyya, S., Hassanien, A. E., Anand, S., Jaiswal, A. (2021). International conference on innovative computing and communications. Proceedings of ICICC, 2. http://doi.org/10.1007/9789811630712
[6] Niemic, C.P. (2002). Studies of emotion: A theoretical and empirical review of psychophysiological studies of emotion. Journal of Undergraduate Research, 1(1): 15-18.
[7] Liu, H., Zhang, Y., Li, Y., Kong, X. (2021). Review on emotion recognition based on electroencephalography. Frontiers in Computational Neuroscience, 15: 758212. https://doi.org/10.3389/fncom.2021.758212
[8] Gustineli, M. (2022). A survey on recently proposed activation functions for Deep Learning. arXiv preprint arXiv:2204.02921. https://doi.org/10.31224/2245
[9] Clevert, D.A., Unterthiner, T., Hochreiter, S. (2015). Fast and accurate deep network learning by exponential linear units (elus). arXiv preprint arXiv:1511.07289. https://doi.org/10.48550/arXiv.1511.07289
[10] Dubey, S.R., Singh, S.K., Chaudhuri, B.B. (2022). Activation functions in deep learning: A comprehensive survey and benchmark. Neurocomputing, 503: 92-108. https://doi.org/10.1016/j.neucom.2022.06.111
[11] Keelawat, P., Thammasan, N., Numao, M., Kijsirikul, B. (2021). A comparative study of window size and channel arrangement on EEG-emotion recognition using deep CNN. Sensors, 21(5): 1678. https://doi.org/10.3390/s21051678
[12] Apicella, A., Isgro, F., Prevete, R. (2019). A simple and efficient architecture for trainable activation functions. Neurocomputing, 370: 1-15. https://doi.org/10.1016/j.neucom.2019.08.065
[13] Liu, X.Y., Xing, Q., Zhang, H., Chen, F. (2023). A Novel Activation Function of Deep Neural Network. Scientific Programming, 2023(1): 3873561. https://doi.org/10.1155/2023/3873561
[14] Madhu, G., Kautish, S., Alnowibet, K.A., Zawbaa, H.M., Mohamed, A.W. (2023). Nipuna: A novel optimizer activation function for deep neural networks. Axioms, 12(3): 246. https://doi.org/10.3390/axioms12030246
[15] Mastromichalakis, S. (2020). ALReLU: A different approach on Leaky ReLU activation function to improve Neural Networks Performance. arXiv preprint arXiv:2012.07564. http://arxiv.org/abs/2012.07564.
[16] Jebadurai, J., Jebadurai, I.J., Paulraj, G.J.L., Samuel, N.E. (2019). Super-resolution of digital images using CNN with leaky ReLU. Int. J. Recent Technol. Eng, 8(2): 210-212. https://doi.org/10.35940/ijrte.B1034.0982S1119
[17] Pan, B., Zheng, W. (2021). Emotion recognition based on EEG using generative adversarial nets and convolutional neural network. Computational and Mathematical Methods in Medicine, 2021(1): 2520394. https://doi.org/10.1155/2021/2520394
[18] Chao, H., Dong, L., Liu, Y., Lu, B. (2019). Emotion recognition from multiband EEG signals using CapsNet. Sensors, 19(9): 2212. https://doi.org/10.3390/s19092212
[19] Pandey, P., Seeja, K.R. (2022). Subject independent emotion recognition from EEG using VMD and deep learning. Journal of King Saud University-Computer and Information Sciences, 34(5): 1730-1738. https://doi.org/10.1016/j.jksuci.2019.11.003
[20] Li, Y., Huang, J., Zhou, H., Zhong, N. (2017). Human emotion recognition with electroencephalographic multidimensional features by hybrid deep neural networks. Applied Sciences, 7(10): 1060. https://doi.org/10.3390/app7101060
[21] Garg, D., Verma, G.K. (2020). Emotion recognition in valence-arousal space from multichannel EEG data and wavelet based deep learning framework. Procedia Computer Science, 171: 857-867. https://doi.org/10.1016/j.procs.2020.04.093
[22] Asghar, M.A., Khan, M.J., Fawad, X., Amin, Y., Rizwan, M., Rahman, M., Badnava, S., Mirjavadi, S.S. (2019). EEG-based multimodal emotion recognition using bag of deep features: An optimal feature selection approach. Sensors, 19(23): 5218. https://doi.org/10.3390/s19235218
[23] Xing, X., Li, Z., Xu, T., Shu, L., Hu, B., Xu, X. (2019). SAE+LSTM: A new framework for emotion recognition from multichannel EEG. Frontiers in Neurorobotics, 13: 37. https://doi.org/10.3389/fnbot.2019.00037
[24] Ozdemir, M.A., Degirmenci, M., Izci, E., Akan, A. (2021). EEG-based emotion recognition with deep convolutional neural networks. Biomedical Engineering/Biomedizinische Technik, 66(1): 43-57. https://doi.org/10.1515/bmt20190306
[25] Acharya, D., Jain, R., Panigrahi, S.S., Sahni, R., Jain, S., Deshmukh, S.P., Bhardwaj, A. (2021). Multiclass emotion classification using EEG signals. In Advanced Computing: 10th International Conference, IACC 2020, Panaji, Goa, India, pp. 474-491. https://doi.org/10.1007/9789811604010_38
[26] Yang, Y., Wu, Q., Fu, Y., Chen, X. (2018). Continuous convolutional neural network with 3D input for EEG-based emotion recognition. In Neural Information Processing: 25th International Conference, ICONIP 2018, Siem Reap, Cambodia, pp. 433-443. https://doi.org/10.1007/9783030042394_39
[27] Devi, T., Deepa, N. (2021). A novel intervention method for aspect-based emotion using exponential linear unit (ELU) activation function in a deep neural network. In 2021 5th International Conference on Intelligent Computing and Control Systems (ICICCS), Madurai, India, pp. 1671-1675. https://doi.org/10.1109/ICICCS51141.2021.9432223
[28] Bejjagam, L., Chakradhara, R. (2022). Facial emotion recognition using convolutional neural network with multiclass classification and Bayesian optimization for hyper parameter tuning.
[29] Le, T.D.T., Van, L.T., Hong, Q.N. (2020). Deep convolutional neural networks for emotion recognition of Vietnamese. International Journal of Machine Learning and Computing, 10(5): 692-699. https://doi.org/10.18178/ijmlc.2020.10.5.992
[30] Abdelwahab, M., Busso, C. (2018). Study of dense network approaches for speech emotion recognition. In 2018 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Calgary, AB, Canada, pp. 5084-5088. https://doi.org/10.1109/ICASSP.2018.8461866
[31] Schirrmeister, R.T., Springenberg, J.T., Fiederer, L.D.J., Glasstetter, M., Eggensperger, K., Tangermann, M., Hutter, F., Burgard, W., Ball, T. (2017). Deep learning with convolutional neural networks for EEG decoding and visualization. Human Brain Mapping, 38(11): 5391-5420. https://doi.org/10.1002/hbm.23730
[32] Nguyen, A., Pham, K., Ngo, D., Ngo, T., Pham, L. (2021). An analysis of state-of-the-art activation functions for supervised deep neural network. In 2021 International Conference on System Science and Engineering (ICSSE), Ho Chi Minh City, Vietnam, pp. 215-220. https://doi.org/10.1109/ICSSE52999.2021.9538437
[33] Liu, X.Y., Xing, Q., Zhang, H., Chen, F. (2023). A Novel Activation Function of Deep Neural Network. Scientific Programming, 2023(1): 3873561. https://doi.org/10.1155/2023/3873561
[34] León, J., Escobar, J.J., Ortiz, A., Ortega, J., González, J., Martín-Smith, P., Gan, J.Q., Damas, M. (2020). Deep learning for EEG-based Motor Imagery classification: Accuracy-cost trade-off. Plos One, 15(6): e0234178. https://doi.org/10.1371/journal.pone.0234178
[35] Farahat, A., Reichert, C., Sweeney-Reed, C.M., Hinrichs, H. (2019). Convolutional neural networks for decoding of covert attention focus and saliency maps for EEG feature visualization. Journal of Neural Engineering, 16(6): 066010. https://doi.org/10.1088/17412552/ab3bb4
[36] Bai, Y. (2022). RELU-function and derived function review. In SHS Web of Conferences, 144: 02006. https://doi.org/10.1051/shsconf/202214402006
[37] Jang, S., Moon, S.E., Lee, J.S. (2021). EEG-based emotional video classification via learning connectivity structure. IEEE Transactions on Affective Computing, 14(2): 1586-1597. https://doi.org/10.1109/TAFFC.2021.3126263
[38] Liang, Z., Zhou, R., Zhang, L., Li, L., Huang, G., Zhang, Z., Ishii, S. (2021). EEGFuseNet: Hybrid unsupervised deep feature characterization and fusion for high-dimensional EEG with an application to emotion recognition. IEEE Transactions on Neural Systems and Rehabilitation Engineering, 29: 1913-1925. https://doi.org/10.1109/TNSRE.2021.3111689
[39] Koelstra, S., Muhl, C., Soleymani, M., Lee, J.S., Yazdani, A., Ebrahimi, T., Pun, T., Nijholt, A., Patras, I. (2011). DEAP: A database for emotion analysis; using physiological signals. IEEE Transactions on Affective Computing, 3(1): 18-31. https://doi.org/10.1109/TAFFC.2011.15
[40] Jehosheba Margaret, M., Masoodhu Banu, N.M. (2023). Performance analysis of EEG based emotion recognition using deep learning models. Brain-Computer Interfaces, 10(2-4): 79-98. https://doi.org/10.1080/2326263X.2023.2206292
[41] Al-Fahoum, A.S., Al-Fraihat, A.A. (2014). Methods of EEG signal features extraction using linear analysis in frequency and time-frequency domains. International Scholarly Research Notices, 2014(1): 730218. http://doi.org/10.1155/2014/730218
[42] Hu, Z., Chen, L., Luo, Y., Zhou, J. (2022). EEG-based emotion recognition using convolutional recurrent neural network with multi-head self-attention. Applied Sciences, 12(21): 11255. https://doi.org/10.3390/app122111255
[43] Yang, J., Huang, X., Wu, H., Yang, X. (2020). EEG-based emotion classification based on bidirectional long short-term memory network. Procedia Computer Science, 174: 491-504. https://doi.org/10.1016/j.procs.2020.06.117
[44] Rhanoui, M., Mikram, M., Yousfi, S., Barzali, S. (2019). A CNN-BiLSTM model for document-level sentiment analysis. Machine Learning and Knowledge Extraction, 1(3): 832-847. https://doi.org/10.3390/make1030048
[45] Alzubaidi, L., Zhang, J., Humaidi, A.J., Al-Dujaili, A., Duan, Y., Al-Shamma, O., Santamaría, J., Fadhel, M.A., Al-Amidie, M., Farhan, L. (2021). Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data, 8: 1-74. https://doi.org/10.1186/s40537021004448
[46] Wang, H., Czerminski, R., Jamieson, A.C. (2021). Neural Networks and Deep Learning. Einhorn, M., Löffler, M., de Bellis, E., Herrmann, A., Burghartz, P. (Ed.) The Machine Age of Customer Insight, Emerald Publishing Limited, Leeds, pp. 91-101. https://doi.org/10.1108/978183909694520211010