© 2024 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
OPEN ACCESS
The evaluation of audiometric tests, which assess an individual's ability to perceive various sounds and frequencies, is crucial for diagnosing and monitoring hearing loss. This study aims to evaluate the effects of the audiological testing process on individuals by classifying their galvanic skin response (GSR) with a one-dimensional convolutional neural network (1D-CNN). GSR, which reflects physiological changes due to psychological states such as stress and relaxation, was measured during audiological tests to distinguish between resting and active states. Various transformations of the GSR data were applied to the 1D-CNN input to determine the most effective method in classification. The results demonstrate that GSR data, when processed through 1D-CNN, can reliably reflect the physiological and emotional impacts of audiological testing on individuals. This approach provides a novel method for enhancing the understanding of the audiological test experience through objective physiological measures.
audiological test, convolutional neural network (CNN), Fourier transform, galvanic skin response (GSR)
GSR measures changes in the electrical conductivity of the skin in response to stimuli. When individuals experience an emotional event, their sweat glands become more active and the skin becomes more conductive to electricity. These changes in conductivity are small but measurable.
Audiometric tests measure a person's ability to hear different pitches, sounds, or frequencies, and they allow hearing thresholds to be determined. These tests provide relevant data for detecting hearing loss, including noise-related loss, from birth to old age [1].
This study aims to determine the effects of the audiological testing process on individuals from galvanic skin response using deep learning. Studies in different fields have sought to determine emotional changes by classifying GSR data [2-7].
Sun et al. [2] designed a method that generates a signal spectrogram from GSR and applies a joint convolutional neural network-long short-term memory (CNN-LSTM) learning model for sentiment classification. Using the GSR channel of the MDSTC data set, which contains six types of emotion labels for 100 volunteers, they performed human sentiment analysis.
Gündoğdu et al. [3] recorded GSR and heart rate variability (HRV) data from volunteers while resting and playing, and extracted a total of nine features, four from GSR and five from HRV. They evaluated classification success on subsets formed from different combinations of these features and achieved the highest success rate with the average and maximum value features extracted from GSR.
Seo et al. [4] sought a suitable classification algorithm for boredom. They classified boredom using features from GSR and electroencephalogram (EEG) data, training and testing thirty models based on different machine learning algorithms. They reported that the multilayer perceptron model performed best, with an average accuracy of 79.98%.
Sharma and Gedeon [5] systematically investigated the classification of stress during reading for men and women based on an ANN (artificial neural network) model. They experimented with stressful and non-stressful reading as stimuli, stating that reading stress was significantly different (p < 0.01) in men compared to women.
Arsalan et al. [6] extracted time-domain features, namely kurtosis, entropy, standard deviation, variance, and mean absolute rate, from GSR, photoplethysmography (PPG), and EEG data recorded from 28 volunteers sitting with their eyes open. Different machine learning classifiers were used to classify stress, and the best accuracy of 75% was obtained with the multi-layer perceptron (MLP) classifier.
Arsalan and Majid [7] recorded GSR, PPG, and EEG signals at rest and during public speaking. They extracted time-domain features from the GSR and PPG signals and frequency-domain features from the EEG signals. The highest accuracy of 96.25% was achieved using a support vector machine (SVM) classifier with a radial basis function kernel.
Audiological tests are important for correctly diagnosing hearing health and hearing loss in individuals. Their results are used to understand the cause of an individual's hearing problem and to determine appropriate treatment options. Today these tests are applied frequently, both to individuals presenting with hearing problems and to people who want to certify that they have no hearing-related problems. In some settings, annual certification of hearing thresholds is required for people working in noisy environments, and most of this testing is performed in the field or on-site, often in mobile vans or single-wall sound booths [8]. It is therefore important to determine the emotional effects that the audiological testing process, as well as the testing rooms and areas, has on individuals. GSR analysis is frequently used in different application areas to detect and evaluate conditions such as stress. This study, which distinguishes the audiological testing process from the resting state by applying a 1D-CNN to GSR data, is a pioneering deep learning-based effort to determine the emotional impact of the audiological testing process on individuals.
The effects of the audiological testing process on individuals were determined by classifying GSR data. The methods used are detailed in the following section.
2.1 Collection of data
The content of the study was explained to the participating volunteers, who then completed written informed consent forms. The study included people of both genders between the ages of 18 and 50 with normal hearing [9].
Participants first underwent a clinical ear, nose, and throat (ENT) examination, which included an external ear examination to exclude earwax, debris, discharge, polyps, and hearing loss due to perforation of the eardrum. After the otoscopic examination, audiological evaluations were performed.
Thirty-eight healthy individuals (26 women and 12 men) in the specified age range were included in this study. GSR data were recorded from the volunteers at rest and during the audiological test process in the specially insulated room where audiological tests are carried out in the Audiology Unit of the Akdeniz University Hospital Ear Nose and Throat Department. The demographic characteristics of the volunteers are shown in Table 1.
Table 1. Demographic characteristics of the volunteers
| | | The Number of Participants |
|---|---|---|
| Gender | Female | 26 |
| | Male | 12 |
| | Total | 38 |
| Age | 18-30 | 20 |
| | 31-50 | 18 |
An image of data being recorded from a volunteer in the insulated room where the audiological tests were carried out is shown in Figure 1.
The audiological evaluation was performed in a soundproof room using TDH 50 (Telephonics, USA) headphones together with the "Grason-Stadler" GSI Audio Star Pro clinical audiometer system. At this stage, air conduction and speech thresholds were determined, and a speech discrimination test was applied. Additionally, bone conduction thresholds were determined using the "Radioear" B-71 bone vibrator.
GSR from volunteers was recorded using the NeuLog GSR logger sensor NUL-217, shown in Figure 2. In the study, GSR data was recorded at 10 samples per second. Electrodes were placed on the participant's ring and index fingers.
Figure 1. Sample data collection image from a volunteer in the audiometric testing room
Figure 2. GSR sensor and USB connection module
2.2 GSR
GSR results from autonomic activation of sweat glands in the skin, and when individuals are emotionally aroused, GSR data show distinct patterns that can be seen with the naked eye and measured statistically. This response is caused by physiological changes in the individual, and variations in the psychological state of the individual, such as relaxation and stress [10]. The electrical resistance of the skin fluctuates rapidly with mental, physical, or emotional stimulation [11].
The GSR is a physiological reflection of the body's reactions to, for example, excitement or stress. When the individual is excited, the body sweats and the amount of salt on the skin increases; this lowers the electrical resistance of the skin and increases the current passing through it. The result is a measurable change in electrical conductivity, which can be detected by electrodes attached to two of the person's fingers [12]. The electrodes consist of metal plates that apply a safe low voltage when in contact with the skin. Constant contact between the electrodes and the person's fingers is ensured by a sheath [12]. Figure 3 shows the GSR changes of a volunteer participating in the study during the rest and audiological test periods.
Figure 3. GSR changes of a volunteer during a) rest, b) test phase
2.3 1D-CNN
Today, deep learning offers very successful results in many applications, such as object recognition, object detection, anomaly detection, and emotion recognition [13-16]. Traditional CNNs are designed to work on 2-D data, such as images. 1-D CNNs, a modified version of 2-D CNNs, were developed as an alternative for one-dimensional signals. In CNN structures, feature extraction and classification are combined in a single process so that they can be optimized jointly to maximize classification accuracy [17].
There are various application areas where 1-D CNNs are preferred over 2-D CNN structures. 1-D CNNs offer very successful results in applications such as real-time electrocardiogram (ECG) monitoring, speech recognition, and different EEG applications.
The 1D-CNN configuration proposed in the study is as follows:
The input layer of the proposed 1D-CNN consists of 600 nodes for the GSR data, which corresponds to the length of each GSR record. The feature extraction part includes a convolution layer, a ReLU activation layer, and a normalization layer. These are followed by a fully connected layer, a softmax layer, and a classification layer. The number of nodes in the input layer changes when transformed GSR data are used; for the fast Walsh-Hadamard transform, the input layer has 1024 nodes. The other layers of the 1D-CNN are unchanged when transformed data are used.
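The layer sequence described above can be illustrated with a minimal forward pass written in plain Python. This is only a sketch under stated assumptions, not the network used in the study: the weights are random and untrained, the normalization is a simple per-feature-map standardization placed after the ReLU, and the names `init_params` and `forward` are hypothetical. The defaults of 8 filters of size 5 follow the values reported for the experiments.

```python
import math
import random

def conv1d(signal, filters, bias):
    """Valid 1-D convolution (cross-correlation form): one output row per filter."""
    k = len(filters[0])
    out_len = len(signal) - k + 1
    return [
        [sum(w[j] * signal[i + j] for j in range(k)) + b for i in range(out_len)]
        for w, b in zip(filters, bias)
    ]

def relu(maps):
    """Element-wise rectified linear unit."""
    return [[max(0.0, v) for v in row] for row in maps]

def normalize(maps, eps=1e-5):
    """Standardize each feature map (assumed stand-in for the normalization layer)."""
    out = []
    for row in maps:
        mean = sum(row) / len(row)
        var = sum((v - mean) ** 2 for v in row) / len(row)
        out.append([(v - mean) / math.sqrt(var + eps) for v in row])
    return out

def dense(features, weights, bias):
    """Fully connected layer: one logit per class."""
    return [sum(w * f for w, f in zip(row, features)) + b
            for row, b in zip(weights, bias)]

def softmax(logits):
    """Numerically stable softmax."""
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

def init_params(input_len=600, n_filters=8, filter_size=5, n_classes=2, seed=0):
    """Random, untrained parameters (hypothetical initialization)."""
    rng = random.Random(seed)
    n_features = n_filters * (input_len - filter_size + 1)
    return {
        "filters": [[rng.gauss(0, 0.1) for _ in range(filter_size)]
                    for _ in range(n_filters)],
        "conv_bias": [0.0] * n_filters,
        "fc_weights": [[rng.gauss(0, 0.01) for _ in range(n_features)]
                       for _ in range(n_classes)],
        "fc_bias": [0.0] * n_classes,
    }

def forward(signal, params):
    """Convolution -> ReLU -> normalization -> fully connected -> softmax."""
    maps = normalize(relu(conv1d(signal, params["filters"], params["conv_bias"])))
    flat = [v for row in maps for v in row]
    return softmax(dense(flat, params["fc_weights"], params["fc_bias"]))

params = init_params()
gsr_record = [math.sin(0.01 * n) for n in range(600)]  # synthetic stand-in, not real GSR
probs = forward(gsr_record, params)  # class probabilities for (rest, test)
```

Because the weights are untrained, `probs` carries no diagnostic meaning; the sketch only shows how a 600-sample record flows through the layers to two class probabilities.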
2.4 The Goertzel Algorithm
The Goertzel algorithm, introduced by Gerald Goertzel, is a digital signal processing method that computes a small number of selected frequency components of a signal. It is one of the techniques used to monitor and detect a single chosen frequency in the input signal with minimal computation [18-23].
The Goertzel algorithm efficiently computes individual terms of the discrete Fourier transform without calculating the entire spectrum, and it is used especially to detect the presence and strength of a particular frequency component in a signal [24]. Because of this property and its suitability for real-time applications, this transform was applied to the GSR data and its effect on classification success was examined.
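As an illustration of how a single DFT term can be obtained without computing the full spectrum, the following sketch implements the standard second-order Goertzel recursion in plain Python and checks it against a direct DFT sum. The function names and the synthetic test tone are assumptions made for this example; the study does not specify its implementation.

```python
import cmath
import math

def goertzel(x, k):
    """Return the DFT term X[k] of sequence x via the second-order Goertzel recursion."""
    n = len(x)
    w = 2.0 * math.pi * k / n
    coeff = 2.0 * math.cos(w)
    s1 = 0.0  # s[m-1]
    s2 = 0.0  # s[m-2]
    for sample in x:
        s0 = sample + coeff * s1 - s2  # s[m] = x[m] + 2cos(w) s[m-1] - s[m-2]
        s2, s1 = s1, s0
    # After the loop s1 = s[N-1] and s2 = s[N-2]; X[k] = e^{jw} s[N-1] - s[N-2]
    return cmath.exp(1j * w) * s1 - s2

def dft_bin(x, k):
    """Direct evaluation of the same DFT term, used only as a reference."""
    n = len(x)
    return sum(v * cmath.exp(-2j * math.pi * k * i / n) for i, v in enumerate(x))

n = 64
tone = [math.cos(2 * math.pi * 3 * i / n) for i in range(n)]  # synthetic tone at bin 3
bin3 = goertzel(tone, 3)  # strong component at the tone's bin
bin5 = goertzel(tone, 5)  # essentially zero away from the tone
```

The recursion needs one real multiplication per sample for each requested frequency, which is why it is attractive when only a few components are of interest. Applying it to every index `k` reproduces the full DFT.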
2.5 Fast Walsh-Hadamard transform
The fast Walsh-Hadamard transform is used in signal processing applications and belongs to a generalized class of Fourier transforms. It is an orthogonal and lossless transform. Its basis functions (Walsh-Hadamard functions), which provide both time and frequency information, take only two values, +1 and -1 [25, 26].
Figure 4. Fast Walsh-Hadamard transform variations of GSR of a volunteer during a) resting phase, b) test phase
The Walsh-Hadamard transform is a mathematical operation with a wide range of applications, particularly in digital signal processing and data compression. Because it can be computed using only addition and subtraction operations, it has gained importance in various digital signal processing applications [27]. Given its computational efficiency and analytical convenience, this transform was applied to the GSR data and its effect on classification success was examined; the highest test success was achieved with this transform.
Figure 4 shows the fast Walsh-Hadamard transform of the GSR of a volunteer participating in the study during the rest and audiological test periods.
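The property that the transform needs only additions and subtractions can be seen in a short butterfly implementation. The sketch below is an assumption-laden illustration rather than the study's code: it returns coefficients in natural (Hadamard) order without scaling, and it zero-pads the input to the next power of two, which is one plausible reason a 600-sample record would lead to a 1024-node input. Ordering and scaling conventions differ between implementations.

```python
def next_pow2(n):
    """Smallest power of two that is >= n."""
    p = 1
    while p < n:
        p *= 2
    return p

def fwht(x):
    """Unnormalized fast Walsh-Hadamard transform in natural (Hadamard) order.

    The input is zero-padded to the next power of two (an assumption of this sketch).
    """
    n = next_pow2(len(x))
    a = list(x) + [0.0] * (n - len(x))
    h = 1
    while h < n:
        for start in range(0, n, 2 * h):
            for i in range(start, start + h):
                u, v = a[i], a[i + h]
                a[i], a[i + h] = u + v, u - v  # butterfly: additions and subtractions only
        h *= 2
    return a

coeffs = fwht([1, 0, 1, 0, 0, 1, 1, 0])
padded_len = len(fwht([0.0] * 600))  # length after zero-padding a 600-sample record
```

Applying `fwht` twice and dividing by the length recovers the (padded) input, which reflects the lossless, orthogonal nature of the transform.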
In the study, GSR data from 27 of the 38 volunteers were used in the network training phase, and data from the remaining 11 were used in the testing phase. The data acquired from the volunteers during the resting and audiological test phases were applied to the input of the network with the same length. With a filter size of five in the proposed deep learning model, the training and test success rates obtained for different numbers of filters are presented in Table 2.
Table 2. Average training and testing success rates with cross-validation for different filter numbers in CNN architecture
| Number of Filters | Average Training Accuracy | Average Test Accuracy |
|---|---|---|
| 8 | 85.1% | 75.7% |
| 16 | 88.2% | 74.2% |
| 32 | 87.0% | 74.2% |
Within the scope of the study, cross-validation was carried out for the case where raw GSR data were input: three different sets of 11 volunteers out of the total of 38 were used in the testing phase, with the remaining 27 used in the training phase each time.
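The splitting scheme can be sketched as follows. How the three test sets were chosen is not stated, so this example assumes they are disjoint and drawn from a seeded random shuffle of volunteer indices; the function name `volunteer_splits` is hypothetical.

```python
import random

def volunteer_splits(n_volunteers=38, n_test=11, n_folds=3, seed=0):
    """Return (train_ids, test_ids) pairs with disjoint test sets (assumed scheme)."""
    ids = list(range(n_volunteers))
    random.Random(seed).shuffle(ids)
    splits = []
    for fold in range(n_folds):
        test_ids = sorted(ids[fold * n_test:(fold + 1) * n_test])
        held_out = set(test_ids)
        # Everything not held out for this fold is used for training
        train_ids = [v for v in range(n_volunteers) if v not in held_out]
        splits.append((train_ids, test_ids))
    return splits

splits = volunteer_splits()
sizes = [(len(train_ids), len(test_ids)) for train_ids, test_ids in splits]
# sizes == [(27, 11), (27, 11), (27, 11)]
```

Splitting by volunteer rather than by individual record keeps both recordings of a person on the same side of the split, so the test accuracy reflects performance on unseen individuals.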
Figure 5. Confusion matrix for the highest test accuracy
Training and test success were evaluated for different solver types and filter sizes, and the highest success was achieved with the "Adam" solver. It was also observed that changing the filter size did not affect the classification success. As Table 2 shows, the average training accuracy with cross-validation increases when the number of filters is increased from eight to sixteen, whereas the average test accuracy is highest with eight filters. Figure 5 shows the confusion matrix for the test data in the case where the highest test success is achieved.
The study also examined the effects of applying different transformations of the GSR data to the input of the 1D-CNN structure. The accuracy results obtained for the Fourier transform, the discrete Fourier transform with the second-order Goertzel algorithm, and the fast Walsh-Hadamard transform are presented in Table 3. As Table 3 shows, the training success for both the Fourier transform and the discrete Fourier transform with the second-order Goertzel algorithm is 100%, while their test success is considerably lower (54.5% and 68.1%, respectively). The highest test success, 77.2%, was achieved with the fast Walsh-Hadamard transform. Figure 6 shows the confusion matrix of the test data for the fast Walsh-Hadamard transform in the case where the highest test success is achieved.
The columns of the confusion matrix represent the classes predicted by the neural network, while the rows represent the actual classes. This representation helps evaluate how the network classifies each class. When the average test successes are examined, the network does not appear to be excessively sensitive to either class. Likewise, the confusion matrix for the test data of the fast Walsh-Hadamard transform given in Figure 6 shows no significant oversensitivity to one class.
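The row and column convention described above, together with the accuracy and per-class checks derived from it, can be sketched in a few lines. The labels below are hypothetical and chosen only to make the example easy to follow; they are not the study's results.

```python
def confusion_matrix(actual, predicted, classes=("rest", "test")):
    """Rows are actual classes, columns are predicted classes."""
    index = {c: i for i, c in enumerate(classes)}
    matrix = [[0] * len(classes) for _ in classes]
    for a, p in zip(actual, predicted):
        matrix[index[a]][index[p]] += 1
    return matrix

def accuracy(matrix):
    """Share of records on the main diagonal (correct predictions)."""
    correct = sum(matrix[i][i] for i in range(len(matrix)))
    total = sum(sum(row) for row in matrix)
    return correct / total

def per_class_recall(matrix):
    """Fraction of each actual class that was predicted correctly."""
    return [row[i] / sum(row) if sum(row) else 0.0 for i, row in enumerate(matrix)]

# Hypothetical labels for illustration only
actual = ["rest"] * 4 + ["test"] * 4
predicted = ["rest", "rest", "rest", "test", "test", "test", "rest", "test"]

cm = confusion_matrix(actual, predicted)  # [[3, 1], [1, 3]]
acc = accuracy(cm)                        # 0.75
recall = per_class_recall(cm)             # [0.75, 0.75]
```

Similar per-class recall values, as in this toy case, correspond to the absence of oversensitivity toward one class. As a plausibility check under the assumption that each of the 11 test volunteers contributes one resting and one test-phase record (22 records in total), 17 correct predictions would give 17/22 ≈ 77.3%, close to the reported 77.2%.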
Table 3. Accuracy results obtained for different transformations of GSR data
| Transform | Training Accuracy | Test Accuracy |
|---|---|---|
| Fourier transform | 100% | 54.5% |
| Discrete Fourier transform with second-order Goertzel algorithm | 100% | 68.1% |
| Fast Walsh-Hadamard transform | 92.5% | 77.2% |
Figure 6. Confusion matrix for test data of the fast Walsh-Hadamard transform
This study aimed to determine the effects of the audiological testing process on GSR changes in individuals using a 1D-CNN. GSR reflects physiological changes caused by variations in the psychological state of individuals, such as relaxation and stress. The effect of the audiological test process was determined by classifying GSR data recorded from the volunteers in the resting state and during the audiological test. The highest test success with cross-validation was achieved when the number of filters in the CNN structure was eight. The effects of applying different transformations of the GSR data, namely the Fourier transform, the discrete Fourier transform with the second-order Goertzel algorithm, and the fast Walsh-Hadamard transform, to the input of the 1D-CNN structure were also examined. Among these, the highest test success was achieved with the fast Walsh-Hadamard transform. The results obtained using the CNN show that the emotional variations created by the audiological testing process in individuals can be evaluated using GSR data.
The proposed approach can be used especially to evaluate the stress that the audiological testing process and closed test rooms or areas will create on individuals and whether this stress will affect the test result. Thus, it can help plan the testing process and testing environment in a way that will have less impact on individuals.
In the literature, two studies were found in which GSR changes were analyzed to evaluate speech intelligibility, the decrease in the individual's pronunciation, and speech dissatisfaction, including oral and dental problems [28, 29]. Additionally, the studies [28, 29] evaluated the effects of mobile phone use and the audiological testing process on GSR changes in healthy individuals. In this study, the 1D-CNN approach was used in the GSR-based evaluation of the audiological testing process. The results show that GSR data change distinctively during the audiological testing process, so the physiological effects of the testing process on individuals can be evaluated from these data.
We would like to thank the staff of the Audiology Unit of the Akdeniz University Hospital, and also the volunteers who participated in the study.
The study was approved by the Medical Ethical Committee of the Akdeniz University and the experiment was undertaken in compliance with national legislation and the Declaration of Helsinki.
[1] Genç, G.A., Belgin, E. (2004). Kulak burun boğaz hastalıkları ve baş-boyun cerrahisi. Güneş Kitabevi, pp. 73-88.
[2] Sun, X., Hong, T., Li, C., Ren, F. (2019). Hybrid spatiotemporal models for sentiment classification via galvanic skin response. Neurocomputing, 358: 385-400. https://doi.org/10.1016/j.neucom.2019.05.061
[3] Gündoğdu, S., Çolak, Ö.H., Polat, Ö. (2022). Determination of effective feature subset for puzzle video game players from physiological signals. In 2022 Innovations in Intelligent Systems and Applications Conference (ASYU), Antalya, Turkey, pp. 1-4. https://doi.org/10.1109/ASYU56188.2022.9925515
[4] Seo, J., Laine, T.H., Sohn, K.A. (2019). An exploration of machine learning methods for robust boredom classification using EEG and GSR data. Sensors, 19(20): 4561. https://doi.org/10.3390/s19204561
[5] Sharma, N., Gedeon, T. (2011). Stress classification for gender bias in reading. In: Lu, BL., Zhang, L., Kwok, J. (eds) Neural Information Processing. ICONIP 2011. Lecture Notes in Computer Science, Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-24965-5_39
[6] Arsalan, A., Majid, M., Anwar, S.M., Bagci, U. (2019). Classification of perceived human stress using physiological signals. In 2019 41st Annual International Conference of the IEEE Engineering in Medicine and Biology Society (EMBC), Berlin, Germany, pp. 1247-1250. https://doi.org/10.1109/EMBC.2019.8856377
[7] Arsalan, A., Majid, M. (2021). Human stress classification during public speaking using physiological signals. Computers in Biology and Medicine, 133: 104377. https://doi.org/10.1016/j.compbiomed.2021.104377
[8] Chung, K. (2023). Calibration matters: II. Measurement of ambient noise in test rooms/areas. Journal of Communication Disorders, 101: 106293. https://doi.org/10.1016/j.jcomdis.2022.106293
[9] Özdinç Polat, L.N. (2023). Evaluation of the effect of electromagnetic waves on speech intelligibility from electrophysiological signals. Akdeniz University, Graduate School of Natural and Applied Sciences, Electrical and Electronics Engineering, PhD Thesis.
[10] Gokay, R., Masazade, E., Aydin, C., Erol-Barkana, D. (2015). Emotional state and cognitive load analysis using features from BVP and SC sensors. In 2015 IEEE International Conference on Multisensor Fusion and Integration for Intelligent Systems (MFI), San Diego, CA, USA, pp. 178-183. https://doi.org/10.1109/MFI.2015.7295805
[11] Westeyn, T., Presti, P., Starner, T. (2006). ActionGSR: A combination galvanic skin response-accelerometer for physiological measurements in active environments. In 2006 10th IEEE International Symposium on Wearable Computers, Montreux, Switzerland, pp. 129-130. https://doi.org/10.1109/ISWC.2006.286360
[12] Beyaz, A., Beyaz, R. (2015). Determination of galvanic skin response sensor usage possibility for tractor safety purposes. Suleyman Demirel University Journal of Engineering Sciences and Design, 3(3): 121-125.
[13] Yang, H., Meng, C., Wang, C. (2020). Data-driven feature extraction for analog circuit fault diagnosis using 1-D convolutional neural network. IEEE Access, 8: 18305-18315. https://doi.org/10.1109/ACCESS.2020.2968744
[14] Lu, W., Lan, C., Niu, C., Liu, W., Lyu, L., Shi, Q., Wang, S. (2023). A CNN-transformer hybrid model based on CSW in transformer for UAV image object detection. IEEE Journal of Selected Topics in Applied Earth Observations and Remote Sensing, 16: 1211-1231. https://doi.org/10.1109/JSTARS.2023.3234161
[15] Nizam, H., Zafar, S., Lv, Z., Wang, F., Hu, X. (2022). Real-time deep anomaly detection framework for multivariate time-series data in industrial IoT. IEEE Sensors Journal, 22(23): 22836-22849. https://doi.org/10.1109/JSEN.2022.3211874
[16] Karamipour, A., Sadati, S.H. (2023). Tactile object recognition using fluid-type sensor and deep learning. IEEE Sensors Letters, 7(9): 1-4. https://doi.org/10.1109/LSENS.2023.3303077
[17] Kiranyaz, S., Avci, O., Abdeljaber, O., Ince, T., Gabbouj, M., Inman, D.J. (2021). 1D convolutional neural networks and applications: A survey. Mechanical Systems and Signal Processing, 151: 107398. https://doi.org/10.1016/j.ymssp.2020.107398
[18] Kumari, J.S., Lenine, D., Satish, A., Kumar, T.S., Kalaivani, C., Kumar, M.D., Kumara Swamy, G., Vijaya Suresh, Y., Nagarjuna Reddy, J., Kanna Kumar, J., Rao, Y.M. (2022). A model predictive Goertzel algorithm based active islanding detection for grid integrated photovoltaic systems. Microprocessors and Microsystems, 95: 104706. https://doi.org/10.1016/j.micpro.2022.104706
[19] Kim, J.H., Kim, J.G., Ji, Y.H., Jung, Y.C., Won, C.Y. (2011). An islanding detection method for a grid-connected system based on the Goertzel algorithm. IEEE Transactions on Power Electronics, 26(4): 1049-1055. https://doi.org/10.1109/TPEL.2011.2107751
[20] Pechlivanis, C., Rigogiannis, N., Tichalas, A., Frantzeskakis, S., Christodoulou, C., Papanikolaou, N. (2024). Active fault detection device for LV electrical installations with Goertzel-based impedance estimation and IoT connectivity. Engineering Proceedings, 60(1): 22. https://doi.org/10.3390/engproc2024060022
[21] Goertzel, G. (1958). An algorithm for the evaluation of finite trigonometric series. The American Mathematical Monthly, 65(1): 34-35.
[22] Jaber, M.A., Massicotte, D. (2021). The JM-filter to detect specific frequency in monitored signal. IEEE Transactions on Signal Processing, 69: 1468-1476. https://doi.org/10.1109/TSP.2021.3053509
[23] Wibowo, F.W., Wihayati, W. (2023). Goertzel algorithm design on field programmable gate arrays for implementing electric power measurement. In 2023 International Conference on Computer Science, Information Technology and Engineering (ICCoSITE), Jakarta, Indonesia, pp. 489-494. https://doi.org/10.1109/ICCoSITE57641.2023.10127757
[24] Tan, L., Jiang, J. (2013). Chapter 8 - Infinite impulse response filter design. In Digital Signal Processing (Second Edition), Academic Press, pp. 301-403. https://doi.org/10.1016/B978-0-12-415893-1.00008-1
[25] Rusch, T.L., Sankar, R., Scharf, J.E. (1996). Signal processing methods for pulse oximetry. Computers in Biology and Medicine, 26(2): 143-159. https://doi.org/10.1016/0010-4825(95)00049-6
[26] Saka, K., Aydemir, Ö., Öztürk, M. (2016). Classification of EEG signals recorded during right/left hand movement imagery using Fast Walsh Hadamard Transform based features. In 39th International Conference on Telecommunications and Signal Processing (TSP), Vienna, Austria, pp. 413-416. https://doi.org/10.1109/TSP.2016.7760909
[27] Ahmed, N., Rao, K.R. (1975). Walsh-Hadamard Transform. In: Orthogonal Transforms for Digital Signal Processing. Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-642-45450-9_6
[28] Nishigawa, G., Maruo, Y., Okamoto, M., Oka, M., Oki, K., Kinuta, Y., Minagi, S. (2003). Galvanic skin response as a parameter for evaluating speech disability. Journal of Dental Research, 82(B): 187.
[29] Nishigawa, G., Natsuaki, N., Maruo, Y., Okamoto, M., Minagi, S. (2003). Galvanic skin response of oral cancer patients during speech. Journal of Oral Rehabilitation, 30(5): 522-525. https://doi.org/10.1046/j.1365-2842.2003.01131.x