Enhancing Road Safety Using Deep Learning-Based Driver Behavior Detection System

Ali Fadhil Yaseen Althabhawee, Reem M. Ibrahim, Bushra Kadhim Oleiwi*

Educational Planning Department, Directorate General of Education in Holy Karbala, Karbala 56001, Iraq

College of Control and Systems Engineering, University of Technology - Iraq, Baghdad 19006, Iraq

Corresponding Author Email: bushra.k.oleiwi@uotechnology.edu.iq

Page: 317-324 | DOI: https://doi.org/10.18280/ijtdi.090209

Received: 18 May 2025 | Revised: 19 June 2025 | Accepted: 25 June 2025 | Available online: 30 June 2025

© 2025 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

Abstract: 

Most road accidents are caused by drivers engaging in unsafe driving practices and becoming distracted behind the wheel, which makes road safety a pressing societal concern. Activities such as using a phone or driving carelessly markedly increase the likelihood of an accident. In this paper, we introduce a driver behavior detection system that employs a 22-layer convolutional neural network (CNN) to identify complex behaviors in real-time situations. The proposed architecture systematically stacks convolutional layers with 3×3 kernels and ReLU activations, along with max pooling layers, to classify five main categories: turning movements, talking on a phone, texting, safe driving practices, and other activities. The system was trained and tested on a dataset of 10766 RGB images at 480×640 pixels resolution depicting varied driving situations and surroundings. Initial test results showed a notable drop in misclassification errors and a notable rise in classification accuracy, suggesting that the CNN approach can strengthen vehicle safety systems by accurately and swiftly detecting risky driving behaviors, thereby reducing accident risks and enhancing road safety overall. The experimental findings were obtained through GPU processing in MATLAB, yielding a training accuracy of 100% and a testing accuracy of 100%, with training completed in 23.46 seconds. The suggested method for assessing driving behavior has thus been effectively implemented.

Keywords: 

classification, driver behavior detection, deep learning, surveillance system, convolutional neural network

1. Introduction

Around the world, there is an increasing need to enhance road safety and decrease traffic incidents. One practical way to achieve this goal is to observe and study driver actions to identify the causes of accidents. A powerful tool in this research area is the convolutional neural network (CNN). Using CNNs to explore driver behavior enables the analysis of intricate datasets of driver actions. Such analysis can support warning systems that alert drivers before accidents happen by recognizing behaviors that indicate degraded driving performance. These advancements could also support driver training and promote safer driving on the road.

The primary applications of CNN-based driver behavior detection include warning systems that evaluate driver behavior and alert drivers when unsafe driving habits are detected; training and education systems that use data to assess driver training programs and suggest ways to improve performance; and accident investigation, where traffic data are analyzed to determine the factors contributing to crashes. In recent years, there has been steady progress in the development of CNN-based systems for detecting driver behavior. Advances include an expanding range of applications, such as self-driving cars and real-time monitoring systems; improvements in model precision driven by progress in deep learning methods and the availability of freely accessible datasets; and early integration with other AI systems for broader data assessment and more detailed recommendations.

As stated in the report released by the World Health Organization on December 13, 2023, road accidents claim 1.19 million lives annually, making them the leading cause of death for children and young adults aged 5 to 29 years [1]. In 2023, 8 percent of traffic fatalities in the US were attributed to distracted driving, accounting for 3,275 deaths on the road [2]. Despite these concerning numbers, many in-vehicle safety systems cannot monitor driver behavior closely enough in real time to allow for intervention. To address this issue, we propose a deep learning-based driver behavior detection system that uses a 22-layer network to instantly classify important driving actions such as phone usage, unsafe turns, and normal driving posture. This study is driven by the pressing need to identify such behaviors in order to enhance road safety and minimize accident risks.

This article is organized as follows: Section 2 gives a literature review of driver behavior detection. Section 3 introduces the proposed driver behavior system, with a discussion of the dataset in Section 3.1; CNNs are presented in Section 3.2, followed by an overview of the proposed CNN architecture in Section 3.3. Section 4 presents the simulation results and their analysis, while Section 5 covers the study's conclusion.

Understanding existing methods for identifying driver behavior is crucial; the next section reviews recent research advancements in this field.

2. Related Work

In this section, research papers on recognizing driver behavior with deep learning are discussed comprehensively and systematically, grouped by the approach used and the type of sensors employed for clarity and cohesion. A critical viewpoint is taken to emphasize how current techniques tackle the identification of driving behaviors, and their limitations are pointed out.

2.1 Vision-based CNN models

Tamas and Maties [3] explored a CNN-based method for detecting driver distraction by adapting the VGG16 design and testing attention modules such as squeeze-and-excitation along with activation functions such as Leaky ReLU and SELU, achieving an impressive 95.82% accuracy in detecting driver distractions for vehicle safety, a notable advancement over earlier methods. Chirra et al. [4] examined a CNN-based driver drowsiness detection system that achieved an accuracy of 96%; it determines whether a driver is sleepy or alert from the eye regions extracted by the Viola-Jones algorithm. In another study, Huang et al. [5] developed a hybrid CNN framework (HCF) that employs deep learning techniques to identify distracted drivers effectively, using transfer learning to capture driver behaviors by combining three pre-trained models: ResNet50, Inception V3, and Xception. Their approach demonstrated 96.74% accuracy in identifying ten distracted driving behaviors during testing. Ezzouhri et al. [6] proposed a deep learning method to identify driver distraction by analyzing the driver's body movements captured in onboard camera footage and classifying them accordingly; they achieved an accuracy of 96% on their custom dataset and 95% on the AUC dataset after tests involving nine drivers across ten activities in various real-life and simulated scenarios. Majeed et al. [7] introduced a CNN-based method to identify driver tiredness from eye and mouth movements; despite working with a limited dataset, they applied data augmentation and used the DLIB tool to determine the Mouth Aspect Ratio (MAR), and their model achieved a detection accuracy of 96.69%. Tao and Ma [8] developed a neural network incorporating Hierarchical Bilinear Pooling (HBP) alongside Pose Image Distracted Driving Detection (PIDDD) models to detect distracted driving; analyzing the GADX901 dataset, they classified the driver's head and arm gestures with an accuracy of 95.49%. Kidu et al. [9] presented an LRCM model designed to recognize nine driver behaviors in real time without extensive preprocessing, for both daytime and nighttime conditions, with accuracy rates of 88.7% during daylight hours and 92.4% at night. Delwar et al. [10] introduced AI- and camera-based techniques to detect drowsiness; their models applied machine learning methods such as CNN, VGG16, and MobileNet together with computer vision for recognition, achieving a commendable accuracy of 92.75% by capturing key attributes of the eyes and face to monitor drowsiness in live scenarios, thereby mitigating road accidents.

2.2 Sensor and signal-based approaches

Escottá et al. [11] proposed a method to categorize driving events using one- and two-dimensional CNNs; using data from IMU sensors mounted on the dashboard of the vehicle, they distinguished between aggressive and non-aggressive driving behaviors, including braking actions and lane changes, with an accuracy of 82.40%. Abdubrani et al. [12] introduced a technique to identify driver fatigue using EEG data and machine learning classifiers. Applying Independent Component Analysis (ICA), they achieved a detection accuracy of 96.07%; they also extracted features using the Continuous Wavelet Transform (CWT) and an enhanced modified Z-score, and their ANN classifier reached an accuracy of 99.65% [12]. Sharma and Ahirwal [13] developed a Bidirectional Long Short-Term Memory (BLSTM) network paired with a 1D CNN that uses EEG data to classify mental workload levels, achieving 95.36% accuracy for multi-level workload classification and 96.77% for binary classification distinguishing low from high workload levels [13].

2.3 Multi-modal and hybrid architectures

Ed‑Doughmi et al. [14] employed 3D convolutional and recurrent neural networks to assess and forecast driver drowsiness levels; after training their system on a dataset, they achieved an accuracy of 92%, enabling a live driver-monitoring tool with the potential to decrease road accidents. Xiang et al. [15] created a driving fatigue detection system using a 3D CNN and a channel attention mechanism. Their method used an SVM-based classifier with a binary tree structure to categorize four driving states after extracting features from grayscale images and feeding gradient and optical flow information into the model; on the FDF dataset employed in the study, they achieved an accuracy of 95% in detecting driver fatigue. Abbas et al. [16] introduced Hypo-Driver, which detects driver fatigue and inattention in real time by combining data from biosignal sensors and multiple cameras to categorize driving behaviors into five phases; this approach outperformed existing methods by utilizing CNN-RNN-LSTM models to achieve a detection accuracy of 96.5% [16]. Gao et al. [17] proposed a driver fatigue detection technique combining a transformer architecture with CLIP, which allowed them to capture prolonged patterns in driver video sequences effectively; their CT-Net model demonstrated a 36% enhancement in accuracy over the CNN-LSTM model, achieving top-tier performance with an AUC score of 0.892. Namburi et al. [18] introduced a CNN-LSTM approach to identify driver tiredness and inattention, employing FaceMesh to pinpoint landmarks around the mouth and eyes and IOU to capture hand movements covering the mouth during yawning episodes; their method yielded an accuracy of 93.60% in detecting distraction and fatigue behind the wheel.

2.4 Summary of reviewed studies

The reviewed studies show high accuracy in identifying driver states using CNNs combined with various deep learning methods. However, many of them share notable limitations: reliance on a narrow set of behavioral categories, lack of real-time applicability, or limited datasets. In addition, some models are so complex that they are unsuitable for low-power automotive hardware. There is therefore a need for a simpler system that can detect driver behavior quickly using a well-rounded dataset.

The method suggested in this study tackles these concerns by introducing a 22-layer CNN designed to optimize both accuracy and efficiency.

3. Proposed System Structure

AI has become incredibly important in the field of image analysis, as it helps extract and interpret visual elements at scale. This progress builds on traditional machine learning approaches, such as supervised learning (using labeled examples) and unsupervised learning (finding patterns in unlabeled data), which laid the foundation for AI applications in imaging [19]. The introduction of deep learning techniques, such as multilayer CNNs that learn features directly from raw pixel data, has brought about major advancements in computer vision technology.

The first stage in implementing the proposed deep learning CNN system for recognizing five types of driving behavior is gathering a substantial number of accurately labeled driver behavior images. These images are then divided into three datasets: a training set, a validation set, and a testing set. The training and validation sets drive the model's learning process. After the training and validation phases are complete, the model analyzes the testing dataset to evaluate its learning capability and classify driver behavior images into five categories: turning movements, talking on a phone, texting, safe driving practices, and other miscellaneous actions. A visual representation of the system's functionality is depicted in Figure 1.

Figure 1. System structure

3.1 Driver behavior detection dataset

The Revitsone driver behavior detection dataset [20] is a compilation intended to aid the development and assessment of algorithms for recognizing and categorizing driver behavior patterns. The dataset plays an important role in research aimed at minimizing road incidents through the identification of the distracted driving patterns responsible for many accidents worldwide. It contains a balanced mix of distracted driving scenarios to avoid bias in model training, and it covers real-world variation in driver demographics (such as age and gender) and cabin lighting conditions (daytime and nighttime), making models trained on it more reliable across situations. Moreover, its open-access license ensures that research findings can be easily replicated and verified. The dataset contains 10766 RGB images captured by a dashboard-mounted camera during driving sessions, with varied drivers and lighting conditions to represent real-world scenarios accurately. Each frame was carefully labeled by expert reviewers into one of five behavior categories (safe driving, talking on phone, texting, turning, and engaging in other activities such as adjusting controls or reaching for objects). Before training, each image was resized to 480×640 pixels and center-cropped to eliminate areas outside the driver's field of view.

The dataset was split into two sections: 80% for training and validation and 20% for testing. Sample images are shown in Figure 2, and detailed information on the five categories explored in this study, with image counts per category, is presented in Table 1.
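As a minimal MATLAB sketch of this split, the following loads the images from class-named subfolders and performs the 80/20 division; the folder name 'Revitsone-5classes' follows the Kaggle dataset reference and the local directory layout is an assumption.

```matlab
% Minimal sketch of the 80/20 split, assuming one subfolder per class
% (folder name taken from the Kaggle reference; local path is an
% assumption). Labels are derived from the folder names.
imds = imageDatastore('Revitsone-5classes', ...
    'IncludeSubfolders', true, 'LabelSource', 'foldernames');
[imdsTrain, imdsTest] = splitEachLabel(imds, 0.8, 'randomized');
countEachLabel(imdsTrain)   % verify the per-class balance of Table 1
```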

Table 1. Image count per class

Class Name | No. of Images
Safe Driving | 2153
Talking Phone | 2153
Texting Phone | 2153
Turning | 2153
Other Activities | 2153

Figure 2. Samples of dataset images: (a) Safe driving, (b) Talking phone, (c) Texting phone, (d) Turning, (e) Other activities

3.2 Convolutional neural networks (CNNs)

CNN models stand out as a category of deep learning tailored specifically for analyzing grid-like data, such as images and time series. CNNs are driving progress in computer vision, enabling machines to process visual information in a manner that mimics humans and to automatically learn hierarchical patterns and features from raw data [21].

CNNs comprise various layers that ultimately take an input and convert it to an output class [22]. The primary layers are the convolutional and pooling layers; an activation function and fully connected layers complete the network [23].

  • Convolutional layers: a set of 2-D convolutional filters applied to the input maps $x_{m}^{l}$, where l is the layer index and m is the map index; $w_{n,m}^{l}$ is the kernel-based filter, with n the filter index. The nth output map $y_{n}^{l}$ of layer l is computed as [24]:

$y_n^l=\sum_{m=1}^{M^{l-1}} w_{n, m}^l * x_m^l+b_n^l$                    (1)

where, $M^{l-1}$ is the number of input maps, * denotes convolution, and $b_n^l$ is the bias of the nth output map of layer l.

  • The rectified linear unit (ReLU) layer: the ReLU activation layer serves as the nonlinear component following the convolutional layer of Eq. (1). Applying the activation function in Eq. (2) introduces non-linearity, which enables more effective training of the model.

$f\left( x \right)=\max \left( 0,x \right)$                    (2)

  • Pooling layer: the pooling layer reduces the spatial size, decreasing the number of parameters needed to describe the network and consequently cutting down the training workload.
  • Fully connected layer: the fully connected layer, where all neurons interconnect (as in traditional neural networks), serves as the final network segment that computes class probabilities from the high-level features. Eq. (3) defines the linear combination $O_n$, where $x_m$ is the mth feature map input to the output layer [24]. The softmax function in Eq. (4) is then used to calculate the likelihood of the input belonging to each of the C categories [24]; a small numeric check follows this list:

$O_n=\sum_{m=1}^{M} \left( w_{n,m}*x_m+b_n \right)$                    (3)

$p_u=\frac{\exp \left( O_u \right)}{\sum_{n=1}^{C}\exp \left( O_n \right)}$                    (4)
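As a quick numeric illustration of Eqs. (2) and (4), the following MATLAB sketch applies ReLU to a vector of example class scores and converts the result to probabilities with a numerically stable softmax; the scores themselves are illustrative assumptions.

```matlab
% Numeric check of Eqs. (2) and (4): ReLU followed by a numerically
% stable softmax (subtracting max(O) before exp avoids overflow).
relu = @(x) max(0, x);                           % Eq. (2)
O = relu([2.1 -0.3 0.8 1.5 0.2]);                % example class scores O_n
p = exp(O - max(O)) ./ sum(exp(O - max(O)));     % Eq. (4); sum(p) == 1
```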

3.3 Proposed CNN model

In this system, a CNN is designed to detect driver behavior through automated feature extraction and classification. The proposed CNN architecture, illustrated in Figure 3, comprises 22 layers, structured as follows: an input layer processing three-channel images of dimensions 480×640×3, followed by six convolutional layers, six ReLU activation layers, and six pooling layers, and finally a fully connected layer, a softmax layer, and a classification output layer for the five categories in the dataset.
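A sketch of how this 22-layer stack could be declared in MATLAB's Deep Learning Toolbox is shown below; the per-block filter counts (8 to 256) are assumptions, since the paper specifies only the 3×3 kernels, ReLU activations, and max pooling layers.

```matlab
% Sketch of the 22-layer architecture: input, six conv/ReLU/max-pool
% blocks with 3x3 kernels, then fully connected, softmax, and
% classification output. Filter counts per block are assumptions.
layers = [
    imageInputLayer([480 640 3])                   % layer 1: RGB input
    convolution2dLayer(3,   8, 'Padding','same')   % layers 2-4: block 1
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3,  16, 'Padding','same')   % layers 5-7: block 2
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3,  32, 'Padding','same')   % layers 8-10: block 3
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3,  64, 'Padding','same')   % layers 11-13: block 4
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 128, 'Padding','same')   % layers 14-16: block 5
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    convolution2dLayer(3, 256, 'Padding','same')   % layers 17-19: block 6
    reluLayer
    maxPooling2dLayer(2, 'Stride', 2)
    fullyConnectedLayer(5)                         % layer 20: five classes
    softmaxLayer                                   % layer 21
    classificationLayer];                          % layer 22
```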

During the training phase, we utilized the Adam optimizer for its effectiveness in managing gradients and dynamic objectives that change over time. After experimental tuning, a learning rate of 1e-5 was chosen to ensure convergence. The model was trained for 100 epochs with 2 iterations per epoch, totaling 200 iterations to reach convergence.
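Under those settings, the training call might look like the following sketch. Only the optimizer, learning rate, epoch count, and GPU execution are stated in the text; everything else is a toolbox default, and the mini-batch size implied by "2 iterations per epoch" is deliberately left unset as an open assumption.

```matlab
% Training configuration per the reported setup: Adam optimizer,
% learning rate 1e-5, 100 epochs, GPU execution. Mini-batch size is
% not set here; the paper's 2 iterations per epoch would imply a very
% large batch, which we leave as an assumption.
options = trainingOptions('adam', ...
    'InitialLearnRate', 1e-5, ...
    'MaxEpochs', 100, ...
    'ExecutionEnvironment', 'gpu', ...
    'Plots', 'training-progress', ...
    'Verbose', false);
net = trainNetwork(imdsTrain, layers, options);
```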

The decision to use a 22-layer CNN was driven by the goal of capturing driving behaviors in a computationally efficient manner, in line with related studies that successfully used deep architectures, such as Huang et al. [5], who implemented a hybrid CNN and achieved an impressive 96.74% accuracy in detecting distracted driving. Additionally, Alzubaidi et al. [21] pointed out that deeper CNN structures can enhance feature learning in image-based tasks. Our structure therefore aligns with frameworks that have demonstrated success in behavior identification while being tailored to the five-class task on the Revitsone dataset.

4. Results and Discussion

The system was implemented in MATLAB on a desktop computer running 64-bit Windows 10 Pro, equipped with a 4th-generation Intel(R) Core(TM) i7-4790 CPU at 3.60 GHz, 8.00 GB of RAM, and a 3 GB NVIDIA GeForce GTX 1060 GPU. The CPU's four cores and the GPU worked in synchrony to realize the proposed approach using a parallel computing methodology. The simulation was run on a single database to analyze the workload of the proposed system. The proposed model was validated and its classification performance analyzed using data drawn from the training and test sets, with 80% of the data assigned to training and 20% to testing. The patterns in the database used during the training process are illustrated in Figure 2. The outcomes of the suggested system's performance are summarized in Figures 4 and 5, with the performance assessment of the CNN shown in Figure 4. The number of epochs may be increased to improve training progress, achieve optimal results, and ensure reliability. As Figure 4 shows, the proposed CNN model converged during the training phase with fixed values for the number of epochs and iterations, the validation frequency, and the learning rate. The training accuracy reached 100%, and the total time for 200 iterations was 23.46 seconds. The test accuracy was likewise a perfect 100%, and testing took 6.95 seconds to complete. Driver behavior is separated into five types, as shown in Figure 5. The proposed CNN-based driver behavior detection system presented in Figure 5 is thus successfully implemented, and the training and validation results show that the overall accuracy is excellent. The results indicate that the proposed method can detect the driver's behavior as intended and can carry out complex calculations in a relatively short time.
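The test-phase figures above correspond to an evaluation step like the following sketch, in which the trained network classifies the held-out split and accuracy is the fraction of matching labels; the confusion chart line is an optional addition for inspecting per-class errors, not something the paper reports.

```matlab
% Evaluate on the held-out 20% split: predicted labels vs. ground truth.
YPred = classify(net, imdsTest);
testAccuracy = mean(YPred == imdsTest.Labels);    % 1.0 corresponds to 100%
confusionchart(imdsTest.Labels, YPred);           % per-class error breakdown
```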

The learning setup was optimized for performance by enabling parallel processing, significantly cutting training time with a balanced configuration. The training phase took a total of 23.46 seconds over 200 iterations and 100 epochs, and testing completed swiftly in 6.95 seconds. This speed was achieved by utilizing GPU acceleration for CNN-based image classification.

While the setup discussed shows that a researcher can train and assess such systems, longer processing times should be expected on less powerful hardware. However, tweaking the model design and training settings, such as reducing the number of layers or the image resolution, can let the system run in resource-constrained environments without sacrificing classification accuracy.

The proposed CNN-based system achieved 100% accuracy on both the training and testing datasets, showcasing its detection capabilities relative to previous studies: Huang et al. [5] achieved 96.74% accuracy with a hybrid CNN model, and Majeed et al. [7] reached 96.69% for driver drowsiness detection. Our system outperforms these works owing to factors such as a balanced dataset, deeper feature extraction layers, and optimized training parameters.

The proposed driver behavior detection system shows promise for incorporation into Advanced Driver Assistance Systems (ADAS). Its primary goal is to improve road safety by issuing alerts when unsafe driving behavior occurs. It could also bolster in-car monitoring systems, aid driver training initiatives, and facilitate the analysis of collision behaviors.

However, the system's effectiveness is limited by its dependency on a single dataset and a controlled testing setting. To ensure adaptability to real-life tasks, future work should address the challenges of real-time operation and evaluate performance across different datasets and environments.

Figure 4. Accuracy and loss rate of the training progress

Figure 5. Driver behavior detection

5. Conclusion

In this research work, a CNN-based driver behavior detection system was introduced to classify five driver actions with precision. The key innovation is a 22-layer CNN structure specifically tailored to behavior classification, achieving a 100% accuracy rate on both training and testing data from the Revitsone dataset. This showcases the model's capability to analyze and interpret the patterns depicted in images captured by in-car cameras.

We plan to improve the model's generalizability by evaluating it on additional datasets and under different real-life conditions. Furthermore, we will investigate implementing the system in real-time settings on platforms such as the Raspberry Pi to evaluate its suitability for automotive use cases.

References

[1] World Health Organization (WHO). Road traffic injuries. https://www.who.int/news-room/fact-sheets/detail/road-traffic-injuries, accessed on Jun. 1, 2025.

[2] National Safety Council Injury Facts. Distracted Driving. https://injuryfacts.nsc.org/motor-vehicle/motor-vehicle-safety-issues/distracted-driving/, accessed on Jun. 1, 2025.

[3] Tamas, V., Maties, V. (2019). Real-time distracted drivers detection using deep learning. American Journal of Artificial Intelligence, 3(1): 1-8. https://doi.org/10.11648/j.ajai.20190301.11

[4] Chirra, V.R.R., Uyyala, S.R., Kolli, V.K.K. (2019). Deep CNN: A machine learning approach for driver drowsiness detection based on eye state. Revue d'Intelligence Artificielle, 33(6): 461-466. https://doi.org/10.18280/ria.330609

[5] Huang, C., Wang, X., Cao, J., Wang, S., Zhang, Y. (2020). HCF: A hybrid CNN framework for behavior detection of distracted drivers. IEEE Access, 8: 109335-109349. https://doi.org/10.1109/ACCESS.2020.3001159

[6] Ezzouhri, A., Charouh, Z., Ghogho, M., Guennoun, Z. (2021). Robust deep learning-based driver distraction detection and classification. IEEE Access, 9: 168080-168092. https://doi.org/10.1109/ACCESS.2021.3133797

[7] Majeed, F., Shafique, U., Safran, M., Alfarhood, S., Ashraf, I. (2023). Detection of drowsiness among drivers using novel deep convolutional neural network model. Sensors, 23(21): 8741. https://doi.org/10.3390/s23218741

[8] Tao, C., Ma, S. (2023). Driver distraction recognition with pose-aware two-stream convolutional neural network. In Eighth International Conference on Electromechanical Control Technology and Transportation (ICECTT 2023), 12790: 127905B. https://doi.org/10.1117/12.2689437

[9] Kidu, T., Song, Y., Seo, K.W., Lee, S., Park, T. (2024). An intelligent real-time driver activity recognition system using spatio-temporal features. Applied Sciences, 14(17): 7985. https://doi.org/10.3390/app14177985

[10] Delwar, T.S., Singh, M., Mukhopadhyay, S., Kumar, A., Parashar, D., Lee, Y., Rahman, M.H., Sejan, M.A.S., Ryu, J.Y. (2025). AI- and deep learning-powered driver drowsiness detection method using facial analysis. Applied Sciences, 15(3): 1102. https://doi.org/10.3390/app15031102

[11] Escottá, Á.T., Beccaro, W., Ramírez, M.A. (2022). Evaluation of 1D and 2D deep convolutional neural networks for driving event recognition. Sensors, 22(11): 4226. https://doi.org/10.3390/s22114226

[12] Abdubrani, R., Mustafa, M., Zahari, Z.L. (2023). A robust framework for driver fatigue detection from EEG signals using enhancement of modified Z-score and multiple machine learning architectures. IIUM Engineering Journal, 24(2): 354-372. https://doi.org/10.31436/iiumej.v24i2.2799

[13] Sharma, V., Ahirwal, M.K. (2024). An end to end brain computer interface system for mental workload estimation through hybrid deep learning model. Human Centric Intelligent Systems, 4: 599-609. https://doi.org/10.1007/s44230-024-00086-y

[14] Ed‑Doughmi, Y., Idrissi, N., Hbali, Y. (2020). Real‑time system for driver fatigue detection based on a recurrent neuronal network. Journal of Imaging, 6(3): 8. https://doi.org/10.3390/jimaging6030008

[15] Xiang, W., Wu, X., Li, C., Zhang, W., Li, F. (2022). Driving fatigue detection based on the combination of multi-branch 3D-CNN and attention mechanism. Applied Sciences, 12(9): 4689. https://doi.org/10.3390/app12094689

[16] Abbas, Q., Ibrahim, M.E., Khan, S., Baig, A.R. (2022). Hypo-driver: A multiview driver fatigue and distraction level detection system. Computers, Materials & Continua, 71(1): 1999-2007. https://doi.org/10.32604/cmc.2022.022553

[17] Gao, Z., Chen, X., Xu, J., Yu, R., Zhang, H., Yang, J. (2024). Semantically-enhanced feature extraction with CLIP and transformer networks for driver fatigue detection. Sensors, 24(24): 7948. https://doi.org/10.3390/s24247948

[18] Namburi, A., Sitpasert, P., Duang‑Onnam, W. (2024). A CNN‑LSTM approach for accurate drowsiness and distraction detection in drivers. ICIC Express Letters, 18(9): 907-917. https://doi.org/10.24507/icicel.18.09.907

[19] Saeed, R.S., Oleiwi, B.K. (2022). A survey of deep learning applications for COVID-19 detection techniques based on medical images. Ingénierie des Systèmes d’Information, 27(3): 399-408. https://doi.org/10.18280/isi.270305

[20] Robinreni. Revitsone-5classes Driver Behavior Dataset [Data set]. Kaggle. https://www.kaggle.com/datasets/robinreni/revitsone-5class, accessed on Jun. 1, 2025.

[21] Alzubaidi, L., Zhang, J., Humaidi, A.J., Al Dujaili, A., Duan, Y., Al Amidie, M., Al Shamma, O., Santamaría, J., Fadhel, M.A., Farhan, L. (2021). Review of deep learning: Concepts, CNN architectures, challenges, applications, future directions. Journal of Big Data, 8(1): 53. https://doi.org/10.1186/s40537-021-00444-8

[22] Jalil, B.D., Noaman Al Hayanni, M.A. (2024). Intelligent deep learning system for enhanced pulmonary disease diagnosis through five class mode. Revue d’Intelligence Artificielle, 38(4): 1193-1199. https://doi.org/10.18280/ria.380413

[23] Khan, A., Sohail, A., Zahoora, U., Qureshi, A.S. (2020). A survey of the recent architectures of deep convolutional neural networks. Artificial Intelligence Review, 53(8): 5455-5516. https://doi.org/10.1007/s10462-020-09825-6

[24] Althabhawee, A.F.Y., Alwawi, B.K.O.C. (2022). Fingerprint recognition based on collected images using deep learning technology. IAES International Journal of Artificial Intelligence (IJ-AI), 11(1): 81-88. https://doi.org/10.11591/ijai.v11.i1.pp81-88