An Advanced AI Framework for Mental Health Diagnostics Using Bidirectional Encoder Representations from Transformers with Gated Recurrent Units and Convolutional Neural Networks

G. Pushpa, M. Chaitra, Lakshmi P. Kolur, Shashank Dhananjaya, Kavyasri M. N., R. Sunitha*, Abhilasha P. Kumar

Department of Computer Science Engineering, RNS Institute of Technology, Bengaluru 560098, India

Department of Computer Science and Engineering, BNM Institute of Technology, Bangalore 560070, India

Department of Artificial Intelligence and Machine Learning, Basaveshwara Engineering College, Vidyagiri Bagalkot 587102, India

Department of Information Science and Engineering, The National Institute of Engineering, Mysore South 570008, India

Department of Computer Science and Engineering, Malnad College of Engineering, Hassan 573202, India

Department of Artificial Intelligence and Machine Learning, BNM Institute of Technology, Bangalore 560070, India

Corresponding Author Email: sunithar1389@gmail.com
Pages: 213-220 | DOI: https://doi.org/10.18280/isi.300118
Received: 21 June 2024 | Revised: 16 November 2024 | Accepted: 14 January 2025 | Available online: 25 January 2025

© 2025 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

Abstract: 

Mental health research and brain studies have developed rapidly with advanced technologies, including artificial intelligence and deep learning. This research has grown enormously as it seeks to address the mental health issues of the current generation, which are shaped by many factors. Data-driven approaches that exploit specific attributes are helping to detect, diagnose, and treat mental health disorders. In particular, the rapidly developing discipline of precision psychiatry uses sophisticated computational methods to provide more individualized mental health care. This paper presents a deep learning model named Bidirectional Encoder Representations from Transformers and Gated Recurrent Unit-based Convolutional Neural Network (BERT and GRU-based CNN), which aims to transform mental health diagnostics by integrating state-of-the-art deep learning models. The BERT component leverages the power of the transformer to build a system capable of diagnosing mental health disorders accurately and efficiently. A gated recurrent unit (GRU) is used to analyze diverse datasets encompassing behavioral patterns, physiological signals, and contextual information, with the aim of providing timely and personalized insights. Finally, the convolutional neural network (CNN) determines the person's mental health condition by analyzing all of these patterns. Experiments on the dataset show a model accuracy of 97%. The goal is to enhance early detection, enable targeted interventions, and ultimately improve the overall mental well-being of individuals. This paper outlines our commitment to harnessing technology for the advancement of mental health diagnostics and underscores the potential impact of this model on mental healthcare practice.

Keywords: 

machine learning, mental health disorders, deep learning, BERT, convolutional neural networks, gated recurrent unit

1. Introduction

The application of deep learning to mental health diagnostics has shown promise in recent years as a way to transform conventional methods of assessment and therapy. Mental health issues are a major global burden that impairs people's quality of life and puts pressure on healthcare systems worldwide [1]. Despite the severity and prevalence of these illnesses, prompt and precise diagnosis remains difficult to obtain, which often delays the start of therapy and leads to poorer outcomes [2]. Deep learning, a subset of artificial intelligence that draws inspiration from the structure and function of the human brain, has shown impressive results in fields such as image recognition, natural language processing, and medical diagnostics [3]. By leveraging large datasets and complex neural networks, deep learning algorithms have shown the potential to improve the accuracy and efficiency of mental health assessments [4].

A key advantage of machine learning and deep learning algorithms is their capacity to identify patterns. These techniques show some promise in identifying generally applicable trends among people who suffer from mental health disorders. For example, Carrillo et al. [5] used a Gaussian Naive Bayes classifier on transcribed textual data to distinguish patients experiencing depression from healthy controls, achieving an F1-score of 0.82. Given the known difficulty of identifying mental health issues, psychiatrists stand to benefit from such depression diagnosis methods. Unlike many other areas of medicine, mental health disorders lack objective illness indicators [6], and the absence of an objective marker is one of the main difficulties in diagnosing psychopathology [7]. Because patients with the same diagnosis sometimes have markedly different symptoms, current diagnostic techniques are under scrutiny [8].

Unsupervised learning techniques can help identify subgroups of depression or even new diagnoses. Drysdale et al. [6] applied hierarchical clustering, an unsupervised learning technique, to functional connectivity measured in patients diagnosed with depression in order to evaluate the heterogeneity of the disorder. Using fMRI scans, they distinguished four distinct depression biotypes, and responses to rTMS therapy were shown to differ across these biotypes. Because the subtypes responded differently to therapy, each may represent a distinct illness. Although supervised techniques are used in most of the research examined in this study, unsupervised techniques allow researchers to discover previously unknown links. This work demonstrates the potential of artificial intelligence systems to facilitate the shift to new diagnostic taxonomies.

Contemporary computing techniques not only facilitate the identification and diagnosis of mental health disorders but also present an opportunity to personalize therapy recommendations. To determine which antidepressant is best for a patient, physicians currently rely on trial and error [9, 10]. Chang et al. [11] show that psychiatrists can estimate an antidepressant's expected effect before administering it: their Antidepressant Response Prediction Network (ARPNet), an artificial neural network, can forecast a patient's response to an antidepressant before therapy begins. Such technologies increase the potential for patient-specific treatment. The original goal of artificial intelligence was to mimic human functions artificially [12], and early research aimed to develop symbolic artificial intelligence, whose stated objective was to "carry out a series of logic-like reasoning steps over language-like representations" [13].

That said, most artificial intelligence researchers are no longer primarily interested in symbolic artificial intelligence; pattern recognition based on artificial neural networks now dominates the field. Much current neural network research builds on the perceptron, first introduced in Rosenblatt's seminal work [14]. The emergence of deep learning can be attributed to the growth of these networks enabled by technical breakthroughs [15]. In deep learning, "depth" refers to the number of hidden layers in an artificial neural network, although there is disagreement about what constitutes a "deep" neural network [16]. Sheu [17] proposed that a deep neural network must include at least three layers: an input layer, a hidden layer, and an output layer. Modern researchers, however, typically require multiple hidden layers before classifying a network as deep.

By pursuing these goals, this study aims to add to the growing body of research on advanced technologies and mental health and to offer insights into how deep learning may revolutionize mental health diagnosis. It explores this potential by reviewing existing literature, methodologies, and applications. By analyzing the strengths and limitations of current approaches, we aim to identify opportunities for future research and development in this rapidly evolving field.

The primary research contributions of this paper are given as follows:

  • Examine the state of mental health diagnosis today and the difficulties in using conventional methods of assessment.
  • Design the BERT-GRU-CNN framework to identify patterns in the dataset for diagnosing mental health issues.
  • Adapt the hybrid BERT-GRU-CNN framework, with modified layers, to clinical data settings for a range of mental health conditions, such as bipolar disorder, schizophrenia, anxiety, and depression.
  • Evaluate the designed model and discuss the applicability, benefits, drawbacks, and ethical issues surrounding the use of deep learning technology in mental health diagnosis.

2. Literature Survey

By leveraging the vast text corpus generated on social media, deep learning and natural language processing have made tremendous progress in identifying mental health conditions. Mental health diagnosis from social media data can be framed as a supervised learning problem in which posts are assigned to classes reflecting the author's mental state. The reviewed literature falls into two groups: studies that rely exclusively on self-reports, and studies that use self-reports or psychometric testing to confirm the presence of depression in patients. Selected studies on the identification, diagnosis, and treatment of mental health conditions are reviewed below.

Mao et al. [18] thoroughly reviewed deep learning techniques for automated depression diagnosis across a variety of modalities, including textual data and neuroimaging; the authors also discuss open problems in this developing field and suggest future research directions. Ahmed et al. [19] investigated whether anxiety disorders can be predicted from longitudinal social media data. Using deep learning algorithms, they report encouraging results in the early identification and tracking of anxiety-related symptoms, underscoring the promise of digital phenotyping in mental health diagnostics. Using resting-state functional MRI data, Ruiz de Miras et al. [20] presented a deep learning framework for separating patients with schizophrenia from healthy controls. The model's high accuracy in differentiating the two groups suggests that it could serve as an additional tool to support clinical diagnosis and to understand the neurological basis of schizophrenia.

Ebrahimi et al. [21] applied convolutional neural networks (CNNs) to the automatic identification of Alzheimer's disease from brain MRI data. Because it distinguishes Alzheimer's patients from healthy individuals well, the proposed CNN model offers a promising non-invasive method for early diagnosis and disease monitoring. Shamshirband et al. [22] review deep learning methods for the multimodal analysis of mental health disorders, including methods for combining heterogeneous data sources such as neuroimaging, genetic, and clinical data. The authors also discuss the opportunities and open problems in this multidisciplinary field and suggest future research directions.

Iyortsuun et al. [23] identify the difficulties of applying machine learning to mental health, including issues of data quality, interpretability, and ethics. They also outline future research goals for overcoming these challenges and bringing machine learning models into clinical practice for improved diagnosis and treatment planning. Zhang et al. [24] review the state of the art in deep learning for neuroimaging-based mental disorder classification, highlighting the technology's potential to improve diagnostic precision and to advance our understanding of the links between brain and behavior in mental health disorders. The review also addresses the challenges and limitations of applying deep learning to the neuroimaging-based classification of mental illnesses, including data heterogeneity, small sample sizes, limited model interpretability, and the reproducibility of findings.

Gomes et al. [25] conducted a comprehensive review of wearable sensor data analysis for mental health monitoring. The paper addresses the challenges and limitations of processing wearable sensor data for this purpose, including data quality, inconsistent sensor placement, user noncompliance, and privacy. It also covers the need to standardize and validate wearable sensor-based mental health assessment instruments. Naslund et al. [26] conducted a thorough analysis of the relationship between social media use and mental health. Highlighting both the positive and negative aspects of this complex relationship, the analysis emphasizes the need for holistic approaches to the issues that social media use poses in the digital era.

Stahlschmidt et al. [27] survey the state of the art in multimodal data fusion for mental health diagnostics, emphasizing its potential to revolutionize the evaluation, identification, and treatment of mental health disorders. The review discusses the benefits of multimodal data fusion, including more precise diagnosis, a better understanding of disease heterogeneity, and customized treatment planning. It also tackles the difficulties of multimodal fusion, such as data integration, feature selection, and the interpretation of fused data. Bandelow and Michaelis [28] summarize the epidemiology of anxiety disorders in the twenty-first century. They investigate several risk factors linked to the emergence of anxiety disorders, such as genetic predisposition, stressful circumstances, early adversity, and neurobiological factors, and discuss how these factors interact to cause anxiety disorders to develop and persist throughout life.

Chekroud et al. [29] present a machine learning method to forecast depression treatment outcomes across several clinical trials. Such techniques may allow physicians to tailor treatment plans to the characteristics of each patient, improving therapeutic results and reducing trial and error when choosing a course of action. Ahmed et al. [30] propose a machine learning method for identifying anxiety and depression based on supervised learning strategies, opening opportunities for further investigation of machine learning in mental health diagnosis. Future work may integrate multimodal data sources, validate models across heterogeneous populations, and deploy automated screening instruments in real healthcare environments.

Sau and Bhakta [31] examine the use of machine learning to predict anxiety and depression in elderly people and offer suggestions for future applications of machine learning to evaluating older adults' mental health. Further research could add more data sources, improve prognostic models, and validate results in real-world clinical settings. Sau and Bhakta [32] also examine machine learning for the screening of anxiety and depression in seafarers, suggesting further research directions for evaluating seafarers' mental health: integrating supplementary data sources, validating predictive models across heterogeneous populations, and implementing screening measures in real maritime environments.

Using data from a UK military cohort, Leightley et al. [33] employed supervised machine learning algorithms to assess the probability of post-traumatic stress disorder (PTSD) and suggested future research directions for supervised machine learning in mental health screening of military populations. Papini et al. [34] used ensemble machine learning techniques to predict PTSD screening status following emergency room hospitalization and suggested further research on ensemble machine learning for mental health screening in emergency rooms. Marmar et al. [35] investigated speech-based markers of PTSD in US veterans and proposed future lines of inquiry into such markers. Salminen et al. [36] studied the adaptive identification of cortical and subcortical imaging markers linked to PTSD and early life stress (ELS).

Detection systems are currently the most extensively studied applications of machine learning, natural language processing, and deep learning in mental health care. The remainder of the paper is organized as follows. Section 1 introduced mental health conditions and the factors that affect them, the relevant advanced technologies, and the objectives, scope, and research contributions of the proposed work. Section 2 reviews how contemporary computational tools are used to identify mental health disorders. Section 3 presents the proposed hybrid deep learning and natural language processing model for detecting mental health disorders and describes the technologies it employs. Section 4 reports the implementation and evaluation results of the proposed model, including its suitability for real-time use. Finally, Section 5 concludes the work and suggests further enhancements, followed by the references.

3. Proposed Method

3.1 Data gathering

Data gathering is the first stage of the model design; it provides the data required to design and evaluate the model. We gathered posts from six subreddits related to mental health conditions (depression, anxiety, bipolar, BPD, schizophrenia, and autism), each associated with a particular disorder. To further evaluate the proposed framework on general health information, we also gathered posts from the most popular subreddit for health-related content. We collected the user IDs from every subreddit in which at least one post concerned mental health. Using a web scraping tool, we gathered post titles and bodies in addition to user IDs. In total, 348,637 users who posted 732,384 messages across these subreddits provided data for the current study.

3.2 Procedure for pre-processing data

Data pre-processing, the second step, cleans the posts collected from social media. After data collection, each post's title was merged with its body. For every post, we removed extraneous punctuation and white space. We then tokenized the user posts and filtered out frequently used words (stop words) using the Natural Language Toolkit (NLTK) for Python.

We used the NLTK library's stop word list, which contains common terms such as pronouns, prepositions, and conjunctions that are usually removed in natural language processing tasks. Starting from this list, we removed words pertinent to discussions about mental health, such as "anxiety" and "depression", and added filler words common in casual writing, such as "lol" and "umm".
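The cleaning steps above can be sketched in Python as follows. This is a minimal illustration under stated assumptions: `BASE_STOP_WORDS` is a hypothetical, truncated stand-in for NLTK's English stop-word list (in practice it could be loaded with `nltk.corpus.stopwords.words("english")` after downloading the corpus), and a regular-expression tokenizer replaces NLTK's tokenizer so that the example runs without extra downloads. The function name `preprocess` is also illustrative.

```python
import re

# Hypothetical, truncated stand-in for NLTK's English stop-word list.
BASE_STOP_WORDS = {"i", "me", "my", "you", "he", "she", "it", "we", "they",
                   "a", "an", "the", "and", "or", "but", "in", "on", "at",
                   "to", "of", "for", "with", "is", "am", "are", "was"}

# Domain terms that must survive filtering, and casual filler words to drop.
KEEP_WORDS = {"anxiety", "depression"}
FILLER_WORDS = {"lol", "umm"}
STOP_WORDS = (BASE_STOP_WORDS - KEEP_WORDS) | FILLER_WORDS

def preprocess(title: str, body: str) -> list[str]:
    """Merge title and body, strip punctuation and extra whitespace, tokenize, drop stop words."""
    text = f"{title} {body}".lower()
    text = re.sub(r"[^\w\s']", " ", text)      # remove extraneous punctuation
    text = re.sub(r"\s+", " ", text).strip()   # collapse repeated whitespace
    tokens = re.findall(r"[a-z0-9']+", text)   # simple word tokenizer
    return [t for t in tokens if t not in STOP_WORDS]
```

Applied to a toy post, `preprocess("Anxiety again", "umm I can't sleep, lol... the anxiety is bad!!")` keeps the domain term "anxiety" while dropping "umm", "lol", and the ordinary stop words.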

3.3 BERT-based GRU-CNN model

The proposed model is a deep learning and natural language processing model for predicting mental health disorders. It combines three components, BERT, a GRU, and a CNN, which are described below. In the first stage, BERT encodes the text using context from both directions. Bidirectional Encoder Representations from Transformers (BERT) is designed to pre-train bidirectional representations by jointly conditioning on both left and right context in every layer. Developed at Google, BERT has substantially advanced natural language processing (NLP) tasks such as named entity recognition, sentiment analysis, and question answering. The essential component of BERT's design is the self-attention mechanism, which lets the model assess each token's significance relative to every other token in a sequence. The attention scores are computed as the dot product of each token's query vector (a row of $Q$) with the key vectors $K$ of every token in the sequence. To stabilize gradients, the scores are scaled by the square root of the key dimensionality $d_k$. The attention score matrix $AS$ is calculated using Eq. (1); a row-wise softmax over $AS$ then gives the weights used to combine the value vectors $V$:

$AS=\frac{Q K^{T}}{\sqrt{d_k}}$        (1)
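As an illustration, the NumPy sketch below computes the standard scaled dot-product attention scores that Eq. (1) describes for a single head and, as in the standard Transformer, applies a row-wise softmax and weights the value vectors. The random matrices are placeholders rather than BERT weights; $d_k = 64$ is chosen because it is the per-head size in BERT-BASE (768 / 12).

```python
import numpy as np

def scaled_dot_product_attention(Q, K, V):
    """Return (attention weights, outputs) for one attention head.

    Q, K: (seq_len, d_k) query/key matrices; V: (seq_len, d_v) value matrix.
    """
    d_k = K.shape[-1]
    scores = Q @ K.T / np.sqrt(d_k)                  # AS = Q K^T / sqrt(d_k)
    scores -= scores.max(axis=-1, keepdims=True)     # numerical stability only
    weights = np.exp(scores)
    weights /= weights.sum(axis=-1, keepdims=True)   # row-wise softmax
    return weights, weights @ V                      # weighted sum of value vectors

rng = np.random.default_rng(0)
seq_len, d_k = 5, 64
Q, K, V = (rng.standard_normal((seq_len, d_k)) for _ in range(3))
weights, out = scaled_dot_product_attention(Q, K, V)
```

Each row of `weights` sums to 1, so every output token is a convex combination of the value vectors.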

The input to BERT is a sequence of tokens consisting of words and special tokens. Each token $i$ is represented as the sum of three embeddings, the token embedding $T_i$, the segment embedding $S_i$, and the position embedding $P_i$, as shown in Eq. (2):

$E_i=T_i+S_i+P_i$          (2)
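Eq. (2) amounts to three table lookups and a sum. The sketch below uses toy, randomly initialized tables with hypothetical sizes (BERT-BASE uses a 30,522-token vocabulary, 512 positions, and a hidden size of 768), and it omits the layer normalization and dropout that BERT applies after the sum.

```python
import numpy as np

rng = np.random.default_rng(0)
vocab_size, max_len, n_segments, hidden = 100, 32, 2, 16   # toy sizes, not BERT's

# Hypothetical randomly initialized lookup tables (learned in a real model).
token_table = rng.standard_normal((vocab_size, hidden))
segment_table = rng.standard_normal((n_segments, hidden))
position_table = rng.standard_normal((max_len, hidden))

def embed(token_ids, segment_ids):
    """E_i = T_i + S_i + P_i for every position i in the sequence."""
    positions = np.arange(len(token_ids))
    return (token_table[token_ids]
            + segment_table[segment_ids]
            + position_table[positions])

token_ids = np.array([1, 42, 7, 2])      # made-up ids, e.g. [CLS] w1 w2 [SEP]
segment_ids = np.array([0, 0, 0, 0])     # single-segment input
E = embed(token_ids, segment_ids)        # shape (4, 16)
```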

Each encoder layer consists of two sublayers: multi-head self-attention and a position-wise feed-forward network. Each sublayer is wrapped in a residual connection followed by layer normalization. In this study, we employ the BERT-BASE model, which has 12 Transformer encoder blocks, a hidden size of 768, 12 self-attention heads, and about 110 million parameters. We use BERT in a feature-based manner to obtain a numerical representation of the text: the pre-trained model is kept fixed and used to extract features, rather than being fine-tuned.

The feature-based approach offers two benefits over fine-tuning. First, task-specific model architectures can be built on top of the extracted features, which matters because not every task is easily handled by the Transformer encoder architecture alone. Second, because the representations are precomputed only once, the feature-based method is computationally more efficient.
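A minimal sketch of the feature-based use of BERT, assuming the Hugging Face `transformers` library with PyTorch (the paper does not name the library it used). To keep the example runnable offline, it builds `BertModel` from the default `BertConfig`, which has the BERT-BASE dimensions but random weights; in practice the pre-trained weights would be loaded with `BertModel.from_pretrained("bert-base-uncased")` and the text tokenized with the matching `BertTokenizer`. The encoder is frozen and run under `torch.no_grad()`, so the features can be computed once, cached, and reused by any downstream model. The function name `extract_features` is illustrative.

```python
import torch
from transformers import BertConfig, BertModel

# Default BertConfig: 12 layers, hidden size 768, 12 heads (BERT-BASE shape).
# Random weights here; BertModel.from_pretrained(...) would give real features.
encoder = BertModel(BertConfig())
encoder.eval()
for p in encoder.parameters():
    p.requires_grad_(False)          # frozen: feature-based, not fine-tuned

@torch.no_grad()
def extract_features(input_ids, attention_mask):
    """Return per-token features of shape (batch, seq_len, 768)."""
    out = encoder(input_ids=input_ids, attention_mask=attention_mask)
    return out.last_hidden_state

# Toy batch of token ids (these would come from the tokenizer in practice).
input_ids = torch.randint(0, 30522, (2, 16))
attention_mask = torch.ones_like(input_ids)
features = extract_features(input_ids, attention_mask)   # compute once, then cache
n_params = sum(p.numel() for p in encoder.parameters())  # roughly 110 million
```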

The gated recurrent unit (GRU) is an RNN variant that can be regarded as a simplified form of the conventional LSTM. The GRU introduces a reset gate, which controls how much of the previous hidden state is combined with the new input. It also merges the forget and input gates of the LSTM into a single gate, known as the update gate. Additionally, the GRU fuses the two LSTM states, the hidden state and the cell state, into a single hidden state. At the $t$-th time step, the GRU, like the LSTM, takes an input vector $x_t$ and the previous hidden state $h_{t-1}$. The first step of the GRU decides how much information from the previous time step should be carried forward, using the update gate $z_t$ given in Eq. (3).

$z_t=\sigma\left(W_{z x} x_t+W_{z h} h_{t-1}+b_z\right)$          (3)

The next step uses the reset gate $r_t$, given in Eq. (4), to decide which information from the previous time step should be discarded. The reset gate's output is then used to compute the candidate hidden state $\tilde{h}_t$, which helps preserve important historical information. The candidate hidden state is calculated using Eq. (5).

$r_t=\sigma\left(W_{r x} x_t+W_{r h} h_{t-1}+b_r\right)$         (4)

$\tilde{h}_t=\tanh\left(W_{h x} x_t+r_t \odot\left(W_h h_{t-1}\right)+b_h\right)$            (5)

The final step computes the hidden state $h_t$, which collects the information and forwards it to the next stage. The hidden state is computed using Eq. (6):

$h_t=z_t \odot h_{t-1}+\left(1-z_t\right) \odot \tilde{h}_t$            (6)

Here, $W_{\{zx, zh, rx, rh, hx, h\}}$ are weight matrices, $b_{\{z, r, h\}}$ are bias vectors [6], $\sigma$ is the logistic sigmoid, and $\odot$ denotes element-wise multiplication.
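Eqs. (3) to (6) translate directly into a single GRU step. The NumPy sketch below follows the weight names used above; the dictionary-based parameter layout, the helper names, and the small random initialization are illustrative assumptions rather than the authors' implementation.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gru_step(x_t, h_prev, W, b):
    """One GRU step following Eqs. (3)-(6).

    W maps 'zx','zh','rx','rh','hx','h' to weight matrices; b maps 'z','r','h' to biases.
    """
    z_t = sigmoid(W["zx"] @ x_t + W["zh"] @ h_prev + b["z"])             # Eq. (3) update gate
    r_t = sigmoid(W["rx"] @ x_t + W["rh"] @ h_prev + b["r"])             # Eq. (4) reset gate
    h_cand = np.tanh(W["hx"] @ x_t + r_t * (W["h"] @ h_prev) + b["h"])   # Eq. (5) candidate
    return z_t * h_prev + (1.0 - z_t) * h_cand                           # Eq. (6) new state

def init_params(input_dim, hidden_dim, seed=0):
    """Hypothetical small random initialization, for illustration only."""
    rng = np.random.default_rng(seed)
    W = {k: 0.1 * rng.standard_normal(
             (hidden_dim, input_dim if k.endswith("x") else hidden_dim))
         for k in ("zx", "zh", "rx", "rh", "hx", "h")}
    b = {k: np.zeros(hidden_dim) for k in ("z", "r", "h")}
    return W, b

def gru_forward(xs, W, b, hidden_dim):
    """Run the cell over a sequence xs of shape (T, input_dim); return all hidden states."""
    h = np.zeros(hidden_dim)
    states = []
    for x_t in xs:
        h = gru_step(x_t, h, W, b)
        states.append(h)
    return np.stack(states)

rng = np.random.default_rng(1)
W, b = init_params(input_dim=8, hidden_dim=4)
H = gru_forward(rng.standard_normal((6, 8)), W, b, hidden_dim=4)   # shape (6, 4)
```

Because $h_t$ is a convex combination of $h_{t-1}$ and a tanh output, every hidden state stays in (-1, 1) when $h_0 = 0$; and if the update gate saturates at $z_t = 1$, the previous state is copied unchanged, which is how the GRU carries information across long spans.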

The GRU processes sequential data and outputs a hidden state for each time step. This hidden state is a vector representation that captures temporal dependencies and sequential patterns in the data. The GRU's output can be either the sequence of hidden states, with one output per time step, or only the final hidden state, which summarizes the entire sequence. The GRU output is then fed as input to the CNN.
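The two output options can be seen in PyTorch's `nn.GRU` (the experiments used PyTorch 2.0; the layer sizes below are hypothetical). PyTorch's GRU uses the same gating form as Eqs. (3) to (6), with the bias terms split between the input and hidden paths.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)
# Hypothetical sizes: 768-dim inputs (e.g. BERT-BASE features), 128 hidden units.
gru = nn.GRU(input_size=768, hidden_size=128, batch_first=True)
x = torch.randn(2, 10, 768)          # (batch, time steps, features)
outputs, h_n = gru(x)

# outputs: one hidden state per time step -> (2, 10, 128)
# h_n:     final hidden state only        -> (num_layers=1, 2, 128)
# For a single-layer, unidirectional GRU, the last time step of outputs equals h_n.
```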

The proposed CNN stage works as follows. Its convolutional layer receives the word vectors as input and applies filters of five different sizes. To avoid overfitting, we also use a dropout rate of 0.25. Each filter $U$ slides over the input and extracts the salient features of every region of a given size $k$. Let $y_j$ denote the $j$-th word vector (of dimension $a$), and let $Y_{[j: j+k-1]}$ denote the matrix formed by the word vectors $y_j$ to $y_{j+k-1}$. For an input of $s$ word vectors, the feature vector $V=\left[v_1, v_2, \ldots, v_{s-k+1}\right]$ of each filter is computed using Eq. (7), where $m$ and $l$ index the rows and columns of the window and filter matrices and $n$ is a nonlinear activation function. The convolutional layer then passes $V$ to the max-pooling layer.

$v_j=n\left(\sum_{m=1}^k \sum_{l=1}^a Y_{[j: j+k-1]\, m, l} \cdot U_{m, l}\right)$           (7)

The max-pooling layer has a size of 128 and keeps the maximum values of the CNN filter outputs: it extracts the most prominent feature from each feature vector $V$, that is, $\hat{v}=\max \{V\}$.
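Eq. (7) and the max-pooling step can be traced by hand with a tiny example. The sketch assumes ReLU for the activation $n$ (the text does not specify it) and uses a single filter of region size $k = 2$ over four 3-dimensional word vectors; the function names are hypothetical.

```python
import numpy as np

def conv_feature_map(Y, U, activation=lambda t: np.maximum(t, 0.0)):
    """Slide filter U (k x a) over word-vector matrix Y (s x a).

    Returns V = [v_1, ..., v_{s-k+1}], where each v_j is the activation of the
    element-wise product of window Y[j:j+k] with U, summed over all entries.
    ReLU is an assumed choice for the nonlinear activation n.
    """
    k, s = U.shape[0], Y.shape[0]
    return np.array([activation(np.sum(Y[j:j + k] * U)) for j in range(s - k + 1)])

def max_pool(V):
    """Keep the most prominent feature of one filter: v_hat = max{V}."""
    return V.max()

Y = np.arange(12, dtype=float).reshape(4, 3)   # 4 word vectors of dimension 3 (toy)
U = np.ones((2, 3))                            # one filter with region size k = 2
V = conv_feature_map(Y, U)
v_hat = max_pool(V)
```

Here the rows of `Y` sum to 3, 12, 21, and 30, so the three windows give $V = [15, 33, 51]$ and max pooling keeps $\hat{v} = 51$. In a typical multi-filter design, the pooled features from all filters are concatenated before classification.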

GRU-CNN is one of the hybrid deep learning models employed in this study. Before these models perform sentiment analysis, BERT converts the input text into a numerical representation; the resulting model is the BERT-based GRU-CNN. Every input sentence passes through the same procedure. First, the GRU processes the text representation produced by BERT, learning features from the order in which they occur in the data. Its output is then fed into the CNN, which searches it for key characteristics and produces a new representation. The GRU-CNN architecture is designed to handle mental health data with a variety of significant feature patterns. Finally, a fully connected layer maps the output to two classes.
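A compact PyTorch sketch of the GRU-CNN stage that sits on top of the BERT features is given below. Several details are assumptions because the text does not fix them: the class name, the GRU hidden size (128), the number of filters per size (128), the five kernel sizes (1 to 5), and the use of global max pooling over time. The dropout rate of 0.25, the five filter sizes, and the final fully connected layer with two output classes follow the description above.

```python
import torch
import torch.nn as nn

class GRUCNNHead(nn.Module):
    """Hypothetical GRU -> multi-size CNN -> max pool -> dropout -> linear head."""

    def __init__(self, feat_dim=768, gru_hidden=128, n_filters=128,
                 kernel_sizes=(1, 2, 3, 4, 5), n_classes=2, dropout=0.25):
        super().__init__()
        self.gru = nn.GRU(feat_dim, gru_hidden, batch_first=True)
        self.convs = nn.ModuleList(
            nn.Conv1d(gru_hidden, n_filters, k) for k in kernel_sizes)
        self.dropout = nn.Dropout(dropout)
        self.fc = nn.Linear(n_filters * len(kernel_sizes), n_classes)

    def forward(self, bert_feats):
        seq, _ = self.gru(bert_feats)            # (B, L, H): one state per token
        seq = seq.transpose(1, 2)                # (B, H, L) layout for Conv1d
        # Each filter size yields (B, n_filters) after max pooling over time.
        pooled = [torch.relu(conv(seq)).amax(dim=2) for conv in self.convs]
        z = self.dropout(torch.cat(pooled, dim=1))
        return self.fc(z)                        # raw class logits

torch.manual_seed(0)
model = GRUCNNHead()
model.eval()
logits = model(torch.randn(4, 16, 768))   # 4 posts, 16 tokens, BERT-BASE features
```

The head returns raw logits; `nn.CrossEntropyLoss` applies the softmax internally during training. Inputs must contain at least as many tokens as the largest kernel size.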

4. Results and Discussions

This section describes the experiments and their results. The experiments were implemented in Python with various available libraries in Google Colab. The work began with data collection, followed by preprocessing to clean the data and further processing. The proposed model first converts the data into a numerical representation, which is then fed into the hybrid GRU-CNN model to categorize and label the data for sentiment analysis. Finally, the model predicts the individual's mental health status from the given inputs. The proposed model is evaluated and analyzed for real-time implementation using various metrics.

After this process, 586,573 posts from 348,637 users were used for the analysis. For model evaluation, the dataset is divided into training (80%) and testing (20%) sets. During the learning phase, we excluded the posts of users who posted in more than one subreddit. The experiments were conducted on the following platform: an Intel Core i9-12900K processor (3.90 GHz), an NVIDIA RTX 3090 GPU, a 1 TB NVMe SSD, the PyTorch 2.0 framework with the scikit-learn 1.3.0 and NumPy 1.24.3 libraries, and Ubuntu 22.04 LTS as the system software. This setup enabled efficient model training and testing.
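Excluding users who posted in more than one subreddit can be sketched in plain Python as below. The record layout (`user_id`, `subreddit`, `text`), the function name, and the toy records are hypothetical; they are not the study's data.

```python
from collections import defaultdict

def exclude_cross_posters(posts):
    """Drop every post whose author appears in more than one subreddit.

    posts: iterable of dicts with hypothetical keys 'user_id', 'subreddit', 'text'.
    """
    posts = list(posts)
    subreddits_by_user = defaultdict(set)
    for p in posts:
        subreddits_by_user[p["user_id"]].add(p["subreddit"])
    return [p for p in posts if len(subreddits_by_user[p["user_id"]]) == 1]

posts = [
    {"user_id": "u1", "subreddit": "depression", "text": "example text"},
    {"user_id": "u1", "subreddit": "anxiety", "text": "example text"},
    {"user_id": "u2", "subreddit": "bipolar", "text": "example text"},
    {"user_id": "u2", "subreddit": "bipolar", "text": "example text"},
    {"user_id": "u3", "subreddit": "autism", "text": "example text"},
]
kept = exclude_cross_posters(posts)   # u1 is removed; u2 and u3 remain
```

Filtering at the user level keeps a single author's writing style from appearing under two different labels.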

Three subsets of data are used in this study: training, validation, and test data. Of each dataset, 10% is used for testing, 10% for validation, and the remaining 80% for training. The data are passed through the BERT text representation process and then fed into the GRU-CNN hybrid model, and the results are compared with several existing methods to demonstrate the model's effectiveness. Each model is trained for up to 50 epochs during the learning phase, using the Adam optimizer with a batch size of 32 and a learning rate of $1 \times 10^{-3}$. Categorical cross-entropy is used as the loss function, and Bayesian optimization with a maximum of 10 trials is used to tune the hyperparameter values. During training, we employ early stopping with a patience of five epochs: validation loss is monitored, and if it does not improve for five consecutive epochs, training stops.
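The training setup can be sketched in PyTorch as follows. It mirrors the stated settings (80/10/10 split, Adam with a learning rate of $1 \times 10^{-3}$, batch size 32, cross-entropy loss, at most 50 epochs, and early stopping with a patience of five on validation loss) but is not the authors' code: the helper names are hypothetical, early stopping is implemented in the common "no improvement for `patience` epochs" form, and the Bayesian hyperparameter search is omitted.

```python
import numpy as np
import torch
import torch.nn as nn
from torch.utils.data import DataLoader, TensorDataset

def split_indices(n, seed=0):
    """Shuffle n sample indices and split them 80/10/10 into train/val/test."""
    rng = np.random.default_rng(seed)
    idx = rng.permutation(n)
    n_train, n_val = int(0.8 * n), int(0.1 * n)
    return idx[:n_train], idx[n_train:n_train + n_val], idx[n_train + n_val:]

class EarlyStopping:
    """Signal a stop once validation loss has not improved for `patience` epochs."""

    def __init__(self, patience=5):
        self.patience, self.best, self.bad_epochs = patience, float("inf"), 0

    def step(self, val_loss):
        if val_loss < self.best:
            self.best, self.bad_epochs = val_loss, 0
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

def train(model, X, y, max_epochs=50, batch_size=32, lr=1e-3, patience=5):
    tr, va, _ = split_indices(len(X))
    tr, va = torch.as_tensor(tr), torch.as_tensor(va)
    loader = DataLoader(TensorDataset(X[tr], y[tr]), batch_size=batch_size, shuffle=True)
    opt = torch.optim.Adam(model.parameters(), lr=lr)
    loss_fn = nn.CrossEntropyLoss()          # categorical cross-entropy on logits
    stopper = EarlyStopping(patience)
    for epoch in range(max_epochs):
        model.train()
        for xb, yb in loader:
            opt.zero_grad()
            loss_fn(model(xb), yb).backward()
            opt.step()
        model.eval()
        with torch.no_grad():
            val_loss = loss_fn(model(X[va]), y[va]).item()
        if stopper.step(val_loss):
            break
    return epoch + 1                          # number of epochs actually run
```

`train` returns the number of epochs actually run, which makes it easy to check whether early stopping triggered before the 50-epoch limit.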

To determine whether the model could be deployed, we examined its inference time in addition to accuracy and other performance indicators. Inference time was measured for batch sizes of 1, 8, and 16. The average inference time for a single input was 10 ms, indicating that the model is suitable for real-time applications. However, deployment on resource-constrained devices may require further optimization, such as model quantization.
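A simple way to measure inference latency for the batch sizes mentioned above is sketched below. The timing helper and the stand-in linear model are hypothetical; the 10 ms figure reported above comes from the authors' experiments and is not reproduced by this toy model. Per-input latency is the batch latency divided by the batch size.

```python
import time
import torch
import torch.nn as nn

def _sync():
    if torch.cuda.is_available():
        torch.cuda.synchronize()      # wait for queued GPU work before reading the clock

@torch.no_grad()
def mean_latency_ms(model, make_batch, batch_sizes=(1, 8, 16), repeats=20, warmup=3):
    """Average wall-clock time per forward pass, in milliseconds, for each batch size."""
    model.eval()
    results = {}
    for bs in batch_sizes:
        x = make_batch(bs)
        for _ in range(warmup):       # warm-up runs are excluded from timing
            model(x)
        _sync()
        start = time.perf_counter()
        for _ in range(repeats):
            model(x)
        _sync()
        results[bs] = (time.perf_counter() - start) * 1000.0 / repeats
    return results

toy = nn.Linear(768, 2)               # hypothetical stand-in for the trained model
latencies = mean_latency_ms(toy, lambda bs: torch.randn(bs, 768))
per_input = {bs: ms / bs for bs, ms in latencies.items()}
```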

Table 1 shows the overall performance of the proposed model and compares it with existing methods. The proposed BERT-GRU-CNN model is a hybrid deep learning and natural language processing model. The models are evaluated using accuracy, precision, recall, F1-score, and error. The existing methods include Support Vector Machine (SVM), Logistic Regression (LR), Random Forest (RF), Naïve Bayes (NB), Convolutional Neural Network (CNN), LSTM-CNN, SVM with PCA (SVM-PCA), and GRU-CNN.
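The metrics in Table 1 can be computed with scikit-learn, which the experiments used (version 1.3.0). Two choices here are assumptions because the text does not state them: macro averaging over classes, and defining the error as the misclassification rate, 1 - accuracy. The function name `evaluate` is illustrative.

```python
from sklearn.metrics import accuracy_score, f1_score, precision_score, recall_score

def evaluate(y_true, y_pred, average="macro"):
    """Accuracy, precision, recall, F1 and error rate as fractions in [0, 1]."""
    acc = accuracy_score(y_true, y_pred)
    return {
        "accuracy": acc,
        "precision": precision_score(y_true, y_pred, average=average, zero_division=0),
        "recall": recall_score(y_true, y_pred, average=average, zero_division=0),
        "f1": f1_score(y_true, y_pred, average=average, zero_division=0),
        "error": 1.0 - acc,   # assumed definition: misclassification rate
    }

scores = evaluate([0, 1, 1, 0], [0, 1, 0, 0])
```

For these toy labels, the sketch gives an accuracy of 0.75, macro precision of about 0.83, macro recall of 0.75, macro F1 of about 0.73, and an error of 0.25.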

Figure 1. Performance evaluation of the proposed model with the existing model

Table 1. Performance comparison of each algorithm

Metric (%)   SVM  RF   LR   NB   CNN  LSTM-CNN  SVM-PCA  GRU-CNN  BERT-GRU-CNN
Accuracy     75   88   85   72   77   85        89       92       97
Precision    78   80   82   78   79   88        90       93       98
Recall       75   84   89   79   80   84        89       90       96
F1-score     73   85   81   76   78   89        91       92       95
Error        24   22   14   14   23   14        11       8        3

Figure 1 compares the models on five performance metrics: accuracy, precision, recall, F1-score, and error rate. The results (accuracy, precision, recall, F1-score, and error, respectively) are:

  • Support Vector Machine (SVM): approximately 75%, 78%, 75%, 73%, and 24%;
  • Random Forest (RF): approximately 88%, 80%, 84%, 85%, and 22%;
  • Logistic Regression (LR): approximately 85%, 82%, 89%, 81%, and 14%;
  • Naive Bayes (NB): approximately 72%, 78%, 79%, 76%, and 14%;
  • CNN: approximately 77%, 79%, 80%, 78%, and 23%;
  • LSTM-CNN hybrid: approximately 85%, 88%, 84%, 89%, and 14%;
  • SVM with PCA: approximately 89%, 90%, 89%, 91%, and 11%;
  • GRU-CNN hybrid: approximately 92%, 93%, 90%, 92%, and 8%.

The proposed BERT-GRU-CNN hybrid reaches 97% accuracy, indicating superior predictive performance. Its precision of 98% indicates that nearly all of its positive predictions were correct; its recall of 96% shows that it successfully recognizes true positives; its F1-score of 95% indicates that the model is effective and balanced; and its error of 3% is the lowest among the models compared.

These results indicate that the proposed BERT-GRU-CNN hybrid achieves the best value on every metric in Table 1. This supports the value of combining a transformer-based encoder with recurrent and convolutional deep learning layers. The hybrid deep learning models generally outperform the classical machine learning baselines. Among all the evaluated options, the model that combines BERT, GRU, and CNN is the strongest for this task.

5. Conclusion

Deep learning and natural language processing are two of the cutting-edge technologies that have accelerated research on the brain and mental health. The field has expanded greatly in an attempt to address the many factors affecting the mental health of the current generation. Data-driven methods that use specific attributes assist in the identification, diagnosis, and treatment of mental health disorders. In particular, precision psychiatry, a fast-growing field, uses advanced computational techniques to deliver more individualized mental health care. This research introduces a hybrid deep learning model, BERT-GRU-CNN. The model combines Bidirectional Encoder Representations from Transformers (BERT) with a Gated Recurrent Unit-based Convolutional Neural Network (GRU-based CNN), with the aim of changing how mental health diagnoses are conducted. The BERT component leverages the transformer architecture to support fast and accurate diagnosis of mental health disorders. The gated recurrent unit analyzes a variety of inputs, including physiological signals, behavioral patterns, and environmental data, to produce timely and individualized insights.
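The paper does not publish its layer configuration, so the following is only a minimal, untrained sketch of the GRU stage described above. It makes several assumptions. First, it assumes that the GRU consumes a sequence of BERT token embeddings. Second, it assumes the common GRU update with an update gate z, a reset gate r, and a candidate state n, where h_t = (1 - z) * n + z * h_{t-1}. Random vectors stand in for the BERT outputs: BERT-base produces 768-dimensional vectors, but 8 dimensions are used here to keep the example small. Biases are omitted, and all names are hypothetical.

```python
import math
import random

def matvec(W, v):
    """Multiply matrix W (list of rows) by vector v."""
    return [sum(w * x for w, x in zip(row, v)) for row in W]

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

class GRUCell:
    """Single-layer, unidirectional GRU cell with random (untrained) weights; biases omitted."""

    def __init__(self, input_dim, hidden_dim, seed=0):
        rng = random.Random(seed)
        self.hidden_dim = hidden_dim

        def rand(rows, cols):
            return [[rng.uniform(-0.1, 0.1) for _ in range(cols)] for _ in range(rows)]

        # W_* act on the input x_t, U_* on the previous hidden state h_{t-1}.
        self.W = {g: rand(hidden_dim, input_dim) for g in ("z", "r", "n")}
        self.U = {g: rand(hidden_dim, hidden_dim) for g in ("z", "r", "n")}

    def step(self, x, h):
        """One GRU update: returns h_t from input x_t and previous state h_{t-1}."""
        z = [sigmoid(a + b) for a, b in zip(matvec(self.W["z"], x), matvec(self.U["z"], h))]  # update gate
        r = [sigmoid(a + b) for a, b in zip(matvec(self.W["r"], x), matvec(self.U["r"], h))]  # reset gate
        n = [math.tanh(a + ri * b)  # candidate state; the reset gate scales the recurrent term
             for a, ri, b in zip(matvec(self.W["n"], x), r, matvec(self.U["n"], h))]
        return [(1.0 - zi) * ni + zi * hi for zi, ni, hi in zip(z, n, h)]

    def encode(self, sequence):
        """Run the cell over a sequence of vectors and return every hidden state."""
        h = [0.0] * self.hidden_dim
        states = []
        for x in sequence:
            h = self.step(x, h)
            states.append(h)
        return states

# Hypothetical stand-in for BERT token embeddings: 5 tokens, 8 dimensions each.
rng = random.Random(7)
token_embeddings = [[rng.gauss(0.0, 1.0) for _ in range(8)] for _ in range(5)]
cell = GRUCell(input_dim=8, hidden_dim=4)
states = cell.encode(token_embeddings)
print(len(states), len(states[0]))
```

Each new state is a gated mix of the previous state and a tanh-bounded candidate, so every hidden value stays in the open interval (-1, 1). In a full pipeline of this kind, these per-token states would be passed on to the convolutional stage.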

Finally, the convolutional neural network analyzes the extracted patterns to determine the individual's mental health condition. In the experiments, the model achieved an overall accuracy of about 97%. The objectives are to enhance early identification, enable focused interventions, and ultimately improve people's general mental health. The proposed model has limitations in handling long texts and multilingual inputs. In future work, the model could be integrated with an AI chatbot to improve scalability and the user experience on real-world data. Large Language Models (LLMs) could also be trained on large amounts of data to provide suggestions or tips that help people experiencing depression recover.
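The final classification stage, in which a convolutional neural network determines the mental health condition, can be sketched in the same spirit. This is a minimal, untrained illustration and not the authors' implementation. It assumes that the CNN applies 1D convolutions with ReLU over the sequence of GRU hidden states, followed by global max pooling and a softmax classifier. The paper does not list the class labels, filter count, or kernel width, so the values used here are hypothetical. Random vectors stand in for the GRU outputs.

```python
import math
import random

def conv1d_relu(seq, kernels, width):
    """Valid 1D convolution over a sequence of feature vectors, followed by ReLU.

    Each kernel is a list of `width` weight vectors, one per time offset.
    Returns one row per window position and one column per kernel.
    """
    if len(seq) < width:
        raise ValueError("sequence is shorter than the kernel width")
    rows = []
    for t in range(len(seq) - width + 1):
        window = seq[t:t + width]
        row = []
        for k in kernels:
            act = sum(w * x for j in range(width) for w, x in zip(k[j], window[j]))
            row.append(max(0.0, act))  # ReLU
        rows.append(row)
    return rows

def global_max_pool(rows):
    """Keep the strongest response of each filter across all positions."""
    return [max(col) for col in zip(*rows)]

def softmax(logits):
    m = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical label set; the paper does not list its classes.
LABELS = ("no disorder", "anxiety", "depression")

def cnn_head(states, num_filters=6, width=3, seed=0):
    """Map a sequence of GRU hidden states to class probabilities (untrained weights)."""
    rng = random.Random(seed)
    dim = len(states[0])
    kernels = [[[rng.uniform(-0.5, 0.5) for _ in range(dim)] for _ in range(width)]
               for _ in range(num_filters)]
    w_out = [[rng.uniform(-0.5, 0.5) for _ in range(num_filters)] for _ in LABELS]
    pooled = global_max_pool(conv1d_relu(states, kernels, width))
    logits = [sum(w * p for w, p in zip(row, pooled)) for row in w_out]
    return dict(zip(LABELS, softmax(logits)))

# Stand-in for five 4-dimensional GRU hidden states.
rng = random.Random(42)
gru_states = [[rng.uniform(-1, 1) for _ in range(4)] for _ in range(5)]
probs = cnn_head(gru_states)
print(max(probs, key=probs.get), probs)
```

Global max pooling makes the classifier independent of sequence length, which is one common way to handle variable-length text. The paper does not state whether the authors used it.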

  References

[1] Martinelli, A. (2024). Sustainable promotion of mental health and prevention of mental health disorders across the world. In Integrated Science for Sustainable Development Goal 3: Empowering Global Wellness Initiatives, pp. 1-23. https://doi.org/10.1007/978-3-031-64288-3_1

[2] Öztürk, E.N.Y., Çöl, M. (2024). Community mental health from public health perspective. Dental and Medical Journal-Review, 6(2): 109-119.

[3] LeCun, Y., Bengio, Y., Hinton, G. (2015). Deep learning. Nature, 521(7553): 436-444. https://doi.org/10.1038/nature14539

[4] Dwyer, D.B., Falkai, P., Koutsouleris, N. (2018). Machine learning approaches for clinical psychology and psychiatry. Annual Review of Clinical Psychology, 14(1): 91-118. https://doi.org/10.1146/annurev-clinpsy-032816-045037

[5] Carrillo, F., Sigman, M., Slezak, D.F., Ashton, P., Fitzgerald, L., Stroud, J., Nutt, D.J., Carhart-Harris, R.L. (2018). Natural speech algorithm applied to baseline interview data can predict which patients will respond to psilocybin for treatment-resistant depression. Journal of Affective Disorders, 230: 84-86. https://doi.org/10.1016/j.jad.2018.01.006

[6] Drysdale, A.T., Grosenick, L., Downar, J., Dunlop, K., Mansouri, F., Meng, Y., Fetcho, R.N., Zebley, B., Oathes, D.J., Etkin, A., Schatzberg, A.F., Sudheimer, K., Keller, J., Mayberg, H.S., Gunning, F.M., Alexopoulos, G.S., Fox, M.D., Pascual-Leone, A., Voss, H.U., Casey, B.J., Dubin, M.J., Liston, C. (2017). Erratum: Resting-state connectivity biomarkers define neurophysiological subtypes of depression. Nature Medicine, 23(2): 264. https://doi.org/10.1038/nm0217-264d

[7] Yassin, W., Nakatani, H., Zhu, Y., Kojima, M., Owada, K., Kuwabara, H., Gonoi, W., Aoki, Y., Takao, H., Natsubori, T., Iwashiro, N., Kasai, K., Kano, Y., Abe, O., Yamasue, H., Koike, S. (2020). Machine-learning classification using neuroimaging data in schizophrenia, autism, ultra-high risk and first-episode psychosis. Translational Psychiatry, 10(1): 278. https://doi.org/10.1038/s41398-020-00965-5

[8] Allsopp, K., Read, J., Corcoran, R., Kinderman, P. (2019). Heterogeneity in psychiatric diagnostic classification. Psychiatry Research, 279: 15-22. https://doi.org/10.1016/j.psychres.2019.07.005

[9] Hasanzadeh, F., Mohebbi, M., Rostami, R. (2019). Prediction of rTMS treatment response in major depressive disorder using machine learning techniques and nonlinear features of EEG signal. Journal of Affective Disorders, 256: 132-142. https://doi.org/10.1016/j.jad.2019.05.070

[10] Khodayari-Rostamabad, A., Reilly, J.P., Hasey, G.M., de Bruin, H., MacCrimmon, D.J. (2013). A machine learning approach using EEG data to predict response to SSRI treatment for major depressive disorder. Clinical Neurophysiology, 124(10): 1975-1985. https://doi.org/10.1016/j.clinph.2013.04.010 

[11] Chang, B., Choi, Y., Jeon, M., Lee, J., Han, K.M., Kim, A., Ham, B.J., Kang, J. (2019). ARPNet: Antidepressant response prediction network for major depressive disorder. Genes, 10(11): 907. https://doi.org/10.3390/genes10110907

[12] Dick, S. (2019) Artificial Intelligence. Harvard Data Science Review, 1: 1-9. https://doi.org/10.1162/99608f92.92fe150c

[13] Garnelo, M., Shanahan, M. (2019). Reconciling deep learning with symbolic artificial intelligence: Representing objects and relations. Current Opinion in Behavioral Sciences, 29: 17-23. https://doi.org/10.1016/j.cobeha.2018.12.010

[14] Rosenblatt, F. (1958). The perceptron: A probabilistic model for information storage and organization in the brain. Psychological Review, 65(6): 386-408. https://psycnet.apa.org/doi/10.1037/h0042519

[15] Schmidhuber, J. (2015). Deep learning in neural networks: An overview. Neural Networks, 61: 85-117. https://doi.org/10.1016/j.neunet.2014.09.003

[16] Zhang, W.J., Yang, G., Lin, Y., Ji, C., Gupta. M.M. (2018). On definition of deep learning. 2018 World Automation Congress (WAC), Stevenson, WA, USA, pp. 1-5. https://doi.org/10.23919/WAC.2018.8430387

[17] Sheu, Y.H. (2020). Illuminating the black box: Interpreting deep neural network models for psychiatric research. Frontiers in Psychiatry, 11: 551299. https://doi.org/10.3389/fpsyt.2020.551299

[18] Mao, K., Wu, Y., Chen, J. (2023). A systematic review on automated clinical depression diagnosis. NPJ Mental Health Research, 2: 20. https://doi.org/10.1038/s44184-023-00040-z

[19] Ahmed, A., Aziz, S., Toro, C.T., Alzubaidi, M., Irshaidat, S., Serhan, H.A., Abd-alrazaq, A.A., Househ, M. (2021). Machine learning models to detect anxiety and depression through social media: A scoping review. Computer Methods and Programs in Biomedicine Update, 2: 100066. https://doi.org/10.1016/j.cmpbup.2022.100066

[20] Ruiz de Miras, J., Ibáñez-Molina, A., Soriano, M., Iglesias-Parro, S. (2022). Schizophrenia classification using machine learning on resting state EEG signal. Biomedical Signal Processing and Control, 79: 104233. https://doi.org/10.1016/j.bspc.2022.104233

[21] Ebrahimi, A., Luo, S., Initiative, D.N. (2021). Convolutional neural networks for Alzheimer’s disease detection on MRI images. Journal of Medical Imaging, 8(2): 024503. https://doi.org/10.1117/1.JMI.8.2.024503

[22] Shamshirband, S., Fathi, M., Dehzangi, A., Chronopoulos, A.T., Alinejad-Rokny, H. (2020). A review on deep learning approaches in healthcare systems: Taxonomies, challenges, and open issues. Journal of Biomedical Informatics, 113: 103627. https://doi.org/10.1016/j.jbi.2020.103627

[23] Iyortsuu, N.K., Kim, S.H., Jhon, M., Yang, H. J., Pant, S. (2023). A review of machine learning and deep learning approaches on mental health diagnosis. Healthcare, 11(3): 285. https://doi.org/10.3390/healthcare11030285 

[24] Zhang, L., Wang, M., Liu, M., Zhang, D. (2020). A survey on deep learning for neuroimaging-based brain disorder analysis. Frontiers in Neuroscience, 14: 779. https://doi.org/10.3389/fnins.2020.00779

[25] Gomes, N., Pato, M., Lourenco, A.R., Datia, N. (2023). A survey on wearable sensors for mental health monitoring. Sensors, 23(3): 1330. https://doi.org/10.3390/s23031330

[26] Naslund, J.A., Bondre, A., Torous, J., Aschbrenner, K.A. (2020). Social media and mental health: benefits, risks, and opportunities for research and practice. Journal of Technology in Behavioral Science, 5: 245-257. https://doi.org/10.1007/s41347-020-00134-x

[27] Stahlschmidt, S.R., Ulfenborg, B., Synnergren, J. (2022). Multimodal deep learning for biomedical data fusion: A review. Briefings in Bioinformatics, 23(2): bbab569. https://doi.org/10.1093/bib/bbab569

[28] Bandelow, B., Michaelis, S. (2015). Epidemiology of anxiety disorders in the 21st century. Dialogues in Clinical Neuroscience, 17(3): 327-335. https://doi.org/10.31887/DCNS.2015.17.3/bbandelow

[29] Chekroud, A.M., Zotti, R.J., Shehzad, Z., Gueorguieva, R., Johnson, M.K., Trivedi, M.H., Corlett, P.R. (2016). Cross-trial prediction of treatment outcome in depression: a machine learning approach. The Lancet Psychiatry, 3(3): 243-250. https://doi.org/10.1016/S2215-0366(15)00471-X

[30] Ahmed, A., Sultana, R., Ullas, M.T.R., Begom, M., Rahi, M.M.I., Alam, M.A. (2020). A machine learning approach to detect depression and anxiety using supervised learning. In 2020 IEEE Asia-Pacific Conference on Computer Science and Data Engineering (CSDE) Gold Coast, Australia, pp. 1-6. https://doi.org/10.1109/CSDE50874.2020.9411642

[31] Sau, A., Bhakta, I. (2017). Predicting anxiety and depression in elderly patients using machine learning technology. Healthcare Technology Letters, 4(6): 238-243. https://doi.org/10.1049/htl.2016.0096

[32] Sau, A., Bhakta, I. (2019). Screening of anxiety and depression among seafarers using machine learning technology. Informatics in Medicine Unlocked, 16: 100228. https://doi.org/10.1016/j.imu.2019.100228

[33] Leightley, D., Williamson, V., Darby, J., Fear, N.T. (2019). Identifying probable post-traumatic stress disorder: Applying supervised machine learning to data from a UK military cohort. Journal of Mental Health, 28(1): 34-41. https://doi.org/10.1080/09638237.2018.1521946

[34] Papini, S., Pisner, D., Shumake, J., Powers, M.B., Beevers, C.G., Rainey, E.E., Smits, J.A.J., Warren, A.M. (2018). Ensemble machine learning prediction of posttraumatic stress disorder screening status after emergency room hospitalization. Journal of Anxiety Disorders, 60: 35-42. https://doi.org/10.1016/j.janxdis.2018.10.004

[35] Marmar, C.R., Brown, A.D., Qian, M., Laska, E., Siegel, C., Li, M., Abu-Amara, D., Tsiartas, A., Richey, C., Smith, J., Knoth, B., Vergyri, D. (2019). Speech‐Based markers for posttraumatic stress disorder in US veterans. Depression and Anxiety, 36(7): 607-616. https://doi.org/10.1002/da.22890

[36] Salminen, L.E., Morey, R.A., Riedel, B.C., Jahanshad, N., Dennis, E.L., Thompson, P.M. (2019). Adaptive identification of cortical and subcortical imaging markers of early life stress and posttraumatic stress disorder. Journal of Neuroimaging, 29(3): 335-343. https://doi.org/10.1111/jon.12600