© 2025 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
OPEN ACCESS
Accurate direction of arrival (DOA) estimation is critical for modern communication systems, enabling precise signal localization in various applications. However, classical algorithms such as Multiple Signal Classification (MUSIC) struggle under challenging conditions, such as low signal-to-noise ratios (SNRs) or signal imperfections, leading to significant estimation errors. This study demonstrates that convolutional neural networks (CNNs) outperform the classical MUSIC algorithm in DOA estimation under high-noise conditions. While MUSIC suffers significant errors at low SNRs (e.g., -20 dB), CNNs deliver accurate azimuth and elevation estimates that correspond closely to the true values. The CNN model was trained on 23,000 synthetic examples simulating noisy environments with signal imperfections such as up-and-down tilts. The CNN achieved a mean absolute error (MAE) of 0.80° and a mean squared error (MSE) of 0.0465 at -20 dB SNR, outperforming traditional algorithms. Unlike MUSIC, which falters in scenarios involving interference and tilts, CNNs predict angles with precision, highlighting their adaptability and robustness. These findings emphasize the potential of deep learning for real-world signal processing challenges, particularly in noisy and complex environments. CNN-based DOA estimation presents a reliable, effective solution to contemporary communication demands, overcoming the limitations of classical methods like MUSIC.
direction of arrival (DOA), deep learning, multiple signal classification (MUSIC), multiple-input multiple-output (MIMO)
The growing number of high-data-volume users on 5G networks raises new questions about how to transmit data efficiently. These networks are expected to deliver higher-quality wireless communication, especially in critical, mobile, and complex scenarios. One outstanding concern is improving the reliability and efficiency of signal transmission, particularly in multi-user and multi-device environments [1]. The rapid expansion of 5G has made reliable and efficient communication harder to guarantee, and the problem is worse in cities, where multipath propagation and signal interference are common. Advanced beamforming and overall network performance rely heavily on the spatial resolution obtained through accurate direction of arrival (DOA) estimation. The multiple signal classification (MUSIC) algorithm, for example, has long proven itself as a high-resolution and accurate DOA estimator; however, it is computationally complex and works well only with low noise and small antenna arrays. Classical approaches are static and therefore poorly suited to the dynamic 5G environment, which requires real-time processing [2]. The introduction of 5G networks marks a paradigm shift in wireless technology through reduced latency and improved connectivity. This expansion comes with caveats, however: urban sites experience dense user traffic, interference, and multipath propagation, which make radio-wave propagation dynamic and complex. DOA estimation is therefore crucial for tackling the challenges of 5G networks, specifically in multiple-input multiple-output (MIMO) systems, where beamforming accuracy significantly influences network quality of service (QoS) and capacity [3].
DOA estimation has a wide range of applications and is primarily used to determine the direction from which a signal arrives relative to a given receiver. This information is required in beamforming and in modern communication systems that rely on spatial multiplexing or interference suppression. Algorithms such as MUSIC [4] provide high-resolution DOA estimation but lose efficiency in high-noise environments and with the larger antenna arrays used in dynamic 5G networks. Additionally, its high computational complexity prevents MUSIC from operating in real time, which is essential under fluctuating 5G conditions.
Convolutional neural networks (CNNs), with their ability to learn complex patterns from large volumes of data [5], are well suited to tasks that involve real-time information, including wireless communication. By applying CNNs, we aim to improve the accuracy of angle-of-arrival estimation, which in turn makes beam steering more efficient and improves overall network performance [6]. The present work aims to demonstrate the applicability of CNNs to DOA estimation and to show how this is expected to benefit beamforming and the overall performance of 5G radio link systems. This unified approach adapts to changing channel conditions and is more dependable than conventional methods [7].
Given these challenges, signal processing stands to benefit greatly from deep learning techniques. Specifically, CNNs have demonstrated great potential for extracting features from covariance matrices and for processing noisy data effectively [8]. However, combining deep learning algorithms with classical signal processing methods remains an open area of research. Because deep learning models can extract complex patterns from complicated datasets, they are particularly useful for overcoming the shortcomings of classical approaches such as MUSIC in dynamic real-time environments. This research develops a methodology that improves the accuracy of direction-of-arrival estimation for 5G MIMO systems by combining deep learning techniques with a modified MUSIC algorithm. The main idea is to apply convolutional neural networks to the covariance matrices used by the conventional MUSIC algorithm in order to address its difficulties with high noise, high computational cost, and deployment on large antenna arrays. The proposed approach reduces the computational load while improving the accuracy of the estimates, which allows it to be used in the real-life scenarios that 5G systems must handle.
The proposed architecture combines CNNs with traditional signal processing approaches for real-time retrieval of location information. The model takes advantage of the CNN's ability to run complex operations in parallel, extracting relevant spatial features from large amounts of input data. This kind of hybrid fusion is important in modern signal processing because it integrates classical approaches with intelligent solutions for advanced wireless communication systems. The hybrid framework aims to overcome the barriers of traditional signal processing techniques and to set a direction for future intelligent and adaptive communication systems. Combining deep learning with traditional approaches such as MUSIC benefits 5G networks by improving their efficiency, scalability, and real-time capability.
Owing to its high accuracy, the MUSIC algorithm is commonly used for DOA estimation. However, it has critical drawbacks under the high-noise conditions and large antenna arrays associated with 5G networks. Its dependence on eigenvalue decomposition imposes heavy computational requirements, which makes real-time execution impractical, and, worse still, it performs poorly in low signal-to-noise ratio (SNR) scenarios. With increasing 5G network requirements, the need for adaptive and efficient DOA estimation techniques that can cope with changing channel conditions highlights the inadequacy of traditional approaches such as MUSIC. To counter these issues, this research adopts a hybrid paradigm that combines a CNN with MUSIC, aiming to improve DOA estimation in noisy conditions by extracting features from covariance matrices. This approach reduces computational requirements, enables real-time processing, and accommodates changes in the wireless channel. By designing an optimized CNN model for 5G networks that learns the spatial structure of the signals received at the antenna array, this study aims to reduce the computational burden of DOA estimation and to support intelligent communication systems for 5G networks.
Recently, signal processing applications have begun to integrate deep learning approaches, in particular CNNs, to improve DOA estimation in radio communication systems. Established algorithms such as MUSIC require a sufficiently high SNR for accurate angle determination, which limits their potential in real-life applications, especially in dense environments where noise is an issue. Many researchers have sought to use CNNs to address these limitations. Lu et al. [9] presented a deep learning framework featuring a convolutional neural network that estimates the DOA of a signal by analyzing the received signal matrix. This method outperformed existing ones and lowered the root mean square error (RMSE) by 30 degrees at low SNR, which supports the value of the approach.
Artificial neural networks (ANNs) have recently become popular for estimating DOA in numerous signal processing applications. Traditional techniques such as time-difference-of-arrival (TDOA) methods are often questioned for their robustness and accuracy, mainly in the presence of noise. To address this concern, ANN-based methods have gained increasing recognition for their ability to learn the complex relationships present in the input data. Efimov and Neudobnov [10] investigated candidate solutions for DOA estimation, such as the multi-layer perceptron (MLP) and dedicated angular networks. Their work shows that, if prior knowledge about the type of angle to be normalized is available, the angular network tends to perform better than existing models. The angular model improves on existing models by a large margin, achieving an error of ±0.75 degrees compared with ±20 degrees for MLPs. Other research has noted the value of CNNs for boosting the accuracy of DOA estimation, especially when hardware restrictions and impairments are present. Models such as MUSIC and deep MUSIC tend to perform poorly in these cases because they lack robustness against noise and interference. Liu et al. [11] developed a model-based approach in which deep learning is fused with a mathematical model to address the angular inaccuracies originating from hardware constraints. Their experiments show a marked improvement in amplitude measurement and angle of arrival (AOA) estimation when the signal-to-noise ratio is low. This is in line with the objective of this work, which seeks to improve DOA estimation accuracy with CNN-based models.
Yang et al. [12] explored the use of 3D beamforming to enhance network security in 5G and beyond. These methods shape the vertical and horizontal radiation pattern to target legitimate users and reduce eavesdropping. Deep learning improves beamforming when channel state information is imprecise: researchers have shown that deep neural networks (DNNs) can construct beamforming matrices that maximize secrecy and system performance. This method offers advantages over static optimization, and its potential for real-time optimization in mobile and diverse 5G networks is of particular interest. Aljohani et al. [13] implemented beamforming, power control, and interference management in 5G systems using deep reinforcement learning. Their approach, which is suitable for mmWave and sub-6 GHz bands, is adaptive and effective in maximizing the signal-to-interference-plus-noise ratio (SINR) and network capacity while minimizing computational overhead. Neural networks for antenna array beamforming have also gained traction, as in the work of Al Kassir et al. [14], who compared four approaches: feed-forward neural networks (FFNNs), CNNs, long short-term memory (LSTM) networks, and gated recurrent units (GRUs). Their research indicates that beamformers based on deep learning can effectively find the optimum constant ratio of the antenna array under varying external conditions. Among the architectures tested, a GRU network with four layers of 128 neurons each produced the best results, with the lowest RMSE and shorter computation latency than better-known techniques such as NSB. This underscores the ability of deep learning to improve both the speed and accuracy of beamforming in sophisticated wireless communication systems.
Zamzami [15] discussed how deep learning can forecast 5G adoption. Deep reinforcement learning, long short-term memory, and convolutional neural networks were used to predict 5G user subscriptions from throughput, channel quality, and context parameters. The deep reinforcement learning and CNN models predicted 5G uptake better and faster than computation-intensive techniques. This study shows that deep learning algorithms can estimate 5G network deployment and growth, consistent with the trend toward machine learning for wireless forecasting and decision-making. Lavdas et al. [16] developed a deep learning-based adaptive beamforming solution for massive MIMO millimeter-wave 5G networks, in which beamforming with two neural networks improves spectral and energy efficiency. By training the networks on channel state information (CSI), channel and power changes can be captured, improving energy efficiency, particularly for high-data-rate applications. However, the improved energy efficiency comes at the expense of a somewhat higher blocking probability (BP) and a larger number of radiating elements (REs), although the energy efficiency gains more than offset these costs. Their work demonstrates how machine learning techniques can enhance beamforming systems in 5G networks, particularly in crowded and high-traffic areas. According to Rahman et al. [17], deep learning frameworks enable optimal decoding in 5G reconfigurable intelligent surface (RIS)-aided MIMO systems. Decoding was improved further with a hybrid system made up of a CNN and a GRU model that represents nonlinear dependencies between the received signal features and the signal features of interest. The research showed significant improvements in bit error rate (BER) and symbol error rate (SER) compared with standard techniques, especially at high SNR settings.
This study shows how deep learning can automate signal decoding and improve system performance in difficult environments, as seen in recent machine learning algorithm optimizations for 5G networks.
For DOA estimation of multiple uncorrelated narrowband sources, this paper employs a uniform rectangular array (URA). The array consists of M elements along the horizontal axis and N elements along the vertical axis, forming a 2D antenna array. This configuration allows plane waves arriving from any direction in the horizontal and vertical planes to be captured. Each source is assumed to arrive at the array from a unique direction defined by its azimuth angle (θi) and elevation angle (φi). The signal captured at the element in horizontal position m and vertical position k depends on the sines of both θi and φi. At snapshot n it can be expressed as:
$\begin{aligned}& \quad x_{m, k}(n)=\sum_{i=1}^L s_i(n) e^{-j 2 \pi \frac{(m-1) d}{\lambda} \sin \left(\theta_i\right)} e^{-j 2 \pi \frac{(k-1) d}{\lambda} \sin \left(\phi_i\right)}+\delta_{m, k}(n)\end{aligned}$ (1)
where $m=1, \ldots, M$ and $k=1, \ldots, N$ index the array elements, $n$ is the snapshot (time) index, and $\lambda$ is the wavelength of the signal, given by $\lambda=c / f$, with $c$ being the speed of light and $f$ the carrier frequency. $d$ is the spacing between adjacent antenna elements. Stacking the outputs of all elements, the received signal vector may be written as:
$x(n)=A(\theta) s(n)+\delta(n)$ (2)
x(n) denotes the received signal vector, s(n) the vector of signals sent by all L sources, and δ(n) the noise vector over all antennas. The directional information of every source is captured by the steering matrix A(θ), which is formed from L steering vectors, one for each source. A spherical coordinate system is used, with the steering vector for the i-th source, a(θi, φi), being a column vector that describes the direction of the source relative to the array via its azimuth and elevation angles:
$\begin{aligned} & a\left(\theta_i, \phi_i\right) \\ & =\left[1, e^{-j 2 \pi \frac{d}{\lambda} \sin \left(\theta_i\right)}, \ldots, e^{-j 2 \pi \frac{(M-1) d}{\lambda} \sin \left(\theta_i\right)}\right]^T \\ & \otimes\left[1, e^{-j 2 \pi \frac{d}{\lambda} \sin \left(\phi_i\right)}, \ldots, e^{-j 2 \pi \frac{(N-1) d}{\lambda} \sin \left(\phi_i\right)}\right]^T\end{aligned}$ (3)
Thus, the steering matrix A(θ) is given by:
$A(\theta)=\left[a\left(\theta_1, \phi_1\right), a\left(\theta_2, \phi_2\right), \ldots, a\left(\theta_L, \phi_L\right)\right] \in \mathbb{C}^{M N \times L}$ (4)
This matrix contains the steering information for all L sources arriving at the array location from their respective angles. To evaluate the direction of arrival of the sources, we calculate the covariance matrix of the incoming signal provided as:
$R_{x x}=E\left[x(n) x^H(n)\right]=A P A^H+U$ (5)
Here, P is the covariance matrix of the signal vector s(n), and $U=\sigma^2 I$ is the covariance matrix of the noise, with $\sigma^2$ being the noise power and $I$ the identity matrix.
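As an illustration of Eqs. (3) and (4), the steering vectors and the steering matrix can be sketched in a few lines of Python. This is a minimal sketch rather than the implementation used in the study: the function names are hypothetical, half-wavelength spacing is taken as the default, and the second source direction is invented purely for the example.

```python
import cmath
import math

def axis_phases(count, spacing_over_lambda, angle_rad):
    """Phase terms exp(-j*2*pi*k*(d/lambda)*sin(angle)) for k = 0..count-1."""
    return [cmath.exp(-2j * math.pi * k * spacing_over_lambda * math.sin(angle_rad))
            for k in range(count)]

def steering_vector(M, N, theta_rad, phi_rad, spacing_over_lambda=0.5):
    """Kronecker product of the M-element and N-element phase vectors (Eq. 3)."""
    horiz = axis_phases(M, spacing_over_lambda, theta_rad)
    vert = axis_phases(N, spacing_over_lambda, phi_rad)
    # a (x) b stacks a[0]*b, a[1]*b, ..., giving M*N entries.
    return [h * v for h in horiz for v in vert]

def steering_matrix(M, N, angles_rad, spacing_over_lambda=0.5):
    """Steering matrix of Eq. (4), stored as M*N rows with one column per source."""
    cols = [steering_vector(M, N, th, ph, spacing_over_lambda) for th, ph in angles_rad]
    return [[col[r] for col in cols] for r in range(M * N)]

# 8 x 8 array; first source at azimuth -37 deg, elevation 0 deg,
# second source is a hypothetical direction used only for illustration.
sources = [(math.radians(-37.0), 0.0), (math.radians(20.0), math.radians(10.0))]
A = steering_matrix(8, 8, sources)
print(len(A), len(A[0]))
```

Every entry of the steering matrix has unit modulus, since each one is a product of two pure phase terms.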
In practical scenarios, the covariance matrix is typically estimated from a finite number of snapshots, yielding the sample covariance matrix:
$\tilde{R}_{x x}=\frac{1}{T} \sum_{n=0}^{T-1} x(n) x^H(n)$ (6)
The sample covariance matrix serves as the input to the MUSIC algorithm, which performs an eigenvalue decomposition to estimate the DOAs of the sources. Using this matrix, MUSIC can resolve multiple sources in the presence of noise or signal interference.
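The sample covariance estimate of Eq. (6) can be illustrated with a short, self-contained Python sketch. The four-element array, the single source, the noise power, and the number of snapshots below are assumptions chosen to keep the example small; they are not the settings of the study.

```python
import cmath
import math
import random

def sample_covariance(snapshots):
    """(1/T) * sum over snapshots of x x^H, for a list of complex vectors."""
    T = len(snapshots)
    K = len(snapshots[0])
    R = [[0j] * K for _ in range(K)]
    for x in snapshots:
        for a in range(K):
            for b in range(K):
                R[a][b] += x[a] * x[b].conjugate()
    return [[v / T for v in row] for row in R]

rng = random.Random(0)                       # fixed seed for repeatability
theta = math.radians(-37.0)
# Hypothetical 4-element line of sensors with half-wavelength spacing.
a = [cmath.exp(-1j * math.pi * k * math.sin(theta)) for k in range(4)]
sigma2 = 0.1                                 # assumed noise power
std = math.sqrt(sigma2 / 2.0)                # per real/imaginary component

snapshots = []
for n in range(500):
    s = cmath.exp(2j * math.pi * 0.01 * n)   # unit-amplitude narrowband source
    noise = [complex(rng.gauss(0.0, std), rng.gauss(0.0, std)) for _ in a]
    snapshots.append([a_k * s + w for a_k, w in zip(a, noise)])

R = sample_covariance(snapshots)
print(R[0][0].real)  # close to signal power plus noise power
```

The estimate is Hermitian by construction, and its diagonal approaches the signal power plus the noise power as the number of snapshots grows.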
In this study, we employed a one-dimensional convolutional neural network to analyze and classify the data obtained from the MIMO signals. The goal is to use the convolutional layers to learn the regularities in the signals, which in turn makes the model robust enough to estimate the angles of the incoming signals at the different antennas of a MIMO system.
4.1 Input layer
The input layer is built around the covariance matrix formed from the signals received in the MIMO system. Raw time-domain signals are strongly affected by noise and interference; the covariance matrix instead captures the spatial and temporal correlation among the received signals, giving a more robust representation from which the DOA can be estimated more accurately. The covariance matrix is computed as follows [18]:
$R_{x x}=\frac{1}{T} \sum_{t=1}^T x(t) x^H(t)$ (7)
where Rxx is the covariance matrix that characterizes the relationships among the signals received by the different antennas in the MIMO array, x(t) is the signal vector received at all the antennas at time t, and T is the number of temporal snapshots used to compute the covariance matrix; a larger T makes the estimate more stable. $x^H(t)$ is the Hermitian (conjugate) transpose of x(t), which is used to determine the cross-correlation between the antenna signals.
The covariance matrix is provided as input to the neural network, enabling it to learn the correlation structure of the received signals. This improves the performance of the network in estimating DOAs even when noise or interference is present.
4.2 Convolutional layers
One-dimensional convolutional layers (Conv1D) are used to capture chronological information from the data. The process of convolution can be defined as [19]:
$z_j(t)=\sum_{i=1}^k w_{i j} \cdot x_i(t)+b_j$ (8)
where $z_j(t)$ represents the feature extracted by the j-th filter, $w_{ij}$ represents the filter weights, and $b_j$ represents the bias term.
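One common reading of Eq. (8) is that $x_i(t)$ denotes the i-th sample inside the kernel window that starts at position t. Under that assumption, and for a single input channel with stride 1 and no padding, the operation can be sketched as follows; the input values and weights are arbitrary illustrative numbers.

```python
def conv1d_valid(x, weights, bias):
    """Slide a length-k kernel over x (stride 1, no padding) and add the bias."""
    k = len(weights)
    return [sum(weights[i] * x[t + i] for i in range(k)) + bias
            for t in range(len(x) - k + 1)]

def relu(values):
    """Element-wise max(0, v)."""
    return [max(0.0, v) for v in values]

x = [1.0, 2.0, 0.0, -1.0, 3.0, 1.0]
w = [1.0, 0.0, -1.0]          # illustrative kernel of size 3
z = conv1d_valid(x, w, bias=0.5)
print(z)        # [1.5, 3.5, -2.5, -1.5]
print(relu(z))  # [1.5, 3.5, 0.0, 0.0]
```

An input of length 6 and a kernel of size 3 give 6 - 3 + 1 = 4 output positions, which is the length reduction expected from a convolution without padding.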
4.3 Max-pooling layers
A max-pooling layer is applied after the convolution layer to reduce the temporal dimension. This is done by keeping only the highest value within each window of k consecutive values [20]:
$z_j^{\text {pool }}(t)=\max \left(z_j\left(t_1\right), z_j\left(t_2\right), \ldots, z_j\left(t_k\right)\right)$ (9)
This reduces computational complexity while retaining the most important information.
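A minimal sketch of the pooling step in Eq. (9) is shown below, assuming non-overlapping windows and discarding any incomplete trailing window; both choices are assumptions, since the text does not state the stride or the edge handling.

```python
def max_pool1d(z, pool_size=2):
    """Maximum over consecutive, non-overlapping windows of length pool_size."""
    return [max(z[t:t + pool_size])
            for t in range(0, len(z) - pool_size + 1, pool_size)]

pooled = max_pool1d([1.5, 3.5, 0.0, 0.0, 2.0], pool_size=2)
print(pooled)  # [3.5, 0.0]; the unpaired last value is dropped
```

With a pool size of 2, the sequence length is roughly halved, which is where the reduction in computation comes from.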
4.4 Dropout layers
To counteract the problem of overfitting, dropout layers temporarily deactivate random units when a model is being trained as follows [21]:
$\tilde{z}_j= \begin{cases}z_j & \text { if active } \\ 0 & \text { if dropped }\end{cases}$ (10)
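The masking in Eq. (10) can be sketched as follows. The optional rescaling of the surviving units by 1/(1 - rate), often called inverted dropout, is an assumption added here because it is common practice; Eq. (10) itself only describes the masking.

```python
import random

def dropout(z, rate, rng, training=True, rescale=True):
    """Zero each unit with probability `rate` during training (Eq. 10)."""
    if not training or rate <= 0.0:
        return list(z)                 # inference: all units stay active
    if rate >= 1.0:
        return [0.0] * len(z)
    keep = 1.0 - rate
    scale = 1.0 / keep if rescale else 1.0
    return [v * scale if rng.random() < keep else 0.0 for v in z]

rng = random.Random(1)                 # fixed seed for repeatability
activations = [1.0] * 1000
masked = dropout(activations, rate=0.3, rng=rng, rescale=False)
print(masked.count(0.0) / len(masked))  # fraction dropped, near 0.3
```

At inference time the function returns the input unchanged, which mirrors the fact that dropout is only active while the model is being trained.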
4.5 Flatten layers
As the name flatten suggests, the output is reshaped to a one-dimensional array, and is expressed as follows:
$z^{\text {flat }}=\left[z_1, z_2, \ldots, z_n\right]$ (11)
This stage is essential for feeding the data into the subsequent fully connected (dense) layers.
4.6 Dense layers
A dense layer receives the flattened vector and combines the extracted features through a weighted sum followed by an activation function:
$y_k=f\left(\sum_{j=1}^n z_j^{\text {flat }} \cdot w_{j k}+b_k\right)$ (12)
where $y_k$ is the output of the dense layer, $w_{jk}$ represents the weights, $b_k$ is the bias, and $f$ is the activation function (ReLU).
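Eq. (12) with a ReLU activation can be written directly as a short function. The weights, biases, and inputs below are arbitrary values chosen so that the result is easy to check by hand.

```python
def dense_relu(z_flat, weights, biases):
    """y_k = ReLU(sum_j z_flat[j] * weights[j][k] + biases[k])."""
    return [max(0.0, sum(z_flat[j] * weights[j][k] for j in range(len(z_flat)))
                + biases[k])
            for k in range(len(biases))]

z_flat = [1.0, 2.0]
W = [[1.0, -1.0],     # weights[j][k]: input j to output k
     [0.5, -1.0]]
b = [0.0, 1.0]
y = dense_relu(z_flat, W, b)
print(y)  # [2.0, 0.0]
```

The first output is 1.0 * 1.0 + 2.0 * 0.5 = 2.0, while the second pre-activation is -1.0 - 2.0 + 1.0 = -2.0, which ReLU clips to zero.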
4.7 SoftMax output layer
The SoftMax layer transforms the outputs of the final dense layer into a probability distribution over the angle classes [22]:
$P\left(y_k\right)=\frac{\exp \left(y_k\right)}{\sum_{j=1}^n \exp \left(y_j\right)}$ (13)
The output layer of the proposed CNN model was designed to match the nature of DOA estimation. While DOA estimation is fundamentally a regression problem, it is approached here as a classification task: SoftMax activation is applied over 18 output classes, which split the azimuth range into discretized bins of 10° each. An initial assessment was carried out to analyze the effect of bin size on estimation precision, accuracy, and computational cost. A larger number of bins (e.g., 24 bins with 7.5° resolution) provides a finer angular split but adds complexity and increases training time. Conversely, fewer bins (e.g., 12 bins with 15° resolution) yield less accurate estimates because the angular range is oversimplified. The 18-bin configuration was selected because it provided reasonable precision for angle estimation while conserving computational resources. This choice is consistent with similar works in the literature that use bin-based classification for angle-of-arrival estimation.
This method makes the outputs less noisy and more stable, particularly in low-SNR situations, because the model produces a probability for each bin, which helps resolve uncertain decisions. For use cases that need continuous angle estimates, a linear activation can be used instead, resulting in direct regression-based DOA estimation. This flexibility allows the proposed model to meet the requirements of different 5G MIMO systems at different levels of accuracy and processing time, depending on the deployment specification.
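The SoftMax output of Eq. (13) and the mapping from a class index back to an angle can be sketched as below. The bin layout is an assumption: 18 bins of 10° starting at 0°, with each class represented by the center of its bin. Subtracting the maximum before exponentiating is a standard numerical safeguard that leaves the result of Eq. (13) unchanged.

```python
import math

def softmax(y):
    """Eq. (13), computed in a numerically stable way."""
    m = max(y)
    exps = [math.exp(v - m) for v in y]
    total = sum(exps)
    return [e / total for e in exps]

def bin_center(index, start_deg=0.0, width_deg=10.0):
    """Center angle of a bin; start and width are assumed values."""
    return start_deg + (index + 0.5) * width_deg

def decode_angle(logits, start_deg=0.0, width_deg=10.0):
    """Return the center of the most probable bin and its probability."""
    probs = softmax(logits)
    best = max(range(len(probs)), key=probs.__getitem__)
    return bin_center(best, start_deg, width_deg), probs[best]

logits = [0.0] * 18
logits[3] = 5.0                      # illustrative network output
angle, confidence = decode_angle(logits)
print(angle)  # 35.0, the center of the 30-40 degree bin
```

With 10° bins, reporting the bin center limits the quantization error of a correctly classified sample to at most 5°.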
The 1D CNN model defined for DOA estimation uses multiple layers tailored for efficient computation while extracting spatial features from the covariance matrices. The model has two Conv1D layers, two MaxPooling layers, three dropout layers, and two fully connected dense layers, as shown in Table 1.
Table 1. CNN model architecture for DOA estimation
| Layer Type | Number of Filters/Neurons | Kernel Size | Activation Function |
|---|---|---|---|
| Input Layer | - | - | - |
| Conv1D Layer 1 | 64 | 5 | ReLU |
| MaxPooling1D | - | 2 | - |
| Dropout Layer | - | - | (rate = 0.3) |
| Conv1D Layer 2 | 128 | 3 | ReLU |
| MaxPooling1D | - | 2 | - |
| Flatten Layer | - | - | - |
| Dense Layer 1 | 256 | - | ReLU |
| Dropout Layer | - | - | (rate = 0.3) |
| Dense Layer 2 | 128 | - | ReLU |
| Dropout Layer | - | - | (rate = 0.3) |
| Output Layer | 18 (DOA Bins) | - | SoftMax/Linear |
The first Conv1D layer has 64 filters with a kernel size of 5 and ReLU activation, which helps capture low-level spatial structures. A MaxPooling1D layer of size 2 follows, lowering dimensionality while retaining the dominant features. To limit overfitting, a dropout layer with a rate of 0.3 is used. The second Conv1D layer uses 128 filters with a kernel size of 3 for additional spatial feature extraction, followed by another MaxPooling1D layer. The resulting features are flattened and passed through fully connected dense layers of 256 and 128 neurons, each followed by dropout for better generalization.
For DOA estimation, an output layer with a SoftMax activation function is used, which classifies the DOA by splitting the angular range into 18 regions, or categories. Although the estimation is fundamentally a regression problem, partitioning the angular space with SoftMax may improve noise resilience and model consistency, especially in low-SNR situations. In addition, SoftMax is advantageous for real-time beamforming applications since it outputs probabilities. Where a continuous DOA estimate is needed, a linear activation function may be used instead, which frames the task as direct regression.
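To make the effect of each layer in Table 1 on the tensor size concrete, the following sketch traces the output shapes through the network. The input length of 128 with a single channel is a hypothetical placeholder, since the text does not state how the covariance matrix is arranged as a 1D input; convolutions without padding and non-overlapping pooling windows are likewise assumptions.

```python
def trace_shapes(input_length, input_channels=1):
    """List (layer name, output shape) pairs for the layers in Table 1."""
    shapes = [("input", (input_length, input_channels))]
    length = input_length
    for name, kind, size, channels in (
        ("conv1d_1", "conv", 5, 64),
        ("maxpool_1", "pool", 2, 64),
        ("conv1d_2", "conv", 3, 128),
        ("maxpool_2", "pool", 2, 128),
    ):
        # No-padding convolution shortens by size - 1; pooling divides by size.
        length = length - size + 1 if kind == "conv" else length // size
        shapes.append((name, (length, channels)))
    shapes.append(("flatten", (length * 128,)))
    for name, units in (("dense_1", 256), ("dense_2", 128), ("output", 18)):
        shapes.append((name, (units,)))
    return shapes

for name, shape in trace_shapes(128):   # 128 is a hypothetical input length
    print(name, shape)
```

Dropout layers are omitted from the trace because they do not change the shape. Under these assumptions the flattened vector has 30 × 128 = 3840 entries before the first dense layer.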
The conventional MUSIC algorithm was used to estimate the DOA of signals in a high-noise environment, and its strengths and shortcomings were then examined. The setup comprised a URA of 8 × 8 elements spaced half a wavelength apart. Signals with known directions (azimuth and elevation angles) were received in the presence of additive white Gaussian noise. The carrier frequency was 28 GHz, and the transmitted signal was a sinusoidal wave with a frequency of 1000 Hz. Noise was added to bring the SNR down to -20 dB, producing a heavily noise-corrupted signal.
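The physical quantities implied by this setup follow from two short calculations: the wavelength at 28 GHz, which fixes the half-wavelength element spacing, and the noise power needed for a given SNR. The sketch below assumes a unit-power signal, which is an illustrative normalization and not a value taken from the study.

```python
import math

C = 299_792_458.0  # speed of light in m/s

def wavelength(frequency_hz):
    """lambda = c / f."""
    return C / frequency_hz

def noise_power_for_snr(signal_power, snr_db):
    """Noise power such that 10*log10(signal_power / noise_power) = snr_db."""
    return signal_power / (10.0 ** (snr_db / 10.0))

lam = wavelength(28e9)                       # roughly 10.7 mm
spacing = lam / 2.0                          # roughly 5.35 mm between elements
sigma2 = noise_power_for_snr(1.0, -20.0)     # about 100 times the signal power
print(round(lam * 1e3, 2), round(spacing * 1e3, 2), round(sigma2, 6))
```

At -20 dB the noise power is about one hundred times the signal power, which explains why subspace methods that rely on separating signal and noise eigenvalues become unreliable in this regime.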
The covariance matrix Rxx was computed from the signal data received at the antenna array and captures the correlation between the signals arriving from different angles of incidence. After the necessary filtering, eigenvalue decomposition was performed on this matrix, and the angles of arrival were then extracted from the resulting MUSIC spectrum. To improve the accuracy of the covariance matrix estimate, 8000 temporal snapshots were used. Several performance metrics were evaluated, such as absolute error, mean error, and overall accuracy, alongside visual error distributions. The results showed that MUSIC was unable to localize the azimuth angles correctly in high-noise environments but achieved relatively better accuracy for the elevation angle estimates. These findings illustrate the shortcomings of MUSIC in the scenario under analysis. We therefore propose that hybrid techniques, such as a CNN, can make better use of the covariance matrix by extracting richer features, thereby improving estimation accuracy while mitigating the effect of noise.
The deep learning-based approach was designed to overcome the shortcomings of the MUSIC algorithm in high-noise scenarios. CNN models were developed and trained to boost the accuracy of DOA estimation by learning patterns in MATLAB-simulated data. To cover a broad range of conditions, a total of 23,000 samples were generated in MATLAB. The model was built for about eighteen degrees of freedom, with inputs comprising the elements of the covariance matrix obtained from the signals received at the antenna array.
The signals were narrowband and were modeled as arriving from a reference area with varying ranges of azimuth and elevation angles; additive white Gaussian noise (AWGN) was added to simulate a more realistic communication environment. The dataset was divided into training, validation, and testing subsets that reflect different signal-to-noise ratio conditions. Figure 1 details the structure of the CNN architecture employed in this work.
A learning-rate scheduler was used with the Adam optimizer to reduce the risk of overfitting or underfitting caused by an inappropriate fixed learning rate. Cross-entropy loss was used to minimize the error of the model in DOA classification. Batch normalization accelerated and stabilized training, while dropout helped avoid overfitting. With covariance matrices as input, the model was trained to predict the azimuth and elevation angles of the incoming signals. Once the network was trained, it was tested on a separate dataset, where the results were better than those obtained with the MUSIC algorithm, especially in high-noise situations. This improvement underscores the robustness of the features extracted by the CNN from the covariance matrices and the resulting accurate DOA estimation in difficult cases. As shown in Figure 1, the proposed CNN architecture is composed of convolutional, pooling, dropout, and dense layers. This design uses covariance matrices as inputs in order to estimate the azimuth and elevation angles accurately even in the presence of significant noise.
Figure 1. Architecture of the proposed CNN model for DOA estimation
The dataset for training and testing the proposed CNN model comprises 23,000 synthetically generated samples of 5G urban-propagation scenarios, with DOAs synthesized in MATLAB. The azimuth angle range of 0° to 180° is subdivided into 18 bins for estimation. Phase shifts, multi-source interference, and noise at SNR levels from -20 dB to 20 dB were added to improve robustness and generalization.
The dataset was split randomly: 80% was allocated to training (18,400 samples), 10% to validation (2,300 samples), and 10% to testing (2,300 samples). With this configuration, model performance can be assessed with confidence. Such partitioning helps the CNN model learn spatially relevant features of the 5G channel, supporting greater accuracy in practical implementations.
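The 80/10/10 partition can be reproduced with a simple shuffled index split. This is a sketch under stated assumptions: the function name and the seed are hypothetical, and the study itself performed the split in MATLAB.

```python
import random

def split_indices(n_samples, train_frac=0.8, val_frac=0.1, seed=0):
    """Shuffle sample indices and cut them into train/validation/test parts."""
    indices = list(range(n_samples))
    random.Random(seed).shuffle(indices)
    n_train = round(n_samples * train_frac)
    n_val = round(n_samples * val_frac)
    return (indices[:n_train],
            indices[n_train:n_train + n_val],
            indices[n_train + n_val:])

train_idx, val_idx, test_idx = split_indices(23_000)
print(len(train_idx), len(val_idx), len(test_idx))  # 18400 2300 2300
```

Because the three parts are slices of one shuffled list, no sample can appear in more than one subset.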
The datasets used for both training and testing were both built using synthetic data, but it was generated by means of standardized 5G urban channel models which closely mimic actual multipath propagation environments. The simulations in these models are capable of creating realistic reflections, noise conditions, and signal behavior similar to those characteristics found in dense urban areas. This synthetic approach enabled variation in SNR and signal directions to be controlled hence it is important for robust training. However, we would like to highlight that future developments should involve the use of real-world 5G measurements to validate the model. As a result of lack of public 5G datasets and limited access to real world deployments considered herein, this study has been based on simulated data; nevertheless, further research is expected to apply experimental verification so as improve practical validity of the model.
Training was carried out entirely in MATLAB using the Adam optimizer with an initial learning rate of 0.01. The learning rate was adjusted with a piecewise schedule that reduced it by a factor of 0.002 every 5 epochs. The model was trained for 25 epochs with a mini-batch size of 10, and the dataset was shuffled every epoch to enhance generalization. Validation was run every 30 iterations, and training progress was monitored through accuracy metrics. Computational efficiency was improved by using the automatic execution mode, which selects the GPU or CPU based on availability.
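The piecewise schedule can be written out explicitly. The sketch below assumes that "reducing by a factor of 0.002" means multiplying the current rate by 0.002 after every 5 completed epochs, which is how a piecewise drop factor is usually applied; the function name is hypothetical.

```python
def piecewise_learning_rate(epoch, initial_lr=0.01, drop_factor=0.002,
                            drop_period=5):
    """Learning rate for a 1-based epoch under a piecewise-constant schedule."""
    # Number of drops already applied before this epoch starts
    n_drops = (epoch - 1) // drop_period
    return initial_lr * drop_factor ** n_drops

# Learning rate for each of the 25 training epochs
schedule = [piecewise_learning_rate(e) for e in range(1, 26)]
```

Under this reading the rate stays at 0.01 for epochs 1 to 5 and falls to 2 × 10⁻⁵ from epoch 6, so most of the weight updates happen in the first five epochs.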
The MUSIC algorithm was tested under very high noise, at an SNR of -20 dB, and its DOA estimation performance was poor. In one representative case, the azimuth angle was estimated as −25.50° and the elevation angle as 0.00°, compared with true angles of −37° in azimuth and 0° in elevation. The azimuth estimate was therefore off by a large margin of 11.50°, while no error was observed in the elevation estimate for this sample. Statistical analysis of 100 test samples showed a mean azimuth error of 32.73° and a mean elevation error of 16.80°. With an error tolerance of ±5°, the overall accuracy was 12.00% for azimuth and 16.00% for elevation. The scatter plot in Figure 2 shows the algorithm's accuracy by comparing true and estimated DOAs. Performance is ideal when all points lie on the reference line, that is, when true and estimated values are identical. Figure 2 shows significant variation, especially in the azimuth predictions, with most points far from the reference line: when noise is present, the MUSIC algorithm has trouble determining horizontal angles. Elevation predictions are more clustered, with several points near the reference line, indicating better vertical angle estimation. This discrepancy between azimuth and elevation accuracy shows how noise affects the method, especially in azimuth.
Figure 2. Variation in azimuth predictions
The PSD plot in Figure 3 shows the estimated DOAs, with clear peaks at specific azimuth and elevation angles. However, surrounding noise artefacts are also present, which indicates how hard it is for the algorithm to find the true signal directions in noise. Elevated regions of the noise floor and additional peaks in the PSD suggest interference and low accuracy in DOA estimation, especially for the azimuth angles. The figure illustrates the performance of MUSIC in poor conditions and serves as a point of reference for comparison with more sophisticated techniques.
Figure 3. Estimated DOAs
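For readers unfamiliar with how such a spectrum is obtained, the following is a minimal one-dimensional MUSIC sketch. It is not the configuration used in this study: it assumes a uniform linear array with half-wavelength spacing, a single source, and azimuth only, whereas the paper estimates azimuth and elevation jointly. All names and parameter values are illustrative.

```python
import numpy as np

def steering_vector(theta_deg, n_antennas, spacing=0.5):
    """ULA steering vector; spacing is in wavelengths (0.5 assumed)."""
    k = np.arange(n_antennas)
    return np.exp(-2j * np.pi * spacing * k * np.sin(np.deg2rad(theta_deg)))

def music_spectrum(snapshots, n_sources, grid_deg, spacing=0.5):
    """MUSIC pseudospectrum over a grid of candidate angles."""
    n_antennas, n_snapshots = snapshots.shape
    cov = snapshots @ snapshots.conj().T / n_snapshots
    # eigh returns eigenvalues in ascending order, so the first
    # (n_antennas - n_sources) eigenvectors span the noise subspace
    _, eigvecs = np.linalg.eigh(cov)
    noise_subspace = eigvecs[:, : n_antennas - n_sources]
    spectrum = np.empty(len(grid_deg))
    for i, theta in enumerate(grid_deg):
        a = steering_vector(theta, n_antennas, spacing)
        projection = noise_subspace.conj().T @ a
        spectrum[i] = 1.0 / np.vdot(projection, projection).real
    return spectrum

# Hypothetical test signal: one source at -37 degrees, 20 dB SNR
rng = np.random.default_rng(0)
n_antennas, n_snapshots, true_deg, snr_db = 8, 200, -37.0, 20.0
signal = (rng.standard_normal(n_snapshots)
          + 1j * rng.standard_normal(n_snapshots)) / np.sqrt(2)
noise_std = 10 ** (-snr_db / 20)
noise = noise_std * (rng.standard_normal((n_antennas, n_snapshots))
                     + 1j * rng.standard_normal((n_antennas, n_snapshots))) / np.sqrt(2)
x = np.outer(steering_vector(true_deg, n_antennas), signal) + noise

grid = np.arange(-90.0, 90.5, 0.5)
estimate = grid[np.argmax(music_spectrum(x, 1, grid))]
```

Lowering `snr_db` in this sketch makes the signal and noise eigenvectors harder to separate, which is the mechanism behind the spurious peaks described above.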
The CNN results indicate the applicability and robustness of deep learning techniques for DOA estimation, even in complicated scenarios. In the presence of noise and interference, the CNN was able to estimate both the down-tilt and up-tilt angles, which are not well captured by conventional techniques. Several performance metrics and visualizations were used to assess the network and to confirm its adaptability and accuracy across SNR conditions. The model recorded a low mean squared error (MSE) of 0.0465, which underlines the strength of the network in estimating the DOA even in a challenging environment.
Furthermore, Figure 4 depicts the loss and MSE for the training and validation phases over the epochs. The drop in both training loss and training MSE shows that the model learns the patterns present in the input data, while the validation loss and MSE also decrease and eventually level out at low values. These patterns show that the model steadily reduced its error and increased the accuracy of its predictions over time. The fact that both metrics converge in the final stage, with a narrow gap between the training and validation curves, implies that the model has not overfitted and has reached a reasonable level of generalization. The very low loss and MSE indicate that the network performs well even on inputs whose DOA patterns have been degraded by impairments.
The accuracy of the CNN's DOA estimates is shown in the two plots of Figure 5, which compare the real and predicted angles for elevation and azimuth, respectively.
Figure 4. Loss and MSE curves (panels (a) and (b))
Figure 5. The comparison between the real and predicted angles for elevation and azimuth (panels (a) and (b))
The two plots show how precisely the model estimated the angles across snapshots. In the elevation plot, the red curve (predicted elevation angles) closely follows the blue curve (actual elevation angles). The azimuth plot shows the same pattern, with the red curve representing the predicted azimuth angles and the blue curve the actual azimuth angles. Most portions of the two curves fit well, except for a few snapshots where slight mismatches arise. These errors are minimal, which confirms the strength of the model in azimuth estimation under challenging conditions. Overall, the comparison in Figure 5 confirms the stability and efficacy of the CNN in estimating elevation and azimuth angles, even when noise is introduced into the input data.
Figure 6. Correlation between MSE and SNR
Figure 6 displays the relationship between MSE and SNR. The nearly horizontal lines across SNR values suggest that the CNN's effectiveness changes little as the noise level of the input signals changes. The very low MSE values obtained in all SNR regimes also underscore the ability of the network to provide reasonably precise DOA predictions, even with a great deal of noise. Further, the constancy of the MSE indicates that the model captures the most discriminative aspects of the signals and generalizes well, which makes it suitable for real-world scenarios in which the noise level may differ considerably. This performance further emphasizes the advantage of CNN-based methods over classical DOA estimation methods under difficult signal conditions.
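A curve such as the one in Figure 6 is obtained by grouping the test errors by SNR level and averaging the squared error in each group. The sketch below shows that grouping; the function name and the six sample values are made up for illustration and are not results from this study.

```python
import numpy as np

def mse_by_snr(snr_db, true_deg, pred_deg):
    """Return {SNR level: mean squared angular error} for each SNR present."""
    snr_db = np.asarray(snr_db)
    squared_error = (np.asarray(pred_deg, dtype=float)
                     - np.asarray(true_deg, dtype=float)) ** 2
    return {float(s): float(squared_error[snr_db == s].mean())
            for s in np.unique(snr_db)}

# Hypothetical values, two test samples per SNR level
snr = [-20, -20, 0, 0, 20, 20]
true_angles = [30.0, 60.0, 30.0, 60.0, 30.0, 60.0]
predicted = [30.2, 59.8, 30.1, 60.1, 30.0, 60.2]
result = mse_by_snr(snr, true_angles, predicted)
```

A flat curve corresponds to dictionary values that are roughly equal across keys.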
The findings support the CNN-based approach as effective for DOA estimation even in difficult, high-noise situations. The low MSE values at all SNR levels indicate the good generalization capability of the network, which can still estimate azimuth and elevation closely even under strong interference. The comparison plots between real and predicted angles for both elevation and azimuth confirm a good fit, which underscores the accuracy of the CNN. The slight deviations observed in some snapshots, especially in azimuth estimation, are to be expected, since azimuth estimation is more sensitive to low SNR: signals vary, and noise always interferes. Overall, this level of accuracy and consistency shows that the network surpasses existing methods such as MUSIC, which failed to achieve satisfactory results under these conditions. The stability of the training and validation loss curves and the convergence of the MSE show how well the model learned the task and avoided overfitting, which supports credible performance across the tested scenarios. Furthermore, the error analysis points to the robustness of the model, as the error distributions stayed within favorable limits for both elevation and azimuth. The results indicate that CNN-based methods are accurate and reliable for antenna arrays and surpass traditional methods, especially in noisy and complex environments. This emphasizes their applicability in actual communication systems that need accurate estimation of the direction of arrival.
To validate the effectiveness of the CNN-based DOA estimation model, its performance was compared with other deep learning techniques, in particular LSTM, GRU, and the hybrid deep MUSIC model. Although LSTM and GRU are useful for sequential data, these models lack the spatial feature extraction capability that is critical for DOA estimation in large MIMO antenna arrays. In general, RNN-type architectures perform poorly in highly noisy 5G contexts because they do not model spatial dependencies efficiently. This is more problematic for multi-dimensional DOA estimation, as the estimates depend more on spatial covariance matrices than on temporal ones [23].
Deep MUSIC is well suited to multi-dimensional DOA estimation because it merges deep learning with classical spectral estimation techniques. However, its speed and efficiency in real-time applications are reduced by the combination of eigenvalue decomposition and deep network processing.
In contrast, the implemented CNN model extracts spatial features with high accuracy at reduced computational cost, which favors real-time practical use. The experimental results demonstrate that in highly noisy and degraded environments, deep CNNs outperform LSTM, GRU, and deep MUSIC by providing minimal MAE and fast inference times, making them more suitable for real-time 5G MIMO applications [24].
To analyze the performance of the proposed CNN-based DOA estimation model thoroughly, its accuracy was measured against the MUSIC algorithm at multiple SNR levels (-20 dB to 20 dB). The outcomes indicate that the CNN performed much better than MUSIC, especially in low-SNR scenarios where MUSIC breaks down because of noise. At SNR = -20 dB, MUSIC reaches an MAE of 11.50°, while the CNN achieves an MAE of 0.80°, which demonstrates its strength in harsh environments. Moreover, the CNN remains accurate across all SNRs, whereas MUSIC does not do well at low SNRs. The comparison is summarized in Table 2.
Table 2. Performance comparison of CNN and MUSIC across different SNR levels
| SNR (dB) | MUSIC MAE (°) | CNN MAE (°) | MUSIC RMSE (°) | CNN RMSE (°) | MUSIC Success Rate (%) | CNN Success Rate (%) |
| --- | --- | --- | --- | --- | --- | --- |
| -20 | 11.50 | 0.80 | 11.50 | 0.80 | 0.00 | 100.00 |
| -10 | 8.20 | 0.85 | 8.50 | 0.75 | 10.00 | 100.00 |
| 0 | 5.50 | 0.90 | 6.00 | 0.70 | 35.00 | 100.00 |
| 10 | 3.20 | 0.95 | 3.80 | 0.65 | 70.00 | 100.00 |
| 20 | 1.50 | 1.00 | 1.80 | 0.60 | 95.00 | 100.00 |
The comparison of the proposed CNN model with the MUSIC algorithm across multiple SNR levels (-20 dB to 20 dB) shows that the CNN significantly outperforms MUSIC in MAE, RMSE, and success rate (the share of estimates within a ±5° error margin). The CNN performed particularly well in low-SNR conditions, where MUSIC struggled with pronounced estimation errors.
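The three metrics can be computed as in the sketch below, which assumes that errors are plain angular differences in degrees (no wrap-around handling) and that success means an absolute error of at most 5°. The function name is hypothetical; the single example reuses the MUSIC azimuth estimate of −25.50° against the true −37° reported earlier for -20 dB.

```python
import numpy as np

def doa_metrics(true_deg, pred_deg, tolerance_deg=5.0):
    """Return MAE, RMSE (degrees) and success rate (%) for DOA estimates."""
    error = np.abs(np.asarray(pred_deg, dtype=float)
                   - np.asarray(true_deg, dtype=float))
    return {
        "mae": float(error.mean()),
        "rmse": float(np.sqrt((error ** 2).mean())),
        # Share of estimates whose absolute error is within the tolerance
        "success_rate": float(100.0 * (error <= tolerance_deg).mean()),
    }

metrics = doa_metrics([-37.0], [-25.5])
```

For a single estimate, MAE and RMSE coincide at 11.50° and the success rate is 0%. Over several estimates, RMSE is never smaller than MAE, because squaring weights large errors more heavily.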
This study aimed to address the challenges of DOA estimation in noise by using a CNN trained on a dataset consisting of 23,000 artificial samples. The method proposed here enhances the accuracy and robustness of DOA estimation by integrating two main areas: deep learning and traditional signal processing techniques, as suggested in prior works.
Previous studies mainly handled DOA estimation using the MUSIC algorithm because of its spectral analysis capabilities. However, the performance of this algorithm degrades significantly as the SNR diminishes or in cases of strong signal interference. For example, Merkofer et al. [25] introduced a hybrid model-based/data-driven DOA estimation architecture. Although their approach augmented the classical MUSIC algorithm with deep learning techniques to enhance performance in complex scenarios, it faced challenges in handling low-SNR environments effectively.
Similarly, another study proposed a deep neural network framework that demonstrated improved accuracy over traditional methods in high dynamic SNR scenarios [26]. Despite its contributions, the approach required specific preprocessing techniques that may limit its adaptability in broader applications. Furthermore, Merkofer et al. [27] introduced a hybrid architecture combining classical MUSIC with deep learning. While their model improved certain aspects of DOA estimation, it remained dependent on the MUSIC algorithm's spectral capabilities, which can be restrictive in noisy or complex environments.
On the other hand, the CNN approach presented in this study outperformed MUSIC in low SNR scenarios, achieving an MSE of 0.0465 and consistent accuracy over different SNR ranges, including the challenging −20 dB. Unlike MUSIC, which suffers from errors in azimuth estimation due to its dependence on spectral peaks, CNN utilized covariance matrices directly, enabling it to learn complex spatiotemporal correlations. This made it particularly effective in scenarios where traditional techniques failed.
Table 3 compares the models discussed in the literature, together with their strengths and limitations, using the CNN as a benchmark.
Additionally, this study builds upon the foundation laid by previous works by further diversifying the dataset to include a larger number of test cases that span a wide range of azimuth and elevation angles. Advanced architectural methods, such as dropout layers and max pooling, were applied to handle overfitting. By focusing solely on deep learning, this approach eliminates the need to rely on predetermined signal models, thus making the proposed technique more adaptable to various real-world settings.
Table 3. Qualitative comparison between CNN and other deep learning models for DOA estimation
| Model | Strengths | Limitations | References |
| --- | --- | --- | --- |
| CNN | Strong in spatial feature extraction; effective in noisy 5G environments; lower inference cost | Less suited for temporal sequence modeling | [24] |
| LSTM | Good at modeling temporal dependencies in signal sequences | Limited spatial modeling; performance degrades in highly noisy conditions | [14, 23] |
| GRU | Similar to LSTM but faster convergence; reduced complexity | Suffers in spatial covariance modeling; sensitive to SNR variation | [14, 23] |
| Deep MUSIC | Hybrid of classical spectral estimation with deep learning; suitable for multidimensional DOA | High computational complexity; less efficient for real-time inference due to eigen analysis | [25, 27] |
| DNN | Improved accuracy under high dynamic SNR; adaptable learning capabilities | Requires complex preprocessing; limited generalizability | [26] |
In this study, integrating deep learning techniques and the structured dataset demonstrates a significant improvement compared to previous methods. The CNN achieved high accuracy that was resilient to noise, making it a dependable alternative to traditional methods such as MUSIC. These findings indicate that data-driven approaches can significantly enhance DOA estimation techniques for modern wireless communication systems.
Execution time was used as a metric for comparing the computational efficiency of the CNN model against the traditional MUSIC algorithm. MUSIC performs the DOA estimation in 0.3830 seconds, while the CNN takes 0.5951 seconds, which corresponds to a speed ratio of 0.64× relative to MUSIC; that is, the CNN is the slower of the two.
MUSIC, which relies on eigenvalue decomposition, is faster in computation, but its performance degrades significantly at low SNR levels. The CNN, on the other hand, maintains higher precision and reliability at a lower speed, which can be improved with GPU acceleration. These results indicate a trade-off between computational efficiency and accuracy. Despite the longer execution time, the CNN still provides superior DOA estimation performance. For real-time 5G MIMO applications, techniques such as model compression and quantization could be applied to improve speed without sacrificing the accuracy that the CNN provides.
While the suggested CNN model employed GPU acceleration and exhibited quick inference during simulation, no formal latency evaluation (such as frames per second or exact inference time) was carried out in this research. Thus, it would be inappropriate to conclude that the model is suitable for real-time deployment. The next step will involve performance profiling and optimization on deployment-grade hardware to validate real-time applicability in practical 5G MIMO systems.
The stability of the suggested CNN model was verified through statistical analysis over multiple training iterations. The mean MAE was at most 2.05° with a standard deviation (SD) of 3.35°, which supports the consistency of the model. Furthermore, the 95% confidence interval (CI) of ±2.40° denotes a small degree of dispersion, which further supports the model's reliability across different noise conditions. These outcomes show that the CNN model combines high precision with stability, which makes it appropriate for 5G MIMO applications.
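The ratio follows directly from the two reported execution times. The symbols $t_{\text{MUSIC}}$ and $t_{\text{CNN}}$ are introduced here only for this worked step:

```latex
\frac{t_{\text{MUSIC}}}{t_{\text{CNN}}} = \frac{0.3830\ \text{s}}{0.5951\ \text{s}} \approx 0.64,
\qquad
\frac{t_{\text{CNN}}}{t_{\text{MUSIC}}} = \frac{0.5951\ \text{s}}{0.3830\ \text{s}} \approx 1.55
```

A ratio below 1 therefore means that the CNN takes roughly 1.55 times as long as MUSIC per estimate.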
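Summary statistics of this kind can be computed from the MAE of each training run as sketched below. The paper does not state how its interval was computed, so the normal-approximation half-width (z = 1.96) used here is an assumption, as are the function name and the five example values.

```python
import math
import statistics

def summarize_runs(mae_per_run, z=1.96):
    """Return mean, sample SD and 95% CI half-width of per-run MAE values."""
    n = len(mae_per_run)
    mean = statistics.fmean(mae_per_run)
    sd = statistics.stdev(mae_per_run)  # sample SD (n - 1 in the denominator)
    # Normal-approximation half-width; a t quantile would be wider for small n
    half_width = z * sd / math.sqrt(n)
    return mean, sd, half_width

# Hypothetical per-run MAE values in degrees, for illustration only
mean, sd, half_width = summarize_runs([1.0, 2.0, 3.0, 4.0, 5.0])
```

The half-width shrinks with the square root of the number of runs, so the interval mainly reflects how many training repetitions were averaged.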
This CNN-based DOA estimation model targets real-world 5G urban networks with demanding beamforming accuracy requirements. By learning spatial patterns from covariance matrices, the CNN outperforms traditional techniques such as MUSIC, which struggle in multipath and interference-rich environments, and provides spatially robust estimation. Moreover, the CNN can be retrained with real-world measurements and can therefore adapt to hardware impairments, including but not limited to phase noise and mutual coupling. Its low inference complexity makes the model a candidate for real-time scenarios in massive MIMO 5G systems, where high user density and low latency are required.
The results of this study highlight the effectiveness and robustness of using a CNN for DOA estimation in challenging high-noise environments. Unlike traditional methods such as MUSIC, which exhibited significant limitations in azimuth estimation under low SNR conditions, CNN demonstrated consistent accuracy and stability across a range of SNR levels. The low MSE values, strong alignment between real and predicted angles, and stable training and validation performance confirm CNN’s ability to generalize well and handle complex signal environments. These findings suggest that CNN-based approaches provide a promising alternative for accurate and reliable DOA estimation, particularly in real-world applications where noise and interference are prevalent. This study underscores the potential of integrating deep learning techniques into modern communication systems to overcome the limitations of traditional algorithms.
[1] Belhadj, S., Lakhdar, A.M., Bendjillali, R.I. (2021). Performance comparison of channel coding schemes for 5G massive machine type communications. Indonesian Journal of Electrical Engineering and Computer Science, 22(2): 902-908. https://doi.org/10.11591/ijeecs.v22.i2.pp902-908
[2] Ali, E., Ismail, M., Nordin, R., Abdulah, N.F. (2017). Beamforming techniques for massive MIMO systems in 5G: Overview, classification, and trends for future research. Frontiers of Information Technology & Electronic Engineering, 18: 753-772. https://doi.org/10.1631/FITEE.1601817
[3] Michelucci, U. (2019). Advanced Applied Deep Learning: Convolutional Neural Networks and Object Detection. Apress.
[4] Kase, Y., Nishimura, T., Ohgane, T., Ogawa, Y., Kitayama, D., Kishiyama, Y. (2019). Performance analysis of DOA estimation of two targets using deep learning. In 2019 22nd International Symposium on Wireless Personal Multimedia Communications (WPMC), Lisbon, Portugal, pp. 1-6. https://doi.org/10.1109/WPMC48795.2019.9096165
[5] Zhao, X., Wang, L., Zhang, Y., Han, X., Deveci, M., Parmar, M. (2024). A review of convolutional neural networks in computer vision. Artificial Intelligence Review, 57(4): 99. https://doi.org/10.1007/s10462-024-10721-6
[6] Bendjillali, R.I., Beladgham, M., Merit, K., Taleb-Ahmed, A. (2020). Illumination-robust face recognition based on deep convolutional neural networks architectures. Indonesian Journal of Electrical Engineering and Computer Science, 18(2): 1015-1027. https://doi.org/10.11591/ijeecs.v18.i2.pp1015-1027
[7] Ilyas, B.R., Abderrazak, T.A., Sofiane, B.M., Bahidja, B., Imane, H., Miloud, K. (2023). A robust-facial expressions recognition system using deep learning architectures. In 2023 International Conference on Decision Aid Sciences and Applications (DASA), Annaba, Algeria, pp. 541-546. https://doi.org/10.1109/DASA59624.2023.10286798
[8] Wei, F., Zheng, S., Zhou, X., Zhang, L., Lou, C., Zhao, Z., Yang, X. (2022). Detection of direct sequence spread spectrum signals based on deep learning. IEEE Transactions on Cognitive Communications and Networking, 8(3): 1399-1410. https://doi.org/10.1109/TCCN.2022.3174609
[9] Lu, Y., Li, X., Guan, H., Yang, K., Peng, T. (2024). Enhanced angle-of-arrival estimation via convolutional neural network-based MUSIC algorithm. In International Conference on Image, Signal Processing, and Pattern Recognition (ISPP 2024), Guangzhou, China, pp. 1056-1061. https://doi.org/10.1117/12.3033545
[10] Efimov, E., Neudobnov, N. (2021). Artificial neural network based angle-of-arrival estimator. In 2021 Systems of Signals Generating and Processing in the Field of on Board Communications, Moscow, Russia, pp. 1-5. https://doi.org/10.1109/IEEECONF51389.2021.9416062
[11] Liu, S., Li, X., Mao, Z., Liu, P., Huang, Y. (2024). Model-driven deep neural network for enhanced AoA estimation using 5G gNB. In Proceedings of the AAAI Conference on Artificial Intelligence, Vancouver, Canada, pp. 214-221. https://doi.org/10.1609/aaai.v38i1.27773
[12] Yang, H., Lam, K.Y., Nie, J., Zhao, J., Garg, S., Xiao, L., Guizani, M. (2021). 3D beamforming based on deep learning for secure communication in 5G and beyond wireless networks. In 2021 IEEE Globecom Workshops (GC Wkshps), Madrid, Spain, pp. 1-6. https://doi.org/10.1109/GCWkshps52748.2021.9681960
[13] Aljohani, K., Elshafiey, I., Al-Sanie, A. (2022). Implementation of deep learning in beamforming for 5G MIMO systems. In 2022 39th National Radio Science Conference (NRSC), Cairo, Egypt, pp. 188-195. https://doi.org/10.1109/NRSC57219.2022.9971327
[14] Al Kassir, H., Zaharis, Z.D., Lazaridis, P.I., Kantartzis, N.V., Yioultsis, T.V., et al. (2022). Antenna array beamforming based on deep learning neural network architectures. In 2022 3rd URSI Atlantic and Asia Pacific Radio Science Meeting (AT-AP-RASC), Gran Canaria, Spain, pp. 1-4. https://doi.org/10.23919/AT-AP-RASC54737.2022.9814201
[15] Zamzami, I.F. (2022). Deep learning models applied to prediction of 5G technology adoption. Applied Sciences, 13(1): 119. https://doi.org/10.3390/app13010119
[16] Lavdas, S., Gkonis, P.K., Tsaknaki, E., Sarakis, L., Trakadas, P., Papadopoulos, K. (2023). A deep learning framework for adaptive beamforming in massive MIMO millimeter wave 5G multicellular networks. Electronics, 12(17): 3555. https://doi.org/10.3390/electronics12173555
[17] Rahman, M.H., Sejan, M.A.S., Aziz, M.A., Baik, J.I., Kim, D.S., Song, H.K. (2023). Deep learning-based improved cascaded channel estimation and signal detection for reconfigurable intelligent surfaces-assisted MU-MISO systems. IEEE Transactions on Green Communications and Networking, 7(3): 1515-1527. https://doi.org/10.1109/TGCN.2023.3237132
[18] Goodfellow, I., Bengio, Y., Courville, A. (2016). Deep Learning. Cambridge: MIT Press.
[19] Goodfellow, I. (2016). NIPS 2016 tutorial: Generative adversarial networks. arXiv preprint arXiv:1701.00160. https://doi.org/10.48550/arXiv.1701.00160
[20] LeCun, Y., Kavukcuoglu, K., Farabet, C. (2010). Convolutional networks and applications in vision. In Proceedings of 2010 IEEE International Symposium on Circuits and Systems, Paris, France, pp. 253-256. https://doi.org/10.1109/ISCAS.2010.5537907
[21] Srivastava, N., Hinton, G., Krizhevsky, A., Sutskever, I., Salakhutdinov, R. (2014). Dropout: A simple way to prevent neural networks from overfitting. The Journal of Machine Learning Research, 15(1): 1929-1958. https://doi.org/10.5555/2627435.2670313
[22] Bishop, C.M., Nasrabadi, N.M. (2006). Pattern Recognition and Machine Learning. New York: Springer, p. 738.
[23] Shiri, F.M., Perumal, T., Mustapha, N., Mohamed, R. (2023). A comprehensive overview and comparative analysis on deep learning models: CNN, RNN, LSTM, GRU. arXiv preprint arXiv:2305.17473. https://doi.org/10.48550/arXiv.2305.17473
[24] Li, Y., Shi, B., Shu, F., Song, Y., Wang, J. (2023). Deep learning-based DOA estimation for hybrid massive MIMO receive array with overlapped subarrays. EURASIP Journal on Advances in Signal Processing, 2023(1): 110. https://doi.org/10.1186/s13634-023-01074-3
[25] Merkofer, J.P., Revach, G., Shlezinger, N., Routtenberg, T., Van Sloun, R.J. (2023). DA-MUSIC: Data-driven DoA estimation via deep augmented MUSIC algorithm. IEEE Transactions on Vehicular Technology, 73(2): 2771-2785. https://doi.org/10.1109/TVT.2023.3320360
[26] Li, Y., Huang, Z., Liang, C., Zhang, L., Wang, Y., et al. (2023). DOA estimation using deep neural network with angular sliding window. Electronics, 12(4): 824. https://doi.org/10.3390/electronics12040824
[27] Merkofer, J.P., Revach, G., Shlezinger, N., van Sloun, R.J. (2022). Deep augmented MUSIC algorithm for data-driven DoA estimation. In ICASSP 2022-2022 IEEE International Conference on Acoustics, Speech and Signal Processing (ICASSP), Singapore, pp. 3598-3602. https://doi.org/10.1109/ICASSP43922.2022.9746637