Improved Reinforcement Learning for Reliable Routing in Medical Wireless Sensor Networks


Daggu Lingamaiah* D. Krishna Reddy Perumalla Naveen Kumar

Department of Electronics and Communication Engineering, University College of Engineering (A), Osmania University, Hyderabad-500007, Telangana, India

Department of Electronics and Communication Engineering, Chaitanya Bharathi Institute of Technology, Osmania University, Hyderabad-500075, Telangana, India

15 September 2023
25 November 2023
12 December 2023
Available online: 27 December 2023

© 2023 IIETA. This article is published by IIETA and is licensed under the CC BY 4.0 license.



The medical field is one of the evolving application areas of wireless sensor networks (WSN). A WSN is a self-organizing network that requires no pre-installed infrastructure for data exchange, and this characteristic is used for monitoring the vital parameters of patients in hospitals. However, latency and packet loss in WSN are critical concerns when monitoring sensitive vital parameters. To provide reliable data exchange in WSN, a low-risk reliable routing (LRRR) approach is proposed. The LRRR method introduces an updated reward metric in reinforcement learning for optimal clustering and head selection in the WSN interface. In addition to the existing interface unit in WSN, a decision unit for measuring the packet forwarding factor is proposed. The proposed monitoring factor refines the existing reward metric according to the packet forwarding conditions in the network. The updated reward factor improves the reliability of packet exchange by monitoring both the energy and the forwarding condition of each node. In vital parameter monitoring, the LRRR method improved network throughput by 30% and network lifetime by 13 msec, and decreased the end-to-end (E2E) delay by 5 sec, compared to existing cluster-based routing approaches in WSN.


cluster formation, cluster head selection, low-risk reliable routing, medical application, patient monitoring, reinforcement learning, residual energy, wireless sensor network

1. Introduction

Large-area patient monitoring to track vital parameters has become an increasingly important task in contemporary medical applications. The dual challenge of ensuring both the sensitivity of collected data and the accuracy of its transmission complicates the management and exchange of sensor data in practical healthcare scenarios. With the growing need for swift and precise data exchanges, wireless sensor networks (WSN) have emerged as a superior solution for data transfer, providing a broader and more reliable interface compared to traditional communication technologies. Their widespread adoption in healthcare monitoring systems is driven by their ability to offer robust wireless monitoring and dynamically adaptable networks. However, the implementation of WSN in sensitive applications like healthcare monitoring is not without its challenges.

The field of wireless healthcare monitoring has seen significant growth due to advancements in medical diagnostics and wireless communications. WSN, equipped with advanced sensors, facilitate real-time, remote monitoring of patients. The extensive distribution of patients and the consequent monitoring demands pose a challenge, often requiring substantial manpower and potentially leading to increased mortality rates in critical cases. To address this, recent developments in pervasive healthcare systems have introduced fusion models capable of monitoring various vital signs across different patients within a hospital setting.

An information system (IS) for patient vital sign monitoring using wireless communication was introduced by Wang et al. [1], featuring a data fusion model for processing multiple sensor readings in remote monitoring scenarios. However, with increased patient interactions, the system's data overhead also escalates. To counter this, Sriram et al. [2] developed a selective intelligence approach named ‘Health Care Context Aware Computing’ (HCCAC), which aims to minimize overhead by prioritizing vital parameters based on a ranking factor. Nonetheless, this approach does not account for deviations in monitoring parameters, which can impact the accuracy of alarm systems. Other initiatives, such as intelligent emergency management services and self-adaptable automation services within healthcare monitoring systems (HCMS) [3, 4], have been developed to wirelessly monitor patient vitals, yet they often overlook the critical aspect of routing reliability.

A variety of routing strategies have been devised to improve the efficiency of data exchange within WSN, with nodes serving as the principal devices collecting and communicating sensory data [5, 6]. Despite their critical role, power constraints of remote devices pose a substantial challenge in WSN [7]. Among the various techniques, clustering-based routing [8-10] has gained prominence because of its potential for energy conservation and efficient resource management. Nodes within a WSN form small clusters to facilitate communication, with a designated cluster head orchestrating the exchange of information. Selecting the optimal cluster head is, however, a complex task in WSN.

The low-energy adaptive clustering hierarchy (LEACH) protocol [11] is a prevalent cluster-based communication method in WSN that employs a probabilistic approach for cluster head selection through energy polling. The cluster head is chosen randomly, favoring nodes with higher energy. Upon selection, the head node signals an update to its neighbors, and all nodes within the cluster coordinate data exchanges through this head node [12]. Selecting the node with the most energy as the cluster head can extend the network's lifespan by protecting lower-energy nodes from power depletion. Still, the random nature of head node selection can lead to suboptimal traffic conditions, resulting in delays and increased power consumption within the WSN. Energy-efficient cluster head selection methods, explored through various models and algorithms [13-17], make decisions based on the residual energy of nodes in the network. With such methods, varying traffic flows cause dynamic changes in node power levels, which in turn increase the delay in the network.

Optimization of cluster head selection in WSN has been addressed through threshold-based methods that consider residual energy and cluster count, yet they face challenges with power optimization [18]. Simplified sensing approaches have been proposed to reduce computation overhead during data exchange, but they also encounter power-related constraints [19]. Techniques have been introduced to manage data packet communication with consideration for interference, which, while improving performance, do not fully address power optimization issues [20]. The ILEACH algorithm represents an advancement in this area by establishing a threshold that relates to total network energy consumption; however, it does not fully take into account traffic flow conditions, node placement diversity, or randomness in resource use, all of which can significantly affect WSN communication performance [21].

Intelligent methods for optimizing WSN routing have been presented, leveraging learning approaches to address dynamic conditions in networks. These methods include particle swarm optimization (PSO) for cluster head selection in uniformly distributed nodes [22]; an energy-efficient TDMA-based PSO algorithm in which clusters are formed during the setup phase and data is transmitted during the steady phase [23]; the Wolf optimizer method for cluster head selection and routing [24]; and a fuzzy-based system with a modified K-means approach for optimal cluster head selection [25]. Additionally, a firefly algorithm has been introduced to improve cluster formation in networks with heterogeneous node distribution [26]. Furthermore, a heuristic machine learning method using reinforcement learning has been proposed, utilizing a Q-learning approach to compute a monitoring factor for cluster head selection, which has been shown to improve network throughput and node lifetime and to reduce latency [27].

While the aforementioned intelligent methods enhance the performance of wireless sensor networks (WSN), the critical aspect of data exchange reliability remains unaddressed. While the selection of heads and routes based on residual energy and traffic conditions improves network throughput, the reliability of the head node in data exchanges must be closely examined to meet the increasing demands of WSN data transfer. In the context of healthcare monitoring systems, where vital parameters are highly sensitive, the reliability of data exchange is paramount. Consequently, the operational characteristics of a node play a pivotal role in the selection of cluster heads and routing paths within a WSN. It's important to note that nodes with higher residual energy can still fail during data exchanges, whether due to intentional disruptions or network errors.

Acknowledging these reliability limitations, this paper proposes a low-risk reliable routing (LRRR) method in WSN. This method selects cluster heads based on reliability and risk assessments within the network. Building on the recent advances of the Q-learning approach in WSN head selection [27], the reward factor is refined with a newly proposed data exchange metric, enhancing the reliability of nodes during data transfers and optimizing the performance of node registration and cluster head selection within the network.

To articulate the proposed approach, this paper is structured into five sections. Section 2 introduces the learning-based approach for cluster head selection in WSN. Section 3 delineates the proposed methodology for ensuring reliable head selection and routing in WSN. Section 4 provides an analysis of the proposed approach, and Section 5 offers a concise conclusion of the work presented.

2. Learning Method for Head Selection and Routing in WSN

2.1 Wireless sensor network

In recent years, wireless sensor networks have come into wide use in practical applications. The self-deploying and communication properties of WSN offer the advantage of data exchange at remote locations, which makes WSN well suited to settings where installing infrastructure in advance is difficult. WSN interfaces have evolved for both indoor and outdoor usage [28]. Data captured by the sensor unit of a node is processed for exchange over a wireless medium. Each node integrates a codec and a sensor unit. Encoded data is exchanged over the wireless medium in cooperation with intermediate nodes, which forward packets to the destination. The self-creating nature of WSN permits remote usage with no dependency on pre-installed infrastructure; however, the paths used in data exchange are highly dynamic, which reduces the reliability of the network in critical applications. The limited battery source and randomly varying traffic conditions restrict WSN usage in various real-time applications. Reliable routing is a prime requirement in WSN because the data collected from sensor nodes is sensitive. Communication in WSN is performed using a cluster-based approach, where clusters are formed based on node coverage range. A head node is chosen as a centralized link node for data exchange based on maximum coverage and energy level, and all member nodes communicate via the selected head node. Optimal clustering and head selection are needed in WSN to accommodate varying node characteristics such as data sensitivity and gain factor in the cluster. Efficient cluster heads must be selected to achieve higher reliability of data exchange while governing traffic conditions and energy dissipation. Current WSN therefore require reliable routing under varying link conditions, with low latency and high accuracy.
The existing approaches to cluster formation, head selection, and routing were developed based on communication range and energy constraints. To optimize head selection and routing in WSN, machine learning approaches, including heuristic learning approaches for head selection, have been introduced in the recent past. A reinforcement learning approach for head selection and routing in WSN has recently been presented by Wang et al. [22]. Limiting constraints such as latency, energy, and node density were used in the reinforcement approach to optimize routing in WSN. Figure 1 illustrates the approach for data exchange in a wireless sensor network using cluster-based communication with member and head nodes interfaced. The formation of optimal clusters and head selection are the two critical needs for reliable and efficient communication in WSN.

Figure 1. Communication scenario in WSN

2.2 Challenges

In the current WSN, data is exchanged using cluster-based communication. As WSN are remotely deployed, communication is carried out within small subzones called ‘clusters’. In existing cluster-based communication, clusters are formed using a range constraint (where device power and the protocol used define the maximum coverage range). A node that falls within the range of multiple clusters may join any of them; however, a random placement results in higher interference and power dissipation. All nodes in a cluster communicate using a centralized interface node called the head node. Head nodes are selected based on the maximum power and coverage of a node. As the head node is continuously engaged in data exchange, it is more likely to drain and experiences the most interference, resulting in faster power dissipation. Rapid power dissipation results in faster failure of the network. Head node selection and optimal routing in WSN are therefore major concerns that need to be addressed.

2.3 Reinforcement learning approach for head selection

To overcome the issue of power dissipation and to extend the network life time, an approach to head selection and node placement in a cluster is outlined in the study of Mahmood et al. [27]. In monitoring of data flow through a head node, each head node updates the communication status to a centralized monitoring unit in a periodic manner, as depicted in Figure 2.

The centralized monitoring unit observes the residual energy level of each head node and controls the head selection process based on the residual energy at each node. The most popular head selection algorithm is the LEACH algorithm [11], which selects the head node based on maximum energy level. LEACH attained a 15% improvement in network lifetime in WSN. However, LEACH is observed to perform poorly in head selection due to the following factors:

(1) Random cluster head selection;

(2) Unmonitored cluster formation;

(3) High power dissipation;

(4) Highly volatile under topology variation.

Figure 2. Monitoring of sensor data in a wide area

To overcome these issues with LEACH, a machine learning approach using reinforcement learning was introduced in the study of Mahmood et al. [27]. Reinforcement learning is a significant subfield of machine learning that concentrates on learning actions in a varying environment to achieve an expected outcome. The Q-learning approach [28] is a reinforcement learning method that is based on a Markov decision process and requires no prior knowledge. The Q-learning algorithm determines the ideal route under dynamic network conditions. It operates on an action ‘a’ and computes a Q-metric (Qt) [27], which correlates a reward value (Rt) with an action (at) at iteration t, given as:

$\begin{aligned} & Q_t\left(a_t\right)=(1-\alpha) Q_t\left(a_t\right) +\alpha\left[R_a^t\left(a_t+1\right)+\gamma \max Q_t\left(a_t\right)\right]\end{aligned}$         (1)

where α represents the learning rate, γ the discount factor, and $R_a^t$ the reward value for an action $a_t$.
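The update in Eq. (1) can be illustrated with a minimal sketch. It assumes a single-state setting in which each action is the choice of a candidate next hop, so the Q-table is a plain dictionary; the function name `q_update` and the default values of `alpha` and `gamma` are hypothetical and not taken from the paper.

```python
def q_update(q_table, action, reward, alpha=0.5, gamma=0.9):
    """Apply one update in the form of Eq. (1) to the entry for `action`.

    q_table maps an action (here: a candidate next hop) to its Q-value.
    alpha is the learning rate, gamma the discount factor (assumed values).
    """
    # Best Q-value currently known over all actions (the max term in Eq. 1).
    best_q = max(q_table.values()) if q_table else 0.0
    old_q = q_table.get(action, 0.0)
    # Blend the old estimate with the reward plus the discounted best value.
    q_table[action] = (1 - alpha) * old_q + alpha * (reward + gamma * best_q)
    return q_table[action]


if __name__ == "__main__":
    q = {"head_A": 0.0, "head_B": 2.0}
    print(q_update(q, "head_A", reward=1.0, alpha=0.5, gamma=0.5))
```

With a larger `alpha`, recent rewards dominate the estimate, while a larger `gamma` gives more weight to the best value already learned.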

The reward value ($R_a^t$) is defined by the energy usage for a volume of data exchange ($d t_k^t$) over all paths in the network at time t, defined as:

$R_a^t=\frac{\sum\left(\min E(n)+\sum E(n)\right)}{\sum n\left(P_{t h}\right)}-t$                 (2)

where, E(n) is the residual energy at a node after data exchange, which is defined as:

$E(n)=E_k^t-d t_k^t \times E_{\text {cost }}^t$                    (3)

where, $E_k^t$ is the available energy at time t for kth node, $d t_k^t$ is the volume of data in the buffer, $E_{\text {cost }}^t$ is the energy cost for a unit data, t is the time slot.
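As a rough illustration of Eqs. (2) and (3), the sketch below computes the residual energy of each node on a path and then a path reward. It assumes that the sums in Eq. (2) run over the nodes of one candidate path and that $\sum n(P_{th})$ is the number of nodes on that path; this reading, the function names, and the sample values are assumptions made for illustration only.

```python
def residual_energy(available, buffered_data, unit_cost):
    """Eq. (3): energy left at a node after sending its buffered data."""
    return available - buffered_data * unit_cost


def path_reward(residuals, t):
    """Eq. (2) under the stated assumption: per-path sums, node count as divisor.

    residuals: residual energies E(n) of the nodes on one candidate path.
    t: time slot index.
    """
    if not residuals:
        raise ValueError("a path needs at least one node")
    return (min(residuals) + sum(residuals)) / len(residuals) - t


if __name__ == "__main__":
    # Hypothetical path of three nodes: (available energy, buffered data).
    nodes = [(2.0, 10), (1.5, 10), (2.5, 10)]
    energies = [residual_energy(e, d, unit_cost=0.05) for e, d in nodes]
    print(energies, path_reward(energies, t=1))
```

Including the minimum term penalizes paths that contain a nearly drained node even when the average energy on the path is high.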

The path with the highest reward value, defined by the residual energy, is chosen as the optimal path for data exchange. The path selected for communication performs data exchange via cluster heads. Selection of a cluster head is limited by a threshold value (Thn), which is dynamically computed based on the probability of cluster head selection of a node, defined by:

$T h_n=\frac{Prob(p)}{1-Prob \times\left(r \bmod \frac{1}{Prob}\right)}$               (4)

where, Prob(p) is the probability of cluster heads at rth iteration.
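The threshold of Eq. (4) can be sketched as follows, assuming it follows the familiar LEACH-style form in which r mod (1/Prob) is the position of iteration r within the current rotation period. The function names and the sample probability are hypothetical.

```python
def head_threshold(prob, r):
    """LEACH-style threshold: prob / (1 - prob * (r mod 1/prob)).

    prob: desired fraction of cluster heads, with 0 < prob < 1.
    r: current iteration (round) index.
    """
    if not 0 < prob < 1:
        raise ValueError("prob must lie strictly between 0 and 1")
    period = round(1 / prob)  # rounds per rotation, kept as an integer
    return prob / (1 - prob * (r % period))


def select_heads(rewards, prob, r):
    """Nodes whose reward factor exceeds the threshold are declared heads."""
    limit = head_threshold(prob, r)
    return [node for node, reward in rewards.items() if reward > limit]


if __name__ == "__main__":
    for r in (0, 5, 9, 10):
        print(r, head_threshold(0.1, r))
```

The threshold rises within each rotation period and resets at its end, so the number of nodes that qualify as heads changes from round to round.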

All member nodes in a cluster with a reward factor above the computed threshold are declared heads. Registered member nodes in the cluster exchange data via the selected head nodes. The head node is selected based on residual energy; however, the reliability of data exchange through the selected head node is not monitored. Reliability of data exchange in WSN is a critical need, as the data communicated is sensitive in nature. Loss of information or delay in data exchange results in inaccurate decisions. Hence, to introduce a reliability factor into WSN head selection, routing, and cluster formation, an improved learning approach that updates the existing reward function is proposed. The proposed low-risk reliable routing approach for WSN is applied to vital monitoring in medical applications, as presented in the following section.

3. Low-Risk Reliable Routing (LRRR) for Vital Data Monitoring

For cluster-based communication in WSN, cluster heads are the main interlinks for data exchange. In many cases, a head node fails to forward packets due to heavy traffic or power constraints. To avoid packet loss from such failures, multi-head communication is adopted, which offers a higher guarantee of packet delivery in WSN. A two-head communication scheme based on Q-learning is outlined in the study of Mahmood et al. [27]. In this approach, among the probable head nodes, the node with the maximum reward is selected as the primary cluster head and the next as the secondary head. During data exchange, the secondary head forwards data when the primary head fails to forward packets. The outlined approach dynamically selects head nodes based on node residual energy. The decisions are made using a varying threshold value computed from active network conditions. This dynamic approach selects heads and communication paths more efficiently, improving data delivery, network throughput, and network lifetime, and decreasing latency. The selection of a head node for data exchange is defined by a reward factor measured based on actions in the network. However, the varying conditions of WSN during data exchange influence multiple factors that affect data exchange performance, listed as follows:

(1) Rate of power dissipation;

(2) Traffic flowing through the head nodes;

(3) Interference observed at the head node;

(4) Cluster density.
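The two-head scheme described before this list can be sketched as below: the candidate with the highest reward becomes the primary head, the next one the secondary head, and the secondary is used only when the primary cannot forward. The function names and the `can_forward` callback are hypothetical stand-ins for whatever status check the network provides.

```python
def pick_heads(rewards):
    """Return (primary, secondary) head IDs ranked by reward, highest first."""
    ranked = sorted(rewards, key=rewards.get, reverse=True)
    primary = ranked[0] if ranked else None
    secondary = ranked[1] if len(ranked) > 1 else None
    return primary, secondary


def choose_forwarder(primary, secondary, can_forward):
    """Try the primary head first and fall back to the secondary head.

    can_forward: callable taking a head ID and returning True when that
    head is currently able to forward packets (assumed to exist).
    """
    for head in (primary, secondary):
        if head is not None and can_forward(head):
            return head
    return None  # neither head can forward; the packet is blocked


if __name__ == "__main__":
    heads = pick_heads({"n1": 0.4, "n2": 0.9, "n3": 0.7})
    print(heads, choose_forwarder(*heads, can_forward=lambda h: h != "n2"))
```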

Random variation of these factors constrains communication performance, affecting the accuracy of data exchange in WSN. When selecting head nodes and paths for communication, these factors need to be monitored simultaneously to improve the reliability of data exchange through the selected head nodes. To improve the efficiency of data exchange, a low-risk, reliable approach for head selection and cluster formation is proposed. The approach defines a monitoring parameter for the existing reward metric to improve the reliability of the selected head node. A controlling and decision unit is interfaced with the existing monitoring unit to monitor the reliability factor of each node used in the selection of the head node. Figure 3 illustrates the proposed approach to reliable monitoring in WSN communication.

The decision unit generates a control signal requesting the status of the forwarding parameter from each node. In response, each node returns a sensing signal carrying the updated value of the forwarding parameter measured at that node. The signal flow for the proposed approach in WSN is illustrated in Figure 3. The proposed LRRR approach defines two parameters, φ and ρ, for packet forwarding and packet blockage, respectively. During communication, the head node updates these two monitoring factors and shares them with the decision unit to select the optimal path for member nodes.

Figure 3. WSN interface for reliability measure in WSN

The two factors are updated as:

$\varphi_t=\left(\varphi+\varphi^{\prime}\right)+\delta$                  (5)

$\rho_t=\left(\rho+\rho^{\prime}\right)+(1-\delta)$                       (6)

where δ is the update factor, which is set to 1 on forwarding and 0 on blocking.

The monitoring factor (M) for a path is defined as:

$M=\frac{\varphi_t}{\rho_t}$                (7)

The monitoring factor (M) is applied to the reward function, which defines the updated reward value for an action ‘a’ as:

$R_a^{t\_\text{update}}=\left(\frac{\sum\left(\min E(n)+\sum E(n)\right)}{\sum n\left(P_{t h}\right)}-t\right) \times M$                  (8)

Substituting M gives the updated reward value as:

$R_a^{t\_\text{update}}=\left(\frac{\sum\left(\min E(n)+\sum E(n)\right)}{\sum n\left(P_{t h}\right)}-t\right) \times \frac{\left(\varphi+\varphi^{\prime}\right)+\delta}{\left(\rho+\rho^{\prime}\right)+(1-\delta)}$         (9)
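The bookkeeping behind Eqs. (5) to (9) can be sketched with a small counter class. The sketch treats the bracketed terms of Eqs. (5) and (6) as running counts carried over from earlier reports, and it floors the blockage count at 1 so that the ratio in Eq. (7) stays defined before any packet has been blocked; both choices, and all names used, are assumptions for illustration rather than details given in the paper.

```python
class ForwardingMonitor:
    """Tracks forwarding and blockage counts for one head node."""

    def __init__(self):
        self.forwarded = 0  # running forwarding count (phi)
        self.blocked = 0    # running blockage count (rho)

    def record(self, delivered):
        """Apply one update: delta is 1 on forwarding and 0 on blocking."""
        delta = 1 if delivered else 0
        self.forwarded += delta    # Eq. (5)
        self.blocked += 1 - delta  # Eq. (6)

    def factor(self):
        """Monitoring factor M of Eq. (7); divisor floored at 1 (assumption)."""
        return self.forwarded / max(self.blocked, 1)


def updated_reward(base_reward, monitor):
    """Eq. (8): scale the energy-based reward by the monitoring factor."""
    return base_reward * monitor.factor()


if __name__ == "__main__":
    monitor = ForwardingMonitor()
    for delivered in (True, True, True, False):
        monitor.record(delivered)
    print(monitor.factor(), updated_reward(0.5, monitor))
```

A head that forwards most of its packets sees its energy-based reward amplified, while a head that blocks often is penalized even when its residual energy is high.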

A change in the forwarding condition updates δ and φ, which affects the reward factor and the path selected for communication. With the proposed approach, the path selected for communication is both energy efficient and reliable for data exchange. The reward factor is defined with respect to residual energy and the packet forwarding factor. This provides a higher path existence probability and a longer lifetime for the network. Path selection at a node is made with prior knowledge of the forwarding conditions at the head. The overhead of path re-requests due to packet failure is hence avoided. This yields a large reduction in power dissipation at the head node and improves the overall network performance. Communication among different sub-clusters is performed via the selected head node. Random clustering and node registration into clusters result in improper loading of the network. To minimize random loading of the network, a risk factor is introduced, defined as:

$\operatorname{Risk}(N r)=\left(\operatorname{Prob}\left(\sum E(n)+E(n r)\right)-(N c) \times D\right)$            (10)

where E(n)+E(nr) defines the aggregated energy from the current energy level E(n) in the network and the energy E(nr) of a registering node. (Nc) × D indicates the volume of data increase due to Nr nodes in the network, and the total node count (Nc) is defined by:

$N c=(N+N r)$                 (11)

The proposed risk factor is measured by comparing total energy gain in a cluster with the volume of data overhead due to additional Nr nodes in the network. The computed risk factor is compared to a limiting value Lc, and a node with risk value below limiting value is registered into the cluster. Nodes with a risk factor above limiting value Lc are discarded for registering into the current cluster and processed for other cluster selection.

The limiting value Lc for a node registration in a cluster is defined by:

$L_C=\frac{Prob(p)}{1-Prob \times\left(r \bmod \frac{1}{Prob}\right)} \times \frac{E_{n r} \times N_C}{E_{i n i} \times N_C}$                    (12)

where, Enr and Eini are the overall network energy and initial energy respectively.
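The registration rule can be sketched as follows. The risk value of Eq. (10) is taken as an input here, and the limit of Eq. (12) is computed from a head-selection threshold supplied by the caller. Trying the candidate clusters in the order given and registering with the first acceptable one is an assumption, as are all names and sample values.

```python
def registration_limit(threshold, network_energy, initial_energy, n_total):
    """Eq. (12): threshold scaled by the ratio of current to initial energy.

    threshold: value of the head-selection threshold for this round.
    n_total: total node count Nc = N + Nr, as in Eq. (11).
    """
    return threshold * (network_energy * n_total) / (initial_energy * n_total)


def choose_cluster(candidates):
    """Return the first cluster whose risk is below its limit, else None.

    candidates: list of (cluster_id, risk, limit) tuples, tried in order.
    """
    for cluster_id, risk, limit in candidates:
        if risk < limit:
            return cluster_id  # node is registered into this cluster
    return None  # discarded everywhere; no cluster accepts the node


if __name__ == "__main__":
    limit = registration_limit(0.2, network_energy=80.0,
                               initial_energy=100.0, n_total=10)
    print(limit, choose_cluster([("c1", 0.30, limit), ("c2", 0.10, limit)]))
```

Because the limit shrinks as the network energy is used up, fewer registrations are accepted late in the network's life, which keeps new nodes from overloading depleted clusters.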

Optimal clustering and reliable head selection offer high-performance operation in WSN, which increases its application scope. To validate the proposed work, the outlined method is applied to the vital monitoring of patients in a large distributed network. Reliable routing increases network performance in terms of network throughput and minimal delay for data exchange in the network. A representation of vital monitoring in WSN using the proposed approach is illustrated in Figure 4.

Figure 4. Vital parameter monitoring using adaptive routing for WSN in medical data interface

The proposed approach interfaces multiple sensors for sensing patient vitals and exchanges the processed data via selected head node. Interfacing of vital parameter is performed using patient records, which are buffered as text file and encoded for transmission over wireless channel. The flow diagram of the proposed approach for vital monitoring is shown in Figure 5.

The proposed work monitors and communicates vital data such as temperature, heart rate, and oxygen level through the interfacing sensor nodes. Reliable head selection and the registration of nodes with a lower risk factor contribute to faster and more accurate data exchange, which is much needed in medical applications. The flowchart for the proposed approach is shown in Figure 6.

Figure 5. Flow diagram of vital monitoring using WSN interface

Figure 6. Flow chart for developed LRRR method

Communication of the patient’s vital parameters via a path selected using the updated reward factor provides a reliable path with higher residual energy for data exchange. The dual monitoring factors improve network throughput and minimize loss of information in vital data exchange over WSN. The proposed method finds application in many real-time settings such as wide-area hospitals, military aid in remote locations and battlefields, and remote telemedicine. Whereas the existing method optimizes head selection and data exchange using a reward factor defined by residual energy, the proposed method adds the forwarding characteristic of a node to account for node reliability. A node with higher reliability exhibits minimal blockage and less power dissipation compared to the existing approach. Higher reliability leads to faster data flow, increasing network throughput and lifetime compared to existing methods. Alongside these advantages, there are limitations that can be addressed in the future. The outlined method integrates the metrics of residual energy and node characteristics under the assumption of ideal channel conditions. To improve the method under varying channel conditions, it can be extended in future work with interference monitoring as an additional metric for further reliability improvement. Observations of the proposed work for data packet exchange in WSN are presented in the following section.

4. Result Observations

The proposed approach is evaluated on a randomly distributed network with the network parameters listed in Table 1. Nodes are placed at random locations in the network area, and node powers are randomly allocated to obtain a nonlinear distribution of power levels in the network. The network is simulated for a 200 × 200 m² network area, with the communication range of each node set to 45 m. The simulation is performed for a varying number of nodes in the network and a varying payload size for data exchange. Network throughput, network lifetime, delay, and the number of alive nodes are observed as these parameters vary.

Table 1. Network Parameters for simulation

| Network Parameter | Value |
|---|---|
| Node Layout | Random |
| Route Discovery | |
| MAC Interface | IEEE 802.11 |
| Communication Range | 45 m |
| Network Area | 200 × 200 m² |
| Node Counts | |

Figure 7 shows the simulated network of randomly distributed nodes in a 200 × 200 m² network area. Each node in the network is allotted a distinct node ID, bandwidth, and power level. Nodes within a distance of 45 m of each other are declared direct link nodes. All link nodes exchange their registered link nodes to form possible paths for communication, as shown in Figure 8. The proposed risk factor is computed to form a sub-cluster. Based on the power level and reward factor, a shared head node is selected and all linked nodes are registered to the head node for data exchange, selecting a suitable path as shown in Figure 9.
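The deployment and link discovery step can be sketched as below, assuming uniformly random node placement in the 200 × 200 m area and a plain Euclidean distance test against the 45 m range. The function names and the fixed seed are hypothetical; the paper's own simulation setup may differ.

```python
import math
import random


def deploy(n_nodes, side=200.0, seed=0):
    """Place n_nodes uniformly at random in a side x side area (metres)."""
    rng = random.Random(seed)
    return {i: (rng.uniform(0, side), rng.uniform(0, side))
            for i in range(n_nodes)}


def direct_links(positions, radius=45.0):
    """Return the set of node-ID pairs whose distance is within radius."""
    links = set()
    ids = sorted(positions)
    for idx, a in enumerate(ids):
        for b in ids[idx + 1:]:
            if math.dist(positions[a], positions[b]) <= radius:
                links.add((a, b))
    return links


if __name__ == "__main__":
    nodes = deploy(100)
    links = direct_links(nodes)
    print(len(nodes), "nodes,", len(links), "direct links")
```

The pairwise check is quadratic in the node count, which is acceptable at the scale of about a hundred nodes considered here.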

To observe the accuracy of vital parameter exchange over the simulated network, five vital parameters, namely temperature, diastolic pressure, systolic pressure, pulse rate, and SpO2, are observed. The vital parameters are generated using the MATLAB interface as randomly varying signals within the range of the vital limits. The network is interfaced with the vital parameters using external text files. Variations in patient vitals are observed for a period of 30 seconds, and the resulting signals are shown in Figures 10-14.
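A randomly varying vital signal of this kind can be sketched as a bounded random walk with one sample per second for 30 seconds. The limits in `VITAL_LIMITS`, the step size, and all names are hypothetical placeholders chosen for illustration; they are not the limits used in the paper, and Python is used here in place of the MATLAB interface mentioned in the text.

```python
import random

# Hypothetical limits for illustration only; not taken from the paper.
VITAL_LIMITS = {
    "temperature_c": (36.0, 38.5),
    "diastolic_mmhg": (60.0, 90.0),
    "systolic_mmhg": (100.0, 140.0),
    "pulse_bpm": (60.0, 100.0),
    "spo2_pct": (92.0, 100.0),
}


def synth_vital(low, high, seconds=30, step=0.05, seed=0):
    """Bounded random walk: one sample per second, clipped to [low, high]."""
    rng = random.Random(seed)
    span = high - low
    value = (low + high) / 2  # start in the middle of the range
    samples = []
    for _ in range(seconds):
        value += rng.uniform(-step, step) * span
        value = min(max(value, low), high)  # keep within the vital limits
        samples.append(value)
    return samples


if __name__ == "__main__":
    for name, (low, high) in VITAL_LIMITS.items():
        print(name, [round(v, 1) for v in synth_vital(low, high)[:5]])
```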

Figure 7. Layout network for communication

Figure 8. Possible within range network links (R=45m)

Figure 9. Path selected for communication

Figure 10. Signal for temperature variation

Figure 11. Signal for diastolic pressure variation

Figure 12. Signal for systolic pressure variation

Figure 13. Signal for pulse rate variation

Figure 14. Signal for SpO2 variation

Sensed data is exchanged over the network to the destination through an interfacing head node and a gateway node. The volume of packets exchanged and the time taken are measured to compute the network parameters. Network performance measured with varying node counts and packet size is shown in Figures 15-19. Network throughput for varying node density is shown in Figure 15. The network throughput of the simulated network is defined by the volume of data exchanged over an observation period. The proposed LRRR approach attains a network throughput of 14800 kbps at a node count of 100, whereas throughputs of 11200 kbps, 8200 kbps, and 5200 kbps are observed for the LEACH-EFT, TL-LEACH, and LEACH methods, respectively. A reliable head node offers a high probability of data exchange with minimal data blockage, resulting in a higher number of data packets exchanged over the network in an observation period. With an increase in node density, the proposed risk monitoring forms clusters more optimally, resulting in less packet blockage in the network. The faster data flow in the network increases the throughput of the LRRR approach, as observed in Figure 15.
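The throughput definition and the relative gains implied by the reported figures can be checked with a short sketch. The throughput values are those quoted in the text at a node count of 100; the function names and the sample traffic volume are hypothetical.

```python
def throughput_kbps(bits_received, seconds):
    """Throughput as data volume over the observation period, in kbps."""
    return bits_received / seconds / 1000.0


def relative_gain(new, old):
    """Percentage improvement of `new` over `old`."""
    return (new - old) / old * 100.0


if __name__ == "__main__":
    # Hypothetical volume: 148 Mbit delivered in 10 s gives 14800 kbps.
    print(throughput_kbps(148_000_000, 10))

    lrrr = 14800  # kbps, as reported in the text
    reported = {"LEACH-EFT": 11200, "TL-LEACH": 8200, "LEACH": 5200}
    for method, value in reported.items():
        print(method, round(relative_gain(lrrr, value), 1), "%")
```

Against LEACH-EFT, the closest competitor, the gain works out to roughly 32%.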

Figure 15. Network throughput with varying node counts

Figure 16 presents the number of alive nodes with varying node counts in the network. As energy is dissipated with every packet exchange, the nodes with lower energy levels drain faster and are eliminated from the network. The alive node count is the total number of nodes remaining in the network at a given observation time. At a node count of 100, the alive node count is observed to be 89 for the proposed LRRR approach, whereas it is observed to be 78, 65, and 57 for the existing LEACH-EFT, TL-LEACH, and LEACH methods, respectively. The proposed LRRR approach defines routes using risk and reliability measures, which reduce the packet drop probability and hence minimize the retransmission rate. The decrease in retransmission rate preserves energy at each node, resulting in more alive nodes in the network.

Figure 16. Alive node counts for varying node count

Figure 17 presents the observed network lifetime for varying node counts in the network. The network lifetime for the proposed LRRR approach is 100 msec, whereas for the LEACH, TL-LEACH, and LEACH-EFT methods it is 33, 78, and 87 msec, respectively. Monitoring risk and reliability factors during cluster formation and head selection yields a path with less blockage. Less blockage saves power, which improves the lifetime of the network.

Figure 17. Network lifetime with varying node count

The delay metric is measured as the End-to-End communication time observed for data exchange in the network. At a node count of 100, the delay for the developed LRRR method is lower by 4.9 sec, 11.9 sec, and 17.7 sec than that of the existing LEACH-EFT, TL-LEACH, and LEACH methods, respectively. The selected head node offers faster data exchange because it is chosen based on the forwarding factor. This monitoring lowers packet blockage and decreases End-to-End delays in the network. The delay of the proposed LRRR method is compared with that of the existing LEACH, TL-LEACH, and LEACH-EFT methods in Figure 18. Table 2 lists the performance values at a node count of 100.

Figure 18. E2E Delay for varying node counts

Table 2. Varying node count performance at 100th node







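| Methods/Parameters | LEACH | TL-LEACH | LEACH-EFT | LRRR |
|---|---|---|---|---|
| Alive nodes | 57 | 65 | 78 | 89 |
| Network lifetime (msec) | 33 | 78 | 87 | 100 |
| E2E delay (sec) | | | | |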





The packet delivery ratio (PDR), defined as the ratio of the volume of data received at the destination to the volume of data transmitted from the source, is presented in Figure 19 for varying node counts in the network. As the number of nodes in the network rises, the PDR is observed to increase. Additional nodes increase the routing probability and node availability, which results in higher data exchange rates. However, the conventional LEACH, TL-LEACH, and LEACH-EFT methods exchange data over a less reliable path, as the reliability of the path is not measured before selection. The proposed method selects the path with the highest reliability factor using the data exchange characteristics of the network. This minimizes the blockage caused by retransmission of packets. The proposed method routes data packets over the path with the higher forwarding probability, hence improving the packet delivery ratio in the network. The simulated graph (Figure 19) shows that, at a node count of 100, the PDR values for LEACH, TL-LEACH, and LEACH-EFT are 87%, 87.8%, and 90.5%, respectively. The proposed method increases the PDR by 5.4, 4.6, and 1.9 percentage points compared to the existing LEACH, TL-LEACH, and LEACH-EFT approaches, respectively.
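The PDR definition and the reported figures above can be cross-checked with a small sketch. It assumes the reported gains are additive differences in percent; the function name `pdr_percent` is illustrative, and the LRRR value is implied by the arithmetic rather than stated in the text.

```python
def pdr_percent(received: int, sent: int) -> float:
    """Packet delivery ratio: data received at destination over data sent, in percent."""
    if sent <= 0:
        raise ValueError("sent must be positive")
    return 100.0 * received / sent


# PDR of the baseline methods at a node count of 100, as reported in the text (%).
baseline = {"LEACH": 87.0, "TL-LEACH": 87.8, "LEACH-EFT": 90.5}
# Reported PDR gain of the proposed method over each baseline.
gain = {"LEACH": 5.4, "TL-LEACH": 4.6, "LEACH-EFT": 1.9}

# PDR of the proposed method implied by each (baseline, gain) pair.
implied = {method: round(baseline[method] + gain[method], 1) for method in baseline}
print(implied)
```

All three pairs imply the same PDR of 92.4% for the proposed method, so the reported baselines and gains are mutually consistent.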

Figure 19. PDR with varying number of nodes

The performance of the WSN for varying packet counts is evaluated as shown in Figures 20-22. The node count is fixed at 100, and the number of packets exchanged is varied. Different packet sizes in communication cause variations in the monitoring parameters, which affect the accuracy of cluster formation and optimal head selection. The number of alive nodes, the network lifetime, and the delay are observed for a varying number of exchanged packets in the network.

Figure 20. Alive nodes for varying exchange packet counts

Figure 21. Network lifetime for varying packet count

Figure 22. E2E Delay for varying packet count

The volume of packet exchange affects traffic flow and resource allocation. Existing approaches focus on energy-based head selection and data exchange and do not observe the reliability of packet forwarding. Head nodes with higher energy levels can still be blocked by other factors in the network, such as interference and traffic congestion. The proposed head selection is governed by both power and blockage level, and so selects a path with higher reliability. Faster data exchange minimizes the blockage probability, which reduces power dissipation at each node in the network. Lower power dissipation increases the alive node count and the network lifetime. Comparisons of alive node count and network lifetime are shown in Figures 20 and 21, respectively.
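A head selection that weighs both power and blockage, as described above, might look like the following sketch. The scoring rule, the equal weights, and the `Node` fields are assumptions made for illustration; the paper's actual selection metric is defined in its earlier sections and may differ.

```python
from dataclasses import dataclass
from typing import Sequence


@dataclass
class Node:
    node_id: int
    residual_energy: float  # remaining energy, in joules
    forwarded: int          # packets successfully forwarded
    received: int           # packets received for forwarding


def forwarding_factor(node: Node) -> float:
    """Fraction of received packets that the node forwarded (0 if none received)."""
    return node.forwarded / node.received if node.received else 0.0


def select_head(nodes: Sequence[Node], w_energy: float = 0.5, w_forward: float = 0.5) -> Node:
    """Pick the node with the best weighted mix of energy and forwarding factor."""
    max_energy = max(n.residual_energy for n in nodes)

    def score(n: Node) -> float:
        energy = n.residual_energy / max_energy if max_energy > 0 else 0.0
        return w_energy * energy + w_forward * forwarding_factor(n)

    return max(nodes, key=score)


# Hypothetical candidates: node 1 has more energy, node 2 forwards more reliably.
candidates = [Node(1, 1.0, 40, 100), Node(2, 0.8, 95, 100)]
print(select_head(candidates).node_id)                                # 2
print(select_head(candidates, w_energy=1.0, w_forward=0.0).node_id)   # 1
```

With energy as the only criterion the congested node 1 is chosen; once the forwarding factor is weighted in, the more reliable node 2 becomes the head.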

The E2E delay parameter is measured as the time taken from encoding the sensed data to delivering it to the destination. With the LRRR method, the E2E delay is lower than with the LEACH, TL-LEACH, and LEACH-EFT methods (Figure 22).

For varying packet exchange counts, the simulated network with LRRR shows, at the 1000th packet exchange, an increase in alive node count of 34, an increase in network lifetime of 71.3 msec, and a reduction in delay of 8 sec compared to LEACH. Compared to TL-LEACH, the alive node count increases by 24, the network lifetime increases by 33.2 msec, and the delay is reduced by 4.6 sec. Compared to LEACH-EFT, the alive node count increases by 11, the network lifetime increases by 21.1 msec, and the delay is reduced by 0.4 sec. The performance metrics are listed in Table 3.

Table 3. Varying packet count performance at 1000th level

Methods/Parameters: Alive nodes, Network lifetime, E2E delay





These observations illustrate the significance of integrating node forwarding characteristics into existing optimal data communication. Updating the reward factor of Q-Learning with the proposed node characteristic increases node reliability in terms of data exchange. The methodology increases reliability, and network performance indicators such as network throughput, node lifetime, and packet delivery also increase. The reliability factor increases the accuracy of route selection for data exchange, which improves the network parameters. Existing Q-Learning approaches aim only to conserve energy, yet they show higher energy dissipation because of packet blockage, dropping, and retransmission. The reliability factor increases the probability of data exchange in the network, which in turn reduces these limiting factors. Given the observed performance and the increase in reliability, the outlined method is more suitable for real-time applications in remote and critical settings.
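The idea of a Q-Learning reward that accounts for forwarding reliability in addition to energy can be sketched as follows. The update rule is the standard tabular Q-Learning update; the `reward` function, its equal weighting, and the state and action names are assumptions for illustration and not the paper's exact reward metric.

```python
from typing import Dict

QTable = Dict[str, Dict[str, float]]


def reward(residual_energy: float, forwarding_factor: float, w: float = 0.5) -> float:
    """Hypothetical reward mixing normalized energy and forwarding reliability."""
    return w * residual_energy + (1.0 - w) * forwarding_factor


def q_update(
    q: QTable,
    state: str,
    action: str,
    r: float,
    next_state: str,
    alpha: float = 0.1,   # learning rate
    gamma: float = 0.9,   # discount factor
) -> float:
    """Standard update: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    best_next = max(q.get(next_state, {}).values(), default=0.0)
    old = q.setdefault(state, {}).get(action, 0.0)
    q[state][action] = old + alpha * (r + gamma * best_next - old)
    return q[state][action]


# One update for a node "n1" choosing "n2" as next hop (hypothetical values).
q_table: QTable = {}
r = reward(residual_energy=0.8, forwarding_factor=0.9)
updated = q_update(q_table, "n1", "n2", r, next_state="n2")
print(round(updated, 3))
```

A next hop with high energy but a low forwarding factor receives a smaller reward under this scheme, so its Q-value grows more slowly than that of an equally charged but more reliable neighbour.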

5. Conclusions

This work proposes a method for cluster head selection and optimal cluster formation in a wireless sensor network that takes reliability factors into consideration. Cluster formation and head selection are based on a modified reinforcement learning method to improve energy efficiency and network performance. A probabilistic prediction model that uses a learning approach to define a dynamic threshold for head selection is proposed. The proposed reliability factor, defined by the packet forwarding characteristic, improves the data exchange rate at the head node, which is a critical need for vital-sign monitoring in medical applications. The simulated network monitors vital parameters at randomly deployed sensing nodes and exchanges data using the LRRR approach. Network throughput, lifetime, and alive node count are observed to improve with the proposed LRRR approach, while the end-to-end delay in the network is minimized. The packet delivery ratio improves for the proposed method, which selects a reliable path, as the node count and the number of packets exchanged in the network increase. By considering reliability, the outlined method improves network throughput and node life, which are critical for real-time applications. Applications such as health care monitoring demand a higher data exchange rate with minimal loss. The outlined method achieves reliable data exchange, which extends network life and hence makes it suitable for low-resource applications. The presented method can be further extended to improve reliability under varying channel conditions and different service interfaces. Channel interference has a considerable impact on the reliability measure and could be added as a parameter in future work.







Nomenclature

reward value
residual energy
monitoring factor
registering node
total node
volume of data
limiting value

Greek symbols

learning rate
discount factor
forwarding factor
updation factor
blockage factor

number of nodes
time (sec)
overall network nodes
initial value


[1] Wang, M., Wang, X., Yang, L.T., Deng, X., Yi, L. (2020). Multi-sensor fusion based intelligent sensor relocation for health and safety monitoring in BSNs. Information Fusion, 54: 61-71.

[2] Sriram, R., Geetha, S., Madhusudanan, J., Iyappan, P., Venkatesan, V.P., Ganesan, M. (2015). A study on context-aware computing framework in pervasive healthcare. In Proceedings of the 2015 International Conference on Advanced Research in Computer Science Engineering & Technology (ICARCSET 2015), New York, USA, pp. 1-5.

[3] Tarannum, S., Farheen, S. (2020). Wireless sensor networks for healthcare monitoring: A review. Inventive Computation Technologies 4: 669-676.

[4] Ahmed, H.M., Rashid, A.N. (2022). Wireless sensor network technology and adoption in healthcare: A review. In AIP Conference Proceedings, 2400(1): 020021.

[5] Krishnaraj, N., Kumar, R.B., Rajeshwar, D., Kumar, T.S. (2020). Implementation of energy aware modified distance vector routing protocol for energy efficiency in wireless sensor networks. In 2020 International Conference on Inventive Computation Technologies (ICICT), Coimbatore, India, pp. 201-204.

[6] Kim, J., Lee, D., Hwang, J., Hong, S., Shin, D., Shin, D. (2021). Wireless sensor network (WSN) configuration method to increase node energy efficiency through clustering and location information. Symmetry, 13(3): 390.

[7] Shirazi, M., Vosoughi, A. (2020). On distributed estimation in hierarchical power constrained wireless sensor networks. IEEE Transactions on Signal and Information Processing over Networks, 6: 442-459.

[8] Sumathi, J., Velusamy, R.L. (2021). A review on distributed cluster based routing approaches in mobile wireless sensor networks. Journal of Ambient Intelligence and Humanized Computing, 12: 835-849.

[9] Han, B., Ran, F., Li, J., Yan, L., Shen, H., Li, A. (2022). A novel adaptive cluster based routing protocol for energy-harvesting wireless sensor networks. Sensors, 22(4): 1564.

[10] Shagari, N.M., Idris, M.Y.I., Salleh, R.B., Ahmedy, I., Murtaza, G., Shehadeh, H.A. (2020). Heterogeneous energy and traffic aware sleep-awake cluster-based routing protocol for wireless sensor network. IEEE Access, 8: 12232-12252.

[11] Mishra, P., Alaria, S.K., Dangi, P. (2021). Design and comparison of LEACH and improved centralized LEACH in wireless sensor network. International Journal on Recent and Innovation Trends in Computing and Communication, 9(5): 34-39.

[12] Kardi, A., Zagrouba, R. (2020). Rach: A new radial cluster head selection algorithm for wireless sensor networks. Wireless Personal Communications, 113: 2127-2140.

[13] Ali, H., Tariq, U.U., Hussain, M., Lu, L., Panneerselvam, J., Zhai, X. (2020). ARSH-FATI: A novel metaheuristic for cluster head selection in wireless sensor networks. IEEE Systems Journal, 15(2): 2386-2397.

[14] Shankar, A., Jaisankar, N., Khan, M.S., Patan, R., Balamurugan, B. (2019). Hybrid model for security-aware cluster head selection in wireless sensor networks. IET Wireless Sensor Systems, 9(2): 68-76.

[15] Dattatraya, K.N., Rao, K.R. (2022). Hybrid based cluster head selection for maximizing network lifetime and energy efficiency in WSN. Journal of King Saud University-Computer and Information Sciences, 34(3): 716-726.

[16] Goyal, A., Mudgal, S., Kumar, S. (2021). A review on energy-efficient mechanisms for cluster-head selection in WSNs for IoT application. In IOP Conference Series: Materials Science and Engineering, 1099(1): 012010.

[17] Rawat, A., Kalla, M. (2023). An energy efficient technique for improved network lifetime in wireless sensor network (WSN) through energy, distance, and density-based clustering. Instrumentation Mesure Métrologie, 22(2): 65-72.

[18] Panda, S., Behera, T.M., Samal, U.C., Mohapatra, S.K. (2020). Modified threshold for cluster head selection in WSN using first and second order statistics. IET Wireless Sensor Systems, 10(6): 292-298.

[19] Temene, N., Sergiou, C., Georgiou, C., Vassiliou, V. (2022). A survey on mobility in wireless sensor networks. Ad Hoc Networks, 125: 102726.

[20] Acevedo, P.D., Jabba, D., Sanmartín, P., Valle, S., Nino-Ruiz, E.D. (2021). WRF-RPL: Weighted random forward RPL for high traffic and energy demanding scenarios. IEEE Access, 9: 60163-60174.

[21] El-Sayed, H.H., Zanaty, E.A., Bakeet, S.S., Abd-Elgaber, E.M. (2021). Performance evaluation of leach protocols in wireless sensor networks. International Journal of Advanced Networking and Applications, 13(2): 4884-4890.

[22] Wang, J., Gao, Y., Liu, W., Sangaiah, A.K., Kim, H.J. (2019). An improved routing schema with special clustering using PSO algorithm for heterogeneous wireless sensor network. Sensors, 19(3): 671.

[23] Uppalapati, S. (2020). Energy-efficient heterogeneous optimization routing protocol for wireless sensor network. Instrumentation Mesure Métrologie, 19(5): 391-397.

[24] Zhao, X.Q., Ren, S., Quan, H., Gao, Q. (2020). Routing protocol for heterogeneous wireless sensor networks based on a modified grey wolf optimizer. Sensors, 20(3): 820.

[25] Arya, G., Bagwari, A., Chauhan, D.S. (2022). Simulation of extended clustering K-means (ECK) technique for multi-tier hierarchical WSN. In 2022 Global conference on wireless and optical technologies (GCWOT), Malaga, Spain, pp. 1-5.

[26] Pakdel, H., Fotohi, R. (2021). A firefly algorithm for power management in wireless sensor networks (WSNs). The Journal of Supercomputing, 77(9): 9411-9432.

[27] Mahmood, T., Li, J., Pei, Y., Akhtar, F., Butt, S.A., Ditta, A., Qureshi, S. (2022). An intelligent fault detection approach based on reinforcement learning system in wireless sensor network. The Journal of Supercomputing, 78(3): 3646-3675.

[28] Zhao, T., Xu, X.B., Wang, S.G. (2020). Centralized Q-learning based routing in EH-WSNs with dual alternative batteries. In Journal of Physics: Conference Series, 1544(1): 012083.