Detection of Forest Fire Using Modified LSTM Based Feature Extraction with Waterwheel Plant Optimisation Algorithm Based VAE-GAN Model

Forests are a crucial natural resource that directly affects the ecology. Forest fires have recently become a significant problem as a result of both natural and man-made climatic changes. To help prevent major catastrophes, a smart city application that uses a forest fire detection technology based on artificial intelligence is presented. A major danger to the environment, animals


INTRODUCTION
Forests are crucial to maintaining our way of life because they provide a wealth of valuable resources, such as minerals and materials required for several industrial operations [1]. Beyond these obvious benefits, trees considerably improve the environment by purifying the air naturally, absorbing carbon dioxide, and releasing the oxygen that sustains life. Additionally, trees provide crucial habitat for a variety of species and act as a barrier against sandstorms, safeguarding crops and maintaining ecological balance. However, the widespread effects of climate change [2,3] are largely to blame for the increasing frequency of forest fires in recent years. High temperatures and dry conditions encourage the spread of flames, causing significant damage to ecosystems, animal habitats, and natural reserves, and posing a clear danger to people's lives. Notably, coniferous forests, which are distinguished by their needle- or cone-shaped foliage, are especially vulnerable to fires because the sap found in their branches is flammable [4]. The dense growth patterns of coniferous trees also contribute to the rapid spread of flames. As a result of this worrying trend, millions of acres of forest are destroyed annually, with catastrophic economic consequences.
Many nations have been affected by such fires. In particular, the horrific Australian bushfires of 2020 serve as a sobering example of the intensity of these events, resulting in the irreparable loss of forest resources, innumerable animal deaths, and human casualties. These fires destroyed 1,500 dwellings, approximately 500,000 animals, about 14 million acres of forest, and nearly a third of all living things [5]. Similarly, in 2018 and 2019, devastating wildfires of comparable size burned large areas of the Amazon rainforest and California's forests, causing enormous losses [6]. Between 1992 and 2015, human activity was responsible for 85% of forest fires in the United States, with natural causes such as lightning strikes and the effects of climate change accounting for the remaining 15%. More stringent regulations and responsible practices might have prevented many of the human-caused fires [7]. Notably, during the worldwide COVID-19 pandemic, the frequency of forest fires decreased as several nations instituted lockdown measures that restricted human activity, thereby decreasing the risk of human-induced fires [8].
Deep learning (DL) networks have proven a very successful method for addressing the crucial problem of forest fire detection. DL has shown its aptitude in a number of fields, including automatic machine translation [9] and image and video classification. This is made possible by DL's capacity to automatically extract and classify features from data within the same network [10]. DL is especially well suited to detecting forest fires owing to large datasets and improved processing capacity. For both ground-based and aerial images, DL-based systems have shown superior performance over conventional machine learning techniques in handling the complexity of forest fire classification and detection [11,12]. The rise of automated DL fire detection systems holds enormous potential for the development of AI fire models, which are a useful tool for addressing this pressing environmental issue since they can quickly and precisely identify and track flames inside the camera's field of vision [13].
In subsequent work, we intend to look into the following possible improvements:
Attention mechanisms: By including attention mechanisms in the LSTM architecture, the model may be better able to identify significant patterns in the data by focusing on pertinent elements at certain time steps.
Ensemble approaches: Integrating several LSTM models or other recurrent neural network (RNN) types through ensemble approaches can improve the model's predictive performance and robustness.
Hybrid architectures: Investigating hybrid architectures that integrate transformer models or convolutional neural networks (CNNs) with LSTM may offer supplementary benefits in sequence modelling and feature extraction.
This paper's primary contributions are:
-The dataset for this investigation is first pre-processed. This research then improves the quality of information related to forest fires by using cutting-edge DL methods and extracts useful temporal characteristics using LSTM networks.
-The performance and convergence of the LSTM model are improved by applying the modified firefly algorithm (MFFA) for weight optimisation.
-The VAEGAN model is used as the classification tool. Using the waterwheel plant optimisation algorithm (WWPA) for hyperparameter tuning promises to greatly improve the ACC and efficacy of systems for detecting forest fires. Results are analysed using five performance metrics.
The rest of the paper is structured as follows: the literature review is presented in Section 2, followed by a brief explanation of the proposed model in Section 3, the results and validation analysis in Section 4, and finally a conclusion and summary in Section 5.

RELATED WORK
Abdusalomov et al. [14] developed an improved strategy for detecting forest fires. Their approach is built on the Detectron2 platform, an enhanced version constructed from the ground up using DL techniques to replace the original Detectron library. To train their model, they carefully chose and annotated a custom dataset; this critical step ultimately resulted in a model with higher ACC than rival methods. A dataset of 5200 photographs served as the testing ground as the researchers modified the Detectron2 model under various conditions. Notably, their model was able to recognise even small fires at large distances, day or night. The use of the Detectron2 algorithm, which enables long-range detection, is a key advantage of their method. Their experimental findings supported the ACC of the method: they identified forest fires with an ACC of 99.3%, demonstrating the reliability of the proposed technique.
The GXLD technique, developed by Huang et al. [15], combines a defogging algorithm with a lightweight YOLOX-L model to identify forest fires. GXLD uses the dark channel prior approach to remove fog from photos, producing sharper, fog-free images. In addition, the authors improved the YOLOX-L model by adding elements from SENet, GhostNet, and depthwise separable convolution, resulting in YOLOX-L-Light. This optimised model is then applied to the defogged photos to identify forest fires. To evaluate YOLOX-L-Light and GXLD, the researchers used the mean average precision (mAP) metric to rate detection ACC and the number of network parameters to determine how lightweight the model is. They ran trials on their dataset of forest fires, and the results showed a considerable improvement. YOLOX-L-Light increased the mAP by 1.96% while reducing the model's parameters by 92.6%. Notably, GXLD outperformed YOLOX-L by 2.46%, with an mAP of 87.47%. Furthermore, GXLD provided an average frame rate of 26.33 frames per second with an input picture size of 1280 × 720. GXLD thus displayed real-time forest fire detection with high ACC, strong target confidence, and sustained target integrity even under difficult foggy conditions.
Chen et al. [16] introduced YOLOv5s-CCAB, an improved variant of the YOLOv5s architecture for multi-scale forest fire detection. The model incorporates a number of revisions. First, they introduced Coordinate Attention (CA) to YOLOv5s to direct the network's focus to features associated with forest fires. Second, they developed a CoT3 module to improve the identification of forest fires, reduce parameter complexity, and capture global dependencies in photographs of forest fires. Third, the Complete-Intersection-Over-Union (CIoU) loss function was enhanced to raise the network's PR when detecting potential forest fire targets. Finally, a Bi-directional Feature Pyramid Network (BiFPN) was constructed within the model's neck to increase its ability to correctly fuse the extracted forest fire features. According to the test results on their specially developed multi-scale forest fire dataset, YOLOv5s-CCAB achieved significant improvements. It retains a high rate of 36.6 Frames Per Second (FPS) and reaches 87.7% on the AP@0.5 metric, a 6.2% increase. These results demonstrate that the model is both fast and accurate. YOLOv5s-CCAB therefore provides a useful point of reference for applications requiring precise, real-time multi-scale forest fire detection.
Zhang et al. [17] created the multi-scale feature extraction model (MS-FRCNN) for the detection of small target forest fires. This model enhances the conventional Faster RCNN detection technique. ResNet50 was employed as the backbone network instead of VGG-16 to lessen the possibility of vanishing or exploding gradients during feature extraction. To benefit from multi-scale feature extraction, they also integrated a Feature Pyramid Network (FPN), which increased the MS-FRCNN's ability to capture comprehensive feature data. They further included a new attention module called PAM to help the Region Proposal Network (RPN) focus more on the semantic and spatial details of small target forest fires and reduce the distraction from complex backgrounds. The model also substituted the soft-NMS algorithm for the traditional NMS technique in order to reduce errors in the detected boxes. They conducted trials using their carefully curated multi-scale forest fire dataset, and the findings revealed a substantial 5.7% increase in detection ACC over baseline models. This shows how the multi-scale feature extraction approach benefits the detection of small target forest fires.
Rahman et al. [18] proposed a technique for recognising forest fires using a Convolutional Neural Network (CNN) architecture and a newly created fire detection dataset from another study. Their approach used separable convolution layers for rapid fire detection, making it suitable for real-time applications. After training on their dataset, the method achieved 97.63% ACC in identifying forest fires in photos, along with a 98.00% F1 score and an 80% Kappa value. These results show the method's potential as a tool for identifying fire outbreaks early, enabling authorities to act promptly and put preventive measures in place to minimise damage.
Avazov et al. [19] constructed a system for early fire detection and classification using the Internet of Things (IoT) and YOLOv5. In their investigation, IoT devices are used to verify whether fires reported by YOLOv5 are false detections or whether fires have gone unreported. The findings showed that IoT can be used to monitor and verify fire incidents in real-time, which may greatly improve the capacity to reduce forest fires.
Ye et al. [20] suggested a system architecture for autonomous forest fire detection using DL image processing methods, designed specifically for small UAV applications. The optimisation process included a number of phases: switching to ShuffleNetV2 as the backbone network, pruning the network, sparse training, tuning, and hardware acceleration. According to experimental findings, their forest fire detection system increased inference speed by 50%, decreased CPU utilisation and temperature by 35% and 25%, respectively, and consumed 10% less power while retaining an ACC of 92.5%. Notably, the model's ACC remained steady despite changes in the bird's-eye view angle.
As an alternative to more traditional models such as Fast R-CNN and Faster R-CNN, Al-Smadi et al. [21] investigated the efficacy of a framework intended to reduce the sensitivity of a number of YOLO detection methods. On a multi-oriented dataset for recognising forest smoke, they employed YOLOv5x to raise the mean average precision (mAP) over earlier gold-standard techniques to 96.8%. Additionally, YOLOv7 outperformed YOLOv3 with an mAP of 95%. These findings supported the method's ability to find forest fires in spite of challenging environmental conditions.
Talaat and ZainEldin [22] presented the smart fire detection system (SFDS), based on the YOLOv8 algorithm, as an enhanced fire detection method for smart cities. This system employs DL to distinguish fire-specific features in real-time, potentially improving fire detection ACC, reducing false alarms, and offering a more cost-effective alternative to traditional methods. The proposed architecture comprises application, fog, cloud, and IoT layers, and uses cloud and fog computing to acquire and analyse data in real-time. The SFDS achieved a success rate of 97.1% across all classes and is useful for various applications, such as fire safety management and intelligent security systems in smart cities.
Although these existing models for detecting forest fires have shown potential, they still have issues with precision, dependability, and flexibility. Our model offers several significant advances in an effort to reduce the harmful impacts of forest fires and enhance early detection.

PROPOSED METHODOLOGY
We will discuss the compatibility of the proposed model with existing forest fire prevention and control systems, ensuring that it can integrate into the current infrastructure without requiring significant modifications. Compatibility considerations may include data formats, communication protocols, and system architectures.
We will highlight the model's potential to serve as a real-time decision support tool for forest fire prevention and control. By continuously analysing incoming data from various sources such as remote sensors, weather stations, and satellite imagery, the model can provide early warnings, identify high-risk areas, and assist in resource allocation and deployment strategies. Figure 1 shows the flow of the suggested model.

Dataset description
As shown in Figure 2, the dataset used in this study consists of 3000 images of forest fires that were captured by drones and video surveillance equipment in various forest environments. It also includes additional forest fire images gathered using web crawling techniques and publicly available forest fire datasets [23]. Of this collection, 1000 images have hand annotations. These 1000 annotated images were divided into two subsets: 700 were set aside for training a prototype forest fire detection model, and 300 served as a dedicated test set to assess the model's ACC. The remaining 2000 unlabelled forest fire images were also used in the training process.
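The split described above can be sketched as follows. This is a minimal illustration that works on image indices only; the function name `split_dataset`, the shuffling, and the fixed seed are assumptions made for the example, since the text does not say how the 700/300 partition was drawn.

```python
import random


def split_dataset(n_annotated=1000, n_train=700, n_unlabelled=2000, seed=0):
    """Return index lists for the labelled train/test split and the unlabelled pool."""
    rng = random.Random(seed)               # fixed seed: an assumption, for repeatability
    annotated = list(range(n_annotated))    # indices of the hand-annotated images
    rng.shuffle(annotated)
    train = annotated[:n_train]             # 700 labelled training images
    test = annotated[n_train:]              # 300 labelled test images
    # The unlabelled images get the indices that follow the annotated ones.
    unlabelled = list(range(n_annotated, n_annotated + n_unlabelled))
    return train, test, unlabelled


train_ids, test_ids, unlabelled_ids = split_dataset()
print(len(train_ids), len(test_ids), len(unlabelled_ids))
```

Keeping the test indices disjoint from both training pools is what makes the 300-image set usable as an unbiased ACC estimate.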

Preprocessing
The dataset provides a wide range of photos taken from different perspectives, enabling the algorithm to distinguish between forest fire and non-fire events with greater ACC. With this information, the model is equipped to recognise forest fires based on two key criteria: the presence of fire flames, and the presence of fire flames mixed with smoke. Our main attention has been on the strict criteria used to divide the dataset into an equal number of images with fire (1) and without fire (0) [24]:
Fire (1): Images of forests and mountain ranges that are enveloped in flames and/or smoke caused by fires.
No-Fire (0): A wide variety of pictures showing forest and mountain vistas without fire.
This categorisation was designed to make it easier to train models on a variety of images while avoiding confusion with scenes that could look similar, such as mountain sunsets.
The goal of this careful dataset refinement was to improve overall model performance and streamline the model training procedure. During dataset curation, we cropped the pictures to fires in mountainous or forest settings, as these were the most contextually relevant to our study objective. Afterwards, every image was rescaled to the same size, 250×250 pixels. These preprocessing steps were crucial in helping the model capture important information about forest fires. Figure 3 shows examples of both the fire and no-fire categories within the forest fire dataset.

LSTM feature extraction
After the preprocessing phase, the input features are passed to the LSTM module, a crucial component of our methodology [25]. Owing to the large quantity of data gathered from the dataset, a typical RNN would not be sufficient for our purposes. LSTM is a specialised RNN variant that addresses the vanishing and exploding gradient problems. During training, an RNN creates temporal connections between prior states and the inputs to provide predictions. However, an RNN finds it challenging to retain the past because of its limited memory capacity, especially when dealing with large volumes of time series data. LSTM, by contrast, excels at classifying large time series datasets and locating temporal correlations. It is used in several sectors and yields strong results for tasks such as speech recognition and image classification.
The LSTM architecture seen in Figure 4 has special memory cells designed to make use of prior knowledge and maintain key characteristics from massive volumes of time series data.These memory cells may store and apply the information that was learnt, allowing the model to process and classify input effectively.

Figure 4. Architecture of LSTM
The forget gate ($f_t$), the input gate ($i_t$), and the output gate ($o_t$) all play distinct roles in regulating information flow in the LSTM architecture.
The major responsibility of the forget gate is to decide what data to preserve and what to discard within the cell state. It does this using the current input $x_t$ and the previous hidden state $h_{t-1}$. Through the sigmoid activation function, the forget gate generates an output between 0 and 1, which is applied to the cell state by pointwise multiplication. A value of 1 indicates keeping important information in the cell state, whilst a value of 0 indicates removing unimportant information. The core characteristics of the forget gate, input gate, and output gate are explained in the literature [26] by Eq. (1) to Eq. (6).
$$f_t = \sigma(W_f \cdot [h_{t-1}, x_t] + b_f) \tag{1}$$
The forget gate $f_t$ has a bias $b_f$ and a weight $W_f$, and is activated using the sigmoid activation function $\sigma$. Next, the input gate $i_t$ decides which data should be stored in the cell state $C_t$. It considers both the input $x_t$ and the preceding hidden state $h_{t-1}$ to arrive at this decision. Eq. (2) and Eq. (3) describe the sigmoid and hyperbolic tangent (tanh) activation functions in this decision-making process:
$$i_t = \sigma(W_i \cdot [h_{t-1}, x_t] + b_i) \tag{2}$$
$$\tilde{C}_t = \tanh(W_c \cdot [h_{t-1}, x_t] + b_c) \tag{3}$$
Here $b_i$ and $b_c$ stand for the biases, and $W_i$ and $W_c$ stand for the weights connected to the input gate ($i_t$) and the candidate cell state ($\tilde{C}_t$). The previous cell state is denoted by $C_{t-1}$. Eq. (4) shows how Eq. (2) and Eq. (3) are combined by a pointwise addition to update the current cell state $C_t$:
$$C_t = f_t \odot C_{t-1} + i_t \odot \tilde{C}_t \tag{4}$$
The output gate is calculated in Eq. (5). The current input $x_t$ and the prior hidden state $h_{t-1}$ are both passed through the activation function $\sigma$ in this gate, together with a bias term $b_o$:
$$o_t = \sigma(W_o \cdot [h_{t-1}, x_t] + b_o) \tag{5}$$
A pointwise multiplication is used to combine the information from the cell state $C_t$ with the output gate $o_t$. This produces the next hidden state $h_t$, as represented in Eq. (6):
$$h_t = o_t \odot \tanh(C_t) \tag{6}$$
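The gate computations described above can be sketched in plain Python as a single time step of a standard LSTM cell. This is an illustrative sketch and not the authors' implementation: the helper names (`affine`, `init_params`, `lstm_step`), the small uniform weight initialisation, and the toy dimensions are all assumptions.

```python
import math
import random


def sigmoid(v):
    return 1.0 / (1.0 + math.exp(-v))


def affine(W, b, v):
    """Compute W @ v + b for a list-of-rows matrix W."""
    return [sum(w * x for w, x in zip(row, v)) + bias for row, bias in zip(W, b)]


def init_params(n_in, n_hidden, seed=0):
    """Hypothetical initialisation: small uniform weights, zero biases."""
    rng = random.Random(seed)
    params = {}
    for gate in ("f", "i", "c", "o"):
        params["W_" + gate] = [[rng.uniform(-0.1, 0.1) for _ in range(n_hidden + n_in)]
                               for _ in range(n_hidden)]
        params["b_" + gate] = [0.0] * n_hidden
    return params


def lstm_step(x_t, h_prev, c_prev, params):
    """One time step: forget, input and output gates acting on the cell state."""
    concat = h_prev + x_t  # the concatenation [h_{t-1}, x_t]
    f_t = [sigmoid(a) for a in affine(params["W_f"], params["b_f"], concat)]
    i_t = [sigmoid(a) for a in affine(params["W_i"], params["b_i"], concat)]
    c_hat = [math.tanh(a) for a in affine(params["W_c"], params["b_c"], concat)]
    # Keep part of the old cell state and add the gated candidate values.
    c_t = [f * c + i * g for f, c, i, g in zip(f_t, c_prev, i_t, c_hat)]
    o_t = [sigmoid(a) for a in affine(params["W_o"], params["b_o"], concat)]
    h_t = [o * math.tanh(c) for o, c in zip(o_t, c_t)]
    return h_t, c_t


params = init_params(n_in=3, n_hidden=4)
h, c = [0.0] * 4, [0.0] * 4
for x in ([0.1, 0.2, 0.3], [0.0, -0.1, 0.5]):  # a toy two-step input sequence
    h, c = lstm_step(x, h, c, params)
```

In practice a framework layer would replace this loop; the sketch only makes explicit how the forget gate scales the previous cell state before the gated candidate is added.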
The LSTM model may be improved significantly by using optimal parameters, which substantially influence the efficacy of feature extraction. With the exception of the fully connected dense layer, 50 additional neurons have been added to each layer to aid training. Furthermore, a 20% dropout rate has been used to reduce overfitting.

Weight optimization in LSTM using MFFA
Firefly algorithm. In situations where it is necessary to optimise not one but several competing objectives at once, WWPA can handle these difficulties. By efficiently exploring the trade-off between different objectives, WWPA can discover Pareto-optimal solutions that represent the best compromise between competing goals.
Efficient exploration and exploitation. WWPA effectively balances exploration (searching diverse regions of the hyperparameter space) and exploitation (exploiting promising regions to refine solutions). This balance enables WWPA to converge quickly to high-quality solutions while avoiding premature convergence to suboptimal regions.

Figure 5. Flowchart of modified firefly algorithm
The Firefly Algorithm is a metaheuristic method motivated by the flashing behaviour of fireflies. It is based on the idea that different fireflies have different levels of attractiveness and that this affects how they mate [27,28].
The modified firefly algorithm [29][30][31] improves on the original Firefly Algorithm by reducing its inherent volatility and improving firefly movement. Figure 5 depicts the Modified Firefly Algorithm's flowchart and the sequential steps it goes through. The Modified Firefly Algorithm's randomisation parameter $\alpha$ takes the start and finish values $\alpha_0$ and $\alpha_\infty$ over the iterations. Higher values of $\alpha$ lead to better convergence while attempting to strike a balance between exploitation and exploration. The motion of the $i$-th firefly and the distance function $r_{ij}$ are described in Eq. (7) and Eq. (8), respectively.
$$r_{ij} = \sqrt{(x_i - x_j)^2 + (y_i - y_j)^2} \tag{7}$$
where the random perturbation is drawn as $\mathrm{rand} - 1/2$. When a local best solution is not available nearby, the $i$-th firefly is drawn to the best solution found. The MFFA reduces the possibility of becoming stranded in local optima by carefully limiting randomness. This well-controlled reduction of randomness brings rapid convergence and lets the fireflies progress towards the global optimum.
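As a rough illustration of the firefly movement discussed above, the sketch below implements the standard Firefly Algorithm update with a randomisation parameter that shrinks from a start value towards a final value. The exponential decay in `alpha_schedule`, the default constants, and the function names are assumptions for the example; the exact MFFA schedule is not recoverable from the text.

```python
import math
import random


def distance(a, b):
    """Cartesian distance between two fireflies."""
    return math.sqrt(sum((p - q) ** 2 for p, q in zip(a, b)))


def alpha_schedule(t, alpha_0=0.5, alpha_inf=0.01):
    """Assumed decay of the randomisation parameter from alpha_0 towards alpha_inf."""
    return alpha_inf + (alpha_0 - alpha_inf) * math.exp(-t)


def move_firefly(x_i, x_j, alpha, beta_0=1.0, gamma=1.0, rng=random):
    """Move firefly i towards a brighter firefly j (standard Firefly Algorithm step)."""
    r = distance(x_i, x_j)
    beta = beta_0 * math.exp(-gamma * r * r)  # attractiveness decays with distance
    # Attraction term plus a random perturbation scaled by alpha.
    return [xi + beta * (xj - xi) + alpha * (rng.random() - 0.5)
            for xi, xj in zip(x_i, x_j)]


rng = random.Random(0)
position = [0.0, 0.0]
brighter = [1.0, 1.0]
for t in range(5):
    position = move_firefly(position, brighter, alpha_schedule(t), rng=rng)
```

As the schedule lowers alpha, the random term fades and the attraction term dominates, which is the controlled reduction of randomness the text refers to.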

VAEGAN classification
In this work, data related to forest fires are classified using VAEGAN. While a GAN (Generative Adversarial Network) excels at creating sharp samples, it often exhibits instability during learning. In contrast, a VAE (Variational Autoencoder) generates a variety of samples while retaining a reasonable amount of stability during learning. By combining them, the VAEGAN framework can take advantage of the strengths of both generative models: it delivers samples with both high fidelity and variety while remaining stable during learning. Typically, a VAE consists of an encoder and a decoder [32]. The encoder transforms the input data into a latent vector, and the decoder estimates the input from the latent vector. The mathematical representations of the encoder and decoder processes are shown in Eq. (9) and Eq. (10), respectively.
$$z \sim \mathrm{Enc}(x) = q_\phi(z|x) \tag{9}$$
$$\hat{x} \sim \mathrm{Dec}(z) = p_\theta(x|z) \tag{10}$$
Here, the input, latent vector, and estimated input are represented by $x$, $z$, and $\hat{x}$, respectively. The encoder and decoder models are governed by the parameters $\phi$ and $\theta$. The true posterior $p_\theta(z|x)$ is approximated by $q_\phi(z|x)$. The loss function of the VAE is the sum of two parts, the reconstruction error and a prior regularisation term:
$$\mathcal{L}_{VAE} = -\mathbb{E}_{q_\phi(z|x)}[\log p_\theta(x|z)] + D_{KL}(q_\phi(z|x) \,\|\, p(z))$$
where $p(z)$ and $D_{KL}$ represent the prior distribution of $z$ and the Kullback-Leibler divergence. A GAN model usually consists of a generator and a discriminator [32]. The generator maps the latent vector to data space, whereas the discriminator assigns a probability $y$ that a sample is real and a probability $1-y$ that it is generated. The primary goal of a GAN is to find a discriminator that can differentiate between generated and real data, while also adjusting the generator to fit the distribution of real data. The loss function of a GAN is the binary cross entropy, expressed as a function of both the discriminator $D$ and the generator $G$:
$$\mathcal{L}_{GAN} = \log(D(u)) + \log(1 - D(G(w)))$$
In this instance, $w$ is a random variable with probability density function $p(w)$, and $u$ represents a genuine sample. Although GANs can produce synthetic data without using density functions in certain circumstances, such as when dealing with unbalanced data, there are other situations in which drawing fresh samples from the generator with specified distributions is advantageous. To do this, a generative model is built by using the VAE's decoder as the GAN's generator. The structure of the VAE-GAN model is illustrated in Figure 6.
The loss function of the VAE-GAN can be represented as follows [32]:
$$\mathcal{L} = \mathcal{L}_{prior} + \mathcal{L}_{llike}^{D_l} + \mathcal{L}_{GAN}$$
$$\mathcal{L}_{prior} = D_{KL}(q_\phi(z|x) \,\|\, p(z)), \qquad \mathcal{L}_{llike}^{D_l} = -\mathbb{E}_{q_\phi(z|x)}[\log p(D_l(x)|z)]$$
where $p(D_l(x)|z)$ denotes a Gaussian observation model with identity covariance and $D_l(\tilde{x})$ as the mean.
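To make the loss terms above concrete, the sketch below evaluates each of them for small hand-written vectors. It is a minimal illustration under stated assumptions: the prior is taken to be a standard normal and the encoder a diagonal Gaussian (so the KL term has a closed form), the feature-space term uses a unit-covariance Gaussian up to an additive constant, and all function names are hypothetical.

```python
import math


def kl_to_standard_normal(mu, log_var):
    """KL divergence between a diagonal Gaussian N(mu, exp(log_var)) and N(0, I)."""
    return 0.5 * sum(m * m + math.exp(lv) - lv - 1.0 for m, lv in zip(mu, log_var))


def gaussian_feature_nll(feat_real, feat_recon):
    """Negative log-likelihood, up to a constant, under a unit-covariance Gaussian
    centred on the discriminator features of the reconstruction."""
    return 0.5 * sum((a - b) ** 2 for a, b in zip(feat_real, feat_recon))


def gan_objective(d_real, d_fake, eps=1e-12):
    """Binary cross-entropy style GAN term: log D(u) + log(1 - D(G(w)))."""
    return math.log(d_real + eps) + math.log(1.0 - d_fake + eps)


def vae_gan_terms(mu, log_var, feat_real, feat_recon, d_real, d_fake):
    """Return the three terms separately; how each network weights or signs them
    during training is a design choice not fixed by this sketch."""
    return (kl_to_standard_normal(mu, log_var),
            gaussian_feature_nll(feat_real, feat_recon),
            gan_objective(d_real, d_fake))


prior_term, feature_term, gan_term = vae_gan_terms(
    mu=[0.1, -0.2], log_var=[0.0, 0.1],
    feat_real=[0.5, 0.7, 0.1], feat_recon=[0.4, 0.6, 0.2],
    d_real=0.9, d_fake=0.2)
```

Returning the terms separately reflects that the encoder, decoder, and discriminator are usually updated with different combinations of them.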

Hyper parameter tuning using WWPA
The WWPA is described in this section. Here, we explore the motivation behind the method and provide a thorough mathematical explanation of how it works.
Inspiration of WWPA. The waterwheel (WW) plant, Aldrovanda vesiculosa, has broad petioles that bear its unusual traps, which are barely 1/12 inch in size and resemble tiny transparent flytraps [33]. Thanks to their design, interactions with other aquatic plants do not cause these traps to deteriorate or activate unintentionally. They are protected by a ring of hair-like bristles. Along their edges, the traps have a variety of hook-like teeth that interlock when the trap catches its victim, much like the teeth of a typical flytrap. The Aldrovanda trap has more than 40 elongated trigger hairs, compared to the 6-8 trigger hairs of a Venus flytrap. When one or more of these trigger hairs are touched, the trap shuts. In addition to trigger hairs, these carnivorous plants have glands that emit acid to help digest their caught prey. The seal and the plant's interlocking teeth trap the prey. By leading the prey towards the hinge at the base of the trap, the seal successfully catches the prey. The body fluids of the prey are extensively broken down by the plant's digestive secretions, and any leftover material is excreted. Similar to a flytrap, an Aldrovanda trap can hold and digest two to before filling to capacity. The structure of the waterwheel plant is shown in Figure 7.
The WWPA mathematical model. This section describes how WWPA is initialised before going into detail about how the WW's location is updated throughout both the exploration and exploitation phases using a model based on the actual behaviour of WWs.
Initialization. The WWPA is a population-based approach that tries to locate the optimal solution by using the collective search capabilities of its population members within the solution space. Each of the WWs that make up the population represents a candidate solution to the problem and holds a specific set of values for the problem variables. These solutions may be formally represented as vectors. The whole population of the WWPA, which consists of all WWs, may be represented by a matrix, as given in Eq. (18). At the beginning of WWPA, the positions of these WWs inside the solution space are initialised at random using Eq. (19).
$$P = \begin{bmatrix} P_1 \\ \vdots \\ P_i \\ \vdots \\ P_N \end{bmatrix} = \begin{bmatrix} p_{1,1} & \cdots & p_{1,j} & \cdots & p_{1,m} \\ \vdots & & \vdots & & \vdots \\ p_{i,1} & \cdots & p_{i,j} & \cdots & p_{i,m} \\ \vdots & & \vdots & & \vdots \\ p_{N,1} & \cdots & p_{N,j} & \cdots & p_{N,m} \end{bmatrix} \tag{18}$$
$$p_{i,j} = lb_j + r_{i,j} \cdot (ub_j - lb_j), \quad i = 1, \ldots, N, \; j = 1, \ldots, m \tag{19}$$
where $N$ stands for the number of WWs, and $m$ stands for the number of variables. The lower and upper limits of the $j$-th problem variable are represented by $lb_j$ and $ub_j$, while $r_{i,j}$ is a random value in [0, 1]. The population matrix of WW locations is designated as $P$, where $P_i$ is the $i$-th WW and $p_{i,j}$ denotes its $j$-th dimension, which corresponds to a problem variable. Because each WW stands for a possible solution to the problem, the objective function can be calculated for each WW. The resulting objective function values can be represented as a vector, as in Eq. (20).
$$F = \begin{bmatrix} F_1 \\ \vdots \\ F_i \\ \vdots \\ F_N \end{bmatrix} = \begin{bmatrix} F(P_1) \\ \vdots \\ F(P_i) \\ \vdots \\ F(P_N) \end{bmatrix} \tag{20}$$
where $F$ denotes the vector containing all of the objective function values, and $F_i$ is the value obtained for the $i$-th WW. The objective function values are the main criterion used to choose the best solutions. The best candidate solution, indicated by the highest objective function value, and the worst candidate solution, indicated by the lowest value, are thus the most important quantities. Given that WWs move across the search region at varying speeds throughout each iteration, the best solution may change over time.
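The initialisation described above (random positions inside per-variable bounds, followed by one objective evaluation per WW) can be sketched as below. The function names and the toy `sphere` objective are hypothetical and chosen only for illustration; the paper's actual objective would be the classifier's validation performance for a given hyperparameter setting.

```python
import random


def init_population(n_plants, lower, upper, seed=0):
    """Build an N x m matrix with each entry drawn as lb_j + r * (ub_j - lb_j)."""
    rng = random.Random(seed)
    return [[lb + rng.random() * (ub - lb) for lb, ub in zip(lower, upper)]
            for _ in range(n_plants)]


def evaluate(population, objective):
    """Vector of objective values, one entry per waterwheel."""
    return [objective(plant) for plant in population]


def sphere(plant):
    """Toy objective (an assumption for this sketch)."""
    return sum(v * v for v in plant)


lower, upper = [-5.0, 0.0, 1.0], [5.0, 1.0, 10.0]
P = init_population(n_plants=6, lower=lower, upper=upper)
F = evaluate(P, sphere)
```

Each row of `P` is one candidate solution and `F[i]` is the value attached to row `i`, mirroring the matrix and vector notation in the text.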
Stage 1: Recognising positions and hunting for insects (exploration)
WWs are skilled hunters capable of locating pests owing to their exceptional sense of smell. A WW attacks as soon as it notices an insect nearby, focusing on the insect's location and starting a chase to trap it. The first stage of the WWPA's population update procedure simulates this behaviour. By modelling the hunting behaviour of WWs, the WWPA improves its exploration ability, allowing it to find promising regions while avoiding being caught in local optima. This is accomplished by simulating the large movements of the WW as it approaches the insect within the solution space. The simulated approach is used to compute the WW's new location, as given in Eq. (21) and Eq. (22). If moving the WW to the newly determined position increases the value of the objective function, the old position is abandoned in favour of the new one.
Alternatively, if the results do not improve after three consecutive iterations, the WW's location is changed using Eq. (23). In these equations, the random variables $\vec{r}_1$ and $\vec{r}_2$ take values in [0, 2] and [0, 1], respectively. The vector $\vec{W}$ represents the radius of each circle that the WW plant evaluates as a possible region, and $K$ is a variable with values between 0 and 1.

Stage 2: Carrying the insect in the suitable tube (exploitation)
The behaviour of waterwheels transporting insects to their feeding tubes serves as the model for the second stage of population updates in WWPA. By simulating this behaviour, WWPA improves its convergence towards solutions close to those it has already found. WWPA modifies the location of the WW inside the problem region by simulating the insect's journey to the proper tube for ingestion. To do this, each WW is first placed in a new random position that represents a "favourable region for insect consumption," according to the WWPA designers. Eq. (24) and Eq. (25) show that if the objective function produces a better value at the new position, the WW is moved.
$$\vec{P}(t+1) = \vec{P}(t) + K\vec{W} \tag{25}$$
If the solution still does not improve after three iterations, the method applies a mutation process similar to that of the exploration stage. The algorithm makes these changes during the mutation phase to avoid being stuck in local optima. Here, the current solution at iteration $t$ is denoted $\vec{P}(t)$, and the best solution is written as $\vec{P}_{best}$. $r_3$ is a random variable with values in [0, 2]. This strategy aids the algorithm's robustness and its effective escape from local optima.
The proposed WWPA's pseudocode
The WWPA is iterative and proceeds in three phases. Once the first and second stages are complete, each WW is moved in the third and final stage. This update, which revises the best candidate solution, is based on a comparison of objective function values. The WW positions are then adjusted in preparation for the next iteration, and the process repeats until the algorithm terminates. The detailed steps are given in Algorithm 1. Once it has run to completion, WWPA returns the most promising candidate solution found during its iterations.
Algorithm 1 (closing steps)
Find the best position
23: Set t = t + 1
24: end while
25: Return the best solution
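The control flow described for the two stages (greedy acceptance of a better position, and a mutation after three iterations without improvement) can be sketched in Python as follows. This is a minimal illustration under stated assumptions, not the authors' implementation: the function name `wwpa_sketch`, the linear schedule for K, the form of the radius vector W in each stage, the Gaussian mutation, and the toy `sphere` objective are all hypothetical choices, and the objective is assumed to be minimised.

```python
import random


def wwpa_sketch(objective, dim, lo, hi, n_agents=20, max_iter=100, seed=0):
    """WWPA-style minimisation loop; the step formulas are assumed placeholders."""
    rng = random.Random(seed)

    def clip(vec):
        return [min(hi, max(lo, v)) for v in vec]

    pop = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_agents)]
    fit = [objective(p) for p in pop]
    stall = [0] * n_agents  # consecutive iterations without improvement
    b = min(range(n_agents), key=lambda i: fit[i])
    best, best_fit = pop[b][:], fit[b]
    history = [best_fit]

    for t in range(max_iter):
        K = 1.0 - t / max_iter  # assumed schedule that keeps K in (0, 1]
        for i in range(n_agents):
            improved = False

            # Stage 1 (exploration): a wide move towards a random region.
            r1, r2 = rng.uniform(0, 2), rng.uniform(0, 1)
            target = [rng.uniform(lo, hi) for _ in range(dim)]
            W = [r1 * (g - x) for g, x in zip(target, pop[i])]
            cand = clip([x + r2 * w for x, w in zip(pop[i], W)])
            f = objective(cand)
            if f < fit[i]:  # greedy acceptance: keep the move only if better
                pop[i], fit[i], improved = cand, f, True

            # Stage 2 (exploitation): P(t+1) = P(t) + K * W, with W assumed
            # to point from the current position towards the best solution.
            r3 = rng.uniform(0, 2)
            W = [r3 * (g - x) for g, x in zip(best, pop[i])]
            cand = clip([x + K * w for x, w in zip(pop[i], W)])
            f = objective(cand)
            if f < fit[i]:
                pop[i], fit[i], improved = cand, f, True

            stall[i] = 0 if improved else stall[i] + 1
            if stall[i] >= 3:
                # Mutation after three stalled iterations: an assumed
                # Gaussian jump, accepted unconditionally.
                sigma = 0.1 * (hi - lo)
                pop[i] = clip([x + rng.gauss(0.0, sigma) for x in pop[i]])
                fit[i] = objective(pop[i])
                stall[i] = 0

            if fit[i] < best_fit:
                best, best_fit = pop[i][:], fit[i]
        history.append(best_fit)

    return best, best_fit, history


def sphere(x):
    """Toy objective (sum of squares), used only for illustration."""
    return sum(v * v for v in x)


best, best_fit, history = wwpa_sketch(sphere, dim=3, lo=-5.0, hi=5.0)
print(best_fit)
```

Because the best solution is stored separately from the population, the unconditional mutation can move an individual WW to a worse position without ever degrading the returned result. In the setting of this paper the objective would be the loss associated with a set of network weights rather than the toy function used here.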

RESULTS AND DISCUSSION
This section discusses the identified failure cases in detail, highlighting common patterns or themes observed across instances. It also discusses the implications of these findings for the practical application of the proposed approach and provides insight into the model's limitations.
Recommendations for improvement: Based on the analysis of model failure cases, recommendations are offered for improving the performance of the classifier. These may include refining the model architecture, collecting additional data to address specific challenges, or adding preprocessing steps to enhance model robustness.

Experimental setup
The experiments were carried out on a machine with a Core i5 CPU, 8 GB of RAM, and a 500 GB hard drive. The project is implemented in Python 3, with Anaconda and Jupyter Notebook as the supporting environment. Table 1 lists some of the benefits of Jupyter Notebook, including its ability to run on internet servers.

Performance metrics
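The output metrics, defined in Eq. (28) to Eq. (32), are ACC, PR, RC, SPEC, and F-score. The following tables compare the predicted and actual results on the basis of these criteria:
$$\mathrm{ACC} = \frac{TRP + TRN}{TRP + TRN + FLP + FLN} \tag{28}$$
$$\mathrm{PR} = \frac{TRP}{TRP + FLP} \tag{29}$$
$$\mathrm{RC} = \frac{TRP}{TRP + FLN} \tag{30}$$
$$\mathrm{SPEC} = \frac{TRN}{TRN + FLP} \tag{31}$$
$$\text{F-score} = \frac{2 \cdot \mathrm{PR} \cdot \mathrm{RC}}{\mathrm{PR} + \mathrm{RC}} \tag{32}$$
where TRP is the true positive value, FLP is the false positive value, TRN is the true negative value, and FLN is the false negative value.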
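The five metrics follow directly from the confusion-matrix counts. The Python sketch below assumes the standard definitions of accuracy, precision, recall, specificity, and F-score; the function name `classification_metrics`, the zero-denominator guard, and the example counts are illustrative assumptions and are not taken from the experiments.

```python
def classification_metrics(trp, flp, trn, fln):
    """Return ACC, PR, RC, SPEC and F-score from confusion-matrix counts."""

    def ratio(num, den):
        # Guard against empty denominators (e.g. no positive predictions).
        return num / den if den else 0.0

    acc = ratio(trp + trn, trp + flp + trn + fln)
    pr = ratio(trp, trp + flp)
    rc = ratio(trp, trp + fln)
    spec = ratio(trn, trn + flp)
    f_score = ratio(2 * pr * rc, pr + rc)
    return {"ACC": acc, "PR": pr, "RC": rc, "SPEC": spec, "F": f_score}


# Hypothetical counts, not taken from the paper's experiments.
metrics = classification_metrics(trp=90, flp=10, trn=85, fln=15)
for name, value in metrics.items():
    print(f"{name}: {value:.3f}")
```

Since the F-score is the harmonic mean of PR and RC, it always lies between the two, which gives a quick sanity check on any reported set of values.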

Classification analysis based on 70:30 ratio
The performance indicators for several models under the 70:30 ratio are summarised in Table 2.
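A 70:30 ratio means that 70% of the images are used for training and 30% for testing. The sketch below shows one way such a split could be drawn in Python. Stratifying by class, the function name `stratified_split`, the fixed seed, and the label strings (modelled on the Fire and No-Fire classes) are assumptions made for illustration, since the text does not state how the split was produced.

```python
import random
from collections import defaultdict


def stratified_split(labels, train_frac=0.7, seed=0):
    """Split sample indices into train/test sets, preserving class proportions."""
    rng = random.Random(seed)
    by_class = defaultdict(list)
    for idx, lab in enumerate(labels):
        by_class[lab].append(idx)

    train, test = [], []
    for lab in sorted(by_class):
        idxs = by_class[lab]
        rng.shuffle(idxs)
        cut = round(train_frac * len(idxs))  # size of this class's training share
        train += idxs[:cut]
        test += idxs[cut:]
    return sorted(train), sorted(test)


# Hypothetical balanced data set of 100 images.
labels = ["fire"] * 50 + ["no_fire"] * 50
train_idx, test_idx = stratified_split(labels)
print(len(train_idx), len(test_idx))
```

Splitting per class keeps the proportion of fire and no-fire images the same in both subsets, so metrics such as SPEC and RC are not distorted by an unbalanced test set.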

Classification validation based on LSTM
Table 3 and Figures 11-15 report the performance metrics observed for several classification models. Table 4 and Figures 11-15 compare the effectiveness of these classification algorithms across multiple metrics. The DBN model achieves an ACC of 86.6%, with a PR of 83.7%, RC of 87.6%, F1 of 87.8%, and SPEC of 87.9%. AE obtains slightly better values for ACC, PR, RC, F1, and SPEC, reported as 88.6%, 85.2%, 89.5%, and 89.5%. The VAE model shows stronger results, with an ACC of 93.9%, PR of 92.2%, RC of 94.6%, F1 of 95.2%, and SPEC of 94.4%. The suggested model surpasses them all, with an ACC of 97.8%, PR of 97.7%, RC of 96.26%, F1 of 97.3%, and SPEC of 97.5%. According to these results, the suggested model performs well on every reported metric for forest fire detection, making it a strong choice for this task.
In the LSTM feature extraction network, weight optimisation is carried out using the MFFA, which results in improved convergence and potentially higher ACC. The proposed model outperforms all compared models, reaching an accuracy of 97.8%. Further studies will improve the spatial resolution of the images in the forest fire detection data set. A CNN-based image segmentation method will also be proposed to further improve alerting for forest fire detection. By addressing these future research directions and overcoming the remaining technical challenges, the "Detection of Forest Fire using Modified LSTM based Feature Extraction with Waterwheel Plant Optimization Algorithm based VAE-GAN model" can continue to advance and contribute to more effective and reliable forest fire detection systems, ultimately aiding the preservation of natural ecosystems and the protection of public safety.

Figure 2. Schematic diagram of forest fire data set

Figure 3. Images from (a) and (b) Fire class and (c) and (d) No-Fire class

Figure 7. Image of the WW plant. (a) Side view of a free-floating shoot bearing traps. (b) Frontal view with both open and shut traps. (c) A single open trap. (d) Schematic illustration of an open trap

Table 1. Specifications table

Table 2 compares the existing models, namely Deep Belief Network (DBN), autoencoder (AE), and variational autoencoder (VAE), with the proposed model on the performance metrics.

Table 2. Comparison of 70:30 ratio classification models

In contrast to DBN, AE, and VAE, which concentrate only on encoding and decoding data, VAEGAN adds adversarial training, which allows it to better capture complicated data distributions and to produce more realistic examples. With the help of this adversarial component, VAEGAN learns richer and more accurate latent representations, improving its capacity to reconstruct data and to generate new samples with greater accuracy and variety. Because VAEGAN supports both data generation and reconstruction, it is a more effective and adaptable solution for a variety of tasks.

Table 5. Learning rate