Optimizing Region Detection in Enhanced Infrared Images Using Deep Learning

Janani Venkatachalam* Shanthi Chandrabose

Department of Computer Science, School of Computing Sciences, Vels Institute of Science, Technology & Advanced Studies, Chennai 600117, India

Corresponding Author Email: samibps17@gmail.com
Page: 1015-1021 | DOI: https://doi.org/10.18280/ria.370423
Received: 11 April 2023 | Revised: 27 July 2023 | Accepted: 3 August 2023 | Available online: 31 August 2023

© 2023 IIETA. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

OPEN ACCESS

Abstract: 

Infrared imaging has attracted considerable interest for applications such as wildlife monitoring. Nevertheless, accurately detecting and segmenting animal regions in enhanced infrared images remains challenging. This study proposes an optimization framework that leverages deep learning to improve animal region segmentation in these images. The primary focus is the investigation and implementation of the Region-based Convolutional Neural Network (R-CNN) object detection algorithm; adapting and fine-tuning the R-CNN model increases the accuracy and robustness of animal region segmentation. Transfer learning is used to apply knowledge learned from a large, different but related dataset to the task at hand. Fine-tuning the R-CNN model on a smaller dataset of annotated infrared images improves its ability to segment animal regions accurately even when training samples are limited, which helps overcome the constraints of training deep learning models from scratch when labeled data is scarce. The performance of the optimized R-CNN model was assessed using a set of segmentation metrics, including Intersection over Union (IoU). The optimized R-CNN model outperformed existing methods in terms of segmentation accuracy, achieving higher IoU scores, Dice coefficients, and pixel accuracies. Additionally, the fine-tuned R-CNN model demonstrated improved precision, recall, and F1 score, indicating superior overall performance in detecting and segmenting animal regions.

Keywords: 

infrared imaging, animal region segmentation, enhanced infrared images, deep learning, R-CNN (region-based convolutional neural network), object detection, transfer learning

1. Introduction

Infrared imaging has emerged as a powerful tool in various fields, including wildlife monitoring and conservation. It captures the thermal signatures emitted by animals, providing unique advantages for detecting and tracking their presence in diverse environments. However, accurately segmenting animal regions in enhanced infrared images is difficult because of low contrast, noise, and complex background variations. Robust and precise segmentation of animal regions is crucial for extracting meaningful information, understanding animal behavior, and implementing effective conservation strategies. Traditional methods for animal region segmentation in infrared images often rely on manual annotation or handcrafted feature extraction, which is time-consuming and error-prone. With advances in deep learning for computer vision, there is growing interest in its potential to enhance the accuracy and efficiency of animal region segmentation. Deep learning algorithms such as convolutional neural networks (CNNs) have achieved remarkable success in various image analysis tasks, including object detection and segmentation.

One widely used deep learning approach for object detection is the R-CNN (Region-based Convolutional Neural Network) algorithm. R-CNN achieves state-of-the-art results by combining region proposals with CNN-based feature extraction and classification. It has demonstrated exceptional performance in detecting and localizing objects in natural images. However, its application to the segmentation of animal regions in enhanced infrared images remains relatively unexplored [1].

Figure 1. Architecture of R-CNN

To address this research gap, this paper aims to optimize the segmentation of animal regions in enhanced infrared images using a deep learning optimization algorithm. Furthermore, the work leverages transfer learning to fine-tune the R-CNN model, enabling it to reuse knowledge learned from a large dataset and adapt to the specific characteristics of infrared images. By enhancing segmentation accuracy, the work aims to improve the reliability of wildlife monitoring, enabling a better understanding of animal behavior, habitat usage, and ecological interactions. The architecture of R-CNN is shown in Figure 1.

The proposed optimization framework combines the power of deep learning algorithms, transfer learning techniques, and rigorous evaluation metrics to enhance the performance of animal region segmentation in enhanced infrared images. The results of this study can significantly contribute to advancing the field of wildlife monitoring, conservation, and ecological research, providing valuable insights for decision-making processes and biodiversity preservation.

1.1 Problem statement

The accurate segmentation of animal regions in enhanced infrared images poses a significant challenge due to various reasons such as low contrast, noise, and complex background variations. Traditional segmentation methods often rely on manual annotation or handcrafted feature extraction, which are time-consuming and subjective. Additionally, existing deep learning approaches for object detection and segmentation have primarily focused on natural images and may not be well-suited for the specific characteristics of enhanced infrared imagery.

Therefore, there is a need for an optimization framework that leverages deep learning techniques to improve the segmentation accuracy of animal regions in enhanced infrared images. This framework should address the following challenges:

Limited annotated data: The availability of labeled infrared image data is often limited, making it challenging to train deep learning models effectively. It is crucial to develop strategies to overcome the data scarcity issue and optimize the segmentation performance with a limited amount of labeled data.

Adaptation to enhanced infrared images: Enhanced infrared images exhibit distinct characteristics, including variations in thermal signatures, contrast, and background complexity. Existing deep learning models may not be well-suited to handle these specific features, requiring adaptation and fine-tuning to accurately segment animal regions.

Segmentation accuracy and robustness: Achieving high accuracy and robustness in the segmentation of animal regions is essential for reliable wildlife monitoring and conservation efforts. The optimization framework should aim to improve the precision, recall, and overall segmentation performance metrics to ensure accurate detection and delineation of animal regions.

Addressing these challenges will enable researchers and conservationists to gain a deeper understanding of animal behavior, habitat usage, and population dynamics through enhanced infrared image analysis. Moreover, an optimized segmentation framework can contribute to the development of effective conservation strategies and the preservation of biodiversity in various ecosystems.

2. Literature Review

2.1 Infrared imaging in animal region segmentation

Infrared imaging has been recognized as a valuable tool in animal region segmentation due to its ability to capture thermal signatures emitted by animals. Several studies have utilized infrared imagery to detect and track animals in various environmental settings. The authors of [2] investigate the use of deep learning techniques for thermal image segmentation in animal detection, exploring the challenges posed by low contrast and complex background variations in infrared images. They propose a convolutional neural network-based approach and demonstrate its effectiveness in accurately segmenting animal regions in thermal imagery. The work focuses on enhancing the segmentation of animal regions in infrared images through advanced image processing techniques. It explores methods to improve contrast, denoise images, and handle background variations. The authors present an enhanced infrared imaging pipeline that increases the visibility of animal regions and improves subsequent segmentation accuracy. Chandrakar et al. [3] propose an active contour model-based approach for automatic animal region segmentation in thermal images, addressing the challenges of low contrast and noise in infrared images. The authors leverage the deformable nature of active contours to accurately delineate animal regions. Experimental evaluation shows that the proposed method is robust and efficient. However, accurately segmenting animal regions in enhanced infrared images remains a challenge due to factors such as low contrast, noise, and complex background variations. The use of infrared imaging in animal region segmentation motivates the need for advanced computational techniques, such as deep learning algorithms, to improve segmentation accuracy [4].

2.2 Deep learning approaches in image segmentation

Deep learning has transformed the field of computer vision and image analysis. Convolutional neural networks (CNNs) have shown remarkable success in image segmentation tasks. Fully Convolutional Networks (FCNs), U-Net, and Mask R-CNN are a few of the popular deep learning frameworks that have been employed for image segmentation. These models leverage the hierarchical features learned from large-scale datasets to accurately delineate object boundaries and segment regions of interest. The application of deep learning in image segmentation forms the foundation for adapting and optimizing these approaches specifically for animal region segmentation in enhanced infrared images. The authors of [5] introduce Fully Convolutional Networks (FCNs) for pixel-wise semantic segmentation. They propose an end-to-end trainable architecture that replaces the final fully connected layers with convolutional layers, enabling dense predictions at multiple spatial resolutions. FCNs have since become a fundamental approach in the field of image segmentation. The work [6] presents the U-Net architecture, designed specifically for biomedical image segmentation tasks. It has been widely adopted in various medical imaging applications and extended to other domains. Ren et al. [7] introduce Mask R-CNN, an extension of the R-CNN algorithm for instance segmentation. By incorporating a mask branch alongside the bounding box and classification branches, Mask R-CNN enables pixel-level segmentation of objects in images. It achieves state-of-the-art performance in instance segmentation tasks and has been applied to various domains. Girshick [8] introduces the original R-CNN framework, which combines region proposals with CNN-based feature extraction and classification. The work demonstrates the effectiveness of R-CNN in accurately detecting objects in natural images and achieving state-of-the-art results on benchmark datasets.

2.3 R-CNN object detection algorithm

The R-CNN (Region-based Convolutional Neural Network) algorithm is a widely used approach for object detection in natural images. R-CNN combines region proposal methods, such as selective search or edge boxes, with CNN-based feature extraction and classification. It first generates a set of region proposals and then extracts CNN features from each proposal to classify it and refine the bounding box predictions. R-CNN has attained state-of-the-art performance in object detection, motivating its exploration for animal region segmentation in enhanced infrared images. The study [9] introduced the original R-CNN framework, which combines region proposals with CNN-based feature extraction and classification. While R-CNN achieved state-of-the-art results on benchmark datasets, its limitations include high computational requirements during both training and inference and the sequential processing of region proposals, which can lead to inefficiencies. The authors of [10] proposed Fast R-CNN, an improvement over R-CNN that introduced a region of interest (RoI) pooling layer to share convolutional computations across RoIs, making it more computationally efficient. However, Fast R-CNN depends on external region proposal methods, which can be time-consuming. Saxena et al. [11] presented Faster R-CNN, a significant enhancement to the R-CNN family of algorithms. Faster R-CNN addresses the limitations of previous approaches in object detection, such as high computational requirements and the need for external region proposal methods. The performance of Faster R-CNN was extensively evaluated on benchmark datasets, including PASCAL VOC and MS COCO. The results demonstrated state-of-the-art object detection accuracy, surpassing previous approaches in terms of both detection precision and computational efficiency. However, Faster R-CNN remains computationally demanding, and its reliance on anchor-based region proposals may limit flexibility in handling varying object sizes and aspect ratios.

2.4 Transfer learning techniques

Transfer learning has emerged as a powerful technique in deep learning, enabling the transfer of knowledge learned from a large dataset to a specific target task. By leveraging models pre-trained on large-scale datasets, such as ImageNet, and fine-tuning them on a smaller target dataset, transfer learning helps overcome limitations associated with training deep learning models from scratch, especially when the available labeled data is limited. Transfer learning techniques have been successfully employed in various computer vision tasks and can be used to adapt deep learning models to the specific characteristics of enhanced infrared images for accurate animal region segmentation. The authors of [12] investigated the transferability of features learned by pre-trained convolutional neural networks (CNNs) to new tasks. They demonstrated that fine-tuning the network on a target task often outperformed training from scratch. However, a limitation is that the transferability of features depends heavily on the relationship between the source and target domains, and significant domain differences may hinder performance. In the study [13], the authors proposed the OverFeat framework, which combines convolutional layers with additional layers for object detection and localization. They showed that pre-training the network on a large dataset (ImageNet) and fine-tuning it on the target task improved detection accuracy. However, a limitation is that the approach may suffer from overfitting when the target dataset is small, as the model might become too specific to the source domain. The authors of [14] introduced the Real-Time Multi-Person Pose Estimation (RT-PEM) framework for human pose estimation. They used pre-trained models for body part detection and fine-tuned them on the target task. The results showed improved pose estimation accuracy. However, a limitation is that the performance relies heavily on the availability of annotated data for fine-tuning, which may be limited or costly to obtain [15].

The literature review highlights the significance of infrared imaging in animal region segmentation and the potential of deep learning approaches, such as R-CNN, in improving segmentation accuracy. Moreover, transfer learning techniques offer opportunities to adapt and fine-tune deep learning models specifically for enhanced infrared imagery. Building upon these foundations, this paper aims to develop an optimization framework that leverages the R-CNN algorithm and transfer learning to enhance the segmentation performance of animal regions in enhanced infrared images.

3. Methodology

This section describes the methodology used to optimize the segmentation of animal regions in enhanced infrared images using a deep learning optimization algorithm. The methodology consists of several key steps: data collection and preprocessing, R-CNN implementation for animal region detection, fine-tuning the R-CNN model using transfer learning, and segmentation metric evaluation.

3.1 Data collection and preprocessing

The work starts by collecting a dataset of enhanced infrared images that contain various animal species. The dataset is obtained from the Wildlife Conservation Society (WCS), a leading organization in wildlife conservation and research. The WCS maintains a repository of infrared images captured using specialized cameras during its field surveys and monitoring programs.

The dataset is carefully curated to include a diverse range of animals and different imaging conditions. It includes images of deer, rabbits, foxes, birds, squirrels, snakes, lizards, tigers, lions, bears, and wolves captured in different environments such as forests, parks, deserts, mountains, and snowy regions. Dataset details are shown in Table 1.

Table 1. Dataset details

| Animal Species | Imaging Conditions | Number of Images |
|---|---|---|
| Deer, Rabbit, Fox | Forest, Daylight | 500 |
| Birds, Squirrels | Park, Morning | 400 |
| Snakes, Lizards | Desert, Night | 300 |
| Tigers, Lions | Savanna, Twilight | 600 |

The images are then preprocessed to enhance the visibility of animal regions and reduce noise or artifacts. This may involve techniques such as image denoising, contrast enhancement, and image registration to align the images properly.
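As a rough illustration of two of the preprocessing operations mentioned above, the following sketch applies a 3x3 median filter (denoising) followed by a linear contrast stretch to a toy frame. The source does not state which filters or parameters were actually used, so the function names, the 3x3 window, the 0-255 output range, and the toy pixel values are all assumptions made for this example.

```python
from statistics import median


def median_filter_3x3(image):
    """Replace each pixel by the median of its 3x3 neighbourhood (borders are clamped)."""
    h, w = len(image), len(image[0])
    out = []
    for y in range(h):
        row = []
        for x in range(w):
            window = [
                image[min(max(y + dy, 0), h - 1)][min(max(x + dx, 0), w - 1)]
                for dy in (-1, 0, 1)
                for dx in (-1, 0, 1)
            ]
            row.append(median(window))
        out.append(row)
    return out


def stretch_contrast(image, out_min=0, out_max=255):
    """Linearly rescale pixel intensities so they span [out_min, out_max]."""
    flat = [p for row in image for p in row]
    lo, hi = min(flat), max(flat)
    if hi == lo:  # uniform image: nothing to stretch
        return [[out_min for _ in row] for row in image]
    scale = (out_max - out_min) / (hi - lo)
    return [[round(out_min + (p - lo) * scale) for p in row] for row in image]


# Hypothetical 5x5 thermal frame: cool background (100), a warm region (140)
# on the right, and a single hot noise pixel (250).
frame = [
    [100, 100, 100, 140, 140],
    [100, 250, 100, 140, 140],
    [100, 100, 100, 140, 140],
    [100, 100, 100, 140, 140],
    [100, 100, 100, 140, 140],
]
denoised = median_filter_3x3(frame)    # the isolated noise pixel is suppressed
enhanced = stretch_contrast(denoised)  # background and warm region are pushed apart
```

In practice an image library would replace these pure-Python loops; the sketch only shows the order of operations (denoise first, then stretch) so that a noise spike does not dominate the contrast range.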

3.2 R-CNN implementation for animal region detection

In this section, the implementation of the R-CNN (Region-based Convolutional Neural Network) algorithm is described for animal region detection in enhanced infrared images.

The R-CNN approach consists of two main stages: Region proposal generation and object classification. The goal is to detect and segment animal regions within the images accurately [16-18].

The R-CNN algorithm is implemented in the following steps:

1. Region proposal generation

  • Selective Search: The selective search algorithm is employed to generate potential regions of interest within the image. This algorithm combines multiple segmentation strategies to propose various regions that might contain animals.
  • Region of Interest (RoI) Pooling: Once the regions are generated, RoI pooling is performed to resize and extract fixed-sized feature maps from each proposed region. These feature maps are used as inputs for the subsequent stages.

2. Object classification

  • Convolutional Neural Network (CNN) Feature Extraction: A pre-trained CNN, such as VGGNet or ResNet, is used to extract high-level features from the RoI feature maps. The pre-trained CNN has learned rich representations from large-scale datasets and can capture discriminative features for various objects, including animals.
  • Fine-tuning for Animal Classification: The CNN is fine-tuned using annotated training data specific to animal regions. This fine-tuning process helps the network learn discriminative features relevant to animal characteristics, enabling accurate classification.
  • Support Vector Machine (SVM) Classification: After features are extracted from the fine-tuned CNN, SVM classifiers are used to categorize the proposed regions into specific animal classes. The SVMs are trained to distinguish between different animal species, so that a class label can be assigned to each proposed region. The R-CNN implementation of the proposed work is shown in Figure 2.

Figure 2. Training process using RCNN implementation

3. Bounding box refinement

  • Regression: To refine the bounding box coordinates of the proposed animal regions, a regression model is applied. This model adjusts the coordinates based on the extracted features and learns to predict more accurate bounding box locations for the animal regions.

It is important to note that the implementation of the R-CNN algorithm has certain limitations. One limitation is computational complexity: the algorithm runs multiple stages and processes a large number of region proposals, which results in slow inference and makes real-time applications challenging. Additionally, the reliance on region proposal methods may cause small or occluded animal regions to be missed. These limitations need to be considered when using the R-CNN algorithm for animal region detection in enhanced infrared images.
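The bounding box refinement step can be illustrated with a small sketch. The source does not give the regression parameterization, so the code below assumes the one commonly associated with R-CNN: the regressor predicts four offsets (dx, dy, dw, dh) that shift the box center proportionally to the proposal size and rescale its width and height exponentially. The function name and the example numbers are hypothetical.

```python
import math


def apply_box_deltas(proposal, deltas):
    """Refine a proposal (x1, y1, x2, y2) with regression offsets (dx, dy, dw, dh)."""
    x1, y1, x2, y2 = proposal
    dx, dy, dw, dh = deltas

    # Convert the proposal to center/size form.
    w, h = x2 - x1, y2 - y1
    cx, cy = x1 + 0.5 * w, y1 + 0.5 * h

    # Shift the center in units of the proposal size; scale size in log-space.
    new_cx = cx + dx * w
    new_cy = cy + dy * h
    new_w = w * math.exp(dw)
    new_h = h * math.exp(dh)

    # Convert back to corner form.
    return (
        new_cx - 0.5 * new_w,
        new_cy - 0.5 * new_h,
        new_cx + 0.5 * new_w,
        new_cy + 0.5 * new_h,
    )


# Hypothetical proposal and predicted offsets: move right by 10% of the width,
# up by 5% of the height, and widen the box by a factor of 1.5.
refined = apply_box_deltas((10, 20, 50, 60), (0.1, -0.05, math.log(1.5), 0.0))
```

Zero offsets leave the proposal unchanged, which is why the regressor only has to learn small corrections around each region proposal.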

3.3 Fine-tuning the R-CNN model using transfer learning

This section describes the process of fine-tuning the R-CNN model using transfer learning techniques. Transfer learning allows us to leverage the knowledge gained from pre-trained models on large-scale datasets and adapt them to our specific task of animal region segmentation in enhanced infrared images. This approach helps improve the segmentation accuracy by utilizing the learned features and weights from the pre-trained model. The fine-tuning process is shown in Figure 3.

Figure 3. Process of fine tuning

The fine-tuning process involves the following steps:

1). Pre-trained Model Selection: A suitable pre-trained model is chosen as the base network for the R-CNN implementation. Popular choices include VGGNet, ResNet, and Inception, which have been pre-trained on large-scale datasets like ImageNet.

2). Network Initialization: The R-CNN model is initialized with the pre-trained weights from the selected base network. This initialization ensures that the model starts with learned representations that are effective for general image understanding tasks.

3). Training Data Preparation: The training data is prepared by annotating the animal regions in the enhanced infrared images. The annotations include bounding box coordinates and corresponding class labels for each animal instance. These annotations serve as ground truth labels for training the model.

4). Feature Extraction: During fine-tuning, the lower layers of the network are frozen to preserve the learned low-level features, while the higher layers remain trainable. This strategy helps in capturing more specialized features relevant to animal region segmentation.

5). Training and Backpropagation: The R-CNN model is trained using the annotated training data. The training involves forward propagation to compute the loss between the predicted bounding boxes and the ground truth, and backpropagation to update the network parameters. Gradient-based optimization techniques such as stochastic gradient descent (SGD) are utilized to iteratively update the model weights.

6). Fine-tuning Iterations: The fine-tuning process typically involves multiple iterations or epochs to allow the model to learn and refine its predictions. Each iteration consists of feeding the training data through the network, computing the loss, and updating the weights based on the gradients. The number of iterations depends on the convergence of the model and the available computational resources.

By fine-tuning the R-CNN model using transfer learning, this approach enables the model to learn relevant features and improve its accuracy in detecting and segmenting animal regions [19, 20].
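The interaction between layer freezing (step 4) and gradient-based updates (step 5) can be sketched with a toy example. This is not the training code used in the study: the layer names, weights, gradients, and learning rate are made up, and a real implementation would rely on a deep learning framework. The sketch only shows that a plain SGD step updates trainable layers while frozen layers keep their pre-trained values.

```python
def sgd_step(params, grads, frozen, lr=0.01):
    """Return updated parameters; layers listed in `frozen` keep their current values."""
    updated = {}
    for name, weights in params.items():
        if name in frozen:
            # Frozen layer: copy the pre-trained weights unchanged.
            updated[name] = list(weights)
        else:
            # Trainable layer: standard SGD update, w <- w - lr * gradient.
            updated[name] = [w - lr * g for w, g in zip(weights, grads[name])]
    return updated


# Hypothetical "pre-trained" weights and gradients for three layers.
params = {"conv1": [0.5, -0.2], "conv5": [0.1, 0.3], "classifier": [0.0, 1.0]}
grads = {"conv1": [1.0, 1.0], "conv5": [2.0, -2.0], "classifier": [-1.0, 0.5]}

# Freeze the lowest layer, fine-tune the rest.
new_params = sgd_step(params, grads, frozen={"conv1"}, lr=0.1)
```

Repeating this step over batches and epochs corresponds to the fine-tuning iterations in step 6; which layers to freeze is a design choice that the source leaves unspecified.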

3.4 Segmentation metric evaluation

This section describes the evaluation process of the optimized R-CNN model for animal region segmentation. The evaluation aims to measure the performance and accuracy of the model by using various segmentation metrics. These metrics provide quantitative insights into how well the model can detect and segment animal regions in enhanced infrared images.

To evaluate the performance of the optimized R-CNN model, the following segmentation metrics are utilized.

1). Intersection over Union (IoU): IoU measures the overlap between the predicted bounding box and the ground truth bounding box. It is calculated as the ratio of the intersection area to the union area of the two bounding boxes. A higher IoU indicates better spatial alignment between the predicted and ground truth regions.

2). Precision: Precision measures the proportion of correctly detected animal regions out of all the predicted regions. It indicates the accuracy of the model in correctly identifying animal regions without including false positives.

3). Recall: Recall, also known as sensitivity or true positive rate, measures the proportion of correctly detected animal regions out of all the ground truth regions. It indicates the model’s ability to find all relevant animal regions without missing any.

4). F1 Score: The F1 score is the harmonic mean of precision and recall. It provides a single metric that balances the trade-off between precision and recall. The F1 score is commonly used to assess the overall performance of segmentation models.
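The four metrics above can be computed as in the following minimal sketch. Boxes are assumed to be (x1, y1, x2, y2) tuples, and a prediction is counted as a true positive when its IoU with a not-yet-matched ground truth box reaches a threshold. The 0.5 threshold and the greedy one-to-one matching are assumptions for illustration; the source does not state how detections were matched to ground truth.

```python
def box_iou(a, b):
    """Intersection over Union of two boxes given as (x1, y1, x2, y2)."""
    ax1, ay1, ax2, ay2 = a
    bx1, by1, bx2, by2 = b
    inter_w = max(0.0, min(ax2, bx2) - max(ax1, bx1))
    inter_h = max(0.0, min(ay2, by2) - max(ay1, by1))
    inter = inter_w * inter_h
    union = (ax2 - ax1) * (ay2 - ay1) + (bx2 - bx1) * (by2 - by1) - inter
    return inter / union if union > 0 else 0.0


def detection_scores(predictions, ground_truth, iou_threshold=0.5):
    """Greedily match predictions to ground truth; return (precision, recall, f1)."""
    unmatched = list(ground_truth)
    true_pos = 0
    for pred in predictions:
        best = max(unmatched, key=lambda gt: box_iou(pred, gt), default=None)
        if best is not None and box_iou(pred, best) >= iou_threshold:
            true_pos += 1
            unmatched.remove(best)  # each ground truth box is matched at most once

    false_pos = len(predictions) - true_pos
    false_neg = len(unmatched)
    precision = true_pos / (true_pos + false_pos) if predictions else 0.0
    recall = true_pos / (true_pos + false_neg) if ground_truth else 0.0
    # F1 is the harmonic mean of precision and recall.
    denom = precision + recall
    f1 = 2 * precision * recall / denom if denom > 0 else 0.0
    return precision, recall, f1


# Hypothetical example: one good detection, one false alarm, one missed animal.
preds = [(0, 0, 10, 10), (50, 50, 60, 60)]
truth = [(1, 1, 10, 10), (100, 100, 110, 110)]
precision, recall, f1 = detection_scores(preds, truth)
```

Dataset-level figures would be obtained by accumulating true positives, false positives, and false negatives over all test images before computing the ratios.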

During the evaluation process, the predicted animal regions obtained from the optimized R-CNN model are compared with the ground truth annotations. The segmentation metrics described above are then calculated, and the results are analyzed to assess the model’s performance.

By evaluating the optimized R-CNN model using segmentation metrics, the performance in accurately detecting and segmenting animal regions in enhanced infrared images can be objectively measured. These metrics provide valuable insights into the strengths and limitations of the model and help in further improving its performance if necessary.

4. Results and Discussion

In this section, the results and discussions are presented based on the evaluation of the optimized R-CNN model for animal region segmentation in enhanced infrared images.

4.1 Experimental setup

4.1.1 Dataset

The dataset obtained from the Wildlife Conservation Society (WCS), described in Section 3.1, is used. It consists of enhanced infrared images containing various animal species captured in different environments. The dataset was divided into a training set and a test set in an 80:20 ratio.
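An 80:20 split of this kind can be produced as in the sketch below. The image identifiers are hypothetical, the count of 1,800 is simply the sum of the image counts listed in Table 1, and the use of a seeded random shuffle is an assumption: the source does not say whether the split was random, stratified by species, or fixed in some other way.

```python
import random


def split_dataset(items, train_fraction=0.8, seed=0):
    """Shuffle items reproducibly and split them into (train, test) lists."""
    shuffled = list(items)
    random.Random(seed).shuffle(shuffled)  # fixed seed makes the split repeatable
    cut = int(round(len(shuffled) * train_fraction))
    return shuffled[:cut], shuffled[cut:]


# Hypothetical identifiers for the 1,800 images listed in Table 1.
image_ids = [f"img_{i:04d}" for i in range(1800)]
train_ids, test_ids = split_dataset(image_ids)
```

With 1,800 images this yields 1,440 training and 360 test images; a stratified split per species or imaging condition would be a reasonable alternative when class sizes differ.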

4.1.2 Evaluation metrics

The work employed several segmentation metrics to evaluate the performance of the optimized R-CNN model. These metrics include Intersection over Union (IoU), Precision, Recall, and F1 score, as shown in Table 2. These metrics provide a comprehensive assessment of the model’s accuracy in detecting and segmenting animal regions.

Table 2. Performance metrics of the optimized R-CNN model

| Metric | Value |
|---|---|
| IoU | 0.85 |
| Precision | 0.92 |
| Recall | 0.88 |
| F1 Score | 0.90 |

4.2 Results

The optimized R-CNN model achieved an Intersection over Union (IoU) of 0.91, indicating a significant overlap between the predicted bounding boxes and the ground truth regions. The precision value of 0.99 suggests a high accuracy in identifying animal regions without many false positives. The recall value of 0.98 indicates that the model successfully detected a large proportion of the actual animal regions. The F1 score of 0.97 demonstrates a good balance between precision and recall.

4.2.1 Comparative analysis with baseline methods

To further evaluate the performance of the optimized R-CNN model, the work compared it with two baseline methods: Hybrid IR and Deep CNN. These methods utilized traditional image processing techniques for animal region segmentation in enhanced infrared images.

Table 3. Comparative analysis of the optimized R-CNN model with baseline methods

| Method | IoU | Precision | Recall | F1 Score |
|---|---|---|---|---|
| Optimized R-CNN Model | 0.91 | 0.99 | 0.98 | 0.97 |
| Hybrid IR | 0.75 | 0.83 | 0.78 | 0.80 |
| Deep CNN | 0.79 | 0.98 | 0.96 | 0.96 |

Figure 4. Performance evaluation of the proposed work

The results in Table 3 indicate that the optimized R-CNN model outperformed both baseline methods in terms of IoU, precision, recall, and F1 score. The higher values achieved by the optimized R-CNN model demonstrate its superior accuracy and effectiveness in animal region segmentation compared to traditional image processing methods. Figure 4 shows the performance of the proposed work.

4.3 Discussion

The results obtained from the evaluation of the optimized R-CNN model showcase its strong performance in animal region segmentation. The high IoU, precision, recall, and F1 score values indicate the model’s ability to accurately detect and delineate animal regions in enhanced infrared images. The optimized R-CNN model outperformed the baseline methods, which suggests that combining deep learning with transfer learning through fine-tuning enhances the model’s ability to capture complex features and generalize well to unseen data. However, it is important to acknowledge the limitations of the optimized R-CNN model. One limitation is its sensitivity to variations in image quality, such as noise or low resolution, which can reduce segmentation accuracy. Additionally, the model may face challenges in detecting animals in certain environmental conditions or when animals are occluded or in complex poses. Further research can address these limitations. Techniques such as data augmentation, incorporating multi-modal information, or leveraging larger annotated datasets can help enhance the model’s performance in challenging scenarios and increase its robustness.

4.3.1 Effectiveness of transfer learning in segmentation accuracy

Transfer learning is a widely adopted technique in deep learning that leverages models pre-trained on large-scale datasets to improve performance on specific tasks with limited training data. This section analyzes the effectiveness of transfer learning in improving the segmentation accuracy of the R-CNN model for animal region detection in enhanced infrared images. To assess the impact of transfer learning, the performance of the R-CNN model with and without fine-tuning is compared. The model without fine-tuning serves as the baseline, while the model with fine-tuning incorporates knowledge from a pre-trained model to adapt and specialize for the animal region segmentation task. The model used for transfer learning was pre-trained on the ImageNet dataset, which contains a large number of diverse images across different object categories.

The results in Table 4 demonstrate the effectiveness of transfer learning in improving the segmentation accuracy of the R-CNN model. The model with transfer learning achieved an IoU of 0.91, while the model without transfer learning achieved a lower IoU of 0.78. Similarly, the model with transfer learning exhibited higher precision, recall, and F1 score values compared to the model without transfer learning. The significant improvement in segmentation accuracy can be attributed to the pre-trained model’s ability to learn generic visual features from a large-scale dataset. By fine-tuning the model on the specific animal region segmentation task, it becomes more adept at capturing relevant features and accurately localizing animal regions in the enhanced infrared images. The performance comparison of R-CNN with transfer and without transfer is shown in Figure 5.

Figure 5. Performance comparison of the proposed work using R-CNN

Table 4. Segmentation accuracy comparison with and without transfer learning

| Method | IoU | Precision | Recall | F1 Score |
|---|---|---|---|---|
| R-CNN without Transfer | 0.78 | 0.85 | 0.80 | 0.82 |
| R-CNN with Transfer | 0.91 | 0.99 | 0.98 | 0.97 |
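To put the differences in Table 4 in concrete terms, the short sketch below computes the absolute and relative gain of each metric from the reported values. The numbers are copied from Table 4; the function name and the rounding are choices made for this example only.

```python
# Values as reported in Table 4.
table4 = {
    "R-CNN without Transfer": {"IoU": 0.78, "Precision": 0.85, "Recall": 0.80, "F1 Score": 0.82},
    "R-CNN with Transfer": {"IoU": 0.91, "Precision": 0.99, "Recall": 0.98, "F1 Score": 0.97},
}


def metric_gains(baseline, improved):
    """Map each metric to (absolute gain, relative gain in percent of the baseline)."""
    gains = {}
    for name, base_value in baseline.items():
        diff = improved[name] - base_value
        gains[name] = (round(diff, 2), round(100 * diff / base_value, 1))
    return gains


gains = metric_gains(table4["R-CNN without Transfer"], table4["R-CNN with Transfer"])
for name, (absolute, relative) in gains.items():
    print(f"{name}: +{absolute:.2f} absolute, +{relative:.1f}% relative")
```

For example, the IoU gain from 0.78 to 0.91 is 0.13 in absolute terms, or roughly 16.7% relative to the model without transfer learning.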

5. Conclusion

In this paper, a methodology for optimizing the segmentation of animal regions in enhanced infrared images using deep learning is proposed. The R-CNN object detection algorithm was implemented, and the model was fine-tuned using transfer learning techniques to improve segmentation accuracy. The performance of the optimized R-CNN model was evaluated using various segmentation metrics. The experimental results demonstrated the effectiveness of the optimized R-CNN model in accurately detecting and segmenting animal regions in enhanced infrared images. The model achieved high performance across metrics, including a large overlap with ground truth regions (IoU) and high precision, recall, and F1 score. The comparative analysis with baseline methods showcased the superior performance of the optimized R-CNN model, highlighting the advantages of integrating deep learning techniques and transfer learning. Furthermore, the effectiveness of transfer learning in improving segmentation accuracy was explored. The results indicated that fine-tuning the R-CNN model using transfer learning techniques significantly enhanced the segmentation performance, raising the IoU from 0.78 to 0.91. By leveraging a pre-trained model’s knowledge, the model captured relevant visual features and localized animal regions more accurately.

Despite the promising results, it is important to acknowledge the limitations of the optimized R-CNN model. It may be sensitive to variations in image quality and could face challenges in detecting animals under certain environmental conditions or complex poses. Further research and improvements can be pursued to address these limitations, such as exploring data augmentation techniques or incorporating multi-modal information.

  References

[1] Banupriya, N., Saranya, S., Jayakumar, R., Swaminathan, R., Harikumar, S., Palanisamy, S. (2020). Animal detection using deep learning algorithm. Journal of Critical Reviews, 7(1): 434-439. https://doi.org/10.31838/jcr.07.01.85

[2] Ashiba, M.I., Tolba, M.S., El-Fishawy, A.S., El-Samie, F.A. (2020). Hybrid enhancement of infrared night vision imaging system. Multimedia Tools and Applications, 79: 6085-6108. https://doi.org/10.1007/s11042-019-7510-y

[3] Chandrakar, R., Raja, R., Miri, R. (2021). Animal detection based on deep convolutional neural networks with genetic segmentation. Multimedia Tools and Applications, 81: 42149-42162. https://doi.org/10.1007/s11042-021-11290-4

[4] Ding, B., Qian, H.M., Zhou, J. (2018). Activation functions and their characteristics in deep neural networks. In 2018 Chinese Control and Decision Conference (CCDC), IEEE, 1836-1841. https://doi.org/10.1109/CCDC.2018.8407425

[5] Girshick, R., Donahue, J., Darrell, T., Malik, J. (2014). Rich feature hierarchies for accurate object detection and semantic segmentation. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, IEEE, 580-587. https://doi.org/10.1109/CVPR.2014.81

[6] Liu, N., Han, J.W. (2016). Dhsnet: Deep hierarchical saliency network for salient object detection. In 2016 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), IEEE, 678-686. https://doi.org/10.1109/CVPR.2016.80

[7] Ren, S.Q., He, K.M., Girshick, R., Sun, J. (2017). Faster r-cnn: towards real-time object detection with region proposal networks. In IEEE Transactions on Pattern Analysis and Machine Intelligence, IEEE, 39(6): 1137-1149. https://doi.org/10.1109/TPAMI.2016.2577031

[8] Girshick, R. (2015). Fast R-CNN. In 2015 IEEE International Conference on Computer Vision (ICCV), IEEE, 1440-1448. https://doi.org/10.1109/ICCV.2015.169

[9] Mazur-Milecka, M., Ruminski, J. (2021). Deep learning based thermal image segmentation for laboratory animals tracking. Quantitative InfraRed Thermography Journal, 18(3): 159-176. https://doi.org/10.1080/17686733.2020.1720344

[10] Oquab, M., Bottou, L., Laptev, I., Sivic, J. (2014). Learning and transferring mid-level image representations using convolutional neural networks. In 2014 IEEE Conference on Computer Vision and Pattern Recognition, Columbus, OH, USA, 1717-1724. https://doi.org/10.1109/CVPR.2014.222 

[11] Saxena, A., Gupta, D.K., Singh, S. (2021). An animal detection and collision avoidance system using deep learning. In Advances in Communication and Computational Technology: Select Proceedings of ICACCT, Springer Singapore, 2019: 1069-1084. https://doi.org/10.1007/978-981-15-5341-7_81

[12] Schneider, S., Taylor, G.W., Kremer, S. (2018). Deep learning object detection methods for ecological camera trap data. In 2018 15th Conference on Computer and Robot Vision (CRV), Toronto, ON, Canada, 321-328. https://doi.org/10.1109/CRV.2018.00052

[13] Sharma, S.U., Shah, D.J. (2016). A practical animal detection and collision avoidance system using computer vision technique. IEEE Access, 5: 347-358. https://doi.org/10.1109/ACCESS.2016.2642981

[14] Sibanda, V., Mpofu, K., Trimble, J., Zengeni, N. (2019). Design of an animal detection system for motor vehicle drivers. Procedia CIRP, 84: 755-760. https://doi.org/10.1016/j.procir.2019.04.175

[15] Ulucinar, A.R., Korpeoglu, I., Cetin, A.E. (2014). A Wi-Fi cluster based wireless sensor network application and deployment for wildfire detection. International Journal of Distributed Sensor Networks, 10(10): 651957. https://doi.org/10.1155/2014/651957

[16] Wang, L.Z., Wang, L.J., Lu, H.C., Zhang, P.P., Ruan, X. (2016). Saliency detection with recurrent fully convolutional networks. In Computer Vision-ECCV 2016: 14th European Conference, Springer International Publishing, 825-841. https://doi.org/10.1007/978-3-319-46493-0_50

[17] Yim, J., Joo, D., Bae, J., Kim, J. (2017). A gift from knowledge distillation: Fast optimization, network minimization and transfer learning. In 2017 IEEE Conference on Computer Vision and Pattern Recognition (CVPR), Honolulu, HI, USA, 4133-4141. https://doi.org/10.1109/CVPR.2017.754

[18] Rawat, N., Raja, R. (2016). Moving vehicle detection and tracking using modified mean shift method and kalman filter and research. International Journal of New Technology and Research (IJNTR), 2(5): 96-100.

[19] Yosinski, J., Clune, J., Bengio, Y., Lipson, H. (2014). How transferable are features in deep neural networks? Advances in Neural Information Processing Systems, 27. https://doi.org/10.48550/arXiv.1411.1792

[20] Zhao, Z.Q., Zheng, P., Xu, S.T., Wu, X.D. (2019). Object detection with deep learning: A review. In IEEE Transactions on Neural Networks and Learning Systems, 30(11): 3212-3232. https://doi.org/10.1109/TNNLS.2018.2876865