Classification of Medical Thermograms Belonging Neonates by Using Segmentation, Feature Engineering and Machine Learning Algorithms

Ahmet H. Ornek, Saim Ervural, Murat Ceylan, Murat Konak, Hanifi Soylu, Duygu Savasci

Department of Electrical and Electronics Engineering, Faculty of Engineering and Natural Sciences, Konya Technical University, Konya 42130, Turkey

Department of Electrical and Electronics Engineering, Faculty of Engineering, KTO Karatay University, Konya 42020, Turkey

Department of Pediatrics, Faculty of Medicine, Selcuk University, Konya 42130, Turkey

Corresponding Author Email:
18 May 2020
5 August 2020
10 October 2020



Monitoring and evaluating skin temperature is of considerable importance for neonates. Thanks to thermography, a system that detects diseases at an early stage without any harmful radiation can be developed. This study aims to distinguish healthy from unhealthy neonates in the neonatal intensive care unit (NICU). We used 40 thermograms belonging to 20 healthy and 20 unhealthy neonates. Thermograms were exported to thermal maps, and the thermal maps were subsequently converted to segmented thermal maps. Local binary pattern (LBP) and fast correlation-based filter (FCBF) were applied to extract salient features from the thermal maps and to select significant features, respectively. Finally, the obtained features were classified as healthy or unhealthy with decision tree, artificial neural network (ANN), logistic regression, and random forest algorithms. The best result was 92.5% accuracy (100% sensitivity and 85% specificity). This study proposes a fast and reliable intelligent system for the detection of healthy/unhealthy neonates in the NICU.


fast correlation-based filter, local binary pattern, machine learning, neonate, thermography

1. Introduction

In medicine, thermography is used for detecting and diagnosing diseases, planning treatments, and evaluating the effects of treatment [1]. Early diagnosis of unhealthy neonates is critical: if diseases are not handled immediately, the health status of a neonate may deteriorate quickly and lead to death. Modalities such as magnetic resonance and computerized tomography are used for disease detection. However, these modalities are not well suited to neonates, because body temperature can change during long image acquisition and during relocation of the neonate, so hypothermia [2] may occur. According to the WHO [3], the mortality rate of neonates was 28 per thousand in 2018. Thermal imaging, by contrast, is a non-invasive and non-ionizing technique with which the thermal signature of a neonate can be obtained and its health status assessed.

In 1980, Clark and Stothers first analyzed the body temperature distributions of neonates by using a thermal camera and a thermocouple [4]. According to their results, the difference between the values obtained from the thermal camera and the thermocouple was 0.107. In 2012, Abbas et al. examined neonatal heat flow under different clinical scenarios [5]. External influences such as external heat sources and air flow prevent accurate temperature measurement, and they recommended calibrations to compensate for these effects. One of the most important problems was that the actual temperature value was not known precisely. They tried to minimize temperature differences by measuring the temperature values with a digital thermometer.

In 2013, the identification of thermal abnormalities of neonatal patients was presented by Ruqia [6]. She explained the importance of using temperature changes instead of the spatial distribution of temperature values. The skin temperature is related to thermoregulation; thus, the blood flow disturbances under the skin directly affect the temperature measurement. For this purpose, image enhancement was performed to uncover the analyzed texture and to remove the noise and background. Although the region of interest (ROI) selection was conducted manually, the image enhancement process was performed automatically.

In 2014, Abbas and Leonhardt explained neonatal infrared thermography pattern clustering based on independent component analysis (ICA) [7]. They pointed out that abnormalities such as tumors, inflammation, and infection cause local temperature increases or asymmetric patterns. The head has a high heat emission potential; therefore, they concentrated on head images, and they studied RGB-based methods. They used a wavelet transform to extract coefficients of the thermal images, principal component analysis to reduce the size of the images, and ICA for clustering. They achieved a 95% accuracy rate by using 34,230 sample patches.

Savasci and Ceylan, in 2018, pointed out the importance of monitoring thermal asymmetry and time-related thermal differences by using 32 neonates [8]. According to them, planning new treatments, maintaining stability, and making vital decisions can be supported by the analysis of thermal differences. Using image processing techniques, they showed that temperatures differ between unhealthy and healthy neonates. Thus, it was confirmed once again that the degree of thermal asymmetry in unhealthy neonates is higher than that in healthy neonates. Ornek et al. explained that using the temperature map is more effective than using RGB images for evaluating neonate thermograms [9]. Ten different thermograms obtained from neonates were converted to RGB images and temperature maps, and image coefficients were obtained with the wavelet transform. Subsequently, the thermograms were reconstructed from the image coefficients by using the inverse wavelet transform. The peak signal-to-noise ratios obtained were 33.625 and 27.695 dB for temperature maps and RGB images, respectively. Furthermore, the structural similarity indices obtained were 0.954 and 0.887 for temperature maps and RGB images, respectively.

The works reviewed above do not present a comprehensive system that uses segmentation, feature extraction and selection, and machine learning algorithms together. Our comprehensive intelligent system makes the following contributions:

  • We used real data belonging to 20 healthy and 20 unhealthy neonates
  • Segmentation was used for the background subtraction and to uncover temperature changes to be used in feature selection
  • Feature extraction was used to extract the meaningful features from thermograms
  • To select significant features among all extracted features, feature selection was used
  • Four different machine learning algorithms were used to classify the obtained features

This paper is organized as follows. In the Materials and Methods section, used data, thermal map, and experimental procedures are presented. In the Results section, the detailed results are given. In the subsequent sections, discussion and conclusion are described.

2. Materials and Methods

At first sight, thermograms do not reveal differences between healthy and unhealthy neonates, but it is known that healthy neonates show thermal symmetry, whereas unhealthy neonates show thermal asymmetry. The goal of this paper is to find the thermal asymmetry on the bodies of neonates by using computer vision and machine learning algorithms.

Figure 1. Block diagram of the proposed system

In the following subsections, we describe the data used, the thermal map, the segmentation of thermograms, feature extraction and selection, and the classification of the thermal map. The block diagram of the system is shown in Figure 1.

2.1 Obtaining of data used

The thermograms were taken with a VarioCAM HD infrared thermal camera and IRBIS software, which are products of InfraTec©, in the Neonatal Intensive Care Unit (NICU) of the Faculty of Medicine, Selcuk University. The resolution of the camera is 640 x 480, the temperature resolution is up to 0.02 K at 30 °C, and the measurement accuracy is ±1 °C or ±1%. We used 40 different thermal images belonging to 20 healthy and 20 unhealthy premature neonates. Images were captured at a distance of between 60 and 100 cm from the neonate lying in the supine position. The thermograms were converted to raw temperature maps using a portable computer and the IRBIS software. The measurement setup is shown in Figure 2.

Figure 2. Measurement setup: (1) portable computer, (2) infrared camera, (3) neonate, and (4) incubator

Statistical characteristics of the neonates, such as birth weight and gestational age at time of delivery, are given in Table 1. The unhealthy neonates have diseases such as intracranial hemorrhage, respiratory distress syndrome, necrotizing enterocolitis, transient tachypnea, and diaphragm hernia.

2.2 Thermal map

The temperature map is a two-dimensional array that directly represents the temperature values of the thermogram. When these temperature values are normalized and displayed, a gray-level image is obtained as in Figure 3, where the white and black parts represent hot and cold regions, respectively.

Figure 3. Gray-level image and temperature map of the selected region

Table 1. Physical characteristics of healthy neonates

Column headings: Birth Weight (g) (Standard Deviation), Gestational Age at Time of Delivery (Standard Deviation), Health Status


2.3 Segmentation of thermal map

An open radiant warmer, kangaroo mother care, and a convective incubator are available in the NICU because every neonate has a different weight, body temperature, and condition. As shown in Figure 4, we used a segmentation algorithm based on the temperature values for background subtraction and to uncover the temperature changes to be used in feature selection.

Figure 4. Steps of the segmentation process

The thermograms contain real temperature values varying between 32 and 35 °C. Since values between 0 and 1 are needed to process the thermograms with image processing methods, min-max normalization [10] was used. Otsu’s threshold method [11] was used for the “background subtraction”. Using Otsu’s method, we calculated the within-class variance of the background and foreground for all threshold values. The threshold value that yields the smallest within-class variance is selected as the optimum threshold. The obtained images are shown in Figure 5.

Figure 5. (a) Visualized thermal map. (b) Temperature map after threshold algorithm
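The normalization and thresholding steps described above can be sketched as follows. This is a minimal illustration under stated assumptions, not the implementation used in the study: the function names, the 256-bin histogram, and the sample temperatures are hypothetical, and the thermal map is flattened to a plain list for brevity.

```python
def min_max_normalize(values):
    """Scale a flat list of temperatures to the range [0, 1]."""
    lo, hi = min(values), max(values)
    if hi == lo:
        return [0.0 for _ in values]
    return [(v - lo) / (hi - lo) for v in values]


def otsu_threshold(values, bins=256):
    """Return the threshold in [0, 1] that minimizes the within-class variance."""
    # Histogram of the normalized values (bin count is an assumption).
    hist = [0] * bins
    for v in values:
        hist[min(int(v * bins), bins - 1)] += 1
    total = len(values)

    best_t, best_var = 0, float("inf")
    for t in range(1, bins):
        w0 = sum(hist[:t])   # number of background pixels (bins below t)
        w1 = total - w0      # number of foreground pixels (bins from t up)
        if w0 == 0 or w1 == 0:
            continue
        m0 = sum(i * hist[i] for i in range(t)) / w0
        m1 = sum(i * hist[i] for i in range(t, bins)) / w1
        v0 = sum(hist[i] * (i - m0) ** 2 for i in range(t)) / w0
        v1 = sum(hist[i] * (i - m1) ** 2 for i in range(t, bins)) / w1
        within = (w0 * v0 + w1 * v1) / total
        if within < best_var:
            best_t, best_var = t, within
    return best_t / bins


# Hypothetical temperatures in °C: cooler background and warmer skin.
temps = [32.0, 32.1, 32.2, 32.1, 34.8, 34.9, 35.0, 34.7]
norm = min_max_normalize(temps)
threshold = otsu_threshold(norm)
mask = [1 if v >= threshold else 0 for v in norm]
```

The exhaustive search over all thresholds mirrors the description in the text; library implementations usually maximize the between-class variance instead, which selects the same threshold.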

Because of the conditions in the NICU, some neonates lying in the incubator carry unwanted materials, such as a latch on the umbilical ligaments. As shown in Figure 6 (a), the latch causes holes in the segmented skin region because this object is colder than the body. To remove the effects of such materials, hole filling [12] was used. Hole filling is a method for filling enclosed gaps in objects. The images before and after hole filling can be seen in Figure 6.

Figure 6. (a) Thermal map before hole filling. (b) Thermal map after hole filling
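One way to illustrate hole filling on a binary body mask is a flood fill from the image border: every background pixel that cannot be reached from the border is treated as a hole. The sketch below is an assumption made for illustration (the study cites morphological reconstruction [12]; the function name, 4-connectivity, and the toy mask are hypothetical).

```python
from collections import deque


def fill_holes(mask):
    """Fill background regions of a binary mask that do not touch the border."""
    rows, cols = len(mask), len(mask[0])
    outside = [[False] * cols for _ in range(rows)]
    queue = deque()

    # Seed the flood fill with every background pixel on the image border.
    for r in range(rows):
        for c in range(cols):
            on_border = r in (0, rows - 1) or c in (0, cols - 1)
            if on_border and mask[r][c] == 0:
                outside[r][c] = True
                queue.append((r, c))

    # Grow the outside region through 4-connected background pixels.
    while queue:
        r, c = queue.popleft()
        for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
            nr, nc = r + dr, c + dc
            if 0 <= nr < rows and 0 <= nc < cols:
                if mask[nr][nc] == 0 and not outside[nr][nc]:
                    outside[nr][nc] = True
                    queue.append((nr, nc))

    # Pixels not reachable from the border are body pixels or enclosed holes.
    return [[0 if outside[r][c] else 1 for c in range(cols)]
            for r in range(rows)]


# Toy mask: a body region (1) with one cold object (0) in its center.
body = [
    [0, 0, 0, 0, 0],
    [0, 1, 1, 1, 0],
    [0, 1, 0, 1, 0],
    [0, 1, 1, 1, 0],
    [0, 0, 0, 0, 0],
]
filled = fill_holes(body)
```

After filling, the original temperature values inside the filled region would still need to be handled (for example, kept or interpolated); that choice is not specified in the text.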

After the hole filling and threshold algorithms were completed, the derivative (Eq. (1)) of the image was taken to observe the changes in temperature. The preceding temperature value f(x-1) is subtracted from each temperature value f(x).

$\frac{d f}{d x}=f(x)-f(x-1)$          (1)

Consequently, a matrix that contains the temperature changes was obtained. The matrix has negative and positive values, as shown in Figure 7. Values greater than zero are set to 1, and values equal to or smaller than zero are set to 0 (Eq. (2)).

$g(x)=\left\{\begin{array}{ll}1, & \frac{d f}{d x}>0 \\ 0, & \text { otherwise }\end{array}\right.$           (2)

Figure 7. Matrix of temperature changes

where g(x) is the binary matrix obtained from the thermal map, as shown in Figure 8.

Figure 8. Binarized thermal map visualization and related matrix
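The derivative and binarization steps of Eq. (1) and Eq. (2) can be sketched for a small thermal map as follows. The direction of differentiation (along each row) and the dropping of the first column, which has no predecessor, are assumptions; the function name and the sample values are hypothetical.

```python
def binarize_changes(row):
    """Apply Eq. (1) and Eq. (2) along one row of the thermal map."""
    # Eq. (1): difference between each value and its left neighbor.
    diffs = [row[x] - row[x - 1] for x in range(1, len(row))]
    # Eq. (2): 1 where the temperature increases, 0 otherwise.
    return [1 if d > 0 else 0 for d in diffs]


# Hypothetical 2 x 4 thermal map in °C.
thermal_map = [
    [33.1, 33.4, 33.4, 33.0],
    [34.0, 33.8, 34.2, 34.5],
]
binary_map = [binarize_changes(row) for row in thermal_map]
# binary_map == [[1, 0, 0], [0, 1, 1]]
```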

2.4 Feature extraction

Feature extraction is one of the important pre-processing steps in the classification process. Wavelet transform [13, 14], ridgelet transform [15], ripplet transform [16], histogram of oriented gradients [17, 18], and local binary pattern (LBP) are some of the many known feature extraction methods. By using these methods, edges, orientation, volume, resolution, and histograms are obtained from the images [19, 20]. LBP is an important texture descriptor [21], and the LBP algorithm is shown in Figure 9.

LBP compares the values inside a selected kernel with the center value and produces a code that can be used for feature selection or classification. Since we are interested in small changes in the temperature values, LBP was used to extract meaningful features (the effect of small changes).

Figure 9. LBP Algorithm
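A basic 3 x 3 LBP can be sketched as below. The exact LBP variant, kernel size, and bit ordering used in the study are not stated, so the clockwise ordering, the `>=` comparison, and the function names here are assumptions for illustration only.

```python
def lbp_codes(image):
    """Compute basic 3x3 LBP codes for the interior pixels of a 2-D list."""
    # Neighbor offsets, clockwise starting from the top-left corner.
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    codes = []
    for r in range(1, len(image) - 1):
        row_codes = []
        for c in range(1, len(image[0]) - 1):
            center = image[r][c]
            code = 0
            for bit, (dr, dc) in enumerate(offsets):
                # Set the bit when the neighbor is at least as large as the center.
                if image[r + dr][c + dc] >= center:
                    code |= 1 << bit
            row_codes.append(code)
        codes.append(row_codes)
    return codes


def lbp_histogram(codes):
    """Count how often each of the 256 possible codes occurs."""
    hist = [0] * 256
    for row in codes:
        for code in row:
            hist[code] += 1
    return hist


patch = [
    [1, 2, 3],
    [4, 5, 6],
    [7, 8, 9],
]
codes = lbp_codes(patch)          # [[120]]
features = lbp_histogram(codes)   # 256-bin feature vector
```

For the center value 5, the neighbors 6, 9, 8, and 7 set bits 3 to 6, which gives 8 + 16 + 32 + 64 = 120.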

2.5 Feature selection using fast correlation-based filter

Feature selection removes redundant features, reduces feature dimension, decreases processing time, and increases learning accuracy as a pre-processing step of the classification process [22-24]. Filters, wrappers, and embedded-based approaches are some of the feature selection methods. In this study, the features are selected with the FCBF method.

The FCBF method involves two parts. The first part decides whether or not a feature is relevant to the class, and the second part decides whether or not a feature is redundant when taken together with the other selected features.

Information gain (IG) measures the relation between features (Eqs. (3)-(5)):

$I G(Z \mid R)=H(Z)-H(Z \mid R)=H(R)-H(R \mid Z)$          (3)

$H(Z)=-\sum_{i} p\left(Z_{i}\right) \log p\left(Z_{i}\right)$          (4)

$H(Z \mid R)=-\sum_{i} \sum_{j} p\left(Z_{i}, R_{j}\right) \log \frac{p\left(Z_{i}, R_{j}\right)}{p\left(R_{j}\right)}$           (5)

where H(Z) and H(R) are the marginal entropies, p(Z_i) and p(Z_i, R_j) are probability values, and H(Z|R) and H(R|Z) are the conditional entropies in Eq. (3). If the two variables are completely uncorrelated, then the IG is close to 0 [25]. Moreover, symmetric uncertainty (SU) is defined as a criterion for revealing the most dominant features in large quantities of data (Eq. (6)):

$S U(Z, R)=2 \frac{I G(Z \mid R)}{H(Z)+H(R)}$           (6)
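The quantities used by FCBF can be illustrated for discrete variables with a short sketch. It uses the standard definitions of entropy and conditional entropy with base-2 logarithms; the function names and the toy feature vectors are hypothetical, and continuous features would first have to be discretized.

```python
from collections import Counter
from math import log2


def entropy(values):
    """Marginal entropy H(Z) of a list of discrete values."""
    n = len(values)
    return -sum((c / n) * log2(c / n) for c in Counter(values).values())


def conditional_entropy(z, r):
    """Conditional entropy H(Z | R) of two equally long lists."""
    n = len(z)
    joint = Counter(zip(z, r))
    marginal_r = Counter(r)
    return -sum((c / n) * log2((c / n) / (marginal_r[rv] / n))
                for (_, rv), c in joint.items())


def symmetric_uncertainty(z, r):
    """SU(Z, R) = 2 * IG(Z | R) / (H(Z) + H(R)), as in Eq. (6)."""
    ig = entropy(z) - conditional_entropy(z, r)
    denom = entropy(z) + entropy(r)
    return 2.0 * ig / denom if denom > 0 else 0.0


labels = [0, 0, 1, 1]
su_identical = symmetric_uncertainty([0, 0, 1, 1], labels)    # fully informative
su_independent = symmetric_uncertainty([0, 1, 0, 1], labels)  # uninformative
```

SU ranges from 0 (independent variables) to 1 (one variable fully determines the other), which makes it convenient for ranking features against the class and against each other.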

2.6 Classification of thermal map

Machine learning algorithms, namely decision tree (DT), random forest (RF), ANN, and logistic regression (LR), were used to classify healthy/unhealthy neonates with the obtained features. Ten-fold cross-validation was performed to validate the algorithms.

2.6.1 Artificial Neural Network

The ANN is one of the learning algorithms that is based on biological nervous system. ANN creates links between input and output data on the basis of several parameters such as weights, sum function, and activation function [26].

The input layer contains raw data or obtained features, and the output layer contains the classes of the data. The weights indicate the importance of the information arriving at the artificial cell. The sum function calculates the net input by summing the products of weights and inputs, and the activation function calculates the output from the result of the sum function. Thus, the output values lie within a defined range. In this study, the sigmoid function (Eq. (7)), shown in Figure 10, was selected as the activation function, so the output values lie between 0 and 1.

$f(x)=\frac{1}{1+e^{-x}}$           (7)

where x is the input value and f(x) is the output value. A feed-forward back-propagation ANN, as shown in Figure 11, was used. The optimum ANN parameters were determined experimentally as follows: error target 1e-20, two hidden layers with 5 nodes in the first and 8 nodes in the second, learning rate 0.3, momentum rate 0.2, and a maximum of 500 iterations.

Figure 10. The Sigmoid Function

Figure 11. ANN model (input = 12, hidden layer = 2, and output = 2) used for classification. The features are obtained by using a thermal map and FCBF
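The sum function and the activation function of a single artificial cell can be sketched as follows, using the sigmoid in its standard logistic form 1 / (1 + e^(-x)). The function names, the bias term, and the sample inputs and weights are hypothetical; the sketch shows one neuron only, not the full network of Figure 11.

```python
from math import exp


def sigmoid(x):
    """Logistic sigmoid: maps any real input to the open interval (0, 1)."""
    return 1.0 / (1.0 + exp(-x))


def neuron_output(inputs, weights, bias=0.0):
    """Sum function followed by the sigmoid activation for one neuron."""
    # Net input: weighted sum of the inputs (the bias is an assumption).
    net = sum(w * x for w, x in zip(weights, inputs)) + bias
    return sigmoid(net)


# Hypothetical feature values and weights.
y = neuron_output([0.2, 0.7, 0.1], [0.5, -0.3, 0.8])
```

A full forward pass would apply `neuron_output` to every node of each layer in turn, feeding the outputs of one layer as the inputs of the next.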

2.6.2 Logistic regression

Regression determines the relationship among data features with a mathematical model [27]. Regression models with one independent variable and with many independent variables exist. Regression with one independent variable is called simple linear regression and is described by the following equation (Eq. (8)):

$Y=B_{0}+B_{1} X+E$           (8)

where X is the input value, Y is the output value, B0 is the constant value when X = 0, B1 is the regression coefficient, and E is the error value. In linear regression, the dependent and independent variables must be quantitative and normally distributed. In LR, however, the target variable does not need to be quantitative, and the dataset does not need to follow a normal distribution. The estimated value of the target variable is determined by a probability calculation. If p classes and q features exist, then q x (p-1) parameters are calculated [28]. In this study, with p = 2 classes, 12 parameters (with FCBF) and 756 parameters (without FCBF) are calculated for the LR algorithm.

2.6.3 Decision tree and random forest

The DT and RF algorithms were applied to observe the effect of rule-based classification algorithms on the data. The DT is a machine learning algorithm with low processing time and computational complexity and high classification speed [29]. Each node of the tree represents a feature, as shown in Figure 12.

Figure 12. The decision tree architecture used in this study. The features are obtained by using RGB images and FCBF

In the DT learning process, the data are divided into subsets by applying decision rules to the features. Each feature usually splits the data into two subsets. This process continues recursively until further splits no longer affect the classification. Thus, large-sized data are expressed as a combination of smaller data.

The RF algorithm is aimed at increasing the classification accuracy by using more than one DT [30]. Each DT is built from a subset that is randomly selected from the dataset, and the RF is created by combining the individual DTs. In this study, the optimum number of trees was experimentally found to be 10.

2.7 Cross-validation

Evaluation of model accuracy is critical for machine learning algorithms. Frequently, data are split into fixed training and test parts, but this method is insufficient. For example, when data are divided into 75% training and 25% test, the training and testing depend only on this single division. In K-fold cross-validation [31], the data are split into K parts, and every part serves as both training and test data.

In this study, the data are divided into 10 parts. One of these parts is defined as test data, whereas the remaining nine parts are defined as training data. Then, another part is reserved as the test data. In this way, all data are used in both the testing and training phases. As a final step, a general accuracy is obtained by taking the average of the 10 obtained results. That is, the 40 thermograms are split into 10 parts (K = 10), and in each of the 10 rounds 36 thermograms are used as training data and four thermograms are used as testing data.
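The 10-fold split of the 40 thermograms can be sketched with index lists as follows. The function name is hypothetical, and the sketch splits the samples in their given order; whether the study shuffled or stratified the data before splitting is not stated.

```python
def k_fold_indices(n_samples, k):
    """Yield (train, test) index lists for k folds of near-equal size."""
    indices = list(range(n_samples))
    fold_size, remainder = divmod(n_samples, k)
    start = 0
    for fold in range(k):
        # The first `remainder` folds receive one extra sample.
        stop = start + fold_size + (1 if fold < remainder else 0)
        test = indices[start:stop]
        train = indices[:start] + indices[stop:]
        yield train, test
        start = stop


folds = list(k_fold_indices(40, 10))
# Each of the 10 folds uses 36 samples for training and 4 for testing.
```

The overall accuracy is then the mean of the 10 per-fold accuracies, as described above.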

3. Results

In this study, thermal images, which were taken from 40 different neonates, were classified as healthy and unhealthy. Machine learning algorithms such as ANN, LR, DT, and RF were used for the classification of raw thermal images. LBP algorithm was applied for feature extraction of all used images, and obtained features were reduced by using FCBF, which is a feature selection algorithm. The 10-fold cross-validation algorithm was applied to verify classification validity. Table 2 shows the comprehensive comparison of classification results. The best result of healthy/unhealthy neonate classification is achieved at 92.5% accuracy. Of the 20 healthy images, 17 are classified as healthy, and all unhealthy images are classified as unhealthy by using thermal map, segmentation, feature extraction-selection, and ANN algorithms.
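The reported accuracy, sensitivity, and specificity follow directly from these counts. The short sketch below reproduces the arithmetic, assuming that "unhealthy" is treated as the positive class (an assumption that is consistent with the reported 100% sensitivity and 85% specificity); the function name is hypothetical.

```python
def classification_metrics(tp, fn, tn, fp):
    """Return accuracy, sensitivity, and specificity in percent."""
    total = tp + fn + tn + fp
    return {
        "accuracy": 100.0 * (tp + tn) / total,
        "sensitivity": 100.0 * tp / (tp + fn),
        "specificity": 100.0 * tn / (tn + fp),
    }


# 20 unhealthy neonates, all detected; 17 of 20 healthy neonates detected.
metrics = classification_metrics(tp=20, fn=0, tn=17, fp=3)
# {'accuracy': 92.5, 'sensitivity': 100.0, 'specificity': 85.0}
```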

Table 2 also shows the importance of segmentation and feature selection. When only the thermal map was used, the highest classification accuracy obtained was 50% (thermal map + RF) without FCBF, whereas 72.5% accuracy (thermal map + LR) was obtained with FCBF. After the threshold algorithm was applied, the highest classification accuracy obtained was 75% (thermal map + LR) with FCBF. When the threshold and binarization algorithms were used together, the best result obtained was 92.5% accuracy (thermal map + ANN) with FCBF.

Table 2. All results

Column headings: Thermal Map; Classification Results (% Accuracy) without feature selection and with feature selection


Figure 13 shows the processing time with and without feature selection. For the best result (ANN), the processing time is 14.5 s without FCBF and 0.09 s with FCBF. The processing times differ by a factor of between 20 and 100 because 756 features (without FCBF) and 12 features (with FCBF) are classified. The results of the experimental study show that thermal imaging is a useful method for distinguishing healthy from unhealthy neonates.

Figure 13. Processing time (s) without feature selection and with feature selection

4. Conclusion

Thermography is a fast, non-ionizing, and non-invasive method; hence, we decided to observe the benefits of this method on neonates in the NICU. In this study, healthy/unhealthy classification was performed by using 40 different thermograms taken from 20 healthy and 20 unhealthy neonates. Machine learning and feature extraction algorithms were applied to both raw and segmented thermal images to obtain comprehensive results. By comparing feature extraction (LBP), feature selection (FCBF), and machine learning (ANN, LR, DT, and RF) algorithms, the results show the importance of the temperature map, of feature selection after feature extraction, and of image segmentation. In this study, the healthy/unhealthy classification of neonates was achieved with accuracy, sensitivity, and specificity values of 92.5%, 100%, and 85%, respectively. This study proposes state-of-the-art methods for the detection of healthy/unhealthy neonates.

Development of new segmentation types, application of feature selection methods, and testing of different machine learning algorithms have had a great impact on the results. Further improvement of this study will help diagnose diseases such as necrotizing enterocolitis, transient tachypnea of the newborn, and diaphragm hernia by increasing the number of used thermograms and adding new methods. Thus, a pre-diagnosis system will be implemented without any physical deterioration on unhealthy neonates.


This study was supported by the Scientific and Technological Research Council of Turkey (TUBITAK, project number: 215E019).


[1] Sruthi, S., Sasikala, M. (2015). A low cost thermal imaging system for medical diagnostic applications. In 2015 International Conference on Smart Technologies and Management for Computing, Communication, Controls, Energy and Materials (ICSTM), pp. 621-623. 

[2] Brown, D. J., Brugger, H., Boyd, J., Paal, P. (2012). Accidental hypothermia. New England Journal of Medicine, 367(20): 1930-1938.

[3] Mortality rate, neonatal (per 1,000 live births). Accessed on July 25, 2020.

[4] Clark, R.P., Stothers, J.K. (1980). Neonatal skin temperature distribution using infra-red colour thermography. The Journal of Physiology, 302(1): 323-333.

[5] Abbas, A.K., Heimann, K., Blazek, V., Orlikowsky, T., Leonhardt, S. (2012). Neonatal infrared thermography imaging: analysis of heat flux during different clinical scenarios. Infrared Physics & Technology, 55(6): 538-548.

[6] Nur, R. (2014). Identification of thermal abnormalities by analysis of abdominal infrared thermal images of neonatal patients. Doctoral dissertation, Carleton University.

[7] Abbas, A.K., Leonhardt, S. (2014). Neonatal IR-thermography pattern clustering based on ICA algorithm. Conference: Color Image Processing Workshop, FWS 2008 Aachen.

[8] Savaşci, D., Ceylan, M. (2018). Thermal image analysis for neonatal intensive care units (First evaluation results). In 2018 26th Signal Processing and Communications Applications Conference (SIU), pp. 1-4. 

[9] Ornek, A.H., Savasci, D., Ceylan, M., Ervural, S., Soylu, H. (2018). Determination of correct approaches in evaluation of thermograms. KTO Karatay University, Konya.

[10] Chen, P. (2019). Effects of normalization on the entropy-based TOPSIS method. Expert Systems with Applications, 136: 33-41.

[11] Otsu, N. (1979). A threshold selection method from gray-level histograms. IEEE Transactions on Systems, Man, and Cybernetics, 9: 62-66.

[12] Soille, P. (2004). Geodesic Transformations. In: Morphological Image Analysis. Springer, Berlin, Heidelberg, pp. 183-218.

[13] Zhang, D. (2019). Wavelet transform. In Fundamentals of Image Data Mining, pp. 35-44.

[14] Zhang, J., Zhu, Q., Song, L. (2019). A wavelet-based self-adaptive hierarchical thresholding algorithm and its application in image denoising. Traitement du Signal, 36(6): 539-547.

[15] McEwen, J.D., Price, M.A. (2019). Scale-discretised ridgelet transform on the sphere. In 2019 27th European Signal Processing Conference (EUSIPCO), A Coruna, Spain, pp. 1-5.

[16] Kar, N.B., Babu, K.S., Sangaiah, A.K., Bakshi, S. (2019). Face expression recognition system based on ripplet transform type II and least square SVM. Multimedia Tools and Applications, 78(4): 4789-4812.

[17] Nassih, B., Amine, A., Ngadi, M., Hmina, N. (2019). DCT and HOG feature sets combined with BPNN for Efficient Face Classification. Procedia Computer Science, 148: 116-125.

[18] Deore, S.P., Pravin, A. (2019). Histogram of oriented gradients based off-line handwritten Devanagari characters recognition using SVM, K-NN and NN classifiers. Revue d'Intelligence Artificielle, 33(6): 441-446.

[19] Ojala, T., Pietikainen, M., Maenpaa, T. (2002). Multiresolution gray-scale and rotation invariant texture classification with local binary patterns. IEEE Transactions on Pattern Analysis and Machine Intelligence, 24(7): 971-987.

[20] Yaşar, H., Ceylan, M. (2016). A new method for extraction of image's features: Complex discrete Ripplet-II transform. In 2016 24th Signal Processing and Communication Application Conference (SIU), pp. 1673-1676.

[21] Qian, X., Hua, X.S., Chen, P., Ke, L. (2011). PLBP: An effective local binary patterns texture descriptor with pyramid representation. Pattern Recognition, 44(10-11): 2502-2515.

[22] Yu, L., Liu, H. (2003). Feature selection for high-dimensional data: A fast correlation-based filter solution. In Proceedings of the 20th International Conference on Machine Learning (ICML-03), pp. 856-863.

[23] Senliol, B., Gulgezen, G., Yu, L., Cataltepe, Z. (2008). Fast Correlation Based Filter (FCBF) with a different search strategy. In 2008 23rd International Symposium on Computer and Information Sciences, pp. 1-4.

[24] Kavitha, K.R., Gopinath, A., Gopi, M. (2017). Applying improved SVM classifier for leukemia cancer classification using FCBF. In 2017 International Conference on Advances in Computing, Communications and Informatics (ICACCI), pp. 61-66.

[25] Ervural, S., Ceylan, M. (2017). Determination of benign and malign lesions by fusion of the different phases of liver MR. In 2017 25th Signal Processing and Communications Applications Conference (SIU), pp. 1-4.

[26] Basheer, I.A., Hajmeer, M. (2000). Artificial neural networks: fundamentals, computing, design, and application. Journal of Microbiological Methods, 43(1): 3-31.

[27] Walker, S.H., Duncan, D.B. (1967). Estimation of the probability of an event as a function of several independent variables. Biometrika, 54(1-2): 167-179.

[28] Le Cessie, S., Van Houwelingen, J.C. (1992). Ridge estimators in logistic regression. Journal of the Royal Statistical Society: Series C (Applied Statistics), 41(1): 191-201.

[29] Alves, L.D.O., Cruz, L.F., Saito, P.T., Bugatti, P.H. (2019). Towards Practical Computer Vision in Teaching and Learning of Image Processing Theories. In 2019 IEEE Frontiers in Education Conference (FIE), pp. 1-7.

[30] Ho, T.K. (1995). Random decision forests. In Proceedings of 3rd International Conference on Document Analysis and Recognition, pp. 278-282.

[31] Kohavi, R. (1995). A study of cross-validation and bootstrap for accuracy estimation and model selection. In Ijcai, 14(2): 1137-1145.