Optimizing Energy Consumption in Buildings: Intelligent Power Management Through Machine Learning

Marwa Mushtaq Talib*, Muayad Sadik Croock

Department of Computer Engineering, University of Mosul, Mosul 00964, Iraq

Department of Control and Systems Engineering, University of Technology, Baghdad 00964, Iraq

Corresponding Author Email: marwa.21enp1@student.uomosul.edu.iq

Page: 765-772 | DOI: https://doi.org/10.18280/mmep.110321

Received: 4 July 2023 | Revised: 11 October 2023 | Accepted: 22 October 2023 | Available online: 28 March 2024

© 2024 The authors. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

OPEN ACCESS

Abstract: 

In the realm of energy conservation, managing power consumption within buildings emerges as a pivotal challenge. This study introduces sophisticated models that optimize energy usage by intelligently managing power distribution in various zones of a building. To achieve this, four machine learning classifiers, Random Forest (RF), Support Vector Machine (SVM), K-Nearest Neighbor (KNN) algorithm, and Naive Bayes (NB), were employed. These classifiers were integrated with feature reduction techniques, namely Boruta and Principal Component Analysis (PCA), to diminish model complexity. The study delineates three distinct power management strategies: Full, Selected, and Shutdown. The effectiveness of these models was evaluated using a dataset obtained from a building's energy consumption measurements. A comparative analysis revealed that the integration of the RF classifier with the Boruta feature reduction method significantly excelled, achieving a classification accuracy of 98%. Additionally, this combination demonstrated an execution time of merely 0.4549 seconds. The findings of this research not only underscore the efficacy of combining specific machine learning classifiers with feature reduction techniques but also highlight the potential of such integrations in optimizing energy consumption in building environments. This approach paves the way for more energy-efficient and sustainable building management practices.

Keywords: 

energy management systems, energy efficiency, smart buildings, machine learning models, classification, feature reduction methods, feature selection

1. Introduction

The pursuit of energy efficiency stands paramount in the quest for environmental sustainability in contemporary society. Emphasizing the imperative of energy-efficient infrastructure, this focus becomes increasingly crucial within the urban fabric of smart cities. Buildings, constituting a substantial portion of urban landscapes, are identified as primary contributors to global carbon dioxide emissions and energy consumption, accounting for over two-thirds of the total [1]. In this light, the global inclination towards a low-carbon energy transition has intensified, evidenced by the augmented installation of renewable energy capacities by service providers. This transition necessitates the distribution of storage solutions and the establishment of intricate networked systems to facilitate renewable energy integration.

In parallel, the energy sector's embracement of digital strategies becomes evident through the synergy of energy management and artificial intelligence (AI), unleashing a plethora of opportunities to enhance energy systems. AI-powered solutions for smart consumption are revolutionizing energy consumption and conservation patterns among consumers. The development of decentralized electric grids, utilizing previously collected data [2], enables balanced energy distribution. The burgeoning field of data science, along with associated technologies, presents an unprecedented opportunity to augment energy efficiency across the construction industry’s lifecycle and to manage energy at the building level effectively.

The advent of advanced Information and Communication Technologies (ICTs) such as the Internet of Things (IoT), Distributed Ledger Technology (DLT), and blockchain [3], has catalyzed the emergence of novel services and applications aimed at efficient energy management in buildings. An array of sophisticated machine learning algorithms, supported by diverse data sources, facilitates intricate decision-making processes for managing energy-efficient services. These services aim to enhance the reliability and dependability of energy systems, optimize the operational profitability of power generation components, enable proactive analytics for monitoring building performance, and facilitate data sharing among various power generation units to provide intelligent energy solutions and precise power and cabling information [4]. It is posited that a real-time energy management system is essential to address the deficit in energy consumption and to bolster energy efficiency. This objective is attainable through the integration of machine learning algorithms, which are instrumental in adapting contemporary design and identifying key components for optimizing energy consumption in buildings.

In this investigation, machine learning techniques are utilized within a system designed to enhance energy utilization in buildings. A dataset, comprising open workspace areas within a university building, serves as the foundation for this study. The system encompasses three operational modes for energy management, namely: shutdown, selected, and full. These modes are intrinsically linked to the occupancy status within the building, which constitutes the primary feature for selection. The shutdown mode is activated in the absence of occupancy, contrasting with the full mode, which is implemented during peak occupancy periods. The selected mode is operational during times of partial occupancy. The system under study employs two distinct feature reduction methods: Boruta and PCA, each applied in conjunction with four machine learning-based techniques, namely RF, SVM, KNN, and NB. These algorithms are critically evaluated with respect to accuracy and execution time, facilitating the selection of the most optimal technique that aligns with the requirements of the proposed model.

The primary aim of this study is to optimize energy consumption in buildings through the application of machine learning techniques and occupancy status. This approach ensures that both energy consumption and equipment operation are contingent upon the presence of individuals within the building, whether it be empty, fully occupied, or partially occupied. The subsequent sections of this study are organized as follows: Section 2 presents a review of machine learning-based energy optimization systems explored in the existing literature. Section 3 delineates the proposed methodology for the system. Section 4 discusses the results obtained from the implementation of this methodology. Finally, Section 5 concludes the findings and outlines potential avenues for future research.

2. Related Work

An extensive review of literature in the domain of energy management underscores the profound impact of machine learning techniques, particularly in recent advancements. The application of machine learning in Building Energy Management (BEM) emerges as a pivotal area for optimizing energy consumption in buildings. Challenges such as control, planning, and scheduling in power systems can be effectively addressed using these strategies [5]. These approaches are particularly beneficial in modern large-scale power system applications, which frequently encounter complexities due to increased interconnections and rising load demands.

The effectiveness of various machine learning classifiers in enhancing building energy efficiency has been extensively analyzed. A critical aspect is the necessity of a comprehensive problem definition, coupled with a thorough analysis of datasets used for both training and testing, to achieve improved results [6]. A machine learning model was applied to calculate unmeasured variables derived from a sensor network. Data spanning six months from a Japanese smart building were utilized, demonstrating that interior measurements crucial for optimal regulation of air conditioning and heating systems could be precisely estimated using this model [7]. Furthermore, a study employing the Gradient Boosting Machine (GBM) algorithm developed a model to predict energy usage of a unit as a baseline [8]. The offline analysis of this model yielded a significant finding: the potential to reduce energy usage by 10.31%, equivalent to a reduction of 63,119 metric tons in steam usage annually.

In a notable advancement, a hybrid machine learning strategy was developed for forecasting heat usage in buildings [9]. This model integrates Empirical Mode Decomposition (EMD), SVM, and Imperialistic Competitive Algorithm (ICA), demonstrating a novel approach in the field. Similarly, the application of a gradient-boosting machine learning algorithm for temperature prediction was explored. This study reported that the model consistently predicts temperature with a mean Root Mean Square Error (RMSE) of 0.05 and an estimated value deviation of 2.38℃, significantly outperforming a theoretical model by a margin of 6℃ [10].

Further, Bot et al. [11] conducted a comprehensive examination of a workplace and living lab building, demonstrating the application of BEM through machine learning, conditional modeling, and linear regression. This study emphasized the assessment of energy savings, providing valuable insights into BEM strategies. Additionally, ensemble methodologies were identified as particularly effective for power forecasting in residential homes. The analysis of a Home Energy Management System (HEMS) revealed that ensemble forecasting results surpassed those of individual models, indicating a strong preference for these methodologies in residential energy management. Ganesh et al. [12] introduced an innovative energy optimal management system, combining moving horizon estimation (MHE) with model predictive control (MPC). This approach particularly focused on optimizing indoor air quality, further expanding the scope of energy management strategies.

With the proliferation of IoT, the integration of smart devices in residential buildings has escalated, necessitating advanced energy management systems. A hybrid Gradient Boosting Decision Tree-Artificial Transgender Longicorn Algorithm (GBDT-ATLA) was proposed, enhancing energy efficiency through strategic use of waiting period thresholds, effectively reducing electricity bills [13]. In a related context, the implementation of an IoT system for managing heating, ventilation and air conditioning (HVAC) systems while monitoring environmental, electrical, and comfort factors in real-world settings was investigated. This IoT approach demonstrated versatility in data interchange, deployment, and debugging time, particularly in small and medium-sized buildings [14]. The intersection of IoT and AI techniques for improving energy efficiency in buildings, especially in controlling HVAC systems, has been extensively explored [15-19]. These studies collectively underscore the transformative impact of integrating IoT and AI in energy management, offering a comprehensive view of current innovations and potential future directions in the field.

3. Proposed Methodology

The methodology of the system under study is illustrated in Figure 1. This system was developed through a sequence of methodical steps. Initially, a dataset suitable for the research objective was selected. Subsequent steps included preprocessing methods encompassing data labeling and feature scaling. Data splitting was then conducted, followed by the application of two independent feature reduction methods, Boruta and PCA. Each method was utilized in conjunction with four distinct machine learning classifiers: RF, SVM, KNN, and NB, to yield classification results. The evaluation criteria, focusing on accuracy and execution time, were calculated. A comprehensive comparison of the methods was performed to facilitate the final evaluation of the proposed system.

Figure 1. Flowchart for the proposed system

3.1 Dataset selection

The dataset utilized in this research was sourced from a study focusing on an open plan space and a locked workspace within a college building, covering a floor area of approximately 200 m² [20]. Spanning the duration of one year, from 1 January 2013 to 31 December 2013, this dataset comprised 35,040 records. The data encompassed various parameters, including indoor ambient conditions (temperature, humidity), plug loads, and external elements (temperature, relative humidity, wind speed, and global irradiance). Additionally, it provided insights into the presence of occupants, along with the operation of windows and lights. This comprehensive dataset supported a range of applications, particularly in developing and validating occupancy-related models. The categories of the measured data were delineated as follows:

  • Inhabitation: encompassing the presence at workstations and the status of lights/windows.
  • Indoor conditions: pertaining to indoor temperature and humidity.
  • Outdoor conditions: involving external temperature, humidity, wind speed, and direction.
  • On/off lighting.
  • Equipment power: primarily dealing with plug load.

Figure 2 illustrates the floor plan of the office space. The office is demarcated into various areas, including a kitchen (KI), four offices (O1 to O4), and a meeting room (MR). Area O1 is further subdivided into five distinct areas, with occupancy status and plug load measurements conducted for these sub-areas as well as other rooms within the office space.

Figure 2. Floor plan of the office space [20]

The proposed system aims to enhance energy management within the building by implementing three operational modes based on occupancy status: shutdown, full, and selected. An additional feature termed “mode” was incorporated into the dataset to categorize the data according to these three modes.

3.2 Data preprocessing

The subsequent phase in the development of the classification model entails data preprocessing. Typically, raw data from real-world sources is unstructured, prone to human errors, and occasionally incomplete. To rectify these imperfections, data preprocessing is employed, enhancing the completeness and suitability of datasets for analysis, thereby yielding more accurate results. This process involves transforming raw data into a format that is interpretable and usable for the end-users. The specific preprocessing steps undertaken in this study are detailed below.

3.2.1 Data labeling

Data labeling encompasses the process of identifying or categorizing raw data. These labels serve as indicators of the data's class association, facilitating the machine learning model's ability to recognize and classify similar data in unlabeled datasets. Labeled data is essential for supervised learning, wherein an algorithm is trained on input data paired with output labels to discern patterns and formulate predictions or classifications. In this study, the labels are crucial for classification into three categories: “full”, “selected”, and “shutdown”, which are encoded as “0”, “1”, and “2” respectively. Figure 3 illustrates the distribution of the dataset into these three classes, revealing 20,678 records for “shutdown”, 12,074 for “selected”, and 2,288 for “full”.
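The labeling rule can be sketched as follows. This is a minimal illustration rather than the authors' code: the function name `label_mode`, the per-zone occupancy flags, and the assumption that "full" means every monitored zone is occupied are hypothetical. Only the three modes and their codes (0, 1, 2) come from the text.

```python
# Hypothetical labeling rule; only the mode names and codes come from the text.
MODE_CODES = {"full": 0, "selected": 1, "shutdown": 2}

def label_mode(zone_occupied):
    """Map per-zone occupancy flags (True/False) to an encoded mode."""
    occupied = sum(1 for flag in zone_occupied if flag)
    if occupied == 0:
        return MODE_CODES["shutdown"]   # nobody present
    if occupied == len(zone_occupied):
        return MODE_CODES["full"]       # assumption: all zones occupied
    return MODE_CODES["selected"]       # partial occupancy

labels = [label_mode(z) for z in ([False, False], [True, False], [True, True])]
print(labels)  # [2, 1, 0]
```

A threshold on the fraction of occupied zones could replace the all-zones test if "peak occupancy" is defined more loosely.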

Figure 3. Labeled data

3.2.2 Feature scaling

Feature scaling is a critical step for many machine learning algorithms; in this study it takes the form of Z-score normalization (standardization). When these algorithms compute distances between data points, features measured on larger scales can disproportionately influence the result. To address this issue, each feature is rescaled to zero mean and unit variance [21]. This standardization ensures that all features contribute comparably to the outcome of the machine learning algorithms, thereby improving the accuracy and efficiency of the model.
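Z-score normalization replaces each value x with (x - mean) / standard deviation. The sketch below shows the idea on a single feature column. The helper names and the sample temperatures are made up for illustration, and the population standard deviation is an assumption, since the source does not state which estimator was used.

```python
from statistics import mean, pstdev

def fit_scaler(column):
    """Return (mean, standard deviation) estimated from training values."""
    mu = mean(column)
    sigma = pstdev(column)
    # Guard against constant features, which would otherwise divide by zero.
    return mu, (sigma if sigma > 0 else 1.0)

def transform(column, mu, sigma):
    """Apply z = (x - mu) / sigma to every value."""
    return [(x - mu) / sigma for x in column]

train_temp = [20.0, 22.0, 24.0]  # made-up indoor temperatures
mu, sigma = fit_scaler(train_temp)
scaled = transform(train_temp, mu, sigma)
```

In practice the mean and standard deviation would be estimated on the training split only and then reused to transform the test split.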

3.3 Data splitting

Data splitting, a standard practice in machine learning, involves dividing the dataset into training and testing sets. This technique is instrumental in determining the model's hyperparameters and evaluating its performance. In this study, the dataset was partitioned into 80% for training and 20% for testing. This ratio was selected due to the relatively limited number of records, necessitating a larger training set to enhance model accuracy. Consequently, of the 35,040 records, 28,032 were used for training and 7,008 for testing, as depicted in Figure 4.
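The arithmetic of the 80/20 split can be checked with a short sketch. The function name, the fixed seed, and the plain random shuffle are assumptions for illustration; the source does not say whether the split was shuffled or stratified.

```python
import random

def train_test_split_indices(n_records, test_fraction=0.2, seed=0):
    """Shuffle record indices and split them into train/test lists."""
    indices = list(range(n_records))
    random.Random(seed).shuffle(indices)
    n_test = round(n_records * test_fraction)
    return indices[n_test:], indices[:n_test]

train_idx, test_idx = train_test_split_indices(35040)
print(len(train_idx), len(test_idx))  # 28032 7008
```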

Figure 4. Training and testing records

3.4 Feature reduction

Feature reduction, a pivotal technique in machine learning, is employed to enhance model accuracy by focusing on essential variables and eliminating redundant ones. This approach also aids in improving the algorithms' predictive capabilities. Two feature reduction methods were utilized in this research: Boruta and PCA. Each method was applied independently to the four machine learning classifiers to evaluate the results in terms of accuracy and execution time.

3.4.1 Boruta feature selection

Boruta, a feature selection wrapper based on the RF classifier algorithm, was implemented in this study. This method involves creating a duplicate of the original dataset, wherein each column's values are shuffled randomly to produce shadow features. An RF classifier is then trained on the original and shadow features together, and the mean decrease in accuracy is calculated for each feature. Features whose importance consistently exceeds that of the best-performing shadow feature are deemed significant and retained [22]. Through this process, the number of features was reduced from 37 to 33.
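The shadow-feature idea can be illustrated with a simplified, single-round sketch. It is not Boruta itself: the importance score here is the absolute Pearson correlation with the label, swapped in for the RF-based importance so that the example stays self-contained, and the repeated rounds and statistical test of the real algorithm are omitted. The column names and data are made up.

```python
import random

def abs_correlation(xs, ys):
    """Stand-in importance score: |Pearson correlation| with the label."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    return abs(cov) / (vx * vy) ** 0.5 if vx > 0 and vy > 0 else 0.0

def shadow_round(columns, labels, seed=0):
    """One Boruta-style round: keep features that beat the best shadow."""
    rng = random.Random(seed)
    shadow_scores = []
    for values in columns.values():
        shuffled = values[:]      # shadow = same values, order destroyed
        rng.shuffle(shuffled)
        shadow_scores.append(abs_correlation(shuffled, labels))
    threshold = max(shadow_scores)
    return [name for name, values in columns.items()
            if abs_correlation(values, labels) > threshold]

rng = random.Random(1)
labels = [i % 2 for i in range(200)]
columns = {
    "plug_load": [50 + 100 * y + rng.gauss(0, 10) for y in labels],  # informative
    "wind_speed": [rng.gauss(3, 1) for _ in labels],                 # unrelated
}
kept = shadow_round(columns, labels)
```

Shuffling a column keeps its value distribution but breaks its link to the label, so a real feature that cannot beat its shuffled counterparts carries no usable signal.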

3.4.2 PCA

PCA is widely recognized as an unsupervised machine learning technique utilized for various purposes, including data de-noising, compression, dimensionality reduction, and exploratory data analysis [23]. In this research, PCA was employed as a feature reduction method, resulting in 25 principal components.
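For reference, the standard PCA computation can be written as follows. The symbols $X$, $C$, $w_j$, $\lambda_j$, $W_m$, and $Z$ are introduced here for illustration and do not appear in the source. The input is assumed to be the standardized feature matrix, and $m=25$ is the number of components retained in this study.

$C=\frac{1}{n-1} X^T X, \quad C w_j=\lambda_j w_j, \quad \lambda_1 \geq \lambda_2 \geq \cdots \geq \lambda_p$

$Z=X W_m, \quad W_m=\left[w_1, \ldots, w_m\right], \quad m=25$

Here $X$ is the $n \times p$ matrix of standardized features, $Z$ holds the reduced representation passed to the classifiers, and the share of variance retained by the first $m$ components is $\sum_{j=1}^m \lambda_j / \sum_{j=1}^p \lambda_j$.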

3.5 Machine learning algorithms

For the evaluation of the proposed system's performance, four machine learning methods were employed: RF, SVM, KNN, and naïve Bayes. Each method is succinctly elucidated below.

3.5.1 RF

RF is an ensemble learning method that aggregates multiple decision trees to enhance prediction accuracy. This technique introduces additional randomness into the model by searching for the best feature among a randomly selected subset at each split, which decorrelates the trees and yields a balance of low bias and low variance. The final prediction is determined by averaging the outputs of the individual trees, as depicted in Eq. (1) [24]:

$X^{*}=\pi(O(c) \mid \beta)=\frac{1}{K} \sum_{k=1}^{K} x_{l_k}^{*}\left(f \circ O(c)\right)$           (1)

where $X^{*}$ represents the optimal points, $O(c)$ the observation, $\beta$ the learning rate variable, $K$ the number of decision trees, $f$ the feature transform, and $l_k$ a leaf node of the $k$-th decision tree.
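The averaging step can be sketched as follows, assuming each tree outputs a class-probability vector over the three modes. The function name and the three tree outputs are hypothetical, and the sketch covers only the aggregation, not tree training.

```python
def forest_predict(tree_outputs):
    """Average per-tree class-probability vectors and pick the top class."""
    k = len(tree_outputs)
    n_classes = len(tree_outputs[0])
    avg = [sum(t[c] for t in tree_outputs) / k for c in range(n_classes)]
    return avg.index(max(avg)), avg

# Three hypothetical trees scoring (full, selected, shutdown)
trees = [[0.7, 0.3, 0.0], [0.4, 0.6, 0.0], [0.8, 0.2, 0.0]]
label, avg = forest_predict(trees)
```

Here two of the three trees favor class 0 ("full") and the averaged probabilities agree, so the forest returns 0 even though one tree prefers "selected".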

3.5.2 SVM

SVM, a supervised machine learning algorithm, creates input-output mapping functions from labeled training data. Rooted in statistical learning, SVMs are widely applied in diverse real-world scenarios. Its formulation is represented in Eq. (2) [25]:

$y(x)=\operatorname{sign}\left[\sum_{k=1}^N \alpha_k y_k \psi\left(x, x_k\right)+b\right]$                (2)

where $x_k$ and $y_k$ denote the $k$-th training input and its class label, $N$ the number of training samples, $\alpha_k$ positive real constants, and $b$ a real constant. The kernel function $\psi(\cdot, \cdot)$ often takes the form $\psi\left(x, x_k\right)=x_k^T x$ for a linear SVM, or $\psi\left(x, x_k\right)=\left(x_k^T x+1\right)^d$ for a polynomial SVM of degree $d$.
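Reading the decision rule as the sign of a kernel-weighted sum over training points, a binary classifier can be sketched as below. The support vectors, coefficients, and bias are made-up values rather than a trained model. The three-mode problem would also need a multiclass scheme such as one-vs-rest, which the source does not specify.

```python
def linear_kernel(x, xk):
    """psi(x, x_k) = x_k^T x"""
    return sum(a * b for a, b in zip(x, xk))

def poly_kernel(x, xk, degree=2):
    """psi(x, x_k) = (x_k^T x + 1)^d"""
    return (linear_kernel(x, xk) + 1) ** degree

def svm_decide(x, support, alphas, labels, b, kernel=linear_kernel):
    """Return +1 or -1 from sign(sum_k alpha_k * y_k * psi(x, x_k) + b)."""
    score = sum(a * y * kernel(x, xk)
                for a, y, xk in zip(alphas, labels, support)) + b
    return 1 if score >= 0 else -1

support = [[1.0, 1.0], [-1.0, -1.0]]  # made-up support vectors
alphas, labels, b = [0.5, 0.5], [1, -1], 0.0
print(svm_decide([2.0, 0.5], support, alphas, labels, b))    # 1
print(svm_decide([-1.0, -2.0], support, alphas, labels, b))  # -1
```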

3.5.3 KNN

KNN retains all training data and classifies a new sample according to the majority class among its $k$ nearest training samples. Proximity is typically measured with the Euclidean distance, as given in Eq. (3) [26]:

$d(x, y)=\sqrt{\sum_{i=1}^n\left(x_i-y_i\right)^2}$             (3)

where $x=\left(x_1, x_2, \ldots, x_n\right)$ and $y=\left(y_1, y_2, \ldots, y_n\right)$ are two data points described by $n$ features, and $d(x, y)$ is the distance between them.
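A minimal KNN classifier built on the Euclidean distance might look like the sketch below. The training points, their labels (using the 0 = full, 2 = shutdown encoding), and the choice of k = 3 are illustrative assumptions; the source does not report the value of k used.

```python
from collections import Counter
from math import sqrt

def euclidean(x, y):
    """Distance between two points with the same number of features."""
    return sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))

def knn_predict(query, train_x, train_y, k=3):
    """Majority label among the k training points nearest to query."""
    nearest = sorted(zip(train_x, train_y),
                     key=lambda pair: euclidean(query, pair[0]))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]

train_x = [[0.0, 0.0], [0.1, 0.2], [1.0, 1.0], [0.9, 1.1], [1.2, 0.8]]
train_y = [2, 2, 0, 0, 0]
print(knn_predict([1.0, 0.9], train_x, train_y))  # 0
```

Because every prediction scans the full training set, KNN inference cost grows with the number of stored records.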

3.5.4 NB

NB, based on Bayes' rule, assumes conditional independence of features given a class. It calculates the probability $P\left(C_k \mid x\right)$ of each class $C_k$ given an item $x$ using sample data, as shown in Eq. (4) [27]:

$P\left(C_k \mid x\right)=\frac{p\left(C_k\right) p\left(x \mid C_k\right)}{p(x)}$            (4)

where $x=\left(x_1, \ldots, x_n\right)$ is a vector of $n$ features, and $C_k$, $k=1, \ldots, K$, are the $K$ potential outcomes or classes.
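Bayes' rule with the independence assumption can be worked through numerically. In the sketch below the priors are derived from the class counts reported in Section 3.2.1, while the per-feature likelihoods are made-up values for a single hypothetical observation; the function name is likewise illustrative.

```python
from math import prod

def posterior(priors, feature_likelihoods):
    """Naive Bayes posterior P(C_k | x) for every class C_k."""
    # Conditional independence: p(x | C_k) is the product over features.
    joint = {c: priors[c] * prod(feature_likelihoods[c]) for c in priors}
    evidence = sum(joint.values())  # p(x), the normalizing constant
    return {c: joint[c] / evidence for c in joint}

# Priors from the class counts reported in Section 3.2.1.
counts = {"full": 2288, "selected": 12074, "shutdown": 20678}
total = sum(counts.values())
priors = {c: n / total for c, n in counts.items()}

# Made-up per-feature likelihoods p(x_i | C_k) for one observation.
likelihoods = {"full": [0.6, 0.5], "selected": [0.3, 0.4], "shutdown": [0.01, 0.1]}
post = posterior(priors, likelihoods)
best = max(post, key=post.get)
```

With these numbers "selected" wins even though "full" has the larger likelihood, because the "full" prior is small (2,288 of 35,040 records).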

3.6 Criteria measurements

In evaluating the effectiveness of machine-learning models, a range of measures is essential [28]. A confusion matrix categorizes predictions based on their correlation with actual data values. Correct classifications occur when predicted values match observed ones. From the confusion matrix, metrics such as accuracy, precision, recall, and F1-score are computed. In addition to these criteria, execution time is also considered, assessing the time each machine learning method takes when applied with each feature reduction method individually.
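These metrics can be derived directly from a confusion matrix. The sketch below uses the RF/Boruta matrix reported in Table 2 and assumes that rows are actual classes and columns are predicted classes, which matches the per-class row totals of 475, 2428, and 4105. The function name is illustrative.

```python
def per_class_metrics(matrix, labels):
    """Rows are actual classes, columns are predicted classes."""
    total = sum(sum(row) for row in matrix)
    accuracy = sum(matrix[i][i] for i in range(len(labels))) / total
    report = {}
    for i, name in enumerate(labels):
        tp = matrix[i][i]
        predicted = sum(row[i] for row in matrix)  # column sum
        actual = sum(matrix[i])                    # row sum
        precision = tp / predicted if predicted else 0.0
        recall = tp / actual if actual else 0.0
        f1 = (2 * precision * recall / (precision + recall)
              if precision + recall else 0.0)
        report[name] = (precision, recall, f1)
    return accuracy, report

rf_boruta = [[436, 39, 0], [0, 2376, 52], [0, 0, 4105]]
accuracy, report = per_class_metrics(rf_boruta, ["full", "selected", "shutdown"])
print(round(accuracy * 100, 1))  # 98.7
```

The computed accuracy, 6917 / 7008, reproduces the 98.7% reported for RF with Boruta.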

4. Results

The selection of the most suitable machine learning model for this research was based on two critical parameters: accuracy and execution time. Boruta and PCA were the feature reduction methods employed, each applied independently to the four machine learning models. The paramount goal was to attain the highest accuracy with the least execution time. The results obtained are presented in various case studies as follows.

4.1 Case study 1

Table 1 presents a comparative analysis of the performance in terms of accuracy when Boruta feature selection was utilized with the four machine learning models. It was observed that the RF classifier achieved the highest accuracy at 98.7%.

Table 1. Accuracy performance comparison of Boruta application

| Classifier | RF | SVM | KNN | NB |
| --- | --- | --- | --- | --- |
| Accuracy (%) | 98.7 | 95.9 | 94.1 | 93.6 |

Table 2 presents the confusion matrices for the four classifiers. In each matrix, the diagonal entries are the correctly classified instances of each class. The RF classifier displayed superior performance: in the "full" class, 436 instances were correctly predicted and only 39 were misclassified as "selected". In contrast, the SVM classifier correctly predicted 338 instances of the same class and misclassified 137. The KNN and NB classifiers displayed similar trends. Table 3 provides a detailed classification analysis of the proposed model. The recall, precision, and F1-score for each class were calculated from the testing dataset. The RF classifier achieved precision, recall, and F1-score values of at least 92% across all classes. Regarding the execution time, as indicated in Table 4, the Naïve Bayes classifier recorded the lowest execution time at 0.0067 seconds, outperforming the other classifiers in this aspect.

Table 2. Confusion matrix comparative analysis (Boruta with machine learning classifiers)

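| State Type (rows: actual; columns: predicted) | RF: Full | RF: Selected | RF: Shutdown | SVM: Full | SVM: Selected | SVM: Shutdown |
| --- | --- | --- | --- | --- | --- | --- |
| Full | 436 | 39 | 0 | 338 | 137 | 0 |
| Selected | 0 | 2376 | 52 | 52 | 2285 | 91 |
| Shutdown | 0 | 0 | 4105 | 0 | 4 | 4101 |

| State Type (rows: actual; columns: predicted) | KNN: Full | KNN: Selected | KNN: Shutdown | NB: Full | NB: Selected | NB: Shutdown |
| --- | --- | --- | --- | --- | --- | --- |
| Full | 333 | 138 | 4 | 383 | 92 | 0 |
| Selected | 116 | 2169 | 143 | 345 | 2083 | 0 |
| Shutdown | 0 | 12 | 4093 | 0 | 6 | 4099 |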

Table 3. Classification performance analysis for Case study 1

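| Class | RF Precision | RF Recall | RF F1-Score | SVM Precision | SVM Recall | SVM F1-Score | Rows |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Full | 100% | 92% | 96% | 87% | 71% | 78% | 475 |
| Selected | 98% | 98% | 98% | 94% | 94% | 94% | 2428 |
| Shutdown | 99% | 100% | 99% | 98% | 100% | 99% | 4105 |

| Class | KNN Precision | KNN Recall | KNN F1-Score | NB Precision | NB Recall | NB F1-Score | Rows |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Full | 74% | 70% | 72% | 53% | 81% | 64% | 475 |
| Selected | 94% | 89% | 91% | 96% | 86% | 90% | 2428 |
| Shutdown | 97% | 100% | 98% | 100% | 100% | 100% | 4105 |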

Table 4. Execution time analysis in seconds

| Feature Selection Method | RF Classifier | SVM | KNN | NB |
| --- | --- | --- | --- | --- |
| Boruta | 0.4549 | 2.8060 | 3.4520 | 0.0067 |

4.2 Case study 2

In this case study, a comparative analysis was conducted using the PCA feature selection method with the four machine learning classifiers, focusing on accuracy. Table 5 reveals that the SVM classifier achieved the highest accuracy at 95.1%. Table 6 presents the confusion matrix results for the application of PCA with the four classifiers. The diagonal entries for each classifier correspond to the true positive values, indicating the instances where predictions were accurately aligned with the actual class.

Table 5. Accuracy performance comparison of PCA application

| Classifier | RF | SVM | KNN | NB |
| --- | --- | --- | --- | --- |
| Accuracy (%) | 91.7 | 95.1 | 93.5 | 86.6 |

The classification performance is detailed in Table 7. Notably, for the "full" class, the RF classifier exhibited a precision of 98%, yet the recall was only 13%. In other words, nearly every record predicted as "full" was indeed "full", but only 61 of the 475 actual "full" records were identified, with 410 misclassified as "selected" (Table 6). Table 8 highlights the execution time for each classifier. The NB classifier recorded the shortest execution time at 0.0055 seconds, underscoring its efficiency compared to the other classifiers.
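The low F1-score follows directly from these two values. As a worked check, assuming the rows of Table 6 are actual classes and the columns are predicted classes (which matches the per-class row totals of 475, 2428, and 4105), the RF "full" class gives:

$\text{Precision}=\frac{61}{61+1+0} \approx 0.984, \quad \text{Recall}=\frac{61}{61+410+4} \approx 0.128$

$F_1=\frac{2 \cdot \text{Precision} \cdot \text{Recall}}{\text{Precision}+\text{Recall}}=\frac{2 \cdot 61}{62+475} \approx 0.227$

Because the F1-score is a harmonic mean, it stays close to the smaller of the two values, which is why 98% precision cannot compensate for 13% recall.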

Table 6. Confusion matrix comparative analysis (PCA with machine learning classifiers)

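| State Type (rows: actual; columns: predicted) | RF: Full | RF: Selected | RF: Shutdown | SVM: Full | SVM: Selected | SVM: Shutdown |
| --- | --- | --- | --- | --- | --- | --- |
| Full | 61 | 410 | 4 | 304 | 171 | 0 |
| Selected | 1 | 2276 | 151 | 61 | 2272 | 95 |
| Shutdown | 0 | 15 | 4090 | 0 | 13 | 4092 |

| State Type (rows: actual; columns: predicted) | KNN: Full | KNN: Selected | KNN: Shutdown | NB: Full | NB: Selected | NB: Shutdown |
| --- | --- | --- | --- | --- | --- | --- |
| Full | 310 | 161 | 4 | 164 | 306 | 5 |
| Selected | 133 | 2149 | 146 | 167 | 2070 | 141 |
| Shutdown | 0 | 10 | 4095 | 1 | 269 | 3835 |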

Table 7. Classification performance analysis for Case study 2

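| Class | RF Precision | RF Recall | RF F1-Score | SVM Precision | SVM Recall | SVM F1-Score | Rows |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Full | 98% | 13% | 23% | 83% | 64% | 72% | 475 |
| Selected | 84% | 94% | 89% | 93% | 94% | 93% | 2428 |
| Shutdown | 96% | 100% | 98% | 98% | 100% | 99% | 4105 |

| Class | KNN Precision | KNN Recall | KNN F1-Score | NB Precision | NB Recall | NB F1-Score | Rows |
| --- | --- | --- | --- | --- | --- | --- | --- |
| Full | 70% | 65% | 68% | 49% | 35% | 41% | 475 |
| Selected | 93% | 89% | 91% | 78% | 85% | 82% | 2428 |
| Shutdown | 96% | 100% | 98% | 95% | 93% | 94% | 4105 |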

Table 8. Execution time analysis in seconds

| Feature Selection Method | RF | SVM | KNN | NB |
| --- | --- | --- | --- | --- |
| PCA | 0.4850 | 3.2508 | 3.5801 | 0.0055 |

Table 9. Accuracy comparison between PCA and Boruta

| Feature Selection Method | RF (%) | SVM (%) | KNN (%) | NB (%) |
| --- | --- | --- | --- | --- |
| Boruta | 98.7 | 95.9 | 94.1 | 93.6 |
| PCA | 91.7 | 95.1 | 93.5 | 86.6 |

4.3 Comparison analysis between case studies 1 and 2

The accuracy ratings for the four algorithms under the two feature selection methods are presented in Table 9. With the Boruta feature selection method, the RF classifier achieved the highest accuracy, 98.7%, which underscores the RF classifier's efficacy when combined with Boruta. Figure 5 compares the accuracies of the four machine learning models under Boruta and PCA, with accuracy on the y-axis and the models on the x-axis, and shows that RF with Boruta attains the highest accuracy of all eight combinations. Furthermore, the execution time analysis in Table 10 indicates that the NB classifier recorded the shortest execution time under both methods, with 0.0067 seconds for Boruta and 0.0055 seconds for PCA. This finding is visually represented in Figure 6, which compares the execution times of all four machine learning models under Boruta and PCA, with time in seconds on the y-axis and models on the x-axis.

Figure 5. Chart for accuracies of the four classifiers using two feature selection methods

The comparative analysis between the two case studies, focusing on accuracy, reveals a notable outcome. When the Boruta feature selection method was applied with the RF classifier, an accuracy of 98.7% was achieved, the highest among the evaluated methods. In terms of execution time, the NB classifier was the fastest, recording 0.0067 seconds with Boruta and 0.0055 seconds with PCA. The RF classifier, while not the fastest, ranked second, registering 0.4549 seconds with Boruta and 0.4850 seconds with PCA, compared with roughly 2.8 to 3.6 seconds for SVM and KNN. This assessment indicates that the RF classifier, coupled with the Boruta feature selection method, is the preferred choice for achieving the highest accuracy. Although its execution time is about 68 times that of NB, it remains below half a second. Therefore, it is deduced that the objective of this research, which is to attain high accuracy while maintaining low execution time, has been successfully realized.
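The trade-off can be tabulated programmatically. The sketch below simply re-enters the accuracy and execution-time values from Tables 9 and 10 and picks out the most accurate and the fastest combination. The dictionary layout and variable names are illustrative.

```python
# (accuracy %, execution time in seconds) taken from Tables 9 and 10
results = {
    ("RF", "Boruta"): (98.7, 0.4549),
    ("SVM", "Boruta"): (95.9, 2.8060),
    ("KNN", "Boruta"): (94.1, 3.4520),
    ("NB", "Boruta"): (93.6, 0.0067),
    ("RF", "PCA"): (91.7, 0.4850),
    ("SVM", "PCA"): (95.1, 3.2508),
    ("KNN", "PCA"): (93.5, 3.5801),
    ("NB", "PCA"): (86.6, 0.0055),
}

most_accurate = max(results, key=lambda combo: results[combo][0])
fastest = min(results, key=lambda combo: results[combo][1])
# How much slower RF is than NB when both use Boruta
slowdown = results[("RF", "Boruta")][1] / results[("NB", "Boruta")][1]

print(most_accurate, fastest, round(slowdown))  # ('RF', 'Boruta') ('NB', 'PCA') 68
```

Whether a half-second classification delay is acceptable depends on how often the building controller needs to re-evaluate the operating mode.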

Table 10. Execution time comparison between PCA and Boruta

| Feature Selection Method | RF Classifier | SVM | KNN | NB |
| --- | --- | --- | --- | --- |
| Boruta | 0.4549 | 2.8060 | 3.4520 | 0.0067 |
| PCA | 0.4850 | 3.2508 | 3.5801 | 0.0055 |

Figure 6. Time consumed by the four classifiers with two feature selection methods

5. Conclusion

The challenge of energy management, particularly in the context of building power consumption, is a critical concern. This research endeavored to address this challenge by applying machine learning techniques. Four machine learning classifiers, namely, RF, SVM, KNN, and NB, were employed, each combined with PCA and Boruta feature reduction methods. The objective was to reduce the number of features while maintaining high accuracy in the models' performance. This reduction in features directly impacted the execution time required for the models. Three operational modes, namely, full, selected, and shutdown, were incorporated into the models, providing a framework for efficient power management in buildings. The models underwent training and testing using a dataset recorded over a year from a specific building. The findings demonstrated that the RF classifier, in conjunction with the Boruta feature reduction method, achieved the highest classification accuracy (98.7%) with an execution time of 0.4549 seconds, faster than SVM and KNN and slower only than NB.

Looking towards future endeavors, the incorporation of additional variables, such as temperature and humidity, alongside occupancy status, is proposed. This integration aims to refine the predictive accuracy and reduce the execution time of the models further, thereby optimizing energy consumption in buildings more effectively.

  References

[1] Dounis, A.I. (2022). Machine intelligence in smart buildings. Energies, 16(1): 22. https://doi.org/10.3390/en16010022

[2] Boiko, O. (2022). Artificial intelligence in energy: Use cases, solutions, best practices. https://www.n-ix.com/artificial-intelligence-in-energy/, accessed on May 31, 2023.

[3] Hartman, W.T., Hansen, A., Vasquez, E., El-Tawab, S., Altaii, K. (2018). Energy monitoring and control using Internet of Things (IoT) system. In 2018 Systems and Information Engineering Design Symposium (SIEDS), Charlottesville, VA, USA, pp. 13-18. https://doi.org/10.1109/SIEDS.2018.8374723

[4] Marinakis, V. (2020). Big data for energy management and energy-efficient buildings. Energies, 13(7): 1555. https://doi.org/10.3390/en13071555

[5] Kamp, M., Koprinska, I., Bibal, A., et al. (Eds.). (2022). Machine learning and principles and practice of knowledge discovery in databases. In International Workshops of ECML PKDD 2021, Virtual Event, Proceedings, Part II. Springer Nature.

[6] Benavente-Peces, C., Ibadah, N. (2020). Buildings energy efficiency analysis and classification using various machine learning technique classifiers. Energies, 13(13): 3497. https://doi.org/10.3390/en13133497

[7] Kaligambe, A., Fujita, G., Keisuke, T. (2022). Estimation of unmeasured room temperature, relative humidity, and CO2 concentrations for a smart building using machine learning and exploratory data analysis. Energies, 15(12): 4213. https://doi.org/10.3390/en15124213

[8] Moghadasi, M., Izadyar, N., Moghadasi, A., Ghadamian, H. (2021). Applying machine learning techniques to implement the technical requirements of energy management systems in accordance with ISO 50001:2018, an industrial case study. Energy Sources, Part A: Recovery, Utilization, and Environmental Effects, 1-18. https://doi.org/10.1080/15567036.2021.2011989

[9] Eseye, A.T., Lehtonen, M. (2020). Short-term forecasting of heat demand of buildings for efficient and optimal energy management based on integrated machine learning models. IEEE Transactions on Industrial Informatics, 16(12): 7743-7755. https://doi.org/10.1109/TII.2020.2970165

[10] Ilager, S., Ramamohanarao, K., Buyya, R. (2020). Thermal prediction for efficient energy management of clouds using machine learning. IEEE Transactions on Parallel and Distributed Systems, 32(5): 1044-1056. https://doi.org/10.1109/TPDS.2020.3040800

[11] Bot, K., Santos, S., Laouali, I., Ruano, A., Ruano, M.D.G. (2021). Design of ensemble forecasting models for home energy management systems. Energies, 14(22): 7664. https://doi.org/10.3390/en14227664

[12] Ganesh, H.S., Seo, K., Fritz, H.E., Edgar, T.F., Novoselac, A., Baldea, M. (2021). Indoor air quality and energy management in buildings using combined moving horizon estimation and model predictive control. Journal of Building Engineering, 33: 101552. https://doi.org/10.1016/j.jobe.2020.101552

[13] Vanitha, V., Vallimurugan, E. (2022). A hybrid approach for optimal energy management system of Internet of Things enabled residential buildings in smart grid. International Journal of Energy Research, 46(9): 12530-12548. https://doi.org/10.1002/er.8024

[14] Tanasiev, V., Pătru, G.C., Rosner, D., Sava, G., Necula, H., Badea, A. (2021). Enhancing environmental and energy monitoring of residential buildings through IoT. Automation in Construction, 126: 103662. https://doi.org/10.1016/j.autcon.2021.103662

[15] Du, Z.M., Chen, S.L., Anduv, B., Zhu, X., Jin, X.Q. (2023). IoT intelligent agent based cloud management system by integrating machine learning algorithm for HVAC systems. International Journal of Refrigeration, 146: 158-173. https://doi.org/10.1016/j.ijrefrig.2022.10.022

[16] Es-Sakali, N., Cherkaoui, M., Mghazli, M.O., Naimi, Z. (2022). Review of predictive maintenance algorithms applied to HVAC systems. Energy Reports, 8: 1003-1012. https://doi.org/10.1016/j.egyr.2022.07.130

[17] Kilinç, E., Fernandes, S., Antunes, M., Gomes, D., Aguiar, R.L. (2021). Using ML to increase the efficiency of solar energy usage in HVAC. In 2020 2nd International Conference on Societal Automation (SA), Funchal, Portugal, pp. 1-4. https://doi.org/10.1109/SA51175.2021.9507176

[18] Sahoh, B., Kliangkhlao, M., Kittiphattanabawon, N. (2022). Design and development of Internet of Things-driven fault detection of indoor thermal comfort: HVAC system problems case study. Sensors, 22(5): 1925. https://doi.org/10.3390/s22051925

[19] Issaraviriyakul, A., Pora, W., Panitantum, N. (2021). Cloud-based machine learning framework for residential HVAC control system. In 2021 13th International Conference on Knowledge and Smart Technology (KST), Bangsaen, Chonburi, Thailand, pp. 12-16. https://doi.org/10.1109/KST51265.2021.9415840

[20] Mahdavi, A., Berger, C., Tahmasebi, F., Schuss, M. (2019). Monitored data on occupants’ presence and actions in an office building. Scientific Data, 6(1): 290. https://doi.org/10.6084/m9.figshare.9822623

[21] Ozdemir, S., Susarla, D. (2018). Feature Engineering Made Easy: Identify Unique Features from your Dataset in Order to Build Powerful Machine Learning Systems. Packt Publishing Ltd.

[22] Banachewicz, K., Massaron, L. (2022). Data Analysis and Machine Learning for Competitive Data Science.

[23] Walker, M. (2022). Data Cleaning and Exploration with Machine Learning: Get to Grips with Machine Learning Techniques to Achieve Sparkling-Clean Data Quickly. Packt Publishing.

[24] Navada, A., Ansari, A.N., Patil, S., Sonkamble, B.A. (2011). Overview of use of decision tree algorithms in machine learning. In 2011 IEEE Control and System Graduate Research Colloquium, Shah Alam, Malaysia, pp. 37-42. https://doi.org/10.1109/ICSGRC.2011.5991826

[25] Wang, L.P. (Ed.). (2005). Support Vector Machines: Theory and Applications. Springer Science & Business Media.

[26] Guo, G.D., Wang, H., Bell, D., Bi, Y.X., Greer, K. (2003). KNN model-based approach in classification. In On the Move to Meaningful Internet Systems 2003: CoopIS, DOA, and ODBASE. OTM 2003. Lecture Notes in Computer Science, Springer, Berlin, Heidelberg. https://doi.org/10.1007/978-3-540-39964-3_62

[27] Leung, K.M. (2007). Naive Bayesian classifier. Polytechnic University Department of Computer Science/Finance and Risk Engineering, New York University, 123-156.

[28] Zhou, J.L., Gandomi, A.H., Chen, F., Holzinger, A. (2021). Evaluating the quality of machine learning explanations: A survey on methods and metrics. Electronics, 10(5): 593. https://doi.org/10.3390/electronics10050593