Artificial Neural Networks for Construction Project Cost and Duration Estimation

Trijeti* Rachmad Irwanto Tanjung Rahayu Andreas Tri Panudju

Civil Department, Engineering Faculty, Universitas Muhammadiyah Jakarta, Jakarta 10510, Indonesia

Industrial Engineering, Science and Technology Faculty, Universitas Bina Bangsa, Serang 42124, Indonesia

Corresponding Author Email: trijeti@umj.ac.id

Page: 1449-1460 | DOI: https://doi.org/10.18280/ria.370609 | Received: 17 August 2023 | Revised: 1 September 2023 | Accepted: 7 October 2023 | Available online: 27 December 2023

© 2023 IIETA. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).

OPEN ACCESS

Abstract: 

Construction projects are inherently distinguished by their unique durations and associated costs, factors which are pivotal in determining project efficiency and quality, and consequently impacting broader societal development. The inherent unpredictability intrinsic to these projects presents significant challenges in completing them within established schedules and budgets. This unpredictability stems largely from the distinct nature of building operations, rendering the task of construction management analysis multifaceted and complex. In addressing these challenges, the present study explores the influence of architectural building specifics on the cost and duration estimations of construction projects through the application of artificial neural networks (ANNs). Recognized for their robust capacity to generalize from complex input-output relationships within extensive datasets, ANNs were employed to analyze a database incorporating six input variables: the number of activities, the total area of the building, the type of foundation, the number of storeys, the classification of consumers and vendors, alongside two output variables, namely cost and duration. The findings reveal that construction projects entrusted to individual or small-scale contractors are more susceptible to fluctuations in cost and duration compared to those managed by larger or multi-company contractors. The selection process for contractors was significantly influenced by the factors of bidding cost and negotiation fees, with clients possessing higher financial resources more frequently opting for larger companies. The research incorporated the development of a sophisticated intelligent model in MATLAB, utilizing a feed-forward back-propagation network for analysis. The efficacy of the ANN model was rigorously evaluated against statistical benchmarks, focusing on loss-function parameters. 
A strong correlation was found between the ANN model predictions and the empirical data, as evidenced by an average coefficient of determination ($R^2$) of 0.99995, markedly outperforming the multiple linear regression (MLR) model, which yielded a result of 0.6986. Additional performance metrics, including the mean absolute error (MAE) of 0.2952 and the root mean square error (RMSE) of 0.5638, attested to the model's robustness. Through the implementation of this research, a significant contribution is made towards enhancing the precision of resource and time estimations for clients and contractors undertaking construction projects, while concurrently accounting for the principal constraining factors.

Keywords: 

artificial neural network (ANN), construction management, MATLAB, project cost, project duration

1. Introduction

The development of infrastructure plays a significant role in the progress and advancement of a society. The assets related to infrastructure serve as a significant indicator when assessing the level of development of a country. The reason for this is that the combination of quality and efficiency has a tangible effect on the level of development in various domains of a society [1]. The idiosyncrasy of each construction project often results in temporal and financial unpredictability, impeding the progress of construction activities. The occurrence of instability, being unanticipated, poses a challenge in the successful execution and timely delivery of projects within the predetermined budgetary constraints. Construction operations often encounter variability due to the distinctiveness of each construction project, which gives rise to diverse factors that render the analysis considerably intricate for construction managers. Diverse elements, including project site, clientele, regulatory framework, workforce, machinery, technology, subcontractors, expertise, stakeholders, and the project team, may exhibit certain variations across different projects [2]. Accurate cost estimation is crucial for ensuring the financial viability of a project. Achieving project completion within budgetary constraints remains a challenging endeavor. Contemporary projects are susceptible to cost overruns, which escalate in proportion to the complexity of the project at hand. There are several factors that contribute to delays in construction projects, including but not limited to: delays caused by contractors, clients, consultants, labor-related issues, and other external factors. According to Odesola et al. [3], prolonged delays can result in time-cost overrun, disputes, utter disregard, and litigation.

Cost-duration models have been identified as a practical financial representation tool, commonly presented in the form of a spreadsheet, mathematical equations, charts, or diagrams [1]. In the preliminary phases of project initiation, predicting the duration and cost of construction projects is highly challenging, and cost-duration models are a useful tool for projecting both. According to the research conducted by Mahamid and Amund [4], all of the construction projects examined experienced cost divergence: 77.34% were underestimated, while the remaining 22.66% were overestimated. Effective cost estimation is crucial for the success and financial viability of a project. Ballesteros-Pérez et al. [5] demonstrated that commonly used scheduling techniques exhibit a consistent tendency to underestimate project duration and cost. One significant factor contributing to this underestimation is the neglect of variability in activity duration.

Hence, in situations where there is insufficient quantitative or qualitative data, the accuracy of project duration and/or cost forecasting is expectedly unreliable. It is crucial to have an accurate prediction of the lifespan of a construction project from its inception [6]. According to Lin et al. [7], inadequate estimation, whether it is underestimated or overestimated, can result in suboptimal project outcomes and the inability to achieve project objectives.

The implementation of an analytical approach could mitigate the potential for both under- and over-estimation in construction, while also facilitating the thorough analysis of construction quality to ensure proper contract fulfillment. An artificial neural network (ANN) is a computational model that emulates the information processing mechanism of the brain in living organisms. The human brain comprises a vast network of interconnected neurons that work together in a synchronized manner to process and interpret complex information [8]. An ANN follows the same idea computationally: it builds many simple processing units, variously called nodes, cells, neurons, or units, and links them through weighted connections that map the input set to the output set, as discussed by Kaveh et al. [9]. The neurons are highly interconnected and arranged in layers. The input layer receives the data, whereas the output layer produces the final result. Typically, one or more hidden layers are interposed between the two, which makes it difficult to trace exactly how a given input propagates through the network. Whether an input has an excitatory or inhibitory effect is determined by the sign of the weight on its connection [10-13]. The objective of work organization is to maximize the interdependence among the essential components of production and construction processes, namely equipment, resources, employees, and information. The regulation of the costs and durations required to complete the constituent tasks has a significant impact on the outcome of production processes. Work organization refers to the allocation of tasks and responsibilities among members of an organization, as well as the degree to which these arrangements are coordinated to achieve the intended outcomes in product or service delivery [14].

The objective of this study is to create an intelligent model based on an artificial neural network (ANN) that can accurately predict the cost and duration of construction projects across different contexts. Specifically, the study examines the variance between planned and actual costs, as well as planned and actual durations, of construction projects, using building information as the predictor variables, as previously explored [15-17]. The ultimate goal is to attain cost and time optimization in construction project operations, thereby improving the efficacy of pre-execution planning [15, 18].

2. Literature Review

The importance of precise forecasting of construction expenses and timeline during the initial stages of a project is evident. This is because imprecise predictions, whether underestimated or overestimated, lead to budget overruns and suboptimal project outcomes in terms of failure to meet quality standards and timely completion. Soft computing techniques are considered suitable for modeling time-cost constraints in construction projects due to the presence of non-specific patterns resulting from environmental and logistical factors, as well as non-linear and discrete dependencies. This has been noted by scholars such as Mirahadi and Zayed [19] and Wang et al. [20]. Likewise, the availability of representational data for the investigative examination of the relationship between activity duration and costs is scarce, leading to the assumption of a lack of correlation between costs and activities in these analytical methods. Hence, in situations where there is insufficient quantitative or qualitative data, the accuracy of project forecasting is expected to be unreliable.

When construction projects become more complex and time-consuming, traditional methods of supervision may prove inadequate. In such cases, computer simulation techniques can be a useful tool for addressing these challenges. The authors Górecki and Diaz-Madronero [21], Rofooei et al. [22] have suggested that testing different scenarios can be an effective approach to address challenges encountered in construction projects in practice. Simulations are commonly utilized with the primary objective of minimizing expenses and project duration, while also examining diverse operational strategies across various project categories. AbouRizk [23] states that construction simulation models consist of two main components, namely activity and resource. The aforementioned pertains to the tasks and materials necessary for the completion of a given project. The stochastic nature of construction processes and various parameters influencing productivity and performance contribute to the inherent uncertainty associated with construction activities. AbouRizk [23] identified several sources of uncertainty that can arise during construction activities, including labor skills, weather conditions, and equipment breakdown.

The task of modeling all potential influencing factors in construction projects can be perceived as challenging. Despite the comprehensive modeling of all aspects of an operation, it would be exceedingly difficult to fully incorporate all site conditions and variables that may arise during the implementation of said project. The creation of a simulation model for construction processes can be accomplished through the utilization of either historical data observations or the expertise and insight of professionals, as noted by Kim et al. [24] and Emsley et al. [25]. It is recommended to utilize historical or current data when there is no anticipation of substantial modifications to the fundamental assumptions of the procedure. Relying on the expertise of professionals is a suitable approach for conceptualizing inputs that may experience fluctuations over time as a result of unforeseen shifts in the underlying factors. Simulation refers to the replication of a real-world process of a system over a period of time. Simulation tools are utilized in the construction of models to provide visual representations of various project activities, the resources employed in executing the work, and the surrounding environment of the project location. The utilization of models can facilitate the enhancement of project plans, optimization of resource utilization, reduction of costs and duration of the project, and augmentation of overall productivity [12].

The creation of an intelligent model with advanced capabilities to predict these factors with precision would facilitate effective project planning by project managers and clients. The resulting data would serve as a reliable guide for work supervision and control, ultimately leading to the attainment of desired quality within the specified timeframe [26]. Accurate cost estimation is crucial for ensuring the financial viability of a project. The issues pertaining to the temporal and financial resources necessary for the completion of a project are of significant importance, as they play a pivotal role in determining the feasibility of achieving the success criteria. Effective scheduling is essential to ensure that quality requirements are met in accordance with the standards set by the professional code. The accuracy of estimation methods has been identified as a weak point in the construction planning process, as noted by Wang et al. [27]. Several contractors can recollect a few craftsmen whose skills have ultimately resulted in business failure, and a significant number of these failures can be attributed to inadequate estimating practices. The issue of estimation, despite its prevalence, requires a systematic approach as emphasized by Xu [28].

Artificial neural networks (ANNs) are computational models that emulate the information processing mechanisms of the brain and nervous system in living organisms. The system is comprised of a vast quantity of interconnected processing elements, commonly referred to as neurons, which operate in a cohesive manner to arrive at a resolution for a particular issue [15, 29]. Neurons, also referred to as biological neurons or nerve cells, constitute the fundamental units of the nervous system and brain. They are responsible for receiving sensory input from the external environment through dendrites, processing it, and transmitting output via axons.

Figure 1. Perceptron

The cell body, also known as the soma, is a crucial component of the neuron cell as it contains the nucleus and facilitates vital biochemical transformations necessary for the survival of neurons. The transmission of information between neurons is facilitated through the axon and synapses [30]. The most prominent perceptron in contemporary usage was first introduced by McCulloch and Pitts in 1943, which emulated the operational mechanism of a biological neuron. The nomenclature used to describe a neural network consisting of a solitary layer and a singular output is a Perceptron, as depicted in Figure 1.

The perceptron combines the bias and the inputs through their respective weights and then makes a decision based on the resulting aggregate. A single observation is represented by a set of input parameters x0, x1, x2, x3, …, xn. Each input is multiplied by the weight of its synapse, or connection [15]. The weights are denoted by w0, w1, w2, w3, ..., wn, and the weight of a connection indicates its strength. The variable b represents a bias value. Including a bias term shifts the activation function by a fixed intercept, so that the fitted function is not forced to pass through the origin. In the simplest scenario, the sum of the products is passed through a transfer function, also known as an activation function, to produce the output. The summation is given in Eq. (1), as described by Kaveh and Servati [31], Rezaei and Kheirkhah [32].

$x_0 \times w_0+x_1 \times w_1+\cdots+x_n \times w_n=\sum_{i=0}^{n} x_i \times w_i$              (1)
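The aggregation in Eq. (1) can be illustrated with a minimal sketch. The step transfer function, the bias handling, and all numeric values below are assumptions chosen for illustration; they are not taken from the model developed in this study.

```python
def weighted_sum(inputs, weights, bias=0.0):
    """Return bias + sum_i x_i * w_i, i.e. the summation of Eq. (1) plus a bias."""
    if len(inputs) != len(weights):
        raise ValueError("inputs and weights must have the same length")
    return bias + sum(x * w for x, w in zip(inputs, weights))


def step(z):
    """Threshold transfer function (assumed here): output 1 if z >= 0, else 0."""
    return 1 if z >= 0 else 0


def perceptron(inputs, weights, bias=0.0):
    """Single-layer, single-output perceptron: transfer(weighted sum)."""
    return step(weighted_sum(inputs, weights, bias))


# Hypothetical observation, weights, and bias (illustrative values only)
x = [1.0, 0.5, -2.0]
w = [0.4, 0.6, 0.1]
z = weighted_sum(x, w, bias=-0.2)   # 0.4 + 0.3 - 0.2 - 0.2
y = perceptron(x, w, bias=-0.2)
print(z, y)
```

A negative weight, such as the hypothetical 0.1 applied to the negative input above, lowers the aggregate, while the bias moves the decision threshold away from zero.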

In the biological neuron, dendrites are fine, tubular extensions that branch out around the soma and receive incoming signals from all directions. The axon is a slender, elongated, cylindrical structure that operates in a manner akin to a transmission line [33-35]. Neurons exhibit a complex spatial arrangement through their interconnections. At its far end, the axon branches into terminals that form intricate junctions known as synapses, across which one neuron passes its signal to the next. Dendrites function as the primary site for receiving synaptic input from neighboring neurons. The soma integrates the incoming signals and transforms the resulting value into an output. The activation function is a crucial element that allows an artificial neural network to learn intricate relationships. The primary function of an artificial neural network node is to convert an input signal into an output signal, and the activation function introduces non-linearity into that output. In the absence of an activation function, the output signal would be a linear function of the input, that is, a polynomial of degree one. A linear function is straightforward to solve, but its expressive power is limited. A model without an activation function therefore cannot effectively learn and model complex data, as noted by Alaneme et al. [36, 37].
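The role of the activation function can be checked numerically with a small sketch. It assumes single-input, single-output layers and hypothetical weights; `tanh` is used only as one example of a non-linear activation and is not stated to be the function used in this study.

```python
import math


def layer(x, w, b, activation=None):
    """One single-input, single-output layer: activation(w * x + b)."""
    z = w * x + b
    return activation(z) if activation else z


# Hypothetical weights and biases (illustrative values only)
w1, b1 = 2.0, 1.0
w2, b2 = -3.0, 0.5


def two_linear_layers(x):
    """Two stacked layers with no activation function."""
    return layer(layer(x, w1, b1), w2, b2)


def equivalent_single_layer(x):
    """The composition of two affine maps is itself one affine map."""
    return layer(x, w2 * w1, w2 * b1 + b2)


def two_layers_with_tanh(x):
    """Same weights, but with a tanh activation after the first layer."""
    return layer(layer(x, w1, b1, math.tanh), w2, b2)


xs = [-1.0, 0.0, 0.5, 2.0]
collapsed = all(
    math.isclose(two_linear_layers(x), equivalent_single_layer(x)) for x in xs
)
print(collapsed)
```

Without an activation, the two layers reduce to the single line -6x - 2.5, so extra depth adds nothing; with `tanh` in between, the output is no longer a degree-one polynomial of the input.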

3. Material and Methods

The study area of this research is Jakarta, a Special Capital Region, as shown in Figure 2. The city is located at approximately 6° South latitude and 106° East longitude. The area of Jakarta is 7,659.02 square kilometers. As of 2020, the population of Jakarta was 11,100,929, with a density of 16,718 inhabitants per square kilometer. Jakarta is the nation's capital, the largest city in Indonesia, and the only city in Indonesia with province-level status.

Figure 2. Study area map

3.1 Methodology

Figure 3 depicts the logical sequence of the research phases and scientific methods. The first step is the establishment of well-defined study objectives, followed by a comprehensive analysis of the literature on assessing the duration and cost of civil construction projects. The next step is the creation of a research instrument and the selection of a sample, a practical step aimed at achieving the study’s objectives.

Figure 3. Research methodology flowchart

3.2 Data collection

The study employs questionnaires as the principal means of data collection. The questionnaires were distributed to management, customers, and site engineers of building construction firms registered with the Minister for Public Works and Human Settlements. The survey instrument was distributed to a subset of participants via electronic mail, while others were approached face to face. The study evaluated the impact of building information, including foundation type, intended use of the construction, total number of floors, construction area, category of customers, and contractors, on the cost and duration of construction projects.

For the present study, the design of observation forms and questionnaires follows Fellows and Liu [38]. The questionnaire was designed to evaluate study-related difficulties, and the survey outcomes serve as the data sets that are analyzed with an artificial neural network (ANN) model. The predictive efficacy of the resulting intelligent model is assessed through statistical techniques and compared with the results obtained from multiple linear regression (MLR). The computed results are analyzed and interpreted in order to draw conclusions for practical use and to incorporate these findings into the current body of knowledge. Ultimately, the investigative findings of Wang et al. [39] are utilized to formulate conclusions and recommendations.

The questionnaire administered to the respondents required them to provide detailed information on these factors [26]. The present research aims to improve decision-making in project planning, particularly once the architectural and structural specifications of the proposed building are available, by offering a useful assessment of the anticipated cost and duration. Two distinct project datasets were established. The first is examined at both the activity and project levels, whereas the second encompasses project-level data such as planned and actual activity durations and costs, and is used for explanatory purposes in the discussion in order to derive accurate activity duration and cost estimates. According to Phillips and Stawarski [40], the range and quantity of project types, costs, durations, structures, and activity counts are considered adequately inclusive for the purposes of the analysis. Eqs. (2)-(3) are used to calculate the duration and cost deviations for each activity i in the first dataset.

$\sigma AAct_i=\log_{10}\left(\frac{AAct_i}{PAct_i}\right)$       (2)

$\sigma ACost_i=\log_{10}\left(\frac{ACost_i}{PCost_i}\right)$             (3)

where, $\sigma AAct_i$ = duration deviation of activity i; $\sigma ACost_i$ = cost deviation of activity i; $AAct_i$ = actual duration of activity i; $PAct_i$ = planned duration of activity i; $ACost_i$ = actual cost of activity i; $PCost_i$ = planned cost of activity i.
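Eqs. (2)-(3) can be evaluated as in the following minimal sketch. The function name `log_deviation` and the three activities are hypothetical and serve only to show the calculation; they are not records from the survey data.

```python
import math


def log_deviation(actual, planned):
    """Return log10(actual / planned), the form shared by Eqs. (2) and (3)."""
    if actual <= 0 or planned <= 0:
        raise ValueError("actual and planned values must be positive")
    return math.log10(actual / planned)


# Hypothetical activities:
# (planned days, actual days, planned cost, actual cost)
activities = [
    (10, 10, 1000.0, 1000.0),  # on time and on budget
    (10, 20, 1000.0, 500.0),   # twice as long, half the cost
    (20, 10, 500.0, 1000.0),   # half as long, twice the cost
]

duration_dev = [log_deviation(a, p) for p, a, _, _ in activities]
cost_dev = [log_deviation(a, p) for _, _, p, a in activities]
print(duration_dev, cost_dev)
```

A deviation of zero means the activity matched its plan, a positive value indicates an overrun, and a negative value indicates an underrun.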

Eqs. (2)-(3) are expressed on a logarithmic scale. This matters because the ratios of actual to planned duration and cost are always positive and are therefore asymmetric about the value 1. When the denominator exceeds the numerator, the ratio is confined to the interval between 0 and 1, whereas when the numerator exceeds the denominator, it ranges from 1 to positive infinity. This scale distortion produces an artificial positive skew in the distribution of the data, which can be remedied by pre-processing the data as log ratios. Furthermore, on a logarithmic scale, the variations of the variables become additive rather than multiplicative.

It is noteworthy that ratios on the natural scale between 0 and 1 correspond to values between negative infinity and 0 on any logarithmic scale, while ratios on the natural scale between 1 and positive infinity correspond to the range (0, +∞) [41].
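The effect of the log transformation on this asymmetry can be seen in a short sketch. The ratios below are hypothetical and are deliberately balanced, with each overrun paired with its reciprocal underrun.

```python
import math

# Hypothetical actual/planned ratios: every overrun is paired with the
# reciprocal underrun, so over- and under-estimates are balanced.
ratios = [0.25, 0.5, 0.8, 1.0, 1.25, 2.0, 4.0]
log_ratios = [math.log10(r) for r in ratios]

mean_ratio = sum(ratios) / len(ratios)
mean_log_ratio = sum(log_ratios) / len(log_ratios)

# Distance from "on plan" (ratio 1, log-ratio 0) for a halving and a doubling
raw_gap_half, raw_gap_double = abs(0.5 - 1.0), abs(2.0 - 1.0)
log_gap_half, log_gap_double = abs(math.log10(0.5)), abs(math.log10(2.0))

print(mean_ratio, mean_log_ratio)
print(raw_gap_half, raw_gap_double, log_gap_half, log_gap_double)
```

On the natural scale the balanced set averages well above 1 and a doubling sits twice as far from 1 as a halving; on the log scale the same set is centered on 0 and the two gaps are equal.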

3.3 Hypothesis

A hypothesis is a conjecture, based on knowledge and experience, about a particular phenomenon that can be verified empirically through observation or experimentation. Statistical hypothesis testing analyzes experimental or survey data in order to identify significant associations between factor variables. The fundamental objective is to assess the validity of the obtained outcomes by examining the probability that chance alone could have produced them. If the outcomes could plausibly be due to random chance, then the experiment or observation holds minimal or negligible statistical significance and is unlikely to be replicated. Hypothesis testing uses a set of sample data to decide whether or not to reject the null hypothesis (Ho). If the null hypothesis (Ho) is rejected, the data are taken as statistical support for the alternative hypothesis (Ha). The P-value is used to draw conclusions in hypothesis testing: it is the probability of obtaining outcomes at least as extreme as those observed, under the assumption that the null hypothesis (Ho) holds true. According to Ikpa et al. [42], the null hypothesis is rejected if the P-value is below 0.05, which corresponds to a confidence level of 95%.

Ho: Information about buildings does not affect the duration of the project or cost.

Ha: Information about buildings affects the duration of the project or cost.
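The decision rule at the 0.05 threshold can be illustrated with a sketch. The study does not state which test statistic was used, so a two-sided permutation test on a difference in group means is substituted here purely as an assumption; the two groups of durations are hypothetical.

```python
import random


def mean(values):
    return sum(values) / len(values)


def permutation_p_value(group_a, group_b, n_permutations=5000, seed=0):
    """Two-sided permutation P-value for Ho: both groups share one distribution."""
    rng = random.Random(seed)
    observed = abs(mean(group_a) - mean(group_b))
    pooled = list(group_a) + list(group_b)
    n_a = len(group_a)
    extreme = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)  # reassign group labels at random
        diff = abs(mean(pooled[:n_a]) - mean(pooled[n_a:]))
        if diff >= observed:
            extreme += 1
    # Add-one correction keeps the estimate strictly above zero
    return (extreme + 1) / (n_permutations + 1)


# Hypothetical project durations (days) for two classes of building
low_rise = [140, 180, 210, 230, 260, 300]
high_rise = [520, 580, 640, 700, 790, 870]

p_value = permutation_p_value(low_rise, high_rise)
reject_h0 = p_value < 0.05
print(p_value, reject_h0)
```

If random relabelling rarely reproduces a difference as large as the observed one, the P-value falls below 0.05 and Ho is rejected in favor of Ha.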

3.4 Evaluation of the model's efficiency

The performance of the developed model was assessed to determine its capacity to predict the response parameters with a satisfactory level of precision. A multiple linear regression (MLR) model served as the benchmark against which the artificial neural network (ANN) model was compared. The prediction performance criteria were the mean absolute error (MAE) and the root mean square error (RMSE), which are commonly used statistical loss-function parameters. These criteria were selected based on relevant literature and measure the errors between paired observations expressing the same phenomenon. RMSE and MAE are given in Eqs. (4)-(5), where Ei represents the actual values, Mi represents the model-predicted results, and n is the number of paired observations [33, 43].

$\mathrm{RMSE}=\sqrt{\frac{\sum_{i=1}^n\left(E_i-M_i\right)^2}{n}}$         (4)

$\mathrm{MAE}=\frac{1}{n} \sum_{i=1}^n\left|E_i-M_i\right|$        (5)
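Eqs. (4)-(5) translate directly into code, as in the following sketch. The actual and predicted durations are hypothetical values used only to show the arithmetic.

```python
import math


def rmse(actual, predicted):
    """Root mean square error, Eq. (4)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    return math.sqrt(sum((e - m) ** 2 for e, m in zip(actual, predicted)) / n)


def mae(actual, predicted):
    """Mean absolute error, Eq. (5)."""
    if len(actual) != len(predicted) or not actual:
        raise ValueError("inputs must be non-empty and of equal length")
    n = len(actual)
    return sum(abs(e - m) for e, m in zip(actual, predicted)) / n


# Hypothetical actual (E_i) and model-predicted (M_i) durations in days
actual = [100.0, 150.0, 200.0, 250.0]
predicted = [98.0, 153.0, 200.0, 246.0]

print(mae(actual, predicted), rmse(actual, predicted))
```

Because RMSE squares each error before averaging, it is never smaller than MAE and penalizes large individual errors more heavily.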

4. Result and Discussion

The data collected from questionnaires and surveys were organized to facilitate the assessment of key variables that affect the cost and duration of diverse building construction projects. These projects were categorized into different types, including housing, commercial, industrial, and government structures. The study evaluated the impact of several factors, including the number of activities, construction area in square meters, foundation type, total number of floors, client classification, and contractors involved, on cost and duration outcomes. The results indicate that projects assigned to sole and mini-contractors exhibit greater variability in cost and duration, which can be attributed to the inadequate modernisation, technology, and staffing these firms provide for managing and controlling construction project activities. Multinational corporations and other entities in the same category possess advanced tools and equipment that achieve the desired quality within shorter timeframes, thereby facilitating effective project management. These capabilities help to prevent cost overruns and facilitate the timely completion of projects to the prescribed quality specifications. The selection of contractors for construction projects was found to be influenced by bidding costs and negotiation fees. Clients with greater financial resources, such as government and corporate entities, tended to engage medium-sized and larger companies. This observation was also made by Hammad et al. [44] and AlSehaimi et al. [45].

4.1 Respondents’ demographical characteristics

Table 1. Respondents’ demographical characteristics

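| Variables | Divisions | Freq | (%) |
|---|---|---|---|
| Sex | Male | 61 | 78.21 |
| | Female | 17 | 21.79 |
| Age | 20–35 | 13 | 16.67 |
| | 36–45 | 39 | 50.00 |
| | 46–60 | 20 | 25.64 |
| | >60 | 6 | 7.69 |
| Occupation | Civil Engineer | 32 | 41.02 |
| | Builder | 18 | 23.08 |
| | Project Manager | 20 | 25.64 |
| | Client | 8 | 10.26 |
| Experience | 1–20 years | 10 | 12.82 |
| | 21–30 years | 33 | 42.31 |
| | 31–45 years | 27 | 34.62 |
| | >45 years | 8 | 10.26 |
| Total | | 78 | 100.00 |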

The study involved the distribution of 120 questionnaires to individuals who hold significant roles in infrastructural construction projects. Out of the total number of questionnaires administered, 78 were completed and returned, resulting in a response rate of 65%. The responses provided by these individuals were utilized for the analysis conducted in this study. Table 1 displays the demographic characteristics of the respondents in terms of percentage (%) and frequency distribution. Based on the tabulated data, it can be observed that 21.79% of the respondents were female while 78.21% were male. Additionally, the data reveals that 41.02%, 23.08%, and 25.64% of the participants identified as civil engineers, builders, and project managers, respectively. The data indicates that 42.31% of the respondents had 21-30 years of experience, while 34.62% had 31-45 years of experience.

4.2 Statistical test

The research assessed the correlations and interrelationships between building information details, construction duration, and costs using 3D wireframe surface plots, as depicted in Figure 4. The plot analysis revealed that building area (BA), number of storeys, and number of activities (Act.) were positively associated with cost. The distributions of the input and output variables are shown as histograms in Figure 5, which display the frequency of occurrence of each value within the dataset. Negligible or minimal skewness was detected in both categories of parameters. Table 2 presents the fundamental statistical measures, including mean, standard deviation, variance, skewness, and kurtosis, which exhibit satisfactory values in line with the studies conducted by Alaneme et al. [11] and Rofooei et al. [22].

Table 2. Output-input statistical functions

| Parameters | Mean | Standard Deviation | Sample Variance | Kurtosis | Minimum | Maximum | Skewness |
|---|---|---|---|---|---|---|---|
| Output variables | | | | | | | |
| Activity_Cost (Rupiah) | $1.87 \times 10^{6}$ | $1.37 \times 10^{8}$ | $1.88 \times 10^{16}$ | 6.53 | 32,147,895.00 | 860,187,425.00 | 1.94 |
| Activity_Duration | 497.69 | 235.21 | 51,719.90 | −1.25 | 136.00 | 878.00 | 0.36 |
| Input variables | | | | | | | |
| Activity | 229.34 | 124.26 | 16,966.17 | −0.87 | 36.00 | 478.00 | 0.46 |
| B_A (m²) | 571.73 | 207.99 | 43,257.89 | −0.88 | 266.00 | 976.00 | 0.21 |
| Floor_Type | 1.84 | 0.76 | 0.62 | −1.27 | 1.00 | 4.00 | 0.40 |
| Storeys | 3.31 | 1.96 | 3.64 | −0.33 | 0.00 | 7.00 | 0.53 |
| Contractors | 2.76 | 0.84 | 0.72 | −0.82 | 1.00 | 5.00 | −0.08 |
| Client | 3.18 | 1.33 | 1.76 | −1.10 | 1.00 | 4.00 | −0.16 |

Figure 4. 3D surface plots of factor interactions

Figure 5. Input (yellow) and output (pink) histograms

4.3 Pearson correlation

Table 3 reports the Pearson correlation coefficients used to assess the linear correlation between the input and output variables, following prior research. These coefficients allow a thorough evaluation of the impact of the construction details on both the duration and the cost of a project. The input variables show a more pronounced positive correlation with project duration than with cost. As in the studies [46, 47], the number of project activities, the number of storeys, and the building area exhibit the strongest positive correlations with the response variables of cost and duration.
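For reference, the Pearson coefficient between two variables is the covariance of their deviations from the mean divided by the product of their standard deviations. The sketch below shows the calculation for a pair of variables and for a full matrix of the kind given in Table 3; the helper names and the sample values are hypothetical and are not taken from the survey data.

```python
import numpy as np


def pearson_r(x, y):
    """Pearson correlation coefficient between two equal-length sequences."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    dx = x - x.mean()
    dy = y - y.mean()
    return float(np.sum(dx * dy) / np.sqrt(np.sum(dx ** 2) * np.sum(dy ** 2)))


def correlation_table(columns):
    """Return (names, matrix) for a dict mapping variable name -> values.

    np.corrcoef treats each row of its input as one variable, so the
    matrix entry [i, j] is the correlation between names[i] and names[j].
    """
    names = list(columns)
    data = np.array([columns[name] for name in names], dtype=float)
    return names, np.corrcoef(data)


if __name__ == "__main__":
    # Hypothetical records: storeys, building area (m2), duration (days).
    sample = {
        "Storeys": [1, 2, 3, 5, 7],
        "BA": [270, 400, 520, 760, 970],
        "Duration": [140, 300, 450, 700, 870],
    }
    names, matrix = correlation_table(sample)
    print(names)
    print(np.round(matrix, 3))
```

Only the lower triangle of such a matrix needs to be reported, since the matrix is symmetric and its diagonal is always 1.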

Figure 6 shows the deviation of the duration and cost variables. Because the raw data fell outside the 0-1 range, a logarithmic function was used to rescale the datasets, which addresses the scale distortion (additive variances) caused by the large differences in magnitude within the data. The cost variable showed greater deviation than the duration variable. These variations are attributed to the key factors that were selected as independent variables in this study, which are assessed using an intelligent modeling system, as suggested by Mačková and Bašková [48] and Wang et al. [39].
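The exact rescaling formula is not given in the text. One plausible reading, shown here purely as an assumption, is a base-10 logarithm followed by min-max scaling so that the transformed values lie between 0 and 1; the function name `log_minmax` is hypothetical.

```python
import numpy as np


def log_minmax(values):
    """Rescale strictly positive values to [0, 1] on a logarithmic scale.

    Assumption: log10 followed by min-max scaling. The paper only states
    that a logarithmic function was used, so this is one possible scheme.
    """
    x = np.asarray(values, dtype=float)
    if np.any(x <= 0):
        raise ValueError("log scaling needs strictly positive values")
    logged = np.log10(x)
    span = logged.max() - logged.min()
    if span == 0:
        return np.zeros_like(logged)  # all values identical
    return (logged - logged.min()) / span


if __name__ == "__main__":
    # Hypothetical costs spanning several orders of magnitude.
    print(log_minmax([1e6, 1e7, 1e8]))
```

Compressing the scale in this way keeps a few very large cost values from dominating the comparison with the much smaller duration values.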

Table 3. Pearson’s correlations

| | Activity_Cost (Rp) | Activity_Duration (days) | Activity | BA (m²) | Floor_Type | Storeys | Contractors | Clients |
|---|---|---|---|---|---|---|---|---|
| Activity_Cost (Rp) | 1 | | | | | | | |
| Activity_Duration (days) | 0.581884 | 1 | | | | | | |
| Activity | 0.621422 | 0.969436 | 1 | | | | | |
| BA (m²) | 0.501776 | 0.907812 | 0.861306 | 1 | | | | |
| Floor_Type | 0.279769 | 0.194762 | 0.264976 | 0.069056 | 1 | | | |
| Storeys | 0.433693 | 0.817498 | 0.826987 | 0.624709 | 0.177922 | 1 | | |
| Contractors | 0.29593 | 0.425558 | 0.463985 | 0.389693 | 0.119046 | 0.529844 | 1 | |
| Clients | 0.199704 | 0.391148 | 0.365789 | 0.487272 | 0.065506 | 0.258053 | 0.307584 | 1 |

Figure 6. Expense and duration

4.4 ANN model

The survey identified several independent factors that affect building cost and duration. The model architecture comprises six input variables and two target responses, construction cost and duration. These factors were determined through descriptive statistical analysis and correlation analysis of the survey data. The statistically significant positive linear correlation between the variables supports the alternative hypothesis.

Table 4. ANN processing parameters

| Parameters | Setting |
|---|---|
| General | |
| Type | Input–output and curve fitting (nftool) |
| Number of hidden neurons | 22 |
| Training function | Levenberg–Marquardt (Trainlm) |
| Data division | Random |
| Activation functions | Tansig, purelin |
| Adaptation learning function | Gradient descent with momentum weight and bias learning function (Learngdm) |
| Performance | Mean squared error (MSE) |
| Calculation | MATLAB, Agiel |
| Network type | Feed-forward back propagation |
| Sampling | |
| Training | 70% (54 samples) |
| Testing | 15% (12 samples) |
| Validation | 15% (12 samples) |

Table 4 and Figure 7 show the neural network processing parameters. The two-layer feed-forward network has a 6-22-2 architecture, with tansig hidden neurons and linear output neurons. Such a network can fit multi-dimensional mapping problems effectively, provided that the data are consistent and the hidden layer has enough neurons, as demonstrated by Neeraja and Swaroop [49] and Alaneme et al. [46]. The optimal number of hidden neurons was determined using the mean squared error (MSE) and R-values as evaluation criteria. Networks with 1 to 25 hidden neurons were tested, and 22 neurons gave the best generalization, based on the results from the training, validation, and testing phases shown in Figures 8 and 9.
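To make the 6-22-2 layout concrete, the following sketch builds the forward pass of such a network and a random 70/15/15 division of 78 samples. It is an illustration in Python with NumPy rather than the MATLAB toolbox used in the study, the weights are random rather than trained, and all names (`init_params`, `forward`, `random_split`) are hypothetical. The hyperbolic tangent is used for the hidden layer because it is mathematically equivalent to tansig; Levenberg-Marquardt training itself is not shown.

```python
import numpy as np

N_INPUT, N_HIDDEN, N_OUTPUT = 6, 22, 2  # 6-22-2 architecture


def init_params(rng):
    """Random starting weights and zero biases (illustrative, untrained)."""
    return {
        "W1": rng.normal(0.0, 0.5, (N_HIDDEN, N_INPUT)),
        "b1": np.zeros(N_HIDDEN),
        "W2": rng.normal(0.0, 0.5, (N_OUTPUT, N_HIDDEN)),
        "b2": np.zeros(N_OUTPUT),
    }


def forward(params, X):
    """Map an (n_samples, 6) input array to (n_samples, 2) outputs."""
    hidden = np.tanh(X @ params["W1"].T + params["b1"])  # tansig layer
    return hidden @ params["W2"].T + params["b2"]        # linear (purelin) layer


def random_split(n_samples, rng, val_frac=0.15, test_frac=0.15):
    """Randomly divide sample indices into training, validation, testing."""
    n_val = round(val_frac * n_samples)
    n_test = round(test_frac * n_samples)
    n_train = n_samples - n_val - n_test
    order = rng.permutation(n_samples)
    return order[:n_train], order[n_train:n_train + n_val], order[n_train + n_val:]


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    params = init_params(rng)
    X = rng.random((78, N_INPUT))  # hypothetical scaled inputs
    print(forward(params, X).shape)
    print([len(part) for part in random_split(78, rng)])
    print(sum(p.size for p in params.values()), "trainable parameters")
```

With 78 samples the split yields 54, 12, and 12 indices, matching the sampling rows of Table 4, and the network has 200 trainable weights and biases, which is large relative to the training set and is one reason validation-based stopping matters here.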

Figure 7. ANN architecture

Figure 8. Variable hidden layer neuron R-values

Figure 9. Variable hidden layer neuron MSE

4.5 ANN training

Figure 10 illustrates the training state of the artificial neural network (ANN). Training ran for 15 epochs, at which point the gradient was 4.6854. It stopped there because the validation check had failed six consecutive times, which marks the point beyond which further training no longer improves the network. The validation-check count is zero during epochs 0-5 and rises slightly to 1 and 2 at epochs 6 and 7, respectively. Over-fitting was observed from epoch 10 onward. Following Alaneme et al. [11], the final weights were therefore taken from epoch 9.
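The stopping behaviour reported here (a best epoch of 9 and termination at epoch 15 after six failed validation checks) follows the usual early-stopping rule. A minimal sketch of that rule is given below; the validation-error values are invented for illustration, and the function and the argument name `max_fail` are assumptions rather than the toolbox's own code.

```python
def early_stopping_epoch(val_errors, max_fail=6):
    """Return (best_epoch, stop_epoch) for a sequence of validation errors.

    A check "fails" when the validation error does not drop below the best
    value seen so far. The counter resets on every improvement, and training
    stops once max_fail consecutive failures have occurred.
    """
    best_epoch = 0
    best_error = val_errors[0]
    fails = 0
    for epoch in range(1, len(val_errors)):
        if val_errors[epoch] < best_error:
            best_error = val_errors[epoch]
            best_epoch = epoch
            fails = 0
        else:
            fails += 1
            if fails >= max_fail:
                return best_epoch, epoch
    return best_epoch, len(val_errors) - 1


if __name__ == "__main__":
    # Hypothetical validation MSE per epoch (epoch 0 first).
    val_mse = [50.0, 40.0, 30.0, 22.0, 16.0, 12.0, 12.5, 12.8,
               9.0, 7.5, 7.8, 8.2, 8.9, 9.5, 10.4, 11.0, 12.0]
    print(early_stopping_epoch(val_mse))
```

In this invented curve the error briefly rises at epochs 6 and 7, recovers to a minimum at epoch 9, and then rises for six checks in a row, so the rule stops at epoch 15 and the weights from epoch 9 are the ones retained.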

Figure 10. ANN training state

4.6 ANN validation

The mean squared error (MSE) was used as the loss function to assess the performance of the developed artificial neural network (ANN) model, as shown in Figure 11. The optimized 6-22-2 network achieved its best validation performance, an MSE of 7.5443, at epoch 9. The model reliably predicts the target output parameters and generalizes across complex input variables with low error. This is supported by previous studies conducted by Alaneme George and Mbadike Elvis [15] and Uwanuakwa et al. [30].

Figure 11. ANN validation

4.7 ANN error

The error histogram in Figure 12, which uses 20 bins for the training, testing, and validation errors, shows good agreement between the experimental and predicted outcomes. Zero error indicates optimal performance. Approximately 95% of the data produce an error of less than 1%. For the training set of ninety (90) instances, a yellow line at 0.02904 marks the position of zero error.

Figure 12. ANN error histogram

4.8 ANN regression

Figure 13 compares the empirical data with the ANN model estimates. The coefficient of determination and mean squared error (MSE) were used to evaluate the performance of the model on the training, validation, and testing datasets. The y-axis of the plot shows the output values estimated by the ANN model, and the x-axis shows the actual data as target values. The ANN model shows satisfactory prediction accuracy, with validation, training, and testing correlation coefficients (R) of 0.9311, 0.99564, and 0.93195, respectively [50].

Figure 13. Regression plot

4.9 Model validation

The histogram charts in Figures 14 and 15 compare the datasets generated by the intelligent model with the actual data. Loss-function parameters, namely the mean absolute error (MAE) and root mean square error (RMSE), were used to assess the prediction accuracy of the ANN model. The adequacy of the model was further verified against a multiple linear regression (MLR) computation [36].

Table 5 and Figures 16 and 17 summarize the results of the multiple linear regression (MLR) modeling. The regression coefficients and model summary show that the regression model has an average coefficient of determination (COD) of 69.86% across the two outputs (42.86% for cost and 96.86% for duration) and is therefore neither robust nor effective. This comparison provides a baseline against which the improved predictive accuracy of the developed artificial neural network model can be judged.
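The quantities in the model summary of Table 5 can be obtained from an ordinary least-squares fit. The sketch below is an illustrative Python version using NumPy on synthetic data; the study's own regression software is not specified, and the names `fit_mlr` and `adjusted_r2` are hypothetical.

```python
import numpy as np


def adjusted_r2(r2, n_samples, n_predictors):
    """Adjusted R-squared for a model with an intercept."""
    return 1.0 - (1.0 - r2) * (n_samples - 1) / (n_samples - n_predictors - 1)


def fit_mlr(X, y):
    """Ordinary least-squares fit of y on the columns of X plus a constant."""
    X = np.asarray(X, dtype=float)
    y = np.asarray(y, dtype=float)
    design = np.column_stack([np.ones(len(X)), X])  # first column = constant
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    residuals = y - design @ coef
    ss_res = float(np.sum(residuals ** 2))
    ss_tot = float(np.sum((y - y.mean()) ** 2))
    n, p = design.shape
    r2 = 1.0 - ss_res / ss_tot
    return {
        "coefficients": coef,                      # constant, then one per column
        "S": float(np.sqrt(ss_res / (n - p))),     # standard error of regression
        "R2": r2,
        "R2_adj": adjusted_r2(r2, n, p - 1),
    }


if __name__ == "__main__":
    rng = np.random.default_rng(0)
    X = rng.random((78, 6))                        # synthetic predictors
    y = 5.0 + X @ np.array([3.0, -1.0, 0.5, 2.0, 0.0, 1.5]) + rng.normal(0, 0.1, 78)
    print(fit_mlr(X, y))
```

With 78 responses and six predictors, this adjustment maps the R² values of 42.86% and 96.86% to about 38.0% and 96.6%, which agrees with the adjusted values in Table 5 to within rounding.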

Table 6 summarizes the loss-function validation results, which show good agreement between the actual data and the ANN model estimates. Averaged over the two outputs, the coefficient of determination (R²) is 99.9995%, and the MAE and RMSE are 0.2952 and 0.5638, respectively. These results are in line with the performance assessments reported in the studies [11, 12] for the adaptive neuro-fuzzy inference system (ANFIS) and artificial neural network (ANN).
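The three measures are defined in the standard way, and the averages quoted above follow directly from the per-output values in Table 6. The sketch below shows both steps; the function names are hypothetical, and the scale on which the study computed its errors is not stated.

```python
import numpy as np


def mae(actual, predicted):
    """Mean absolute error."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.mean(np.abs(a - p)))


def rmse(actual, predicted):
    """Root mean square error."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    return float(np.sqrt(np.mean((a - p) ** 2)))


def r_squared(actual, predicted):
    """Coefficient of determination: 1 - SS_residual / SS_total."""
    a, p = np.asarray(actual, float), np.asarray(predicted, float)
    ss_res = np.sum((a - p) ** 2)
    ss_tot = np.sum((a - a.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


# Per-output values as reported in Table 6.
TABLE_6 = {
    "Cost": {"MAE": 0.38599, "RMSE": 0.4658, "R2": 1.0},
    "Duration": {"MAE": 0.204428, "RMSE": 0.66184, "R2": 0.99999},
}


def average_over_outputs(table):
    """Average each metric across the outputs in the table."""
    metrics = next(iter(table.values())).keys()
    return {m: sum(row[m] for row in table.values()) / len(table) for m in metrics}


if __name__ == "__main__":
    print(average_over_outputs(TABLE_6))
    # Small made-up example of the three metrics.
    print(mae([1, 2, 3, 4], [1, 2, 3, 6]),
          rmse([1, 2, 3, 4], [1, 2, 3, 6]),
          r_squared([1, 2, 3, 4], [1, 2, 3, 6]))
```

Averaging the cost and duration rows gives an MAE of about 0.2952, an RMSE of about 0.5638, and an R² of 0.999995, that is, 99.9995%.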

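Table 5. Multi-regression analysis

Columns S to R²-pred give the model summary; columns constant to Clients give the regression coefficients.

| Variables | S | R² | R²-adj | R²-pred | constant | Act | BA (m²) | FT | Storey | Contractor | Clients |
|---|---|---|---|---|---|---|---|---|---|---|---|
| Cost | 1.08 × 10^8 | 42.86% | 38.04% | 25.93% | 3.8 × 10^7 | 1 × 10^6 | −1 × 10^5 | 1 × 10^7 | −2 × 10^7 | 1 × 10^7 | −2 × 10^6 |
| Duration | 41.5857 | 96.86% | 96.59% | 96.20% | 37.8 | 1.039 | 0.3705 | −0.35 | 20.08 | −14.81 | −1.4 |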

Table 6. ANN performance model

| Output | Parameter Statistic | Requirements | Calculated Results | Remarks |
|---|---|---|---|---|
| Cost | MAE | Close to 0 | 0.38599 | Very good |
| | RMSE | Close to 0 | 0.4658 | Good |
| | R² | > 0.8 | 1 | Excellent |
| Duration | MAE | Close to 0 | 0.204428 | Very good |
| | RMSE | Close to 0 | 0.66184 | Good |
| | R² | > 0.8 | 0.99999 | Excellent |

Figure 14. Cost model vs. actual results

Figure 15. Duration model vs. actual results

Figure 16. Cost variable MLR residual graphs

Figure 17. Duration MLR residual graphs

The study found that the intelligent model performed strongly and reliably in forecasting the cost and duration of projects, taking into account the physical characteristics of the building as well as the classification of the contractor and client. The model has certain limitations, namely its inability to incorporate non-linear and intricate factors such as proposal submission, contract negotiation, supply chain, and safety constraints.

5. Conclusion

According to the survey data, construction activity, building area, type of foundation, number of storeys, client class, and contractor class were more positively correlated with project duration than with cost. Conversely, the cost variable showed a higher degree of deviation than the duration variable. The study developed a feed-forward back-propagation network using random data division and the Levenberg-Marquardt training algorithm, with tansig and purelin activation functions and an architecture optimized on the basis of MSE. The performance of the ANN model was assessed in two ways: first against MLR statistics, and second using the MAE and RMSE parameters to establish the adequacy of the developed model.

The computation results confirm the adequacy of the developed model. The model has a mean absolute error (MAE) of 0.2952 and a root mean square error (RMSE) of 0.5638. Its coefficient of determination (COD) is 99.9995%, well above the 69.86% COD obtained for the multiple linear regression (MLR) model.

The ANN model thus has an extremely high R-squared value, close to 100%. This implies that its independent variables explain almost all of the variance in the dependent variables, suggesting a very strong fit of the model to the data. The MLR model, by contrast, explains about 69.86% of the variance, which suggests that it is moderately effective at explaining the variation in the data but leaves a significant portion unexplained.

Incorporating building information details into project evaluation would provide precise and efficient data for budgeting and scheduling. The results of this research support accurate and timely forecasts, which can be highly beneficial during the planning phase of construction projects. Such forecasts enable both the client and the contractor to determine the resources and the time required for completion while taking critical constraints into account.

Nevertheless, the limitations of the developed model include its inability to incorporate procurement and contract negotiations, supply chain, and safety constraints, which are fundamentally non-linear and exceedingly difficult to measure.

6. Recommendation

Evaluating the correlation between building information and the cost and duration of construction projects is important for planning, budgeting, and scheduling. The findings of this study will assist decision-making aimed at attaining the required level of quality in project outputs. The study found that the construction duration and cost of a project were significantly affected by building information such as the architectural and geometrical design, the client class, and the contractor class. The outcomes will guide project managers, clients, and construction stakeholders in managing and implementing projects within the designated budget and schedule. Further research is recommended, particularly on applying soft computing techniques to evaluate thoroughly the multicollinearity among the factor variables.

References

[1] Elmousalami, H.H. (2020). Artificial intelligence and parametric construction cost estimate modeling: State-of-the-art review. Journal of Construction Engineering and Management, 146(1): 03119008. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001678

[2] Chudley, R., Greeno, R. (2006). Building construction handbook. Routledge. https://doi.org/10.4324/9781315780320

[3] Odesola, I.F., Ige, E.O., Adesokan, A.A., Ige, I.O.A. (2019). An ANN approach for estimation of thermal comfort and sick building syndrome. Revue d'Intelligence Artificielle, 33(2): 151-158. https://doi.org/10.18280/ria.330211 

[4] Mahamid, I., Amund, A. (2010). Analysis of cost diverge in road construction projects. In Proceedings of the 2010 Annual Conference of the Canadian Society for Civil Engineering, pp. 1490-1499.

[5] Ballesteros-Pérez, P., Larsen, G.D., González-Cruz, M.C. (2018). Do projects really end late? On the shortcomings of the classical scheduling techniques. JOTSE: Journal of Technology and Science Education, 8(1): 17-33. https://doi.org/10.3926/jotse.303

[6] Jin, R., Han, S., Hyun, C., Cha, Y. (2016). Application of case-based reasoning for estimating preliminary duration of building projects. Journal of Construction Engineering and Management, 142(2): 04015082. https://doi.org/10.1061/(ASCE)CO.1943-7862.0001072

[7] Lin, M.C., Tserng, H.P., Ho, S.P., Young, D.L. (2011). Developing a construction-duration model based on a historical dataset for building project. Journal of Civil Engineering and Management, 17(4): 529-539. https://doi.org/10.3846/13923730.2011.625641

[8] Feng, G.L., Li, L. (2013). Application of genetic algorithm and neural network in construction cost estimate. Advanced Materials Research, 756: 3194-3198. https://doi.org/10.4028/www.scientific.net/AMR.756-759.3194

[9] Kaveh, A., Gholipour, Y., Rahami, H. (2008). Optimal design of transmission towers using genetic algorithm and neural networks. International Journal of Space Structures, 23(1): 1-19. https://doi.org/10.1260/026635108785342073

[10] Onyelowe, K.C., Fazal, E.J., Michael, E.O., Ifeanyichukwu, C.O., Alaneme, G.U., Chidozie, I. (2021). Artificial intelligence prediction model for swelling potential of soil and quicklime activated rice husk ash blend for sustainable construction. Jurnal Kejuruteraan, 33(4): 845-852.

[11] Alaneme, G.U., Onyelowe, K.C., Onyia, M.E., Bui Van, D., Dimonyeka, M.U., Nnadi, E., Ogbonnna, C., Odum, L.O., Aju, D.E., Abel, C., Udousoro, I., Onukwugha, E. (2021). Comparative modelling of strength properties of hydrated-lime activated rice-husk-ash (HARHA) modified soft soil for pavement construction purposes by artificial neural network (ANN) and fuzzy logic (FL). Jurnal Kejuruteraan, 33(2): 365-384. https://doi.org/10.17576/jkukm-2021-33(2)-20

[12] Onyelowe, K.C., Jalal, F.E., Onyia, M.E., Onuoha, I.C., Alaneme, G.U. (2021). Application of gene expression programming to evaluate strength characteristics of hydrated-lime-activated rice husk ash-treated expansive soil. Applied Computational Intelligence and Soft Computing, 2021: 1-17. https://doi.org/10.1155/2021/6686347

[13] Hassannejad, H., Pakbaz, M.S., Mehdizadeh, R. (2015). Comparison and evaluation of artificial neural network (ANN) training algorithms in predicting soil type classification. Bulletin of Environment, Pharmacology and Life Sciences, 4: 212-218.

[14] Flintsch, G.W., Chen, C. (2004). Soft computing applications in infrastructure management. Journal of Infrastructure Systems, 10(4): 157-166. https://doi.org/10.1061/(ASCE)1076-0342(2004)10:4(157)

[15] Alaneme George, U., Mbadike Elvis, M. (2019). Modelling of the mechanical properties of concrete with cement ratio partially replaced by aluminium waste and sawdust ash using artificial neural network. SN Applied Sciences, 1(11): 1514. https://doi.org/10.1007/s42452-019-1504-2

[16] Alaneme, G.U., Mbadike, E.M. (2021). Optimisation of strength development of bentonite and palm bunch ash concrete using fuzzy logic. International Journal of Sustainable Engineering, 14(4): 835-851. https://doi.org/10.1080/19397038.2021.1929549

[17] Alaneme, G.U., Mbadike, E.M. (2021). Experimental investigation of Bambara nut shell ash in the production of concrete and mortar. Innovative Infrastructure Solutions, 6: 1-13. https://doi.org/10.1007/s41062-020-00445-1

[18] Yilmaz, I., Yuksek, G. (2009). Prediction of the strength and elasticity modulus of gypsum using multiple regression, ANN, and ANFIS models. International Journal of Rock Mechanics and Mining Sciences, 46(4): 803-810. https://doi.org/10.1016/j.ijrmms.2008.09.002

[19] Mirahadi, F., Zayed, T. (2016). Simulation-based construction productivity forecast using neural-network-driven fuzzy reasoning. Automation in Construction, 65: 102-115. https://doi.org/10.1016/j.autcon.2015.12.021

[20] Wang, Y., Yang, Z., Zhang, F., Qin, Y., Wang, X., Lv, B. (2020). Microstructures and properties of a novel carburizing nanobainitic bearing steel. Materials Science and Engineering: A, 777: 139086. https://doi.org/10.1016/j.msea.2020.139086

[21] Górecki, J., Diaz-Madronero, M. (2020). Who risks and wins?—Simulated cost variance in sustainable construction projects. Sustainability, 12(8): 3370. https://doi.org/10.3390/SU12083370

[22] Rofooei, F.R., Kaveh, A., Farahani, F.M. (2011). Estimating the vulnerability of the concrete moment resisting frame structures using artificial neural networks. International Journal of Optimization in Civil Engineering, 1(3): 433-448.

[23] AbouRizk, S. (2010). Role of simulation in construction engineering and management. Journal of Construction Engineering and Management, 136(10): 1140-1153. https://doi.org/10.1061/(ASCE)CO.1943-7862.0000220

[24] Kim, M.J., Min, S.H., Han, I. (2006). An evolutionary approach to the combination of multiple classifiers to predict a stock price index. Expert Systems with Applications, 31(2): 241-247. https://doi.org/10.1016/j.eswa.2005.09.020

[25] Emsley, M.W., Lowe, D.J., Duff, A.R., Harding, A., Hickson, A. (2002). Data modelling and the application of a neural network approach to the prediction of total construction costs. Construction Management & Economics, 20(6): 465-472. https://doi.org/10.1080/01446190210151050

[26] Eskander, R.F.A. (2018). Risk assessment influencing factors for Arabian construction projects using analytic hierarchy process. Alexandria Engineering Journal, 57(4): 4207-4218. https://doi.org/10.1016/j.aej.2018.10.018

[27] Wang, Y.R., Yu, C.Y., Chan, H.H. (2012). Predicting construction cost and schedule success using artificial neural networks ensemble and support vector machines classification models. International Journal of Project Management, 30(4): 470-478. https://doi.org/10.1016/j.ijproman.2011.09.002

[28] Xu, Z.H. (2020). Construction and optimization of talent training quality based on data mining. Ingénierie des Systèmes d’Information, 25(4): 419-425. https://doi.org/10.18280/isi.250403 

[29] Juszczyk, M., Leśniak, A., Zima, K. (2018). ANN based approach for estimation of construction costs of sports fields. Complexity, 2018: 7952434. https://doi.org/10.1155/2018/7952434

[30] Uwanuakwa, I.D., Idoko, J.B., Mbadike, E., Reşatoğlu, R., Alaneme, G. (2022). Application of deep learning in structural health management of concrete structures. In Proceedings of the Institution of Civil Engineers-Bridge Engineering, pp. 1-8. https://doi.org/10.1680/jbren.21.00063

[31] Kaveh, A., Servati, H. (2001). Design of double layer grids using backpropagation neural networks. Computers & Structures, 79(17): 1561-1568. https://doi.org/10.1016/S0045-7949(01)00034-7

[32] Rezaei, S., Kheirkhah, A. (2017). Applying forward and reverse cross-docking in a multi-product integrated supply chain network. Production Engineering, 11: 495-509. https://doi.org/10.1007/s11740-017-0743-6

[33] Alaneme, G.U., Mbadike, E.M., Attah, I.C., Udousoro, I.M. (2022). Mechanical behaviour optimization of saw dust ash and quarry dust concrete using adaptive neuro-fuzzy inference system. Innovative Infrastructure Solutions, 7: 1-16. https://doi.org/10.1007/s41062-021-00713-8

[34] Alaneme, G.U., Attah, I.C., Mbadike, E.M., Dimonyeka, M.U., Usanga, I.N., Nwankwo, H.F. (2022). Mechanical strength optimization and simulation of cement kiln dust concrete using extreme vertex design method. Nanotechnology for Environmental Engineering, 7: 1-24. https://doi.org/10.1007/s41204-021-00175-4

[35] Taşan, S., Demir, Y. (2020). Comparative analysis of MLR, ANN, and ANFIS models for prediction of field capacity and permanent wilting point for Bafra plain soils. Communications in Soil Science and Plant Analysis, 51(5): 604-621. https://doi.org/10.1080/00103624.2020.1729374

[36] Alaneme, G.U., Onyelowe, K.C., Onyia, M.E., Van Bui, D., Mbadike, E.M., Dimonyeka, M.U., Attah, I.C., Ibe, U., Kumari, S., Firoozi, A.K., Oyagbola, I. (2020). Modelling of the swelling potential of soil treated with quicklime-activated rice husk ash using fuzzy logic. Umudike Journal of Engineering and Technology (UJET), 6(1): 1-22.

[37] Alaneme, G.U., Onyelowe, K.C., Onyia, M.E., Bui Van, D., Mbadike, E.M., Ezugwu, C.N., Dimonyeka, M., Attah, I.C., Ogbonnna, C., Ikpa, C., Udousoro, I., Udousoro, I.M. (2020). Modeling volume change properties of hydrated-lime activated rice husk ash (HARHA) modified soft soil for construction purposes by artificial neural network (ANN). Umudike Journal of Engineering and Technology (UJET), 6(1): 1-12. https://doi.org/10.33922/j.ujet_v6i1_9

[38] Fellows, R.F., Liu, A.M. (2021). Research methods for construction. John Wiley & Sons.

[39] Wang, W.C., Bilozerov, T., Dzeng, R.J., Hsiao, F.Y., Wang, K.C. (2017). Conceptual cost estimations using neuro-fuzzy and multi-factor evaluation methods for building projects. Journal of Civil Engineering and Management, 23(1): 1-14. https://doi.org/10.3846/13923730.2014.948908

[40] Phillips, P.P., Stawarski, C.A. (2008). Data collection: Planning for and collecting all types of data. John Wiley & Sons.

[41] Batselier, J., Vanhoucke, M. (2015). Construction and evaluation framework for a real-life project database. International Journal of Project Management, 33(3): 697-710. https://doi.org/10.1016/j.ijproman.2014.09.004

[42] Ikpa, C.C., Alaneme, G.U., Mbadike, E.M., Nnadi, E., Chigbo, I.C., Abel, C., Udousoro, I., Odum, L.O. (2021). Evaluation of water quality impact on the compressive strength of concrete. Jurnal Kejuruteraan, 33(3): 527-538.

[43] Ritter, A., Muñoz-Carpena, R. (2013). Performance evaluation of hydrological models: Statistical significance for reducing subjectivity in goodness-of-fit assessments. Journal of Hydrology, 480: 33-45. https://doi.org/10.1016/j.jhydrol.2012.12.004

[44] Hammad, A.A.A., Ali, S.M.A., Sweis, G.J., Bashir, A. (2008). Prediction model for construction cost and duration in Jordan. Jordan Journal of Civil Engineering, 2(3): 250-266.

[45] AlSehaimi, A.O., Tzortzopoulos Fazenda, P., Koskela, L. (2014). Improving construction management practice with the Last Planner System: A case study. Engineering, Construction and Architectural Management, 21(1): 51-64. https://doi.org/10.1108/ECAM-03-2012-0032

[46] Alaneme, G.U., Mbadike, E.M., Iro, U.I., Udousoro, I.M., Ifejimalu, W.C. (2021). Adaptive neuro-fuzzy inference system prediction model for the mechanical behaviour of rice husk ash and periwinkle shell concrete blend for sustainable construction. Asian Journal of Civil Engineering, 22: 959-974. https://doi.org/10.1007/s42107-021-00357-0

[47] Ferentinou, M., Fakir, M. (2017). An ANN approach for the prediction of uniaxial compressive strength, of some sedimentary and igneous rocks in eastern KwaZulu-Natal. In ISRM EUROCK, 191: 1117-1125. https://doi.org/10.1016/j.proeng.2017.05.286

[48] Mačková, D., Bašková, R. (2014). Applicability of Bromilow's time-cost model for residential projects in Slovakia. Selected Scientific Papers-Journal of Civil Engineering, 9(2): 5-12.

[49] Neeraja, D., Swaroop, G. (2017). Prediction of compressive strength of concrete using artificial neural networks. Research Journal of Pharmacy and Technology, 10(1): 35-40. https://doi.org/10.5958/0974-360X.2017.00009.9

[50] Ambrule, V.R., Bhirud, A.N. (2017). Use of artificial neural network for pre design cost estimation of building projects. International Journal on Recent and Innovation Trends in Computing and Communication, 5(2): 173-176.