A Review of Single and Multi-Hazard Risk Assessment Approaches for Critical Infrastructures Protection


Alessandro Pasino, Silvia De Angeli, Umberto Battista, Davide Ottonello, Andrea Clematis

CNR – IMATI, Via De Marini 6, Genoa 16149, Italy

Department of Civil, Chemical, and Environmental Engineering (DICCA), University of Genoa, Via Montallegro 1, 16145 Genoa, Italy

Stam S.r.l., Via Pareto 8 AR, Genoa 16129, Italy

Corresponding Author Email: alessandro.pasino@ge.imati.cnr.it

Page: 305-318 | DOI: https://doi.org/10.18280/ijsse.110403 | Received: 9 January 2021 | Accepted: 25 July 2021 | Published: 31 August 2021

OPEN ACCESS

Abstract: 

One of the greatest societal challenges is the protection of Critical Infrastructures (CIs). To minimize the impacts of man-made and natural threats, a series of risk assessment techniques have been developed. This work critically compares state-of-the-art risk assessment methodologies for CIs protection in order to identify the pros and cons of each. The paper first defines the main challenges in performing the risk assessment of CIs, namely data availability and the modelling of multiple hazard interactions. Afterwards, twelve different risk evaluation methodologies, including mathematical and statistical methods, machine learning techniques, and graph and network methods, are analyzed and compared. Every method is described, and its strengths and weaknesses are summarized in a table. Results show that statistical and mathematical methods provide the most accurate results but need a large amount of data and execution time, while machine learning and complex network approaches work well even when data are scarce and have a lower computational cost. In addition, the graph and network approaches tend to be the most flexible, being able to adapt to every data availability condition and to deal with multiple hazards simultaneously.

Keywords: 

critical infrastructures protection, expert knowledge, man-made hazard, multi-hazard, natural hazard, risk assessment

1. Introduction

A critical infrastructure (CI) is “a framework of interdependent networks and systems so vital for a nation that their incapacitation or destruction would have a debilitating impact on defense or economic security of the nation itself” [1]. The term CI came into use in the second half of the 1990s, when homeland security became a priority for the most advanced countries in the world [1]. CIs span many sectors, including, among others, telecommunications, electrical power systems, transportation and emergency services [2].

The European Union (EU) has long recognized the pan-European importance of CIs and has taken concrete actions, such as the establishment of the European Programme for Critical Infrastructure Protection (EPCIP) in 2006 and the adoption of the European Critical Infrastructure (ECI) Directive in 2008 [3, 4]. However, the existing framework on CIs protection is no longer sufficient to address current and future threats, which can disrupt the provision of essential services and, indeed, our daily lives. For this reason, in December 2020 the European Commission put forward a proposal for a new Directive on the resilience of critical entities. This further step demonstrates that safeguarding CIs is still a top priority for Europe, and also that current security standards are neither adequate nor homogeneous across the Member States [3, 4]. Indeed, in the last two decades alone, the documented cases of CIs hit by natural or man-made perils have increased dramatically with respect to the past [5-7]. Examples include the Fukushima Daiichi nuclear disaster (Japan, 11th March 2011), the Trans-Ecuador pipeline breakage due to a landslide (Ecuador, 31st May 2013), the terrorist attacks on the Twin Towers and the Pentagon (USA, 11th September 2001) and the man-made non-nuclear explosion at the port of Beirut (Lebanon, 4th August 2020). In order to avoid similar catastrophic events, national governments have reinforced safety measures for CIs. In parallel, hazards have been analyzed more accurately and effective risk management strategies have been implemented.

Risk assessment is the core step of the risk management process. It aims at identifying and analyzing risks in order to establish the priorities for intervention and to develop strategic actions that contain or mitigate them. The centrality of risk assessment to the protection of CIs has also been reaffirmed by the proposal for a new European Directive on the resilience of critical entities [3, 4]. Indeed, the Directive prescribes two kinds of compulsory risk assessment: the first is carried out at national level by each Member State to identify critical entities, while the second is performed by each critical entity designated by the Member States in order to draft a resilience plan and put in place proper countermeasures to mitigate the relevant risks [3, 4].

The possibility of evaluating the risk associated with multiple hazards is a crucial aspect, because many regions of the world are prone to multiple types of man-made and natural hazards, and effective risk reduction can be carried out only through an analysis of all the relevant threats. In some cases, these hazards can even occur simultaneously, or successively in a short time window, in the same location, placing CIs, infrastructures in general, and the population under greater stress than if the hazards had occurred in different locations or at different times [8]. The interaction among different hazards is a significant issue in the field of risk assessment studies. Indeed, the manifestation of various hazards, either jointly or very close in time, is an eventuality that must be considered in order to perform an exhaustive risk analysis for a given CI. An example of a multiple-hazard event is the 11th March 2011 Tōhoku earthquake, which led to a tsunami and to the consequent Fukushima Daiichi nuclear disaster.

At the international level, the Sendai Framework for Disaster Risk Reduction [9] calls for new ‘multi-hazard’ approaches to disaster risk reduction. Nevertheless, the framework does not indicate any standardized approach to address this requirement and to integrate multi-hazard considerations into national risk assessment procedures.

A similar urgency about multi-hazard assessment is felt in the scientific community. In a review dating back to 2012, Kappes et al. [10] pointed out the necessity of extending methodologies and tools from single to multiple hazard analysis. Moreover, they summarized the requirements arising from the international community with respect to multi-hazard analysis and indicated a set of challenges, some of which are still valid. Their survey resulted in a research agenda with strong downstream effects on the literature of the following period. Research on natural hazard multi-risk analysis is developing continuously, and recent papers providing meaningful enhancements include Dunant et al. [11], Tilloy et al. [12], Pourghasemi et al. [13], Liu et al. [14] and Pourghasemi et al. [15].

As a preliminary step towards the identification of a common standardized approach to multi-hazard risk management for CIs protection, this paper presents a critical review of state-of-the-art methodologies that contribute to risk assessment, including, to some extent, specific steps such as hazard probability evaluation, vulnerability assessment or damage assessment. After introducing the risk concept and its mathematical formulation in Section 2, two critical issues in performing the risk assessment of CIs are discussed in detail, together with some approaches to address them. Specifically, data availability and some aspects of system characterization are discussed in Section 3, and the modelling of multiple hazards and their interactions is tackled in Section 4. The core part of the review is the analysis of a series of methodologies for risk calculation, classified into three categories (mathematical and statistical, machine learning, graphs and networks) and presented in Section 5. All these methodologies are then critically compared in Section 6, in order to identify the pros and cons of each one. In the comparison, specific attention is paid to the two issues discussed in Sections 3 and 4, namely data requirements and the capacity to take multi-hazard interaction into account.

2. Defining Risk

We now discuss the main terminology adopted in this paper in the context of natural and man-made risks, together with their mathematical formalization.

The concept of risk, which has become more and more crucial in our society in the last decades, is formally defined by Kaplan and Garrick [16] as the “possibility of loss or injury” and the “degree of probability of such loss”. According to this definition, the concept of risk rests on two other concepts: the uncertainty, which is expressed as a probability of occurrence of a hazard, and the expected consequences, which can be defined in terms of facility loss, financial loss, fatalities, or down-time [16, 17]. This is expressed in mathematical terms by the following equation:

$R=P(I M) \times C$

As stated in this formulation, risk R can be defined as the product of the probability of occurrence P(IM) of a certain hazard with a prescribed intensity magnitude IM and the consequences C of the hazard, usually called impacts [17]. The severity of the impacts an asset faces depends in turn on the magnitude of the hazard and on the asset’s exposure and vulnerability. The exposure expresses the value of the asset, while the vulnerability measures its propensity to suffer damage from hazardous events and is linked to the concept of fragility, which is discussed in detail in the last part of Section 3. Indeed, UNISDR [9] defines risk as “the potential loss of life, injury, or destroyed or damaged assets which could occur to a system, society or a community in a specific period of time, determined probabilistically as a function of hazard, exposure, vulnerability and capacity”. In this definition, together with the concepts of hazard, exposure and vulnerability, capacity is also introduced, as a quantification of all the strengths, attributes and resources available within a certain system that can be used to reduce risks and increase its resilience.

Usually, risk is quantified in economic terms and the measure (in monetary terms) of the negative impact caused by a certain hazardous event (i.e. the damage) is named loss. In particular, risk is evaluated quantifying the exceedance rate of loss as:

$\nu(l)=\sum_{i=1}^{\text{EventsNo}} P\left(L>l \mid \text{Event}_{i}\right) \cdot F_{A}\left(\text{Event}_{i}\right)$

In this formula, EventsNo represents the number of hazard events a CI can be hit by; $F_{A}(\text{Event}_{i})$ is the annual frequency of occurrence of the $i$-th event; $P(L>l \mid \text{Event}_{i})$ is the probability that the loss exceeds $l$, conditional on the occurrence of $\text{Event}_{i}$. The exceedance rate of loss expresses the relation between a given loss and the annual frequency of occurrence of that loss or of a larger one. The graphical representation of this relationship is the loss exceedance curve (LEC), which provides the most complete description of risk [18].

Another important risk metric, which can be obtained from the exceedance rate of loss and describes the concept of loss well, is the annual average loss (AAL):

$AAL=\int_{0}^{\infty} \nu(l)\, dl$

where $\nu(l)$ is the exceedance rate of loss. AAL expresses the loss expectation, that is, the weighted average of all plausible loss values [18]. This versatile metric can be used to express the risk for an asset, a portfolio of assets, a city or a country. Both the LEC and the AAL can be obtained for a single hazard or for multiple hazards, which makes it possible to consider the aggregated multi-hazard risk [19].
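The two metrics above can be illustrated with a minimal Python sketch. It assumes a hypothetical event set in which every event has a known annual frequency and produces a single deterministic, strictly positive loss, so that the conditional probability of exceedance is either 0 or 1. The function names and all numeric values are illustrative assumptions, not taken from the reviewed literature.

```python
def exceedance_rate(loss_threshold, events):
    """nu(l): sum over events of P(L > l | event) * F_A(event).

    Assumption: each event yields one deterministic loss, so
    P(L > l | event) is 1 if the event loss exceeds l, else 0.
    """
    return sum(freq for freq, loss in events if loss > loss_threshold)


def annual_average_loss(events):
    """AAL: integral of nu(l) over l from 0 to infinity.

    With deterministic event losses nu(l) is a step function, so the
    integral is computed exactly as a sum of rectangles.
    """
    breakpoints = sorted({loss for _, loss in events})
    aal, lower = 0.0, 0.0
    for upper in breakpoints:
        # nu(l) is constant on [lower, upper) and equals nu(lower)
        aal += exceedance_rate(lower, events) * (upper - lower)
        lower = upper
    return aal


# Hypothetical event set: (annual frequency, loss in million euros)
EVENTS = [(0.10, 2.0), (0.02, 10.0), (0.005, 50.0)]
AAL = annual_average_loss(EVENTS)
```

Under these assumptions the integral reduces to the sum of frequency times loss over all events, which is a convenient sanity check for the step-wise integration.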

Table 1 clarifies terminology adopted in this paper, including the definitions for capacity, consequence/impact, damage, exposure, fragility, hazard, hazard event, loss, multi-hazard, multi-hazard risk, risk, vulnerability [20-24].

Table 1. Definitions of the manuscript

| Risk term | Definition |
|---|---|
| Capacity | Quantification of all the strengths, attributes and resources available within a certain system, that can be used to reduce risks and increase its resilience [9]. |
| Consequence or Impact | “The total effect, including negative effects (e.g., economic losses) and positive effects (e.g., economic gains), of a hazardous event or a disaster”. The term includes physical, monetary, human and environmental impacts. As a result, a consequence can be quantified using different units of measure [9, 16]. |
| Damage | Negative impacts, i.e., impacts that result in negative effects on assets, people, socioeconomic and environmental systems [20, 21]. |
| Exposure | “The situation of people, infrastructure, housing, production capacities and other tangible human assets located in hazard-prone areas”. Exposure is described through a series of characteristics of the exposed elements (or exposed assets), such as material, occupancy, economic value or number of people [9]. |
| Fragility | “The combination of exposure to risk and insufficient coping capacity of the state, system and/or communities to manage, absorb or mitigate those risks” [22]. In this manuscript, fragility is described as a measurable property of the system, which can be quantified by applying Equation (1). |
| Hazard | “A process, phenomenon or human activity that may cause loss of life, injury or other health impacts, property damage, social and economic disruption or environmental degradation. Hazards may be natural, anthropogenic or socio-natural in origin. Natural hazards are predominantly associated with natural processes and phenomena […]. Hazards may be single, sequential or combined in their origin and effects. Each hazard is characterized by its location, intensity or magnitude, frequency and probability” [9]. |
| Hazard event (or event scenario) | “A specific occurrence of a hazard […] often constrained by a spatio-temporal domain” [23]. |
| Loss | A measure (usually in monetary terms) of a certain damage [20, 21]. |
| Multi-hazard | “[Multi-hazard analyses refer to the] implementation of methodologies and approaches aimed at assessing and mapping the potential occurrence of different types of natural hazards in a given area. [The employed methods] have to take into account the characteristics of the single hazardous events […] as well as their mutual interactions and interrelations” [24]. |
| Multi-hazard risk (or impact) | A risk (or an impact), which is evaluated considering the effects of multiple hazards. |
| Risk | The “possibility of loss or injury” and the “degree of probability of such loss”. Risk is usually determined probabilistically as a function of hazard, exposure and vulnerability. In some cases also capacity is included [9, 16]. |
| Vulnerability | The totality of “the conditions determined by physical, social, economic and environmental factors or processes which increase the susceptibility of an individual, a community, assets or systems to the impacts of hazards” [9]. |

3. Data Availability and System Characterization

One of the first critical issues encountered when implementing the risk assessment of a CI is data collection. Gathering information is not easy in this field, because it requires retrieving information about events that have rarely (or never) happened. As a result, only in a few cases is the quantity of available data sufficient to perform a reliable risk assessment [16].

Two different scenarios can therefore be distinguished regarding data collection, as pointed out by Kaplan and Garrick [16]: in the first there is a lack of available data, while in the second data are abundant. The second situation is always preferable, because comprehensive datasets make it easier to model the CI risk assessment case under examination. Nevertheless, the extreme events analyzed are by nature so rare that very little data are often available. In that case, the limited data must be supplemented with additional knowledge, usually derived from expert judgment, for the model to work properly [25, 26]. Conversely, when data are abundant the analysis can rely on objective data only, with no need for external knowledge to enlarge the dataset. Since data scarcity is far more common than data abundance, the support of experts is crucial in order to gather information that is as specific and reliable as possible.

The two scenarios of data availability call for different procedures.

A rather different and in some way complementary perspective may be adopted considering the problem of data availability for risk assessment. This is based on the analysis of characteristics and consequent properties of the system and leads to the idea of fragility as a measurable quantity, as illustrated in the last part of this Section.

3.1 Data abundance

In the case of data abundance, information can be represented using triplets [16]. Indeed, the large quantity of data available makes it possible to define the possible scenarios related to a given type of threat, their probability of occurrence and the associated consequences. More precisely, for a considered CI, several scenarios can be identified and each scenario can be described by a triplet $\left\langle s_{i} ; p_{i} ; x_{i}\right\rangle$, where $s_{i}$ is the representation of the $i$-th scenario in terms of the considered threat, $p_{i}$ is the probability that the $i$-th scenario will happen, and $x_{i}$ are the consequences associated with the $i$-th scenario. All the triplets are collected into a set; starting from that set, threats are ordered by severity of damage and a cumulative curve is drawn. These cumulative curves, named “risk curves”, describe exhaustively the risk profile of a certain CI.
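The construction of a risk curve from a set of triplets can be sketched in Python as follows. The sketch assumes that the scenarios are mutually exclusive, so that their probabilities can simply be accumulated; the scenario names and numbers are hypothetical.

```python
def risk_curve(triplets):
    """Build a risk curve from (scenario, probability, consequence) triplets.

    Returns a list of (consequence, cumulative probability) points sorted
    by increasing consequence. Each cumulative probability is the chance
    of a consequence at least as severe as the one in that point.
    Assumption: scenarios are mutually exclusive.
    """
    ordered = sorted(triplets, key=lambda t: t[2])
    curve = []
    cumulative = 0.0
    # Accumulate from the most severe scenario down to the least severe
    for _, probability, consequence in reversed(ordered):
        cumulative += probability
        curve.append((consequence, cumulative))
    curve.reverse()
    return curve


# Hypothetical triplets <s_i; p_i; x_i> (consequences in million euros)
TRIPLETS = [
    ("flood", 0.05, 3.0),
    ("earthquake", 0.01, 20.0),
    ("blackout", 0.20, 0.5),
]
RISK_CURVE = risk_curve(TRIPLETS)
```

The resulting curve is non-increasing: the more severe the consequence, the lower the probability of reaching or exceeding it.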

3.2 Data scarcity

In case of lack of data, Bayes’ theorem [16], probability analysis [27-29], interval analysis [27-30] and probability bound analysis [27, 29] can be applied to address the problem. When data are scarce, the limited dataset must be properly integrated with subjective knowledge obtained from expert judgment. This subjective knowledge can be brought in following different approaches. The first method that can be applied is Bayes’ theorem, which involves three different probabilities: the prior probabilities P(h) and P(E), together with the conditional probability P(E|h). P(h) is the probability assigned to the hazard frequency h before the evidence E is observed; P(E) is the prior probability of the evidence E; P(E|h) is the conditional probability that evidence E would be observed if the true frequency of the hazard were actually h. The purpose of this method is to calculate the posterior probability P(h|E) by applying the following equation:

$P(h \mid E)=P(h) \cdot \frac{P(E \mid h)}{P(E)}$

The term P(h) is given by the expert, the conditional probability P(E|h) can be calculated from the likelihood function, whereas the prior probability P(E) is the sum or the integral of the numerator over all possible values of h.
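A minimal Python sketch of this update is given here for a discrete set of candidate frequencies. The Poisson likelihood for the number of observed events, the candidate frequencies and the expert prior are all assumptions made for illustration only.

```python
import math


def posterior(candidates, prior, events_observed, years):
    """Discrete Bayes update: P(h|E) = P(h) * P(E|h) / P(E).

    candidates: candidate annual frequencies h
    prior: expert-elicited probabilities P(h), one per candidate
    Assumption: the evidence E is a count of events over a number of
    years and follows a Poisson distribution with mean h * years.
    """
    likelihood = [
        math.exp(-h * years) * (h * years) ** events_observed
        / math.factorial(events_observed)
        for h in candidates
    ]
    numerator = [p * lik for p, lik in zip(prior, likelihood)]
    evidence = sum(numerator)  # P(E): sum of the numerator over all h
    return [n / evidence for n in numerator]


CANDIDATES = [0.01, 0.05, 0.10]  # hypothetical annual frequencies h
PRIOR = [0.5, 0.3, 0.2]          # hypothetical expert prior P(h)
POSTERIOR = posterior(CANDIDATES, PRIOR, events_observed=2, years=20)
```

With two events observed in twenty years, probability mass moves away from the lowest candidate frequency and towards the higher ones, even though the expert initially considered the lowest frequency the most likely.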

Two other approaches that can be applied to deal with data scarcity are probability analysis [27-29] and interval analysis [27-30]. The first method is used when the assumptions are strong and the probability distribution of the considered hazard is known; in this case, it is the variability that is propagated. On the other hand, when the assumptions must be relaxed and the goal is to propagate ignorance, interval analysis is used. Indeed, when knowledge is poor, it is usually not possible to say which probability distribution is associated with the hazard studied, and the extreme values are among the few pieces of information known. In these cases, defining an interval of values is the best option. The intervals can be obtained by direct arguments or constructed indirectly from assigned possibility functions or mass functions, in the framework of evidence theory. The disadvantage of probability analysis is that it is supposed to propagate variability, but the probability distributions chosen to describe the hazard do not always correspond to reality (e.g. the uniform distribution is sometimes used even if no evidence confirms that it applies to the hazard studied) [28]. Interval analysis, in turn, returns a more objective output, but the information it contains is affected by so much uncertainty that stakeholders cannot base their decisions on it [29].

The last approach that can be applied when few data are available is probability bound analysis, which combines the previous two methods (i.e. probability analysis and interval analysis). It belongs to the category of hybrid methods, since it inherits properties from both probability analysis and interval analysis and can therefore handle variability and ignorance in a single investigation. This opportunity can be exploited when two variables are studied, whether they refer to the same hazard or to two different hazards: for the first variable a probability distribution can be defined, while for the second no such knowledge is available. Variability is propagated through the first variable, whereas ignorance is propagated through the second. If, for instance, the two variables are multiplied, probability bound analysis returns the region in which the probability distribution of the product must lie [28]. The greater the ignorance, the larger the region; conversely, the greater the knowledge, the smaller the region.
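The idea of bounding the distribution of a product can be sketched in Python as follows. This is a simplified illustration rather than a full probability bound analysis: it assumes that both variables are non-negative, that the first is represented by random samples from a hypothetical lognormal distribution, and that the second is only known to lie in an interval.

```python
import random


def product_bounds(samples_x, y_low, y_high, threshold):
    """Bounds on P(X * Y <= threshold).

    X has a known distribution, represented here by samples; Y is only
    known to lie in [y_low, y_high].
    Assumption: X and Y are non-negative, so X * y_low <= X * Y <= X * y_high
    and the bounds are attained at the interval endpoints.
    """
    n = len(samples_x)
    upper = sum(x * y_low <= threshold for x in samples_x) / n
    lower = sum(x * y_high <= threshold for x in samples_x) / n
    return lower, upper


random.seed(0)  # fixed seed so the sketch is reproducible
X = [random.lognormvariate(0.0, 0.5) for _ in range(10_000)]

# Little ignorance about Y gives a narrow band, more ignorance a wide one
NARROW = product_bounds(X, 0.9, 1.1, threshold=1.0)
WIDE = product_bounds(X, 0.5, 2.0, threshold=1.0)
```

Repeating the calculation over a range of thresholds traces the lower and upper envelopes of the region; widening the interval for the second variable widens the gap between them.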

To conclude the analysis of the four methods that can be used to deal with poor data, the pros and cons of each approach can be summarized. The Bayes’ theorem approach makes the decision step simple, because a well-behaved decision theory can support the final scientific judgment [16, 30]. On the other hand, both probability analysis and interval analysis can capture different types of knowledge useful for the decision maker [27], but they return a result that is less informative than the output of the Bayes’ theorem approach [29]. Finally, probability bound analysis is convenient when two variables are studied and both variability and ignorance must be propagated [27, 29].

3.3 Fragility quantification

The approaches described so far highlight that risk is not always easily and objectively quantifiable. To address this issue, a change of perspective can be useful [31]. Indeed, quantifying how fragile a system is constitutes a parallel approach, since fragility is a measurable property of the system. In particular, three types of systems can be distinguished: robust systems, fragile systems and antifragile systems. If a system is robust, the shocks and stressors it is subject to do not produce any consequence. If a system is fragile, it is not under control and large negative effects can be experienced; in this system, the frequency distribution of events has a considerable mass on negative values. The opposite happens to antifragile systems: they encounter only positive extreme consequences and the frequency distribution of events has a substantial mass on positive values. Since fragility is a measurable property of the system, a heuristic method can be used to calculate the level of fragility of a system, represented by the following formula [32]:

$H=\frac{f(\alpha-\Delta)+f(\alpha+\Delta)}{2}-f(\alpha)$      (1)

The resulting value measures the deviation of the measured shock from the average shock. In this formula, the function f is the profit or loss at a certain level of the state variable considered, and $\alpha$ is the reference level of that variable. $\Delta$ is a deviation from $\alpha$, and the purpose of the calculation is to understand how much a deviation from the reference level impacts the system. If H is zero or close to it, the potential gain from a smaller value of the state variable is equal to the potential loss from an equivalently sized larger value. In this case the system is said to be robust. If H<0, the outcome is fragile, in the sense that the additional losses with a small unfavorable shock will be much larger than the additional gains with a small favorable shock. Conversely, if H>0, the outcome is antifragile, because the additional gains with a small favorable shock will be much larger than the additional losses with a small unfavorable shock. This is not the same as robustness, since with robustness higher volatility provides neither significant harm nor benefit [32].
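A minimal Python sketch of the heuristic follows. It assumes the second-difference form H = [f(α − Δ) + f(α + Δ)]/2 − f(α), which is the form under which a linear response gives H = 0, in line with the interpretation of the sign of H given above. The three profit-and-loss functions are hypothetical.

```python
def fragility_heuristic(f, alpha, delta):
    """H = (f(alpha - delta) + f(alpha + delta)) / 2 - f(alpha).

    f: profit or loss as a function of the state variable
    alpha: reference level of the state variable
    delta: deviation from the reference level
    """
    return (f(alpha - delta) + f(alpha + delta)) / 2 - f(alpha)


# Hypothetical profit-and-loss responses to a state variable x
def concave_response(x):
    return -x ** 2  # losses accelerate with the shock


def linear_response(x):
    return 3 * x  # gains and losses are symmetric


def convex_response(x):
    return x ** 2  # gains accelerate with the shock


H_FRAGILE = fragility_heuristic(concave_response, alpha=1.0, delta=0.5)
H_ROBUST = fragility_heuristic(linear_response, alpha=1.0, delta=0.5)
H_ANTIFRAGILE = fragility_heuristic(convex_response, alpha=1.0, delta=0.5)
```

For these quadratic examples the value of H equals minus or plus the square of the deviation, so its magnitude grows quickly as larger shocks are tested.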

4. Modelling Multiple Hazards and Their Interactions

When multiple hazards are considered in the risk analysis, we speak of “multi-hazard” approaches, and the resulting risk is called “multi-hazard risk”. Nevertheless, different definitions of multi-hazard are found in the literature and can be classified as: (i) definitions where multi-hazard refers to a series of hazards that are relevant to a given area, without considering any interaction among them [33]; (ii) definitions where multi-hazard refers to a series of hazards that are relevant to a given area, including their mutual interactions and interrelations [8, 34]. The second option should generally be preferred, because it yields a model closer to reality. Mutual interactions are attracting increasing interest because recent events have shown that they correspond to actual chains of events and cascading effects (e.g. the already cited Fukushima Daiichi nuclear disaster, Japan, 11th March 2011). In contrast to the analysis of single hazard events, the examination of events involving multiple hazards poses a series of challenges in each step of the risk analysis (hazard assessment, vulnerability evaluation, impact assessment, risk calculation) [35, 36].

The two kinds of definitions of multi-hazard – with or without introducing interactions – are representative of the two major categories in which available multi-hazard approaches can be classified:

  • Independent multi-hazards (does not consider hazard interactions)
  • Interacting multi-hazards (considers hazard interactions)

In the following, both approaches are described together with the mathematical methods to be applied for each of them. In the last part of the Section, flexible approaches that can be applied to both cases are defined.

4.1 Independent multi-hazards

The first category of approaches considers the multiple threats as independent. This approach is the most widespread and the easiest one, because less data and fewer calculations are needed. However, the resulting model is significantly less accurate. Furthermore, considering the events as independent can distort management priorities, increase vulnerability to other spatially relevant hazards or underestimate risk [37]. A method that can be used in these situations is the one described in Fleming et al. [38], which considers the exceedance probabilities of a given loss value. $P_{i}\left(L_{j}\right)$ is the annual probability of exceedance of the $j$-th loss for the $i$-th source. It represents the probability that a loss of more than $L_{j}$ euros will be caused by the risk source $i$ (e.g. earthquakes, floods, landslides, etc.). The total annual exceedance probability can be calculated as [38]:

$P\left(L_{j}\right)_{TOT}=1-\prod_{i}\left(1-P_{i}\left(L_{j}\right)\right)$

The formulation includes the contribution of every risk source, but no term that accounts for correlation among them. Therefore, it cannot model hazard interdependencies and simply treats the risk sources as independent.

The second multi-hazard approach makes it possible to consider correlations among hazards. Evaluating the correlations among events yields a more accurate model, which can lead to more informed decisions. Nevertheless, it is more expensive in terms of modelling and requires more information than independent multi-hazard approaches.
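The combination of independent risk sources can be sketched in Python as follows; the annual exceedance probabilities assigned to each source for a given loss value are hypothetical.

```python
import math


def total_exceedance_probability(source_probabilities):
    """P(L_j)_TOT = 1 - prod_i (1 - P_i(L_j)).

    Assumption (as in the formula): the risk sources are independent.
    """
    return 1.0 - math.prod(1.0 - p for p in source_probabilities)


# Hypothetical annual probabilities of exceeding the same loss L_j
P_SOURCES = {"earthquake": 0.01, "flood": 0.05, "landslide": 0.02}
P_TOTAL = total_exceedance_probability(P_SOURCES.values())
```

The total is slightly lower than the plain sum of the three probabilities (0.08), because the product form does not double count years in which more than one source exceeds the loss.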

4.2 Interacting multi-hazards

The second category of approaches considers the multiple threats as interdependent. If there is a correlation between two or more threats, there is an interaction among them. The interactions among threats can be unidirectional or bidirectional [37]. In unidirectional interactions, the primary hazard occurs first and the secondary hazard follows. In bidirectional interactions, the primary and secondary events can be of different types. Several authors in the literature suggest different classifications of hazard interaction mechanisms (e.g. [8, 14]).

In general, three main interaction relationships can be identified: triggering, increased probability, and catalysis or impedance [37-39]. Two events have a triggering relationship if the primary threat can result in the secondary hazard occurring. A triggering relationship can be characterized by a probability associated with a threshold that, once passed, gives rise to the secondary event. As an example, a tropical storm may trigger many landslides. The second type of relationship consists of a primary event that increases the probability of occurrence of a secondary event. This type of relationship involves a probability too, but in this case the probability quantifies how much the primary event alters environmental parameters so as to change the temporal proximity or specific characteristics of a secondary hazard. An example is a wildfire that increases the probability of ground heave. The third type of relationship involves a peril that catalyzes or impedes the occurrence of a secondary event. Examples are urbanization, which catalyzes storm-triggered flooding, and deforestation, which impedes wildfires.

The available methods which are used to model correlated events are based on a probabilistic approach [40], similar to the technique described by Fleming et al. [38], and on a three-level framework [41]. The probabilistic approach described by Stewart [40] has the purpose of calculating the expected loss produced by multiple hazards, considering the conditional probabilities among them, according to the following equation:

$E(L)=\sum{P(T)\cdot P(H|T)\cdot P(D|H)\cdot P(L|D)\cdot L}$

In this formula, P(T) is the annual probability that a certain threat (e.g. a terrorist attack or natural hazard) will occur; P(H|T) is the annual probability of a hazard (e.g. wind, heat, explosion) conditional on the threat; P(D|H) is the probability of damage (also called the vulnerability probability) conditional on the hazard; P(L|D) is the probability of a loss (e.g. due to economic damages, casualties and injuries) given a damage; L is the monetary loss if the full damage occurs. The conditional probabilities are difficult to quantify, but once they are available the formula gives the expected loss in monetary terms, taking into consideration also the connection among threats, hazards, damages and losses [40].
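The expected-loss formula can be sketched in Python as a sum over scenarios. The dictionary keys, scenario names and all probabilities and losses are hypothetical and serve only to show how the conditional terms chain together.

```python
def expected_loss(scenarios):
    """E(L) = sum of P(T) * P(H|T) * P(D|H) * P(L|D) * L over scenarios."""
    return sum(
        s["p_threat"] * s["p_hazard_given_threat"]
        * s["p_damage_given_hazard"] * s["p_loss_given_damage"] * s["loss"]
        for s in scenarios
    )


# Hypothetical scenarios (losses in thousand euros)
SCENARIOS = [
    {   # e.g. explosion following a deliberate attack
        "p_threat": 0.01, "p_hazard_given_threat": 0.5,
        "p_damage_given_hazard": 0.2, "p_loss_given_damage": 0.8,
        "loss": 1000.0,
    },
    {   # e.g. extreme wind following a severe storm
        "p_threat": 0.1, "p_hazard_given_threat": 0.3,
        "p_damage_given_hazard": 0.05, "p_loss_given_damage": 1.0,
        "loss": 200.0,
    },
]
E_LOSS = expected_loss(SCENARIOS)
```

Keeping each conditional probability as a separate field makes it easy to see which link in the chain (threat, hazard, damage or loss) dominates the expected loss of a scenario.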

Another method that helps to assess correlated events is the one presented by Liu et al. [41]. The three-level framework, as the name suggests, is a technique composed of three steps: a qualitative analysis, a semi-quantitative analysis and a quantitative analysis. The analysis becomes more rigorous and detailed as the user moves from one step to the next. The qualitative analysis consists of a flow-chart type list of questions that guides the end-user in understanding whether or not a multi-type assessment approach is required. At the second step, a semi-quantitative approach is applied and the interactions among the hazards are evaluated by means of a matrix structure based on system theory. This approach rests on understanding and describing the relationships among hazards in the evolution of the system. The square matrix has a number of rows equal to the number of hazards. Every cell is filled with an integer between 0 and 3 which expresses the level of interaction between the hazard of the row and the hazard of the column. Lastly, the quantitative analysis involves the use of a Bayesian network approach. It has the objective of estimating the probability of triggering/cascade effects and of modelling the time-variant vulnerability of a system exposed to multiple hazards. These targets are reached through the probabilities obtained with Bayes’ theorem: the probabilities are constantly updated with information from specific cases, in order to refine the model. Once the propagation pattern of the cascade effect is known, the occurrence probability of the cascade effect can be estimated with the following equation [41]:

${{P}_{cascade}}={{P}_{primary}}\cdot {{P}_{conditional}}$

In this formula, the cascade probability ($P_{cascade}$) is obtained as the product of the probability of occurrence of the primary hazard ($P_{primary}$) and the probability of occurrence of the events conditional on the primary hazard event ($P_{conditional}$).
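As a minimal sketch, the cascade probability can be computed as follows. The function name and the numerical values are hypothetical. Extending the product to a chain of several conditional probabilities is an assumption made here, valid only if each step depends solely on the previous one.

```python
import math


def cascade_probability(p_primary, conditional_probabilities):
    """P_cascade = P_primary times the product of the conditional probabilities."""
    probabilities = [p_primary, *conditional_probabilities]
    if any(not 0.0 <= p <= 1.0 for p in probabilities):
        raise ValueError("probabilities must lie in [0, 1]")
    return math.prod(probabilities)


# Hypothetical values: a storm (primary hazard) triggering a landslide,
# which in turn blocks an access road.
p_storm = 0.02
p_landslide_given_storm = 0.3
p_blockage_given_landslide = 0.5

p_two_step = cascade_probability(p_storm, [p_landslide_given_storm])
p_three_step = cascade_probability(
    p_storm, [p_landslide_given_storm, p_blockage_given_landslide]
)
print(p_two_step, p_three_step)
```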

4.3 Flexible approaches

Finally, there are methods that can be applied either in the case of independent hazards or in the case of correlated hazards. Techniques like statistical regression can work well in both cases because independence and correlation terms can be included in or excluded from the regression formula. Reed et al. [39] explain how logistic regression can be applied to take into account hazard interactions. A fragility function, which can be seen as the probability of failure of a CI conditional upon a hazard or set of hazards, as presented in Section 2, is then used. More specifically, a fragility function in the form of a logistic response function is introduced as follows:

$F=P(Failure|H)=\frac{\exp ({{\beta }_{0}}+{{\beta }_{1}}H)}{1+\exp ({{\beta }_{0}}+{{\beta }_{1}}H)}$     (2)

where, $H$ is the variable associated with the single hazard considered (e.g. wind speed, peak ground acceleration), and $\beta_{0}$ and $\beta_{1}$ are the parameters of the logistic regression model. In particular, $\beta_{0}$ is the intercept and dictates the placement along the hazard variable axis, while $\beta_{1}$ is the coefficient and controls the slope of the so-called "S" curve. The logit transformation of $F$ is:

$y(H)=\ln \left[ \frac{P(Failure)}{1-P(Failure)} \right]={{\beta }_{0}}+{{\beta }_{1}}H$      (3)

This procedure is applied because the outcome in the presented case is dichotomous, and a regression function cannot be fitted directly to these 0/1 values. Indeed, for a given value of the hazard variable $H$, it is only possible to observe whether a CI fails or not. As a consequence, it is necessary to resort to a quantitative variable, namely the probability of failure of the analyzed CI, in order to build a function that models the CI failure risk. By doing this, a continuous output in the range from 0 to 1 is obtained. This leads to the concept of fragility function, which is calculated in Eq. (2) as the conditional probability that the CI suffers a failure, given a certain hazard variable. Afterwards, a logit transformation is applied to Eq. (2), with the aim of simplifying it. Indeed, the logit model yields a linear function, which expresses how high the risk of failure is for the analyzed CI. Eq. (3) is easy to interpret because each estimated model coefficient represents the natural logarithm of the odds ratio of damage due to a unit increase in the independent variable, where the odds are the ratio between the probability of failure and the probability of non-failure. Logistic transformation and regression models for fragilities have been used successfully for evaluating fragilities of bridges, constructions, and lifeline systems for single hazard metrics [39, 42]. From this starting point, it is possible to consider more than one hazard and the interaction terms. If the hazards are all considered independent, the result is the following linear function:

$y={{\beta }_{0}}+{{\beta }_{1}}{{H}_{1}}+...+{{\beta }_{m}}{{H}_{m}}$

where, $\beta_{i}$ are the fitted coefficients and $H_{i}$ are the variables associated with the different hazards treated. Adding the interaction terms, the following linear function is obtained:

$y={{\beta }_{0}}+{{\beta }_{1}}{{H}_{1}}+...+{{\beta }_{m}}{{H}_{m}}+\sum\limits_{1\le j\ne k\le m}{{{\beta }_{jk}}{{H}_{j}}{{H}_{k}}}$

Thanks to the last term ($\sum_{1 \leq j \neq k \leq m} \beta_{j k} H_{j} H_{k}$), the relationships among the hazards are evaluated too.
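The logistic fragility model, with and without interaction terms, can be sketched in Python as follows. The function names, the 0-based indexing of hazards and all coefficient values are assumptions made for illustration; in practice the coefficients would come from fitting observed failure data.

```python
import math


def logit_response(hazards, beta0, betas, interactions=None):
    """Linear predictor y = beta0 + sum(beta_i * H_i) + sum(beta_jk * H_j * H_k)."""
    if len(hazards) != len(betas):
        raise ValueError("one coefficient is needed per hazard variable")
    y = beta0 + sum(b * h for b, h in zip(betas, hazards))
    for (j, k), beta_jk in (interactions or {}).items():
        if j == k:
            raise ValueError("interaction terms require j != k")
        y += beta_jk * hazards[j] * hazards[k]
    return y


def fragility(y):
    """Logistic response exp(y) / (1 + exp(y)), written to avoid overflow."""
    if y >= 0:
        return 1.0 / (1.0 + math.exp(-y))
    e = math.exp(y)
    return e / (1.0 + e)


# Hypothetical coefficients; indices are 0-based positions in `hazards`.
hazards = [2.0, 3.0]
y_independent = logit_response(hazards, beta0=-1.0, betas=[0.5, 0.2])
y_interacting = logit_response(
    hazards, beta0=-1.0, betas=[0.5, 0.2], interactions={(0, 1): 0.1}
)
p_independent = fragility(y_independent)
p_interacting = fragility(y_interacting)
print(p_independent, p_interacting)
```

With a positive interaction coefficient, the joint occurrence of the two hazards raises the failure probability above the value obtained when they are treated as independent.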

5. Approaches for Risk Calculation

As illustrated in the previous sections, assessing the risks a CI faces is a challenging task, especially if data are scarce and interactions among hazards are taken into account. After data collection and the identification of the different hazards the CI is exposed to (considered as correlated or independent), the next step of the risk assessment is the risk evaluation phase. In order to perform this step, it is necessary to apply one or more methods which allow quantifying the risk the monitored CI is subject to. Among the large number of available mathematical methodologies, twelve methods have been selected as suitable for risk assessment applications. Some of these methods are strictly devoted to risk calculation, while others are suitable for specific steps which contribute to the overall risk evaluation, such as hazard probability evaluation, vulnerability assessment or damage assessment.

More specifically, we selected the methods which satisfied the following criteria: (i) the method can be applied to single and/or multiple hazards; (ii) the method can provide a quantitative evaluation; (iii) the method can be applied to both man-made and natural hazards.

The identified methods can be classified into three main categories: (i) mathematical and statistical methodologies; (ii) machine learning approaches; (iii) graph and network techniques. In the following three paragraphs, the twelve approaches, divided into these three categories, are introduced and described. In addition, in order to understand how the presented methods actually work, twelve different examples have been provided, one for each method. These examples include both man-made and natural hazards and have been selected directly from the papers referred to in the definition of each approach. Furthermore, Section 6 provides a critical comparison of considered methods with respect to the general critical issues identified in the first part of this paper.

5.1 Mathematical and statistical methods

Inside the cluster of the mathematical and statistical approaches, three different methods have been identified as the most effective according to our criteria: (i) Bayesian Belief Network (BBN) technique; (ii) logarithmic regression implementation; (iii) early warning index calculation.

The first method, the BBN technique, consists of using a set of known variables as nodes of a causal network, able to model the cause-effect relationships between the parameters and to consider model uncertainties [43]. A BBN model can perform both diagnostic and predictive analysis and can be used for several applications in the field of risk assessment, such as analyzing domino effects in technological accidents [44], performing probabilistic vulnerability assessments or damage assessments [45-47], and evaluating the probability of failure (fragility) of CIs [48], among others. The BBN technique requires few data and can be applied both to single hazards and to multiple hazards, being able to consider interactions among them. Van Verseveld et al. [46] use this method to establish a relationship between observed damage caused by a hurricane and multiple hazard indicators, in order to make better probabilistic predictions. Hazard data are collected in the form of Local Hazard Indicators (LHIs), which include the inundation depth, the flow velocity, the wave attack and the scour depth. The CIs hit by the hurricane are categorized, using aerial images, into four classes according to the degree of damage they suffered: affected, minor damage, major damage, destroyed. The LHIs are the parent nodes of the BBN, while the damage levels are the child nodes. There is a causal relationship between the hazard indicators and the degree of damage. This relation is represented by Bayes' theorem $P\left(F_{i} \mid O_{j}\right)=P\left(O_{j} \mid F_{i}\right) \cdot P\left(F_{i}\right) / P\left(O_{j}\right)$, where $F_{i}$ is the $i$-th forecast, in this case the damage level, and $O_{j}$ is the $j$-th observation, in this case the LHIs. The term $P(\cdot)$ denotes the probability of the term in parentheses.
Applying Bayes’ theorem to the available data, it is possible to build the BBN and use it to predict the level of damage a certain CI suffers, given the four LHIs. This method can be validated by means of the Log-Likelihood Ratio (LLR) and the Pearson correlations. The LLR compares the forward prediction (conditional distribution) to the prior prediction (marginal distribution). The Pearson correlations describe the inherent correlations in the multivariate data set and help interpret the BBN results.
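The use of Bayes' theorem on observed data can be sketched with a deliberately simplified example: a single discretized hazard indicator (instead of the four LHIs) and two damage classes. This is not a full BBN and not the implementation of Van Verseveld et al. [46]; the function name, the class labels and the counts are hypothetical.

```python
from collections import Counter


def posterior(records, observation):
    """Return P(F | O = observation) for each damage level F.

    `records` is a list of (damage_level, observed_indicator) pairs.
    Applies P(F|O) = P(O|F) * P(F) / P(O) with frequencies as probabilities.
    """
    n = len(records)
    count_f = Counter(f for f, _ in records)
    count_joint = Counter(records)
    p_obs = sum(1 for _, o in records if o == observation) / n
    if p_obs == 0:
        raise ValueError("observation never occurs in the records")
    result = {}
    for f, c in count_f.items():
        p_f = c / n                                   # prior P(F)
        p_o_given_f = count_joint[(f, observation)] / c  # likelihood P(O|F)
        result[f] = p_o_given_f * p_f / p_obs
    return result


# Hypothetical survey: damage level vs. discretized inundation depth.
records = (
    [("minor", "shallow")] * 6 + [("minor", "deep")] * 1
    + [("major", "shallow")] * 2 + [("major", "deep")] * 3
)
damage_given_deep = posterior(records, "deep")
print(damage_given_deep)
```

The posterior probabilities sum to one over the damage classes, which is a quick sanity check on any table of this kind.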

The second technique, logarithmic regression implementation, combines various attributes into a logarithmic transformation with the aim of estimating the risk related to a certain hazard [49]. This method, which requires some expert knowledge, can be applied both to single and to multiple hazards without considering the interactions among them. In the application described by Chatterjee and Abkowitz [49], the level of risk of a terrorist attack in a certain area is evaluated, given its population density and its number of CIs. In particular, the risk is expressed as an expected annual economic loss, named risk-cost. The risk-cost associated with terrorist events is a fraction of the risk-cost for all the possible hazards. The total risk-cost, considering each scenario (natural hazards, man-made accidents and terrorist attacks) as independent, is calculated by applying the following formula:

$Risk_{ah}^{c}=Risk_{n}^{c}+Risk_{m}^{c}+Risk_{i}^{c}$

The total risk-cost $Risk_{ah}^{c}$ covers all the possible hazards, while the term $Risk_{n}^{c}$ is the risk-cost for natural events, the term $Risk_{m}^{c}$ is the risk-cost for man-made accidents and the term $Risk_{i}^{c}$ is the risk-cost for intentional acts such as terrorist events. The last term, the risk-cost for intentional acts, which is the focus of the analyzed study, is calculated with a logarithmic regression approach. The logarithmic transformation avoids negative values, which suits the risk-cost, since it is always positive. The function that Chatterjee and Abkowitz [49] obtained by fitting the available data is the following:

$\ln (Risk_{i}^{c})=-18.46+(1.63\cdot \ln ({{p}_{dw}}))+(0.0002\cdot \ln ({{p}_{dw}})\cdot {{s}_{ci}})$

In this formula, the risk-cost deriving from a terrorist threat $Risk_{i}^{c}$ is calculated as a function of the density-weighted population $p_{dw}$, measured in population$^{2}$/mile$^{2}$, and the unweighted sum of the number of CIs $s_{ci}$. This procedure could potentially be applied also to the natural and the man-made hazards, in order to assess the total risk-cost.
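A minimal sketch of how the fitted function and the total risk-cost could be evaluated is given below. Only the three regression coefficients come from the formula above; the function names and the input values are hypothetical, and the result is expressed in whatever monetary unit the fitted model uses.

```python
import math


def intentional_risk_cost(p_dw, s_ci):
    """Evaluate ln(Risk_i) = -18.46 + 1.63 ln(p_dw) + 0.0002 ln(p_dw) s_ci."""
    if p_dw <= 0:
        raise ValueError("density-weighted population must be positive")
    ln_risk = -18.46 + 1.63 * math.log(p_dw) + 0.0002 * math.log(p_dw) * s_ci
    return math.exp(ln_risk)  # the exponential is always positive


def total_risk_cost(risk_natural, risk_man_made, risk_intentional):
    """Risk_ah = Risk_n + Risk_m + Risk_i, scenarios treated as independent."""
    return risk_natural + risk_man_made + risk_intentional


# Hypothetical area: same population, different numbers of CIs.
risk_few_cis = intentional_risk_cost(p_dw=1.0e6, s_ci=100)
risk_many_cis = intentional_risk_cost(p_dw=1.0e6, s_ci=500)
print(risk_few_cis, risk_many_cis)
```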

The last statistical approach, early warning index calculation, is introduced by Hamadeh et al. [50] to perform a near real-time wildfire risk assessment. The calculated index indicates whether a risk threshold has been passed and how endangered the analyzed CI is. In this way, it is possible to intervene in time in order to significantly reduce the impacts of forest fires. The quantity of observed data required is high and, moreover, the method, which can be applied to single-hazard and multiple-hazard cases, is not able to consider interactions among hazards. The variables considered relevant and included in the index are temperature, dew point and upper layer soil temperature, because they are strongly correlated with fire occurrences. The forest danger index (FDI) is calculated as:

$FDI=1.18\cdot T+1.07\cdot S+D$

where, $T$ is the temperature (℃), $S$ is the soil temperature (℃), and $D$ is the dew point (℃). The risk of forest fire increases as the FDI becomes higher. When a certain threshold is exceeded, the probability of occurrence of a forest fire becomes very high. The performance of the index is evaluated by the authors in terms of precision, accuracy, specificity, sensitivity and AUC (area under the curve).
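The index and the threshold check can be sketched as follows. The coefficients are those of the formula above; the function names, the input readings and, in particular, the threshold value are hypothetical, since the text does not report the threshold used by Hamadeh et al. [50].

```python
def forest_danger_index(temperature, soil_temperature, dew_point):
    """FDI = 1.18 * T + 1.07 * S + D, with all inputs in degrees Celsius."""
    return 1.18 * temperature + 1.07 * soil_temperature + dew_point


def exceeds_threshold(fdi, threshold):
    """Return True when the index passes the chosen warning threshold."""
    return fdi > threshold


# Hypothetical sensor readings and a hypothetical warning threshold.
fdi = forest_danger_index(temperature=30.0, soil_temperature=25.0, dew_point=10.0)
warning = exceeds_threshold(fdi, threshold=60.0)
print(round(fdi, 2), warning)
```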

5.2 Machine learning techniques

Among available machine learning techniques, six different methods have been identified as the most powerful, in line with our criteria: (i) Artificial Neural Network (ANN); (ii) Support Vector Machine (SVM); (iii) Boosted Regression Tree (BRT); (iv) Generalized Additive Model (GAM); (v) Genetic Algorithm Rule-Set Production (GARP); (vi) Quick Unbiased Efficient Statistical Tree (QUEST).

The ANN methodology, described by Pilkington and Mahmoud [51], aims at using the neural network algorithm to find a risk index that measures the potential impact of one or more hazards on the considered CIs, in terms of damage level. This approach, which requires few data to work, can face both single and multiple hazards. An ANN is a computer model created to reproduce how the human brain learns. Indeed, an ANN is composed of different neurons, divided into three layers: an input layer, a hidden layer, and an output layer. The principal purpose of this method is to find statistical correlations between the input and the output variables. The main mathematical rule ANN is based on is the following:

${{s}_{i}}={{b}_{i}}+\sum\limits_{j=1}^{i-1}{{{W}_{ij}}{{x}_{j}}}$

This formula describes a link from a neuron $j$ to a neuron $i$ located on two consecutive layers. In the case presented, the two neurons can be respectively on the input and the hidden layer or on the hidden and the output layer. The term $x_{j}$ is the value of the $j$-th neuron, $W_{ij}$ is the weight between neurons $j$ and $i$, and $b_{i}$ is the bias related to neuron $i$. The value obtained, $s_{i}$, can be seen as the potential of the $i$-th neuron. This formula is applied to all the pairs of neurons on consecutive layers of the network, in order to find biases and weights which fit the model. Through backpropagation, these terms are refined and the final model can derive output variables from the input provided. In the case described by Pilkington and Mahmoud [51], the input variables used to model the damage caused by a hurricane on the considered CIs include wind speed, pressure, rainfall accumulation, population, and landfall location(s). The output variable is an integer between 0 and 5 which predicts, from the input variables, the level of economic damage a specific hurricane can cause to some CIs. In order to apply this method, a suitable number of inputs and hidden layers must be found: too few or too many neurons can lead to unreliable results.
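The neuron potential can be illustrated with a forward pass through one layer. In this sketch the sum runs over all neurons of the previous layer, and a sigmoid activation is applied to the potentials; the activation function, the function names and all weights are assumptions, not details reported for the model of Pilkington and Mahmoud [51].

```python
import math


def layer_potentials(inputs, weights, biases):
    """Compute s_i = b_i + sum_j W_ij * x_j for every neuron i of a layer.

    `weights[i][j]` is the weight from neuron j of the previous layer
    to neuron i of the current layer.
    """
    return [
        b_i + sum(w_ij * x_j for w_ij, x_j in zip(row, inputs))
        for row, b_i in zip(weights, biases)
    ]


def sigmoid(s):
    """Assumed activation that maps a potential to the range (0, 1)."""
    return 1.0 / (1.0 + math.exp(-s))


# Hypothetical two-input, two-neuron hidden layer.
x = [1.0, 2.0]
W = [[0.5, -0.25], [1.0, 1.0]]
b = [0.0, -3.0]

potentials = layer_potentials(x, W, b)
activations = [sigmoid(s) for s in potentials]
print(potentials, activations)
```

Training by backpropagation would adjust `W` and `b`; that step is omitted here.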

The SVM, BRT and GAM methods are three state-of-the-art models that can be used to classify the level of risk that CIs, or territorial systems in general, face when subjected to some hazards. All three methodologies require few data and can handle both single and multiple hazard scenarios, but cannot evaluate interactions among hazards. The aim of the case study presented by Rahmati et al. [52] is to produce an integrated multi-hazard exposure map for a mountainous area (Asara watershed, Iran), able to predict disaster-prone areas considering the impacts from avalanches, rock falls, and floods. Hazard maps based on BRT, GAM, and SVM models are created for each of the three considered hazards separately. Then, the best model for each hazard type is selected and a weighted integration method is applied to conduct the multi-hazard susceptibility analysis and produce the final multi-hazard susceptibility map.

The SVM is an approach that transforms the input data to a new feature space, where a decision function built through an optimal hyperplane is used to classify the data. The decision function associated with the optimal hyperplane is the following:

$g(x)=\operatorname{sign}\left(\sum\limits_{i=1}^{n}{y_{i}\alpha_{i}K(x_{i},x_{j})}+b\right)$

where, $x_{i}$ is the input class, $y_{i}$ is the output class, $\alpha_{i}$ is a Lagrange multiplier, $K\left(x_{i}, x_{j}\right)$ is the kernel function and $b$ is the intercept term of the hyperplane. The outcome of this function is a positive or a negative value, which defines whether the considered point of the study area is predicted to be affected by the hazard or not. The second technique used is the BRT, which combines regression trees with a boosting algorithm. The combination improves on the performance of individual trees through an iterative procedure, which produces a new tree at every step and thereby reduces the value of the loss function that measures the distance from the optimum. As for the SVM method, the goal of the BRT is to return a value that classifies the considered map point as risky or not. The third and last method applied by Rahmati et al. [52] is the GAM. This approach is a combination of the Generalized Linear Model and the additive model, merging a link function with smoothed functions. The link function relates the mean of the dependent variable to the smoothed functions of the independent variables. The resulting equation is the following:

$G(y)=\alpha +{{f}_{1}}({{x}_{1}})+...+{{f}_{m}}({{x}_{m}})$

where, $G(\cdot)$ is the link function, $\alpha$ is the intercept, $x_{i}$ is the independent variable, $y$ is the expected value and $f(\cdot)$ is the smoothed function. The goal of this method is to find a link function that allows classifying each map point as risky or not.
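Returning to the SVM decision function introduced earlier in this section, the following sketch evaluates it for a new point. The radial basis function kernel is an assumed choice, and the support vectors, multipliers and intercept are hypothetical hand-picked values rather than the result of training; in a real application they would be obtained by fitting the model to labelled map points.

```python
import math


def rbf_kernel(u, v, gamma=1.0):
    """Assumed kernel: K(u, v) = exp(-gamma * ||u - v||^2)."""
    return math.exp(-gamma * sum((a - b) ** 2 for a, b in zip(u, v)))


def svm_decision(x, support_vectors, labels, alphas, intercept, kernel=rbf_kernel):
    """Return +1 or -1 from sign(sum_i y_i * alpha_i * K(x_i, x) + b)."""
    score = sum(
        y_i * a_i * kernel(x_i, x)
        for x_i, y_i, a_i in zip(support_vectors, labels, alphas)
    ) + intercept
    return 1 if score >= 0 else -1


# Hypothetical model with two support vectors in a two-feature space.
support_vectors = [(0.0, 0.0), (2.0, 2.0)]
labels = [1, -1]      # +1: hazard-prone, -1: not hazard-prone
alphas = [1.0, 1.0]   # Lagrange multipliers (illustrative)
intercept = 0.0

near_first = svm_decision((0.1, 0.1), support_vectors, labels, alphas, intercept)
near_second = svm_decision((2.0, 1.9), support_vectors, labels, alphas, intercept)
print(near_first, near_second)
```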

The last two machine learning approaches presented are the GARP and QUEST techniques. They can be applied with the aim of classifying CIs according to their level of risk. These two methods require few data to work and can be used for single and for multiple hazards. Darabi et al. [53] test the two techniques in evaluating the risk of flood for selected map points starting from some input variables, which include rainfall, elevation, slope percent and distance to the river. The GARP approach adopts sets of rules to deduce output variables from input variables. The method adds a genetic algorithm to this mechanism, with the aim of making the process iterative and testing as many sets of rules as possible, in order to find the best-fit rule-set, which defines the output associated with each input, given its distinctive variables. The QUEST technique is based on classification trees instead of sets of rules, but has an approach similar to the GARP method. Indeed, QUEST is a tree-structured classification algorithm that yields a growing binary split decision tree. It employs a sequential tree growing method, which uses linear discriminant analysis in splitting tree nodes. In addition, it is unbiased in choosing splitting rules and does not use an exhaustive variable search routine. The great advantage of the GARP and QUEST methods is that they have simple requirements in terms of input data, computational time and costs. As a consequence, these two methodologies are particularly interesting for the risk assessment of CIs when available data are very limited.

5.3 Graph and network methods

Inside the cluster of the graph and network methods, three different techniques have been selected as the most efficient in agreement with our criteria: (i) game theory application; (ii) complex network methodology; (iii) multi-level complex network formulation.

The first technique, the game theory application, defined by Major [54] and Hausken et al. [55], focuses on the behaviours that attackers and defenders adopt when facing different hazards, given the value of every CI as known a priori. The attackers are mostly interested in hitting the most vulnerable CIs, while defenders have the objective of protecting the CIs which can suffer higher damages, minimizing the risks they are exposed to. The game theory technique is mainly applied to terrorist attacks, but can also be used to deal with natural hazards and is able to model both single and multiple hazards. The data needed rely mostly on subjective evaluation and a priori definitions, so expert knowledge is fundamental for this methodology. In each game theory application, there is a defender who tries to protect itself against an attacker’s move. The possible cases are (i) a simultaneous move by the two actors, (ii) a move of the defender which precedes the one of the attacker, and (iii) a move of the attacker which happens before the move of the defender. Usually, a set of CIs is defined and numbered from 1 to $N$, each characterized by a value $V_{i}$. The attacker has total resources $A_{T}$ and must assign an attacking capacity $A_{i}$ to every CI. Similarly, the defender must allocate its resources $D_{T}$ among all the CIs, assigning a defensive capacity $D_{i}$ to each. The total destruction of the $i$-th CI occurs with a probability given by the function $p\left(V_{i}, A_{i}, D_{i}\right)$. This probability is the starting point for the definition of the expected loss function $EL$:

$EL=\sum\limits_{i}{{{V}_{i}}\cdot p({{V}_{i}},{{A}_{i}},{{D}_{i}})}$

The attacker wants to maximize the EL, while the defender wants to minimize it. This is known as a zero-sum game with payoff EL to the attacker. The way to solve this problem depends on the situation: for example, if the defender and the attacker are acting simultaneously, the defender should apply the minimax criterion. This consists of choosing a strategy that results in the lowest possible worst-case EL, regardless of which CI the attacker selects to hit. This implies that the resulting EL among all the defended CIs will be equal, while the EL among undefended targets will be less. The more valuable CIs should then be equalized in terms of their expected losses. Less valuable CIs may be left undefended, because an attack there, even if successful with 100% certainty, will result in a loss that is less than the EL of the defended CIs. In summary, the expected loss function can be used to solve a game theory case like the one presented. It helps in finding the attacking and defensive capacities to allocate in order to maximize (for the attacker) or minimize (for the defender) the expected losses.
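The expected loss function and the minimax reasoning can be sketched for a small case. Several simplifying assumptions are made here and are not taken from Major [54] or Hausken et al. [55]: the destruction probability is modelled with a ratio form $A_i/(A_i+D_i)$, the attacker concentrates all resources on a single CI, and the defender's allocation between two CIs is searched on a coarse grid. All names and values are hypothetical.

```python
def destruction_probability(attack, defense):
    """Assumed ratio-form probability A / (A + D); zero if there is no attack."""
    return 0.0 if attack == 0 else attack / (attack + defense)


def expected_loss(values, attack, defense):
    """EL = sum_i V_i * p(V_i, A_i, D_i) under the assumed probability."""
    return sum(
        v * destruction_probability(a, d)
        for v, a, d in zip(values, attack, defense)
    )


def worst_case_loss(values, defense, attack_total):
    """Loss if the attacker puts all resources on the most damaging CI."""
    return max(
        v * destruction_probability(attack_total, d)
        for v, d in zip(values, defense)
    )


def minimax_defense_two_cis(values, defense_total, attack_total, steps=100):
    """Grid search for the split of defense_total minimizing the worst case."""
    best_defense, best_loss = None, float("inf")
    for k in range(steps + 1):
        d1 = defense_total * k / steps
        defense = (d1, defense_total - d1)
        loss = worst_case_loss(values, defense, attack_total)
        if loss < best_loss:
            best_defense, best_loss = defense, loss
    return best_defense, best_loss


# Hypothetical case: two CIs with values 100 and 50.
values = (100.0, 50.0)
best_defense, best_loss = minimax_defense_two_cis(
    values, defense_total=3.0, attack_total=1.0
)
print(best_defense, round(best_loss, 2))
```

In this example the search places more defense on the more valuable CI, and the worst-case losses of the two CIs end up nearly equal, in line with the equalization argument above.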

The complex network construction [56] can be used to find topological patterns that cannot be seen in other ways, in order to prevent dangerous situations. This approach needs few data but is able to deal only with single hazards. Daskalaki et al. [56] apply the method to perform a short-term hazard assessment of large earthquakes. In particular, the tools of complex network theory are exploited to identify potential spatio-temporal foreshock patterns that could add value to short-term earthquake forecasting or hazard assessment. A complex network divides the geographical space of the study area into small square cells of equal dimension, each representing a node of the graph. The edges are traced between nodes that experience seismic events immediately successive in time. If $t_{i}$ and $t_{i+1}$ are two successive seismic events, the two cells, $j$ and $k$, where they occur are linked with a directed edge. From the resulting graph, different metrics can be calculated in order to characterize its topology: the small-world index (SW), the average clustering coefficient (ACC), the betweenness centrality (BC), and the mean degree of the underlying degree distribution, among others. The analysis of these indexes is able to reveal statistically significant changes in the network topology two months before a catastrophic seismic event and can help in reducing the risk a CI is exposed to.
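The construction of the network from an event catalogue can be sketched as follows. The event format `(time, x, y)`, the cell size, the function names and the sample data are hypothetical; repeated links are collapsed into a single directed edge and self-loops are kept, which are choices made here for simplicity. Only the mean out-degree is computed, not the other metrics listed above.

```python
import math


def cell_of(x, y, cell_size):
    """Map a location to the index of the square cell that contains it."""
    return (math.floor(x / cell_size), math.floor(y / cell_size))


def build_event_network(events, cell_size):
    """Link the cells of events that are immediately successive in time.

    `events` is an iterable of (time, x, y) tuples. Returns the set of
    nodes (cells) and the set of directed edges (cell_from, cell_to).
    """
    ordered = sorted(events)  # tuples sort by time first
    cells = [cell_of(x, y, cell_size) for _, x, y in ordered]
    nodes = set(cells)
    edges = set(zip(cells, cells[1:]))
    return nodes, edges


def mean_out_degree(nodes, edges):
    """Average number of distinct outgoing edges per node."""
    return len(edges) / len(nodes) if nodes else 0.0


# Hypothetical catalogue of four events on a grid with unit cells.
events = [(0, 0.2, 0.3), (1, 1.5, 0.1), (2, 0.7, 0.9), (3, 2.2, 2.8)]
nodes, edges = build_event_network(events, cell_size=1.0)
print(sorted(nodes), sorted(edges), mean_out_degree(nodes, edges))
```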

The third network method analysed is the multi-level complex network formulation, developed by Lacasa et al. [57], which is very useful for multi-hazard cases and requires few data. This approach starts from two or more event time-series, which can refer to one or more CIs, and transforms them into graphs by means of a mathematical technique known as ‘horizontal visibility’. The aim of this approach is to capture the topological properties of the time-series and to help in the prevention of natural or man-made hazards, similarly to the previous method. The multi-level complex network formulation can be applied to multi-hazard cases which are described by different time-series that all have the same number $N$ of data points, each one corresponding to the same time instant. These time-series can be transformed into complex networks through the horizontal visibility criterion and are then analysed together. In the application of the horizontal visibility criterion, every observation of each time-series becomes a node of the graph and the edges of every node are determined by the following rule: two nodes $i$ and $j$ are linked by an edge if the associated time-series realizations $x(i)$ and $x(j)$ have horizontal visibility, i.e. if every intermediate datum $x(k)$ satisfies the ordering relation $x(k)<\inf \{x(i), x(j)\}, \forall k: i<k<j$. If the number of time-series considered is $M$, an $M$-layer multiplex network, which is called 'multiplex visibility graph', is then built. In this multiplex visibility graph, every layer $\alpha$ corresponds to the horizontal visibility graph associated with the time-series of the state variable $\left\{x^{[\alpha]}(t)\right\}_{t=1}^{N}$. Since all the considered time-series have the same number of observations, $N$, all the resulting networks have the same number of nodes and can be analysed jointly with this method.
Once all the graphs are built, two metrics are calculated: the ‘average edge overlap’ and the ‘interlayer mutual information’. The first index is used as a proxy of the overall coherence of the original multivariate time-series, with higher values indicating a higher correlation among time-series. The second index expresses the weights of the edges of a graph of layers, this being a projection of the original multiplex visibility graph into a (single-layer) weighted graph of $M$ nodes, where each node represents one layer. The weights of the edges of such a graph denote the magnitude of mutual information, such as correlation and causality, between layers. The accuracy of the two multiplex metrics in distinguishing different dynamical phases makes it possible to efficiently analyse large multivariate time-series.
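The horizontal visibility criterion and the first multiplex metric can be sketched in a few lines. The edge rule follows the ordering relation stated above. The average edge overlap is computed here under one assumed definition, namely the average fraction of layers in which an edge appears, taken over all edges present in at least one layer; the exact formula of Lacasa et al. [57] should be checked before relying on it. The interlayer mutual information is not covered, and the function names and series are hypothetical.

```python
def horizontal_visibility_edges(series):
    """Return the set of edges (i, j), i < j, of the horizontal visibility graph.

    Nodes i and j are linked if every x(k) with i < k < j is strictly
    smaller than min(x(i), x(j)).
    """
    edges = set()
    n = len(series)
    for i in range(n - 1):
        highest_between = float("-inf")
        for j in range(i + 1, n):
            if highest_between < min(series[i], series[j]):
                edges.add((i, j))
            highest_between = max(highest_between, series[j])
            if highest_between >= series[i]:
                break  # x(j) blocks the view from i to any later node
    return edges


def average_edge_overlap(layers):
    """Assumed definition: mean fraction of layers containing each edge."""
    all_edges = set().union(*layers)
    if not all_edges:
        return 0.0
    total = sum(len(layer) for layer in layers)
    return total / (len(layers) * len(all_edges))


# Two hypothetical time-series with the same number of observations.
series_a = [3, 1, 2, 4]
series_b = [1, 3, 2, 4]
layer_a = horizontal_visibility_edges(series_a)
layer_b = horizontal_visibility_edges(series_b)
overlap = average_edge_overlap([layer_a, layer_b])
print(sorted(layer_a), sorted(layer_b), overlap)
```

Under this definition the overlap ranges from $1/M$, when no edge is shared between layers, to 1, when all layers are identical.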

6. Critical Comparison of Analysed Approaches

In Section 5, a series of possible methodologies that can be applied to quantify the risk a CI is subject to has been reported. The selection of the best approach to be used in operational applications is not a simple task, because of the several factors to consider, such as data availability, type of hazards analyzed, computational requirements, and many others. Furthermore, the requirements are usually very strict, in order to support decision-makers with reliable and comprehensive outcomes for long-term planning or short-term response to emergencies. The crucial issues to focus on when selecting the most suitable methodology for real applications are:

  1. Data scarcity, which is a very frequent obstacle, especially when emergent hazards must be modelled without any statistical basis. Therefore, methodologies capable of working efficiently with few data are preferable and easier to implement;
  2. Interaction mechanisms among hazards and high level of interconnection of CI components, which significantly increase the possibility of disruptive cascading effects. Therefore, risk assessment methodologies which allow the modelling of interdependencies, both among hazards and among CI components, are preferable, and they should be supported by the implementation of suitable data models in the tool (e.g. shifting from classical SQL databases to graph-based ones);
  3. Different hazard types, which can hit a CI and can vary from natural to man-made. It is important to be able to work with all the different types of hazards by simply adapting the method applied. Therefore, the methods that can be easily implemented for all the different types of hazards must be preferred to the ones which have been built only for a specific type of threat;
  4. Computational cost and application speed, which are fundamental features to be considered for the proper functioning of the application. Indeed, the faster the methodology runs, the better it is for the stakeholders, who may need to take quick decisions when facing rapidly changing situations.

All the methods presented in Section 5 are now critically compared, in order to identify the pros and the cons of each one. The comparison is performed using, as criterion, how the different methods address each of the four issues previously illustrated – data requirements and the capacity of taking into account multi-hazard interactions (previously discussed in Sections 3 and 4 respectively), together with the ability to model natural and man-made hazards and the required computational effort.

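Table 2. Comparative analysis of the presented methodologies

| Category | Method | Data Quantity: Observed Data | Data Quantity: Expert Knowledge | Computational Cost |
| --- | --- | --- | --- | --- |
| Mathematical & Statistical Methods | Bayesian Belief Network | MEDIUM | MEDIUM | HIGH |
| Mathematical & Statistical Methods | Logarithmic Regression | MEDIUM | MEDIUM | MEDIUM |
| Mathematical & Statistical Methods | Early Warning Index | HIGH | NOT REQUIRED | MEDIUM |
| Machine Learning Techniques | Artificial Neural Network | MEDIUM | NOT REQUIRED | HIGH |
| Machine Learning Techniques | Support Vector Machine | MEDIUM | LOW | MEDIUM |
| Machine Learning Techniques | Boosted Regression Tree | MEDIUM | LOW | MEDIUM |
| Machine Learning Techniques | Generalized Additive Model | MEDIUM | LOW | MEDIUM |
| Machine Learning Techniques | Genetic Algorithm Rule-Set Production | LOW | LOW | LOW |
| Machine Learning Techniques | Quick Unbiased Efficient Statistical Tree | LOW | LOW | LOW |
| Graphs & Networks Approaches | Game Theory Application | LOW | HIGH | MEDIUM |
| Graphs & Networks Approaches | Complex Network Methodology | MEDIUM | NOT REQUIRED | MEDIUM |
| Graphs & Networks Approaches | Multi-Level Complex Network Formulation | LOW | NOT REQUIRED | LOW |

The Single Hazard, Multiple Hazards (WITHOUT Interactions, WITH Interactions), Natural Hazards and Man-made Hazards columns of Table 2 contain dot symbols that are not reproduced here.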

The results of this comparative analysis are presented in Table 2 and are organized in the form of a matrix: all the methods are reported on the rows, grouped into three main categories (the mathematical and statistical methodologies; the machine learning approaches; the graph and network techniques); the issues to be addressed, used as criteria for the comparison, are presented on the columns. Each criterion is discussed hereafter:

  • Data Quantity (Observed Data & Expert Knowledge). It expresses the quantity of data every method needs in order to work well. Data are divided into observed data, also called objective data, and expert knowledge, which constitutes the subjective data. Every method can require a “low”, “medium” or “high” quantity of observed data to work properly. Similarly, every method needs a “low”, “medium” or “high” quantity of expert knowledge, but it can also use objective data only. In this last case, the label “not required” under expert knowledge indicates that the observed data are sufficient for the proper functioning of the methodology.
  • Single Hazards vs. Multiple Hazards (Without Interactions or With Interactions). It expresses the number of hazards every method is able to deal with. In particular, it indicates whether the method can deal with one hazard only (single hazard) or can process multiple hazards (multiple hazards). In this second case, it also indicates whether the method can take into account the interactions among hazards (with interactions) or not (without interactions).
  • Natural Hazards vs. Man-Made Hazards. It describes whether the method is native to a certain type of hazard and applicable to the other, or whether it can be applied to both types of hazards. To define this, the symbols used in Table 2 are a filled black dot and an empty black dot. A filled dot means that the method is native to that type of hazard, so the procedure already defined can be followed for another hazard of the same type. An empty dot means that the method can also be applied to that type of hazard, but it must be adapted starting from the known procedure.
  • Computational cost. It measures the level of complexity of the implemented method, in order to understand whether the procedure is onerous or efficient. As for the Data Quantity criterion, the computational cost is rated “low”, “medium” or “high”, meaning respectively that the method is fast, has a medium speed, or is slow.

Based on the results of the analyzed literature, we can conclude that most of the methodologies presented are able to perform well even if data are scarce. GARP [53], QUEST [53], and multi-level complex networks [57] are the approaches best able to perform well even with scarce objective data. On the other hand, the early warning index calculation is the method which requires the greatest quantity of observed data in order to perform well. Some methods, like the game theory application, do not need a lot of objective data, but rely mostly on expert knowledge to function properly. On the contrary, some other methods, like the complex network methodology, do not require expert knowledge at all and rely only on objective data. In general, the more data are provided, the better the results of the selected method: the BBNs [46, 47], for example, require more and more data as their complexity increases, because a great quantity of data is needed to train the model. Usually, mathematical and statistical techniques should be supported by a great quantity of data, while graph and network methods, and above all machine learning approaches, can perform well even if data are not abundant.

The other important distinction to consider when choosing a method is the number of hazards it can process simultaneously. Indeed, some methods can work with one hazard at a time, while others can deal with multiple hazards simultaneously. In this second case, some approaches do not consider interactions among hazards, whereas others are able to do so. All the methods presented can deal with single hazards, except for the multi-level complex network [57], which has been explicitly built to work only with multiple hazards. All the machine learning techniques are able to deal with multiple hazards, but they are not capable of evaluating interactions among hazards, except for the ANNs [51]. The other two methods which consider interactions among hazards are the BBNs [46, 47] and the multi-level complex network [57].
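As a rough illustration, the hazard-handling capabilities stated in the paragraph above can be encoded as a small lookup structure. This is only a sketch: the `Capability` type and its field names are hypothetical, and `None` marks a capability that the paragraph does not state (multi-hazard support without interactions for the remaining methods).

```python
from typing import NamedTuple, Optional


class Capability(NamedTuple):
    """Hypothetical record of what the paragraph states for one method."""
    single: bool                # can assess one hazard at a time
    multiple: Optional[bool]    # can assess several hazards; None = not stated
    interactions: bool          # can model interactions among hazards


# Machine learning techniques stated to handle multiple hazards
# without evaluating their interactions.
ML_WITHOUT_INTERACTIONS = [
    "Support Vector Machine",
    "Boosted Regression Tree",
    "Generalized Additive Model",
    "Genetic Algorithm Rule-Set Production",
    "Quick Unbiased Efficient Statistical Tree",
]

CAPABILITIES = {name: Capability(True, True, False) for name in ML_WITHOUT_INTERACTIONS}
CAPABILITIES.update({
    # The three methods stated to consider interactions among hazards.
    "Artificial Neural Network": Capability(True, True, True),
    "Bayesian Belief Network": Capability(True, True, True),
    "Multi-Level Complex Network Formulation": Capability(False, True, True),
    # Single-hazard use is stated; multi-hazard use is not stated in the text.
    "Logarithmic Regression": Capability(True, None, False),
    "Early Warning Index": Capability(True, None, False),
    "Game Theory Application": Capability(True, None, False),
    "Complex Network Methodology": Capability(True, None, False),
})


def methods_with_interactions() -> list:
    """Return, in alphabetical order, the methods that model hazard interactions."""
    return sorted(name for name, cap in CAPABILITIES.items() if cap.interactions)


def multi_hazard_only() -> list:
    """Return the methods that cannot be used on a single hazard."""
    return sorted(name for name, cap in CAPABILITIES.items() if not cap.single)


if __name__ == "__main__":
    print(methods_with_interactions())
    print(multi_hazard_only())
```

Keeping the unstated entries as `None` rather than guessing a value avoids reading more into the comparison than the text supports.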

All the methods presented are able to deal with both natural and man-made hazards, but each of the cited papers presents a method that is native to one type of hazard and can be applied to the other. As an example, the game theory application has been built for dealing with man-made hazards, but can also be applied to natural hazards. Notably, all the machine learning techniques presented have been applied to natural hazards, but can also be used to face man-made hazards. In general, a method that is native to a certain type of hazard can be applied directly to another hazard of the same type by following the procedure indicated in the cited paper; if the methodology is not native to a certain type of hazard, it must first be adapted to face it.

Lastly, the computational cost is another important aspect to consider when choosing a method. Three approaches are very fast in finding the final output: the GARP [53], the QUEST [53] and the multi-level complex network [57] approaches. The slowest methods are the BBNs [46, 47] and the ANNs [51], while all the remaining methods have a medium computational cost.
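The data quantity and computational cost ratings of Table 2 lend themselves to a simple shortlist query. The following is a minimal sketch, not part of the reviewed methodologies: the `Rating` class, the `LEVELS` ordinal scale and the `shortlist` function are illustrative assumptions, while the ratings themselves are transcribed from Table 2.

```python
from dataclasses import dataclass

# Assumed ordinal scale for the qualitative labels of Table 2.
LEVELS = {"NOT REQUIRED": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3}


@dataclass(frozen=True)
class Rating:
    method: str
    observed_data: str
    expert_knowledge: str
    computational_cost: str


# Ratings transcribed from Table 2.
TABLE_2 = [
    Rating("Bayesian Belief Network", "MEDIUM", "MEDIUM", "HIGH"),
    Rating("Logarithmic Regression", "MEDIUM", "MEDIUM", "MEDIUM"),
    Rating("Early Warning Index", "HIGH", "NOT REQUIRED", "MEDIUM"),
    Rating("Artificial Neural Network", "MEDIUM", "NOT REQUIRED", "HIGH"),
    Rating("Support Vector Machine", "MEDIUM", "LOW", "MEDIUM"),
    Rating("Boosted Regression Tree", "MEDIUM", "LOW", "MEDIUM"),
    Rating("Generalized Additive Model", "MEDIUM", "LOW", "MEDIUM"),
    Rating("Genetic Algorithm Rule-Set Production", "LOW", "LOW", "LOW"),
    Rating("Quick Unbiased Efficient Statistical Tree", "LOW", "LOW", "LOW"),
    Rating("Game Theory Application", "LOW", "HIGH", "MEDIUM"),
    Rating("Complex Network Methodology", "MEDIUM", "NOT REQUIRED", "MEDIUM"),
    Rating("Multi-Level Complex Network Formulation", "LOW", "NOT REQUIRED", "LOW"),
]


def shortlist(max_observed="HIGH", max_expert="HIGH", max_cost="HIGH"):
    """Return the methods whose requirements do not exceed the given limits."""
    return [
        r.method
        for r in TABLE_2
        if LEVELS[r.observed_data] <= LEVELS[max_observed]
        and LEVELS[r.expert_knowledge] <= LEVELS[max_expert]
        and LEVELS[r.computational_cost] <= LEVELS[max_cost]
    ]


if __name__ == "__main__":
    # Methods usable when only a low computational cost is acceptable.
    print(shortlist(max_cost="LOW"))
    # Methods that work on observed data alone, without expert knowledge.
    print(shortlist(max_expert="NOT REQUIRED"))
```

Treating the labels as an ordered scale is a simplification: the ratings are qualitative, so the query is only meant to narrow the candidates before the cited papers are consulted.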

To conclude this critical comparison, the selected machine learning techniques and graph and network approaches are the easiest to use among all the methods, because they require a limited quantity of data, perform well with both single and multiple hazards, and are reasonably fast. On the other hand, the mathematical and statistical methods described usually require more data and are slower.

7. Conclusions

The increasing number of interacting natural and man-made hazards and the rising level of interconnection among CIs components, which increases the possibility of disruptive effects, are transforming CIs protection into one of the most central sectors worldwide. Risk assessment has been reaffirmed by the European Union as a core element to successfully implement CIs protection and risk reduction strategies, together with the need for a common standardized approach to multi-hazard risk at the Community level.

To support the identification of a common approach, this paper has presented a critical review of state-of-the-art risk assessment methodologies able to work with different levels of data availability and with single or multiple hazards, including interactions among them. Twelve methods have been selected as the most promising for evaluating in a quantitative way the natural and man-made risk for CIs. The methods have been clustered into three categories (mathematical and statistical methodologies, machine learning approaches, graph and network techniques) and each of them has been introduced in general terms. Afterwards, each of the twelve methods has been illustrated through an example taken from the paper referred to in the definition of the approach.

A critical comparison among the methods has made it possible to determine the pros and cons of each. The comparison led to the conclusion that the machine learning techniques and the graph and network approaches are the easiest to use, because they usually require a limited quantity of data and their computational cost is overall low. On the other hand, the mathematical and statistical methodologies are the most onerous, both in terms of data quantity and in terms of computational cost.

Far from being exhaustive of all the possible issues encountered in risk assessment for CIs protection, this work has focused on two main challenges: data availability and the modelling of the interaction mechanisms between hazards.

Future work should focus in depth on interdependencies among CIs components, analyzing methods able to model the complexity of CIs and to evaluate possible cascading effects, due to damage to one or more components, generated by both single and multiple interacting hazards. In addition, the accuracy and reliability of multi-hazard risk models with respect to reality should be quantified. Furthermore, a specific focus on the time dependency of variables will be explored, since the variation of conditions during the day (such as people distribution) and across the seasons (e.g. flash floods are more common in autumn) can significantly influence the risk landscape and consequent outcomes. This aspect becomes even more crucial for near real-time risk assessment applications and early warning systems.

Acknowledgment

This work is supported by the Programma Operativo Por FSE Regione Liguria 2014-2020 GRISK (Grant numbers: RLOF18ASSRIC/70/).

  References

[1] Moteff, J., Copeland, C., Fischer, J. (2003). Critical infrastructures: What makes an infrastructure critical? Library of Congress Washington DC Congressional Research Service.

[2] Murray, A.T., Grubesic, T. (2007). Critical infrastructure: Reliability and vulnerability. Springer Science & Business Media. https://doi.org/10.1007/978-3-540-68056-7

[3] European Parliament. (2020). Directive of the European Parliament and of the Council on the resilience of critical entities.

[4] European Parliament. (2020). Proposal for measures to enhance the protection and resilience of critical infrastructure.

[5] Institute for Economics & Peace. (2019). Global terrorism index: measuring the impact of terrorism.

[6] Wallemacq, P. (2018). Economic losses, poverty & disasters: 1998-2017. Centre for Research on the Epidemiology of Disasters, CRED. https://doi.org/10.13140/RG.2.2.35610.08643

[7] European Parliament. (2020). Overview of natural and man-made disaster risks the European Union may face.

[8] Gill, J.C., Malamud, B.D. (2014). Reviewing and visualizing the interactions of natural hazards. Reviews of Geophysics, 52(4): 680-722. https://doi.org/10.1002/2013RG000445

[9] UNISDR, C. (2015). The human cost of natural disasters: A global perspective.

[10] Kappes, M.S., Keiler, M., von Elverfeldt, K., Glade, T. (2012). Challenges of analyzing multi-hazard risk: A review. Natural Hazards, 64(2): 1925-1958. https://doi.org/10.1007/s11069-012-0294-2

[11] Dunant, A., Bebbington, M., Davies, T. (2020). Probabilistic cascading multi-hazard risk assessment methodology using graph theory, a New Zealand trial. International Journal of Disaster Risk Reduction, 54: 102018. https://doi.org/10.1016/j.ijdrr.2020.102018

[12] Tilloy, A., Malamud, B.D., Winter, H., Joly-Laugel, A. (2019). A review of quantification methodologies for multi-hazard interrelationships. Earth-Science Reviews, 196: 102881. https://doi.org/10.1016/j.earscirev.2019.102881

[13] Pourghasemi, H.R., Kariminejad, N., Amiri, M., Edalat, M., Zarafshar, M., Blaschke, T., Cerda, A. (2020). Assessing and mapping multi-hazard risk susceptibility using a machine learning technique. Scientific Reports, 10(1): 1-11. https://doi.org/10.1038/s41598-020-60191-3

[14] Liu, B., Siu, Y.L., Mitchell, G. (2016). Hazard interaction analysis for multi-hazard risk assessment: a systematic classification based on hazard-forming environment. Natural Hazards and Earth System Sciences, 16(2): 629-642. https://doi.org/10.5194/nhess-16-629-2016

[15] Pourghasemi, H.R., Gayen, A., Edalat, M., Zarafshar, M., Tiefenbacher, J.P. (2020). Is multi-hazard mapping effective in assessing natural hazards and integrated watershed management? Geoscience Frontiers, 11(4): 1203-1217. https://doi.org/10.1016/j.gsf.2019.10.008

[16] Kaplan, S., Garrick, B.J. (1981). On the quantitative definition of risk. Risk Analysis, 1(1): 11-27. https://doi.org/10.1111/j.1539-6924.1981.tb01350.x

[17] Dhakal, R.P., Mander, J.B. (2006). Financial risk assessment methodology for natural hazards. Bulletin of the New Zealand Society for Earthquake Engineering, 39(2): 91-105. https://doi.org/10.5459/bnzsee.39.2.91-105

[18] Velásquez, C.A., Cardona, O.D., Mora, M.G., Yamin, L.E., Carreño, M.L., Barbat, A.H. (2014). Hybrid loss exceedance curve (HLEC) for disaster risk assessment. Natural Hazards, 72(2): 455-479. https://doi.org/10.1007/s11069-013-1017-z

[19] Cardona, O.D., Ordaz, M.G., Yamín, L.E., Singh, S.K., Barbat, A.H. (2013). Probabilistic Modelling of Natural Risks at the Global Level: Global Risk Model.

[20] Moore, W., Phillips, W. (2014). Review of ECLAC damage and loss assessments in the Caribbean.

[21] EU Expert Working Group on Disaster Damage and Loss Data. (2015). Guidance for Recording and Sharing Disaster Damage and Loss data–Towards the development of operational indicators to translate the Sendai Framework into action, JRC Scientific and Policy Report [online].

[22] OECD (2016). States of Fragility 2016.

[23] Schmidt, J., Matcham, I., Reese, S., et al. (2011). Quantitative multi-risk analysis for natural hazards: A framework for multi-risk modelling. Natural Hazards, 58(3): 1169-1192. https://doi.org/10.1007/s11069-011-9721-z

[24] Del Monaco, G., Margottini, C., Serafini, S. (1999). Multi-hazard risk assessment and zoning: an integrated approach for incorporating natural disaster reduction into sustainable development. TIGRA (The Integrated Geological Risk Assessment) Project (Env4-CT96-0262) Summary Report.

[25] Skjong, R., Wentworth, B.H. (2001). Expert judgment and risk perception. The Eleventh International Offshore and Polar Engineering Conference.

[26] Shen, Z., Odening, M., Okhrin, O. (2016). Can expert knowledge compensate for data scarcity in crop insurance pricing? European Review of Agricultural Economics, 43(2): 237-269.

[27] Flage, R., Aven, T., Zio, E., Baraldi, P. (2014). Concerns, challenges, and directions of development for the issue of representing uncertainty in risk assessment. Risk Analysis, 34(7): 1196-1207. https://doi.org/10.1111/risa.12247

[28] Ferson, S., Ginzburg, L.R. (1996). Different methods are needed to propagate ignorance and variability. Reliability Engineering & System Safety, 54(2-3): 133-144. https://doi.org/10.1016/S0951-8320(96)00071-3

[29] Aven, T. (2010). On the need for restricting the probabilistic analysis in risk assessments to variability. Risk Analysis: An International Journal, 30(3): 354-360. https://doi.org/10.1111/j.1539-6924.2009.01314.x

[30] Dubois, D. (2010). Representation, propagation, and decision issues in risk analysis under incomplete probabilistic information. Risk Analysis: An International Journal, 30(3): 361-368. https://doi.org/10.1111/j.1539-6924.2010.01359.x

[31] Aven, T. (2015). The concept of antifragility and its implications for the practice of risk analysis. Risk Analysis, 35(3): 476-483. https://doi.org/10.1111/risa.12279

[32] Taleb, N.N., Canetti, E., Kinda, T., Loukoianova, E., Schmieder, C. (2012). A new heuristic measure of fragility and tail risks: Application to stress testing. International Monetary Fund.

[33] Kappes, M.S. (2011). Multi-Hazard Risk Analyses: A Concept and its Implementation. University of Vienna.

[34] Del Monaco, G., Margottini, C., Spizzichino, D. (2006). Report on new methodology for multi-risk assessment and the harmonisation of different natural risk maps. Deliverable 3.1, ARMONIA project.

[35] Carpignano, A., Golia, E., Di Mauro, C., Bouchon, S., Nordvik, J. (2009). A methodological approach for the definition of multi‐risk maps at regional level: First application. Journal of Risk Research, 12(3-4): 513-534. https://doi.org/10.1080/13669870903050269

[36] Gallina, V., Torresan, S., Critto, A., Sperotto, A., Glade, T., Marcomini, A. (2015). A review of multi-risk methodologies for natural hazards: Consequences and challenges for a climate change impact assessment. Journal of Environmental Management, 168: 123-132. https://doi.org/10.1016/j.jenvman.2015.11.011

[37] Gill, J.C., Malamud, B.D. (2016). Hazard interactions and interaction networks (cascades) within multi-hazard methodologies. Earth System Dynamics, 7(3): 659. https://doi.org/10.5194/esd-7-659-2016

[38] Fleming, K., Parolai, S., Garcia-Aristizabal, A., Tyagunov, S., Vorogushyn, S., Kreibich, H., Mahlke, H. (2016). Harmonizing and comparing single-type natural hazard risk estimations. Annals of Geophysics, 59(2): 0216. https://doi.org/10.4401/ag-6987

[39] Reed, D.A., Friedland, C.J., Wang, S., Massarra, C.C. (2016). Multi-hazard system-level logit fragility functions. Engineering Structures, 122: 14-23. https://doi.org/10.1016/j.engstruct.2016.05.006

[40] Stewart, M.G. (2016). Risk and decision-making for extreme events: climate change and terrorism. Multi-hazard Approaches to Civil Infrastructure Engineering, pp. 87-103. https://doi.org/10.1007/978-3-319-29713-2_5

[41] Liu, Z., Nadim, F., Garcia-Aristizabal, A., Mignan, A., Fleming, K., Luna, B.Q. (2015). A three-level framework for multi-risk assessment. Georisk: Assessment and management of risk for engineered systems and geohazards, 9(2): 59-74. https://doi.org/10.1080/17499518.2015.1041989

[42] Hosmer Jr, D.W., Lemeshow, S., Sturdivant, R.X. (2013). Applied Logistic Regression (Vol. 398). John Wiley & Sons.

[43] Kabir, G., Suda, H., Cruz, A.M., Giraldo, F.M., Tesfamariam, S. (2019). Earthquake-related Natech risk assessment using a Bayesian belief network model. Structure and Infrastructure Engineering, 15(6): 725-739. https://doi.org/10.1080/15732479.2019.1569070

[44] Khakzad, N. (2015). Application of dynamic Bayesian network to risk analysis of domino effects in chemical infrastructures. Reliability Engineering & System Safety, 138: 263-272. https://doi.org/10.1016/j.ress.2015.02.007

[45] Argenti, F., Landucci, G., Reniers, G., Cozzani, V. (2018). Vulnerability assessment of chemical facilities to intentional attacks based on Bayesian Network. Reliability Engineering & System Safety, 169: 515-530. https://doi.org/10.1016/j.ress.2017.09.023

[46] Van Verseveld, H.C.W., Van Dongeren, A.R., Plant, N.G., Jäger, W.S., Den Heijer, C. (2015). Modelling multi-hazard hurricane damages on an urbanized coast with a Bayesian Network approach. Coastal Engineering, 103: 1-14. https://doi.org/10.1016/j.coastaleng.2015.05.006

[47] Kwag, S., Gupta, A. (2017). Probabilistic risk assessment framework for structural systems under multiple hazards using Bayesian statistics. Nuclear Engineering and Design, 315: 20-34. https://doi.org/10.1016/j.nucengdes.2017.02.009

[48] Khakzad, N., Van Gelder, P. (2018). Vulnerability of industrial plants to flood-induced natechs: A Bayesian network approach. Reliability Engineering & System Safety, 169: 403-411. https://doi.org/10.1016/j.ress.2017.09.016

[49] Chatterjee, S., Abkowitz, M.D. (2011). A methodology for modeling regional terrorism risk. Risk Analysis: An International Journal, 31(7): 1133-1140. https://doi.org/10.1111/j.1539-6924.2010.01565.x

[50] Hamadeh, N., Karouni, A., Daya, B., Chauvet, P. (2017). Using correlative data analysis to develop weather index that estimates the risk of forest fires in Lebanon & Mediterranean: Assessment versus prevalent meteorological indices. Case Studies in Fire Safety, 7: 8-22. https://doi.org/10.1016/j.csfs.2016.12.001

[51] Pilkington, S.F., Mahmoud, H.N. (2016). Using artificial neural networks to forecast economic impact of multi-hazard hurricane-based events. Sustainable and Resilient Infrastructure, 1(1-2): 63-83. https://doi.org/10.1080/23789689.2016.1179529

[52] Rahmati, O., Yousefi, S., Kalantari, Z., et al. (2019). Multi-hazard exposure mapping using machine learning techniques: A case study from Iran. Remote Sensing, 11(16): 1943. https://doi.org/10.3390/rs11161943

[53] Darabi, H., Choubin, B., Rahmati, O., Haghighi, A.T., Pradhan, B., Kløve, B. (2019). Urban flood risk mapping using the GARP and QUEST models: A comparative study of machine learning techniques. Journal of Hydrology, 569: 142-154. https://doi.org/10.1016/j.jhydrol.2018.12.002

[54] Major, J.A. (2002). Advanced techniques for modeling terrorism risk. Journal of Risk Finance, 4(1): 15-24. https://doi.org/10.1108/eb022950

[55] Hausken, K., Bier, V.M., Zhuang, J. (2009). Defending against terrorism, natural disaster, and all hazards. Game Theoretic Risk Analysis of Security Threats, pp. 65-97. https://doi.org/10.1007/978-0-387-87767-9_4

[56] Daskalaki, E., Spiliotis, K., Siettos, C., Minadakis, G., Papadopoulos, G.A. (2016). Foreshocks and short-term hazard assessment of large earthquakes using complex networks: the case of the 2009 L’Aquila earthquake. Nonlinear Processes in Geophysics, 23(4): 241-256. https://doi.org/10.5194/npg-23-241-2016

[57] Lacasa, L., Nicosia, V., Latora, V. (2015). Network structure of multivariate time series. Scientific Reports, 5: 15508. https://doi.org/10.1038/srep15508