© 2021 IIETA. This article is published by IIETA and is licensed under the CC BY 4.0 license (http://creativecommons.org/licenses/by/4.0/).
OPEN ACCESS
The Copula approach for continuous variables is highly developed, while discrete ones are underdeveloped due to computational difficulties and sometimes algorithm failure to convergent. Therefore, providing an alternative method for discrete variables becomes an essential issue. In this paper, a simple method is proposed to answer the problem by applying the Continuous Extension Technique (CET). This is carried out by adding random independent perturbations in the form of either Uniform distribution $\mathrm{U}(0,1) \text { or }(\mathrm{U}(0,1)1)$, and the discrete variables are treated as continuous. Subsequently, a Copula model for resulted variables is estimated based on the Copula theory for continuous variables. This method is called a Copula continuous extension technique. Our analytic and simulation approaches show that both random perturbation forms produce the same Kendall’s Tau measure and the selected Copula bivariate model. As illustrations, the proposed method is applied to the seismicity data obtained from the annual frequencies of earthquakes that occurred in the Sumatra megathrust of Indonesia, from January 1971 to December 2018, with magnitudes ( $M_{\mathrm{w}}$ ) of at least 4.6. Based on the selected Copula models, our simulations confirm the evidence of dependence seismic activity in each of the two adjacent large earthquake sources. These results provide new information regarding the seismicity behavior in the Sumatra megathrust.
continuous extension technique, dependence, copula model, Kendall’s tau, random perturbation, earthquakes
The Copula model approach becomes an alternative method in statistics, which is used in constructing the joint distribution of multivariate variables wherein dependence between marginals exists. Meanwhile, an application of the Copula model arises in a wide context, and several of the most commonly used are briefly mentioned, namely financial risk assessment by Zhang and Jiang [1], environmental sciences by Bhatti and Do [2], image processing by Dong et al. [3], health data by Ghahroodi et al. [4], and industrial problem by Wan and Li [5]. Furthermore, the Copula models used in the aforementioned cases are for continuous variables. Due to the nonuniqueness of the Copula model outside the range of marginal discrete distribution, the Copula for discrete variables is still not well developed [6]. In many real data applications, however, many phenomena are modeled based on the counting process, where the appropriate distributions are the discrete multivariate distributions. Therefore, these reasons motivate us to provide alternative procedures in Copula modeling for discrete variables.
One way to overcome this problem is by applying the Continuous Extension Technique (CET) to the marginal of the discrete variables. Moreover, the CET idea is to transform the marginal discrete variables $X$ to become continuous, denoted by $X^{\star}$. This is worked out by summing the discrete random variable with a random perturbation taking values in $(0,1)$ (e.g., Uniform distribution, denotes $\mathrm{U}(0,1)$ ) where the variables $X$ and $U(0,1)$ are independent. Formally, it can be written as follows [7]:
$\mathrm{X}^{*}=\mathrm{X}+\mathrm{U}(0,1)$. (1)
According to Eq. (1), it is depicted that $X$ is continued by $\mathrm{U}(0,1)$. After applying the CET to the discrete variable (previous), a Copula model is estimated for continuous variables (new) by following the model procedure for continuous variables (referred to as Copula continuous extension technique).
The CET procedure has been studied further by Machado and Santos Silva [8] to extend quantile regression to count data and by Denuit and Lambert [9] to study concordance measures for The CET procedure has been studied further by Machado and Santos Silva [8] to extend quantile regression to count data and by Denuit and Lambert [9] to study concordance measures for dependent discrete data. The latter authors modified the CET, and Eq. (1) becomes X^{⍟}$=X+(U(0,1)1)$. The detailed explanation is related to the idea of modifying the CET and its implications of the association measure (Kendall's Tau, denoted by $\tau$ ) between the bivariate continuous (new) and the bivariate discrete (previous), that is, $\tau(\mathrm{X}, \mathrm{Y})=$$\tau$(X^{⍟}, Y^{⍟}), can be seen in Denuit and Lambert [9].
Currently, the bivariate discrete variables $(X, Y)$ are considered, and are continued by (independent) considering two CET forms, $\left(X^{\star}, Y^{\star}\right)$ according to Stevens [7] and $(X^{⍟}, Y^{⍟}) according to Denuit and Lambert [9]. From each of the bivariate continuous, a Copula model can be constructed, as stated in Sklar's theorem [10]. From this situation, a question worthy to ask is whether the two Copula models produce the same model. When this happens, then one of two existing CET forms can be selected. Otherwise, further analysis regarding the selection of the appropriate CET model is needed. Since the result of this problem becomes the base for the next, this issue will be examined first. The examination is carried out using two approaches, namely analytic and simulation studies. In the simulation studies, five common bivariate discrete, namely Binomial, Geometric, Hypergeometric, Poisson, and Binomial Negative, are used to generate the dependence bivariate data. Meanwhile, for constructing the bivariate Copula, two families of Copula models, namely the Archimedean, i.e., Clayton, Gumbel, Frank, Joe, and Independent, and Elliptical, i.e., Gaussian and Student’s, are applied.
In the present paper, the possible real case for the application was selected, which is the dependence modeling of seismicity data in the Sumatra megathrust subduction zone of Indonesia. The main reason for selecting this context is the characteristics of data to be analyzed appropriately with our research problems, which is a discrete multivariate variable. Moreover, to the best of our knowledge, the application of dependence modeling of seismicity data using the Copula model is still rare. Even though, this model is needed by seismologists to identify the dependencies of seismic activity in one restricted area [11].
The rest of the paper is organized as follows. The study materials and methods related to this work such as the theory of bivariate discrete distributions, association measure, and bivariate Copula theory are provided in Section 2. In Section 3, steps of simulation and corresponding analytical study are presented. Furthermore, the real data application on seismicity is described in Section 4. Finally, concluding results and recommendations for further study are discussed in Section 5.
The use of a Copula model is relatively easy to implement in a bivariate case, but it becomes awkward when we implement in the multivariate case. Therefore, we restrict the discussion to the bivariate case, although generalization to higher dimensions is possible. In the next subsection, we address subjects such as bivariate discrete distribution, association measure, and bivariate Copula models. For the descriptions of bivariate discrete distribution, the books of Kocherlakota and Kocherlakota [12], Montgomery and Runger [13], were used as references. Meanwhile, for an association measure and bivariate Copula model, the books of Joe [14], Nelsen [15], and Hofert et al. [16] were used.
2.1 Bivariate discrete distributions
Here, we focus the discussion on five discrete bivariate distributions commonly used, namely Binomial, Geometric, Hypergeometric, Poisson, and Negative Binomial. Those distributions are derived from the Bernoulli distribution [17]. Furthermore, the relationship between the Bernoulli and the five aforementioned bivariate distributions is seen in Figure 1.
Figure 1. A family of bivariate distributions generated by the bivariate Bernoulli distribution (Adopted from Marshall and Olkin [17])
A common measure of the relationship between two random variables $(X, Y)$ is the Pearson correlation coefficient (PCC) denoted by $\rho(\mathrm{X}, \mathrm{Y})$. The value $\rho(\mathrm{X}, \mathrm{Y})$ is the covariance of the two variables divided by the product of their standard deviations, which is formally stated as follows:
$\rho(\mathrm{X}, \mathrm{Y})=\left(\sigma_{\mathrm{X}} \sigma_{\mathrm{Y}}\right)^{1} \operatorname{Cov}(\mathrm{X}, \mathrm{Y})$ (2)
As for the summary related to distribution notationmodel parameters, joint probability mass function, and the Pearson correlation of the selected five bivariate discrete distributions are seen in Table 1.
2.2 Association measure
By far, the most familiar association measure between two random variables is calculated by the $\mathrm{PCC}$ defined in $\mathrm{Eq}$. (2). However, it is noted that this approach has some limitations, namely only appropriate for detecting the linear dependencies between two variables and it is not invariant under strictly increasing nonlinear transformation, which is, $\rho(\mathrm{X}, \mathrm{Y}) \neq$ $\rho\left(\mathrm{F}_{\mathrm{X}}(\mathrm{x}), \mathrm{G}_{\mathrm{Y}}(\mathrm{y})\right)$. The variables $\mathrm{F}_{\mathrm{X}}(\mathrm{x})$ and $\mathrm{G}_{\mathrm{Y}}(\mathrm{y})$ are the cumulative distribution functions (CDFs) of the random variable of $\mathrm{X}$ and $\mathrm{Y}$, respectively $[18] .$
Those limitations motivate the development of a dependence measure for bivariate variables, i.e., Spearman’s Rho and Kendall’s Tau. Both measures are constructed based on the concept of concordance and discordance (Kruskal [19] and Lehmann [20]), which refers to the property that large values of one random variable tend to be associated with large values of the other random variable and small values of one random variable with small values of the other. Whereas discordance refers to large values of one random variable being associated with small values of the other.
To be more precise, suppose a pair of data is given by $\left(x_{i}, y_{i}\right)$ and $\left(x_{j}, y_{j}\right)$. We identify a concordance or discordance pair by: for $i \neq j$, if $\operatorname{sign}\left(x_{i}x_{j}\right)=\operatorname{sign}\left(y_{i}\right.$ $\left.y_{j}\right)$ then $\left(x_{i}, y_{i}\right)$ and $\left(x_{j}, y_{j}\right)$ is a concordance pair otherwise it is a discordance pair.
In the following discussions, we concentrate on Kendall’s Tau measure which is denoted by “ $\tau$" . By definition, Kendall's Tau measure is the probability of a concordance minus the probability of a discordance, which is formally written as follows:
$\tau(\mathrm{X}, \mathrm{Y})=P\left[\left(\mathrm{X}_{1}\mathrm{X}_{2}\right)\left(\mathrm{Y}_{1}\mathrm{Y}_{2}\right)>0\right]$$P\left[\left(\mathrm{X}_{1}\mathrm{X}_{2}\right)\left(\mathrm{Y}_{1}\mathrm{Y}_{2}\right)<0\right]$ (3)
Here, $\left(\mathrm{X}_{1}, \mathrm{Y}_{1}\right)$ and $\left(\mathrm{X}_{2}, \mathrm{Y}_{2}\right)$ are independent random vector and identically distributed from ( $\mathrm{X}, \mathrm{Y}$ ). Also, it should be noted that Eq. (3) is appropriate for continuous random variables but not for discrete ones due to the presence of ties condition, that is, $\mathrm{P}($ tie $)=\mathrm{P}\left(\mathrm{X}_{1}=\mathrm{X}_{2}\right.$ or $\left.\mathrm{Y}_{1}=\mathrm{Y}_{2}\right)$. By using that $\mathrm{P}$ (concordance) $+\mathrm{P}$ (discordance) $+\mathrm{P}$ (tie) $=1$ then Kendall's Tau measure for the discrete bivariate variables (as commonly denotes $\tau_{b}$ ) is written as follows [21]:
$\begin{gathered}
\tau_{b}(\mathrm{X}, \mathrm{Y})=4 P\left[\left(\mathrm{X}_{1}\mathrm{X}_{2}\right)\left(\mathrm{Y}_{1}\mathrm{Y}_{2}\right)>0\right] \\
1+P\left(\mathrm{X}_{1}=\mathrm{X}_{2} \text { or } \mathrm{Y}_{1}=\mathrm{Y}_{2}\right)
\end{gathered}$ (4)
Table 1. Joint probability mass functions and the PCC formulation associated with the selected five bivariate discrete distributions adopted from Kocherlakota and Kocherlakota [12] and Montgomery and Runger [13]
Bivariate model 
Distribution notation and model parameters 
Joint probability mass function (PMF) formulation 
$\boldsymbol{\rho}(\mathbf{X}, \mathbf{Y})$ 
Binomial 
$\begin{gathered} 
$\begin{gathered} 
$\frac{n\left(p_{00} p_{11}p_{10} p_{01}\right)}{\sqrt{a} \sqrt{b}}$, $\begin{gathered} 
Geometric 
$\begin{gathered} 
$\begin{gathered} 
$\frac{\left(1\eta_{3}\right) \sqrt{\eta_{1} \eta_{2}}}{1\eta_{1} \eta_{2} \eta_{3}}$ 
Hypergeometric 
$\begin{gathered} 
$\frac{\left(\begin{array}{c} 
$\sqrt{\frac{m_{1}}{Nm_{1}} \frac{m_{2}}{Nm_{2}}}$ 
Poisson 
$\begin{gathered} 
$\begin{aligned} 
$\frac{\lambda_{3}}{\sqrt{\lambda_{1}+\lambda_{3}} \sqrt{\lambda_{2}+\lambda_{3}}}$ 
Binomial Negative 
$\operatorname{BivNB}\left(\kappa, p_{1}, p_{2}\right)$ $\kappa>0$ and $0<p_{i}<1$ such that $p_{1}+p_{2}<1$ 
$\begin{aligned} 
$\frac{p_{1} p_{2}}{\sqrt{\left(p_{0}+p_{1}\right)\left(p_{0}+p_{2}\right)}}$ 
In addition, the association of two continuous random variables is measured using the bivariate Copula models approach, which is known as Kendall's Tau Copula $\tau_{C}$. This is presented in the next subsection.
2.3 Bivariate Copula theory
To be selfcontained, we provide a brief description of a bivariate Copula theory to sampleselection models. Literally, in statistics and probability theory, a bivariate Copula model is a twodimensional joint CDFs based on a given marginal CDF. Therefore, the properties of the bivariate Copula models are analogous to properties of bivariate CDFs, that satisfy grounded and 2increasing properties, see Joe [14], Nelsen [15], and Trivedi and Zimmer [18] for details.
The following theorem, known as Sklar's theorem [10], is the practical usefulness theorem of construction Copula models. According to that theorem, for random variables $X$ and $\mathrm{Y}$ with respective marginal $\operatorname{CDFs} F_{\mathrm{X}}(x)$ and $F_{\mathrm{Y}}(y)$, the bivariate distribution $H_{\mathrm{X}, \mathrm{Y}}(x, y)$ can be expressed as follows:
$\begin{aligned}
&H_{\mathrm{X}, \mathrm{Y}}(x, y)=P(\mathrm{X} \leq x, \mathrm{Y} \leq y) \\
&=C\left(F_{\mathrm{X}}(x), F_{\mathrm{Y}}(y): \theta\right), x, y \in R
\end{aligned}$ (5)
where, $\theta$ is a parameter of the Copula model called the parameter of dependence, which describes the dependence between $F_{\mathrm{X}}(x)$ and $F_{\mathrm{Y}}(y)$. Estimation $\theta$ by the value $\hat{\theta}$ can be determined by maximizing the loglikelihood function of the pdf Copula model as follows [22]:
$\hat{\theta}=\arg \max _{\theta} \sum_{i=1}^{T} \log c\left(F_{\mathrm{X}}\left(x_{i}\right), F_{\mathrm{Y}}\left(y_{i}\right) ; \theta\right)$ (6)
Based on the Integral Transform probability theorem (Angus $[23]$ ), the marginal distribution of variables $(\mathrm{X}, \mathrm{Y})$ follows the Uniform distribution ranging from 0 to 1 . Therefore, by denotes $F_{\mathrm{X}}(x)$ and $F_{\mathrm{Y}}(y)$ with $u_{1}$ and $u_{2}$, respectively, we can rewrite the Eq. (5) as follows:
$C\left(F_{\mathrm{X}}, F_{\mathrm{Y}}: \theta\right)=C\left(u_{1}, u_{2}: \theta\right), u_{1}, u_{2} \in[0,1]$ (7)
There are a number of Copula models that have been widely explored in many articles and books. In this paper, models are chosen to be the Archimedean Copula family (i.e., Clayton, Gumbel, Frank, Joe, and Independent) and the Elliptical Copula family (i.e., Gaussian and Student’s) which were often used in practice. Here, we write those Copula models in terms of random variables U_{1} and U_{2} that have standard to denotes the Uniform marginal distribution, as can be seen in Table 2.
Let $(X, Y)$ be bivariate discrete random variables. We reconsider two random independent perturbation forms of the CET process for marginal bivariate discrete variables, i.e., $X^{\star}=X+U(0,1)[7]$ and X^{⍟}$=X+(U(0,1)1)$ [9]. There is analog for the discrete variable $\mathrm{Y}$, but now replacing $\mathrm{U}(0,1)$ by $\mathrm{V}(0,1)$. Here, the variable $\mathrm{U}(0,1)$ and $\mathrm{V}(0,1)$ are independent. Subsequently, we analytically prove that for a given bivariate discrete variables $(X, Y)$ the two CET results, i.e., $\left(X^{\star}, Y^{\star}\right)$ and (X^{⍟}, Y^{⍟}), produce the same Copula model and its parameter. Therefore, some simulations are carried to confirm our analytical approach. The process includes the following.
Suppose $\left(F_{\mathrm{X}^{\star}}, F_{\mathrm{Y}^{\star}}\right)$ and (F_{X}^{⍟}, F_{Y}^{⍟}) are the CDFs for $\left(X^{\star}, Y^{\star}\right)$ and (X^{⍟}, Y^{⍟}), respectively. We will first examine that if two bivariate continuous, $\left(X^{\star}, Y^{\star}\right)$ and (X^{⍟}, Y^{⍟}) have the same behavior of the CDFs, i.e., $\left(F_{\mathrm{X}^{\star}}, F_{\mathrm{Y}^{\star}}\right)=$(F_{X}^{⍟}, F_{Y}^{⍟}), and Kendall's Tau measure, i.e., $\tau\left(\mathrm{X}^{\star}, \mathrm{Y}^{\star}\right)=\tau$ (X^{⍟}, Y^{⍟}), then they will have not only the same Copula model but also its parameter due to the presence of the unique property of the Copula model for continuous variables [10].
Table 2. Formulation of some single parameter Copula model of the selected Archimedean Copulas family with their generators function and the selected Elliptical Copulas family, i.e., Gaussian Copula and Students’s Copula
Copula model 
$\text { Function } C\left(u_{1}, u_{2} ; \theta\right)$ 
$\text { Generator } \phi(t ; \theta)$ 
Parameter range $(\boldsymbol{\theta})$ 
Kendall’s tau of the copula C 
Independence 
$u_{1} u_{2}$ 
$\log (t)$ 
n/a 
$0$ 
Clayton 
$\left(u_{1}^{\theta}+u_{2}^{\theta}1\right)^{1 / \theta}$ 
$\theta^{1}\left(t^{\theta}1\right)$ 
$(0, \infty)$ 
$\frac{\theta}{\theta+2}$ 
Gumbel Hougaard 
$\exp \left(\left(\tilde{u}_{1}^{\theta}+\tilde{u}_{2}^{\theta}\right)^{1 / \theta}\right)$ ; $\tilde{u}_{j}=\log u_{j}$ 
$(\log t)^{\theta}$ 
$[1, \infty)$ 
$\frac{\theta1}{\theta}$ 
Frank 
$\frac{1}{\theta} \log \left(1+\frac{u_{1}{ }^{*} u_{2}{ }^{*}}{\exp (\theta)1}\right)$ ; $u_{j}^{*}=e^{\left(\theta u_{j}1\right)}$ 
$\log \left(\frac{\exp (\theta t)1}{\exp (\theta)1}\right)$ 
$(\infty, \infty)$ 
$1\frac{4}{\theta}\left[1D_{1}(\theta)\right]$ 
Joe 
$1\left[u_{1}^{*}+u_{2}^{*}u_{1}^{*} u_{2}^{*}\right]^{1 / \theta}$ ; $u_{j}{ }^{*}=\left(1u_{j}\right)^{\theta}$ 
$\log \left(1(1t)^{\theta}\right)$ 
$[1, \infty)$ 
$1\sum_{k=1}^{\infty} \frac{4}{h(\theta, k)}$ 
Gaussian 
$\Phi_{G}\left[\Phi^{1}\left(u_{1}\right), \Phi^{1}\left(u_{2}\right) ; \theta\right]$ 
n/a 
$(1,1)$ 
$\frac{2}{\pi} \arcsin (\theta)$ 
Student’s 
$t_{2, v}\left[t_{v}^{1}\left(u_{1}\right), t_{v}^{1}\left(u_{2}\right) ; \theta\right]$ ; $v \in(2, \infty)$ 
n/a 
$[1,1]$ 
$\frac{2}{\pi} \arcsin (\theta)$ 
Notes: 1. The quantity $D_{1}$ is the Debye function of order one, defined by $D_{1}(x)=1 / x \int_{0}^{x} t /\left(e^{t}1\right) d t, t \in(0, \infty), 2 . h(\theta, k)=(k(\theta k+2)(\theta(k1)+2))$, and 3. $\mathrm{n} / \mathrm{a}$ denotes that the mentioned item is not available.
Before proceeding to show (F_{X}^{⋆},F_{Y}^{⋆})⊆(F_{X}^{⍟},F_{Y}^{⍟}), a relation of $F_{\mathrm{X}^{\star}}$ with F_{X}^{⍟} and $F_{\mathrm{Y}^{\star}}$ with F_{Y}^{⍟} needs to be determined. In addition, for all $(x, y)$ elements of the domain (X^{⍟}, Y^{⍟}), there is F_{X}^{⍟} $(x+1, y+1)$ elements of the domain $\left(X^{\star}, Y^{\star}\right)$ such that F_{X}^{⍟}(x)=P(X^{⍟} $\leq x)$ $=P\left(\mathrm{X}_{i}^{\star}1 \leq x\right)=P\left(\mathrm{X}_{i}^{\star} \leq x+1\right)=F_{\mathrm{X}^{\star}}(x+1)$. The same statement also holds for F_{Y}^{⍟} $(x)=F_{\mathrm{Y}^{\star}}(y+1)$.
As already stated, we will prove $\left(F_{\mathrm{X}^{\star}}, F_{\mathrm{Y}^{\star}}\right)=$ (F_{X}^{⍟}, F_{Y}^{⍟}) by showing that $\left(F_{\mathrm{X}^{\star}}, F_{\mathrm{Y}^{\star}}\right) \subseteq$ (F_{X}^{⍟}, F_{Y}^{⍟}) and $\left(F_{\mathrm{X}^{*}}, F_{\mathrm{Y}^{\star}}\right) \supseteq$(F_{X}^{⍟}, F_{Y}^{⍟}).
We start the investigation for (F_{X}^{⋆},F_{Y}^{⋆})⊆(F_{X}^{⍟},F_{Y}^{⍟}). Suppose that $\left(a_{0}, b_{0}\right) \in\left(F_{\mathrm{X}^{*}}, F_{\mathrm{Y}^{*}}\right)$, by using the CDF definition, there is $\left(x_{0}, y_{0}\right)$, which belongs to the domain of $\left(\mathrm{X}^{\star}, \mathrm{Y}^{\star}\right)$ such that a_{0}=F_{X}^{⋆}(x_{0})=F_{X}^{⍟}(x_{0}1) and b_{0}= F_{Y}^{⋆}(y_{0})=F_{Y}^{⍟}(y_{0}1). Thus, a_{0}= F_{X}^{⍟}(x_{0}1) and b_{0}= F_{Y}^{⍟}(y_{0}1) where $\left(x_{0}1, y_{0}1\right)$ belongs to the domain of (X^{⍟},Y^{⍟}) and thus (a_{0},b_{0})∈(F_{X}^{⍟}(X^{⍟}),F_{Y}^{⍟}(Y^{⍟})). As a conclusion (F_{X}^{⋆}),F_{Y}^{⋆})⊆(F_{X}^{⍟},F_{Y}^{⍟}).
Next, we check for (F_{X}^{⋆}, F_{Y}^{⋆})⊇(F_{X}^{⍟}, F_{Y}^{⍟}) (F_{X}^{⍟}, F_{Y}^{⍟}). We work out in similar way. Let (a_{1},b_{1} ) be the element of (F_{X}^{⍟}, F_{Y}^{⍟}). This means, there is (x_{1},y_{1}), which belongs to domain of (X^{⍟},Y^{⍟} ) such that a_{1}= F_{X}^{⍟}(x_{1} )=F_{X}^{⋆} (x_{1}+1) and b_{1}= F_{Y}^{⍟}(y_{1} )=F_{Y}^{⋆}(y_{1}+1). In other words, a_{1}=F_{X}^{⋆} (x_{1}+1) and b_{1}=F_{Y}^{⋆} (y_{1}+1) where (x_{1}+1,y_{1}+1) belongs to domain of (X^{⋆},Y^{⋆} ). So, we have that (a_{1},b_{1} )∈ (F_{X}^{⋆}(X^{⋆}),F_{Y}^{⋆}(Y^{⋆})) and thus, (F_{X}^{⋆},F_{Y}^{⋆} )⊇(F_{X}^{⍟},F_{Y}^{⍟} ).
By combining the two results above, i.e., (F_{X}^{*}, F_{Y}^{*})⊆ (F_{X}^{⍟},F_{Y}^{⍟}) and (F_{X}^{*}, F_{Y}^{*})⊇(F_{X}^{⍟},F_{Y}^{⍟}), we therefore have (F_{X}^{*}, F_{Y}^{*})=(F_{X}^{⍟},F_{Y}^{⍟}).
The second examination is related to Kendall’s Tau measure of (X^{⋆},Y^{⋆}) and (X^{⍟},Y^{⍟}). Let (X_{1},Y_{1}) and (X_{2},Y_{2}) be independent copies of bivariate discrete (X,Y). We assume that for i=1,2 holds: (i) X_{i} and Y_{i} are continued by the method of Stevens [7] or Denuit and Lambert [9] (ii) the Uniform distribution $\mathrm{U}_{i}$ and $\mathrm{V}_{i}$ are independent. Under these conditions and according to Eq. (3), Kendall’s Tau measure of (X^{⍟},Y^{⍟}) is written as follows: τ(X^{⍟},Y^{⍟})=P[(X_{1}^{⍟}X_{2}^{⍟})(Y_{1}^{⍟}Y_{2}^{⍟})>0] P[(X_{1}^{⍟}X_{2}^{⍟})(Y_{1}^{⍟}Y_{2}^{⍟})<0].
Note that, we write the expression of X_{i}^{⍟}= X_{i} +(U_{i} (0,1)1) to become X_{i}^{⍟}= X_{i}^{*}1 similarly for the variable Y_{i}^{⍟}. Thus, we get
τ(X^{⍟},Y^{⍟})
$\begin{gathered} =P\left[\left(\left(X_{1}^{\star}1\right)\left(X_{2}^{\star}1\right)\right)\left(\left(Y_{1}^{\star}1\right)\left(Y_{2}^{\star}1\right)\right)>0\right] \\ P\left[( ( X _ { 1 } ^ { \star }  1 )  ( X _ { 2 } ^ { \star }  1 ) ) \left(\left(Y_{1}^{\star}1\right)\right.\right. \\ \left.\left.\left(Y_{2}^{\star}1\right)\right)<0\right] \\ =P\left[\left(X_{1}^{\star}X_{2}^{\star}\right)\left(Y_{1}^{\star}Y_{2}^{\star}\right)>0\right] \\ P\left[\left(X_{1}^{\star}X_{2}^{\star}\right)\left(Y_{1}^{\star}Y_{2}^{\star}\right)<0\right] \\ =\tau\left(X^{\star}, Y^{\star}\right) \end{gathered}$ (8)
The last expression of Eq. (8) confirmed that τ(X^{⋆}, Y^{⋆})=τ(X^{⍟}, Y^{⍟}). In addition, Denuit and Lambert [9] have stated that for a given bivariate discrete (X, Y), then τ(X, Y)=τ(X^{⍟}, Y^{⍟}). Hence, based on those results, we conclude that τ(X,Y)=τ(X^{⋆}, Y^{⋆})=τ(X^{⍟}, Y^{⍟}). This means the two random independent perturbation forms of the CET process in bivariate discrete did not change Kendall’s Tau measure.
To summarize the analytic studies, in setting consideration for our research problem, two necessary conditions are provided, namely (F_{X}^{⋆}, F_{Y}^{⋆})=(F_{X}^{⍟}, F_{Y}^{⍟}) and τ(X^{⋆}, Y^{⋆})=τ(X^{⍟},Y^{⍟}), such that C(F_{X}^{⋆}, F_{Y}^{⋆}:θ^{⋆})= C(F_{X}^{⍟}, F_{Y}^{⍟}:θ^{⍟}) with θ^{⋆}=θ^{⍟}. In other words, they have the same Copula model and its parameter.
Subsequently, some simulations are presented to confirm the analytical approach. In the simulations, data are drawn from five bivariate discrete (see Table 1). Furthermore, a sample of size N = 200 is generated by choosing a set of parameters corresponding to the characteristics of each distribution model. The characteristics are referred to as low PCC (below of 0.5) and high (above of 0.5) except for Hypergeometric. Therefore, there are nine possible cases to be studied. The procedure simulation listed below can be followed:
It must be noted that for the construction continuous Copula model, a twostep maximum likelihood (TSML) procedure is commonly used. That is, the marginals are estimated in the first step, and then as the second step, the dependence parameter is estimated using the selected family of Copula models based on the CDF of the inferred marginal distributions. This procedure is known as the Inference Function for Marginals (IFM) [24].
In the third simulation step, marginal probability distributions of continuous variable are identified by fitting with eight distribution models, namely Normal, Logistic, Cauchy, Exponential, LogNormal, Gamma, Weibull, and Gumbel.
The notation, formulation, and parameters of each continuous probability function are presented briefly in Table 3. The standard approach to estimate the parameters models $(\boldsymbol{\omega})$ of each distribution is the maximum likelihood method, which requires the maximization of the loglikelihood function that represented as follows [25]:
$\mathcal{L}\left(\boldsymbol{\omega} \mid x_{1}, \ldots, x_{T}\right)=\sum_{i=1}^{T} \log f\left(\left(x_{i}\right) ; \boldsymbol{\omega}\right)$ (9)
Table 3. Probability and cumulative distribution functions associated with the selected eight continuous distribution models
Univariat model 
Probability density function $f(x:$ parameters $)$ 
Cumulatif density function $F(x$ : parameters $)$ 
Domain and parameters range 
$\operatorname{Normal}\left(\boldsymbol{\mu}, \boldsymbol{\sigma}^{2}\right)$ 
$\frac{1}{\sigma \sqrt{2 \pi}} \exp \left(\frac{(x\mu)^{2}}{2 \sigma^{2}}\right)$ 
$\frac{1}{2}\left(1+\operatorname{erf}\left(\frac{x\mu}{\sqrt{2} \sigma}\right)\right)$ 
$x, \mu \in(\infty,+\infty)$ 
$\operatorname{Logistic}(\mu, s)$ 
$\frac{\exp \left(\frac{(x\mu)}{s}\right)}{s\left(1+\exp \left(\frac{(x\mu)}{s}\right)\right)^{2}}$ 
$\frac{1}{1+\exp \left(\frac{(x\mu)}{s}\right)}$ 
$x, \mu \in(\infty,+\infty)$ 
$\operatorname{Cauchy}\left(x_{0}, \gamma\right)$ 
$\frac{1}{\pi \gamma}\left(\frac{\gamma^{2}}{\left(xx_{0}\right)^{2}+\gamma^{2}}\right)$ 
$\frac{1}{\pi} \arctan \left(\frac{xx_{0}}{\gamma}\right)+\frac{1}{2}$ 
$x, x_{0} \in(\infty,+\infty)$ 
Exponential $(\xi)$ 
$\xi \exp (\xi x)$ 
$1\exp (\xi x)$ 
$x \in[0,+\infty)$ 
$\operatorname{Lognormal}\left(\mu, \sigma^{2}\right)$ 
$\frac{1}{x \sigma \sqrt{2 \pi}} \exp \left(\frac{(\ln x\mu)^{2}}{2 \sigma^{2}}\right)$ 
$\frac{1}{2}\left(1+\operatorname{erf}\left(\frac{\ln x\mu}{\sqrt{2} \sigma}\right)\right)$ 
$x, \sigma^{2} \in(0,+\infty)$ 
$\operatorname{Gamma}(\boldsymbol{h}, \boldsymbol{\vartheta})$ 
$\left(\Gamma(h) \vartheta^{h}\right)^{1} x^{h1} e^{\vartheta / x}$ 
$\frac{1}{\Gamma(h)} \gamma\left(h, \frac{x}{\vartheta}\right)$ 
$x, h, \vartheta \in(0,+\infty)$ 
Weibull $(\varrho, k)$ 
$\frac{k}{\varrho}\left(\frac{x}{\varrho}\right)^{k1} \exp \left(\left(\frac{x}{\varrho}\right)^{k}\right)$ 
$1\exp \left(\left(\frac{x}{\varrho}\right)^{k}\right)$ 
$x \in[0,+\infty)$ 
$\operatorname{Gumbel}(\mu, \beta)$ 
$\frac{1}{\beta} \exp \left(\left(\frac{x\mu}{\beta}+e^{\frac{x\mu}{\beta}}\right)\right)$ 
$\exp \left(e^{\frac{x\mu}{\beta}}\right)$ 
$x, \mu \in(\infty,+\infty)$ 
Note: $\operatorname{erf}(x)=\frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{t^{2}} d t$ .
The goodnessoffit quantification level between the given data set (as information) and a theoretical model fitted is required in steps 3 and 4. Due to the theoretical model parameters (Tables 2 and 3) are estimated using the maximum likelihood, then the informationtheoretical, such as the Akaike and Bayesian Information Criteria ( $\mathrm{AIC}$ and BIC, respectively) are used for model selection, as recommended by Akaike [26] and Schwarz [27]. These criteria are defined as follows:
$\operatorname{AIC}(l)=2 \mathcal{L}_{l}+2 p(l)$. (10)
$\mathrm{BIC}(l)=2 \mathcal{L}_{l}+p(l) \log \mathrm{T}$. (11)
Variable $\mathcal{L}_{l}$ is the value for the maximum likelihood of the fitted model, $p(l)$ is the number of parameters, and variable $T$ is the number of observations. Furthermore, the smaller values of $\operatorname{AIC}(l)$ and $\operatorname{BIC}(l)$ means that the selected model shows better agreement with the data set.
In the present paper, the simulation and visualizations are carried out using the R program. Furthermore, several packages in the R program that correspond to this study include “extraDistr” (Wolodzko [28]), “BivGeo” (de Oliveira and Achcar [29]), “bivariate” (Spurdle [30]), “fitdistrplus” (DelignetteMuller and Dutang [31]), “RMKdiscrete” (Kirkpatrick [32]), “copula” (Hofert et al. [33]), and “VineCopula” (Schepsmeier et al. [34]). As information, steps to construct the bivariate Copula model for continuous variables have been provided by Xu et al. [35]. Meanwhile, Hofert et al. [16] have provided the elements of Copula Modeling with R program.
To summarize the simulation studies, there are two main results from 100 iterations performed. Firstly, the comparison of the frequency histograms of dependence parameter, the Kendall's Tau Empiric $\left(\tau_{E}\right)$ of three continuous bivariates, that is, $\left(X^{\star}, Y^{\star}\right),\left(X^{\circledast}, Y^{\circledast}\right)$, and (X^{▴},Y^{▴}) with the random perturbation forms $U(0,1)$ as control, $(U(0,1)1)$ as treatment 1 , and $(\mathrm{U}(0,1)+1)$ as treatment 2 , respectively). These are shown as the same histogram (the lefthand panels of Figure 2$) .$ In other words, the three data sets of 100 iterations performed are the same.
Secondly, as seen in the righthand panels of Figure 2, a comparison of the frequency histograms of dependence parameter, which is, Kendall’s Tau Copula $\tau_{C} \text { of }\left(\mathrm{X}^{\star}, \mathrm{Y}^{\star}\right)$ , $\left(X^{\circledast}, Y^{\circledast}\right)$, and (X^{▴},Y^{▴}) are shown as the same histogram. In other words, each of 100 iterations performed the same as the selected Copula model.
The simulation results are in line with the findings of the analytic studies. Accordingly, the CET form of Denuit and Lambert [9] and Stevens [7] gave the same conclusion, i.e., produced the same Copula model and its parameter. Therefore, in practice, one of two existing random independent perturbation forms of the CET process can be selected in modifying the Copula methods for bivariate discrete variables.
The next discussion is related to the illustration of our proposed technique on the dependence modeling of seismicity data. Hence, to assist in explaining the illustration for the seismologists, we will elaborate on the CET process of bivariate discrete (previous) and the steps of the Copula modeling for the bivariate continuous (new) in detail.
Figure 2. The frequency histograms of the $\tau_{E}$ (lefthand panels) and $\tau_{C}$ (righthand panels) for the nine cases analyzed
4.1 Research area and seismicity data
The Sumatra megathrust region of Indonesia is characterized by high seismic activity due to the presence of the subduction process of the IndoAustralian plates into the Eurasian plates with an average rate of 4 mm/year. Also, the region has three earthquake sources, namely the Sumatra subduction zone, Mentawai fault zone, and the Sumatra fault system.
In this study, the Sumatra megathrust (as the study area) is defined to be a rectangle region with latitude between 6.2° S  5.5° N and longitude between 93.5° E  104.5° E. Due to the presence of large earthquake sources, the seismologists have led to divide the region into five subregions, which do not overlap, namely AcehAndaman (AA), NiasSimelue (NS), MentawaiSiberut (MS), MentawaiPagai (MP), and Enggano (EO), as seen in lefthand panels of Figure 3 [11].
Figure 3. Map of the research area and the scatter plot of the pairs data. The lefthand panels is adopted from PusGeN [11] and Wikipedia.org (2020), while the righthand panels is obtained using the R program
Here, the observed yearly numbers of mainshock earthquakes that occurred in the subregions from January 1971 to December 2018, with Magnitude of Completeness $\left(\mathrm{M}_{\mathrm{c}}\right) \geq 4.6 \mathrm{M}_{\mathrm{w}}$ are the data considered in this application.
The seismicity data source is the earthquake catalog of the United States Geological Survey (www.earthquake.usgs.gov). The data pairs of earthquakes count are graphically shown in the righthand panels of Figure 3. Meanwhile, statistic descriptions such as minmax value, mean, and standard deviations for each subregion are seen in Table 5.
In illustrating the proposed technique, we focus on two adjacent subregions of five, such as AANS, NSMS, MSMP, and MPEO although the other six pairs are possible.
Previously, Orfanogiannaki et al. [36] and Rizal et al. [37] have provided the subregion seismic activity in the study area using the univariate Poisson mixture models (dependent and independent). However, according to the last column in Table 4, i.e., association measure, it is important to note that for every two adjacent subregions, there is a positive association measure and significant (pvalue < 0.01). Therefore, it shows evidence of dependence from two selected subregions exists. From these preliminary results, the data set can also be used in bivariate dependence modeling through the Copula model approaches.
4.2 Results
As a starting point, discrete data are transformed into continuous using the CET process proposed by Stevens [7], and then investigate the bestfit probability model among the eight models in Table 3. These models were jointly used for the investigation. In summary, those initial steps are displayed in Figure 4. The first and second rows describe the CET process, while the third and fourth describe the PDF and CDF of the selected univariate continuous model.
Table 5 shows the level comparisons of the goodness of fit (Logllk, AIC, and BIC) between a theoretical probability model and the given data set. Here, the smallest AIC and BIC scores on each row are denoted by bold marks. However, for the subregion MS, two criteria (AIC and BIC) gave two different models. Therefore, to overcome this problem, we use the third criterion, i.e., the maximum of Loglikelihood function (Logllk). We then decide the appropriate model for subregion MS would be the Gamma distribution.
Table 4. The descriptive statistics for five subregion empirical data and association measure for two adjacent subregions
Subregions 
Min 
Max 
Mean 
Variance 
Association measure 

$\tau(i, i+1)$ 
pvalue 

AA 
$i=1$ 
0 
81 
8.73 
165.12 
0.683 
$<$ 0.01 
NS 
2 
0 
61 
11.94 
221.41 
0.524 
$<$ 0.01 
MS 
3 
0 
31 
4.58 
27.56 
0.285 
$<$ 0.01 
MP 
4 
0 
59 
8.48 
150.31 
0.323 
$<$ 0.01 
EO 
5 
0 
25 
8.58 
32.04 


Figure 4. The visualizing of the CET process with the varying random perturbation and fitted marginal distributions. Columns correspond to the five subregions studied, namely AA, NS, MS, MP, and EO. Rows correspond to (1) histogram of the random perturbation (Uniform distribution), (2) histogram of the discrete variable (red color) and the continuous variable (black color), (3) density curves of fitted marginal distributions, and (4) cumulative density curve of fitted marginal distributions
With the twoinformation, i.e., the marginal distribution for each subregion (Table 5), and the association measure of two adjacent subregions (Table 4), the next step is to search for an appropriate Copula model for the dependence structure of two selected subregions. To achieve this, the investigation is like the procedure of determining the marginal distribution, i.e., trial and error process. Also, among the six Copula models, each is fitted to the data set (pairs of CDFs) of two selected subregions. To decide the best model, the goodnessoffit level is calculated in three terms, namely Loglikelihood, AIC, and BIC. Here, the smallest AIC and BIC scores on each row are denoted by bold marks. The investigation summary for the six Copula models and the selected model, as well as its parameter for each case studied are shown in Table 6.
Table 5. Comparison of the fitted eight continuous probability models on the basis of loglikelihood, AIC, and BIC
Subregions 
Goodness of fit criteria 
Normal 
Logistik 
Cauchy 
Exp 
LogNorm 
Gamma 
Weibull 
Gumbel 
AA 
Logllk 
190.426 
171.148 
154.573 
154.923 
147.947 
153.944 
154.865 
160.340 
AIC 
384.852 
346.296 
313.145 
311.846 
299.895 
311.888 
313.729 
324.680 

BIC 
388.594 
350.038 
316.887 
313.717 
303.637 
315.630 
317.472 
328.422 

NS 
Logllk 
196.994 
193.997 
180.128 
168.972 
165.086 
168.871 
168.550 
183.315 
AIC 
397.988 
391.995 
364.255 
339.944 
334.172 
341.743 
341.101 
370.629 

BIC 
401.730 
395.737 
367.998 
341.816 
337.914 
345.485 
344.843 
374.372 

MS 
Logllk 
147.617 
138.286 
135.332 
126.636 
125.304 
125.222 
125.879 
130.082 
AIC 
299.233 
280.573 
274.665 
255.272 
254.609 
254.443 
255.758 
264.163 

BIC 
302.975 
284.315 
278.407 
257.143 
258.351 
258.186 
259.500 
267.906 

MP 
Logllk 
187.842 
179.539 
155.090 
153.410 
148.818 
153.345 
152.850 
167.278 
AIC 
379.684 
363.078 
314.181 
308.820 
301.636 
310.689 
309.701 
338.556 

BIC 
383.427 
366.820 
317.923 
310.691 
305.378 
314.431 
313.443 
342.298 

EO 
Logllk 
150.461 
148.092 
147.702 
153.799 
151.791 
146.167 
145.532 
144.900 
AIC 
304.923 
300.185 
299.403 
309.597 
307.583 
296.334 
295.064 
293.800 

BIC 
308.665 
303.927 
303.146 
311.469 
311.325 
300.077 
298.807 
297.542 
Table 6. Comparison of the fitted six Copula models on the basis of AIC and BIC
Two adjacent subregions 
Copula 
Loglikelihood 
AIC 
BIC 
Selected copula 

Type 
Parameters 

AANS AA ~ LogNormal (1.80, 0.76) NS ~ LogNormal (1.89, 1.30) 
Gaussian 
26.113 
50.226 
48.354 
Gumbel Hougaard 
2.706 
Student’s 
26.073 
48.147 
44.404 

Clayton 
25.855 
49.709 
47.838 

Gumbel 
27.414 
52.827 
50.956 

Frank 
25.356 
48.711 
46.840 

Joe 
25.913 
49.826 
47.955 

NSMS NS ~ LogNormal (1.89, 1.30) MS ~ Gamma (1.38, 0.27) 
Gaussian 
15.868 
29.735 
27.864 
Gaussian 
0.727 
Student’s 
15.790 
27.580 
23.837 

Clayton 
14.966 
27.932 
26.061 

Gumbel 
15.829 
29.658 
27.787 

Frank 
14.900 
27.800 
25.929 

Joe 
14.275 
26.550 
24.679 

MSMP MS ~ Gamma (1.38, 0.27) MP ~ LogNormal (1.58, 1.24) 
Gaussian 
4.496 
6.993 
5.121 
Clayton 
0.798 
Student’s 
4.358 
4.717 
0.974 

Clayton 
5.738 
9.476 
7.605 

Gumbel 
4.984 
7.968 
6.097 

Frank 
4.178 
6.356 
4.485 

Joe 
5.381 
8.763 
6.891 

MPEO MP ~ LogNormal (1.58, 1.24) EO ~ Gumbel (6.61, 4.21) 
Gaussian 
5.652 
9.303 
7.432 
Gaussian 
0.501 
Student’s 
5.480 
6.960 
3.217 

Clayton 
4.816 
7.632 
5.761 

Gumbel 
5.026 
8.051 
6.180 

Frank 
5.284 
8.568 
6.697 

Joe 
4.183 
6.366 
4.495 
4.3 Comparison with other count models
In this section, we compare our proposed technique, i.e., Copula continuous extension technique, and the popular models for count data, such as bivariate Negative Binomial [38, 39], bivariate Poisson [40, 41], and double Poisson distribution [12]. The first two models have been briefly described in subsection 2.1, as seen in Table 1.
The double Poisson is a special case of the Poisson bivariate. It is important to consider the bivariate Poisson formula for bivariate discrete $(\mathrm{X}, \mathrm{Y})$, denoted by $\operatorname{BivPoi}\left(\lambda_{1}, \lambda_{2}, \lambda_{3}\right)$. Marginally, each random variable follows a univariate Poisson distribution with $\mathrm{E}[\mathrm{X}]=\operatorname{Var}[\mathrm{X}]=\lambda_{1}+\lambda_{3}, \mathrm{E}[\mathrm{Y}]=\operatorname{Var}[\mathrm{Y}]=$ $\lambda_{2}+\lambda_{3}$, and $\operatorname{Cov}[\mathrm{X}, \mathrm{Y}]=\lambda_{3}$. The parameter $\lambda_{3}$ is a measure of dependence between the two random variables $X$ and $Y$. If $\lambda_{3}=0$, then the two variables are independent and the bivariate Poisson formula is represented as the product of two independent univariate Poisson distributions (called as double Poisson distribution).
The comparison of four models selected for count data modeling in Table 7 shows that, on the basis of AIC and BIC, the Copula continuous extension technique is superior to the models of double Poisson, bivariate Poisson, and bivariate Negative Binomial.
Some aspects can be read from Table 7 which is associated with the information in Table 4. Firstly, it is reasonable that double Poisson distribution has the last rank, since the data set of the two selected subregions studied were not independent (Table 4). Secondly, it is important to note that according to the fifth and sixth columns of Table 4, an overdispersion condition, i.e., variance greater than mean, in the observation data from each subregion exists. As we know, the marginal of the bivariate Negative Binomial allows for overdispersion found in the data set, which can not be accounted by the bivariate Poisson. Therefore, the bivariate Negative Binomial have a better performance than the bivariate Poisson. Thirdly, although the bivariate Negative Binomial can overcome the problems of dependent and overdispersion in the data set, it is not enough to capture the joint probability behavior of our bivariate data, as well as the selected Copula model, due to this model have assumptions that must be considered [12, 13].
Finally, the joint probability function is determined from the two adjacent subregions considered by combining the selected marginals (Table 5) and selected the Copula model (Table 6). For two adjacent subregions AANS, NSMS, MSMP, and MPEO are expressed as follows, respectively.
$H_{\mathrm{AA}, \mathrm{NS}}(x, y)=\exp \left(\left(\left(\log \left(\frac{1}{2}\left(1+\operatorname{erf}\left(\frac{\ln x1.80}{1.232}\right)\right)\right)\right)^{2.706}+\left(\log \left(\frac{1}{2}\left(1+\operatorname{erf}\left(\frac{\ln y1.89}{1.612}\right)\right)\right)\right)^{2.706}\right)^{0.370}\right)$ (12)
$H_{\mathrm{NS}, \mathrm{MS}}(x, y)=\Phi_{G}\left[\Phi^{1}\left(\frac{1}{2}\left(1+\operatorname{erf}\left(\frac{\ln x1.80}{1.232}\right)\right)\right), \Phi^{1}\left(\frac{1}{\Gamma(1.38)} \gamma\left(1.38, \frac{x}{0.27}\right)\right) ; 0.727\right]$ (13)
$H_{\mathrm{MS}, \mathrm{MP}}(x, y)=\left(\left[\frac{1}{\Gamma(1.38)} \gamma\left(1.38, \frac{x}{0.27}\right)\right]^{0.798}+\left[\frac{1}{2}\left(1+\operatorname{erf}\left(\frac{\ln x1.58}{1.575}\right)\right)\right]^{0.798}1\right)^{1.253}$ (14)
$H_{\mathrm{MP}, \mathrm{EO}}(x, y)=\Phi_{G}\left[\Phi^{1}\left(\frac{1}{2}\left(1+\operatorname{erf}\left(\frac{\ln x1.58}{1.575}\right)\right)\right), \Phi^{1}\left(\exp \left(e^{\frac{x6.61}{4.21}}\right)\right) ; 0.501\right]$ (15)
Another way of visualizing the selected Copula model is to present its densities as contour plots (first row), as shown in Figure 5. Furthermore, as seen in the second row in Figure 5, the scatter plots of simulated and observed data are shown. Also, the association measure of the simulated data of AANS, NSMS, MSMP, and MPEO were 0.668, 0.514, 0.273, and 0.314, respectively. These results are near to the association measures from the observed data, i.e., 0.683, 0.524, 0.285, and 0.323, as shown in Table 4. In other words, the selected Copula models yielding from the Copula continuous extension technique can capture the association behavior of bivariate discrete variables.
Table 7. Comparison of the four models based on AIC and BIC
Two adjacent subregions 
Bivariate Poisson 
Double Poisson 
Bivariate Negative Binomial 
Copula continuous extension technique 

AIC 
BIC 
AIC 
BIC 
AIC 
BIC 
Model 
AIC 
BIC 

AANS 
1341.643 
1349.336 
1516.381 
1521.510 
330.843 
336.456 
Gumbel 
52.827 
50.956 
NSMS 
1137.008 
1144.701 
1223.226 
1228.355 
373.964 
379.577 
Gaussian 
29.735 
27.864 
MSMP 
1035.361 
1043.055 
1059.908 
1065.037 
296.043 
307.656 
Clayton 
9.476 
7.605 
MPEO 
1058.489 
1066.182 
1067.261 
1072.389 
319.571 
325.184 
Gaussian 
9.303 
7.432 
As a note, the AICBIC listed in the Copula continuous extension technique refers to the selected Copula model as shown in Table 6.
Figure 5. Some visualizations are based on selected Copula models. Columns correspond to subregions studied, namely AANS, NSMS, MSMP, and MPEO. Rows correspond to contour plots of the densities, and scatter plots observed and simulated
A simple technique has been provided in modifying the Copula modeling method for bivariate discrete variables, which is the continuous extension. There are two steps carried out in this technique. The first is applying the Continuous Extension Technique (CET) at the marginal bivariate discrete variables to become continuous ones. In the second step, a Copula model was constructed for a new bivariate variable (continuous) by following the procedure from the Copula theory for continuous variables. However, two forms of random perturbation of the CET can be used, i.e., $\mathrm{U}(0,1) \text { and }(\mathrm{U}(0,1)1)$ as proposed by Stevens [7] and modified by Denuit and Lambert [9], respectively. Based on analytical and simulation studies, two different forms of CET produced the same conclusions about Kendall’s Tau measure and the selected Copula model. Therefore, practitioners can select one of two existing CET forms in practice.
The advantage of the present technique is not only from practical point of view, but it is also able to preserve the parameter dependence of the former discrete variables. Thus, the selected Copula model based on the Copula continuous extension technique can be used to represent the dependence modeling of the discrete variables.
To help the seismologists understand the proposed technique, an application to the seismicity data recorded in five subregions of the Sumatra megathrust, namely AcehAndaman (AA), NiasSimelue (NS), MentawaiSiberut (MS), MentawaiPagai (MP), and Enggano (EO) has been presented. The illustration was based on the observed yearly numbers of mainshock earthquakes that occurred from January 1971 to December 2018, with Magnitude of Completeness $\left(\mathrm{M}_{\mathrm{c}}\right) 4.6 \mathrm{M}_{\mathrm{w}}$ . According to the selected Copula model, the evidence of dependence seismic activity in each of the two adjacent subregions exists. The results of these analyses provide new information regarding the seismicity behavior in the Sumatra megathrust. Therefore, as future study, the dependence modeling using the Copula models can be applied as an alternative approach in developing the prediction of earthquake hazard models in Sumatra megathrust.
The authors sincerely wish to thank the journal editor and reviewers who critically reviewed the manuscript and made valuable suggestions for its improvement. The first author would like to thank LPDP (Lembaga Pengelola Dana Pendidikan), Ministry of Finance, Republic Indonesia for providing the Ph.D. scholarship program. Partial funding is also supported by P2MI research grant of Institut Teknologi Bandung, Indonesia 2021 (Grant numbers: 62/IT1.C02/SKTA/2021).
CET 
Continous extension technique 
PCC 
Pearson correlation coefficient 
$\mathrm{U}(0,1)$ 
Uniform distribution $(0,1)$ 
$\mathrm{V}(0,1)$ 
Uniform distribution $(0,1)$ that independent with $(0,1)$ 
X 
Discrete random variable 
$X^{\star}$ 
A continuous random variable that resulting from the CET process with the Stevens method. 
X^{⍟} 
A continuous random variable that resulting from the CET process with the Denuit and Lambert method. 
E [] 
The expectation of arandom variable 
Var () 
The variance of a random variable 
Cov () 
The covariance of the two variables 
$\mathrm{F}()$ 
The cumulative distribution function 

Probablitiy distribution function 
PMF 
Probablitiy mass function 
CDF 
Cumulative distribution function 
H() 
Joint cumulative distribution 
C() 
The Copula model 
$\operatorname{erf}(x)$ 
$\frac{2}{\sqrt{\pi}} \int_{0}^{x} e^{t^{2}} d t$. 
T 
The number of observation 
R 
Real numbers 
pvalue 
the probability of declaring that the test result is at least the same as the actual observed result, assuming that the null hypothesis is correct. 
log 
Logarithm function 
Logllk 
Logarithm of the loglikelihood function 
n/a 
not available 
$\text { AIC }$ 
Akaike Information Criteria 
$\text { BIC }$ 
Bayesian Information Criteria 
$\mathrm{p}(l)$ 
the number of parameters of the fitted model 
$M_{C}$ 
Magnitude of Completeness 
$\mathrm{M}_{\mathrm{w}}$ 
moment magnitude 
$\subseteq$ 
Subset 
Greek symbols 

$\tau$ 
Kendall’s Tau measure 
$\rho$ 
Pearson correlation coefficient 
$\sigma$ 
Standard deviation 
$\mathcal{L}$ 
The likelihood function 
$\theta$ 
Parameter of Copula models 
$\omega$ 
Parameters of univariate continuous models 
$\Phi$ 
CDF of a standard normal distribution 
$\Phi^{1}$ 
Invers of $\Phi$ 
$v$ 
Degrees of freedom of the bivariate tdistribution 
$\Gamma$  Gamma function 
[1] Zhang, X., Jiang, H. (2019). Application of Copula function in financial risk analysis. Computers & Electrical Engineering, 77: 376388. https://doi.org/10.1016/j.compeleceng.2019.06.011
[2] Bhatti, M.I., Do, H.Q. (2019). Recent development in copula and its applications to the energy, forestry and environmental sciences. International Journal of Hydrogen Energy, 44(36): 1945319473. https://doi.org/10.1016/j.ijhydene.2019.06.015
[3] Dong, H., Xu, X., Sui, H., Xu, F., Liu, J. (2017). Copulabased joint statistical model for polarimetric features and its application in PolSAR image classification. IEEE Transactions on Geoscience and Remote Sensing, 55(10): 57775789. https://doi.org/10.1109/TGRS.2017.2714169
[4] Ghahroodi, Z.R., Saba, R.A., Baghfalaki, T. (2019). Gaussian copulabased regression models for the analysis of mixed outcomes: An application on household's utilization of health services data. Journal of Statistical Theory and Applications, 18(3): 182197. https://doi.org/10.2991/jsta.d.190306.009
[5] Wan, J., Li, S. (2019). Modeling and application of industrial process fault detection based on pruning vine copula. Chemometrics and Intelligent Laboratory Systems, 184: 113. https://doi.org/10.1016/j.chemolab.2018.11.005
[6] Marshall, A. (1996). Copulas, marginals, and joint distributions. Institute of Mathematic Statistics, 28: 213222. https://doi.org/10.1214/lnms/1215452620
[7] Stevens, W.L. (1950). Fiducial limits of the parameter of a discontinuous distribution. Biometrika, 37(1/2): 117129. https://doi.org/10.1093/biomet/37.12.117
[8] Machado, J.A.F., Santos Silva, J.M.C. (2005). Quantiles for counts. Journal of the American Statistical Association, 100(472): 12261237. https://doi.org/10.1198/016214505000000330
[9] Denuit, M., Lambert, P. (2005). Constraints on concordance measures in bivariate discrete data. Journal of Multivariate Analysis, 93(1): 4057. https://doi.org/10.1016/j.jmva.2004.01.004
[10] Sklar, A. (1959). Functions de repartition an dimensions et leurs marges. Publications de l’Institut de Statistique de l’Universite de Paris, 8: 229231.
[11] PusGeN (2017). Peta Sumber dan Bahaya Gempa Indonesia Tahun 2017, Pusat Penelitian dan Pengembangan Perumahan dan Pemukiman. Badan Peneliti dan Pengembangan Kementerian Pekerjaan Umum dan Perumahan Rakyat, pp. 8991 and 189 (in indonesian). Documentation available at https://www.scribd.com/document/394240076/BUKUPETAGEMPA2017pdf.
[12] Kocherlakota, S., Kocherlakota, K.K. (1992). Bivariate Discrete Distributions. Marcel Decker, New York. https://archive.org/details/bivariatediscret00koch.
[13] Montgomery, D.C., Runger, G.C. (2007). Applied Statistics and Probability for Engineers. John Wiley & Sons.
[14] Joe, H. (1997). Multivariate Models and Multivariate Dependence Concepts. CRC Press. https://doi.org/10.1201/9780367803896
[15] Nelsen, R.B. (2006), An Introduction to Copulas. 2nd edition. New York: Springer. https://doi.org/10.1007/0387286780
[16] Hofert, M., Kojadinovic, I., Mächler, M., Yan, J. (2019). Elements of Copula Modeling with R. Springer. https://doi.org/10.1007/9783319896359
[17] Marshall, A.W., Olkin, I. (1985). A family of bivariate distributions generated by the bivariate Bernoulli distribution. Journal of the American Statistical Association, 80(390): 332338. https://doi.org/10.2307/2287890
[18] Trivedi, P., Zimmer, D. (2007). Copula modeling: an introduction for practitioners. Foundations and Trends® in Econometrics, 1(1): 1111. http://dx.doi.org/10.1561/0800000005
[19] Kruskal, W.H. (1958). Ordinal measures of association. Journal of the American Statistical Association, 53(284): 814861. https://doi.org/10.1080/01621459.1958.10501481
[20] Lehmann, E.L. (1966). Some concepts of dependence. The Annals of Mathematical Statistics, 37(5): 11371153. https://doi.org/10.1214/aoms/1177699260
[21] Agresti, A. (2010). Analysis of Ordinal Categorical Data (Second ed.). New York: John Wiley & Sons. https://doi.org/10.1002/9780470594001
[22] Cherubini, U., Luciano, E., Vecchiato, W. (2004). Copula Methods in Finance West Sussex: John Wiley and Sons. https://doi.org/10.1002/9781118673331
[23] Angus, J.E. (1994). The probability integral transform and related results. SIAM Review, 36(4): 652654. https://doi.org/10.1137/1036146
[24] Joe, H., Xu, J.J. (1996). The estimation method of inference functions for margins for multivariate models. Technical Report 166, Department of Statistics, University of British Columbia. https://dx.doi.org/10.14288/1.0225985
[25] Ross, S.M. (2014). Introduction to Probability Models. Academic Press. https://doi.org/10.1016/C20120035648
[26] Akaike, H. (1974). A new look at the statistical model identification. IEEE Transactions on Automatic Control, 19(6): 716723. https://doi.org/10.1109/TAC.1974.1100705
[27] Schwarz, G. (1978). Estimating the dimension of a model. The Annals of Statistics, 6(2): 461464. https://doi.org/10.1214/aos/1176344136
[28] Wolodzko, T. (2019). ExtraDistr: Additional Univariate and Multivariate Distributions,. R Package Version, 1(11). https://cran.rproject.org/web/packages/extraDistr/extraDistr.pdf.
[29] de Oliveira, R.P., Achcar, J.A. (2018). BasuDhar’s bivariate geometric distribution in presence of censored data and covariates: Some computational aspects. Electronic Journal of Applied Statistical Analysis, 11(1): 108136. https://doi.org/10.1285/i20705948v11n1p108
[30] Spurdle, A. (2020). Bivariate (Version 0.6.0). https://cran.rproject.org/web/packages/bivariate/index.html.
[31] DelignetteMuller, M.L., Dutang, C. (2015). Fitdistrplus: An R package for fitting distributions. Journal of Statistical Software, 64(4): 134. https://doi.org/10.18637/jss.v064.i04
[32] Kirkpatrick, R.M. (2014). RMKdiscrete (Version 0.1). http://cran.rproject. org/web/packages/RMKdiscrete.
[33] Hofert, M., Kojadinovic, I., Maechler, M., Yan, J., Maechler, M.M., Suggests, M.A.S.S. (2014). Package ‘copula’. http://ie.archive.ubuntu.com/disk1/disk1/cran.rproject.org/web/packages/copula/copula. pdf.
[34] Schepsmeier, U., Stoeber, J., Brechmann, E.C., Graeler, B., Nagler, T., Erhardt, T., Killiches, M. (2015). Package ‘VineCopula’. R package version, 2(5). https://cran.rproject.org/web/packages/VineCopula/index.html.
[35] Xu, Y., Tang, X.S., Wang, J.P., Kuo‐Chen, H. (2016). Copula‐based joint probability function for PGA and CAV: A case study from Taiwan. Earthquake Engineering and Structural Dynamics, 45(13): 21232136. https://doi.org/10.1002/eqe.2748
[36] Orfanogiannaki, K., Karlis, D., Papadopoulos, G.A. (2014). Identification of temporal patterns in the seismicity of Sumatra using Poisson Hidden Markov models. Research in Geophysics, 4(1). https://doi.org/10.4081/rg.2014.4969
[37] Rizal, J., Gunawan, A.Y., Indratno, S.W., Meilano, I. (2018). Identifying dynamic changes in Megathrust segmentation via Poisson mixture model. In Journal of Physics: Conference Series, 1097(1): 012083. https://iopscience.iop.org/article/10.1088/17426596/1097/1/012083.
[38] Iwasaki, M., Tsubaki, H. (2006). Bivariate negative binomial generalized linear models for environmental count data. Journal of Applied Statistics, 33(9): 909923. https://doi.org/10.1080/02664760600744157
[39] Fernando, S.M., Sooriyarachchi, M.R. (2018). Bivariate negative binomial modelling of epidemiological data. Open Science Journal of Statistics and Application, 5(3): 47.
[40] Karlis, D., Ntzoufras, I. (2003). Analysis of sports data by using bivariate Poisson models. Journal of the Royal Statistical Society: Series D (The Statistician), 52(3): 381393. https://doi.org/10.1111/14679884.00366
[41] Berkhout, P., Plug, E. (2004). A bivariate Poisson count data model using conditional probabilities. Statistica Neerlandica, 58(3): 349364. https://doi.org/10.1111/j.14679574.2004.00126.x