This vignette is built with the following version of SHELF:
## [1] '1.10.0.9000'
Random sampling is used in this vignette. The seed is set as follows.
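The particular value used is not important; any fixed seed (the one below is illustrative) makes the sampled values shown in this vignette reproducible.
set.seed(123) # illustrative seed value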
We illustrate the process of incorporating dependence between uncertain quantities using a multivariate (bivariate) normal copula.
As an example, suppose a clinical trial is to be conducted for a new treatment. The trial will take place at two centres, with treatment intended to last for one year. It is believed that a proportion of patients recruited at each centre will not complete the treatment and hence drop out of the trial. Denote these two uncertain proportions by \(X_1\) and \(X_2\). We suppose that patient characteristics are believed to be different at the second centre, such that the expert is expecting a slightly higher drop-out rate (although she is not certain that \(X_2>X_1\)).
We first elicit marginal distributions for \(X_1\) and \(X_2\). We suppose the expert states \[ P(X_1\le 0.12) = 0.25, P(X_1\le 0.15) = 0.5, P(X_1 \le 0.20)= 0.75, \] \[ P(X_2\le 0.15) = 0.25, P(X_2\le 0.2) = 0.5, P(X_2 \le 0.25)= 0.75. \]
We fit distributions to each of these sets of judgements.
p <- c(0.25, 0.5, 0.75)
v1 <- c(0.12, 0.15, 0.2)
v2 <- c(0.15, 0.2, 0.25)
myfit1 <- fitdist(vals = v1, probs = p, lower = 0, upper = 1)
myfit2 <- fitdist(vals = v2, probs = p, lower = 0, upper = 1)
We choose to use beta distributions for each marginal, and suppose
that we have gone through the process of feedback with the expert, so
that she is satisfied with the two fitted distributions. The two fitted
distributions are plotted below. (To aid comparison, lower and upper axis limits xl and xu are specified in the plotfit commands.)
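The plots can be produced with plotfit, for example with the following commands (the limits 0 and 0.5 are illustrative choices):
plotfit(myfit1, d = "beta", xl = 0, xu = 0.5)
plotfit(myfit2, d = "beta", xl = 0, xu = 0.5)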
We can incorporate dependence through the use of a bivariate normal copula. The general form of the joint density function of \(X_1\) and \(X_2\) is a little complex, but simulating from the joint distribution is more straightforward. The idea is as follows.
1. Simulate \((Z_1, Z_2)\) from a bivariate normal distribution, where \(Z_1\) and \(Z_2\) each have mean 0 and variance 1, and have correlation parameter \(r\).
2. Set \(U_1 = \Phi(Z_1)\) and \(U_2 = \Phi(Z_2)\), where \(\Phi\) is the standard normal distribution function, so that \(U_1\) and \(U_2\) each have a uniform distribution on \([0,1]\), but are dependent.
3. Set \(X_1 = F_1^{-1}(U_1)\) and \(X_2 = F_2^{-1}(U_2)\), where \(F_1\) and \(F_2\) are the fitted (beta) distribution functions, so that \(X_1\) and \(X_2\) have the elicited marginal distributions, with dependence between them determined by \(r\).
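For intuition, these three steps can be carried out directly. The sketch below assumes the fitted beta parameters are stored in the Beta element of each fitdist object (columns shape1 and shape2), and uses the correlation value \(r=0.81\) obtained below from the expert's concordance probability.
r <- 0.81 # correlation parameter, elicited below
Z <- MASS::mvrnorm(1000, mu = c(0, 0),
                   Sigma = matrix(c(1, r, r, 1), 2, 2))
U <- pnorm(Z) # transform to uniform margins
X1 <- qbeta(U[, 1], myfit1$Beta$shape1, myfit1$Beta$shape2)
X2 <- qbeta(U[, 2], myfit2$Beta$shape1, myfit2$Beta$shape2)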
To obtain \(r\), the expert is asked
to consider her concordance probability \[
p=P(X_1 > 0.15, X_2 > 0.2, \mbox{ or }X_1 < 0.15, X_2 <
0.2),
\] i.e. her probability that the two uncertain proportions are
either both above their medians or both below their medians. Values of 0, 0.5 and 1 for this probability correspond to perfect negative correlation, independence, and perfect positive correlation respectively. Suppose she judges \(p=0.8\), in that if one proportion is higher than her median for it, she believes the other is likely to be also. The
correlation parameter \(r\) can be
obtained as \[
r = \sin\left(2\pi\left(\frac{p}{2} - 0.25\right)\right) = 0.81.
\] The function copulaSample
can be used to sample
from the required distribution. The function generalises to \(d\) dependent variables \(X_1,\ldots, X_d\) (though we don’t
recommend using this method for \(d>3\)), and takes as input a matrix of
concordance probabilities, where element \(i,j\) of this matrix is the probability
\[P(X_i > m_i, X_j > m_j \mbox{ or }X_i
< m_i, X_j < m_j),\] where \(m_i\) and \(m_j\) are the corresponding elicited
medians. It is only necessary to specify the upper triangular elements
of this matrix. Note that with \(d>2\), the elicited pairwise concordance
probabilities may not result in a positive-definite variance matrix. We
discuss this further at the end.
conc.prob <- matrix(0, 2, 2)
conc.prob[1, 2] <- 0.8
X <- copulaSample(myfit1, myfit2, cp = conc.prob,
                  n = 1000, d = c("beta", "beta"))
The object X
is a 1000 \(\times\) 2 matrix, with \(i\)-th column corresponding to \(X_i\). We verify that the sample has the
right properties, checking the sample quartiles and quadrant
probability.
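The quartiles and quadrant probability can be checked with commands such as the following.
quantile(X[, 1], probs = c(0.25, 0.5, 0.75))
quantile(X[, 2], probs = c(0.25, 0.5, 0.75))
mean((X[, 1] > 0.15 & X[, 2] > 0.2) | (X[, 1] < 0.15 & X[, 2] < 0.2))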
## 25% 50% 75%
## 0.1192782 0.1546022 0.1962070
## 25% 50% 75%
## 0.1515046 0.2014171 0.2518534
## [1] 0.794
We plot the sample below.
library(ggplot2)
ggplot(data.frame(X), aes(x = X1, y = X2)) +
  geom_point(alpha = 0.1, colour = "red") +
  geom_hline(yintercept = 0.2) +
  geom_vline(xintercept = 0.15) +
  labs(x = expression(X[1]), y = expression(X[2]))
Note that an interactive tool is available for specifying a concordance probability and viewing a corresponding joint sample for two uncertain quantities. In this example, it would be run with a command along the following lines (assuming the tool referred to is SHELF's elicitConcProb app; the exact call shown here is indicative):
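elicitConcProb(myfit1, myfit2, 0.15, 0.2, d = c("beta", "beta")) # elicited medians of X1 and X2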
With three or more uncertain quantities, it is possible to specify pair-wise concordance probabilities that are not coherent, resulting in a correlation matrix that is not positive definite. To illustrate this, suppose we have three uncertain quantities \(X_1, X_2\) and \(X_3\), with elicited medians \(m_1, m_2\) and \(m_3\), and consider the following three statements:
1. it is likely that \(X_1\) and \(X_2\) are either both above or both below their medians;
2. it is likely that \(X_1\) and \(X_3\) are either both above or both below their medians;
3. it is unlikely that \(X_2\) and \(X_3\) are either both above or both below their medians.
Statement 3 is not ‘consistent’ with the first two: Statements 1 and 2 imply that we expect to see either both \(X_2\) and \(X_3\) above their medians (if \(X_1>m_1\)), or both below their medians (if \(X_1<m_1\)), but Statement 3 says that we expect to see one of \(X_2,X_3\) above its median, and the other to be below.
Continuing the example, suppose we have elicited a marginal distribution for \(X_3\), with the elicited quartiles being 0.2, 0.25 and 0.35.
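The fit can be obtained in the same way as before; the object myfit3 is used in the sampling commands below.
v3 <- c(0.2, 0.25, 0.35)
myfit3 <- fitdist(vals = v3, probs = p, lower = 0, upper = 1)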
Defining \[ p_{i,j}=P(X_i > m_i, X_j > m_j \mbox{ or }X_i < m_i, X_j < m_j), \] the expert has already stated \(p_{1,2}=0.8\), and we suppose she now also judges \[ p_{1,3} = 0.7,\quad p_{2,3}=0.1. \] We set up the matrix of concordance probabilities as follows (upper triangular elements only):
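conc.prob <- matrix(0, 3, 3) # only the upper triangle needs to be filled in
conc.prob[1, 2] <- 0.8
conc.prob[1, 3] <- 0.7
conc.prob[2, 3] <- 0.1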
so we have the matrix
## [,1] [,2] [,3]
## [1,] 0 0.8 0.7
## [2,] 0 0.0 0.1
## [3,] 0 0.0 0.0
We now attempt to obtain a joint sample using a multivariate normal copula:
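theta <- copulaSample(myfit1, myfit2, myfit3,
                      cp = conc.prob, n = 1000,
                      d = rep("beta", 3))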
## Elicited correlation matrix is not positive definite.
## Consider adjusting one of the concordance probabilities
## to be within the following limits.
##
## p_{1,2} p_{1,3} p_{2,3}
## lower 0.2 0.1 0.5
## upper 0.4 0.3 0.9
The output tells us how any one of the concordance probabilities should be adjusted to make it cohere with the other two. (This feature is only available for three uncertain quantities.) For example, \(p_{2,3}\) would need to be specified in the range \([0.5, 0.9]\), given the values of \(p_{1,2}\) and \(p_{1,3}\).
The valid ranges for each concordance probability (given the other two) are obtained by considering a multivariate normal random vector \[ \left(\begin{array}{c}Z_1 \\ Z_2\\Z_3 \end{array}\right) \sim N\left\{ \left(\begin{array}{c}0 \\ 0 \\0 \end{array}\right) , \left(\begin{array}{ccc}1 & r_{1,2} & r_{1,3} \\ r_{2,1} & 1 & r_{2,3} \\ r_{3,1} & r_{3,2} & 1 \end{array}\right)\right\}, \] where each correlation parameter \(r_{i,j}\) is obtained from a concordance probability \(p_{i,j}\) via
\[ r_{i,j} = \sin\left(2\pi\left(\frac{p_{i,j}}{2} - 0.25\right)\right) \]
To obtain the range for any concordance probability \(p_{i,j}\) given the other two \(p_{i,k}\), \(p_{j,k}\), we note that \[ Var(Z_i|Z_j, Z_k) = 1 - (r_{i,j} \quad r_{i,k})\left(\begin{array}{cc}1 & r_{j,k} \\ r_{j,k} & 1 \end{array}\right)^{-1}\left(\begin{array}{c} r_{i,j}\\ r_{i,k}\end{array}\right), \] and so, given the values of \(r_{j,k}\) and \(r_{i,k}\), we can solve a quadratic equation to find the limits of \(r_{i,j}\) (and hence \(p_{i,j}\)) such that \(Var(Z_i|Z_j, Z_k)>0\).
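For example, the limits reported above for \(p_{2,3}\) can be reproduced with a few lines of R; the helper functions below are illustrative and not part of SHELF.
pToR <- function(p) sin(2 * pi * (p / 2 - 0.25)) # concordance probability to correlation
rToP <- function(r) 0.5 + asin(r) / pi           # correlation back to concordance probability
r12 <- pToR(0.8)
r13 <- pToR(0.7)
# roots of the quadratic in r_{2,3} obtained by setting Var(Z_2 | Z_1, Z_3) = 0
limits <- r12 * r13 + c(-1, 1) * sqrt((1 - r12^2) * (1 - r13^2))
rToP(limits) # approximately 0.5 and 0.9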
To confirm that \(p_{2,3}\) can indeed take a value in the range \([0.5, 0.9]\), we set it to the upper limit and do
conc.prob[2, 3] <- 0.9
theta <- copulaSample(myfit1, myfit2, myfit3,
                      cp = conc.prob, n = 1000,
                      d = rep("beta", 3))
and to plot the samples we do
GGally::ggpairs(data.frame(theta),
                lower = list(continuous = GGally::wrap(GGally::ggally_points,
                                                       color = "red",
                                                       alpha = 0.1)),
                columnLabels = c("X[1]", "X[2]", "X[3]"),
                labeller = ggplot2::label_parsed)