learn_DAG() functionlibrary(BCDAG)This is the second of a series of three vignettes for the R package BCDAG. In this vignette we focus on function learn_DAG(), which implements a Markov Chain Monte Carlo (MCMC) algorithm to sample from the joint posterior of DAG structures and DAG-parameters under the Gaussian assumption.
The underlying Bayesian Gaussian DAG-model can be summarized as follows: \[\begin{eqnarray} X_1, \dots, X_q \,|\,\boldsymbol L, \boldsymbol D, \mathcal{D} &\sim& \mathcal{N}_q\left(\boldsymbol 0, (\boldsymbol{L}\boldsymbol{D}^{-1}\boldsymbol{L}^\top)^{-1}\right)\\ (\boldsymbol L, \boldsymbol D)\,|\,\mathcal{D} &\sim& \text{DAG-Wishart}(\boldsymbol{a}_{c}^{\mathcal{D}}, \boldsymbol U) \\ p(\mathcal{D}) &\propto& w^{|\mathcal{S}_\mathcal{D}|}(1-w)^{\frac{q(q-1)}{2} - {|\mathcal{S}_\mathcal{D}|}} \end{eqnarray}\]
In particular \(\mathcal{D}=(V,E)\) denotes a DAG structure with set of nodes \(V=\{1,\dots,q\}\) and set of edges \(E\subseteq V \times V\). Moreover, \((\boldsymbol L, \boldsymbol D)\) are model parameters providing the decomposition of the precision (inverse-covariance) matrix \(\boldsymbol{\Omega} = \boldsymbol{L}\boldsymbol{D}^{-1}\boldsymbol{L}^\top\); specifically, \(\boldsymbol{L}\) is a \((q, q)\) matrix of coefficients such that for each \((u, v)\)-element \(\boldsymbol{L}_{uv}\) with \(u \ne v\), \(\boldsymbol{L}_{uv} \ne 0\) if and only if \((u, v) \in E\), while \(\boldsymbol{L}_{uu} = 1\) for each \(u = 1,\dots, q\); also, \(\boldsymbol{D}\) is a \((q, q)\) diagonal matrix with \((u, u)\)-element \(\boldsymbol{D}_{uu}\).
The latter decomposition follows from the equivalent Structural Equation Model (SEM) representation of a Gaussian DAG-model; see also Castelletti & Mascaro (2021).
Conditionally to \(\mathcal{D}\), a prior to \((\boldsymbol{L}, \boldsymbol{D})\) is assigned through a compatible DAG-Wishart distribution with rate hyperparameter \(\boldsymbol{U}\), a \((q,q)\) s.p.d. matrix, and shape hyperparameter \(\boldsymbol{a}^{\mathcal {D}}_{c}\), a \((q,1)\) vector; see also Cao et al. (2019) and Peluso & Consonni (2020).
Finally, a prior on DAG \(\mathcal{D}\) is assigned through a Binomial distribution on the number of edges in the graph; in \(p(\mathcal{D})\), \(w \in (0,1)\) is a prior probability of edge inclusion, while \(|\mathcal{S_{\mathcal{D}}}|\) denotes the number of edges in \(\mathcal{D}\); see again Castelletti & Mascaro (2021) for further details.
Target of the MCMC scheme is therefore the joint posterior of \((\boldsymbol{L},\boldsymbol{D},\mathcal{D})\), \[\begin{equation} p(\boldsymbol L, \boldsymbol D, \mathcal{D}\,|\, \boldsymbol X) \propto p(\boldsymbol{X}\,|\,\boldsymbol L, \boldsymbol D, \mathcal{D})p(\boldsymbol{L},\boldsymbol{D}\,|\,\mathcal{D}) \,p(\mathcal{D}), \end{equation}\] where \(\boldsymbol{X}\) denotes a \((n,q)\) data matrix as obtained through \(n\) i.i.d. draws from the Gaussian DAG-model and \(p(\boldsymbol{X}\,|\,\boldsymbol L, \boldsymbol D, \mathcal{D})\) is the likelihood function. See also Castelletti & Mascaro (2022+) for full details.
We first randomly generate a DAG \(\mathcal{D}\), the DAG parameters \((\boldsymbol{L},\boldsymbol{D})\) and then \(n=1000\) i.i.d. observations from a Gaussian DAG-model as follows:
set.seed(1)
q <- 8
w <- 0.2
DAG <- rDAG(q,w)
a <- q
U <- diag(1,q)
outDL <- rDAGWishart(n=1, DAG, a, U)
L <- outDL$L; D <- outDL$D
Omega <- L %*% solve(D) %*% t(L)
Sigma <- solve(Omega)
n <- 1000
X <- mvtnorm::rmvnorm(n = n, sigma = Sigma)See also our vignette about data generation from Gaussian DAG-models.
learn_DAG()Function learn_DAG() implements an MCMC algorithm to sample from the joint posterior of DAGs and DAG parameters. This is based on a Partial Analytic Structure (PAS) algorithm (Godsill, 2012) which, at each iteration:
fast = TRUE);We implement it as follows:
out <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = FALSE, save.memory = FALSE, collapse = FALSE)Inputs of learn_DAG() correspond to three different sets of arguments:
S, burn and data are standard inputs required by any MCMC algorithm. In particular, S defines the desired length of the chain, which is obtained by discarding the first burn observations (the total number of sampled observations is therefore S + burn); data is instead the \((n,q)\) matrix \(\boldsymbol{X}\);a, U and w are hyperparameters of the priors on DAGs (w) and DAG parameters (a, U); see also Equation [REF]. The same appear in functions rDAG() and rDAGWishart() which were introduced in our vignette [ADD REF TO THE VIGNETTE].fast, save.memory and collapse are boolean arguments which allow to: implement an approximate proposal distribution within the MCMC scheme (fast = TRUE); change the array structure of the stored MCMC output into strings in order to save memory (save.memory = TRUE); implement an MCMC for DAG structure learning only, without sampling from the posterior of parameters (collapse = TRUE). See also A note on fast = TRUE and Castelletti & Mascaro (2022+) for full details.The output of learn_DAG() is an object of class bcdag:
class(out)
#> [1] "bcdag"bcdag objects include the output of the MCMC algorithm together with a collection of meta-data representing the input arguments of learn_DAG(); these are stored in the attributes of the object::
str(out)
#> List of 3
#> $ Graphs: num [1:8, 1:8, 1:5000] 0 0 0 0 1 0 0 1 0 0 ...
#> $ L : num [1:8, 1:8, 1:5000] 1 0 0 0 -0.0194 ...
#> $ D : num [1:8, 1:8, 1:5000] 0.164 0 0 0 0 ...
#> - attr(*, "class")= chr "bcdag"
#> - attr(*, "type")= chr "complete"
#> - attr(*, "input")=List of 10
#> ..$ S : num 5000
#> ..$ burn : num 1000
#> ..$ data : num [1:1000, 1:8] -0.299 -0.351 -0.631 0.219 -0.351 ...
#> ..$ a : num 8
#> ..$ U : num [1:8, 1:8] 1 0 0 0 0 0 0 0 0 1 ...
#> ..$ w : num 0.2
#> ..$ fast : logi FALSE
#> ..$ save.memory: logi FALSE
#> ..$ collapse : logi FALSE
#> ..$ verbose : logi TRUEAttribute type refers to the output of learn_DAG(), whose structure depends on the choice of boolean arguments save.memory and collapse.
When both are set equal to FALSE, as in the previous example, the output of learn_DAG() is a complete bcdag object, collecting three \((q,q,S)\) arrays with the DAG structures (in the form of \(q \times q\) adjacency matrices) and the DAG parameters \(\boldsymbol{L}\) and \(\boldsymbol{D}\) (both as \(q \times q\) matrices) sampled by the MCMC:
out$Graphs[,,1]
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 1 0 0
#> [3,] 0 0 0 0 0 0 0 1
#> [4,] 0 0 0 0 0 0 0 0
#> [5,] 1 0 0 0 0 0 0 0
#> [6,] 0 0 0 0 0 0 0 0
#> [7,] 0 1 0 1 0 0 0 0
#> [8,] 1 0 0 0 0 0 0 0
round(out$L[,,1],2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1.00 0.00 0 0.00 0 0.00 0 0.00
#> [2,] 0.00 1.00 0 0.00 0 0.07 0 0.00
#> [3,] 0.00 0.00 1 0.00 0 0.00 0 0.94
#> [4,] 0.00 0.00 0 1.00 0 0.00 0 0.00
#> [5,] -0.02 0.00 0 0.00 1 0.00 0 0.00
#> [6,] 0.00 0.00 0 0.00 0 1.00 0 0.00
#> [7,] 0.00 -0.02 0 -1.97 0 0.00 1 0.00
#> [8,] 0.40 0.00 0 0.00 0 0.00 0 1.00
round(out$D[,,1],2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0.16 0.00 0.00 0.00 0.00 0.0 0.00 0.00
#> [2,] 0.00 0.57 0.00 0.00 0.00 0.0 0.00 0.00
#> [3,] 0.00 0.00 0.74 0.00 0.00 0.0 0.00 0.00
#> [4,] 0.00 0.00 0.00 0.94 0.00 0.0 0.00 0.00
#> [5,] 0.00 0.00 0.00 0.00 77.94 0.0 0.00 0.00
#> [6,] 0.00 0.00 0.00 0.00 0.00 0.8 0.00 0.00
#> [7,] 0.00 0.00 0.00 0.00 0.00 0.0 0.31 0.00
#> [8,] 0.00 0.00 0.00 0.00 0.00 0.0 0.00 0.96When collapse = TRUE and save.memory = FALSE the output of learn_DAG() is a collapsed bcdag object, consisting of a \((q,q,S)\) array with the adjacency matrices of the DAGs sampled by the MCMC:
collapsed_out <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = FALSE, save.memory = FALSE, collapse = TRUE)names(collapsed_out)
#> [1] "Graphs"
class(collapsed_out)
#> [1] "bcdag"
attributes(collapsed_out)$type
#> [1] "collapsed"
collapsed_out$Graphs[,,1]
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 0 0 0
#> [3,] 0 0 0 0 0 0 0 0
#> [4,] 0 0 0 0 0 0 1 0
#> [5,] 1 0 0 0 0 0 0 0
#> [6,] 0 0 0 0 0 0 0 0
#> [7,] 0 0 0 0 0 0 0 1
#> [8,] 1 0 1 0 0 0 0 0When save.memory = TRUE and collapse = FALSE, the output is a compressed bcdag object, collecting samples from the joint posterior on DAGs and DAG parameters in the form of a vector of strings:
compressed_out <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = FALSE, save.memory = TRUE, collapse = FALSE)names(compressed_out)
#> [1] "Graphs" "L" "D"
class(compressed_out)
#> [1] "bcdag"
attributes(compressed_out)$type
#> [1] "compressed"In such a case, we can access to the MCMC draws as:
compressed_out$Graphs[1]
#> [1] "0;0;0;0;1;0;0;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;1;1;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0;0"
compressed_out$L[1]
#> [1] "1;0;0;0;-0.0207148822282879;0;0;0.389080449183578;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0.464492744238912;0;0;0;1;0;0;-1.76444008583527;-0.020197409100651;0;0;0;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;1;0;0;0;0;0;0;0;0;1"
compressed_out$D[1]
#> [1] "0.154822760388322;0;0;0;0;0;0;0;0;0.544421200660809;0;0;0;0;0;0;0;0;0.395261505201729;0;0;0;0;0;0;0;0;1.04755025995276;0;0;0;0;0;0;0;0;67.9926231564594;0;0;0;0;0;0;0;0;0.831649133274509;0;0;0;0;0;0;0;0;0.303043124888535;0;0;0;0;0;0;0;0;1.76018923453861"In addition, we implement bd_decode, an internal function that can be used to visualize the previous objects as matrices:
bd_decode(compressed_out$Graphs[1])
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 0 0 0
#> [3,] 0 0 0 0 0 0 0 0
#> [4,] 0 0 0 0 0 0 0 0
#> [5,] 1 0 0 0 0 0 0 0
#> [6,] 0 0 0 0 0 0 0 0
#> [7,] 0 0 0 1 0 0 0 0
#> [8,] 1 0 1 1 0 0 0 0
round(bd_decode(compressed_out$L[1]),2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 1.00 0 0.00 0.00 0 0 0 0
#> [2,] 0.00 1 0.00 0.00 0 0 0 0
#> [3,] 0.00 0 1.00 0.00 0 0 0 0
#> [4,] 0.00 0 0.00 1.00 0 0 0 0
#> [5,] -0.02 0 0.00 0.00 1 0 0 0
#> [6,] 0.00 0 0.00 0.00 0 1 0 0
#> [7,] 0.00 0 0.00 -1.76 0 0 1 0
#> [8,] 0.39 0 0.46 -0.02 0 0 0 1
round(bd_decode(compressed_out$D[1]),2)
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0.15 0.00 0.0 0.00 0.00 0.00 0.0 0.00
#> [2,] 0.00 0.54 0.0 0.00 0.00 0.00 0.0 0.00
#> [3,] 0.00 0.00 0.4 0.00 0.00 0.00 0.0 0.00
#> [4,] 0.00 0.00 0.0 1.05 0.00 0.00 0.0 0.00
#> [5,] 0.00 0.00 0.0 0.00 67.99 0.00 0.0 0.00
#> [6,] 0.00 0.00 0.0 0.00 0.00 0.83 0.0 0.00
#> [7,] 0.00 0.00 0.0 0.00 0.00 0.00 0.3 0.00
#> [8,] 0.00 0.00 0.0 0.00 0.00 0.00 0.0 1.76Finally, if save.memory = TRUE and collapse = TRUE, the output of learn_DAG() is a compressed and collapsed bcdag object collecting only the sampled DAGs represented as vector of strings:
comprcoll_out <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = FALSE, save.memory = TRUE, collapse = TRUE)names(comprcoll_out)
#> [1] "Graphs"
class(comprcoll_out)
#> [1] "bcdag"
attributes(comprcoll_out)$type
#> [1] "compressed and collapsed"
bd_decode(comprcoll_out$Graphs[1])
#> [,1] [,2] [,3] [,4] [,5] [,6] [,7] [,8]
#> [1,] 0 0 0 0 0 0 0 0
#> [2,] 0 0 0 0 0 1 0 0
#> [3,] 0 0 0 0 0 0 0 0
#> [4,] 1 0 0 0 0 0 0 0
#> [5,] 1 0 0 0 0 0 0 0
#> [6,] 0 0 0 0 0 0 0 0
#> [7,] 0 0 0 1 0 0 0 0
#> [8,] 1 0 1 0 0 0 0 0fast = TRUEStep 1. of the MCMC scheme implemented by learn_DAG() updates DAG \(\mathcal{D}\) by randomly drawing a new candidate DAG \(\mathcal{D}'\) from a proposal distribution and then accepting it with probability given by the Metropolis Hastings (MH) acceptance rate; see also Castelletti & Mascaro (2021). For a given DAG \(\mathcal{D}\), the proposal distribution \(q(\mathcal{D}'\,|\,\mathcal{D})\) is built over the set \(\mathcal{O}_{\mathcal{D}}\) of direct successors DAGs that can be reached from \(\mathcal{D}\) by inserting, deleting or reversing a single edge in \(\mathcal{D}\). A DAG \(\mathcal{D}'\) is then proposed uniformly from the set \(\mathcal{O}_{\mathcal{D}}\) so that \(q(\mathcal{D}'\,|\,\mathcal{D})=1/|\mathcal{O}_{\mathcal{D}}|\). Moreover, the MH rate requires to evaluate the ratio of proposals \(q(\mathcal{D}'\,|\,\mathcal{D})/q(\mathcal{D}\,|\,\mathcal{D}') = |\mathcal{O}_{\mathcal{D}'}|/|\mathcal{O}_{\mathcal{D}}|\), and accordingly the construction of both \(\mathcal{O}_{\mathcal{D}}\) and \(\mathcal{O}_{\mathcal{D}'}\).
If fast = FALSE, the proposal ratio is computed exactly; this requires the enumerations of \(\mathcal{O}_\mathcal{D}\) and \(\mathcal{O}_{\mathcal{D}'}\) which may become computationally expensive, especially when \(q\) is large. However, the ratio approaches \(1\) as the number of variables \(q\) increases: option fast = TRUE implements such an approximation, which therefore avoids the construction of \(\mathcal{O}_\mathcal{D}\) and \(\mathcal{O}_{\mathcal{D}'}\). A comparison between fast = FALSE and fast = TRUE in the execution of learn_DAG() produces the following results in terms of computational time:
# No approximation
time_nofast <- system.time(out_nofast <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = FALSE, save.memory = FALSE, collapse = FALSE))
# Approximation
time_fast <- system.time(out_fast <- learn_DAG(S = 5000, burn = 1000, data = X,
a, U, w,
fast = TRUE, save.memory = FALSE, collapse = FALSE))time_nofast
#> utente sistema trascorso
#> 16.44 0.00 16.44
time_fast
#> utente sistema trascorso
#> 7.34 0.01 7.36Finally, the corresponding estimated posterior probabilities of edge inclusion are the following:
round(get_edgeprobs(out_nofast), 2)
#> 1 2 3 4 5 6 7 8
#> 1 0.00 0.06 0.07 0.00 0 0.00 0.00 0.00
#> 2 0.05 0.00 0.02 0.03 0 0.21 0.04 0.00
#> 3 0.09 0.02 0.00 0.00 0 0.01 0.02 0.52
#> 4 0.00 0.06 0.00 0.00 0 0.01 0.52 0.00
#> 5 1.00 0.00 0.00 0.00 0 0.00 0.01 0.00
#> 6 0.03 0.26 0.00 0.00 0 0.00 0.02 0.00
#> 7 0.03 0.01 0.00 0.48 0 0.03 0.00 0.02
#> 8 1.00 0.00 0.48 0.01 0 0.00 0.03 0.00
round(get_edgeprobs(out_fast), 2)
#> 1 2 3 4 5 6 7 8
#> 1 0.00 0.04 0.01 0.02 0.00 0.04 0.01 0.00
#> 2 0.05 0.00 0.00 0.03 0.00 0.20 0.01 0.00
#> 3 0.05 0.00 0.00 0.00 0.01 0.01 0.06 0.55
#> 4 0.02 0.03 0.00 0.00 0.00 0.02 0.55 0.01
#> 5 1.00 0.00 0.01 0.00 0.00 0.00 0.01 0.00
#> 6 0.06 0.24 0.01 0.00 0.01 0.00 0.03 0.01
#> 7 0.00 0.02 0.03 0.45 0.00 0.01 0.00 0.00
#> 8 1.00 0.00 0.45 0.02 0.00 0.04 0.00 0.00Ben-David E, Li T, Massam H, Rajaratnam B (2015). “High dimensional Bayesian inference for Gaussian directed acyclic graph models.” arXiv pre-print.
Cao X, Khare K, Ghosh M (2019). “Posterior graph selection and estimation consistency for high-dimensional Bayesian DAG models.” The Annals of Statistics, 47(1), 319–348.
Castelletti F, Consonni G (2021). “Bayesian causal inference in probit graphical models” Bayesian Analysis, In press.
Castelletti F, Mascaro A (2021). “Structural learning and estimation of joint causal effects among network-dependent variables.” Statistical Methods & Applications, 30, 1289–1314.
Castelletti F, Mascaro A (2022). “BCDAG: An R package for Bayesian structural and Causal learning of Gaussian DAGs.” arXiv pre-print.
Godsill, SJ (2012). “On the relationship between Markov chain Monte Carlo methods for model uncertainty.” Journal of computational and graphical statistics, 10(2), 230-248.
Peluso S, Consonni G (2020). “Compatible priors for model selection of high-dimensional Gaussian DAGs.” Electronic Journal of Statistics, 14(2), 4110–4132.