Bayesian Networks are directed acyclic graphs used to model beliefs about the world. Abstractly, they represent the chance that a certain effect occurs given a set of possible causes, or the chance that a cause or causes led to an observed outcome. A Bayesian Network comprises variables connected through conditional dependence, which can be thought of as causal relationships. Each variable has a domain of possible values and a conditional probability distribution over those values given the values of its parents. From this model, one can infer the posterior probability that a set of query variables takes on specific values, with or without a set of evidence (i.e., known values for specific variables).
Gibbs Sampling
When the Bayesian network is too complex (i.e., contains many variables and edges), exact inference methods such as Variable Elimination can become computationally inefficient. As an alternative, we will implement a Monte Carlo approach that combines two methods: Gibbs Sampling (GS) and Forward Sampling (FS).
Gibbs Sampling can be described as follows: given the Markov Blanket of a random variable \(X\) (i.e., \(MB(X) = Pa(X) \cup Ch(X) \cup Pa(Ch(X)) \setminus \{X\}\)), \(X\) is conditionally independent of the remaining variables of the network and can be sampled using only the values of its blanket. Therefore, Algorithm 1 describes the general GS process used to estimate the marginal probability distribution of each variable of the network based on the historic state of the network \(\textbf{x}\).
In each iteration, we shuffle the order of the variables and sample a value for each non-evidence variable from the probability distribution obtained by looking at the last observed values of its Markov Blanket.
The \(\texttt{domain(X)}\) function retrieves a list of the possible values that the variable \(X\) can take.
The \(\texttt{normalize(v)}\) function normalizes a probability distribution (i.e., a vector), such that the sum of its elements is 1.
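The loop just described can be sketched as follows. This is a minimal, generic sketch of Algorithm 1, not the post's actual implementation: the network is abstracted behind a \(\texttt{sample_given_blanket}\) callback (a hypothetical stand-in for the Markov-Blanket sampler), and \(\texttt{normalize}\) is implemented exactly as described above:

```python
import random

def normalize(v):
    """Scale a vector of nonnegative weights so its elements sum to 1."""
    total = sum(v)
    return [w / total for w in v]

def gibbs_sampling(variables, domains, sample_given_blanket, x0,
                   iterations=1000, rng=random):
    """Sketch of Algorithm 1: estimate each variable's marginal distribution
    by repeatedly resampling it conditioned on its Markov Blanket.
    `sample_given_blanket(var, x)` is a hypothetical stand-in for the
    blanket-conditional sampler; `x0` is the initial state of the network
    (obtained via Forward Sampling in the post)."""
    x = dict(x0)
    counts = {v: [0] * len(domains[v]) for v in variables}
    for _ in range(iterations):
        order = list(variables)
        rng.shuffle(order)                                # shuffle variable order
        for var in order:
            x[var] = sample_given_blanket(var, x)         # resample from P(X | MB(X))
            counts[var][domains[var].index(x[var])] += 1  # tally the drawn value
    return {v: normalize(counts[v]) for v in variables}
```

Each variable's estimated marginal is simply its normalized tally of sampled values across iterations.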
Note that Algorithm 1 calls the function \(\texttt{forwardSample}(z)\) (line 4) to obtain a sampled value for variable \(z\) using Forward Sampling. This method follows the previous definition that a variable \(X\) can be sampled based on the “given” (i.e., currently instantiated) variables of its Markov Blanket. Hence, FS first sorts the set of non-evidence variables in topological order using the \(\texttt{topologicalOrder()}\) function; that is, for every directed edge \(x \rightarrow y\) of the network, \(x\) has to appear before \(y\) in the ordering.
Algorithm 2 shows the FS implementation. For every variable in the topological ordering, its parents’ values have already been instantiated, according to the previous state of the network \(\textbf{x}\). Therefore, we sample a value for the \(i\)-th non-evidence variable from the probability distribution \(P(Z[i] \,|\, \textbf{x}[Pa(Z[i])])\).
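A minimal sketch of Algorithm 2, assuming a toy network in which each variable maps to its domain, parent names, and a CPT stored as a dict keyed by parent-value tuples (the \(\texttt{network}\), \(\texttt{Rain}\), and \(\texttt{WetGrass}\) names are illustrative, not part of the post's code):

```python
import random

# Hypothetical toy network: name -> (domain, parent names, CPT), where the
# CPT maps a tuple of parent values to a distribution over the domain.
network = {
    "Rain":     (["yes", "no"], [], {(): [0.2, 0.8]}),
    "WetGrass": (["yes", "no"], ["Rain"], {("yes",): [0.9, 0.1],
                                           ("no",):  [0.1, 0.9]}),
}

def forward_sample(network, order, rng=random):
    """Algorithm 2 sketch: walk the variables in topological order and draw
    each one from P(Z[i] | parents), whose values are already instantiated."""
    x = {}
    for name in order:
        domain, parents, cpt = network[name]
        dist = cpt[tuple(x[p] for p in parents)]   # parents sampled earlier
        x[name] = rng.choices(domain, weights=dist)[0]
    return x

state = forward_sample(network, ["Rain", "WetGrass"])
```

The topological order guarantees the lookup `x[p]` never misses: a parent is always sampled before any of its children.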
Python Implementation
We create the \(\texttt{GibbsSampling}\) class that will be used to instantiate an object that will store some instance variables, such as \(\texttt{self.bn}\) and \(\texttt{self.query}\), and define the methods (both public and private) that will be used for the sampling process. We start with the initialization method and the main method of the class, \(\texttt{solve}\).
Note that the \(\texttt{solve}\) method takes two arguments: the graph used by our Bayesian Network (\(\texttt{bn}\)) and the set of query variables (\(\texttt{query}\)). For the sake of brevity, we are not providing the implementation code of the BN graph. However, \(\texttt{bn}\) is a graph that contains a set of interconnected nodes; each node is an object that represents a variable and has some important properties, such as \(\texttt{domain}\), which specifies the possible values the variable can take; \(\texttt{parents}\), the list of parent nodes; \(\texttt{children}\), the list of child nodes; and \(\texttt{probabilityDist}\), which is a matrix that specifies the variable's conditional probability table (see https://www.bnlearn.com/bnrepository/ for more information).
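Since the BN graph implementation is not shown, here is a hypothetical minimal stand-in for such a node. The attribute names mirror those listed above, but for brevity the CPT is stored as a dict keyed by parent-value tuples rather than a matrix:

```python
class Node:
    """Hypothetical stand-in for a BN graph node; the real graph class
    (and its matrix-based probabilityDist) is not shown in the post."""
    def __init__(self, name, domain, probabilityDist):
        self.name = name
        self.domain = domain              # possible values of the variable
        self.parents = []                 # parent Node objects
        self.children = []                # child Node objects
        # dict mapping a tuple of parent values to a distribution over domain
        self.probabilityDist = probabilityDist

    def add_parent(self, parent):
        """Wire the edge parent -> self in both directions."""
        self.parents.append(parent)
        parent.children.append(self)

# illustrative two-node network: Rain -> WetGrass
rain = Node("Rain", ["yes", "no"], {(): [0.2, 0.8]})
wet = Node("WetGrass", ["yes", "no"], {("yes",): [0.9, 0.1],
                                       ("no",):  [0.1, 0.9]})
wet.add_parent(rain)
```

Keeping both \(\texttt{parents}\) and \(\texttt{children}\) lists on every node makes Markov-Blanket lookups cheap later on.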
Note that within the \(\texttt{solve}\) function, the first step is to sort the variables of the graph in topological order. That is, we obtain a variable ordering such that every child appears after its parents. In order to do so, we use the function \(\texttt{topologicalSort}\) which uses a recursive helper function \(\texttt{topologicalSortHelper}\):
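A possible sketch of this pair of functions, assuming each node exposes a \(\texttt{parents}\) list as described above (the \(\texttt{SimpleNamespace}\) nodes are only illustrative):

```python
from types import SimpleNamespace

def topologicalSort(nodes):
    """Return the nodes in topological order (every parent before its children)."""
    visited, order = set(), []

    def topologicalSortHelper(node):
        visited.add(id(node))
        for parent in node.parents:
            if id(parent) not in visited:
                topologicalSortHelper(parent)
        order.append(node)        # appended only after all its ancestors

    for node in nodes:
        if id(node) not in visited:
            topologicalSortHelper(node)
    return order

# illustrative two-node graph: A -> B
a = SimpleNamespace(name="A", parents=[])
b = SimpleNamespace(name="B", parents=[a])
```

Calling `topologicalSort([b, a])` returns the nodes in the order `A, B`, since the helper recurses into parents before appending the node itself.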
Now, let’s implement the \(\texttt{forwardSampling}\) method, which is the Forward Sampling method defined in Algorithm 2 above that is used to sample the initial values of each non-evidence variable based on the values of its parents:
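A sketch of what this method could look like, written here as a standalone function over hypothetical node objects carrying the \(\texttt{domain}\), \(\texttt{parents}\), and \(\texttt{probabilityDist}\) attributes described earlier (with the CPT stored as a dict keyed by parent-value tuples rather than a matrix):

```python
import random
from types import SimpleNamespace

def forwardSampling(ordered_nodes, rng=random):
    """Draw an initial value for each node from P(node | parents); because
    the nodes arrive in topological order, every parent's value is already
    in `x` when its child is reached."""
    x = {}
    for node in ordered_nodes:
        parent_values = tuple(x[p.name] for p in node.parents)
        dist = node.probabilityDist[parent_values]      # P(node | parents)
        x[node.name] = rng.choices(node.domain, weights=dist)[0]
    return x

# illustrative two-node network: Rain -> WetGrass
rain = SimpleNamespace(name="Rain", domain=["yes", "no"], parents=[],
                       probabilityDist={(): [0.2, 0.8]})
wet = SimpleNamespace(name="WetGrass", domain=["yes", "no"], parents=[rain],
                      probabilityDist={("yes",): [0.9, 0.1],
                                       ("no",):  [0.1, 0.9]})
initial_state = forwardSampling([rain, wet])
```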
Finally, let’s implement the \(\texttt{sample_markov_blanket}\) method used in Algorithm 1, which samples the value of a non-evidence variable based on its Markov Blanket and the values observed in previous iterations of the algorithm, saved in \(\texttt{self.x}\). Note that the first part of the method is similar to the \(\texttt{forwardSampling}\) method; however, in the second part of the algorithm, the distribution of the variable given its parents is reweighted by the observed values of its children given their parents.
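A sketch of this method, again as a standalone function over the same hypothetical node objects (dict-based CPTs; \(\texttt{Rain}\) and \(\texttt{WetGrass}\) are illustrative): the first part reads \(P(X \,|\, Pa(X))\) exactly as in Forward Sampling, and the second part multiplies in \(P(c \,|\, Pa(c))\) for every child \(c\) before normalizing:

```python
import random
from types import SimpleNamespace

def sample_markov_blanket(node, x, rng=random):
    """Resample `node` given the last observed state `x` of its Markov
    Blanket: start from P(node | parents), reweight each candidate value by
    P(child | parents of child) for every child, then normalize and draw."""
    parent_values = tuple(x[p.name] for p in node.parents)
    weights = list(node.probabilityDist[parent_values])   # P(X | Pa(X))
    for i, value in enumerate(node.domain):
        hypothetical = {**x, node.name: value}            # pretend X = value
        for child in node.children:
            cp = tuple(hypothetical[p.name] for p in child.parents)
            j = child.domain.index(x[child.name])         # child's observed value
            weights[i] *= child.probabilityDist[cp][j]
    total = sum(weights)
    return rng.choices(node.domain, weights=[w / total for w in weights])[0]

# illustrative network: Rain -> WetGrass
rain = SimpleNamespace(name="Rain", domain=["yes", "no"], parents=[],
                       children=[], probabilityDist={(): [0.2, 0.8]})
wet = SimpleNamespace(name="WetGrass", domain=["yes", "no"], parents=[rain],
                      children=[], probabilityDist={("yes",): [0.9, 0.1],
                                                    ("no",):  [0.1, 0.9]})
rain.children.append(wet)
```

With \(\texttt{WetGrass}\) observed as "yes", the reweighting favors "yes" for \(\texttt{Rain}\): the unnormalized weights are \(0.2 \times 0.9 = 0.18\) versus \(0.8 \times 0.1 = 0.08\), giving a posterior of about 0.69, as Bayes' rule dictates.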
Thanks to Dalton Gomez for providing the Bayesian Networks definition at the beginning of this post.