Interpretable Deep Neural Networks Part 5: Logic Explanations

This is the last post of our review on interpretable deep neural networks by design. The previously discussed concept-based methods focused on ranking the most relevant basis concepts so as to provide a reduced subset of concepts as interpretations. Barbiero et al. [1] addressed the problem that these approaches fail to provide a concise, formal account of how the top-ranked concepts are leveraged by the model to make predictions. To that end, they proposed an entropy-based criterion to generate logical explanations from neural networks.

Methodology

Consider the network \(g: \, \mathcal{X} \rightarrow \mathcal{C}\) that maps the input data into the concept space \(\mathcal{C}\) (i.e., an encoded space of latent representations). The paper is set in a classification context: for each class \(i\) of the problem, an independent entropy-based layer \(f^i\) is used. The outcomes of each layer are a set of embeddings \(h^i\) and a truth table \(\mathcal{T}^i\) that is used to explain how the network leveraged the concepts to make predictions for the \(i\)-th class.

The entropy-based layer \(f^i\) is a linear layer with a matrix of learnable parameters \(W^i\) and bias vector \(b^i\). Similar to FLINT, the relevance score of concept \(j\) for class \(i\) is defined as \(\gamma^i_j = \lVert W^i_j \rVert\), where \(W^i_j\) represents the vector of weights departing from the \(j\)-th input of \(f^i\). The relative importance of each concept is then summarized in the categorical distribution \(\boldsymbol\alpha^i =\{ \alpha^i_1, \dots, \alpha^i_C \}\) (\(\sum^C_{j=1} \alpha^i_j =1\) and \(\alpha^i_j \in [0, 1]\)), which is modeled by the softmax function: \(\alpha^i_j = \frac{e^{\gamma^i_j / \tau}}{\sum_{l=1}^C e^{\gamma^i_l / \tau}},\) where \(\tau \in \mathbb{R}^+\) is a temperature parameter used to tune the sharpness of the softmax. Then \(\boldsymbol\alpha^i\) is re-scaled to \(\tilde{\boldsymbol\alpha}^i=\{ \tilde{\alpha}^i_1, \dots, \tilde{\alpha}^i_C \}\) so that its maximum value is 1 and its minimum value is 0.
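As a rough NumPy sketch of just this attention computation (the weight shape, the temperature value, and the small constant added for numerical stability are our own assumptions, not taken from the paper):

```python
import numpy as np

def concept_attention(W_i, tau=1.0):
    """Sketch of the per-class concept attention: gamma -> alpha -> alpha_tilde."""
    # gamma_j = ||W^i_j||, the norm of the weights departing from concept j
    gamma = np.linalg.norm(W_i, axis=0)                    # shape: (num_concepts,)

    # temperature-scaled softmax -> categorical distribution alpha^i
    scores = gamma / tau
    scores -= scores.max()                                 # numerical stability
    alpha = np.exp(scores) / np.exp(scores).sum()

    # min-max re-scaling so that max(alpha_tilde) = 1 and min(alpha_tilde) = 0
    alpha_tilde = (alpha - alpha.min()) / (alpha.max() - alpha.min() + 1e-12)
    return alpha, alpha_tilde

# toy usage: 4 concepts feeding a layer with 3 outputs
rng = np.random.default_rng(0)
W_i = rng.normal(size=(3, 4))                              # rows: outputs, columns: concepts
alpha, alpha_tilde = concept_attention(W_i, tau=0.5)
```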

The inputs of the layer \(f^i\), \(\textbf{c}\), are then weighted by the estimated normalized importance \(\tilde{\alpha}^i_j\) using the element-wise multiplication: \(\tilde{\textbf{c}}^i =\textbf{c} \odot \tilde{\boldsymbol\alpha}^i\).
This is used to compute the embeddings \(h^i\) as: \(h^i = W^i \tilde{\textbf{c}}^i + b^i.\) Although not stated explicitly in the paper, \(h^i\) is the prediction output of the network, which is later referred to as \(f^i(\textbf{c})\). The loss function used to learn the parameters of \(f^i\) is: \(\mathcal{L} = L(f, y) + \lambda \sum_{i=1}^r H(\boldsymbol\alpha^i),\) where \(L(f, y)\) is a standard supervised loss (e.g., cross-entropy), \(H(\cdot)\) is the entropy function applied to each of the \(r\) per-class distributions \(\boldsymbol\alpha^i\), and \(\lambda\) is a hyperparameter that balances the relative importance of low-entropy solutions.
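A minimal PyTorch-style sketch of one such layer and its loss may help fix ideas; the class name, the single-output head, and the values chosen for \(\tau\), \(\lambda\), and the small stabilizing constant are our own assumptions rather than the authors' implementation:

```python
import torch
import torch.nn as nn
import torch.nn.functional as F

class EntropyLinear(nn.Module):
    """Sketch of one entropy-based layer f^i (single class i)."""

    def __init__(self, n_concepts, n_hidden, tau=1.0):
        super().__init__()
        self.lin = nn.Linear(n_concepts, n_hidden)          # holds W^i and b^i
        self.tau = tau

    def alphas(self):
        # gamma_j = ||W^i_j||, the norm of the weights departing from concept j
        gamma = self.lin.weight.norm(dim=0)                 # (n_concepts,)
        alpha = F.softmax(gamma / self.tau, dim=0)          # categorical distribution alpha^i
        alpha_tilde = (alpha - alpha.min()) / (alpha.max() - alpha.min() + 1e-12)
        return alpha, alpha_tilde

    def forward(self, c):
        _, alpha_tilde = self.alphas()
        c_tilde = c * alpha_tilde                           # element-wise re-weighting of the concepts
        return self.lin(c_tilde)                            # h^i = W^i c_tilde^i + b^i

def entropy(alpha):
    # H(alpha^i) = -sum_j alpha^i_j log alpha^i_j
    return -(alpha * torch.log(alpha + 1e-12)).sum()

# toy usage: one class head with a cross-entropy term plus the entropy regularizer
layer = EntropyLinear(n_concepts=8, n_hidden=1, tau=0.7)
c = torch.rand(16, 8)                                       # batch of concept activations
y = torch.randint(0, 2, (16,)).float()                      # binary labels for class i
logits = layer(c).squeeze(-1)
alpha, _ = layer.alphas()
loss = F.binary_cross_entropy_with_logits(logits, y) + 0.1 * entropy(alpha)  # lambda = 0.1
loss.backward()
```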

Fig. 1. Detailed view of the entropy-based layer corresponding to class "1" [1].

Furthermore, the truth table \(\mathcal{T}^i\) is generated by \(f^i\) to represent its behavior in terms of Boolean-like representations of the input concepts. To do this, a Boolean representation of the input \(\textbf{c}\) is obtained using a threshold \(\epsilon\): \(\bar{\textbf{c}} = \mathbb{I}_{\textbf{c} \geq \epsilon}\). Also, a mask \(\boldsymbol\mu^i\) is used to indicate the most relevant concepts: \(\boldsymbol\mu^i = \mathbb{I}_{\tilde{\boldsymbol\alpha}^i \geq \epsilon}\). Then, the binary concept tuple \(\hat{\textbf{c}}^i\) consists of the components of \(\bar{\textbf{c}}\) that correspond to 1’s in \(\boldsymbol\mu^i\). Given a dataset of generated concepts \(\textbf{C}\), the truth table \(\mathcal{T}^i\) is obtained by stacking, for each sample in \(\textbf{C}\), the vector \(\hat{\textbf{c}}^i\) together with \(\bar{f}^i(\textbf{c})\), the binary representation of the network output \(f^i(\textbf{c})\) (\(\bar{f}^i(\textbf{c})=\mathbb{I}_{f^i(\textbf{c}) > \epsilon}\)).
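A small sketch of this construction (the function and variable names are ours; the default threshold of 0.5 follows the default value of \(\epsilon\) discussed later in this post):

```python
import numpy as np

def truth_table(C, f_out, alpha_tilde, eps=0.5):
    """Sketch: build T^i from a dataset of concept activations for class i.

    C           : (n_samples, n_concepts) concept activations
    f_out       : (n_samples,) outputs f^i(c)
    alpha_tilde : (n_concepts,) re-scaled concept relevances
    """
    c_bar = (C >= eps).astype(int)        # Boolean-like representation of the concepts
    mu = alpha_tilde >= eps               # mask pointing to the most relevant concepts
    c_hat = c_bar[:, mu]                  # keep only the relevant concepts
    f_bar = (f_out > eps).astype(int)     # Boolean-like representation of the predictions
    return np.column_stack([c_hat, f_bar]), mu

# toy usage: 5 samples, 4 concepts, of which two are deemed relevant
rng = np.random.default_rng(0)
T, mu = truth_table(rng.random((5, 4)), rng.random(5), np.array([1.0, 0.1, 0.8, 0.0]))
```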

The truth table \(\mathcal{T}^i\) is transformed into a logic formula in disjunctive normal form (DNF). The \(t\)-th row of \(\mathcal{T}^i\) is converted into a logic formula \(\varphi^i_t\) by connecting with the logical AND (\(\land\)) the concepts that are true and the negations of those that are false. The row-level logic formulas are then combined with the logical OR (\(\lor\)) to provide a class-level formula: \(\bigvee_{t \in \mathcal{T}^i} \varphi^i_t\). Finally, the extended class-level formula can be simplified using existing techniques such as the Quine-McCluskey algorithm [2,3].
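A pure-Python sketch of the row-to-DNF conversion follows (the helper name and string representation are ours; in our reading of the method, only the rows whose binarized prediction \(\bar{f}^i(\textbf{c})\) equals 1 contribute a conjunction to the class-level formula):

```python
def dnf_from_truth_table(T, concept_names):
    """Sketch: combine rows of T^i predicted as class i into a DNF formula."""
    minterms = set()
    for row in T:
        *concepts, prediction = row
        if prediction == 1:                               # rows attributed to class i
            literals = [
                name if value == 1 else f"NOT {name}"     # negate the false concepts
                for name, value in zip(concept_names, concepts)
            ]
            minterms.add("(" + " AND ".join(literals) + ")")
    return " OR ".join(sorted(minterms))

# toy usage: two concepts, three truth-table rows (last column is the prediction)
print(dnf_from_truth_table([[1, 0, 1], [0, 1, 1], [1, 1, 0]], ["c1", "c2"]))
# -> "(NOT c1 AND c2) OR (c1 AND NOT c2)"
```

The resulting formula could then be reduced with an off-the-shelf Boolean minimizer (for instance, sympy's SOPform performs a Quine-McCluskey-style sum-of-products minimization).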

Experimental Results

Experiments considered four datasets: MNIST, Caltech-UCSD Birds 200 (CUB), Multiparameter Intelligent Monitoring in Intensive Care II (MIMIC-II), and Varieties of Democracy (V-Dem). MNIST is used to classify whether the number within a given image is even or odd. To do this, a CNN maps the original image space into a concept space consisting of the ten possible classes corresponding to the digits “0” to “9”. Similarly, another CNN transforms the images of the CUB dataset into a space of 312 binary attributes (e.g., “blackWing” and “redPeak”), which are treated as concepts and then used to classify each image into one of the 200 available bird classes.

The data from MIMIC-II is used to determine whether a patient in the intensive care unit will recover or not. In this case, the binary inputs are themselves considered interpretable concepts, so there is no need for an encoder network \(g\). Finally, the data in V-Dem consist of 482 attributes that are transformed into a concept space of 82 attributes using an entropy-based network. These features are then used to discern electoral democracies from non-electoral ones using another entropy-based network. Fig. 2 shows the type of logical explanations that are expected for these datasets.

Fig. 2. Ideal examples of the logical explanations expected for the four studied cases [1].

Critique

We consider that the main contribution of the work presented by Barbiero et al. is the method used to generate concept-based logical explanations at a sample level. The results presented for three of the tested datasets consisted of short DNF formulas that can be easily interpreted by a human. This process could also be applied to methods that, unlike this one, do not rely on the a priori availability of interpretable concepts, such as SENNs or FLINT. However, this contribution is independent of the formulation of entropy-based networks. For example, FLINT also minimizes the entropy of the “attribute functions” (i.e., the activations of the basis concepts) in order to impose conciseness of the interpretations while simultaneously encouraging diversity as an additional desired property. In that sense, we discuss a couple of limitations of entropy-based networks below.

It is assumed that \(W^i_j\) can be used to estimate the relevance that the \(j\)-th concept has for the classification of the \(i\)-th class. However, \(W^i_j\) is not a direct mapping from the input \(\textbf{c}\) (which represents the encoded concepts in the latent space) to \(h\); it is actually a mapping from the vector \(\tilde{\textbf{c}}^i\) to \(h\). Note that \(\tilde{\textbf{c}}^i\) is obtained by the element-wise multiplication \(\tilde{\textbf{c}}^i = \textbf{c} \odot \tilde{\boldsymbol\alpha}^i\). Now suppose that at some point at the beginning of training (i.e., when the matrix \(W^i\) does not yet contain reliable information) the vector \(\tilde{\boldsymbol\alpha}^i\) prunes away some of the concepts. Since \(\tilde{\boldsymbol\alpha}^i\) depends directly on the weights \(W^i_j\) computed in a previous iteration, some concepts may be removed from further consideration based on the weights of a matrix that had not yet been trained. This raises the question of how much this methodology relies on weight initialization. A possible workaround would be to train the network \(f^i\) until the weights achieve a certain degree of stabilization before starting to calculate the vectors \(\tilde{\boldsymbol\alpha}^i\).
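A hypothetical sketch of that workaround, in which the concept weighting is simply disabled for a number of warm-up epochs (the function name, warm-up length, and constants are our own; this is not part of the original method):

```python
import torch

def concept_weights(lin, epoch, warmup_epochs=10, tau=1.0):
    """Hypothetical warm-up: skip concept pruning until W^i has stabilized."""
    n_concepts = lin.weight.shape[1]
    if epoch < warmup_epochs:
        # behave like a plain linear layer: no concept is down-weighted yet
        return torch.ones(n_concepts)
    gamma = lin.weight.norm(dim=0)                              # ||W^i_j||
    alpha = torch.softmax(gamma / tau, dim=0)
    return (alpha - alpha.min()) / (alpha.max() - alpha.min() + 1e-12)

# usage inside a training loop: c_tilde = c * concept_weights(f_i, epoch)
f_i = torch.nn.Linear(8, 1)
print(concept_weights(f_i, epoch=3))    # all ones during warm-up
print(concept_weights(f_i, epoch=20))   # entropy-based attention afterwards
```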

Another aspect that may require further explanation is the use of the threshold \(\epsilon\). In particular, the variables \(\textbf{c}\), \(\tilde{\boldsymbol\alpha}^i\), and \(f(\textbf{c})\) are converted into Boolean variables by comparison with a threshold \(\epsilon\), whose value is set to 0.5 by default. However, there is no formal justification for subjecting these three variables to the same threshold. For instance, the vector \(\tilde{\boldsymbol\alpha}^i\) is very likely to be extremely sparse, because one of the minimization objectives is to reduce the entropy of the vector \(\boldsymbol\alpha^i\), which is produced by the softmax function and later re-scaled into \(\tilde{\boldsymbol\alpha}^i\). As a consequence, it is reasonable to expect that only one element of such a sparse vector will be greater than 0.5. On the other hand, there is no condition forcing the input vector \(\textbf{c}\) to be as sparse as \(\tilde{\boldsymbol\alpha}^i\). Thus, if both variables are expected to behave differently, there is no reason to binarize them using the same threshold value. The values in \(\textbf{c}\) could simply be scaled between 0 and 1 (as is the case for the MIMIC-II dataset), so binarizing it does not necessarily mean that the most important concepts are being considered. In fact, it seems unnecessary to binarize \(\textbf{c}\) at all, given that \(\tilde{\boldsymbol\alpha}^i\) is meant to be multiplied by \(\textbf{c}\) to obtain \(\tilde{\textbf{c}}^i\) and prune the irrelevant concepts away. In any case, \(\tilde{\textbf{c}}^i\) is the variable that should be binarized to create the truth table.
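A hypothetical variant along these lines, binarizing the weighted concepts \(\tilde{\textbf{c}}^i\) directly and decoupling the concept and prediction thresholds (names and default values are ours, not the authors'):

```python
import numpy as np

def truth_table_from_weighted_concepts(C, f_out, alpha_tilde, eps_c=0.5, eps_f=0.5):
    """Hypothetical variant: build T^i from c_tilde^i with separate thresholds."""
    c_tilde = C * alpha_tilde                    # pruned / re-weighted concepts
    c_hat = (c_tilde >= eps_c).astype(int)       # binarize the weighted concepts
    f_bar = (f_out > eps_f).astype(int)          # binarize the predictions separately
    return np.column_stack([c_hat, f_bar])

# toy usage
rng = np.random.default_rng(0)
T = truth_table_from_weighted_concepts(rng.random((5, 4)), rng.random(5),
                                        np.array([1.0, 0.1, 0.8, 0.0]))
```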

Conclusions

Deep neural networks are highly complex black-box models that have become the state of the art in a wide range of tasks, outperforming classic machine learning techniques. Unfortunately, this increased complexity is what prevents humans from directly understanding the reasons why they make certain decisions. As a consequence, the application of DNNs in critical areas is limited, and the need for reliable, interpretable DNNs is urgent.

The five works discussed in this manuscript are examples of the latest attempts to “open the black box” of DNNs. They provided a set of interpretability principles and offered new paradigms tailored to this type of problem (e.g., supervised learning with interpretation). Even in these representative XAI works, we found issues such as limited applicability, weak theoretical justification, and results that are only arguably interpretable. Nevertheless, the discussed methods show room for improvement and offer further research opportunities. We also noted that the reliability of the interpretations would improve if they were accompanied by some form of uncertainty quantification, which is a possible future research direction.

References

  1. P. Barbiero, G. Ciravegna, F. Giannini, P. Liò, M. Gori, and S. Melacci, “Entropy-based logic explanations of neural networks,” Proceedings of the AAAI Conference on Artificial Intelligence, vol. 36, no. 6, pp. 6046–6054, Jun. 2022.
  2. E. J. McCluskey Jr., “Minimization of Boolean functions,” Bell System Technical Journal, vol. 35, no. 6, pp. 1417–1444, 1956.
  3. T. K. Jain, D. S. Kushwaha, and A. K. Misra, “Optimization of the Quine-McCluskey method for the minimization of the Boolean expressions,” in Fourth International Conference on Autonomic and Autonomous Systems (ICAS’08), 2008, pp. 165–168.
Written on October 12, 2023