Interpretable Deep Neural Networks: Open Questions

The five works discussed in the previous posts summarize the current state of research on neural networks that are interpretable by design. Analyzing their limitations reveals four open research questions, which we discuss in this post.

Multicollinear Basis Concepts

Interpretability methods have been widely used for variable importance (VI) estimation. Two of the most common methods for this task are LIME and SHAP, which provide relevance scores that indicate the impact each input feature has on a given decision of the model. Similarly, SENNs, FLINT, and entropy-based networks (LENs) calculate relative relevance scores for each basis concept generated by the network. All of these approaches interpret the output of the model as a linear combination of input features or basis concepts weighted by the estimated relevance scores. A shared limitation, therefore, is the implicit assumption of mutual independence among the input features or basis concepts.
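
To make this shared form concrete, the sketch below (with made-up concept activations and relevance scores, not tied to any particular method or library) reads a prediction as a weighted sum of concept contributions; attributing the output to each term separately is only meaningful if the terms are not redundant.

```python
import numpy as np

# Minimal sketch of the shared interpretation form: the output is read as a
# weighted sum of concept activations. All names and values are illustrative.
rng = np.random.default_rng(0)

concept_activations = rng.random(5)      # h(x): activations of 5 basis concepts
relevance_scores = rng.normal(size=5)    # theta(x): relevance of each concept

contributions = relevance_scores * concept_activations
prediction = contributions.sum()

for i, c in enumerate(contributions):
    print(f"concept_{i}: contribution {c:+.3f}")
print(f"approximated output: {prediction:.3f}")
```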

Take the case of hyperspectral band selection, where the task is to find the best combination of spectral bands given a desired number of bands \(k\). Previous experiments showed that selecting the top-\(k\) spectral bands based on individual relevance scores (e.g., entropy) yielded poor classification results [1]. One of the reasons was the presence of multicollinearity among the top-\(k\) bands (i.e., three or more of the selected bands were highly correlated). In other words, the top-\(k\) bands were redundant and failed as descriptors of the target concept we aimed to learn (e.g., herbicide-resistance classification). This is relevant to our discussion because, similar to the band selection problem, the interpretability methods above attempt to generate concise interpretations based on a reduced set of the most activated concepts (see Fig. 1).
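
As a concrete way to flag such redundancy (an illustration, not the analysis performed in [1]), one can compute variance inflation factors (VIFs) over the selected bands; the helper function and toy data below are hypothetical.

```python
import numpy as np

def variance_inflation_factors(X):
    """Compute the VIF of each column of X by regressing it on the others.

    A VIF well above ~10 is a common rule-of-thumb indicator that the column
    is nearly a linear combination of the remaining columns.
    """
    X = np.asarray(X, dtype=float)
    n, k = X.shape
    vifs = []
    for j in range(k):
        y = X[:, j]
        others = np.delete(X, j, axis=1)
        # Add an intercept and solve the least-squares regression.
        A = np.column_stack([np.ones(n), others])
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        residuals = y - A @ coef
        r2 = 1.0 - residuals.var() / y.var()
        vifs.append(1.0 / max(1.0 - r2, 1e-12))
    return np.array(vifs)

# Toy example: band 2 is almost a copy of band 0, so both get large VIFs.
rng = np.random.default_rng(0)
bands = rng.random((200, 3))
bands[:, 2] = bands[:, 0] + 0.01 * rng.normal(size=200)
print(variance_inflation_factors(bands))
```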

FLINT imposes the conciseness and diversity principles by minimizing entropy at the concept and sample levels. However, this does not guarantee that the learned concepts are uncorrelated. Therefore, How to avoid learning multicollinear basis concepts and How to select a reduced set of the most relevant concepts for interpretation are open research questions.
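
The toy example below illustrates why a per-sample entropy penalty alone cannot rule out redundancy. It is a simplified sketch in the spirit of a conciseness term, not FLINT's actual loss from [2]: two batches with identical per-sample entropy receive the same penalty even though one of them contains two perfectly correlated concepts.

```python
import torch

def sample_entropy_penalty(concept_activations, eps=1e-8):
    """Entropy of the normalized concept activations for each sample.

    Minimizing it pushes each sample to rely on few concepts, but it says
    nothing about correlation *between* concepts across samples.
    """
    p = concept_activations / (concept_activations.sum(dim=1, keepdim=True) + eps)
    entropy = -(p * torch.log(p + eps)).sum(dim=1)
    return entropy.mean()

# Two batches with identical per-sample entropy, but the second batch has two
# perfectly correlated (redundant) concepts -- the penalty cannot tell them apart.
a = torch.tensor([[1.0, 0.0, 1.0], [0.0, 1.0, 1.0]])
b = torch.tensor([[1.0, 1.0, 0.0], [1.0, 1.0, 0.0]])
print(sample_entropy_penalty(a), sample_entropy_penalty(b))
```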

Interpretable Basis Concepts

Some interpretability methods, such as FLINT, learn to generate basis concepts by transforming the input data into an encoded representation in a latent space. This is an advantage when the concepts of interest cannot be identified a priori or when no annotated datasets are available from which to learn them. However, the encoded information itself may be challenging to interpret directly.

Even though FLINT imposed interpretability properties such as fidelity to output, fidelity to input, and conciseness and diversity, the concepts it generated were only arguably interpretable.
For example, Fig. 1 and Fig. 2 show that the generated concepts (or attributes) present artifacts near the edges of the images that have no interpretable meaning. This opens the question of How to effectively impose interpretability on automatically generated basis concepts, which is related to that of How to quantify the interpretability of basis concepts.

Fig. 1. (Left) Local interpretations for test samples. (Right) Examples of attribute functions detecting the same part across various test samples [2].

Fig. 2. Example class-attribute pair analysis [2].

Uncertainty Quantification

The methods discussed throughout this series provide interpretations that consist of a set of basis concepts or prototypes and their corresponding relevance scores. The exception is the entropy-based networks described here, which transform the set of concepts and scores into a logical formula in disjunctive normal form (DNF). Regardless of the form of the interpretation, one way to improve the reliability and credibility of automatically generated interpretations is to provide uncertainty quantification along with them.
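
For illustration (the class and concept names are hypothetical), such a formula might read \(\text{zebra} \leftrightarrow (\text{stripes} \wedge \text{four legs}) \vee (\text{stripes} \wedge \neg\,\text{spots})\), i.e., a disjunction of conjunctions over the most relevant concepts, where each conjunction captures one way the class can be recognized.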

To the best of our knowledge, no method has been proposed to merge interpretability and uncertainty quantification. Thus, How to quantify the uncertainty of automatically generated interpretations is an open research question. We hypothesize that it is possible to train NNs that jointly generate interpretations and prediction intervals for the relevance scores; these prediction intervals would consist of estimates of the upper and lower bounds of the true relevance scores. It is also possible to incorporate an open-set recognition paradigm [3] that would not only generate prediction intervals but also answer “I do not know” when given a test sample that is considerably different from the training samples (i.e., an out-of-distribution sample).
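
As a rough sketch of this hypothesis (not an existing method), one could attach a small head to the network that regresses a lower and an upper quantile of each relevance score and train it with the pinball loss; all names, shapes, and the stand-in data below are illustrative.

```python
import torch
import torch.nn as nn

def pinball_loss(pred, target, q):
    # Quantile (pinball) loss; q = 0.05 and q = 0.95 give a ~90% interval.
    diff = target - pred
    return torch.mean(torch.maximum(q * diff, (q - 1.0) * diff))

# Hypothetical interval head: from a latent feature vector, predict a lower
# and an upper bound for each of the 10 concept relevance scores.
torch.manual_seed(0)
features = torch.randn(64, 32)          # stand-in for the network's latent features
relevance = torch.randn(64, 10)         # stand-in for the "true" relevance scores
interval_head = nn.Linear(32, 2 * 10)   # outputs [lower bounds | upper bounds]

optimizer = torch.optim.Adam(interval_head.parameters(), lr=1e-2)
for _ in range(500):
    lower, upper = interval_head(features).chunk(2, dim=1)
    loss = pinball_loss(lower, relevance, 0.05) + pinball_loss(upper, relevance, 0.95)
    optimizer.zero_grad()
    loss.backward()
    optimizer.step()

# Fraction of relevance scores that fall inside the predicted intervals
# (ideally close to the 90% target).
coverage = ((relevance >= lower) & (relevance <= upper)).float().mean()
print(f"empirical coverage: {coverage.item():.2f}")
```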

The open research question above concerns how uncertainty quantification can aid interpretability. Another research opportunity is the study of how interpretability can aid uncertainty quantification. This opens the question of How to train interpretable DNNs to explain how different features contribute to the uncertainty in the output.

This research area has not received much attention yet. For example, Brown and Talbert [4] used MC-Dropout [5] (which casts dropout training in DNNs as approximate Bayesian inference in deep Gaussian processes) to quantify model uncertainty in NNs and combined it with LIME to estimate the contribution of each input feature to the model uncertainty. In this context, model uncertainty arises from model selection, training data variance, and parameter uncertainty [6]. Future work could also address the interpretability of the data noise variance, which measures the variance of the error between the observable target values and the outputs produced by the learned models.
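
The snippet below is a minimal sketch of this idea, assuming a toy dropout network and an ordinary least-squares surrogate in place of LIME's weighted sampling scheme; it is not the implementation of Brown and Talbert [4].

```python
import torch
import torch.nn as nn

# Sketch: use MC-Dropout to estimate predictive uncertainty around one sample,
# then fit a LIME-style local linear surrogate that attributes that
# uncertainty to the input features. The model and data are toy stand-ins.
torch.manual_seed(0)

model = nn.Sequential(nn.Linear(5, 32), nn.ReLU(), nn.Dropout(0.5), nn.Linear(32, 1))
model.train()  # keep dropout active at prediction time

def mc_dropout_std(x, n_passes=50):
    """Standard deviation of the outputs over stochastic forward passes."""
    with torch.no_grad():
        outputs = torch.stack([model(x) for _ in range(n_passes)])
    return outputs.std(dim=0).squeeze(-1)

# Perturb the sample of interest and record the uncertainty of each perturbation.
x0 = torch.randn(1, 5)
perturbations = x0 + 0.3 * torch.randn(256, 5)
uncertainty = mc_dropout_std(perturbations)

# Local linear surrogate: regress the uncertainty on the perturbed features.
A = torch.cat([torch.ones(256, 1), perturbations], dim=1)
coefs = torch.linalg.lstsq(A, uncertainty.unsqueeze(1)).solution.squeeze()
for i, w in enumerate(coefs[1:]):
    print(f"feature_{i}: contribution to uncertainty {w.item():+.4f}")
```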

References

  1. G. Morales, J. W. Sheppard, R. D. Logan, and J. A. Shaw, “Hyperspectral dimensionality reduction based on inter-band redundancy analysis and greedy spectral selection,” Remote Sensing, vol. 13, no. 18, 2021.
  2. J. Parekh, P. Mozharovskyi, and F. d’Alché-Buc, “A framework to learn with interpretation,” in Advances in Neural Information Processing Systems, M. Ranzato, A. Beygelzimer, Y. Dauphin, P. Liang, and J. W. Vaughan, Eds., vol. 34, 2021, pp. 24273–24285.
  3. C. Geng, S.-J. Huang, and S. Chen, “Recent advances in open set recognition: A survey,” IEEE Transactions on Pattern Analysis and Machine Intelligence, vol. 43, no. 10, pp. 3614–3631, October 2021.
  4. K. E. Brown and D. A. Talbert, “Using explainable AI to measure feature contribution to uncertainty,” The International FLAIRS Conference Proceedings, vol. 35, May 2022.
  5. Y. Gal and Z. Ghahramani, “Dropout as a Bayesian approximation: Representing model uncertainty in deep learning,” in Proceedings of the 33rd International Conference on Machine Learning, 2016, pp. 1050–1059.
  6. T. Pearce, A. Brintrup, M. Zaki, and A. Neely, “High-quality prediction intervals for deep learning: A distribution-free, ensembled approach,” in Proceedings of the 35th International Conference on Machine Learning, 2018, pp. 4072–4081.
Written on October 12, 2023