Deep learning-based systems have not been widely adopted in critical areas such as healthcare and criminal justice, largely because of their lack of interpretability. Beyond high predictive performance, interpretability is necessary for establishing an appropriate level of trust in such systems. In this five-part post, we discuss five recent works on deep neural networks that are interpretable by design; that is, they incorporate the interpretability objective directly into the learning process.
The methods discussed are self-explaining neural networks, ProtoAttend, concept whitening, a framework to learn with interpretation (FLINT), and entropy-based logic explanations of neural networks.
For each method, we present and analyze its novelty and contributions, as well as its potential drawbacks and gaps.