Towards better visual explanations for deep image classifiers

Understanding and explaining the decisions of neural networks is of great importance, both for safe deployment and for legal reasons. In this paper, we consider visual explanations for deep image classifiers that are both informative and understandable to humans. Motivated by the recent FullGrad method, we find that aggregating information from multiple layers is highly effective in producing explanations. Based on this observation, we propose a new method, DeepMaps, that combines information from hidden-layer activations. We show that our method outperforms alternative explanation methods with respect to metrics established in the literature, which are based on pixel perturbations. While these evaluations are based on changes in the class scores, we propose to directly consider the change in the network's decisions. Noting that perturbation-based metrics can fail to distinguish random explanations from sensible ones, we propose to measure the quality of a given explanation by comparing it to explanations computed for other, randomly selected images. We demonstrate through experiments that DeepMaps also outperforms existing methods according to the resulting evaluation metrics.
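The pixel-perturbation metrics mentioned above can be illustrated with a minimal sketch of a deletion-style evaluation: pixels are removed in decreasing order of saliency, and the class score is tracked as it drops. This is an illustrative assumption, not the paper's exact protocol; the function name, the toy model, and the toy saliency map below are all made up for the example.

```python
import numpy as np

def deletion_curve(image, saliency, model, num_steps=10):
    """Illustrative deletion metric: zero out pixels in decreasing order
    of saliency and record the model's class score after each step.
    A faster-dropping curve suggests the explanation highlights
    pixels that are truly important for the prediction."""
    order = np.argsort(saliency.ravel())[::-1]  # most salient pixels first
    perturbed = image.copy().ravel()
    scores = [model(perturbed.reshape(image.shape))]
    step = max(1, order.size // num_steps)
    for i in range(0, order.size, step):
        perturbed[order[i:i + step]] = 0.0  # "delete" the next batch of pixels
        scores.append(model(perturbed.reshape(image.shape)))
    return np.array(scores)

# Toy example: the "model" scores an image by summing a fixed region,
# and the saliency map marks exactly that region.
rng = np.random.default_rng(0)
image = rng.random((8, 8))
saliency = np.zeros((8, 8))
saliency[:4, :4] = 1.0
model = lambda x: float(x[:4, :4].sum())
curve = deletion_curve(image, saliency, model)
```

A random saliency map run through the same procedure would produce a much slower score decay, which is exactly the contrast these metrics are meant to capture.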

Authors' notes