Research article (free access)
- Authors:
- Aya Abdelsalam Ismail, Department of Computer Science, University of Maryland; Data Science and Statistical Computing, Genentech, Inc.
- Soheil Feizi, Department of Computer Science, University of Maryland; Data Science and Statistical Computing, Genentech, Inc.
- Héctor Corrada Bravo, Department of Computer Science, University of Maryland; Data Science and Statistical Computing, Genentech, Inc.
NIPS '21: Proceedings of the 35th International Conference on Neural Information Processing Systems, December 2021, Article No. 2047, Pages 26726–26739
Published: 10 June 2024
Improving deep learning interpretability by saliency guided training
ABSTRACT
Saliency methods are widely used to highlight important input features in model predictions. Most existing methods generate saliency maps by backpropagating through a modified gradient function, so noisy gradients can yield unfaithful feature attributions. In this paper, we tackle this issue and introduce a saliency guided training procedure for neural networks that reduces noisy gradients used in predictions while retaining the model's predictive performance. Our procedure iteratively masks features with small, potentially noisy gradients while maximizing the similarity of model outputs for masked and unmasked inputs. We apply saliency guided training to various synthetic and real datasets from computer vision, natural language processing, and time series, across diverse neural architectures including Recurrent Neural Networks, Convolutional Networks, and Transformers. Through qualitative and quantitative evaluations, we show that saliency guided training significantly improves model interpretability across these domains while preserving predictive performance.
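The two core operations the abstract describes, masking the features with the smallest (potentially noisy) gradients and comparing model outputs on masked versus unmasked inputs, can be sketched in plain Python. This is a minimal, framework-free illustration, not the authors' implementation: the function names, the zero baseline used for masking, and the use of KL divergence as the output-similarity term are assumptions for the sketch.

```python
import math

def mask_low_gradient_features(x, grads, k, baseline=0.0):
    """Replace the k features of x with the smallest |gradient| by a baseline value."""
    order = sorted(range(len(x)), key=lambda i: abs(grads[i]))  # ascending by |grad|
    masked = list(x)
    for i in order[:k]:
        masked[i] = baseline
    return masked

def kl_divergence(p, q, eps=1e-12):
    """KL(p || q) between two discrete model output distributions."""
    return sum(pi * math.log((pi + eps) / (qi + eps)) for pi, qi in zip(p, q))

# One saliency guided training step would then combine these, conceptually:
#   1. forward/backward pass -> per-feature input gradients
#   2. x_masked = mask_low_gradient_features(x, grads, k)
#   3. loss = cross_entropy(f(x), y) + lam * kl_divergence(f(x), f(x_masked))
# Minimizing the KL term pushes the model to ignore the low-gradient
# (masked) features, which reduces gradient noise in the saliency maps.
```

In a real training loop the per-feature gradients in step 1 would come from the framework's autodiff (e.g. a backward pass with respect to the inputs), and `lam` would be a tuned hyperparameter.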
Supplemental Material
Available for download: 3540261.3542308_supp.pdf (15.3 MB)
Published in
NIPS '21: Proceedings of the 35th International Conference on Neural Information Processing Systems
December 2021, 30517 pages
ISBN: 9781713845393
Editors: M. Ranzato, A. Beygelzimer, Y. Dauphin, P.S. Liang, J. Wortman Vaughan
Copyright © 2021 Neural Information Processing Systems Foundation, Inc.
Publisher: Curran Associates Inc., Red Hook, NY, United States
Qualifiers
- Research article
- Refereed limited