We analyze several established methods that are well defined for network architectures utilizing rectified linear units (ReLUs [43]) and do not require retraining the model under study. These methods fall into two broad categories. The first category of saliency methods evaluates the effect of small perturbations of the input on the output. These methods rely on the gradient of the DNN's output with respect to its input, which can be computed efficiently through a method called back-propagation: the iterative calculation of the gradient, layer by layer from the network's output to its input, avoiding redundant terms from the naive application of the chain rule [44,45]. We select two gradient-based methods whose interpretation for linear models is straightforward: (1) Gradient: Computes the gradient of the output neurons with respect to the values of the input pixels, $S(x) = \partial M / \partial x$. This measures the sensitivity of the output to the input, and for a linear model is equivalent to the regression coefficients. (2) Input × gradient: Computes the element-wise product of the input and the gradient of the output with respect to the input pixels, $S(x) = x \odot \partial M / \partial x$. For a linear model, it measures the contribution of each pixel to the output. Other gradient-based methods exist, such as Smoothgrad [23] or Integrated gradients [46], but we did not study these methods due to their significantly higher computational cost. We inspected their effect on a small subset of input maps, and the results are qualitatively very similar to those of the Gradient and Input × gradient methods.
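As an illustration, the following minimal sketch computes both gradient-based saliency maps by automatic differentiation. It assumes a PyTorch model mapping an input map to a scalar output (for a vector of outputs, one would select the neuron of interest before back-propagating); the function names are ours, not part of any library.

```python
import torch

def gradient_saliency(model, x):
    """Gradient method: S(x) = dM/dx, the sensitivity of the output to each input pixel."""
    x = x.clone().detach().requires_grad_(True)
    output = model(x)
    # For a scalar output (summed over the batch), back-propagation fills x.grad with dM/dx.
    output.sum().backward()
    return x.grad.detach()

def input_times_gradient(model, x):
    """Input x gradient method: S(x) = x * dM/dx, each pixel's contribution for a linear model."""
    return x.detach() * gradient_saliency(model, x)
```

For a linear model $M(x) = w \cdot x + b$, gradient_saliency returns the regression coefficients $w$ for every input, and input_times_gradient returns the per-pixel contributions $w_i x_i$, matching the interpretation given above.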
The second category of saliency methods tries to distribute the network's output among the neurons of the second-to-last layer. The amount allocated to each neuron, interpreted as a relevance measure, is propagated iteratively through the network, back to the input space. We select the following propagation-based methods: (1) Guided back-propagation: Masks out negative gradients and negative activations when backpropagating the gradient of the output with respect to the input [18]. (2) Deconvnet: Uses a deconvolution network [47], $M^{-1}$, built on top of the DNN architecture. To compute the saliency map corresponding to the input x, the feature maps $\{f_i\}$, for each layer i in the model M, are fed as inputs to the deconvolution network's layers. At each stage of the propagation through $M^{-1}$, intermediate representations are unpooled, rectified, and filtered, until pixel space is reached [48]. (3) Deep Taylor decomposition: Distributes the relevance of each neuron among the neurons of its preceding layer by approximating the layer's function with a first-order Taylor expansion [20]. (4) Layer-wise relevance propagation (LRP): Distributes the relevance of each neuron among the neurons of its preceding layer, taking into consideration the weights that define the layer. We consider two different rules that are common in the literature. The first one, LRP-ϵ, propagates the relevance according to $R_i = \sum_j \frac{a_i w_{ij}}{\epsilon + \sum_i a_i w_{ij}} R_j$, where $a_i$ is the activation of neuron i, $w_{ij}$ the weight connecting neuron j to neuron i, the relevances $R_j$ are those of the layer's output, and ϵ absorbs weak or contradictory contributions to the activations. For ReLU-based networks [49], $\epsilon = 0$ renders this method equivalent to Input × gradient. We choose $\epsilon = 10^{-3}$, because larger values result in saliency maps indistinguishable from random noise. The second rule, LRP-αβ, propagates the relevance according to $R_i = \sum_j \left( \alpha \frac{(a_i w_{ij})^+}{\sum_i (a_i w_{ij})^+} - \beta \frac{(a_i w_{ij})^-}{\sum_i (a_i w_{ij})^-} \right) R_j$, where $(\cdot)^+$ and $(\cdot)^-$ refer to positive and negative contributions, respectively. We use $\alpha = 1$ and $\beta = 1$, a popular choice that renders this method equivalent to the Excitation Backprop method [50]. We also validate that our results do not change qualitatively when the parameters α and β are modified slightly. We apply the same method to all of the layers in the DNN under study.
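The two LRP rules can be written compactly for a single fully connected layer. The sketch below is a minimal illustration under our own conventions (PyTorch tensors, our own function names); a complete implementation would chain such a step through every layer and treat convolutions and the input layer with dedicated rules.

```python
import torch

def lrp_epsilon_dense(a, W, R_out, eps=1e-3):
    """LRP-eps for one dense layer: R_i = sum_j a_i w_ij / (eps + sum_i a_i w_ij) * R_j.
    a: (n_in,) input activations, W: (n_in, n_out) weights, R_out: (n_out,) output relevances."""
    z = a @ W                      # sum_i a_i w_ij for each output neuron j
    s = R_out / (z + eps)          # eps absorbs weak contributions (implementations often use eps*sign(z))
    return a * (W @ s)             # relevance redistributed to each input neuron i

def lrp_alphabeta_dense(a, W, R_out, alpha=1.0, beta=1.0):
    """LRP-alpha-beta: positive and negative contributions are normalized separately."""
    z = a[:, None] * W                                    # contributions a_i * w_ij
    z_pos, z_neg = z.clamp(min=0.0), z.clamp(max=0.0)
    # Small constants guard against division by zero when a column has no contributions.
    pos = z_pos / (z_pos.sum(dim=0, keepdim=True) + 1e-12)
    neg = z_neg / (z_neg.sum(dim=0, keepdim=True) - 1e-12)
    return ((alpha * pos - beta * neg) * R_out).sum(dim=1)
```

Setting eps=0 in lrp_epsilon_dense reduces each step to $a_i w_{ij} R_j / z_j$, which, chained through a ReLU network, reproduces the Input × gradient map, as stated above.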
Clearly, the different saliency methods provide very different answers to the basic question of which input pixels are most relevant to the DNN's output. It is therefore important to find a criterion to choose the method(s) most appropriate for interpreting the model in the present context. Past work has shown that some saliency methods lack robustness [53,54] and could be inappropriate for our combination of data and model. To assess the robustness of each method, we perform a model parameter randomization test, following the tests performed in Ref. [55]. For each method, we compute saliency maps not only on the trained DNN, but also on the models that result from randomizing the network's parameters. We perform this randomization incrementally, starting with only the output layer and proceeding all the way to the first convolutional layer. Methods that yield saliency maps insensitive to these randomizations fail the test, as the structures in those maps cannot then stem from features the DNN learned during training.
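A minimal version of this cascading randomization test might look as follows. This is a sketch under simplifying assumptions: a PyTorch model, a saliency_fn such as those sketched above, layers re-initialized with standard normal weights, and SciPy's Spearman rank correlation as the similarity measure.

```python
import copy
import torch
from scipy.stats import spearmanr

def parameter_randomization_test(model, x, saliency_fn):
    """Randomize the layers one at a time, from the output layer back towards the input,
    and compare each randomized model's saliency map with the trained model's map."""
    reference = saliency_fn(model, x).flatten().numpy()
    randomized = copy.deepcopy(model)
    layers = [m for m in randomized.modules()
              if getattr(m, "weight", None) is not None]
    correlations = []
    for layer in reversed(layers):               # output layer first, first layer last
        torch.nn.init.normal_(layer.weight)      # destroy what this layer has learned
        if getattr(layer, "bias", None) is not None:
            torch.nn.init.zeros_(layer.bias)
        saliency = saliency_fn(randomized, x).flatten().numpy()
        rho, _ = spearmanr(reference, saliency)  # rank correlation with the trained-model map
        correlations.append(rho)
    return correlations
```

Correlations that remain close to 1 as more layers are randomized indicate a method whose maps do not depend on the learned parameters, i.e., a method that fails the test.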
Visually, the gradient-based methods (and LRP-ϵ) are very sensitive to the model's parameters, while the propagation-based methods exhibit strong correlations between the saliency maps computed on the trained model and those computed on the randomized models.