“Why did my model make this prediction?” AllenNLP interpretation

Making predictions

Published by FirstAlign

One of the most common situations any AI or NLP practitioner faces is scratching their head over why their model made a certain prediction.

Advances in NLP have produced huge neural models that are quite accurate within a constrained domain, and some of them are state of the art, but these models still underperform humans on some tasks and/or show one form or another of social bias.

These models are a black box of sorts, so we can’t see inside them. This makes it difficult to work out why a certain prediction was made when the AI makes a mistake or shows a social bias.

To understand a prediction, we can use instance-level interpretation, which answers the question of why a model made a certain prediction by providing an explanation for it.

These explanations can take the following forms:

  1. Visualizing the local decision boundary of the model.
  2. Modifying the input adversarially.
  3. Highlighting the saliency of the input features.

Open-source code for instance-level interpretation is available, but these code bases are usually built for a specific task or model, for example computer vision, so most of the time NLP practitioners cannot leverage them.

To solve this problem, the researchers behind AllenNLP Interpret: A Framework for Explaining Predictions of NLP Models proposed an open-source toolkit for NLP model interpretation. It is built on top of AllenNLP, the platform introduced in AllenNLP: A Deep Semantic Natural Language Processing Platform in 2018. With AllenNLP Interpret we can apply existing interpretation methods to new models as well as create our own interpretation methods for a new model.

AllenNLP Interpret offers the following features:

  1. A collection of interpretation techniques that apply to a wide variety of models.
  2. An API that is independent of the task or model.
  3. An interactive front-end to visualize the interpretations.

The most common use cases for AllenNLP Interpret are:

  1. Uncovering model bias: If a model tends to lean toward or away from a certain output, we can detect it.
  2. Finding decision rules: We can find the decision rule behind a certain output. For example, in named entity recognition it can show what leads the model to label an entity as a name, location, organization, etc.
  3. Error detection: We can diagnose errors. For example, if the model predicts a positive sentiment as negative, we can investigate why it did that.

How does it interpret models?

AllenNLP Interpret uses two types of interpretation for models:

Saliency Map Visualization: This technique is widely used in computer vision, where an image (the saliency map) is created to show how much each pixel matters to the model’s output, which helps in understanding and analyzing the picture. Here it does the same, but rather than visualizing pixels it visualizes the input tokens by showing the importance of each one.

Saliency Map

In the above example we want to predict the most suitable word to fill [MASK]. The resulting prediction is nurse. According to the saliency map, this result is explained by the presence of the words emergency and her. If we look at the other predictions, it is quite clear that those choices are also driven by the words emergency and her.
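To make the idea of token saliency concrete, here is a minimal sketch of a gradient-times-input saliency score. It does not use AllenNLP or a real model: the two-dimensional embeddings, the weights, the bias and the function names are all invented for illustration, and the classifier is just a sigmoid over the mean token embedding. It uses a sentiment sentence rather than the masked-word example, because a two-class toy is easier to write by hand.

```python
import math

# Toy 2-d embeddings and classifier weights, invented for illustration only.
EMBEDDINGS = {
    "a": (0.0, 0.1),
    "very": (0.2, 0.0),
    "funny": (1.5, 0.3),
    "entertaining": (1.2, -0.2),
    "picture": (0.4, 0.5),
}
WEIGHTS = (1.0, 0.5)
BIAS = -0.5


def positive_probability(tokens):
    """Sigmoid over a linear layer applied to the mean token embedding."""
    n = len(tokens)
    mean = [sum(EMBEDDINGS[t][d] for t in tokens) / n for d in range(len(WEIGHTS))]
    logit = sum(w * m for w, m in zip(WEIGHTS, mean)) + BIAS
    return 1.0 / (1.0 + math.exp(-logit))


def gradient_saliency(tokens):
    """Return one normalized importance score per token (gradient x input)."""
    p = positive_probability(tokens)
    # With mean pooling, d p / d e_i is the same vector for every token.
    grad = [p * (1.0 - p) * w / len(tokens) for w in WEIGHTS]
    # Dot the gradient with each token's embedding, then L1-normalize.
    raw = [abs(sum(g * x for g, x in zip(grad, EMBEDDINGS[t]))) for t in tokens]
    total = sum(raw)
    return [r / total for r in raw]


tokens = ["a", "very", "funny", "entertaining", "picture"]
for token, score in zip(tokens, gradient_saliency(tokens)):
    print(f"{token:>12s}  {score:.2f}")
```

In this toy setup the word funny receives the largest share of the saliency, simply because its made-up embedding lines up best with the made-up weights. A real model would compute the gradient by backpropagation instead of by hand.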

Adversarial attacks: Where saliency maps explain a prediction by scoring the input tokens, this method explains it by exploiting the weaknesses of the model. The input tokens are changed to see how the prediction responds. This can tell us why the model wasn’t performing well or was showing bias.

HotFlip Attack

In the above figures an adversarial attack is carried out. It flips words in a sentence until the prediction changes. In this way we can understand which tokens in a sentence affect the prediction, and an interpretation for the prediction can then be given.
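The flipping loop can be sketched in a few lines. This is a simplified, HotFlip-style greedy attack on a hand-made linear classifier, not the AllenNLP implementation: the vocabulary, embeddings, weights and function names are assumptions made up for this example. HotFlip uses the gradient to estimate how much each candidate swap would change the model output. In this linear toy that first-order estimate happens to be exact.

```python
# Toy 2-d embeddings and weights, invented for illustration only.
EMB = {
    "a": (0.0, 0.0),
    "very": (0.1, 0.0),
    "well-made": (0.8, 0.1),
    "funny": (1.0, 0.2),
    "entertaining": (0.9, 0.0),
    "picture": (0.2, 0.1),
    "unfunny": (-1.2, 0.1),
    "bigoted": (-1.5, -0.2),
    "sitcom": (-0.3, 0.0),
}
W = (1.0, 0.5)
B = 0.0


def logit(tokens):
    """Linear score over summed token embeddings; >= 0 means positive."""
    return sum(w * x for t in tokens for w, x in zip(W, EMB[t])) + B


def is_positive(tokens):
    return logit(tokens) >= 0.0


def hotflip_style_attack(tokens, max_flips=5):
    """Greedily swap one token at a time until the predicted label changes."""
    tokens = list(tokens)
    original = is_positive(tokens)
    direction = 1.0 if original else -1.0  # which way the logit must move
    flipped = []  # positions already changed; each position is flipped once
    while is_positive(tokens) == original and len(flipped) < max_flips:
        best = None
        for i, old in enumerate(tokens):
            if i in flipped:
                continue
            for cand in EMB:
                if cand == old:
                    continue
                # First-order estimate: gradient (= W here) . (e_cand - e_old)
                delta = sum(w * (c - o) for w, c, o in zip(W, EMB[cand], EMB[old]))
                gain = -direction * delta
                if best is None or gain > best[0]:
                    best = (gain, i, cand)
        if best is None or best[0] <= 0.0:
            break  # no swap moves the prediction toward the other label
        _, i, cand = best
        tokens[i] = cand
        flipped.append(i)
    return tokens, flipped


sentence = ["a", "very", "well-made", "funny", "entertaining", "picture"]
adversarial, positions = hotflip_style_attack(sentence)
print(adversarial, positions, is_positive(adversarial))
```

The positions that had to be flipped are the tokens the toy model relied on most, which is the same reading we give to the real attack.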


Let’s look at the AllenNLP demo. In this demo we are going to perform sentiment analysis and interpret the results.

AllenNLP

In the above example we used the sentence “a very well-made, funny and entertaining picture” to analyze its sentiment. The demo shows that the sentiment of this sentence is positive. Now let’s see why the model predicted it as positive.

SmoothGrad Visualization

The saliency map above shows the interpretation of the prediction. The three words that contribute most to the decision that this sentence has positive sentiment are ‘funny’, ‘entertaining’ and ‘picture’.
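SmoothGrad, the method named in the figure, averages the gradient over several noisy copies of the input so that the saliency map is less sensitive to small perturbations. The sketch below shows that averaging step on a hand-made model. The embeddings, weights, noise level, sample count and function names are all assumptions for illustration, and multiplying the averaged gradient by the clean embedding is one possible choice, not necessarily what the demo does.

```python
import math
import random

# Toy 2-d embeddings and weights, invented for illustration only.
EMB = {
    "a": (0.0, 0.1),
    "very": (0.2, 0.0),
    "funny": (1.5, 0.3),
    "entertaining": (1.2, -0.2),
    "picture": (0.4, 0.5),
}
W = (1.0, 0.5)
B = -1.0


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def gradients(embeddings):
    """Gradient of the positive-class probability w.r.t. each token embedding.

    Toy model: p = sigmoid(sum_i tanh(W . e_i) + B).
    """
    zs = [dot(W, e) for e in embeddings]
    p = 1.0 / (1.0 + math.exp(-(sum(math.tanh(z) for z in zs) + B)))
    return [[p * (1.0 - p) * (1.0 - math.tanh(z) ** 2) * w for w in W] for z in zs]


def smoothgrad(tokens, num_samples=20, stdev=0.1, seed=0):
    """Average gradients over noisy inputs, then score and normalize tokens."""
    rng = random.Random(seed)  # fixed seed keeps the sketch reproducible
    clean = [EMB[t] for t in tokens]
    avg = [[0.0] * len(W) for _ in tokens]
    for _ in range(num_samples):
        # Add Gaussian noise to every embedding dimension.
        noisy = [[x + rng.gauss(0.0, stdev) for x in e] for e in clean]
        for acc, grad in zip(avg, gradients(noisy)):
            for d, g in enumerate(grad):
                acc[d] += g / num_samples
    # Averaged gradient times the clean embedding, L1-normalized.
    raw = [abs(dot(g, e)) for g, e in zip(avg, clean)]
    total = sum(raw)
    return [r / total for r in raw]


tokens = ["a", "very", "funny", "entertaining", "picture"]
for token, score in zip(tokens, smoothgrad(tokens)):
    print(f"{token:>12s}  {score:.2f}")
```

Setting stdev to 0.0 turns this back into a plain gradient saliency map, which is a quick way to see how much the noise changes the picture.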

Word Selection

An adversarial attack is shown above. In the HotFlip attack we change words until the prediction changes. Here, swapping in the three words “unfunny”, “bigoted” and “sitcom” changes the prediction, which tells us which words were essential to it. This gives us an idea of why the prediction was positive before these changes.
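The demo steps can also be run from code. The following is a rough sketch, assuming AllenNLP (1.x or 2.x) is installed and that you have a trained text-classification model archive. The archive path is a placeholder that this post does not specify, the "sentence" input key is an assumption that depends on the predictor, and interpret_sentiment and rank_tokens are hypothetical helper names. The field names passed to the attacker are, to our knowledge, the library defaults and may differ for other models.

```python
def rank_tokens(tokens, scores):
    """Pair tokens with saliency scores, most important first."""
    return sorted(zip(tokens, scores), key=lambda pair: pair[1], reverse=True)


def interpret_sentiment(archive_path, sentence):
    """Predict, compute a saliency map and run HotFlip for one sentence.

    archive_path is a placeholder for a trained AllenNLP model archive.
    """
    # Imported here so this file still loads when AllenNLP is not installed.
    from allennlp.predictors.predictor import Predictor
    from allennlp.interpret.saliency_interpreters import SmoothGradient
    from allennlp.interpret.attackers import Hotflip

    predictor = Predictor.from_path(archive_path)
    inputs = {"sentence": sentence}  # assumed input key for a text classifier

    prediction = predictor.predict_json(inputs)

    # Saliency scores are typically nested under keys such as
    # "instance_1" -> "grad_input_1"; check the output for your model.
    saliency = SmoothGradient(predictor).saliency_interpret_from_json(inputs)

    attacker = Hotflip(predictor)
    attacker.initialize()  # builds the embedding matrix used for flips
    attack = attacker.attack_from_json(
        inputs,
        input_field_to_attack="tokens",
        grad_input_field="grad_input_1",
    )
    return {"prediction": prediction, "saliency": saliency, "attack": attack}


# Example of ranking tokens once you have one score per token.
print(rank_tokens(["a", "funny", "picture"], [0.1, 0.6, 0.3]))
```

Swapping SmoothGradient for another interpreter from the same module, such as SimpleGradient or IntegratedGradient, should leave the rest of the function unchanged, which is what the task-independent API is meant to allow.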


Model creation for NLP has advanced enormously. The downside is that these models lack the general common sense one would expect. To know why a model makes a particular prediction we need to interpret that model, but resources for doing this in NLP are scarce. Here AllenNLP Interpret comes to the rescue, allowing us to understand and interpret a particular model in predefined as well as user-defined ways.

It does this using saliency maps and adversarial attacks. Our little experiment in this blog shows how it can explain a sentiment prediction in terms of the input.

Happy Coding ❤
