diff --git a/README.md b/README.md
index 001be3e..733f93c 100644
--- a/README.md
+++ b/README.md
@@ -4,14 +4,14 @@ Project repo for Computational Linguistics Team Lab at the University of Stuttga
## Introduction
This repository contains code and datasets for classifying citation intents in research papers.
-We implemented 3 different classifiers and evaluated the results:
+We implemented three classifiers and evaluated them on the test dataset:
- - Perceptron Classifier - Baseline (Implemented from scratch)
+ - Perceptron Classifier - Baseline model (Implemented from scratch)
- Feedforward Neural Network Classifier (using [PyTorch](https://pytorch.org/))
- BiLSTM + Attention with ELMo Embeddings (using [AllenNLP](https://allennlp.org/) library)
-This README documentation focuses on running the code base, training the models and predictions. For more information about our project work and detailed error analysis, check [this](https://www.overleaf.com/project/5f1b0e8a6d0fb80001ceb5eb) report.
-For more information on the Citation Intent Classification in Scientific Publications, follow this [link](https://arxiv.org/pdf/1904.01608.pdf) to the original published paper and the [GitHub repo](https://github.com/allenai/scicite)
+This README focuses on running the code base, training the models and making predictions. For more information about our project work, model results and detailed error analysis, check [this](https://www.overleaf.com/project/5f1b0e8a6d0fb80001ceb5eb) report.
+For more information on Citation Intent Classification in Scientific Publications, follow this [link](https://arxiv.org/pdf/1904.01608.pdf) to the original published paper and their [GitHub repo](https://github.com/allenai/scicite).
## Environment & Setup
TODO
@@ -27,11 +27,12 @@ We have 3 different intents/classes in the dataset:
| | background | method | result |
|:---|:---:|:---:|:---:|
| train | 4.8 K | 2.3 K | 1.1 K |
+| dev | 0.5 K | 0.3 K | 0.1 K |
| test | 1 K | 0.6 K | 0.2 K |
## Methods (Classification)
### 1) Perceptron Classifier (Baseline Classifier)
-We implemented [Perceptron](https://en.wikipedia.org/wiki/Perceptron) as a baseline classifier, from scratch (including evaluation). Perceptron is an algorithm for supervised learning of classification. It's a Linear and a Binary Classifier, which means it can only decide whether or not an input feature belongs to some specific class and also it's only capable of learning linearly separable patterns.
+We implemented a [Perceptron](https://en.wikipedia.org/wiki/Perceptron) as our baseline classifier, from scratch (including evaluation). The perceptron is a supervised learning algorithm for classification. It is a linear, binary classifier: it can only decide whether or not an input belongs to a specific class, and it can only learn linearly separable patterns.
```python
class Perceptron:
def __init__(self, label: str, weights: dict, theta_bias: float):
@@ -47,15 +48,13 @@ Since we have 3 different classes for Classification, we create a Perceptron obj
Check the source [code](/classifier/linear_model.py) for more details on the implementation of Perceptron Classifier.
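+
+For intuition, here is a minimal one-vs-rest sketch of the idea (the constructor matches the snippet above; `score` and `predict` are hypothetical helpers, not the repo's exact code):
+
+```python
+class Perceptron:
+    def __init__(self, label: str, weights: dict, theta_bias: float):
+        self.label = label            # the class this perceptron votes for
+        self.weights = weights        # sparse feature -> weight mapping
+        self.theta_bias = theta_bias  # bias term
+
+    def score(self, features: dict) -> float:
+        # Dot product of the sparse feature dict with the weight vector, plus bias.
+        return sum(self.weights.get(f, 0.0) * v for f, v in features.items()) + self.theta_bias
+
+def predict(perceptrons: list, features: dict) -> str:
+    # One perceptron per class; the label with the highest activation wins.
+    return max(perceptrons, key=lambda p: p.score(features)).label
+```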
-#### Running the Model
-
+### Running the Model
> `(citation-env) [user@server citation-analysis]$ python -m testing.model_testing`
-[link](/testing/model_testing.py) to the source code. All the Hyperparameters can be modified to experiment with.
+[Link](/testing/model_testing.py) to the source code. All the hyperparameters can be modified for experimentation.
-**Evaluation**
+### Evaluation
We used the ***f1_score*** metric to evaluate our baseline classifier.
-
> The F1 score is the harmonic mean of precision and recall.
> The formula for F1 Score is:
> F1 = 2 * (precision * recall) / (precision + recall)
@@ -72,12 +71,47 @@ eval.metrics.f1_score(y_true, y_pred, labels, average)
[Link](/eval/metrics.py) to the metrics source code.
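+
+A hypothetical usage sketch (only the signature comes from the repo; the argument values and the `'macro'` averaging option are illustrative assumptions):
+
+```python
+from eval import metrics
+
+y_true = ['background', 'method', 'result', 'background']
+y_pred = ['background', 'method', 'background', 'background']
+labels = ['background', 'method', 'result']
+
+# 'macro' is an assumed averaging mode; check /eval/metrics.py for supported values.
+score = metrics.f1_score(y_true, y_pred, labels, average='macro')
+```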
### Results
-
+
### 2) Feedforward Neural Network (using PyTorch)
+A feed-forward neural network classifier with a single hidden layer of 9 units. While a feed-forward network is clearly not the ideal architecture for sequential text data, we added it as a second baseline to examine the gains (if any) over a single perceptron. Its input features remained the same as the perceptron's; only the final model is suited to more complex inputs such as word embeddings.
-TODO
+Check the feed-forward model's source [code](/classifier/linear_model.py) for more details.
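+
+A minimal PyTorch sketch of the architecture described above (only the 9-unit hidden layer and the 3 output classes come from the text; the activation and dimension names are illustrative assumptions):
+
+```python
+import torch.nn as nn
+
+class FeedForwardClassifier(nn.Module):
+    def __init__(self, input_dim: int, num_classes: int = 3):
+        super().__init__()
+        self.net = nn.Sequential(
+            nn.Linear(input_dim, 9),    # single hidden layer with 9 units
+            nn.ReLU(),                  # activation choice is an assumption
+            nn.Linear(9, num_classes),  # one logit per citation intent
+        )
+
+    def forward(self, x):
+        return self.net(x)
+```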
### 3) BiLSTM + Attention with ELMo (AllenNLP Model)
+We built the Bi-directional Long Short-Term Memory (BiLSTM) model using the [AllenNLP](https://allennlp.org/) library. For word representations, we used 100-dimensional [GloVe](https://nlp.stanford.edu/projects/glove/) vectors trained on a corpus of 6B tokens from Wikipedia. For contextual representations, we used [ELMo](https://allennlp.org/elmo) embeddings trained on a dataset of 5.5B tokens. Unlike the first two models, which rely on selected features from the text, this model uses the entire input text. It has a single-layer BiLSTM with a hidden dimension of 50 for each direction.
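+
+A hedged sketch of the encoder shape (AllenNLP wraps `torch.nn.LSTM`; the GloVe+ELMo concatenation and the 1024-dim ELMo output are standard assumptions, not stated above):
+
+```python
+import torch.nn as nn
+
+encoder = nn.LSTM(
+    input_size=100 + 1024,  # 100-dim GloVe concatenated with ELMo (assumed 1024-dim)
+    hidden_size=50,         # 50 hidden units per direction, as described above
+    num_layers=1,           # single layer
+    bidirectional=True,
+    batch_first=True,
+)
+```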
+
+We used AllenNLP's [Config Files](https://guide.allennlp.org/using-config-files) to build our model: we only needed to implement a model and a dataset reader, then tie them together with a config file.
+
+Our BiLSTM AllenNLP model contains 4 major components:
+
+ 1. Dataset Reader - [CitationDatasetReader](/utils/reader.py)
+ - It reads the data from the file, tokenizes the input text and creates AllenNLP `Instances`
+ - Each `Instance` contains a dictionary of `tokens` and `label` (see the sketch after this list)
+ 2. Model - [BiLstmClassifier](/classifier/nn.py)
+ - The model's `forward()` method is called for every data instance by passing `tokens` and `label`
+ - The signature of `forward()` needs to match the field names of the `Instance` created by the DatasetReader
+ - The `forward()` method finally returns an output dictionary with the predicted label, loss, softmax probabilities and so on
+ 3. Config File - [basic_model.json](/configs/basic_model.json?raw=true)
+ - The AllenNLP Configuration file takes the constructor parameters for various objects (Model, DatasetReader, Predictor, ...)
+ - We can also define a number of hyperparameters in the config file:
+ - Depth and Width of the Network
+ - Number of Epochs
+ - Optimizer & Learning Rate
+ - Batch Size
+ - Dropout
+ - Embeddings
+ 4. Predictor - [IntentClassificationPredictor](/testing/intent_predictor.py)
+ - AllenNLP uses a `Predictor`, a wrapper around a trained model, for making predictions.
+ - The Predictor uses a pre-trained/saved model and its dataset reader to predict labels for new `Instances`.
+
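+A hedged sketch of how the dataset reader builds an `Instance` (point 1 above; assumes AllenNLP 1.x, with illustrative tokenizer/indexer choices):
+
+```python
+from allennlp.data import Instance
+from allennlp.data.fields import TextField, LabelField
+from allennlp.data.token_indexers import SingleIdTokenIndexer
+from allennlp.data.tokenizers import WhitespaceTokenizer
+
+tokens = WhitespaceTokenizer().tokenize("We follow the training setup of the original paper.")
+fields = {
+    "tokens": TextField(tokens, {"tokens": SingleIdTokenIndexer()}),  # must match forward(tokens=...)
+    "label": LabelField("method"),                                    # must match forward(label=...)
+}
+instance = Instance(fields)
+```
+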
+### Running the Model
+TODO
+
+### Evaluation
+TODO
+
+### Results
+
-TODO
\ No newline at end of file
+## References
+ - Cohan et al., 2019. [Structural Scaffolds for Citation Intent Classification in Scientific Publications](https://arxiv.org/pdf/1904.01608.pdf). NAACL 2019. ([GitHub repo](https://github.com/allenai/scicite))
\ No newline at end of file
diff --git a/classifier/nn.py b/classifier/nn.py
index 1f0f51e..d2c17d2 100644
--- a/classifier/nn.py
+++ b/classifier/nn.py
@@ -104,10 +104,6 @@ class BiLstmClassifier(Model):
output_dict['probabilities'] = class_probabilities
output_dict['positive_label'] = label
output_dict['prediction'] = label
- # citation_text = []
- # for batch_text in output_dict['tokens']:
- # citation_text.append([self.vocab.get_token_from_index(token_id.item()) for token_id in batch_text])
- # output_dict['tokens'] = citation_text
return output_dict
diff --git a/plots/bilstm_model/confusion_matrix_plot.png b/plots/bilstm_model/confusion_matrix_plot.png
new file mode 100644
index 0000000..1f824c8
Binary files /dev/null and b/plots/bilstm_model/confusion_matrix_plot.png differ