Project repo for Computational Linguistics Team Lab at the University of Stuttgart
## Introduction
This repository contains code and datasets for classifying citation intents in research papers.
We implemented 3 classifiers and evaluated them on the test dataset:
- Perceptron Classifier - Baseline model (implemented from scratch)
- Feedforward Neural Network Classifier (using [PyTorch](https://pytorch.org/))
- BiLSTM + Attention with ELMo Embeddings (using the [AllenNLP](https://allennlp.org/) library)
This README focuses on running the code base, training the models and making predictions. For more information about our project work, model results and detailed error analysis, check [this](https://www.overleaf.com/project/5f1b0e8a6d0fb80001ceb5eb) report. <br/>
For more information on Citation Intent Classification in Scientific Publications, follow this [link](https://arxiv.org/pdf/1904.01608.pdf) to the original published paper and their [GitHub repo](https://github.com/allenai/scicite).
## Environment & Setup
TODO
We have 3 different intents/classes in the dataset:

| | background | method | result |
|:---|:---:|:---:|:---:|
| train | 4.8 K | 2.3 K | 1.1 K |
| dev | 0.5 K | 0.3 K | 0.1 K |
| test | 1 K | 0.6 K | 0.2 K |
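Each example pairs a citation sentence with one of these labels. Schematically, a record looks like this (the field names are an assumption based on the [scicite](https://github.com/allenai/scicite) dataset; check the data files for the exact schema):
```python
# Hypothetical record; see the scicite repo for the exact JSON schema.
example = {
    "string": "We follow the preprocessing protocol of Smith et al. (2015).",
    "label": "method",
}
```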
## Methods (Classification)
### 1) Perceptron Classifier (Baseline Classifier)
We implemented a [Perceptron](https://en.wikipedia.org/wiki/Perceptron) as a baseline classifier, from scratch (including evaluation). The Perceptron is a supervised learning algorithm for classification. It is a linear, binary classifier: it can only decide whether an input belongs to one specific class, and it can only learn linearly separable patterns.
```python
class Perceptron:
    def __init__(self, label: str, weights: dict, theta_bias: float):
        # Store the target class, its feature weights and the bias term
        # (full implementation in classifier/linear_model.py).
        self.label = label
        self.weights = weights
        self.theta_bias = theta_bias
```
Since we have 3 different classes for classification, we create a Perceptron object for each class.
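For illustration, here is a minimal one-vs-rest sketch (the `activation` and `predict` helpers are hypothetical, not the actual API in [linear_model.py](/classifier/linear_model.py)):
```python
# Hypothetical one-vs-rest usage; helper names are assumptions.
LABELS = ["background", "method", "result"]
perceptrons = {lbl: Perceptron(label=lbl, weights={}, theta_bias=0.0) for lbl in LABELS}

def activation(p: Perceptron, features: dict) -> float:
    # Weighted sum of the active features plus the bias term.
    return sum(p.weights.get(f, 0.0) * v for f, v in features.items()) + p.theta_bias

def predict(features: dict) -> str:
    # Pick the label whose perceptron fires most strongly.
    return max(LABELS, key=lambda lbl: activation(perceptrons[lbl], features))
```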
Check the source [code](/classifier/linear_model.py) for more details on the implementation of the Perceptron Classifier.
### Running the Model
> `(citation-env) [user@server citation-analysis]$ python -m testing.model_testing`
[Link](/testing/model_testing.py) to the source code. All the hyperparameters can be modified for experimentation.
### Evaluation
We used the ***f1_score*** metric to evaluate our baseline classifier.
> F1 score is a weighted average (harmonic mean) of Precision and Recall.
> The formula for F1 Score is:
> F1 = 2 * (precision * recall) / (precision + recall)
```python
eval.metrics.f1_score(y_true, y_pred, labels, average)
```
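To make the formula concrete, here is a tiny worked example (the counts are made up for illustration; this is not the implementation in [metrics.py](/eval/metrics.py)):
```python
# Illustrative per-class F1; the counts below are invented.
tp, fp, fn = 40, 10, 20                             # counts for one class
precision = tp / (tp + fp)                          # 0.8
recall = tp / (tp + fn)                             # ~0.667
f1 = 2 * (precision * recall) / (precision + recall)
print(round(f1, 3))                                 # ~0.727
```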
[Link](/eval/metrics.py) to the metrics source code.
### Results
<img src="/plots/perceptron/confusion_matrix_plot.png?raw=true" width="500" height="375" alt="Confusion Matrix Plot" />
### 2) Feedforward Neural Network (using PyTorch)
This is a feed-forward neural network classifier with a single hidden layer of 9 units. A feed-forward network is clearly not the ideal architecture for sequential text data, but it serves as a second baseline for measuring the gains (if any) over a single perceptron. Its input features are the same as the perceptron's; only the final model handles more complex inputs such as word embeddings.
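A minimal PyTorch sketch of such a network (the class name, activation and input dimension are assumptions; see the linked source below for the actual implementation):
```python
import torch
import torch.nn as nn

class FeedForwardClassifier(nn.Module):  # hypothetical name
    """Single hidden layer with 9 units, 3 output classes."""
    def __init__(self, input_dim: int, num_classes: int = 3):
        super().__init__()
        self.hidden = nn.Linear(input_dim, 9)   # 9 hidden units, per the text
        self.output = nn.Linear(9, num_classes)

    def forward(self, x: torch.Tensor) -> torch.Tensor:
        return self.output(torch.relu(self.hidden(x)))  # ReLU is an assumption
```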
Check the feed-forward model source [code](/classifier/linear_model.py) for more details.
### 3) BiLSTM + Attention with ELMo (AllenNLP Model)
The Bi-directional Long Short-Term Memory (BiLSTM) model is built using the [AllenNLP](https://allennlp.org/) library. For word representations, we used 100-dimensional [GloVe](https://nlp.stanford.edu/projects/glove/) vectors trained on a corpus of 6B tokens from Wikipedia. For contextual representations, we used [ELMo](https://allennlp.org/elmo) embeddings, which were trained on a dataset of 5.5B tokens. Unlike the first two models, this model uses the entire input text rather than selected features. It has a single-layer BiLSTM with a hidden dimension of 50 in each direction.
We used AllenNLP's [Config Files](https://guide.allennlp.org/using-config-files) to build our model: we only needed to implement a model and a dataset reader, and wire them together with a config file.
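For orientation, the overall shape of such a config is roughly as follows, sketched here as a Python dict (the actual file is JSON, and every registered name, path and value below is an assumption; see [basic_model.json](configs/basic_model.json?raw=true) for the real settings):
```python
# Rough sketch of the config structure; all names and values are assumptions.
config = {
    "dataset_reader": {"type": "citation_dataset_reader"},
    "train_data_path": "data/train.jsonl",
    "validation_data_path": "data/dev.jsonl",
    "model": {
        "type": "bilstm_classifier",
        "dropout": 0.2,
    },
    "data_loader": {"batch_size": 32},   # "iterator" in older AllenNLP versions
    "trainer": {
        "num_epochs": 10,
        "optimizer": {"type": "adam", "lr": 0.001},
    },
}
```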
Our BiLSTM AllenNLP model contains 4 major components:
1. Dataset Reader - [CitationDatasetReader](/utils/reader.py)
   - It reads the data from the file, tokenizes the input text and creates AllenNLP `Instances`
   - Each `Instance` contains a dictionary of `tokens` and `label`
2. Model - [BiLstmClassifier](/classifier/nn.py)
   - The model's `forward()` method is called for every data instance by passing `tokens` and `label`
   - The signature of `forward()` needs to match the field names of the `Instance` created by the DatasetReader
   - The `forward()` method finally returns an output dictionary with the predicted label, loss, softmax probabilities and so on
3. Config File - [basic_model.json](configs/basic_model.json?raw=true)
   - The AllenNLP configuration file takes the constructor parameters for various objects (Model, DatasetReader, Predictor, ...)
   - We can also define a number of hyperparameters in the config file:
     - Depth and width of the network
     - Number of epochs
     - Optimizer & learning rate
     - Batch size
     - Dropout
     - Embeddings
4. Predictor - [IntentClassificationPredictor](/testing/intent_predictor.py)
   - AllenNLP uses a `Predictor`, a wrapper around a trained model, for making predictions (see the usage sketch after this list)
   - The Predictor uses a pre-trained/saved model and a dataset reader to predict labels for new `Instances`
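As a usage sketch, loading a trained archive and predicting could look like this (the paths and the JSON field name are assumptions, and this assumes the predictor is registered as the model's default, otherwise its name is passed to `Predictor.from_archive`; see [intent_predictor.py](/testing/intent_predictor.py) for the actual predictor):
```python
from allennlp.models.archival import load_archive
from allennlp.predictors.predictor import Predictor

# Paths and the input field name below are assumptions.
archive = load_archive("output/model.tar.gz")
predictor = Predictor.from_archive(archive)
result = predictor.predict_json({"citation_text": "We adopt the setup of ..."})
print(result["prediction"], result["probabilities"])
```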
### Running the Model
TODO
### Evaluation
TODO
### Results
<img src="/plots/bilstm_model/confusion_matrix_plot.png?raw=true" width="500" height = "375" alt = "Confusion Matrix Plot" />
## References
