- BiLSTM + Attention with ELMo Embeddings (using [AllenNLP](https://allennlp.org/) library)
This README documentation focuses on running the code base, training the models and predictions. For more information about our project work, model results and detailed error analysis, check [this](https://www.overleaf.com/project/5f1b0e8a6d0fb80001ceb5eb) report. Slides from the mid-term presentation are available [here](/presentation.pdf).<br/>
For more information on Citation Intent Classification in Scientific Publications, follow this [link](https://arxiv.org/pdf/1904.01608.pdf) to the original published paper and their [GitHub repo](https://github.com/allenai/scicite).
## Environment & Setup
It's recommended to use **Python 3.5 or greater**. The steps below install `virtualenv` and create a virtual environment to run this project.
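You can check which interpreter version is installed with:
```shell
python3 --version
```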
#### Installing virtualenv
```shell
python3 -m pip install --user virtualenv
```
#### Creating a virtual environment
**venv** (for Python 3) allows us to manage separate package installations for different projects.
```shell
python3 -m venv citation-env
```
#### Activating the virtual environment
Before we start installing or using packages in the virtual environment we need to _activate_ it.
```shell
source citation-env/bin/activate
```
#### Leaving the virtual environment
To leave the virtual environment, simply run:
```shell
deactivate
```
After activating the virtual environment, the shell prompt should look like this:
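```shell
(citation-env) $
```
The environment name prefixed to the prompt indicates that the virtual environment is active.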
[Link](/eval/metrics.py) to the metrics source code.
### Results
<img src="/plots/perceptron/confusion_matrix_plot.png?raw=true" width="500" height="375" alt="Confusion Matrix Plot"/>
### 2) Feedforward Neural Network (using PyTorch)
A feed-forward neural network classifier with a single hidden layer containing 9 units. While a feed-forward network is clearly not the ideal architecture for sequential text data, it serves as a second baseline for measuring the gains (if any) over a single perceptron. Its input features remained the same as the perceptron's; only the final model takes more complex inputs such as word embeddings.
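A minimal PyTorch sketch of this architecture (the dimensions are placeholder assumptions, not the repo's actual values):
```python
import torch.nn as nn

# Illustrative sketch only -- the real implementation lives in
# classifier/linear_model.py. INPUT_DIM and NUM_CLASSES are assumed values.
INPUT_DIM = 300
NUM_CLASSES = 3  # SciCite labels: background, method, result comparison

model = nn.Sequential(
    nn.Linear(INPUT_DIM, 9),    # single hidden layer with 9 units
    nn.ReLU(),                  # assumed non-linearity
    nn.Linear(9, NUM_CLASSES),  # raw class scores (logits)
)
```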
Check this feed-forward model source [code](/classifier/linear_model.py) for more details.
### 3) BiLSTM + Attention with ELMo (AllenNLP Model)
The Bi-directional Long Short-Term Memory (BiLSTM) model was built using the [AllenNLP](https://allennlp.org/) library. For word representations, we used 100-dimensional [GloVe](https://nlp.stanford.edu/projects/glove/) vectors trained on a corpus of 6B tokens from Wikipedia. For contextual representations, we used [ELMo](https://allennlp.org/elmo) embeddings, which were trained on a dataset of 5.5B tokens. Unlike the first two models, which rely on selected features from the text, this model consumes the entire input text. It has a single-layer BiLSTM with a hidden dimension of 50 for each direction.
We used AllenNLP's [config files](https://guide.allennlp.org/using-config-files) to build our model: we only needed to implement a model and a dataset reader, then describe the experiment in a JSON config file.
Our BiLSTM AllenNLP model contains 4 major components:
- The `forward()` method finally returns an output dictionary with the predicted label, loss, softmax probabilities, and so on.
- AllenNLP uses `Predictor`, a wrapper around the trained model, for making predictions.
- The `Predictor` uses a pre-trained/saved model and dataset reader to predict new Instances (see the sketch below).
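As a rough illustration, a saved archive can be loaded and queried from Python like this (the archive path, predictor name, and input key are assumptions, not values taken from this repo):
```python
from allennlp.common.util import import_module_and_submodules
from allennlp.predictors.predictor import Predictor

# Register our custom model and dataset reader, mirroring the CLI flag
# --include-package classifier (older AllenNLP versions expose this as
# allennlp.common.util.import_submodules).
import_module_and_submodules("classifier")

predictor = Predictor.from_path(
    "saved_models/experiment_10/model.tar.gz",  # hypothetical archive path
    predictor_name="text_classifier",           # assumed registered name
)

# The expected input key depends on the dataset reader; "sentence" is the
# convention used by AllenNLP's built-in text classifier predictor.
result = predictor.predict_json({"sentence": "We adopt the approach of Cohan et al. (2019)."})
print(result["label"])  # predicted citation intent
```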
### Running the Model
AllenNLP provides `train`, `evaluate` and `predict` commands to interact with the models from the command line.
#### Training
```shell
$ allennlp train \
configs/basic_model.json \
-s $SAVED_MODELS_PATH/experiment_10 \
--include-package classifier
```
We ran a few experiments with this model; the run configurations, results, and archived models are available in the `SAVED_MODELS_PATH` directory. <br/>
**Note:** If no GPU is available, set `"cuda_device"` to `-1` in the [config file](/configs/basic_model.json?raw=true); otherwise set it to the index of the available GPU core.
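#### Evaluation & Prediction
Evaluating and predicting follow the same pattern as training. The data paths below are placeholders, and `citation_predictor` is an assumed name; use the predictor registered in this repo.
```shell
# Evaluate the archived model on a held-out set
$ allennlp evaluate \
    $SAVED_MODELS_PATH/experiment_10/model.tar.gz \
    data/test.jsonl \
    --include-package classifier

# Predict intents for new, unlabeled instances
$ allennlp predict \
    $SAVED_MODELS_PATH/experiment_10/model.tar.gz \
    data/new_citations.jsonl \
    --include-package classifier \
    --predictor citation_predictor
```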