Since we have 3 classes for classification, we create one Perceptron object per class. Each Perceptron has `score` and `update` functions. During training, for each data instance's feature set, the classifier takes a score from the Perceptron of every label and assigns the label with the maximum score. It then compares the assigned label with the true label and decides whether or not to update the weights (scaled by a learning rate).
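A minimal sketch of this one-vs-rest scheme (the actual `Perceptron` class in `classifier/linear_model.py` may differ in details such as feature representation and bias handling):

```
class Perceptron:
    def __init__(self, n_features, lr=0.1):
        self.w = [0.0] * n_features
        self.lr = lr

    def score(self, features):
        # Dot product of the weight vector with the feature vector.
        return sum(w * x for w, x in zip(self.w, features))

    def update(self, features, direction):
        # direction is +1 for the true label's perceptron, -1 for the wrongly predicted one.
        self.w = [w + self.lr * direction * x for w, x in zip(self.w, features)]


def train_step(perceptrons, features, true_label):
    # Assign the label whose perceptron gives the maximum score.
    scores = {label: p.score(features) for label, p in perceptrons.items()}
    predicted = max(scores, key=scores.get)
    # Update weights only on a mistake.
    if predicted != true_label:
        perceptrons[true_label].update(features, +1)
        perceptrons[predicted].update(features, -1)
    return predicted
```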
Check the source [code](/classifier/linear_model.py) for more details on the implementation of the Perceptron classifier.
The second model is a feed-forward neural network classifier with a single hidden layer of 9 units. While clearly not the ideal architecture for sequential text data, it provides a second baseline. Its input is the same as the perceptron's; only the third model is suited to richer inputs such as word embeddings.
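A minimal sketch of such a module (the actual `FeedForward` class in `classifier/nn_ff.py` may differ in its activation, dropout, and how the input features are encoded):

```
import torch


class FeedForward(torch.nn.Module):
    def __init__(self, input_dim, hidden_dim=9, num_classes=3):
        super().__init__()
        self.hidden = torch.nn.Linear(input_dim, hidden_dim)     # single hidden layer with 9 units
        self.activation = torch.nn.ReLU()
        self.output = torch.nn.Linear(hidden_dim, num_classes)   # one logit per class

    def forward(self, x):
        # x: (batch_size, input_dim) tensor of the same features used by the perceptron
        return self.output(self.activation(self.hidden(x)))
```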
Check the source [code](/classifier/nn_ff.py) for more details on the implementation of the feed-forward neural network.
### 3) BiLSTM + Attention with ELMo (AllenNLP Model)
The third model is a Bi-directional Long Short-Term Memory (BiLSTM) network built with the [AllenNLP](https://allennlp.org/) library. For word representations, we used 100-dimensional [GloVe](https://nlp.stanford.edu/projects/glove/) vectors trained on a corpus of 6B tokens from Wikipedia. For contextual representations, we used [ELMo](https://allennlp.org/elmo) embeddings trained on a dataset of 5.5B tokens. Unlike the first two models, which rely on selected features, this model consumes the entire input text. It has a single-layer BiLSTM with a hidden dimension of 50 for each direction.
We used AllenNLP's [Config Files](https://guide.allennlp.org/using-config-files) to define the model and training setup.
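For illustration, the embedder and encoder portion of such a config could look roughly like the following; the keys shown are standard AllenNLP ones, while the file paths and the model's registered `type` are placeholders, and the actual [config file](/configs/basic_model.json?raw=true) may differ:

```
"model": {
    "type": "OUR_CUSTOM_MODEL",
    "text_field_embedder": {
        "token_embedders": {
            "tokens": {
                "type": "embedding",
                "embedding_dim": 100,
                "pretrained_file": "PATH_TO_GLOVE/glove.6B.100d.txt",
                "trainable": false
            },
            "elmo": {
                "type": "elmo_token_embedder",
                "options_file": "PATH_TO_ELMO_OPTIONS.json",
                "weight_file": "PATH_TO_ELMO_WEIGHTS.hdf5"
            }
        }
    },
    "encoder": {
        "type": "lstm",
        "input_size": 1124,
        "hidden_size": 50,
        "bidirectional": true
    }
}
```

The encoder input size of 1124 assumes the 1024-dimensional ELMo output is concatenated with the 100-dimensional GloVe vectors.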
Our BiLSTM AllenNLP model contains 4 major components:
- AllenNLP uses `Predictor`, a wrapper around the trained model, for making predictions.
- The Predictor uses a pre-trained/saved model and a dataset reader to predict new Instances; a usage sketch follows below.
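A rough example of loading a trained archive through a `Predictor` (the archive path, input key, and output keys are assumptions; check `classifier/` and [testing/bilstm_predict.py](/testing/bilstm_predict.py) for the real ones):

```
from allennlp.predictors.predictor import Predictor

import classifier  # noqa: F401  -- importing the package registers the custom model/dataset reader

# Load the archive produced by `allennlp train`.
predictor = Predictor.from_path("SAVED_MODELS_PATH/model.tar.gz")

# The JSON key(s) depend on the project's DatasetReader/Predictor.
result = predictor.predict_json({"text": "an example input sentence"})
print(result)  # output keys depend on the model's forward/decode methods
```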
To train the model, run:

```
$ allennlp train \
    configs/basic_model.json \
    -s SAVED_MODELS_PATH \
    --include-package classifier
```
We ran a few experiments on this model; the run configurations, results, and archived models are available in the `SAVED_MODELS_PATH` directory. <br/>
**Note:** If no GPU is available, set `"cuda_device"` to `-1` in the [config file](/configs/basic_model.json?raw=true); otherwise, set it to the index of an available GPU core.
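In an AllenNLP config this setting typically sits inside the `trainer` block, for example:

```
"trainer": {
    "cuda_device": -1
}
```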
### Evaluation
To evaluate the model, simply run:
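A sketch of the command, with placeholder paths for the archived model and the test data:

```
$ allennlp evaluate \
    SAVED_MODELS_PATH/model.tar.gz \
    PATH_TO_TEST_DATA \
    --include-package classifier
```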
We also have another way to make predictions without using `allennlp predict`.
Modify [this](/testing/bilstm_predict.py) script to run predictions for different experiments. It also saves the confusion matrix plot (as shown below) after prediction.
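The plotting step can be done with scikit-learn and matplotlib along these lines (a sketch, not the exact code in `bilstm_predict.py`):

```
import matplotlib.pyplot as plt
from sklearn.metrics import ConfusionMatrixDisplay, confusion_matrix


def save_confusion_matrix(y_true, y_pred, labels, path="confusion_matrix.png"):
    # Build the confusion matrix from gold and predicted labels.
    cm = confusion_matrix(y_true, y_pred, labels=labels)
    display = ConfusionMatrixDisplay(confusion_matrix=cm, display_labels=labels)
    display.plot(cmap="Blues")
    plt.savefig(path, bbox_inches="tight")
    plt.close()
```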