Compare commits

2 Commits

| Author | SHA1 | Message | Date |
| --- | --- | --- | --- |
| Pavan Mandava | 7cb854863e | Fixed paths in README | 3 years ago |
| yelircaasi | 3d6f24bf5b | Update README.md | 5 years ago |

@ -89,14 +89,14 @@ class MultiClassPerceptron:
``` ```
Since we have 3 different classes for Classification, we create a Perceptron object for each class. Each Perceptron has score and update functions. During training, for a set of input features it takes the score from the Perceptron for each label and assigns the label with max score(for all the data instances). It compares the assigned label with the true label and decides whether or not to update the weights (with some learning rate). Since we have 3 different classes for Classification, we create a Perceptron object for each class. Each Perceptron has score and update functions. During training, for a set of input features it takes the score from the Perceptron for each label and assigns the label with max score(for all the data instances). It compares the assigned label with the true label and decides whether or not to update the weights (with some learning rate).
Check the source [code](/classifier/linear_model.py) for more details on the implementation of Perceptron Classifier. Check the source [code](classifier/linear_model.py) for more details on the implementation of Perceptron Classifier.
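The training procedure described above (one Perceptron per class, predict the label with the highest score, update only on a mistake) can be sketched roughly as follows. This is a minimal illustration, not the code in `classifier/linear_model.py`: the class name `MultiClassPerceptronSketch`, the feature representation (a list of active feature names), the label strings, and the exact update rule (reward the true class, penalise the wrongly predicted one) are all assumptions.

```python
class Perceptron:
    """One scorer per class; weights are keyed by feature name (assumed representation)."""

    def __init__(self, label):
        self.label = label
        self.weights = {}

    def score(self, features):
        # Sum the weights of the features that are active in this instance.
        return sum(self.weights.get(f, 0.0) for f in features)

    def update(self, features, direction, learning_rate):
        # direction is +1 to reward these features, -1 to penalise them.
        for f in features:
            self.weights[f] = self.weights.get(f, 0.0) + direction * learning_rate


class MultiClassPerceptronSketch:
    """Hypothetical wrapper holding one Perceptron per label."""

    def __init__(self, labels, learning_rate=1.0):
        self.learning_rate = learning_rate
        self.perceptrons = {label: Perceptron(label) for label in labels}

    def predict(self, features):
        # Assign the label whose Perceptron gives the highest score.
        return max(self.perceptrons,
                   key=lambda label: self.perceptrons[label].score(features))

    def fit(self, instances, epochs=5):
        for _ in range(epochs):
            for features, true_label in instances:
                predicted = self.predict(features)
                if predicted != true_label:
                    # Update only when the assigned label is wrong.
                    self.perceptrons[true_label].update(features, +1, self.learning_rate)
                    self.perceptrons[predicted].update(features, -1, self.learning_rate)
        return self


# Toy data with made-up features; the three label strings are illustrative.
toy_data = [
    (["use", "method"], "method"),
    (["results", "show"], "result"),
    (["previous", "work"], "background"),
]
toy_model = MultiClassPerceptronSketch(["background", "method", "result"]).fit(toy_data)
```

Penalising the wrongly predicted class as well as rewarding the true one is a common choice for this kind of model; the README only states that the weights are updated "with some learning rate", so the actual rule may differ.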
### Running the Model ### Running the Model
```shell ```shell
(citation-env) [user@server citation-analysis]$ python3 -m testing.model_testing (citation-env) [user@server citation-analysis]$ python3 -m testing.model_testing
``` ```
[Link](/testing/model_testing.py) to the test source code. All the Hyperparameters can be modified to experiment with. [Link](testing/model_testing.py) to the test source code. All the Hyperparameters can be modified to experiment with.
### Evaluation ### Evaluation
we used ***f1_score*** metric for evaluation of our baseline classifier. we used ***f1_score*** metric for evaluation of our baseline classifier.
@ -114,10 +114,10 @@ eval.metrics.f1_score(y_true, y_pred, labels, average)
**labels** : list of labels/classes **labels** : list of labels/classes
**average**: string - [None, 'micro', 'macro'] If None, the scores for each class are returned. **average**: string - [None, 'micro', 'macro'] If None, the scores for each class are returned.
[Link](/eval/metrics.py) to the metrics source code. [Link](eval/metrics.py) to the metrics source code.
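For readers who want to see what the `average` options mean in practice, here is a small sketch of an F1 computation with the same parameter names as the signature above. It is not the code in `eval/metrics.py`; the function name `f1_score_sketch` and the return type for `average=None` (a dict from label to score) are assumptions.

```python
def f1_score_sketch(y_true, y_pred, labels, average=None):
    """Illustrative F1 computation; not the project's implementation."""

    def f1(tp, fp, fn):
        # F1 = 2*TP / (2*TP + FP + FN); defined as 0.0 when the denominator is 0.
        denom = 2 * tp + fp + fn
        return 2 * tp / denom if denom else 0.0

    pairs = list(zip(y_true, y_pred))
    counts = {}
    for label in labels:
        tp = sum(1 for t, p in pairs if t == label and p == label)
        fp = sum(1 for t, p in pairs if t != label and p == label)
        fn = sum(1 for t, p in pairs if t == label and p != label)
        counts[label] = (tp, fp, fn)

    if average is None:
        # One score per class.
        return {label: f1(*c) for label, c in counts.items()}
    if average == "macro":
        # Unweighted mean of the per-class scores.
        return sum(f1(*c) for c in counts.values()) / len(labels)
    if average == "micro":
        # Pool the counts over all classes, then compute a single score.
        tp, fp, fn = (sum(c[i] for c in counts.values()) for i in range(3))
        return f1(tp, fp, fn)
    raise ValueError("average must be None, 'micro' or 'macro'")
```

Macro averaging weights every class equally, so it is more sensitive to poor performance on rare classes than micro averaging, which pools all decisions.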
### Results ### Results
<img src="/plots/perceptron/confusion_matrix_plot.png?raw=true" width="600" height = "450" alt = "Confusion Matrix Plot" /> <img src="plots/perceptron/confusion_matrix_plot.png" width="600" height = "450" alt = "Confusion Matrix Plot" />
### 2) Feed-forward Neural Network Classifier (Baseline Classifier) ### 2) Feed-forward Neural Network Classifier (Baseline Classifier)
A feed-forward neural network classifier with a single hidden layer containing 9 units. While clearly not the ideal architecture for sequential text data, the feed-forward neural network provides a second baseline. The input to the feedforward network remained the same as the perceptron; only the third model is suitable for more complex inputs such as word embeddings. A feed-forward neural network classifier with a single hidden layer containing 9 units. While clearly not the ideal architecture for sequential text data, the feed-forward neural network provides a second baseline. The input to the feedforward network remained the same as the perceptron; only the third model is suitable for more complex inputs such as word embeddings.
@ -133,20 +133,20 @@ class FeedForward(torch.nn.Module):
``` ```
Check the source [code](/classifier/nn_ff.py) for more details on the implementation of the feed-forward neural network. Check the source [code](classifier/nn_ff.py) for more details on the implementation of the feed-forward neural network.
### Running the Model ### Running the Model
```shell ```shell
(citation-env) [user@server citation-analysis]$ python3 -m testing.ff_model_testing (citation-env) [user@server citation-analysis]$ python3 -m testing.ff_model_testing
``` ```
[Link](/testing/ff_model_testing.py) to the test source code. All the Hyperparameters can be modified to experiment with. [Link](testing/ff_model_testing.py) to the test source code. All the Hyperparameters can be modified to experiment with.
### Evaluation ### Evaluation
As in theperceptron, we used ***f1_score*** metric for evaluation of our baseline classifier. As in the perceptron classifier, we used ***f1_score*** metric for evaluation of our baseline classifier.
### Results ### Results
<img src="/plots/perceptron/confusion_matrix_plot_ff.png?raw=true" width="600" height = "450" alt = "Confusion Matrix Plot" /> <img src="plots/ffnn_model/confusion_matrix_plot_ff.png" width="600" height = "450" alt = "Confusion Matrix Plot" />
### 3) BiLSTM + Attention with ELMo (AllenNLP Model) ### 3) BiLSTM + Attention with ELMo (AllenNLP Model)
The Bi-directional Long Short Term Memory (BiLSTM) model built using the [AllenNLP](https://allennlp.org/) library. For word representations, we used 100-dimensional [GloVe](https://nlp.stanford.edu/projects/glove/) vectors trained on a corpus of 6B tokens from Wikipedia. For contextual representations, we used [ELMo](https://allennlp.org/elmo) Embeddings which have been trained on a dataset of 5.5B tokens. This model uses the entire input text, as opposed to selected features in the text, as in the first two models. It has a single-layer BiLSTM with a hidden dimension size of 50 for each direction. The Bi-directional Long Short Term Memory (BiLSTM) model built using the [AllenNLP](https://allennlp.org/) library. For word representations, we used 100-dimensional [GloVe](https://nlp.stanford.edu/projects/glove/) vectors trained on a corpus of 6B tokens from Wikipedia. For contextual representations, we used [ELMo](https://allennlp.org/elmo) Embeddings which have been trained on a dataset of 5.5B tokens. This model uses the entire input text, as opposed to selected features in the text, as in the first two models. It has a single-layer BiLSTM with a hidden dimension size of 50 for each direction.
@ -155,10 +155,10 @@ We used AllenNLP's [Config Files](https://guide.allennlp.org/using-config-files)
Our BiLSTM AllenNLP model contains 4 major components: Our BiLSTM AllenNLP model contains 4 major components:
1. Dataset Reader - [CitationDatasetReader](/utils/reader.py) 1. Dataset Reader - [CitationDatasetReader](utils/reader.py)
- It reads the data from the file, tokenizes the input text and creates AllenNLP `Instances` - It reads the data from the file, tokenizes the input text and creates AllenNLP `Instances`
- Each `Instance` contains a dictionary of `tokens` and `label` - Each `Instance` contains a dictionary of `tokens` and `label`
2. Model - [BiLstmClassifier](/calssifier/nn.py) 2. Model - [BiLstmClassifier](calssifier/nn.py)
- The model's `forward()` method is called for every data instance by passing `tokens` and `label` - The model's `forward()` method is called for every data instance by passing `tokens` and `label`
- The signature of `forward()` needs to match with field names of the `Instance` created by the DatasetReader - The signature of `forward()` needs to match with field names of the `Instance` created by the DatasetReader
- This Model uses [ELMo](https://allennlp.org/elmo) deep contextualised embeddings. - This Model uses [ELMo](https://allennlp.org/elmo) deep contextualised embeddings.
@ -173,7 +173,7 @@ Our BiLSTM AllenNLP model contains 4 major components:
- Dropout - Dropout
- Embeddings - Embeddings
- All the classes that the Config file uses must register using Python decorators (for example, `@Model.register('bilstm_classifier'`). - All the classes that the Config file uses must register using Python decorators (for example, `@Model.register('bilstm_classifier'`).
4. Predictor - [IntentClassificationPredictor](/classifier/intent_predictor.py) 4. Predictor - [IntentClassificationPredictor](classifier/intent_predictor.py)
- AllenNLP uses `Predictor`, a wrapper around the trained model, for making predictions. - AllenNLP uses `Predictor`, a wrapper around the trained model, for making predictions.
- The Predictor uses a pre-trained/saved model and dataset reader to predict new Instances - The Predictor uses a pre-trained/saved model and dataset reader to predict new Instances
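Two of the points in the list above, registering classes by name so a config file can refer to them and keeping the `forward()` signature in sync with the `Instance` field names, can be illustrated without AllenNLP. The sketch below is a plain-Python stand-in written for this explanation: `register`, `build`, `call_forward`, and `ToyClassifier` are hypothetical names and are not part of the AllenNLP API or of this repository.

```python
import inspect

# Hypothetical name -> class registry, standing in for a decorator-based registration mechanism.
_REGISTRY = {}


def register(name):
    def decorator(cls):
        _REGISTRY[name] = cls
        return cls
    return decorator


@register("bilstm_classifier")
class ToyClassifier:
    def forward(self, tokens, label=None):
        # A real model would return logits and a loss; this only echoes its inputs.
        return {"n_tokens": len(tokens), "label": label}


def build(config):
    # The config refers to the class by its registered name.
    return _REGISTRY[config["type"]]()


def call_forward(model, instance):
    # The instance's field names must match forward()'s parameter names.
    expected = set(inspect.signature(model.forward).parameters)
    unknown = set(instance) - expected
    if unknown:
        raise TypeError(f"fields not accepted by forward(): {sorted(unknown)}")
    return model.forward(**instance)


toy = build({"type": "bilstm_classifier"})
output = call_forward(toy, {"tokens": ["we", "follow", "their", "method"], "label": "method"})
```

If the reader produced a field called `text` instead of `tokens`, `call_forward` would raise a `TypeError`, which is the kind of mismatch the list item about `forward()` warns against.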
@ -188,7 +188,7 @@ $ allennlp train \
--include-package classifier --include-package classifier
``` ```
We ran a few experiments on this model, the run configurations, results and archived models are available in the `SAVED_MODELS_PATH` directory. <br /> We ran a few experiments on this model, the run configurations, results and archived models are available in the `SAVED_MODELS_PATH` directory. <br />
**Note:** If the GPU cores are not available, set the `"cuda_device":` to `-1` in the [config file](/configs/basic_model.json?raw=true), otherwise the available GPU Core. **Note:** If the GPU cores are not available, set the `"cuda_device":` to `-1` in the [config file](configs/basic_model.json?raw=true), otherwise the available GPU Core.
### Evaluation ### Evaluation
To evaluate the model, simply run: To evaluate the model, simply run:
@ -215,10 +215,10 @@ We also have an another way to make predictions without using `allennlp predict`
```shell ```shell
(citation-env) [user@server citation-analysis]$ python3 -m testing.bilstm_predict (citation-env) [user@server citation-analysis]$ python3 -m testing.bilstm_predict
``` ```
Modify [this](/testing/bilstm_predict.py) source to run predictions on different experiments. It also saves the Confusion Matrix Plot (as shown below) after prediction. Modify [this](testing/bilstm_predict.py) source to run predictions on different experiments. It also saves the Confusion Matrix Plot (as shown below) after prediction.
### Results ### Results
<img src="/plots/bilstm_model/confusion_matrix_plot.png?raw=true" width="600" height = "450" alt = "Confusion Matrix Plot" /> <img src="plots/bilstm_model/confusion_matrix_plot.png" width="600" height = "450" alt = "Confusion Matrix Plot" />
## References ## References
[\[1\]](https://github.com/allenai/scicite) SciCite GitHub Repository<br /> [\[1\]](https://github.com/allenai/scicite) SciCite GitHub Repository<br />
