diff --git a/README.md b/README.md
index ef77880..1652c80 100644
--- a/README.md
+++ b/README.md
@@ -89,14 +89,14 @@ class MultiClassPerceptron:
```
Since we have 3 different classes for classification, we create a Perceptron object for each class. Each Perceptron has a score and an update function. During training, every instance's input features are scored by the Perceptron for each label, and the instance is assigned the label with the maximum score. The assigned label is then compared with the true label to decide whether or not to update the weights (scaled by a learning rate).
-Check the source [code](/classifier/linear_model.py) for more details on the implementation of Perceptron Classifier.
+Check the source [code](classifier/linear_model.py) for more details on the implementation of the Perceptron classifier.
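To make the training procedure concrete, here is a minimal sketch of the one-Perceptron-per-class scheme described above. It is an illustrative simplification, not the project's `linear_model.py`, and it assumes feature vectors are dicts mapping feature names to values:

```python
# Hedged sketch of the multi-class perceptron described above; see
# classifier/linear_model.py for the project's actual implementation.
from collections import defaultdict

class Perceptron:
    def __init__(self, learning_rate=1.0):
        self.weights = defaultdict(float)  # one weight per feature
        self.lr = learning_rate

    def score(self, features):
        # dot product of the feature vector with this class's weights
        return sum(self.weights[f] * v for f, v in features.items())

    def update(self, features, direction):
        # direction is +1 to reinforce a class, -1 to penalize it
        for f, v in features.items():
            self.weights[f] += direction * self.lr * v

def train(instances, labels, epochs=10, learning_rate=1.0):
    # one Perceptron per class
    perceptrons = {label: Perceptron(learning_rate) for label in labels}
    for _ in range(epochs):
        for features, true_label in instances:
            # assign the label whose Perceptron scores highest
            predicted = max(labels, key=lambda lbl: perceptrons[lbl].score(features))
            # update weights only when the prediction is wrong
            if predicted != true_label:
                perceptrons[true_label].update(features, +1)
                perceptrons[predicted].update(features, -1)
    return perceptrons
```

Prediction reuses the same argmax over the per-class scores.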
### Running the Model
```shell
(citation-env) [user@server citation-analysis]$ python3 -m testing.model_testing
```
-[Link](/testing/model_testing.py) to the test source code. All the Hyperparameters can be modified to experiment with.
+[Link](testing/model_testing.py) to the test source code. All of the hyperparameters can be modified for experimentation.

### Evaluation
We used the ***f1_score*** metric to evaluate our baseline classifier.
@@ -114,10 +114,10 @@ eval.metrics.f1_score(y_true, y_pred, labels, average)
**labels**: list of labels/classes
**average**: string, one of [None, 'micro', 'macro']. If None, the scores for each class are returned.
-[Link](/eval/metrics.py) to the metrics source code.
+[Link](eval/metrics.py) to the metrics source code.

### Results
Confusion Matrix Plot

### 2) Feed-forward Neural Network Classifier (Baseline Classifier)
A feed-forward neural network classifier with a single hidden layer containing 9 units. While clearly not the ideal architecture for sequential text data, the feed-forward neural network provides a second baseline. The input to the feed-forward network remained the same as the perceptron's; only the third model is suitable for more complex inputs such as word embeddings.
@@ -133,20 +133,20 @@ class FeedForward(torch.nn.Module):
```
-Check the source [code](/classifier/nn_ff.py) for more details on the implementation of the feed-forward neural network.
+Check the source [code](classifier/nn_ff.py) for more details on the implementation of the feed-forward neural network.

### Running the Model
```shell
(citation-env) [user@server citation-analysis]$ python3 -m testing.ff_model_testing
```
-[Link](/testing/ff_model_testing.py) to the test source code. All the Hyperparameters can be modified to experiment with.
+[Link](testing/ff_model_testing.py) to the test source code. All of the hyperparameters can be modified for experimentation.

### Evaluation
As in the perceptron classifier, we used the ***f1_score*** metric to evaluate this baseline.

### Results
Confusion Matrix Plot

### 3) BiLSTM + Attention with ELMo (AllenNLP Model)
A Bi-directional Long Short-Term Memory (BiLSTM) model built using the [AllenNLP](https://allennlp.org/) library. For word representations, we used 100-dimensional [GloVe](https://nlp.stanford.edu/projects/glove/) vectors trained on a 6B-token corpus of Wikipedia and Gigaword text. For contextual representations, we used [ELMo](https://allennlp.org/elmo) embeddings, which were trained on a dataset of 5.5B tokens. Unlike the first two models, which rely on selected features of the text, this model uses the entire input text. It has a single-layer BiLSTM with a hidden dimension size of 50 for each direction.
@@ -155,10 +155,10 @@ We used AllenNLP's [Config Files](https://guide.allennlp.org/using-config-files)
Our BiLSTM AllenNLP model contains 4 major components:
- 1. Dataset Reader - [CitationDatasetReader](/utils/reader.py)
+ 1. Dataset Reader - [CitationDatasetReader](utils/reader.py)
    - It reads the data file, tokenizes the input text, and creates AllenNLP `Instances`
    - Each `Instance` contains a dictionary of `tokens` and `label`
- 2. Model - [BiLstmClassifier](/calssifier/nn.py)
+ 2. Model - [BiLstmClassifier](classifier/nn.py)
    - The model's `forward()` method is called for every data instance by passing `tokens` and `label`
    - The signature of `forward()` needs to match the field names of the `Instance` created by the DatasetReader
    - This model uses [ELMo](https://allennlp.org/elmo) deep contextualised embeddings.
@@ -173,7 +173,7 @@ Our BiLSTM AllenNLP model contains 4 major components:
    - Dropout
    - Embeddings
    - All the classes that the Config file uses must be registered using Python decorators (for example, `@Model.register('bilstm_classifier')`).
- 4. Predictor - [IntentClassificationPredictor](/classifier/intent_predictor.py)
+ 4. Predictor - [IntentClassificationPredictor](classifier/intent_predictor.py)
    - AllenNLP uses a `Predictor`, a wrapper around the trained model, for making predictions.
    - The Predictor uses a pre-trained/saved model and dataset reader to predict new Instances.
@@ -188,7 +188,7 @@ $ allennlp train \
    --include-package classifier
```
We ran a few experiments on this model; the run configurations, results, and archived models are available in the `SAVED_MODELS_PATH` directory.
-**Note:** If the GPU cores are not available, set the `"cuda_device":` to `-1` in the [config file](/configs/basic_model.json?raw=true), otherwise the available GPU Core.
+**Note:** If no GPU is available, set `"cuda_device"` to `-1` in the [config file](configs/basic_model.json?raw=true); otherwise, set it to the index of an available GPU.

### Evaluation
To evaluate the model, simply run:
@@ -215,10 +215,10 @@ We also have another way to make predictions without using `allennlp predict`
```shell
(citation-env) [user@server citation-analysis]$ python3 -m testing.bilstm_predict
```
-Modify [this](/testing/bilstm_predict.py) source to run predictions on different experiments. It also saves the Confusion Matrix Plot (as shown below) after prediction.
+Modify [this](testing/bilstm_predict.py) source to run predictions on different experiments. It also saves the confusion matrix plot (as shown below) after prediction.
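For reference, programmatic prediction with AllenNLP generally follows the pattern below. This is a minimal sketch, not the project's script: the archive path, the registered predictor name, and the input field are assumptions, so check testing/bilstm_predict.py for the actual details.

```python
# Hedged sketch of programmatic prediction with AllenNLP; the paths and
# registered names below are assumptions, not the project's actual values.
from allennlp.models.archival import load_archive
from allennlp.predictors import Predictor

# Import the project packages so their @register decorators run,
# mirroring the --include-package flags used on the command line.
import classifier
import utils.reader

# Load the archive produced by `allennlp train` (path assumed);
# cuda_device=-1 runs prediction on the CPU.
archive = load_archive("saved_models/model.tar.gz", cuda_device=-1)
predictor = Predictor.from_archive(archive, "citation_intent_predictor")  # name assumed

# The expected input field depends on the DatasetReader/Predictor pair.
result = predictor.predict_json({"text": "We follow the experimental setup of prior work ..."})
print(result)
```

The structure of `result` mirrors the output dictionary returned by the model's `forward()`.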
### Results
Confusion Matrix Plot

## References
[\[1\]](https://github.com/allenai/scicite) SciCite GitHub Repository