diff --git a/README.md b/README.md
index 63832b9..5f8527b 100644
--- a/README.md
+++ b/README.md
@@ -111,7 +111,7 @@ Save the edited file and `source` it
 source set_env.sh
 ```
 
-Run the below line to unset the environment variables
+Run the below line to unset the environment variables (when done with experiments)
 ```shell
 sh unset_env.sh
 ```
@@ -139,7 +139,7 @@ Pass the data split name to `-d` flag. Possible values are: `5-dpd`, `10-dpd`, `
 Example training command: `sh train_baseline.sh -d 50-dpd`
 
 ### Belief State Prediction
-Choose a checkpoint of the saved baseline model to generate belief state predictions.
+Choose a checkpoint of the saved baseline model to generate belief states.
 Set the `MODEL_CHECKPOINT` environment variable with the path to the chosen model checkpoint. It should only contain the path from the "experiment-{datetime}" folder.
 
 ```shell
@@ -172,7 +172,7 @@ python evaluate.py
 | 50-dpd  | 28.64 |
 | 100-dpd | 33.11 |
 | 125-dpd | 35.79 |
-| 250-dpd | 40.38 |
+| 250-dpd | **40.38** |
 
 ## Prompt Learning Experiments
 
@@ -181,8 +181,9 @@ The data for training the prompt learning model is available under [data/prompt-
 `create_dataset.py` ([link](utils/create_dataset.py)) has the scripts for converting/creating the data for training the prompt-based model.
 
 > **Note:**
-> Running `create_dataset.py` can take some time as it needs to download, install and run Stanford CoreNLP `stanza` package.
-> All the data required for training the prompt-based model is available under [data](data) directory of this repo.
+> Running `create_dataset.py` can take some time as it needs to download, install and run the Stanford CoreNLP `stanza` package. This script downloads CoreNLP files of size `~1GB` and requires a significant amount of RAM and processing power to run efficiently.
+>
+> All the data required for training the prompt-based model is already available under the [data](data) directory of this repo.
 
 ### Install the requirements
 After following the environment setup steps in the previous [section](#environment-setup), install the required python modules for prompt model training.
@@ -206,17 +207,17 @@ Example training command: `sh train_baseline.sh -d 50-dpd`
 
 **Some `train_prompting.sh` flags**:
 
-`--num_epochs 10` - Number of epochs
+`--num_epochs` - Number of epochs
 
-`--learning_rate 5e-5` - Initial learning rate for Optimizer
+`--learning_rate` - Initial learning rate for the optimizer
 
-`--with_inverse_prompt` - Use Inverse Prompt while training
+`--with_inverse_prompt` - Use Inverse Prompt while training **(recommended)**
 
-`--inverse_prompt_weight 0.1` - Weight of the inverse prompt for loss function
+`--inverse_prompt_weight` - Weight of the inverse prompt in the loss function
 
 **Note:** The defaults in `train_prompting.sh` are the best performing values.
 
-### Belief State Generations (Prompt Generation)
+### Belief State Generations (Prompt-based slot generation)
 Now, the belief states can be generated by prompting.
 Choose a prompt fine-tuned model from the saved epochs and run the below script to generate belief states.
 Generate belief states by running the below script:
@@ -228,9 +229,9 @@ The argument `-m` takes the relative path of saved model from `SAVED_MODELS_PROM
 
 Example: `sh test_prompting.sh -m 50-dpd/experiment-20221003T172424/epoch-09`
 
-The generated belief states (outputs) are saved under `OUTPUTS_DIR_PROMPT` folder. Some of the best outputs are uploaded to this repository and can be found under [outputs](outputs) folder.
+The generated belief states (outputs) are saved under the `OUTPUTS_DIR_PROMPT` folder. Some of the output files are uploaded to this repository and can be found under the [outputs](outputs/prompt-learning) folder.
 
-### Prompting Evaluation
+### Evaluation of prompt-based generations
 The standard Joint Goal Accuracy (**JGA**) is used to evaluate the belief state predictions.
 In order to exclude the influence of wrongly extracted values, **JGA*** is computed only for values that are extracted correctly at each turn.
 The [evaluate.py](prompt-learning/evaluate.py) file can be used to verify the below JGA scores.
@@ -245,7 +246,7 @@ python evaluate.py -o path/to/outputs/file
 | 5-dpd   | 30.66 | 71.04 |
 | 10-dpd  | 42.65 | 86.43 |
 | 50-dpd  | 47.06 | 91.63 |
-| 100-dpd | 47.74 | 92.31 |
+| 100-dpd | **47.74** | **92.31** |
 | 125-dpd | 46.49 | 91.86 |
 | 250-dpd | 47.06 | 92.08 |
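The last hunk above describes the evaluation metrics only in prose. As a reference, here is a minimal, hypothetical Python sketch of how Joint Goal Accuracy (JGA) and the restricted JGA* described there could be computed. It assumes per-turn belief states represented as slot-value dicts and a per-turn flag marking whether value extraction succeeded; it is not the repository's `evaluate.py` implementation, and the JGA* function reflects just one plausible reading of the description.

```python
# Hypothetical sketch of the JGA / JGA* idea -- NOT the repo's evaluate.py.
# Assumes each turn's belief state is a dict of "domain-slot" -> value, and
# extracted_ok[i] flags whether turn i's values were extracted correctly.

def joint_goal_accuracy(predictions, references):
    """Fraction of turns whose predicted belief state matches the reference exactly."""
    if not references:
        return 0.0
    correct = sum(pred == ref for pred, ref in zip(predictions, references))
    return correct / len(references)

def jga_star(predictions, references, extracted_ok):
    """JGA restricted to turns where value extraction succeeded (one reading of JGA*)."""
    kept = [(p, r) for p, r, ok in zip(predictions, references, extracted_ok) if ok]
    if not kept:
        return 0.0
    return sum(p == r for p, r in kept) / len(kept)

if __name__ == "__main__":
    preds = [{"hotel-area": "north"}, {"hotel-area": "north", "hotel-stars": "4"}]
    golds = [{"hotel-area": "north"}, {"hotel-area": "east", "hotel-stars": "4"}]
    print(joint_goal_accuracy(preds, golds))        # 0.5
    print(jga_star(preds, golds, [True, False]))    # 1.0
```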