diff --git a/README.md b/README.md index 3bdfda6..51a774f 100644 --- a/README.md +++ b/README.md @@ -1,154 +1,244 @@ -# Prompt-based methods for Dialog State Tracking + +# Prompt-based methods for Dialog State Tracking + +Repository for my master's thesis at the University of Stuttgart (IMS). + +Refer to the thesis [proposal](proposal/proposal_submission_1st.pdf) document for a detailed explanation of the thesis experiments. + +## Dataset +The MultiWOZ 2.1 [dataset](https://github.com/budzianowski/multiwoz/blob/master/data/MultiWOZ_2.1.zip) is used for training and evaluation of the baseline/prompt-based methods. MultiWOZ is a fully-labeled dataset with a collection of human-human written conversations spanning multiple domains and topics. Only single-domain dialogues are used in this setup for training and testing. Each dialogue contains multiple turns and may also contain a subdomain *booking*. Five domains (*Hotel, Train, Restaurant, Attraction, Taxi*) are used in the experiments; the other two domains are excluded as they only appear in the training set. Under few-shot settings, only a portion of the training data is utilized to measure the performance of the DST task in a low-resource scenario. Dialogues are randomly picked for each domain. The table below contains some statistics of the dataset and data splits for the few-shot experiments. -Repository for my master thesis at the University of Stuttgart (IMS). +| Data Split | # Dialogues | # Total Turns | +|--|:--:|:--:| +| 5-dpd | 25 | 100 | +| 10-dpd | 50 | 234 | +| 50-dpd | 250 | 1114 | +| 100-dpd | 500 | 2292 | +| 125-dpd | 625 | 2831 | +| 250-dpd | 1125 | 5187 | +| valid | 190 | 900 | +| test | 193 | 894 | -Refer to this thesis [proposal](proposal/proposal_submission_1st.pdf) document for detailed explanation about thesis experiments. 
- -## Dataset -MultiWOZ 2.1 [dataset](https://github.com/budzianowski/multiwoz/blob/master/data/MultiWOZ_2.1.zip) is used for training and evaluation of the baseline/prompt-based methods. MultiWOZ is a fully-labeled dataset with a collection of human-human written conversations spanning over multiple domains and topics. Only single-domain dialogues are used in this setup for training and testing. Each dialogue contains multiple turns and may also contain a subdomain *booking*. Five domains - *Hotel, Train, Restaurant, Attraction, Taxi* are used in the experiments and excluded the other two domains as they only appear in the training set. Under few-shot settings, only a portion of the training data is utilized to measure the performance of the DST task in a low-resource scenario. Dialogues are randomly picked for each domain. The below table contains some statistics of the dataset and data splits for the few-shot experiments. - -| Data Split | # Dialogues | # Total Turns | -|--|:--:|:--:| -| 5-dpd | 25 | 100 | -| 10-dpd | 50 | 234 | -| 50-dpd | 250 | 1114 | -| 100-dpd | 500 | 2292 | -| 125-dpd | 625 | 2831 | -| 250-dpd | 1125 | 5187 | -| valid | 190 | 900 | -| test | 193 | 894 | - -In the above table, term "*dpd*" refers to "*dialogues per domain*". For example, *50-dpd* means *50 dialogues per each domain*. - -All the training and testing data can be found under [/data/baseline/](data/baseline) folder. - -## Environment Setup -Python 3.6 is required for training the baseline model. `conda` is used for creating environments. - -### Create conda environment (for baseline model) -Create an environment for baseline training with a specific python version (Python 3.6). 
-```shell -conda create -n python=3.6 -``` -### Create conda environment (for prompt learning) -Create an environment for prompt-based methods -```shell -# TODO -``` - -#### Activate the conda environment -To activate the conda environment, run: -```shell +In the above table, the term "*dpd*" refers to "*dialogues per domain*". For example, *50-dpd* means *50 dialogues per domain*. + +All the training and testing data can be found under the [/data/](data/) folder. + +## Environment Setup +Python 3.6 is required for training the baseline model. Python 3.10 is required for training the prompt-based model. `conda` is used for creating the environments. + +### Create conda environment (for baseline model) +Create an environment for baseline training with a specific Python version (Python 3.6 is **required**). +```shell +conda create -n python=3.6 +``` +### Create conda environment (for prompt learning) +Create an environment for prompt-based methods (Python 3.10 is **required**). +```shell +conda create -n python=3.10 +``` + +#### Activate the conda environment +To activate the conda environment, run: +```shell conda activate -``` - -#### Deactivating the conda evironment -To deactivate the conda environment, run: (Only after running all the experiments) -```shell +``` + +#### Deactivating the conda environment +To deactivate the conda environment (only after running all the experiments), run: +```shell conda deactivate -``` -#### Download and extract SOLOIST pre-trained model -Download and unzip the pretrained model, this is used for finetuning the baseline and prompt-based methods. For more details about the pre-trained SOLOIST model, refer to the GitHub [repo](https://github.com/pengbaolin/soloist). - -Download the zip file, replace the `/path/to/folder` from the below command to a folder of your choice. 
-```shell -wget https://bapengstorage.blob.core.windows.net/soloist/gtg_pretrained.tar.gz \ - -P /path/to/folder/ -``` - -Extract the downloaded pretrained model zip file. -```shell +``` +#### Download and extract SOLOIST pre-trained model +Download and extract the pre-trained model, which is used for fine-tuning the baseline and prompt-based methods. For more details about the pre-trained SOLOIST model, refer to the GitHub [repo](https://github.com/pengbaolin/soloist). + +Download the archive, replacing `/path/to/folder` in the command below with a folder of your choice. +```shell +wget https://bapengstorage.blob.core.windows.net/soloist/gtg_pretrained.tar.gz \ -P /path/to/folder/ +``` + +Extract the downloaded pre-trained model archive. +```shell tar -xvf /path/to/folder/gtg_pretrained.tar.gz -``` - -#### Clone the repository -Clone the repository for source code -```shell +``` + +#### Clone the repository +Clone the repository for the source code +```shell git clone https://git.pavanmandava.com/pavan/master-thesis.git -``` -Pull the changes from remote (if local is behind the remote) -```shell +``` +Pull the changes from remote (if local is behind the remote) +```shell git pull -``` -Change directory -```shell +``` +Change directory +```shell cd master-thesis -``` - -#### Set Environment variables -Next step is to set environment variables that contains path to pre-trained model, saved models and output dirs. - -Edit the [set_env.sh](set_env.sh) file and set the paths for: (`nano` or `vim` can be used) - -`PRE_TRAINED_SOLOIST` - Path to the extracted pre-trained SOLOIST model - -`SAVED_MODELS_BASELINE` - Path for saving the trained models at checkpoints +``` + +#### Set Environment variables +The next step is to set the environment variables that contain the paths to the pre-trained model, saved models and output dirs. 
+ +Edit the [set_env.sh](set_env.sh) file and set the paths (as required) for the following: + +`PRE_TRAINED_SOLOIST` - Path to the extracted pre-trained SOLOIST model + +`SAVED_MODELS_BASELINE` - Path for saving the trained baseline models (fine-tuning) at checkpoints + +`OUTPUTS_DIR_BASELINE` - Path for storing the baseline model outputs (belief state predictions) -`OUTPUTS_DIR_BASELINE` - Path for storing the outputs of belief state predictions. +`SAVED_MODELS_PROMPT` - Path for saving the trained prompt-based models (after each epoch) -```shell +`OUTPUTS_DIR_PROMPT` - Path for storing the prompt model outputs (generations) + +```shell nano set_env.sh -``` -Save the edited file and `source` it -```shell +``` + +Save the edited file and `source` it +```shell source set_env.sh ``` -Run the below line to unset the environment variables -```shell + +Run the below line to unset the environment variables +```shell sh unset_env.sh -``` +``` + +## Baseline Experiments +SOLOIST ([Peng et al., 2021](https://arxiv.org/abs/2005.05298)), the baseline model for this thesis, is a task-oriented dialog system that uses transfer learning and machine teaching to build task bots at scale. SOLOIST uses the pre-train, fine-tune paradigm for building end-to-end dialog systems using a transformer-based auto-regressive language model GPT-2. In the pre-training stage, SOLOIST is initialized with 12-layer GPT-2 (117M parameters) and further trained on two task-oriented dialog corpora for solving *belief state prediction* task. In the fine-tuning stage, the pre-trained SOLOIST is fine-tuned on MultiWOZ 2.1 dataset to perform belief prediction task. + +### Install the requirements +After following the environment setup steps in the previous [section](#environment-setup), install the required python modules for baseline model training. + +Change directory to `baseline` and install the requirements. 
Make sure the correct baseline conda environment is activated before installing the requirements. +```shell +cd baseline +pip install -r requirements.txt +``` + +### Train the baseline model +Train a separate model for each data split. Edit the [train_baseline.sh](baseline/train_baseline.sh) file to modify the hyperparameters while training (learning rate, epochs). Use `CUDA_VISIBLE_DEVICES` to specify a CUDA device (GPU) for training the model. +```shell +sh train_baseline.sh -d +``` +Pass the data split name to the `-d` flag. Possible values are: `5-dpd`, `10-dpd`, `50-dpd`, `100-dpd`, `125-dpd`, `250-dpd` + +Example training command: `sh train_baseline.sh -d 50-dpd` + +### Belief State Prediction +Choose a checkpoint of the saved baseline model to generate belief state predictions. + +Set the `MODEL_CHECKPOINT` environment variable with the path to the chosen model checkpoint. It should only contain the path from the "experiment-{datetime}" folder. +```shell +export MODEL_CHECKPOINT=// +``` +Example: `export MODEL_CHECKPOINT=experiment-20220831/100-dpd/checkpoint-90000` + +Generate belief states by running the decode script +```shell +sh decode_baseline.sh +``` +The generated predictions are saved under the `OUTPUTS_DIR_BASELINE` folder. Some of the generated belief state predictions are uploaded to this repository and can be found under the [outputs](outputs) folder. + +### Baseline Evaluation + +The standard Joint Goal Accuracy (JGA) is used to evaluate the belief predictions. This metric compares all the predicted belief states to the ground-truth states for each turn. The prediction is considered correct only if all the predicted belief states match the ground-truth states. Both slots and values must match for the prediction to be correct. 
+ +Edit the [evaluate.py](baseline/evaluate.py) to set the predictions output file before running the evaluation +```shell +python evaluate.py +``` +### Results from baseline experiments +|data-split| JGA | +|--|:--:| +| 5-dpd | 9.06 | +| 10-dpd | 14.20 | +| 50-dpd | 28.64 | +| 100-dpd | 33.11 | +| 125-dpd | 35.79 | +| 250-dpd | 40.38 | -## Baseline Experiments -SOLOIST ([Peng et al., 2021](https://arxiv.org/abs/2005.05298)), the baseline model for this thesis, is a task-oriented dialog system that uses transfer learning and machine teaching to build task bots at scale. SOLOIST uses the pre-train, fine-tune paradigm for building end-to-end dialog systems using a transformer-based auto-regressive language model GPT-2. In the pre-training stage, SOLOIST is initialized with 12-layer GPT-2 (117M parameters) and further trained on two task-oriented dialog corpora for solving *belief state prediction* task. In the fine-tuning stage, the pre-trained SOLOIST is fine-tuned on MultiWOZ 2.1 dataset to perform belief prediction task. +## Prompt Learning Experiments -### Install the requirements -After following the environment setup steps in the previous [section](#environment-setup), install the required python modules for baseline model training. +### Data +`create_dataset.py` +// TODO -Change directory to `baseline` and install the requirements. Make sure the correct baseline conda environment is activated before installing the requirements. -```shell -cd baseline -pip install requirements.txt -``` +### Install the requirements +After following the environment setup steps in the previous [section](#environment-setup), install the required python modules for prompt model training. -### Train the baseline model -Train a separate model for each data split. Edit the [train_baseline.sh](baseline/train_baseline.sh) file to modify the hyperparameters while training (learning rate, epochs). Use `CUDA_VISIBLE_DEVICES` to specify a CUDA device (GPU) for training the model. 
-```shell -sh train_baseline.sh -d -``` -Pass the data split name to `-d` flag. Possible values are: `5-dpd`, `10-dpd`, `50-dpd`, `100-dpd`, `125-dpd`, `250-dpd` +Change directory to `prompt-learning` and install the requirements. Make sure the correct prompt-learning `conda` environment is activated before installing the requirements. +```shell +cd prompt-learning +pip install -r requirements.txt +``` +### Train the prompt model +Train a separate model for each data split. Edit the [train_prompting.sh](prompt-learning/train_prompting.sh) file to modify the default hyperparameters for training (learning rate, epochs). +```shell +sh train_prompting.sh -d +``` +Pass the data split name to the `-d` flag. +Possible values are: `5-dpd`, `10-dpd`, `50-dpd`, `100-dpd`, `125-dpd`, `250-dpd` + +Example training command: `sh train_prompting.sh -d 50-dpd` -Example training command: `sh train_baseline.sh -d 50-dpd` +**Some `train_prompting.sh` flags**: +`--num_epochs 10` - Number of epochs +`--learning_rate 5e-5` - Initial learning rate for the optimizer +`--with_inverse_prompt` - Use the inverse prompt while training +`--inverse_prompt_weight 0.1` - Weight of the inverse prompt in the loss function -### Belief State Prediction -Choose a checkpoint of the saved baseline model to generate belief state predictions. +**Note:** The defaults in `train_prompting.sh` are the best-performing values. -Set the `MODEL_CHECKPOINT` environment variable with the path to the chosen model checkpoint. It should only contain the path from the "experiment-{datetime}" folder. -```shell -export MODEL_CHECKPOINT=// -``` -Example: `export MODEL_CHECKPOINT=experiment-20220831/100-dpd/checkpoint-90000` +### Belief State Generations (Prompt Generation) +Now, the belief states can be generated by prompting. Choose a prompt fine-tuned model from the saved epochs and run the script below to generate belief states. 
-Generate belief states by running decode script -```shell -sh decode_baseline.sh +Generate belief states by running the below script: +```shell +sh test_prompting.sh -m ``` -The generated predictions are saved under `OUTPUTS_DIR_BASELINE` folder. Some generated belief state predictions are uploaded to this repository and can be found under [outputs](outputs) folder. +The argument `-m` takes the relative path of saved model from `SAVED_MODELS_PROMPT` env variable. It takes the following structure `-m //` -### Baseline Evaluation +Example: `sh test_prompting.sh -m 50-dpd/experiment-20221003T172424/epoch-09` -The standard Joint Goal Accuracy (JGA) is used to evaluate the belief predictions. This metric compares all the predicted belief states to the ground-truth states for each turn. The prediction is considered correct only if all the predicted belief states match with the ground-truth states. Both slots and values must match for the prediction to be correct. +The generated belief states (outputs) are saved under `OUTPUTS_DIR_PROMPT` folder. Some of the best outputs are uploaded to this repository and can be found under [outputs](outputs) folder. -Edit the [evaluate.py](baseline/evaluate.py) to set the predictions output file before running the evaluation -```shell +### Prompting Evaluation +The standard Joint Goal Accuracy (JGA) is used to evaluate the belief predictions. + +Edit the [evaluate.py](prompt-learning/evaluate.py) to set the predictions output file before running the evaluation +```shell python evaluate.py +``` +### Results from prompt-based belief state generations +|data-split| JGA* | +|--|:--:| +| 5-dpd | //TODO | +| 10-dpd | //TODO | +| 50-dpd | //TODO | +| 100-dpd | //TODO | +| 125-dpd | //TODO | +| 250-dpd | //TODO | + +// TODO :: Add prompt-based outputs and results in the above table + +## Multi-prompt Learning Experiments + +### Prompt Ensemble +**Training** + +Train a separate model for each data split. 
Edit the [train_prompting.sh](prompt-learning/train_prompting.sh) file and add `--with_prompt_ensemble` for training with multiple prompt functions. + +// TODO :: Add more README for training and generating. +// WIP :: Prompt ensemble training + +### Prompt Augmentation +Prompt Augmentation, sometimes called *demonstration learning*, provides a few additional *answered prompts* that demonstrate to the PLM how the actual prompt slot can be answered. The answered prompts are hand-picked manually. Experiments are performed on different sets of *answered prompts*. + +Edit the [test_prompting.sh](prompt-learning/test_prompting.sh) file and add the `--with_answered_prompts` flag for generating slots with answered prompts. + +Generate belief states by running the script below: +```shell +sh test_prompting.sh -m ``` -#### Results from baseline evaluation -|data-split| JGA | -|--|:--:| -| 5-dpd | 9.06 | -| 10-dpd | 14.20 | -| 50-dpd | 28.64 | -| 100-dpd | 33.11 | -| 125-dpd | 35.79 | -| 250-dpd | 40.38 | - +// TODO :: Add results diff --git a/prompt-learning/test_prompting.sh b/prompt-learning/test_prompting.sh index d073a16..8ff2d8d 100644 --- a/prompt-learning/test_prompting.sh +++ b/prompt-learning/test_prompting.sh @@ -2,7 +2,7 @@ usage="$(basename "$0") [-m ] Argument -m takes the relative path of fine-tuned model from ${SAVED_MODELS_PROMPT}. - Example: -m 250-dpd/experiment-20221030T172424/epoch-08" + Example: -m 250-dpd/experiment-20221003T172424/epoch-08" while getopts :m: flag do @@ -39,7 +39,7 @@ if [ ! -f "${TEST_DATA_FILE}" ]; then fi FINE_TUNED_MODEL_PATH=${SAVED_MODELS_PROMPT}/${model_path} -if [ ! -d ${FINE_TUNED_MODEL_PATH} ]; then +if [ ! -d "${FINE_TUNED_MODEL_PATH}" ]; then echo "Invalid fine-tuned model path - ${model_path}" fi
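
The quoting change in the last hunk of `test_prompting.sh` is easiest to see on a value that vanishes during word splitting. The following is a minimal sketch, not code from the repository: the helper names `check_unquoted` and `check_quoted` are hypothetical and exist only to contrast the removed and the added form of the `-d` test.

```shell
#!/bin/sh
# Hypothetical helpers for illustration only; they are not part of test_prompting.sh.

# Unquoted form, as on the removed line. An empty value disappears during
# word splitting, so the test collapses to `[ ! -d ]`, which POSIX treats as
# "negate: is the string -d non-empty?" and therefore evaluates to false.
check_unquoted() {
    dir=$1
    # shellcheck disable=SC2086
    if [ ! -d ${dir} ]; then
        echo "invalid"
    else
        echo "accepted"
    fi
}

# Quoted form, as on the added line. The value always reaches `-d` as exactly
# one argument, even when it is empty or contains whitespace.
check_quoted() {
    dir=$1
    if [ ! -d "${dir}" ]; then
        echo "invalid"
    else
        echo "accepted"
    fi
}

check_unquoted ""   # the empty path slips past the check
check_quoted ""     # the empty path is rejected
check_quoted "/"    # an existing directory is accepted
```

In `test_prompting.sh` itself the value always contains a `/`, so the more likely trigger there is whitespace in the path, which hands the unquoted test extra arguments and typically makes it fail with a syntax error instead of reporting an invalid path. The empty case is used in the sketch only because its outcome is fully specified by POSIX.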