Added more baseline outputs, updated README

3 years ago · 1fe3d89476
parent 9be77c2de6
commit 1fe3d89476
3 changed files with 12529 additions and 6 deletions
--- a/README.md
+++ b/README.md
@@ -5,10 +5,12 @@ Repository for my master thesis at the University of Stuttgart (IMS).
 Refer to the thesis [proposal](proposal/proposal_submission_1st.pdf) document for a detailed explanation of the thesis experiments.  
    
 ## Dataset
-MultiWOZ 2.1 [dataset](https://github.com/budzianowski/multiwoz/blob/master/data/MultiWOZ_2.1.zip) is used for training and evaluation of the baseline/prompt-based methods. MultiWOZ is a fully-labeled dataset with a collection of human-human written conversations spanning over multiple domains and topics. Only single-domain dialogues are used in this setup for training and testing. Each dialogue contains multiple turns and may also contain a sub-domain *booking*. Five domains - *Hotel, Train, Restaurant, Attraction, Taxi* are used in the experiments and excluded the other two domains as they only appear in the training set. Under few-shot settings, only a portion of the training data is utilized to measure the performance of the DST task in a low-resource scenario. Dialogues are randomly picked for each domain. The below table contains some statistics of the dataset and data splits for the few-shot experiments.
+MultiWOZ 2.1 [dataset](https://github.com/budzianowski/multiwoz/blob/master/data/MultiWOZ_2.1.zip) is used for training and evaluation of the baseline/prompt-based methods. MultiWOZ is a fully labeled dataset of human-human written conversations spanning multiple domains and topics. Only single-domain dialogues are used in this setup for training and testing. Each dialogue contains multiple turns and may also contain a *booking* subdomain. Five domains (*Hotel, Train, Restaurant, Attraction, Taxi*) are used in the experiments; the other two domains are excluded because they appear only in the training set. Under few-shot settings, only a portion of the training data is used, in order to measure the performance of the DST task in a low-resource scenario. Dialogues are picked randomly for each domain. The table below contains some statistics of the dataset and the data splits used for the few-shot experiments.

 | Data Split | # Dialogues | # Total Turns |
 |--|:--:|:--:|
+| 5-dpd | 25 | 100 |
+| 10-dpd | 50 | 234 |
 | 50-dpd | 250 | 1114 |
 | 100-dpd | 500 | 2292 |
 | 125-dpd | 625 | 2831 |
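The split names appear to follow the pattern `k-dpd`, read here as *k dialogues per domain*: with five domains, this matches the dialogue counts in the table (e.g. 5 × 5 = 25, 50 × 5 = 250). As a rough sketch under that assumption, a split could be drawn by sampling *k* dialogues from each domain. The `make_few_shot_split` helper, the dialogue-to-domain mapping, and the fixed seed are hypothetical and not taken from this repository's preprocessing code.

```python
import random
from collections import defaultdict

# Hypothetical domain list; mirrors the five domains named in the README.
DOMAINS = ["hotel", "train", "restaurant", "attraction", "taxi"]


def make_few_shot_split(dialogue_domains, k, seed=0):
    """Sample k dialogue IDs per domain (a hypothetical helper, not the repo's code).

    dialogue_domains: dict mapping a single-domain dialogue ID to its domain.
    Returns a sorted list of k * len(DOMAINS) dialogue IDs.
    """
    by_domain = defaultdict(list)
    for dial_id, domain in dialogue_domains.items():
        by_domain[domain].append(dial_id)

    rng = random.Random(seed)  # fixed seed so the split is reproducible
    split = []
    for domain in DOMAINS:
        # Sort first so the sample depends only on the seed, not on dict order.
        candidates = sorted(by_domain[domain])
        # random.sample raises ValueError if a domain has fewer than k dialogues.
        split.extend(rng.sample(candidates, k))
    return sorted(split)


if __name__ == "__main__":
    # Toy data: 20 fake dialogue IDs per domain.
    toy = {f"{d.upper()}{i:03d}.json": d for d in DOMAINS for i in range(20)}
    for k in (5, 10):
        print(f"{k}-dpd: {len(make_few_shot_split(toy, k))} dialogues")
```

On the toy data this prints 25 dialogues for `5-dpd` and 50 for `10-dpd`, in line with the table; the turn counts of a real split depend on which dialogues are drawn.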
@@ -113,7 +115,7 @@ Train a separate model for each data split. Edit the [train_baseline.sh](baselin
 ```shell
 sh train_baseline.sh -d <data-split-name>
 ```
-Pass the data split name to `-d` flag. Possible values are: `50-dpd`, `100-dpd`, `125-dpd`, `250-dpd`
+Pass the data split name to the `-d` flag. Possible values are: `5-dpd`, `10-dpd`, `50-dpd`, `100-dpd`, `125-dpd`, `250-dpd`

 Example training command: `sh train_baseline.sh -d 50-dpd`

@@ -130,7 +132,7 @@ Generate belief states by running decode script
 ```shell
 sh decode_baseline.sh
 ```
-The generated predictions are saved under `OUTPUTS_DIR_BASELINE` folder. Some of the generated belief state predictions are uploaded to this repository and can found under [outputs](outputs) folder.
+The generated predictions are saved under the `OUTPUTS_DIR_BASELINE` folder. Some generated belief state predictions are uploaded to this repository and can be found under the [outputs](outputs) folder.

 ### Baseline Evaluation

@@ -140,12 +142,13 @@ Edit the [evaluate.py](baseline/evaluate.py) to set the predictions output file
 ```shell
 python evaluate.py
 ```
-#### Preliminary results of baseline evaluation
+#### Results from baseline evaluation
| Data Split | JGA (%) |
 |--|:--:|
+| 5-dpd | 9.06 |
+| 10-dpd | 14.20 |
 | 50-dpd | 28.64 |
 | 100-dpd | 33.11 |
 | 125-dpd | 35.79 |
 | 250-dpd | 40.38 |
 
-> Note: The above preliminary results will change based on further experiments  
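JGA in the results table is joint goal accuracy. A common definition, used here as an assumption, is the percentage of dialogue turns whose predicted belief state matches the gold belief state exactly: every slot-value pair is correct and nothing extra is predicted. The sketch below represents each turn's belief state as a set of `(domain-slot, value)` pairs; the function name and this representation are illustrative, and [evaluate.py](baseline/evaluate.py) may parse and normalize values differently.

```python
from typing import Iterable, Set, Tuple

# One turn's belief state: a set of (domain-slot, value) pairs (assumed format).
BeliefState = Set[Tuple[str, str]]


def joint_goal_accuracy(predictions: Iterable[BeliefState],
                        references: Iterable[BeliefState]) -> float:
    """Percentage of turns whose predicted belief state equals the gold one."""
    preds, refs = list(predictions), list(references)
    if len(preds) != len(refs):
        raise ValueError("predictions and references must have the same number of turns")
    if not refs:
        raise ValueError("no turns to evaluate")
    # A turn counts only if the full state matches: no missing, wrong, or extra slots.
    correct = sum(1 for p, r in zip(preds, refs) if set(p) == set(r))
    return 100.0 * correct / len(refs)


if __name__ == "__main__":
    # Cumulative belief states for a toy four-turn hotel dialogue.
    gold = [
        set(),
        {("hotel-area", "north")},
        {("hotel-area", "north"), ("hotel-stars", "4")},
        {("hotel-area", "north"), ("hotel-stars", "4")},
    ]
    pred = [
        set(),
        {("hotel-area", "north")},
        {("hotel-area", "north"), ("hotel-stars", "4")},
        {("hotel-area", "north"), ("hotel-stars", "3")},  # one wrong value fails the turn
    ]
    print(joint_goal_accuracy(pred, gold))  # 75.0
```

Because a single wrong, missing, or extra slot fails the whole turn, JGA is a strict metric.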
--- a/outputs/baseline/10-dpd/output_test.json
+++ b/outputs/baseline/10-dpd/output_test.json
--- a/outputs/baseline/5-dpd/checkpoint-5000/output_test.json
+++ b/outputs/baseline/5-dpd/checkpoint-5000/output_test.json