From 96a433bb7e1da16e259e80d0269a94415032f43a Mon Sep 17 00:00:00 2001
From: Pavan Mandava
Date: Wed, 7 Sep 2022 12:18:58 +0530
Subject: [PATCH] Added README.md notes for MultiWOZ dataset

---
 README.md | 20 +++++++++++++++++---
 1 file changed, 17 insertions(+), 3 deletions(-)

diff --git a/README.md b/README.md
index 5c844a3..5af29dd 100644
--- a/README.md
+++ b/README.md
@@ -1,5 +1,19 @@
-# master-thesis
+# Prompt-based methods for Dialog State Tracking
+
+Repository for my master's thesis at the University of Stuttgart (IMS).
+
+Refer to the thesis [proposal](proposal/proposal_submission_1st.pdf) for a detailed explanation of the experiments.
+
+## Dataset
+The MultiWOZ 2.1 [dataset](https://github.com/budzianowski/multiwoz/blob/master/data/MultiWOZ_2.1.zip) is used to train and evaluate the baseline and prompt-based methods. MultiWOZ is a fully labeled collection of human-human written conversations spanning multiple domains and topics. Only single-domain dialogues are used for training and testing in this setup. Each dialogue contains multiple turns and may also include the *booking* sub-domain. Five domains (*Hotel, Train, Restaurant, Attraction, Taxi*) are used in the experiments; the other two domains (*Hospital* and *Police*) are excluded because they appear only in the training set. In the few-shot settings, only a portion of the training data is used, in order to measure DST performance in a low-resource scenario. Dialogues for each domain are picked at random. The table below lists statistics of the dataset and of the data splits used in the few-shot experiments.
 
-Repository for my master thesis at the University of Stuttgart (IMS)
+| Data Split | # Dialogues | # Total Turns |
+|--|:--:|:--:|
+| 50-dpd | 250 | 1114 |
+| 100-dpd | 500 | 2292 |
+| 125-dpd | 625 | 2831 |
+| 250-dpd | 1125 | 5187 |
+| valid | 190 | 900 |
+| test | 193 | 894 |
 
-// TODO :: Add commands for training, testing & evaluation here
\ No newline at end of file
+In the table above, *dpd* stands for "*dialogues per domain*"; for example, *50-dpd* means *50 dialogues from each domain*.
\ No newline at end of file
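The per-domain random sampling described in the README can be sketched in Python as follows. This is a minimal sketch, not the repository's code: it assumes the dialogues are available as hypothetical `(dialogue_id, domain)` pairs already filtered to single-domain dialogues, and `sample_few_shot` and `DOMAINS` are illustrative names.

```python
import random
from collections import defaultdict

# Domains listed in the README; the lowercase labels are an assumption.
DOMAINS = ["hotel", "train", "restaurant", "attraction", "taxi"]


def sample_few_shot(dialogues, k, seed=0):
    """Return k randomly chosen dialogue IDs per domain (a k-dpd split).

    `dialogues` is a hypothetical list of (dialogue_id, domain) pairs that
    has already been filtered to single-domain dialogues.
    """
    by_domain = defaultdict(list)
    for dial_id, domain in dialogues:
        by_domain[domain].append(dial_id)

    rng = random.Random(seed)  # a fixed seed makes the split reproducible
    split = []
    for domain in DOMAINS:
        pool = sorted(by_domain[domain])  # stable order before sampling
        # rng.sample raises ValueError if a domain has fewer than k dialogues.
        split.extend(rng.sample(pool, k))
    return split


if __name__ == "__main__":
    # Toy data: 300 fake single-domain dialogue IDs per domain.
    toy = [(f"{d}_{i}", d) for d in DOMAINS for i in range(300)]
    print(len(sample_few_shot(toy, k=50)))  # 250, i.e. 50 dialogues x 5 domains
```

Using a dedicated, seeded `random.Random` instance keeps a split reproducible without touching Python's global random state.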