master-thesis

Table of Contents

Analysis of results and outputs

Baseline (SOLOIST)
Prompt-based Methods

Value-based prompt & Inverse prompt
destination vs departure & leave vs arrive
Repeated values
Multi-prompt methods
JGA and JGA* Scores

Value Extraction

Analysis of results and outputs

Baseline (SOLOIST)

The baseline SOLOIST is fine-tuned on different data splits to evaluate its performance on the belief state prediction task under low-resource settings. The results show that the baseline SOLOIST model performed well when fine-tuned on relatively large data samples, but poorly when the training data was scarce (especially with 25 and 50 dialogs).

The belief state prediction task of SOLOIST uses top-k and top-p sampling to generate the belief state slots and values. Because the baseline SOLOIST relies on open-ended generation, it is susceptible to generating random slot-value pairs that are not relevant to the dialog history. Below is an example in which the baseline model generated a slot-value pair that is not relevant to the user's goals (parking = yes) and completely missed two correct slot-value pairs (type = guesthouse and pricerange = moderate).

Dialog History	True belief states	Generated belief states
user: we need to find a guesthouse of moderate price. system: do you have any special area you would like to stay? or possibly a star request for the guesthouse? user: i would like it to have a 3 star rating.	type = guesthouse, pricerange = moderate, stars = 3	parking = yes, stars = 3
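
The top-k and top-p sampling mentioned above can be illustrated with a minimal sketch. This is not SOLOIST's actual decoding code: the function names and the toy next-token distribution (with made-up probabilities) are assumptions for illustration only. It shows why an irrelevant slot such as parking can still be sampled whenever it survives the filtering step.

```python
import random


def top_k_top_p_filter(probs, top_k=0, top_p=1.0):
    """Keep the top_k most likely tokens, then the smallest prefix whose
    cumulative probability reaches top_p, and renormalize what is left."""
    # Rank tokens from most to least probable.
    ranked = sorted(probs.items(), key=lambda kv: kv[1], reverse=True)
    if top_k > 0:
        ranked = ranked[:top_k]
    # Renormalize after the top-k cut so top_p refers to the remaining mass.
    mass = sum(p for _, p in ranked)
    ranked = [(token, p / mass) for token, p in ranked]

    kept, cumulative = [], 0.0
    for token, p in ranked:
        kept.append((token, p))
        cumulative += p
        if cumulative >= top_p:  # nucleus is large enough, stop here
            break
    total = sum(p for _, p in kept)
    return {token: p / total for token, p in kept}


def sample_token(probs, rng):
    """Draw one token from a {token: probability} dict."""
    tokens = list(probs)
    weights = [probs[t] for t in tokens]
    return rng.choices(tokens, weights=weights, k=1)[0]


# Toy distribution over the next slot name (hypothetical numbers).
next_token_probs = {"stars": 0.5, "type": 0.25, "parking": 0.125, "area": 0.125}

strict = top_k_top_p_filter(next_token_probs, top_k=3, top_p=0.7)
loose = top_k_top_p_filter(next_token_probs, top_k=3, top_p=1.0)
token = sample_token(loose, random.Random(0))
```

With the stricter setting only stars and type remain, while the looser setting keeps parking as a candidate, so a sampled output can contain a slot the user never mentioned.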

Prompt-based Methods

Value-based prompt & Inverse prompt

The value-based prompt uses the dialog history and a value to generate the corresponding slot, so this approach does not rely on an ontology of slots. During training, both value-based prompts and inverse prompts are used to compute the training loss. The inverse prompt mechanism complements the value-based prompt in generating the correct slots: when it is not applied during training, the JGA score drops by 5-10%, depending on the data split used for training.
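
The prompt construction and joint loss described above could look roughly like the sketch below. The prompt templates, the function names, the `inverse_weight` parameter, and the assumption that the inverse prompt maps a slot back to its value are all hypothetical. They are not taken from this repository and only illustrate the idea of training with both prompt directions.

```python
def value_based_prompt(history: str, value: str) -> str:
    # Hypothetical template: given a value, the model should generate its slot.
    return f"{history} value: {value} slot:"


def inverse_prompt(history: str, slot: str) -> str:
    # Hypothetical template: given a slot, the model should generate its value.
    return f"{history} slot: {slot} value:"


def combined_loss(value_prompt_loss: float,
                  inverse_prompt_loss: float,
                  inverse_weight: float = 1.0) -> float:
    # Assumed combination: a weighted sum of the two per-example losses.
    return value_prompt_loss + inverse_weight * inverse_prompt_loss


history = "user: i would like it to have a 3 star rating."
forward = value_based_prompt(history, "3")   # expected target: "stars"
backward = inverse_prompt(history, "stars")  # expected target: "3"
loss = combined_loss(0.5, 0.25, inverse_weight=0.5)
```

Setting `inverse_weight` to 0 in this sketch corresponds to training without the inverse prompt mechanism.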

The experimental results show a clear performance gap between the baseline SOLOIST and the prompt-based methods: the prompt-based methods significantly outperformed the baseline model under low-resource settings (5-dpd, 10-dpd and 50-dpd).

destination vs departure & leave vs arrive

Under low-resource settings, the prompt-based model struggled to generate slot pairs such as departure|destination and leave|arrive. In many instances, it wrongly generated destination instead of departure, and vice versa. Below is one example where the slots are wrongly generated.

Dialog History	True belief states	Generated belief states
user: I need to be picked up from pizza hut city centre after 04:30	leave = 04:30, departure = pizza hut city centre	arrive = 04:30, destination = pizza hut city centre

Repeated values

Consider the following example:

Dialog History	True belief states
user: hi, can you help me find a 3 star place to stay? system: Is there a particular area or price range you would like? user: how about a place in the centre of town that is of type hotel system: how long would you like to stay, and how many are in your party? user: I'll be arriving saturday and staying for 3 nights. there are 3 of us.	area = centre, stars = 3, type = hotel, day = saturday, people = 3, stay = 3

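The value 3 is repeated in the above example: it belongs to the slots stars, people and stay. Since the value-based prompt generates a slot from the value, the same value has to map to three different slots, which can lead to ambiguity while generating the slots.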
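
The ambiguity can be made concrete with a small helper that groups slots by value. The helper name `slots_sharing_a_value` is hypothetical and not part of the thesis code. It only flags values that occur under more than one slot in a belief state.

```python
from collections import defaultdict


def slots_sharing_a_value(belief_state):
    """Return {value: [slots]} for every value used by more than one slot."""
    grouped = defaultdict(list)
    for slot, value in belief_state.items():
        grouped[value].append(slot)
    return {value: slots for value, slots in grouped.items() if len(slots) > 1}


# True belief state from the example above.
belief_state = {
    "area": "centre",
    "stars": "3",
    "type": "hotel",
    "day": "saturday",
    "people": "3",
    "stay": "3",
}

ambiguous = slots_sharing_a_value(belief_state)
print(ambiguous)  # {'3': ['stars', 'people', 'stay']}
```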

Multi-prompt methods

JGA and JGA* Scores

The higher JGA* scores suggest that the current methods of extracting value candidates need improvement.
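
For reference, JGA (joint goal accuracy) is commonly computed as the fraction of turns whose predicted belief state matches the true belief state exactly. The sketch below follows that common definition and is an assumption about how the scores are computed here, not the evaluation script of this repository. JGA* is not sketched because this section does not define it. The second predicted state is a hypothetical correct prediction added for illustration.

```python
def joint_goal_accuracy(true_states, predicted_states):
    """Fraction of turns where every slot-value pair is predicted exactly."""
    if len(true_states) != len(predicted_states):
        raise ValueError("true and predicted states must have the same length")
    if not true_states:
        return 0.0
    # A turn counts only if the whole belief state matches, not single slots.
    correct = sum(1 for t, p in zip(true_states, predicted_states) if t == p)
    return correct / len(true_states)


true_states = [
    {"type": "guesthouse", "pricerange": "moderate", "stars": "3"},
    {"leave": "04:30", "departure": "pizza hut city centre"},
]
predicted_states = [
    {"parking": "yes", "stars": "3"},  # baseline output from the example above
    {"leave": "04:30", "departure": "pizza hut city centre"},  # hypothetical
]

jga = joint_goal_accuracy(true_states, predicted_states)
print(jga)  # 0.5
```

The first turn counts as wrong even though stars = 3 is correct, because JGA requires the full set of slot-value pairs to match.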
