diff --git a/ANALYSIS.md b/ANALYSIS.md
index 2e4c516..e404b84 100644
--- a/ANALYSIS.md
+++ b/ANALYSIS.md
@@ -9,9 +9,9 @@ The baseline SOLOIST is fine-tuned on different data splits to evaluate the perf
 The belief state prediction task of SOLOIST utilizes *top-k* and *top-p* sampling to generate the belief state slots and values. Since the baseline SOLOIST uses open-ended generation, it's susceptible to generating random slot-value pairs that are not relevant to the dialog history. Below is an example of how the baseline model generated a slot-value pair that's not relevant to user goals and it completely missed two correct slot-value pairs.
-| History | True belief states | Generated belief states |
+| Dialog History | True belief states | Generated belief states |
 | ----- | ----- | ----- |
-| **user:** we need to find a guesthouse of moderate price.
**system:** do you have any special area you would like to stay?
or possibly a star request for the guesthouse?
**user:** i would like it to have a 3 star rating. | type = guesthouse
pricerange = moderate
stars = 3 | parking = yes
stars = 3 | +| **user:** we need to find a guesthouse of moderate price.
**system:** do you have any special area you would like to stay?
or possibly a star request for the guesthouse?
**user:** i would like it to have a 3 star rating. | type = guesthouse
pricerange = moderate
stars = 3 | parking = yes
stars = 3 |
@@ -30,15 +30,16 @@ Under low-resource settings, the prompt-based model struggled while generate slo
 | **user:** I need to be picked up from pizza hut city centre after 04:30 | leave = 04:30
departure = pizza hut city centre | arrive = 04:30
destination = pizza hut city centre |
 #### Repeated values
-Consider the following example:
+Since the value-based prompt generates a slot from each corresponding value, it cannot generate separate slots for repeated values: only one slot can be generated for a value that occurs more than once. Consider the following example:
 | Dialog History | True belief states |
 | ----- | ----- |
 | **user:** hi, can you help me find a 3 star place to stay?
**system:** Is there a particular area or price range you would like?
**user:** how about a place in the centre of town that is of type hotel
**system:** how long would you like to stay, and how many are in your party?
**user:** I'll be arriving saturday and staying for 3 nights. there are 3 of us.| area = centre
stars = 3
type = hotel
day = saturday
people = 3
stay = 3|
-The repeated value `3` in the above example can lead to ambiguity for value-based prompt while generating the slots.
+The repeated value `3` in the above example can only yield one slot with the value-based prompt, because the word with the highest probability is picked as the generated slot. This suggests that the existing belief state annotations don't work well with the value-based prompt.
 #### Multi-prompt methods
+After applying multi-prompt methods like *prompt ensemble* and *prompt augmentation*, the results are similar, with only a minor improvement in the JGA scores. Different samples of prompts and answered prompts are applied to the value-based prompt: some yield good results, while others add bias when generating slots and degrade performance.
 #### JGA and JGA* Scores
 Higher JGA* scores suggest the current methods of extracting value candidates need improvements.
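
The repeated-value limitation described in this change can be illustrated with a minimal Python sketch. It is not the repository's implementation: `slots_from_values` and `toy_predict_slot` are hypothetical names, and the fixed lookup table only stands in for the prompt model picking the highest-probability word. Which slot the toy table assigns to `3` is an arbitrary assumption.

```python
# Hypothetical illustration of why a value -> slot mapping collapses
# repeated values. None of these names come from the actual codebase.

def toy_predict_slot(value):
    """Stand-in for the prompt model: always returns the same slot for a
    given value, mimicking 'pick the word with the highest probability'."""
    lookup = {  # assumed outputs, chosen only for this example
        "centre": "area",
        "3": "stars",
        "hotel": "type",
        "saturday": "day",
    }
    return lookup[value]


def slots_from_values(value_candidates, predict_slot):
    """Generate one slot per value; identical values map to the same slot."""
    generated = {}
    for value in value_candidates:
        generated[predict_slot(value)] = value
    return generated


# True belief state from the example table in the diff.
true_state = {
    "area": "centre",
    "stars": "3",
    "type": "hotel",
    "day": "saturday",
    "people": "3",
    "stay": "3",
}

# The value "3" occurs three times in the dialog history.
value_candidates = ["3", "centre", "hotel", "saturday", "3", "3"]

generated = slots_from_values(value_candidates, toy_predict_slot)
missed = sorted(set(true_state) - set(generated))
print(generated)
print(missed)
```

Because the prediction depends only on the value, the three occurrences of `3` all resolve to the same slot, so the two remaining slots that share that value cannot be recovered under this assumption.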