diff --git a/ANALYSIS.md b/ANALYSIS.md
index db39dbb..5e62e12 100644
--- a/ANALYSIS.md
+++ b/ANALYSIS.md
@@ -1,3 +1,27 @@
-# Analysis of results and outputs
+---
+gitea: none
+include_toc: true
+---
+## Analysis of results and outputs
 
-// TODO
\ No newline at end of file
+### Baseline (SOLOIST)
+The baseline SOLOIST model is fine-tuned on different data splits to evaluate its performance on the belief state prediction task under low-resource settings. The results show that the baseline SOLOIST model performed well when *fine-tuned* on relatively large data samples but poorly on low-resource training data (especially the 25- and 50-dialog splits).
+
+The belief state prediction task of SOLOIST uses *top-k* and *top-p* sampling to generate the belief state slots and values. Because the baseline SOLOIST uses open-ended generation, it is susceptible to generating random slot-value pairs that are not relevant to the dialog history. Below is an example in which the baseline model generated a slot-value pair that is not relevant to the user's goals and completely missed two correct slot-value pairs.
+
+| History | True belief states | Generated belief states |
+| ----- | ----- | ----- |
+| **user:** we need to find a guesthouse of moderate price.<br>**system:** do you have any special area you would like to stay?<br>or possibly a star request for the guesthouse?<br>**user:** i would like it to have a 3 star rating. | type = guesthouse<br>pricerange = moderate<br>stars = 3 | parking = yes<br>stars = 3 |
+
+
+### Prompt-based Methods
+
+#### Value-based prompt
+
+#### destination vs departure
+
+#### Duplicate values
+
+#### Multi-prompt methods
+
+### Value Extraction
\ No newline at end of file