---
gitea: none
include_toc: true
---

## Analysis of results and outputs

// TODO

### Baseline (SOLOIST)

The baseline SOLOIST model is fine-tuned on different data splits to evaluate its belief state prediction performance under low-resource settings. The results show that the baseline SOLOIST model performed well when *fine-tuned* on relatively large data samples, but it performed poorly with low-resource training data (especially the 25- and 50-dialog splits).

For the belief state prediction task, SOLOIST uses *top-k* and *top-p* sampling to generate the belief state slots and values. Because the baseline SOLOIST relies on open-ended generation, it is susceptible to generating spurious slot-value pairs that are not relevant to the dialog history. The table below shows an example in which the baseline model generated a slot-value pair that is irrelevant to the user's goals (`parking = yes`) and completely missed two correct slot-value pairs (`type = guesthouse` and `pricerange = moderate`).
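
As a rough illustration of the sampling step described above, the sketch below applies *top-k* and then *top-p* (nucleus) filtering to a single decoding step's logits and draws a token id from what remains. It is a generic, stdlib-only sketch and not SOLOIST's own decoding code; the function name `top_k_top_p_sample`, the toy vocabulary, the logits and the parameter values are all hypothetical. Because every token that survives the filters keeps a non-zero probability, a plausible but unrelated token can still be drawn, which is one possible way open-ended generation ends up emitting slot-value pairs the user never asked for.

```python
import math
import random


def top_k_top_p_sample(logits, k=50, p=0.9, temperature=1.0, rng=None):
    """Sample one token id after top-k and top-p (nucleus) filtering.

    Hypothetical helper for illustration only; `temperature` must be > 0.
    """
    rng = rng or random.Random()

    # Temperature-scaled softmax; subtracting the max keeps exp() stable.
    scaled = [x / temperature for x in logits]
    peak = max(scaled)
    exps = [math.exp(x - peak) for x in scaled]
    total = sum(exps)
    probs = [e / total for e in exps]

    # Top-k: keep only the k most probable token ids, highest first.
    ranked = sorted(range(len(probs)), key=lambda i: probs[i], reverse=True)[:k]

    # Top-p: keep the smallest prefix of `ranked` whose cumulative
    # probability reaches p (the most probable token is always kept).
    kept, cumulative = [], 0.0
    for i in ranked:
        kept.append(i)
        cumulative += probs[i]
        if cumulative >= p:
            break

    # rng.choices does not require the weights to sum to 1,
    # so this samples from the renormalised surviving distribution.
    return rng.choices(kept, weights=[probs[i] for i in kept], k=1)[0]


# Toy example: made-up logits over a tiny made-up vocabulary.
vocab = ["guesthouse", "hotel", "moderate", "cheap", "yes"]
logits = [2.0, 1.0, 1.5, 0.2, 0.1]
token = vocab[top_k_top_p_sample(logits, k=3, p=0.6, rng=random.Random(0))]
```

With these toy numbers only "guesthouse" and "moderate" survive the filters (softmax probabilities of roughly 0.44 and 0.26), so `token` is one of those two. Lowering `k` or `p` narrows the candidate set toward greedy decoding, while raising them lets lower-probability tokens through.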

| History | True belief states | Generated belief states |
| ----- | ----- | ----- |
| **user:** we need to find a guesthouse of moderate price. <br />**system:** do you have any special area you would like to stay?<br/>or possibly a star request for the guesthouse?<br />**user:** i would like it to have a 3 star rating. | type = guesthouse<br/>pricerange = moderate<br/>stars = 3 | parking = yes<br/>stars = 3 |
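
To make the failure in the table concrete, the following sketch compares a reference belief state with a generated one as sets of slot-value pairs, reporting correct, missed and spurious pairs plus whether the whole state matches exactly (the per-turn condition counted by joint goal accuracy, a common belief state tracking metric). The document does not say which metric produced its results, so treat this as an assumed, illustrative evaluation. The helper names `parse_belief_state` and `compare_belief_states` are hypothetical, and slots are keyed by name only, without any domain prefix.

```python
def parse_belief_state(entries):
    """Turn ["type = guesthouse", ...] into {"type": "guesthouse", ...}."""
    state = {}
    for entry in entries:
        slot, sep, value = entry.partition("=")
        if sep:  # ignore malformed entries that have no "="
            state[slot.strip()] = value.strip()
    return state


def compare_belief_states(true_state, generated_state):
    """Set-based comparison of two {slot: value} belief states."""
    true_pairs = set(true_state.items())
    generated_pairs = set(generated_state.items())
    return {
        "correct": sorted(true_pairs & generated_pairs),
        "missed": sorted(true_pairs - generated_pairs),
        "spurious": sorted(generated_pairs - true_pairs),
        # Exact match of the whole state: what joint goal accuracy counts per turn.
        "joint_match": true_pairs == generated_pairs,
    }


# The example from the table above.
true_state = parse_belief_state(["type = guesthouse", "pricerange = moderate", "stars = 3"])
generated_state = parse_belief_state(["parking = yes", "stars = 3"])
result = compare_belief_states(true_state, generated_state)
# result["correct"]     -> [("stars", "3")]
# result["missed"]      -> [("pricerange", "moderate"), ("type", "guesthouse")]
# result["spurious"]    -> [("parking", "yes")]
# result["joint_match"] -> False
```

Under this set-based view, a slot predicted with the wrong value (e.g. `stars = 4`) would show up both as a missed pair and as a spurious pair, which keeps value errors distinct from slots the model never produced.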

### Prompt-based Methods

#### Value-based prompt

#### Destination vs. departure

#### Duplicate values

#### Multi-prompt methods

### Value Extraction