Added more results to Markdown

main
Pavan Mandava 3 years ago
parent 5b1e2e4938
commit 7b80bec2dc

@@ -10,7 +10,7 @@ The baseline SOLOIST is fine-tuned on different data splits to evaluate the perf
The belief state prediction task of SOLOIST utilizes *top-k* and *top-p* sampling to generate the belief state slots and values. Since the baseline SOLOIST uses open-ended generation, it is susceptible to generating random slot-value pairs that are not relevant to the dialog history. Below is an example where the baseline model generated a slot-value pair that is irrelevant to the user's goals and completely missed two correct slot-value pairs.
| Dialog History | True belief states | Generated belief states |
| ----- | ----- | ----- |
| **user:** we need to find a guesthouse of moderate price.<br />**system:** do you have any special area you would like to stay?<br/>or possibly a star request for the guesthouse?<br />**user:** i would like it to have a 3 star rating. | type = guesthouse<br/>pricerange = moderate<br/>stars = 3 | parking = yes<br/>stars = 3 |
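For reference, below is a minimal, self-contained sketch of the kind of *top-k*/*top-p* (nucleus) filtering used for open-ended sampling; the tensor shapes, thresholds, and function name are illustrative and not taken from the repository's decoding code.

```python
import torch
import torch.nn.functional as F

def top_k_top_p_filter(logits: torch.Tensor, top_k: int = 50, top_p: float = 0.9) -> torch.Tensor:
    """Mask out logits outside the top-k tokens and outside the top-p probability mass."""
    if top_k > 0:
        kth_best = torch.topk(logits, top_k).values[..., -1, None]
        logits = logits.masked_fill(logits < kth_best, float("-inf"))
    if top_p < 1.0:
        sorted_logits, sorted_idx = torch.sort(logits, descending=True)
        cum_probs = torch.cumsum(F.softmax(sorted_logits, dim=-1), dim=-1)
        remove = cum_probs > top_p
        remove[..., 1:] = remove[..., :-1].clone()  # keep the first token that crosses the threshold
        remove[..., 0] = False
        logits = logits.masked_fill(remove.scatter(-1, sorted_idx, remove), float("-inf"))
    return logits

# Sample the next belief-state token from the filtered distribution.
logits = torch.randn(1, 32000)                       # stand-in for the LM's next-token logits
probs = F.softmax(top_k_top_p_filter(logits), dim=-1)
next_token = torch.multinomial(probs, num_samples=1)
```

Because the final token is still drawn at random from whatever survives the filter, low-probability but irrelevant slot-value pairs can slip into the generated belief state, which is the failure mode shown in the example above.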
@@ -18,7 +18,7 @@ The belief state prediction task of SOLOIST utilizes *top-k* and *top-p* samplin
### Prompt-based Methods
#### Value-based prompt & Inverse prompt
Value-based prompt utilizes the dialog history and a value to generate the corresponding slot. This approach doesn't rely on the ontology of the slots. During training, both value-based prompts and inverse prompts are used to compute the training loss. The inverse prompt mechanism helps complement the value-based prompt in generating the correct slots, especially under low-resource data splits.
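As a rough illustration of the combined objective (a sketch under assumptions: the two prompt templates below are made up, and a real setup would mask the prompt tokens out of the labels), the value-based prompt asks the LM for the slot given the history and a value, the inverse prompt asks for the value back given the slot, and the two language-modeling losses are summed.

```python
import torch

def prompt_pair_loss(lm, tokenizer, history: str, value: str, slot: str) -> torch.Tensor:
    """Sum the LM losses of a value-based prompt and its inverse prompt (illustrative templates)."""
    value_prompt = f"{history} The slot of value {value} is {slot}"    # assumed template
    inverse_prompt = f"{history} The value of slot {slot} is {value}"  # assumed template
    total = 0.0
    for text in (value_prompt, inverse_prompt):
        enc = tokenizer(text, return_tensors="pt")
        out = lm(**enc, labels=enc["input_ids"])   # causal-LM cross-entropy over the sequence
        total = total + out.loss
    return total
```

Training without the second term corresponds to using the value-based prompt alone.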
The experimental results show a significant performance gap between the baseline SOLOIST and the prompt-based methods: the prompt-based methods substantially outperformed the baseline model under low-resource settings (*5-dpd*, *10-dpd* and *50-dpd*).
@@ -26,14 +26,14 @@ The experimental results show a significant difference in the performance betwee
Under low-resource settings, the prompt-based model struggled to generate slots such as *departure*|*destination* and *leave*|*arrive*. In many instances, it wrongly generated *destination* instead of *departure* and vice versa. Below is one example where the slots are wrongly generated.
| Dialog History | True belief states | Generated belief states |
| ----- | ----- | ----- |
| **user:** I need to be picked up from pizza hut city centre after 04:30 | leave = 04:30<br/>departure = pizza hut city centre | arrive = 04:30<br/>destination = pizza hut city centre |
#### Repeated values
Since value-based prompt generates slots from their corresponding values, it can't generate multiple slots for a repeated value; only one slot can be generated per value. Consider the following example:
| Dialog History | True belief states |
| ----- | ----- |
| **user:** hi, can you help me find a 3 star place to stay?<br />**system:** Is there a particular area or price range you would like?<br />**user:** how about a place in the centre of town that is of type hotel<br />**system:** how long would you like to stay, and how many are in your party?<br />**user:** I'll be arriving saturday and staying for 3 nights. there are 3 of us. | area = centre<br/>stars = 3<br/>type = hotel<br />day = saturday<br/>people = 3<br/>stay = 3 |
The repeated value `3` in the above example can only generate one slot using the value-based prompt, as the word with the highest probability is picked as the generated slot. This suggests that the existing annotations for belief states don't work well with the value-based prompt.
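A small, hypothetical illustration of the collision: when generation is keyed on the value, a repeated value such as `3` can only resolve to the single highest-probability slot, so the other gold slots sharing that value are lost.

```python
# Gold annotation for the example above: three different slots share the value "3".
gold = {"area": "centre", "stars": "3", "type": "hotel",
        "day": "saturday", "people": "3", "stay": "3"}

# Value-based prompting produces one slot per distinct value, so inverting the
# annotation collapses every repeated value onto a single slot.
value_to_slot = {}
for slot, value in gold.items():
    value_to_slot[value] = slot          # later slots overwrite earlier ones

print(value_to_slot["3"])                # only one of stars / people / stay survives
```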

@@ -260,14 +260,8 @@ python evaluate.py -o path/to/outputs/file
```
### Results from prompt-based belief state generations
<table>
  <tr> <th> </th> <th colspan="2">w = 0.1</th> <th colspan="2">w = 0.3</th> <th colspan="2">w = 0.5</th> <th colspan="2">w = 0.7</th> </tr>
  <tr> <th>Dataset</th> <th>JGA</th> <th>JGA*</th> <th>JGA</th> <th>JGA*</th> <th>JGA</th> <th>JGA*</th> <th>JGA</th> <th>JGA*</th> </tr>
  <tr> <td>5-dpd</td> <td>30.66</td> <td>71.04</td> <td>31.67</td> <td>73.19</td> <td>30.77</td> <td>72.85</td> <td>29.98</td> <td>70.93</td> </tr>
  <tr> <td>10-dpd</td> <td>42.65</td> <td>86.43</td> <td>41.18</td> <td>83.48</td> <td>40.05</td> <td>80.77</td> <td>40.38</td> <td>85.18</td> </tr>
  <tr> <td>50-dpd</td> <td>47.06</td> <td>91.63</td> <td>46.49</td> <td>91.18</td> <td>47.04</td> <td>91.18</td> <td>46.27</td> <td>90.05</td> </tr>
  <tr> <td>100-dpd</td> <td>47.74</td> <td>92.31</td> <td>48.42</td> <td>92.42</td> <td>48.19</td> <td>92.65</td> <td>48.3</td> <td>92.65</td> </tr>
  <tr> <td>125-dpd</td> <td>46.49</td> <td>91.86</td> <td>46.15</td> <td>91.18</td> <td>46.83</td> <td>91.74</td> <td>46.15</td> <td>90.95</td> </tr>
  <tr> <td>250-dpd</td> <td>47.06</td> <td>92.08</td> <td>47.62</td> <td>92.65</td> <td>47.4</td> <td>92.31</td> <td>47.17</td> <td>92.09</td> </tr>
</table>
> **Note:** All the generated output files for the above reported results are available in this repository. Check [outputs/prompt-learning](outputs/prompt-learning) directory to see the output JSON files for each data-split.
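For context, JGA (joint goal accuracy) is the standard dialog state tracking metric: a turn counts as correct only if the predicted set of slot-value pairs matches the gold set exactly. The sketch below shows the idea only; the JSON field names are assumptions and may not match the schema of the files in [outputs/prompt-learning](outputs/prompt-learning).

```python
import json

def joint_goal_accuracy(path: str) -> float:
    """Fraction of turns whose predicted belief state exactly matches the gold belief state."""
    with open(path) as f:
        turns = json.load(f)                      # assumed: a list of per-turn records
    correct = sum(
        1 for t in turns
        if {tuple(p) for p in t["gold_belief"]} == {tuple(p) for p in t["pred_belief"]}
    )
    return correct / len(turns)
```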
@@ -306,6 +300,18 @@ Script for generating belief states (slots) using prompt-ensemble remains the sa
sh test_prompting.sh -m <saved-model-path>
```
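One common way to realize prompt ensembling is to run the same query through several prompt variants and average their next-token distributions before picking the answer. The sketch below shows that idea only; it is not necessarily how `test_prompting.sh` combines the prompts, and the function name is illustrative.

```python
import torch
import torch.nn.functional as F

def ensemble_next_token(lm, tokenizer, prompts: list) -> int:
    """Average the next-token distributions produced by several prompt variants."""
    probs = []
    for prompt in prompts:
        enc = tokenizer(prompt, return_tensors="pt")
        logits = lm(**enc).logits[0, -1]          # next-token logits for this prompt variant
        probs.append(F.softmax(logits, dim=-1))
    avg = torch.stack(probs).mean(dim=0)          # uniform average over the ensemble
    return int(avg.argmax())                      # id of the highest-probability token
```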
#### Results from Prompt Ensembling
| Dataset | JGA | JGA* |
|---------|-------|-------|
| 5-dpd | 30.09 | 69.23 |
| 10-dpd | 42.84 | 86.99 |
| 50-dpd | 47.62 | 91.74 |
| 100-dpd | 48.08 | 93.10 |
| 125-dpd | 46.96 | 92.08 |
| 250-dpd | 48.30 | 93.44 |
### Prompt Augmentation
Prompt Augmentation, also called *demonstration learning*, provides a few additional *answered prompts* that demonstrate to the PLM how the actual prompt slot can be answered. The answered prompts are hand-crafted and hand-picked manually. Experiments are performed on different sets of *answered prompts*.
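Below is a hedged sketch of how such a demonstration-augmented prompt can be assembled; the demonstrations and the template string are illustrative, not the repository's actual ones.

```python
# Hand-picked answered prompts (demonstrations); contents here are made up.
demonstrations = [
    ("i need a hotel in the moderate price range", "moderate", "pricerange"),
    ("book a taxi to the station at 17:15", "17:15", "leave"),
]

def build_augmented_prompt(history: str, value: str) -> str:
    """Prepend answered prompts so the PLM sees how the slot question is answered."""
    demo_text = "".join(
        f"{h} The slot of value {v} is {s}. " for h, v, s in demonstrations
    )
    return f"{demo_text}{history} The slot of value {value} is"

print(build_augmented_prompt("we need to find a guesthouse of moderate price.", "moderate"))
```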
@@ -315,20 +321,13 @@ Edit the [test_prompting.sh](prompt-learning/test_prompting.sh) file and add `--
sh test_prompting.sh -m <tuned-prompt-model-path>
```
#### Results from Prompt Augmentation
<table>
  <tr> <th></th> <th colspan="2">Sample 1</th> <th colspan="2">Sample 2</th> </tr>
  <tr> <th>Data</th> <th>JGA</th> <th>JGA*</th> <th>JGA</th> <th>JGA*</th> </tr>
  <tr> <td>5-dpd</td> <td>26.02</td> <td>58.6</td> <td>27.6</td> <td>59.39</td> </tr>
  <tr> <td>10-dpd</td> <td>33.26</td> <td>70.14</td> <td>34.95</td> <td>77.94</td> </tr>
  <tr> <td>50-dpd</td> <td>38.8</td> <td>71.38</td> <td>39.77</td> <td>74.55</td> </tr>
  <tr> <td>100-dpd</td> <td>35.97</td> <td>70.89</td> <td>38.46</td> <td>74.89</td> </tr>
  <tr> <td>125-dpd</td> <td>36.09</td> <td>73.08</td> <td>36.18</td> <td>76.47</td> </tr>
  <tr> <td>250-dpd</td> <td>35.63</td> <td>72.9</td> <td>38.91</td> <td>76.7</td> </tr>
</table>
> **Note:** All the generated output files for the above reported results are available in this repository. Check [outputs/multi-prompt](outputs/multi-prompt) directory to see the output JSON files for each data-split.
## Analysis
Analyses of the results and belief state generations (outputs) can be found [here](ANALYSIS.md).