Modified Proposal.tex file (prompt-ensemble training)

main
Pavan Mandava 3 years ago
parent 1da0fa1dad
commit 6c57ba87aa


@@ -46,7 +46,7 @@
\begin{figure}[h!]
\centering
\includegraphics[width=.3\linewidth]{images/ims_logo.jpeg}\label{fig:ims_logo}
\end{figure}
\large{Institut f{\"u}r Maschinelle Sprachverarbeitung\\Universit{\"a}t Stuttgart\\Pfaffenwaldring 5b\\70569 Stuttgart}\\[0.3cm]

@@ -242,3 +242,16 @@
doi = "10.18653/v1/2021.findings-acl.161",
pages = "1835--1845",
}
@inproceedings{schick2021pet,
title = "Few-Shot Text Generation with Natural Language Instructions",
author = {Schick, Timo and Sch{\"u}tze, Hinrich},
booktitle = "Proceedings of the 2021 Conference on Empirical Methods in Natural Language Processing",
month = nov,
year = "2021",
address = "Online and Punta Cana, Dominican Republic",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2021.emnlp-main.32",
doi = "10.18653/v1/2021.emnlp-main.32",
pages = "390--402",
abstract = "Providing pretrained language models with simple task descriptions in natural language enables them to solve some tasks in a fully unsupervised fashion. Moreover, when combined with regular learning from examples, this idea yields impressive few-shot results for a wide range of text classification tasks. It is also a promising direction to improve data efficiency in generative settings, but there are several challenges to using a combination of task descriptions and example-based learning for text generation. In particular, it is crucial to find task descriptions that are easy to understand for the pretrained model and to ensure that it actually makes good use of them; furthermore, effective measures against overfitting have to be implemented. In this paper, we show how these challenges can be tackled: We introduce GenPET, a method for text generation that is based on pattern-exploiting training, a recent approach for combining textual instructions with supervised learning that only works for classification tasks. On several summarization and headline generation datasets, GenPET gives consistent improvements over strong baselines in few-shot settings.",
}

@@ -30,16 +30,17 @@ where $w$ is a decimal value (0,1) and can be used to adjust the influence of in
\paragraph{Evaluation Metrics} The standard metric joint goal accuracy (JGA) will be adopted to evaluate the belief state predictions. This metric compares all the predicted belief states to the ground-truth states for each turn; a prediction is correct only if every predicted state matches the ground truth, i.e., both slots and values must match. To remove the influence of value extraction errors, \citet{yang2022prompt} proposed JGA*, where accuracy is computed only over the belief states whose values are correctly identified. These evaluation metrics can answer the following questions: \textbf{Q1}: How do the prompt-based methods perform overall compared to the SoTA \textsc{Soloist}? \textbf{Q2}: Can the prompt-based model perform better under few-shot settings? \textbf{Q3}: Does JGA* yield a better score than JGA?
\paragraph{Analyses of belief state predictions} The main goal of this task is to analyze the belief state predictions. The predictions from the \textsc{Soloist} baseline and the prompt-based methods will be compared and analyzed to identify improvements and drawbacks. A detailed error analysis will be performed on the incorrect belief state predictions.
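For illustration, a minimal Python sketch of both metrics, assuming each turn's belief state is represented as a slot--value dictionary; the JGA* filter below is one reading of the definition above, not \citet{yang2022prompt}'s implementation:
\begin{verbatim}
def joint_goal_accuracy(predictions, ground_truths):
    # A turn counts as correct only if every predicted
    # (slot, value) pair matches the ground-truth state.
    correct = sum(p == g for p, g in zip(predictions, ground_truths))
    return correct / len(ground_truths)

def jga_star(predictions, ground_truths):
    # JGA*: score only turns whose values were correctly
    # identified, so remaining errors come from slot prediction.
    kept = [(p, g) for p, g in zip(predictions, ground_truths)
            if sorted(p.values()) == sorted(g.values())]
    return sum(p == g for p, g in kept) / len(kept) if kept else 0.0
\end{verbatim}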
\vspace{-2pt}
\subsection{Multi-prompt learning methods} \label{task3}
The \textit{value-based} prompt described in the previous sections utilizes a \textit{single} prompt for making predictions. However, a significant body of research has demonstrated that using multiple prompts can further improve the efficacy of prompting methods \citep{liu2021ppp}. There are different ways to extend single-prompt learning to multiple prompts. This task will explore three multi-prompt learning methods: \textit{prompt ensembling}, \textit{prompt augmentation}, and \textit{prompt decomposition}. It aims to answer the following questions: \textbf{Q1}: Can combining different \textit{multi-prompt} techniques help the PLM better understand the DST task? \textbf{Q2}: How do various hand-crafted prompt functions influence the prompt-based model?
\vspace{-4pt}
\paragraph{Prompt Ensembling} This method uses multiple \textit{unanswered} prompts at inference time to make predictions \citep{liu2021ppp}. It can leverage the complementary advantages of different prompts and stabilize performance on downstream tasks. \citet{yang2022prompt} applied prompt ensembling to the value-based prompt to effectively utilize four different prompts. \citet{schick2021pet} proposed training a single model with multiple prompts, since this is much faster and more memory-efficient than training a separate model for each prompt (and running multiple models at inference time). The probability of slot $s_t$ under the prompt ensemble can be calculated via:
$$
P\left(s_{t} \mid c_{t}\right)=\sum_{k=1}^{|K|} \alpha_{k} \cdot P\left(s_{t} \mid c_{t}, f_{k}\left(v_{t}\right)\right)
$$
where $|K|$ represents the number of prompt functions, $f_{k}$ is the $k$-th prompt function, and $\alpha_{k}$ is the weight of prompt $k$. This task will utilize prompt ensembling differently from \citet{yang2022prompt}, by combining it with other multi-prompt methods: experiments will be performed on various prompt templates to find the most effective and suitable prompts in combination with the other multi-prompt methods.
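For illustration, a minimal Python sketch of the weighted ensemble, assuming the per-prompt slot distributions have already been obtained from the PLM (the slot names and weights below are invented):
\begin{verbatim}
def ensemble_slot_distribution(slot_dists, weights):
    # P(s_t | c_t) = sum_k alpha_k * P(s_t | c_t, f_k(v_t)),
    # where slot_dists[k] is the distribution under prompt f_k.
    ensemble = {}
    for dist, alpha in zip(slot_dists, weights):
        for slot, prob in dist.items():
            ensemble[slot] = ensemble.get(slot, 0.0) + alpha * prob
    return ensemble

# Two prompt functions weighted 0.6 / 0.4:
dists = [{"hotel-area": 0.7, "restaurant-area": 0.3},
         {"hotel-area": 0.5, "restaurant-area": 0.5}]
print(ensemble_slot_distribution(dists, [0.6, 0.4]))
# {'hotel-area': 0.62, 'restaurant-area': 0.38}
\end{verbatim}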
\vspace*{-4pt}
\begin{table}[h!]
\centering
\begin{tabular}{ c l }
@@ -52,8 +53,9 @@ where $|K|$ represents the number of prompt functions, $f_{k}$ is the $k$-th pro
\caption{Examples of different prompt functions for ensembling}
\label{table:1}
\end{table}
\vspace{-4pt}
\paragraph{Prompt Augmentation} \textit{Prompt augmentation}, sometimes called \textit{demonstration learning} \citep{gao2021lmbff}, provides a few additional \textit{answered prompts} that demonstrate to the PLM how the actual prompt slot can be answered. The answered prompts will be hand-picked from the training data, and experiments will be conducted on different sets of samples. Table \ref{table:2} provides an example of prompt augmentation; a construction sketch follows the table.
\vspace{-4pt}
\begin{table}[h!]
\centering
\begin{tabular}{ r l }
@@ -65,7 +67,7 @@ where $|K|$ represents the number of prompt functions, $f_{k}$ is the $k$-th pro
\caption{Examples of prompt augmentation with answered prompts}
\label{table:2}
\end{table}
\vspace{-4pt}
\paragraph{Prompt Decomposition} For utterances where multiple slot values should be predicted, directly using a single prompt to generate all the slots is challenging. One intuitive method is to break the prompt down into sub-prompts and generate the slot for each sub-prompt separately: for each candidate value in the utterance, a \textit{value-based} prompt is constructed and its slot is generated. This approach will be used in both the training and testing phases; a sketch follows Table \ref{table:3}. This sort of \textit{prompt decomposition} has been explored by \citet{cui2021template} for the named entity recognition (NER) task.
\begin{table}[h!]
\centering
@@ -79,4 +81,4 @@ where $|K|$ represents the number of prompt functions, $f_{k}$ is the $k$-th pro
\label{table:3}
\end{table}
\newpage
