\section{Motivation}
\paragraph{} Dialog State Tracking (DST) is an essential module in dialog systems: it tracks the user's goals in the form of dialog states given the entire dialog history. A ``dialog state'' (sometimes also called a ``belief state'') contains a set of \textit{(slot, value)} pairs for each turn of the dialog history. Existing data-driven methods and neural models, both for individual dialog modules (NLU, DST, NLG) and for end-to-end dialog systems, show promising results, but they need large amounts of task-specific training data, which is rarely available for new tasks. For DST in particular, collecting dialog state labels can be costly, since domain experts must annotate all possible \textit{(slot, value)} pairs for each turn of the dialogues. A typical task-oriented dialog system relies on an ontology for each domain, with a pre-defined set of slots and all possible values per domain. In real-world applications, defining all possible slots and values for DST is difficult because new domains keep emerging and users' needs continuously evolve.
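To make the data structure concrete, the following is a minimal sketch (not part of the cited work; the domain and slot names are illustrative assumptions) of how a dialog state accumulates \textit{(slot, value)} pairs turn by turn:

```python
def update_state(state, turn_updates):
    """Merge the (slot, value) pairs extracted from one turn into the state."""
    new_state = dict(state)       # keep earlier turns' slots
    new_state.update(turn_updates)  # later turns may overwrite values
    return new_state

state = {}
# Turn 1: user asks for an Italian restaurant.
state = update_state(state, {"restaurant-food": "italian"})
# Turn 2: user adds a price constraint.
state = update_state(state, {"restaurant-pricerange": "cheap"})

print(state)
# {'restaurant-food': 'italian', 'restaurant-pricerange': 'cheap'}
```

A tracker must produce such a state at every turn, which is why exhaustive turn-level annotation by domain experts is expensive.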
\paragraph{} Prompt-based learning \textit{(``pre-train, prompt, and predict'')} is a new paradigm in NLP that predicts the probability of text directly with a pre-trained language model (LM) \citep{liu2021ppp}. This framework is powerful because it allows the LM to be \textit{pre-trained} on massive amounts of raw text, and by defining a new prompting function the model can perform \textit{few-shot} or even \textit{zero-shot} learning. Large pre-trained language models (PLMs) are expected to be useful in \textit{few-shot} scenarios where task-related training data is limited, as they can be ``probed'' efficiently for task-related knowledge with a prompt. One example of such a model is GPT-3 \citep{brown2020gpt3}, introduced as \textit{``Language Models are Few-Shot Learners''}. \citet{madotto2021fsb} built an end-to-end chatbot (the Few-Shot Bot) using \textit{prompt-based few-shot learning} without any gradient fine-tuning and achieved results comparable to the state of the art. Prompt-based learning for few-shot DST with limited labeled domains, however, is still under-explored.
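As a rough illustration of what a prompting function for DST could look like, the sketch below fills a cloze-style template so that an LM's continuation yields a slot value. The template wording, slot names, and function name are hypothetical assumptions for illustration, not the method of any cited paper:

```python
def prompting_function(dialog_history, slot):
    """Build a prompt that asks the LM to complete the value of one slot.
    (Illustrative template; any real system would tune this wording.)"""
    return (f"{dialog_history}\n"
            f"Based on the dialog above, the value of the slot "
            f"'{slot}' is:")

prompt = prompting_function(
    "User: I need a cheap Italian restaurant in the city centre.",
    "restaurant-pricerange",
)
print(prompt)
# The LM would be expected to continue this prompt with a value
# such as "cheap", without any gradient update.
```

The point of the paradigm is that only the prompt changes between tasks; the PLM's parameters stay frozen.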
\paragraph{} Recently, \citet{yang2022prompt} proposed a new prompt learning framework for few-shot DST. Their work designs a \textit{value-based prompt} and an \textit{inverse prompt} mechanism to efficiently train a DST model for domains with limited training data. The approach does not depend on a slot ontology, and the results show that it can generate unseen slots and outperforms existing state-of-the-art few-shot methods. The goal of this thesis is to further explore this prompt-based few-shot learning framework for DST through three tasks: (1) reproduce the prompt learning framework for few-shot DST from \citet{yang2022prompt}, asking whether DST knowledge can be probed from a PLM; (2) evaluate and analyze the belief state predictions, identifying which improvements prompt-based methods bring and what their drawbacks are; (3) extend this prompt-based DST framework with various \textit{multi-prompt} learning methods, asking whether different \textit{multi-prompt} techniques help the PLM better understand the DST task. These research methods are formally described in the later sections of this proposal.