\section{Background \& Related Work}

\subsection{Dialog State Tracking (DST)}
\paragraph{} Task-oriented dialog systems, both modular and end-to-end, solve a wide range of tasks (e.g., ticket booking, restaurant booking) across different domains. Because task-oriented dialog systems must satisfy strict response constraints in order to handle user messages accurately, modular systems were proposed to generate responses in a controllable way. A typical modular system uses a pipeline of four modules that execute sequentially: Natural Language Understanding (NLU), Dialog State Tracking (DST), Policy Learning (POL), and Natural Language Generation (NLG). The focus of this thesis is on the DST module of the modular dialog system. The dialog state tracker infers the belief states (or user goals) from every turn of the dialog history and provides this information to the next module. For example, given the user message \textit{``Plan a train trip to Berlin this Friday''}, the DST module is expected to extract the belief state as (\textit{slot, value}) pairs: \{(\textit{destination, Berlin}), (\textit{day, this Friday})\}.

\subsection{Pre-trained Language Models (PLMs)}
\paragraph{} Large pre-trained language models are trained on huge amounts of textual data and are used to solve a variety of NLP tasks. Pre-trained transformer-based language models such as BERT \citep{devlin2019bert} and GPT \citep{radford2018gpt} have achieved state-of-the-art performance on many tasks. GPT-2 \citep{radford2019gpt2} is a state-of-the-art auto-regressive language model trained on large amounts of open web text. GPT-2 is trained with a simple objective: predict the next word given all previous words in a text. The training objective of a pre-trained LM plays an important role in determining its applicability to particular prompting tasks \citep{liu2021ppp}. For example, left-to-right auto-regressive LMs may be particularly suitable for \textit{prefix} prompts.
\paragraph{} The baseline model of this thesis, \textsc{Soloist} \citep{peng2021soloist}, uses a 12-layer GPT-2. \textsc{Soloist} uses the publicly available 117M-parameter GPT-2 as initialization for task-grounded pre-training. The prompt learning model of this thesis will use \textsc{Soloist} to learn the DST task.

\subsection{SOLOIST}
\paragraph{} \textsc{Soloist} \citep{peng2021soloist} is a task-oriented dialog system that uses transfer learning and machine teaching to build task bots at scale. \textsc{Soloist} follows the \textit{pre-train, fine-tune} paradigm for building end-to-end dialog systems with the transformer-based auto-regressive language model GPT-2 \citep{radford2019gpt2}, subsuming the different dialog modules (i.e., NLU, DST, POL, NLG) into a single model. In the \textit{pre-train, fine-tune} paradigm, a \textit{pre-trained} LM with a fixed architecture is adapted to different downstream tasks by introducing additional parameters and \textit{fine-tuning} them using task-specific objective functions. In the pre-training stage, \textsc{Soloist} is initialized with the 12-layer GPT-2 (117M parameters) and further trained on large heterogeneous dialog corpora. The primary goal at this stage is to learn task completion skills such as DST and POL. Belief state prediction (DST), one of the tasks in this task-grounded pre-training, is the task utilized in this thesis; a sketch of how a dialog turn can be serialized for this task is given below.
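\paragraph{} To make the belief state prediction task more concrete, the following is a minimal sketch of how a single dialog turn can be serialized into one token sequence for an auto-regressive LM, in the spirit of \textsc{Soloist}. The delimiters and field names used here are illustrative assumptions, not the exact serialization of the published implementation.
\begin{verbatim}
# Illustrative sketch only: the delimiters and field names are assumptions
# in the spirit of SOLOIST's text serialization, not its exact format.

def serialize_turn(history, belief_state):
    """Serialize one dialog turn for auto-regressive belief state prediction.

    history      -- list of (speaker, utterance) pairs, oldest first
    belief_state -- dict mapping slot names to values
    """
    context = " ".join(f"{speaker}: {utterance}"
                       for speaker, utterance in history)
    belief = " ; ".join(f"{slot} = {value}"
                        for slot, value in belief_state.items())
    # The LM is trained to generate everything after "=>" given the context.
    return f"{context} => belief state : {belief}"

example = serialize_turn(
    [("user", "Plan a train trip to Berlin this Friday.")],
    {"destination": "Berlin", "day": "this Friday"},
)
# Prints the serialized turn, ending in "destination = Berlin ; day = this Friday".
print(example)
\end{verbatim}
In this formulation, the model receives only the serialized dialog history at inference time and generates the slot-value pairs as text, which are then parsed back into a structured belief state.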
In the fine-tuning stage, the pre-trained \textsc{Soloist} model can be adapted to new tasks using only a handful of task-specific dialogs.
\paragraph{} In this thesis, the pre-trained \textsc{Soloist} serves as the baseline model. In the fine-tuning stage, a multi-domain task-oriented dialog dataset will be used to fine-tune the model on the belief state prediction task. The predictions and results from this task will serve as a point of comparison with the prompt-based model in the detailed analyses.

\subsection{\textit{Pre-train, Prompt, and Predict (PPP)} Paradigm}
\paragraph{} Prompt-based learning (also dubbed \textit{``pre-train, prompt, and predict''}) is a new paradigm that aims to utilize PLMs more efficiently to solve downstream NLP tasks \citep{liu2021ppp}. In this paradigm, instead of adapting pre-trained LMs to downstream tasks via objective engineering, the downstream tasks are reformulated, with the help of a textual \textit{prompt}, to look more like those solved during the original LM training. To perform prediction tasks, the original input $x$ is modified using a \textit{template} into a textual \textit{prompt} $x^{\prime}$ that has some unfilled slots, and then the PLM is used to probabilistically fill in the missing information to obtain a final string $\hat{x}$, from which the final output $y$ can be derived.
\paragraph{} For example, to recognize the emotion in a text with \textit{input} $x = $ ``I missed the bus today.'', the \textit{template} may take a form such as ``$[X]$ I felt so $[Z]$''. The \textit{prompt} $x^{\prime}$ then becomes ``I missed the bus today. I felt so $[Z]$'', and the PLM is asked to fill the slot $[Z]$ with an emotion-bearing word. There are two main varieties of prompts: \textit{cloze prompts}, where the slot $[Z]$ is to be filled in the middle of the text, and \textit{prefix prompts}, where the input text comes entirely before $[Z]$. In general, for tasks solved with a standard auto-regressive LM, prefix prompts tend to be more helpful, as they mesh well with the left-to-right nature of the model; a minimal code sketch of such a prefix prompt is given at the end of this section.
\paragraph{} In this way, by selecting appropriate prompts, the pre-trained LM can be used to predict the desired output, sometimes even without any additional task-specific training. In this thesis, prompt-based methods will be used to help the PLM learn the DST task.

\subsection{Prompt Learning for DST}
\paragraph{} Existing work by \citet{lee2021sdp} uses slots as prompts, along with natural language descriptions of the schema, to generate the corresponding values. This slot-based prompt DST approach uses an encoder-decoder LM with a bi-directional encoder. The method relies on a known slot ontology and requires a large amount of training data to fine-tune the PLM. In real-world applications, defining all possible slots in advance is difficult because new domains and user needs emerge continuously. \citet{yang2022prompt} proposed a new prompt learning framework that uses values as prompts and does not rely on a slot ontology. This thesis will apply the value-based prompt approach to few-shot DST; illustrative slot-based and value-based prompt templates are sketched below.
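\paragraph{} To illustrate how a prefix prompt is passed to an auto-regressive PLM in practice, the following is a minimal sketch that reuses the emotion example from the PPP subsection above with an off-the-shelf GPT-2 through the Hugging Face \texttt{transformers} library. The library choice and the one-token continuation are illustrative assumptions, not part of the cited works.
\begin{verbatim}
# Minimal sketch of prefix prompting with an off-the-shelf GPT-2, using the
# Hugging Face transformers library. The completion in the comment below is
# only an illustration; the actual word depends on the model.
from transformers import pipeline

generator = pipeline("text-generation", model="gpt2")

x = "I missed the bus today."      # original input x
prompt = f"{x} I felt so"          # template "[X] I felt so [Z]"

result = generator(prompt, max_new_tokens=1, do_sample=False)
print(result[0]["generated_text"])
# e.g. "I missed the bus today. I felt so bad"
\end{verbatim}
The prediction $y$ (the emotion label) is derived from the generated word. The same prefix-prompting mechanism can be applied to DST, where the generated text encodes slot or value information, as sketched next.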
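\paragraph{} The following sketch contrasts a slot-based prompt (in the spirit of \citet{lee2021sdp}) with a value-based prompt (in the spirit of \citet{yang2022prompt}) for the train-booking example from the DST subsection. The template wordings are illustrative assumptions; the cited works define their own templates and training procedures.
\begin{verbatim}
# Illustrative DST prompt templates. The exact wording used by the cited
# works differs; these strings are assumed formulations for the example.

dialog = "user: Plan a train trip to Berlin this Friday."

def slot_prompt(dialog, slot, description):
    # Slot-based prompting: the slot name and its schema description are
    # part of the prompt, and the PLM is expected to generate the value.
    return f"{dialog} The value of the slot '{slot}' ({description}) is"

def value_prompt(dialog, value):
    # Value-based prompting: a candidate value found in the dialog is part
    # of the prompt, and the PLM is expected to generate the slot it fills,
    # without requiring a predefined slot ontology.
    return f"{dialog} In this dialog, '{value}' is the value of the slot"

print(slot_prompt(dialog, "destination", "city the user travels to"))
# expected continuation: "Berlin"
print(value_prompt(dialog, "Berlin"))
# expected continuation: "destination"
\end{verbatim}
Because the value-based formulation does not enumerate slots in advance, it avoids the dependency on a predefined ontology that the slot-based formulation requires.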