Dialog datasets

Dialog tasks are generally divided into two broad categories: open-ended conversation (also known as chit-chat) and goal-oriented systems.

Open-ended dialog systems converse on unrestricted subjects and are trained on large-scale corpora such as Twitter conversations, Reddit replies, or similar forum posts. Since most open-ended tasks require generating free-form responses, models typically use the sequence-to-sequence (seq2seq) framework, as in machine translation or text summarization, and are evaluated using a combination of translation metrics (such as BLEU score) and human evaluation.
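
The following is a minimal sketch of unsmoothed sentence-level BLEU in Python. It makes the BLEU part of this evaluation concrete: clipped n-gram precisions for n = 1 to 4 are combined by a geometric mean and multiplied by a brevity penalty. The function names `ngrams` and `sentence_bleu` are illustrative. In practice a maintained implementation such as NLTK's or sacreBLEU would normally be used, and those differ in smoothing and tokenization details.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return a Counter of all n-grams (as tuples) in a token list."""
    return Counter(tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1))


def sentence_bleu(candidate, references, max_n=4):
    """Unsmoothed BLEU of one tokenized candidate against tokenized references."""
    if not candidate:
        return 0.0
    log_precisions = []
    for n in range(1, max_n + 1):
        cand_counts = ngrams(candidate, n)
        total = sum(cand_counts.values())
        if total == 0:
            return 0.0  # candidate is shorter than n tokens
        # Clip each candidate n-gram count by its maximum count in any one reference.
        max_ref = Counter()
        for ref in references:
            for gram, count in ngrams(ref, n).items():
                max_ref[gram] = max(max_ref[gram], count)
        clipped = sum(min(count, max_ref[gram]) for gram, count in cand_counts.items())
        if clipped == 0:
            return 0.0  # without smoothing, one zero precision zeroes the score
        log_precisions.append(math.log(clipped / total))
    # Brevity penalty uses the reference length closest to the candidate length.
    c = len(candidate)
    r = min((abs(len(ref) - c), len(ref)) for ref in references)[1]
    bp = 1.0 if c > r else math.exp(1 - r / c)
    return bp * math.exp(sum(log_precisions) / max_n)


ref = "i would like a table for two".split()
print(sentence_bleu(ref, [ref]))                                # 1.0
print(sentence_bleu("i don't know".split(), [ref]))             # 0.0
print(sentence_bleu("i would like a table".split(), [ref]))     # ≈ 0.670
```

Because the sketch applies no smoothing, a short generic reply that shares no bigram with the reference scores exactly 0. A truncated but otherwise correct reply is penalized only by the brevity penalty, here exp(1 - 7/5) ≈ 0.67. This sensitivity to exact wording is one reason BLEU is paired with human evaluation for open-ended dialog.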

Besides language modelling and generation, the key challenges in building these neural conversational models are the lack of a consistent personality, because the models are trained on many dialogs with different speakers, and a tendency to produce noncommittal answers (such as "I don't know") to almost any utterance.
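
The generic-response problem can be measured. One widely used diagnostic, swapped in here rather than taken from the text, is distinct-n: the number of unique n-grams divided by the total number of n-grams across all of a model's responses. The sketch below also reports the share of the single most frequent response. Both function names are hypothetical.

```python
from collections import Counter
from typing import List


def distinct_n(responses: List[List[str]], n: int) -> float:
    """Unique n-grams divided by total n-grams over all tokenized responses."""
    total = 0
    unique = set()
    for tokens in responses:
        grams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
        total += len(grams)
        unique.update(grams)
    return len(unique) / total if total else 0.0


def top_response_share(responses: List[List[str]]) -> float:
    """Fraction of responses identical to the most frequent response."""
    if not responses:
        return 0.0
    counts = Counter(tuple(r) for r in responses)
    return counts.most_common(1)[0][1] / len(responses)


generic = [["i", "don't", "know"]] * 4
varied = [["the", "table", "is", "booked"], ["see", "you", "at", "eight"]]
print(distinct_n(generic, 1), top_response_share(generic))  # 0.25 1.0
print(distinct_n(varied, 1), top_response_share(varied))    # 1.0 0.5
```

A model stuck on a stock reply drives distinct-n toward 0 and the top-response share toward 1, even when each individual reply is fluent.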

Goal-oriented dialog systems, on the other hand, are designed for narrowly scoped interactions between users and bots, such as customer care services, restaurant reservations, movie bookings, or other concierge services. They are usually evaluated on their ability to predict the dialog state through slot filling or to select the most appropriate response at each turn of the dialog.
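
Both evaluation modes can be sketched in a few lines of Python. The sketch assumes a dialog state is a flat mapping from slot names to values that is updated turn by turn. The state is scored with joint goal accuracy, where a turn counts as correct only if every slot matches the gold state. Response selection is scored with recall@k over a ranked candidate list. The slot names, values, and function names are hypothetical and were chosen to fit the restaurant-reservation example.

```python
from typing import Dict, List

# Hypothetical representation: a dialog state maps slot names to values.
DialogState = Dict[str, str]


def update_state(state: DialogState, turn_slots: DialogState) -> DialogState:
    """Carry the previous state forward and overwrite slots filled this turn."""
    new_state = dict(state)
    new_state.update(turn_slots)
    return new_state


def joint_goal_accuracy(predicted: List[DialogState], gold: List[DialogState]) -> float:
    """Fraction of turns whose predicted state matches the gold state exactly."""
    assert len(predicted) == len(gold)
    correct = sum(p == g for p, g in zip(predicted, gold))
    return correct / len(gold) if gold else 0.0


def recall_at_k(ranked: List[List[str]], gold: List[str], k: int = 1) -> float:
    """Fraction of turns where the gold response is among the top-k candidates."""
    hits = sum(g in cands[:k] for cands, g in zip(ranked, gold))
    return hits / len(gold) if gold else 0.0


# Slot filling: the tracker's third turn gets party_size wrong.
turn_updates = [{"cuisine": "italian"}, {"party_size": "2"}, {"time": "19:00"}]
state: DialogState = {}
predicted: List[DialogState] = []
for update in turn_updates:
    state = update_state(state, update)
    predicted.append(state)
gold_states = [
    {"cuisine": "italian"},
    {"cuisine": "italian", "party_size": "2"},
    {"cuisine": "italian", "party_size": "4", "time": "19:00"},
]
print(joint_goal_accuracy(predicted, gold_states))  # 2 of 3 turns correct

# Response selection: candidates already ranked by some model, best first.
ranked = [["what time?", "how many people?"], ["how many people?", "what time?"]]
gold_responses = ["how many people?", "how many people?"]
print(recall_at_k(ranked, gold_responses, k=1))  # 0.5
print(recall_at_k(ranked, gold_responses, k=2))  # 1.0
```

Joint goal accuracy is deliberately strict. One wrong slot, such as the party size above, makes the whole turn count as a miss.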

The key challenge for goal-oriented systems is combining prior knowledge, conversation history, and context to meet the goals set for them. Hence, the most common architectures involve extending QA models to conduct dialogs, as discussed in the previous section.
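
The sketch below gives a rough illustration of how a QA-style architecture can be reused for dialog. It follows the pattern of a single-hop memory network, which is one QA model that has been applied to goal-oriented dialog. Knowledge-base facts and earlier turns are stored as memories, the current utterance attends over them, and the resulting state scores a fixed set of candidate responses. This is an untrained toy under stated assumptions: bag-of-words counts stand in for learned embeddings, and the memories, candidates, and function names are all hypothetical.

```python
import math
from collections import Counter
from typing import List


def bow(text: str) -> Counter:
    """Bag-of-words vector; a stand-in for a learned embedding (assumption)."""
    return Counter(text.lower().split())


def dot(a: Counter, b: Counter) -> float:
    """Dot product of two sparse bag-of-words vectors."""
    return float(sum(a[t] * b[t] for t in a if t in b))


def softmax(xs: List[float]) -> List[float]:
    """Numerically stable softmax."""
    m = max(xs)
    exps = [math.exp(x - m) for x in xs]
    s = sum(exps)
    return [e / s for e in exps]


def select_response(memories: List[str], query: str, candidates: List[str]) -> str:
    """One attention hop over memories (KB facts + history), then rank candidates."""
    q = bow(query)
    mem_vecs = [bow(m) for m in memories]
    weights = softmax([dot(q, m) for m in mem_vecs])
    # Controller state = query vector + attention-weighted sum of memory vectors.
    state = Counter(q)
    for w, m in zip(weights, mem_vecs):
        for tok, count in m.items():
            state[tok] += w * count
    return max(candidates, key=lambda c: dot(state, bow(c)))


memories = [
    "resto_1 cuisine italian",   # knowledge-base fact
    "resto_1 phone 555-0100",    # knowledge-base fact
    "user: i want italian food", # earlier turn in the conversation
]
query = "what is the phone number"
candidates = [
    "the phone number is 555-0100",
    "the phone number is 555-0199",
    "i don't know",
]
print(select_response(memories, query, candidates))  # the phone number is 555-0100
```

The query alone cannot tell the two phone-number candidates apart. What tips the score toward the correct one is the attention weight on the `resto_1 phone 555-0100` memory. A trained model would instead learn its embedding matrices and typically use several hops.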
