Question-Answering datasets

Question-answering datasets differ in the form of answer they require. The following table briefly summarizes a few popular academic QA datasets and their key characteristics:

Dataset name | Description | Category | URL
bAbI text understanding tasks | This suite of 20 synthetically generated tasks is designed to test fundamental skills that NLU models should possess. In each task, the model must answer a question about the state of its environment, based on a paragraph describing actions taken in that environment. | Answer selection | https://research.fb.com/downloads/babi/
SQuAD: Stanford Question Answering Dataset | SQuAD contains questions about Wikipedia articles and requires the model to select a span of the article itself as the answer to each question. It is one of the most widely used QA datasets. | Answer spanning | https://stanford-qa.com/
VQA: Visual Question Answering Dataset | In VQA, the input to be reasoned over is an image instead of text. The model must learn to reason over pixels to select answers to textual questions about the image. | Answer selection | http://www.visualqa.org/
AI2 Reasoning Challenge | The ARC dataset contains multiple-choice science questions. It was specifically designed to expose the shortcomings of recent neural network models that claim language understanding on the strength of results on easier datasets such as SQuAD and bAbI. | Multiple choice | http://data.allenai.org/arc/
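To make the difference between the answer categories more concrete, here is a minimal Python sketch of how a span-style example and a choice-style example might be represented. The class names, field names, helper function, and toy sentences are all hypothetical and chosen for illustration; they are not the actual file formats or schemas of SQuAD, bAbI, VQA, or ARC.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class SpanExample:
    """Hypothetical span-style record: the answer is a substring of the context."""
    context: str
    question: str
    answer_start: int  # character offset into context, inclusive
    answer_end: int    # character offset into context, exclusive

    def answer(self) -> str:
        # The answer text is recovered by slicing the context.
        return self.context[self.answer_start:self.answer_end]


@dataclass
class ChoiceExample:
    """Hypothetical selection-style record: the answer is one of a fixed set of options."""
    question: str
    choices: List[str]
    label: int  # index of the correct option in choices

    def answer(self) -> str:
        return self.choices[self.label]


def make_span_example(context: str, question: str, answer_text: str) -> SpanExample:
    """Build a SpanExample by locating the first occurrence of answer_text in context."""
    start = context.find(answer_text)
    if start < 0:
        raise ValueError("answer_text must occur verbatim in context")
    return SpanExample(context, question, start, start + len(answer_text))


# Toy data invented for this sketch, not taken from any dataset.
span = make_span_example(
    "Mary moved to the kitchen. John went to the garden.",
    "Where did John go?",
    "the garden",
)
choice = ChoiceExample(
    "Which of these is a gas at room temperature?",
    ["iron", "oxygen", "granite"],
    label=1,
)
```

In this sketch, a model for the span-style task would predict the two offsets, whereas a model for the selection or multiple-choice task would predict a single index into the list of options.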