Question-answering datasets differ in the form in which a response or answer is required. We briefly summarize a few popular academic QA datasets along with their key characteristics:
| Dataset name | Description | Category | URL |
| --- | --- | --- | --- |
| bAbI text understanding tasks | This suite of 20 synthetically generated tasks is designed to test fundamental skills that NLU models should possess. Each task trains a model to answer questions about the state of a simulated environment, based on a paragraph describing various actions taken in that environment. | Answer selection | https://research.fb.com/downloads/babi/ |
| SQuAD: Stanford Question Answering Dataset | SQuAD pairs questions with Wikipedia articles and requires the model to select a span of text from the article as the answer to each question. It is among the most widely used QA datasets today. | Answer span selection | https://stanford-qa.com/ |
| VQA: Visual Question Answering Dataset | In VQA, the input to be reasoned over is an image instead of text. The model must learn to reason over pixels to select answers to textual questions about the image. | Answer selection | http://www.visualqa.org/ |
| AI2 Reasoning Challenge | The ARC dataset contains multiple-choice science questions. It was specifically designed to expose the shortcomings of neural models that appear to perform language understanding on easier datasets such as SQuAD and bAbI. | Multiple choice | http://data.allenai.org/arc/ |
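To make the "answer span" format concrete, the sketch below shows how a SQuAD-style record encodes its answer: as a character offset (`answer_start`) into the context paragraph, from which the answer text can be recovered. The record itself is a hypothetical example constructed for illustration, not an entry from the real dataset.

```python
# Hypothetical SQuAD-style record: the gold answer is stored as a
# character offset into the context plus the answer text itself.
context = (
    "The Stanford Question Answering Dataset (SQuAD) was released in 2016. "
    "It contains questions posed on Wikipedia articles."
)

qa = {
    "question": "When was SQuAD released?",
    "answer_text": "2016",
    "answer_start": context.index("2016"),  # character offset into context
}

def extract_span(context: str, start: int, text: str) -> str:
    """Recover the answer span from its character offset and length."""
    return context[start:start + len(text)]

answer = extract_span(context, qa["answer_start"], qa["answer_text"])
print(answer)  # -> 2016
```

A model trained on this format predicts the start and end positions of the span rather than generating free-form text, which is what distinguishes the answer-span category from answer selection or multiple choice.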