Benchmarking datasets

Image classification, or, for that matter, any classification task, is inherently a supervised learning problem. Supervised models learn to distinguish the different classes from the labeled examples in the underlying training sets.

Even though CNNs are feedforward networks optimized through weight sharing, the number of parameters to train in a deep ConvNet can still be huge, as the short sketch below illustrates. This is one of the reasons why large training sets are required to achieve better-performing networks. Luckily, research groups across the globe have been working toward collecting, hand-annotating, and crowdsourcing different datasets. These datasets are used to benchmark the performance of different algorithms, as well as to identify winners in different competitions.
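To get a sense of the scale involved, the following is a minimal sketch, assuming TensorFlow with its bundled Keras API is installed; the layer sizes here are purely illustrative and not tied to any specific architecture discussed in this book:

```python
# Illustrative only: a small ConvNet whose parameter count still runs into
# the tens of millions, which is why large labeled training sets matter.
import tensorflow as tf
from tensorflow.keras import layers, models

model = models.Sequential([
    layers.Input(shape=(224, 224, 3)),             # ImageNet-sized color input
    layers.Conv2D(64, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Conv2D(128, (3, 3), activation='relu'),
    layers.MaxPooling2D((2, 2)),
    layers.Flatten(),
    layers.Dense(256, activation='relu'),          # the dense layers dominate the count
    layers.Dense(1000, activation='softmax')       # for example, the 1,000 ILSVRC classes
])

model.summary()  # roughly 96 million trainable parameters for this small model
```

Even with convolutional weight sharing, the fully connected layers at the end quickly push the total into the tens of millions, and training that many parameters well requires correspondingly large amounts of labeled data.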

The following is a brief listing of widely accepted benchmarking datasets in the field of image classification:

  • ImageNet: With over 14 million hand-annotated, high-resolution color images spanning more than 20,000 categories, this is the gold-standard visual dataset. It was designed for visual object recognition tasks by the computer science department at Princeton University in 2009. Since then, a trimmed version with 1,000 non-overlapping classes has served as the basis of the ImageNet Large Scale Visual Recognition Challenge (https://arxiv.org/abs/1409.0575).
  • 80 Million Tiny Images dataset: As the name suggests, this MIT dataset contains 80 million images collected from the internet and labeled with more than 75,000 different non-abstract English nouns. It also forms the basis for several other widely used datasets, including the CIFAR datasets.
  • CIFAR-10: Developed by the Canadian Institute for Advanced Research, CIFAR-10 is one of the most widely used datasets in machine learning (ML) research. It contains 60,000 low-resolution (32 x 32) color images spanning 10 non-overlapping classes, and it can be loaded directly in code, as shown in the sketch after this list.
  • CIFAR-100: From the same research group, this dataset contains 60,000 images spread evenly across 100 classes (600 images per class).
  • Common Objects in Context (COCO): This is a large-scale visual dataset for object detection, segmentation, and captioning. It contains more than 200,000 labeled images spanning a wide range of everyday object classes.
  • Open Images: This is possibly one of the biggest annotated datasets available for use. Version 4 of this dataset contains more than 9 million annotated images.
  • Caltech 101 and Caltech 256: These datasets contain annotated images spanning 101 and 256 categories, respectively. Caltech 101 contains around 9,000 images, while Caltech 256 contains close to 30,000.
  • Stanford Dogs dataset: This is an interesting dataset specific to dog breeds. It contains over 20,000 color images spanning 120 different breeds.
  • MNIST: One of the most famous visual datasets of all time, MNIST has become the de facto "Hello, World" dataset for ML enthusiasts. It contains 70,000 hand-labeled images of handwritten digits (zero to nine), split into 60,000 training and 10,000 test examples.
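Several of these benchmarks ship with popular deep learning libraries. The following sketch assumes TensorFlow with its bundled Keras API is installed and shows how CIFAR-10 and MNIST can each be pulled in with a single call; the larger datasets, such as ImageNet, COCO, and Open Images, must be downloaded separately from their respective project sites:

```python
# Minimal loading sketch for two of the bundled benchmark datasets.
import tensorflow as tf

# CIFAR-10: 50,000 training and 10,000 test images, 32 x 32 RGB, 10 classes
(cifar_x_train, cifar_y_train), (cifar_x_test, cifar_y_test) = (
    tf.keras.datasets.cifar10.load_data()
)

# MNIST: 60,000 training and 10,000 test images, 28 x 28 grayscale digits
(mnist_x_train, mnist_y_train), (mnist_x_test, mnist_y_test) = (
    tf.keras.datasets.mnist.load_data()
)

print(cifar_x_train.shape)  # (50000, 32, 32, 3)
print(mnist_x_train.shape)  # (60000, 28, 28)
```

The first call downloads and caches each dataset locally, so subsequent runs load it straight from disk.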

The preceding list is just the tip of the iceberg. There are numerous other datasets capturing different aspects of the world. Preparing them is a painful and time-consuming process, but these datasets are what make deep learning so successful in its current form. Readers are encouraged to explore these and other such datasets in detail to understand their nuances and the challenges each of them poses. We will be using some of these datasets in this and the coming chapters to understand transfer learning concepts.
