Description of the image dataset

A challenge like this needs a real dataset. Several platforms make such datasets publicly available, sometimes subject to terms and conditions. One of them is Kaggle, which hosts ML challenges where data analytics and ML practitioners can compete for prizes. The Yelp dataset and its description can be found at https://www.kaggle.com/c/yelp-restaurant-photo-classification.

Yelp users select the restaurant labels manually when they submit a review. The dataset contains nine different labels, annotated by the Yelp community:

  • 0: good_for_lunch
  • 1: good_for_dinner
  • 2: takes_reservations
  • 3: outdoor_seating
  • 4: restaurant_is_expensive
  • 5: has_alcohol
  • 6: has_table_service
  • 7: ambience_is_classy
  • 8: good_for_kids
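Because a restaurant can carry several of these labels at once, a common way to represent them is a nine-element 0/1 vector (a multi-hot encoding). The following is a minimal sketch of that idea, not code from the dataset itself. It assumes the labels of a business arrive as a space-separated string of indices such as "1 2 5"; the helper names encode_labels and decode_labels are hypothetical.

```python
# Label names in index order, copied from the list above.
LABEL_NAMES = [
    "good_for_lunch",
    "good_for_dinner",
    "takes_reservations",
    "outdoor_seating",
    "restaurant_is_expensive",
    "has_alcohol",
    "has_table_service",
    "ambience_is_classy",
    "good_for_kids",
]


def encode_labels(label_string):
    """Turn a space-separated string of label indices into a 0/1 vector.

    Assumption: indices are separated by whitespace, e.g. "1 2 5".
    An empty string yields a vector of all zeros.
    """
    vector = [0] * len(LABEL_NAMES)
    for token in label_string.split():
        vector[int(token)] = 1
    return vector


def decode_labels(vector):
    """Return the names of the labels whose position in the vector is set."""
    return [LABEL_NAMES[i] for i, flag in enumerate(vector) if flag]


if __name__ == "__main__":
    encoded = encode_labels("1 2 5")
    print(encoded)
    print(decode_labels(encoded))
```

With this representation, predicting the labels amounts to producing nine independent yes/no decisions per restaurant.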

The task is to predict these nine labels as accurately as possible. Note that since Yelp is a community-driven website, the dataset contains duplicated images for several reasons. For example, users can accidentally upload the same photo to the same business more than once, or chain businesses can upload the same photo to different branches. There are six files in the dataset, as follows:

  • train_photos.tgz: Photos to be used as the training set (234,545 images)
  • test_photos.tgz: Photos to be used as the test set (500 images)
  • train_photo_to_biz_ids.csv: Maps each training photo ID to a business ID (234,545 rows)
  • test_photo_to_biz_ids.csv: Maps each test photo ID to a business ID (500 rows)
  • train.csv: The main training dataset, containing business IDs and their corresponding labels (1,996 rows)
  • sample_submission.csv: A sample submission showing the correct format for your predictions, including business_id and the corresponding predicted labels
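Since the labels are attached to businesses rather than to individual photos, the mapping file and train.csv have to be combined before training. The sketch below shows one way this could be done with the Python standard library. The column names photo_id, business_id, and labels, the space-separated label format, and the function names are assumptions made for illustration; check the header rows of the actual files. The small in-memory samples stand in for the real CSV files.

```python
import csv
import io
from collections import defaultdict

# Tiny stand-ins for train_photo_to_biz_ids.csv and train.csv.
# The column names here are assumptions, not confirmed by the text.
SAMPLE_PHOTO_TO_BIZ = "photo_id,business_id\n101,b1\n102,b1\n103,b2\n104,b3\n"
SAMPLE_TRAIN = "business_id,labels\nb1,1 2 5\nb2,0 8\n"


def photos_by_business(photo_to_biz_file):
    """Group photo IDs under the business they belong to."""
    groups = defaultdict(list)
    for row in csv.DictReader(photo_to_biz_file):
        groups[row["business_id"]].append(row["photo_id"])
    return dict(groups)


def labels_by_business(train_file):
    """Map each business ID to its list of integer label indices."""
    return {
        row["business_id"]: [int(t) for t in (row["labels"] or "").split()]
        for row in csv.DictReader(train_file)
    }


def build_training_examples(photo_to_biz_file, train_file):
    """Pair every labelled business with its photos.

    Businesses that have photos but no row in the label file are skipped.
    """
    photos = photos_by_business(photo_to_biz_file)
    labels = labels_by_business(train_file)
    return {
        biz: (photos[biz], labels[biz])
        for biz in labels
        if biz in photos
    }


if __name__ == "__main__":
    examples = build_training_examples(
        io.StringIO(SAMPLE_PHOTO_TO_BIZ), io.StringIO(SAMPLE_TRAIN)
    )
    for biz, (photo_ids, label_ids) in examples.items():
        print(biz, photo_ids, label_ids)
```

To run this against the real data, the io.StringIO objects would be replaced with files opened using open(path, newline="").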