The train/test dataset split

For the time being, be aware that we need to split our dataset into two sets: training and test. As mentioned in Chapter 1, Setup and Introduction to TensorFlow, this is necessary because we need a way to check whether the model can generalize beyond its training samples, that is, whether it can correctly recognize images that it never saw during training. If our model can't do this, it isn't of much use to us.

Here are a few other important points to remember:

  • The training and testing data must come from the same distribution (so combine and shuffle all your data before splitting).
  • The training set is often bigger than the test set (for instance, training: 70% of total, testing: 30% of total).
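As a rough illustration of both points, here is a minimal sketch of a shuffled 70/30 split written with NumPy. The helper name `shuffle_and_split`, its parameters, and the fake 28x28 "images" are hypothetical and used only for illustration; they are not from this book's code. The key idea is that a single random permutation is applied to both features and labels, so the data is mixed before splitting and every image stays paired with its label.

```python
import numpy as np

def shuffle_and_split(features, labels, train_fraction=0.7, seed=0):
    """Hypothetical helper: shuffle samples, then split them into train/test sets."""
    if len(features) != len(labels):
        raise ValueError("features and labels must have the same number of samples")
    rng = np.random.default_rng(seed)
    # One shared permutation of sample indices keeps each image paired with its label.
    indices = rng.permutation(len(features))
    n_train = int(round(train_fraction * len(features)))
    train_idx, test_idx = indices[:n_train], indices[n_train:]
    return (features[train_idx], labels[train_idx],
            features[test_idx], labels[test_idx])

# Usage: 100 fake grayscale "images" of 28x28 pixels, each labeled with one of 10 classes.
images = np.random.default_rng(1).random((100, 28, 28))
labels = np.arange(100) % 10
x_train, y_train, x_test, y_test = shuffle_and_split(images, labels)
print(x_train.shape, x_test.shape)  # (70, 28, 28) (30, 28, 28)
```

Fixing the `seed` makes the split reproducible between runs, which helps when comparing models on the same test set. Libraries such as scikit-learn also provide a ready-made `train_test_split` function for the same job.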

For the examples in these early chapters, these basics will be enough, but in later chapters we will look in more detail at how to set up datasets properly for bigger projects.
