In the following code, we retain the analysis performed up to step 6 in Scenario 1; the only change is the model architecture. The architecture below uses more aggressive pooling than the one in Scenario 1: a larger pooling window in each layer means that each pooled activation summarizes a larger area of the feature map than with smaller pool sizes. The architecture of the model is as follows:
from keras.models import Sequential
from keras.layers import Conv2D, MaxPooling2D, Flatten, Dense

model = Sequential()
model.add(Conv2D(64, kernel_size=(3, 3), activation='relu', input_shape=(300, 300, 3)))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Conv2D(128, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Conv2D(256, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(MaxPooling2D(pool_size=(3, 3)))
model.add(Conv2D(512, kernel_size=(3, 3), activation='relu', padding='same'))
model.add(Flatten())
model.add(Dense(100, activation='relu'))
model.add(Dense(1, activation='sigmoid'))
model.summary()
Note that in this architecture the pool size is 3 x 3, not 2 x 2 as in the previous scenario.
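To see how much more aggressively 3 x 3 pooling shrinks the feature maps than 2 x 2 pooling, the spatial size can be traced through the layers with simple arithmetic. The following is a sketch (the helper function name is ours, not from the original code); it assumes the layer order of the architecture above, where a 'valid' 3 x 3 convolution shrinks each side by 2, a 'same' convolution preserves it, and max-pooling with pool size p (and default stride p) floor-divides each side by p:

```python
def feature_map_sizes(side, pool):
    """Trace the spatial side length of the feature maps through the network:
    one 'valid' 3x3 conv, then three (pool -> 'same' 3x3 conv) stages."""
    sizes = []
    side -= 2            # first Conv2D has no padding ('valid'): 300 -> 298
    sizes.append(side)
    for _ in range(3):   # three MaxPooling2D layers; 'same' convs keep the size
        side //= pool
        sizes.append(side)
    return sizes

print(feature_map_sizes(300, pool=3))  # [298, 99, 33, 11] - this scenario
print(feature_map_sizes(300, pool=2))  # [298, 149, 74, 37] - Scenario 1
```

With 3 x 3 pooling, the final feature map is 11 x 11, so Flatten produces 11 * 11 * 512 = 61,952 units, versus 37 x 37 * 512 = 700,928 units with 2 x 2 pooling: the denser pooling drastically reduces the input to the dense layers.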
Once we fit the model on the input and output arrays, we can examine the variation of accuracy and loss on the train and test datasets:
model.compile(loss='binary_crossentropy',optimizer='adam',metrics=['accuracy'])
history = model.fit(X_train, y_train, batch_size=32,epochs=10,verbose=1,validation_data = (X_test, y_test))
The following is the output of the preceding code:
We can see that the model achieves ~70% accuracy in correctly classifying gender on the test images. However, there is a considerable amount of overfitting on the training dataset: the training loss decreases steadily, while the test loss does not.
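One way to make this overfitting pattern concrete is to scan the training history for epochs where the training loss fell but the validation loss rose. The following is a minimal sketch, assuming a Keras-style `history.history` dictionary with `'loss'` and `'val_loss'` lists; the function name and the example loss values are hypothetical, chosen only to mimic the divergence described above:

```python
def overfitting_epochs(history_dict):
    """Return the epochs (0-indexed) at which training loss decreased
    while validation loss increased relative to the previous epoch."""
    loss = history_dict['loss']
    val_loss = history_dict['val_loss']
    return [
        epoch
        for epoch in range(1, len(loss))
        if loss[epoch] < loss[epoch - 1] and val_loss[epoch] > val_loss[epoch - 1]
    ]

# Hypothetical loss curves shaped like the behavior described above:
# training loss keeps dropping while validation loss stalls and rises.
example = {
    'loss':     [0.68, 0.55, 0.44, 0.35, 0.27],
    'val_loss': [0.66, 0.60, 0.62, 0.65, 0.70],
}
print(overfitting_epochs(example))  # [2, 3, 4]
```

In practice, you would pass `history.history` from the `model.fit` call above; a long run of such epochs is the signature of the train/test divergence seen here.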