In steps 1 and 2, we create the first convolution layer, whose output is a four-dimensional tensor. The first dimension (?) is the batch dimension and can hold any number of input images, the second and third dimensions represent the height (16 pixels) and width (16 pixels) of each convolved image, and the fourth dimension represents the number of channels (64) produced, one for each convolution filter. In steps 3 and 5, we extract the final weights of the convolution layer, as shown in the following screenshot:
In step 4, we plot the output of the first convolution layer, as shown in the following screenshot:
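
To make the (?, 16, 16, 64) output shape of the first convolution layer more concrete, the following minimal sketch computes it by hand. It rests on assumptions that are not stated in this section: a 32 x 32 pixel input image, a stride-1 convolution followed by 2 x 2 max pooling with stride 2, and TensorFlow-style `SAME` padding, under which each spatial dimension becomes ceil(size / stride). The function names are hypothetical and chosen only for illustration.

```python
import math


def same_padding_size(size, stride):
    """Spatial output size under 'SAME' padding: ceil(size / stride)."""
    return math.ceil(size / stride)


def conv_pool_output_shape(height, width, num_filters,
                           conv_stride=1, pool_stride=2):
    """Shape of a convolution layer followed by max pooling.

    Assumes 'SAME' padding for both operations (an assumption, not
    something confirmed by the recipe). The leading None plays the
    role of the '?' batch dimension: any number of input images.
    """
    # Convolution step: stride 1 keeps the spatial size unchanged.
    h = same_padding_size(height, conv_stride)
    w = same_padding_size(width, conv_stride)
    # Pooling step: stride 2 halves the height and width.
    h = same_padding_size(h, pool_stride)
    w = same_padding_size(w, pool_stride)
    # One output channel per convolution filter.
    return (None, h, w, num_filters)


# Assumed 32 x 32 input and 64 filters.
shape = conv_pool_output_shape(32, 32, 64)
print(shape)  # (None, 16, 16, 64)
```

If the layer in the recipe uses a different input size, stride, or padding, the same arithmetic applies with those values substituted.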