My code is correct, but I don’t understand

Week 1, second lab
For the Keras layers:
Why is tfl.ZeroPadding2D used here and not ZeroPadding3D? I know the TensorFlow website says that 2D is for images. But our input is (64, 64, 3); if I understand correctly, 64 and 64 are the image height and width, and 3 is the number of channels. Why don't we say the whole object is 3D?
And why is the padding 3 here, and not 4, 5, or 6?
Also, BatchNormalization uses axis=3. Does a 2D input have an axis 3?

The "2D" in 2D operations (conv2d, *pooling2d) comes from the fact that you only provide details about the width and height. The third dimension, the number of channels, is inferred from the input.

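As far as batch normalization is concerned, you have to specify which axis holds the features (here, the channels), hence the use of axis=3. Counting from 0 and including the batch dimension, a batch of images has shape (batch, height, width, channels), so the channels are on axis 3.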
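To make the shapes concrete, here is a minimal sketch (not the assignment's code). The batch of 2 random images is made up, and the Conv2D with 32 filters of size 7x7 and stride 1 is an assumption on my part about what follows the padding. Under that assumption, a padding of 3 is the amount that keeps the output at 64x64, which would explain the choice of 3 over 4, 5, or 6.

```python
import tensorflow as tf

tfl = tf.keras.layers

# Hypothetical batch of 2 RGB images: (batch, height, width, channels)
x = tf.random.uniform((2, 64, 64, 3))

# ZeroPadding2D pads only the two spatial axes (height and width) by 3 on each side
padded = tfl.ZeroPadding2D(padding=3)(x)
print(tuple(padded.shape))  # (2, 70, 70, 3)

# Assumption: a 7x7 convolution with stride 1 and the default 'valid' padding.
# 70 - 7 + 1 = 64, so the spatial size is back to 64x64.
conv = tfl.Conv2D(filters=32, kernel_size=7, strides=1)(padded)
print(tuple(conv.shape))  # (2, 64, 64, 32)

# axis=3 is the channels axis of the 4D batch; one scale/offset per channel
bn = tfl.BatchNormalization(axis=3)
out = bn(conv)
print(tuple(bn.gamma.shape))  # (32,)
```

Note that the number of input channels (3) is never passed to Conv2D; it is picked up from the input tensor.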

Yes, the 2D versus 3D is referring to the “spatial” dimensions of the input values. For the 2D case, you have 2 spatial dimensions: height and width, e.g. for an image, and then the third dimension is “channels”, which would typically be the color values of the pixels in the image case. But you can also have images with 3 spatial dimensions: consider the case of CT scans or other medical scans which have 3 spatial dimensions: height, width and depth. And then you have a “channels” dimension that will give the values of the sensors corresponding to the point in 3D space. So the tensors will be 4D, but with 3 spatial dimensions. In that case, you would use Conv3D or Pooling3D.
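For comparison, a small sketch of the 3D case. The volume size and the single channel are made-up values for illustration, not taken from any real scan data. With the batch dimension included, the input is a 5D tensor.

```python
import tensorflow as tf

tfl = tf.keras.layers

# Hypothetical batch with one volume: (batch, depth, height, width, channels)
scan = tf.random.uniform((1, 16, 32, 32, 1))

# Conv3D slides a 3x3x3 kernel over all three spatial axes ('valid' padding by default)
features = tfl.Conv3D(filters=4, kernel_size=3)(scan)
print(tuple(features.shape))  # (1, 14, 30, 30, 4)

# MaxPooling3D halves each of the three spatial axes
pooled = tfl.MaxPooling3D(pool_size=2)(features)
print(tuple(pooled.shape))  # (1, 7, 15, 15, 4)
```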


Besides all the comments above, I would also recommend that you:

  1. Read the whole TensorFlow documentation page for ZeroPadding2D, especially the Input shape section. The page tells us what the layer does, while the Input shape section tells us what input it expects.

  2. In the assignment, print the shape of our input, X_train. It is definitely not (64, 64, 3).
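A quick way to see the difference between the shape of one image and the shape of the whole input is sketched below. The X_train here is only a stand-in filled with zeros, and the 600 examples are a hypothetical count, so check the real array in your notebook.

```python
import numpy as np
import tensorflow as tf

# Stand-in for the assignment's X_train (hypothetical: 600 examples)
X_train = np.zeros((600, 64, 64, 3), dtype="float32")

print(X_train.shape)     # (600, 64, 64, 3): the full input is 4D
print(X_train[0].shape)  # (64, 64, 3): a single image

# Keras takes the per-example shape and adds the batch axis itself
inputs = tf.keras.Input(shape=(64, 64, 3))
print(tuple(inputs.shape))  # (None, 64, 64, 3)
```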

Good luck!