Convolution_model_Step_by_Step_v1: Why is np.pad that shape in the first code section

JoeDS · December 2, 2021, 10:21pm

The “correct” line of code in the first Exercise is:

X_pad = np.pad(X, ((0,0), (pad, pad), (pad, pad), (0, 0)), 'constant');

But I’m confused as to how this is being applied to what seems (to me) a series of pictures of arbitrary size.
Why is it not
X_pad = np.pad(X, ((pad, pad), (pad, pad)), 'constant'); ?
It’s a single image (not RGB) … with only two dimensions. Where are the other ‘(0,0)’ terms coming from?

paulinpaloalto · December 3, 2021, 3:22am

Take another look. The input is a 4D tensor. It tells you that in the “docstring” and other places as well. You could even look at the test code to see the shape of the inputs it is passing you. The height and width dimensions are the 2nd and 3rd of the 4 dimensions. The first dimension is the “samples” dimension. The last dimension is the “channels” dimension, which for an image is the RGB color values.
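
Here is a minimal sketch of what that looks like in practice. The shapes and the pad value are made up for illustration and are not the ones used by the assignment's test code:

```python
import numpy as np

# Hypothetical batch: 4 samples, each 3 x 3 pixels with 2 channels.
X = np.ones((4, 3, 3, 2))
pad = 2

# np.pad takes one (before, after) pair per axis, in axis order:
# samples, height, width, channels. Only height and width get padded.
X_pad = np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)), 'constant')

print(X.shape)      # (4, 3, 3, 2)
print(X_pad.shape)  # (4, 7, 7, 2)
```

The two `(0, 0)` pairs say "add nothing before or after" along the samples and channels axes, so the number of images and the number of channels stay the same.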

avo_jojo13 · February 3, 2023, 7:28am

Hi, thanks for your explanation. However, I’m still not very clear on exactly how this works.

Say for a 5D tensor, will the height and width dimensions still be the 2nd and 3rd dimensions? How will we know?

Could you also explain what a samples dimension is?

Thanks!

paulinpaloalto · February 3, 2023, 3:56pm

There is no fixed standard for what the dimensions are in general. If you have a 5D tensor, you’ll need to find the documentation for how it was generated. One such example I can think of would be volumetric medical images like CT scans. There you will have 3 spatial dimensions (height, width and depth) and then the number of channels will be determined by what the outputs of the sensors are. So if you have a batch of such samples, it would end up being a 5D tensor, since each “image” has 4 dimensions and you have multiple such images.
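
As a rough sketch of the 5D case, assuming the axes are ordered samples, height, width, depth, channels (that ordering is an assumption for this example; real data may use a different one, which is why you need the documentation):

```python
import numpy as np

# Hypothetical batch of 2 volumetric scans, each 8 x 8 x 8 with 1 channel.
# Assumed axis order: (samples, height, width, depth, channels).
scans = np.ones((2, 8, 8, 8, 1))
pad = 1

# Pad the three spatial axes only; leave samples and channels untouched.
scans_pad = np.pad(
    scans,
    ((0, 0), (pad, pad), (pad, pad), (pad, pad), (0, 0)),
    'constant',
)

print(scans_pad.shape)  # (2, 10, 10, 10, 1)
```

The pattern is the same as in the 4D case: one pair per axis, with `(0, 0)` on every axis that should not grow.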

But here in DLS, images are always formatted the same way:

m x h x w x c

Where m is the number of samples (the first dimension), h is the height in pixels (the second dimension), w is the width in pixels (the third dimension), and c is the number of channels (the fourth and last dimension). c is typically 3 for an RGB image, but we sometimes see 4 channels on PNG images.

To understand the meaning of the “samples” dimension, the point is that we are typically dealing with multiple images in a “batch” of data. So each individual image has 3 dimensions: h, w and c, right? If you want to stack a bunch of images together so that you can deal with them as a batch, you need another dimension. You have two reasonable choices: stack them along a new first dimension or a new last dimension. Prof Ng (like most other people, as far as I’ve seen) chooses to stack the images so that the first dimension selects the individual samples. So for each value of the first index, you get the ranges of h, w and c for one particular image in the batch. You will see this play out in the “for” loops as you implement conv_forward and pool_forward in the first assignment here in DLS C4 W1.
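
A small sketch of that stacking, using made-up image sizes (this is only an illustration of the layout, not the assignment's code):

```python
import numpy as np

# Six hypothetical images, each h x w x c; image i is filled with the value i
# so that the individual samples are easy to tell apart.
h, w, c = 5, 4, 3
images = [np.full((h, w, c), fill_value=i, dtype=float) for i in range(6)]

# Stack along a new first axis to get the m x h x w x c layout.
batch = np.stack(images, axis=0)
print(batch.shape)  # (6, 5, 4, 3)

# Indexing the first axis selects one whole image.
for i in range(batch.shape[0]):
    one_image = batch[i]
    assert one_image.shape == (h, w, c)
```

Looping over the first index like this is the same idea as the outermost loop over samples in conv_forward and pool_forward.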
