DLS Course 4: Week 1 CNN step by step programming assignment shape understanding problem

Hi,

It seems I am having a basic understanding problem, as I am struggling with the shape of X.

This is X:

X = [[[[ 1.62434536, -0.61175641]
[-0.52817175, -1.07296862]
[ 0.86540763, -2.3015387 ]]

[[ 1.74481176, -0.7612069 ]
[ 0.3190391, -0.24937038]
[ 1.46210794, -2.06014071]]

[[-0.3224172, -0.38405435]
[ 1.13376944, -1.09989127]
[-0.17242821, -0.87785842]]]

[[[ 0.04221375, 0.58281521]
[-1.10061918, 1.14472371]
[ 0.90159072, 0.50249434]]

[[ 0.90085595, -0.68372786]
[-0.12289023, -0.93576943]
[-0.26788808, 0.53035547]]

[[-0.69166075, -0.39675353]
[-0.6871727, -0.84520564]
[-0.67124613, -0.0126646 ]]]

[[[-1.11731035, 0.2344157 ]
[ 1.65980218, 0.74204416]
[-0.19183555, -0.88762896]]

[[-0.74715829, 1.6924546 ]
[ 0.05080775, -0.63699565]
[ 0.19091548, 2.10025514]]

[[ 0.12015895, 0.61720311]
[ 0.30017032, -0.35224985]
[-1.1425182 , -0.34934272]]]

[[[-0.20889423, 0.58662319]
[ 0.83898341, 0.93110208]
[ 0.28558733, 0.88514116]]

[[-0.75439794, 1.25286816]
[ 0.51292982, -0.29809284]
[ 0.48851815, -0.07557171]]

[[ 1.13162939, 1.51981682]
[ 2.18557541, -1.39649634]
[-1.44411381, -0.50446586]]]]

The notebook description mentions x.shape = (4, 3, 3, 2), which I do not fully understand.
So, we have 4 examples, each of them has 3 rows (for RGB?) with 2 columns? But what is the fourth parameter of the shape? What do the tuples in the single brackets actually stand for?

Any enlightenment on my understanding problem would be highly appreciated. Thanks in advance for your effort.

Best regards,
Dan

So we have 4 examples (why 4? Aren't there 9 pixels in the plot?) that are surrounded by [[[…]]], each of them has 2 columns (for what exactly?) and 3 times (because of RGB?) 3 rows in each of them surrounded by [[…]]. What is the purpose of these?

Thanks,
Dan

Hey @dstefane,
[Image: Screenshot from 2022-05-02 18-43-06]

I have attached a screenshot above of the code cell that we are talking about. In this code cell, X is just an example array used to check our function zero_pad, so it is not meant to correspond to any of the images shown in the notebook.

As for the shape, it is mentioned in the previous code cell: “X – python numpy array of shape (m, n_H, n_W, n_C) representing a batch of m images”. The first dimension represents the number of examples; the 2nd, 3rd and 4th dimensions represent the height, width and number of channels of a single example.

For your understanding, let me present a simple example. Consider that we have 128 RGB images, each having a spatial resolution of 32*32. In that case, X.shape = (128, 32, 32, 3). I hope this helps.
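To make the axis order concrete, here is a minimal NumPy sketch. The random values and the variable names are illustrative assumptions only, not the notebook's code; the point is how indexing peels off one dimension at a time.

```python
import numpy as np

# Illustrative batch: m=4 examples, n_H=3, n_W=3, n_C=2 (values are arbitrary).
rng = np.random.default_rng(0)
X = rng.standard_normal((4, 3, 3, 2))

m, n_H, n_W, n_C = X.shape

example = X[0]           # one example: shape (n_H, n_W, n_C) = (3, 3, 2)
row = X[0, 0]            # one row of that example: shape (n_W, n_C) = (3, 2)
pixel = X[0, 0, 0]       # one pixel: its n_C = 2 channel values, shape (2,)
channel = X[0, :, :, 1]  # second channel of the first example: shape (3, 3)

print(X.shape, example.shape, row.shape, pixel.shape, channel.shape)
```

Read this way, each innermost bracket pair in the printed array is a single pixel holding one value per channel, each group of 3 such pairs is one row, and each group of 3 rows is one example.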

Regards,
Elemento


Hi Elemento,

thanks for your reply!

So, shape (4, 3, 3, 2) means we have 4 examples of 3(n_H) x 3(n_W) pictures with only 2 channels, right? It confused me that it is not related to the description in the text.

Thanks for your help.

Best regards,
Dan

Yes @dstefane, you are absolutely correct, and you're most welcome.
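As a rough illustration of why the axis order matters for zero padding, here is a small sketch. The helper name zero_pad_sketch is hypothetical and this is not necessarily how the assignment expects it to be written; it only assumes that padding should apply to the height and width axes and leave the example and channel axes alone.

```python
import numpy as np

def zero_pad_sketch(X, pad):
    """Hypothetical helper: zero-pad only the height and width axes of X."""
    # One (before, after) pair per axis: no padding on the example axis (0)
    # or the channel axis (3), `pad` zeros on each side of axes 1 and 2.
    return np.pad(X, ((0, 0), (pad, pad), (pad, pad), (0, 0)),
                  mode="constant", constant_values=0)

X = np.ones((4, 3, 3, 2))      # dummy batch: 4 examples, 3x3 pixels, 2 channels
X_pad = zero_pad_sketch(X, 2)

print(X.shape)      # (4, 3, 3, 2)
print(X_pad.shape)  # (4, 7, 7, 2)
```

Only n_H and n_W grow (3 + 2*2 = 7); the number of examples and the number of channels are unchanged.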

Regards,
Elemento
