Convolution_model_Step_by_Step_v1 Exercise - 1

Azfal_Peermohammed · August 30, 2023, 12:41am

I am confused about the first exercise. I ended up getting the answer correct but I do not understand conceptually why it is correct.

The test case is initialized so that x.shape = (4, 3, 3, 2).

x ends up looking like this:
array([[[[ 1.62434536, -0.61175641],
     [-0.52817175, -1.07296862],
     [ 0.86540763, -2.3015387 ]],

    [[ 1.74481176, -0.7612069 ],
     [ 0.3190391 , -0.24937038],
     [ 1.46210794, -2.06014071]],

    [[-0.3224172 , -0.38405435],
     [ 1.13376944, -1.09989127],
     [-0.17242821, -0.87785842]]],


   [[[ 0.04221375,  0.58281521],
     [-1.10061918,  1.14472371],
     [ 0.90159072,  0.50249434]],

    [[ 0.90085595, -0.68372786],
     [-0.12289023, -0.93576943],
     [-0.26788808,  0.53035547]],

    [[-0.69166075, -0.39675353],
     [-0.6871727 , -0.84520564],
     [-0.67124613, -0.0126646 ]]],


   [[[-1.11731035,  0.2344157 ],
     [ 1.65980218,  0.74204416],
     [-0.19183555, -0.88762896]],

    [[-0.74715829,  1.6924546 ],
     [ 0.05080775, -0.63699565],
     [ 0.19091548,  2.10025514]],

    [[ 0.12015895,  0.61720311],
     [ 0.30017032, -0.35224985],
     [-1.1425182 , -0.34934272]]],


   [[[-0.20889423,  0.58662319],
     [ 0.83898341,  0.93110208],
     [ 0.28558733,  0.88514116]],

    [[-0.75439794,  1.25286816],
     [ 0.51292982, -0.29809284],
     [ 0.48851815, -0.07557171]],

    [[ 1.13162939,  1.51981682],
     [ 2.18557541, -1.39649634],
     [-1.44411381, -0.50446586]]]])

I understand that the 4 represents the number of training examples, and that the two 3’s mean each of the 2 channels is 3x3. But these properties do not seem to be reflected in the printed output of x. Maybe I am misinterpreting it.

Any guidance on understanding this would be amazing.

paulinpaloalto · August 30, 2023, 1:38am

Yes, that’s the way to visualize what a 4 x 3 x 3 x 2 tensor represents: think of it as 4 samples. And each sample is a 3 x 3 x 2 “image”. So it has 3 x 3 pixels and each “pixel” has two values.

Now the question is how to map that to the output of that “print” statement. Understanding the square brackets is the key. At the outer level, there are 4 “groups” of arrays, one per sample. Within each group, there are 3 elements, each of which is 3 x 2: one row of the image, holding 3 pixels with 2 values each. So the printout sort of “peels” the dimensions from the left end, which doesn’t exactly map to our “geometric” interpretation of it as a collection of 4 images.
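To make the bracket “peeling” concrete, here is a minimal sketch that indexes one level at a time. It assumes the test case builds x with np.random.seed(1) followed by np.random.randn(4, 3, 3, 2); that is a guess based on the values printed above, not something stated in this thread. The variable names are just for illustration.

```python
import numpy as np

# Assumption: the notebook seeds NumPy with 1 and draws x from randn.
np.random.seed(1)
x = np.random.randn(4, 3, 3, 2)

sample = x[0]             # first outer group: one 3 x 3 x 2 "image"
row = x[0, 0]             # first row of that image: 3 pixels, 2 values each
pixel = x[0, 0, 0]        # one pixel: its 2 channel values
channel0 = x[0, :, :, 0]  # the 3 x 3 grid of channel-0 values for sample 0

print(sample.shape)       # (3, 3, 2)
print(row.shape)          # (3, 2)
print(pixel.shape)        # (2,)
print(channel0.shape)     # (3, 3)
```

Each innermost pair of numbers in the printout is one pixel, so a single channel’s 3 x 3 grid only appears once you slice the last axis, as channel0 does here.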

Azfal_Peermohammed · August 30, 2023, 2:59am

Why can’t we just store the data with shape (4, 2, 3, 3) to match the geometric interpretation?

paulinpaloalto · August 30, 2023, 3:17am

You could do that, but the conventional way to store images is m x h x w x c (samples, height, width, channels). When you’re the boss, you can do it your way, but Prof Ng uses the conventional layout.

If you do it your way, then it’s like viewing two monocolor images one after the other. But the other way is viewing it as a 2D array of pixels with the “depth” being the color. Which you consider more intuitive is a personal thing, but the conventional way is the way that Prof Ng does it.
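If you want to see both layouts side by side, a small sketch like the following converts between them with np.transpose. The stand-in data from np.arange and the name x_cf are only for illustration; they are not from the assignment.

```python
import numpy as np

# Stand-in data in the m x h x w x c layout (not the assignment's values).
x = np.arange(4 * 3 * 3 * 2).reshape(4, 3, 3, 2)

# Move the channel axis in front of height and width: m x c x h x w.
x_cf = np.transpose(x, (0, 3, 1, 2))
print(x_cf.shape)   # (4, 2, 3, 3)

# In this layout, printing a sample shows two 3 x 3 grids, one per channel.
print(x_cf[0])

# Same numbers either way; only the axis order differs.
same = np.array_equal(x_cf[0, 1], x[0, :, :, 1])
print(same)         # True
```

The data is identical in both cases; only the order in which print peels off the axes changes.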
