Assignment 1: I got confused about "Exercise 1 - zero_pad function"

Hi everyone,

When we define x = np.random.randn(4, 3, 3, 2), the assignment's definition says the dimensions are, respectively:

4: m, number of samples
3: n_H, height
3: n_W, width
2: n_C, number of channels
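A quick way to see this naming in NumPy is to unpack the shape tuple in the order the docstring uses, (m, n_H, n_W, n_C). A minimal sketch; the variable names simply mirror the assignment's notation:

```python
import numpy as np

np.random.seed(1)
x = np.random.randn(4, 3, 3, 2)

# Unpack the shape in the order the assignment uses: (m, n_H, n_W, n_C)
m, n_H, n_W, n_C = x.shape
print("m =", m, "| n_H =", n_H, "| n_W =", n_W, "| n_C =", n_C)
# prints: m = 4 | n_H = 3 | n_W = 3 | n_C = 2
```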

However, when I print x and look at it, I read the dimensions as, respectively:

4: m, number of samples
3: n_C, number of channels
3: n_H, height
2: n_W, width

Can you explain what I have misunderstood or missed?

Welcome to the community.

I think it is a matter of definitions.
As we cannot draw a 4D figure, let's leave out the number of samples, m, and think about a 3D tensor.
Now the shape of one sample of x is (3, 3, 2). The order of the entries in the shape corresponds to the order of the axes, so there are 3 elements along axis-0, 3 along axis-1, and 2 along axis-2.
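The same idea in code, as a small sketch: t below is a hypothetical stand-in for one sample of x, and the k-th entry of t.shape gives the number of elements along axis-k.

```python
import numpy as np

# Hypothetical stand-in for one sample of x: (n_H, n_W, n_C) = (3, 3, 2)
t = np.random.randn(3, 3, 2)
axis_names = ["height", "width", "channels"]

# The k-th entry of t.shape is the number of elements along axis-k
sizes = {name: t.shape[axis] for axis, name in enumerate(axis_names)}
print(sizes)  # {'height': 3, 'width': 3, 'channels': 2}
```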

Then, what is the definition of each axis for convolutional operations? It is as follows.

There are multiple ways to draw this 3D shape, but the axis assignments are the same: axis-0 for height, axis-1 for width, axis-2 for channels (depth).
Some NumPy users prefer Fig. 2, i.e., drawing axis-0 as "depth". But that is just a rotation of the picture; the first dimension (axis-0) is still assigned to height.

In short, that’s the definition.

Is there any particular reason you want to redefine axis-0 as the channel axis?
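For comparison, a sketch of what actually redefining axis-0 as channels would involve: the axes have to be reordered explicitly, for example with np.moveaxis. The small array t is a made-up stand-in with deterministic values, and this channels-first layout is not what the assignment uses.

```python
import numpy as np

# Made-up stand-in for one sample, laid out as (height, width, channels)
t = np.arange(18).reshape(3, 3, 2)

# Explicitly move the channel axis (axis-2) to the front: (channels, height, width)
t_cf = np.moveaxis(t, 2, 0)
print(t.shape, "->", t_cf.shape)  # (3, 3, 2) -> (2, 3, 3)

# Same numbers, different axis order: channel c is t[:, :, c] == t_cf[c]
for c in range(t.shape[2]):
    assert np.array_equal(t[:, :, c], t_cf[c])
```

Without such an explicit reordering, NumPy never treats axis-0 as channels; printing or drawing the array differently does not change which axis holds what.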


Hi Nobu Asai,

Thank you very much for your feedback. I understand your explanation and agree with what you wrote. However, don't the values NumPy prints for x = np.random.randn(4, 3, 3, 2) look different from the definition given in Assignment 1?

Below is the zero_pad function from Assignment 1. It is tested by defining x = np.random.randn(4, 3, 3, 2), and according to the function's docstring, 4 is the number of samples, 3 the height, 3 the width and 2 the number of channels.

```python
import numpy as np
import matplotlib.pyplot as plt

# GRADED FUNCTION: zero_pad

def zero_pad(X, pad):
    """
    Pad with zeros all images of the dataset X. The padding is applied to the height and width of an image,
    as illustrated in Figure 1.

    Argument:
    X -- python numpy array of shape (m, n_H, n_W, n_C) representing a batch of m images
    pad -- integer, amount of padding around each image on vertical and horizontal dimensions

    Returns:
    X_pad -- padded image of shape (m, n_H + 2 * pad, n_W + 2 * pad, n_C)
    """

    #(≈ 1 line)
    # X_pad = None
    # YOUR CODE STARTS HERE
    # code removed

    # YOUR CODE ENDS HERE

    return X_pad

np.random.seed(1)
x = np.random.randn(4, 3, 3, 2)
x_pad = zero_pad(x, 3)
print("x.shape =\n", x.shape)
print("x_pad.shape =\n", x_pad.shape)
print("x[1,1] =\n", x[1, 1])
print("x_pad[1,1] =\n", x_pad[1, 1])

# Plot the first channel of the first sample, before and after padding
fig, axarr = plt.subplots(1, 2)
axarr[0].set_title('x')
axarr[0].imshow(x[0, :, :, 0])
axarr[1].set_title('x_pad')
axarr[1].imshow(x_pad[0, :, :, 0])
zero_pad_test(zero_pad)  # unit test provided by the assignment
```
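This is not an implementation of zero_pad; it only spells out the shape arithmetic from the docstring's Returns line, using a hypothetical helper expected_pad_shape, so it is clear which axes the padding should touch:

```python
def expected_pad_shape(shape, pad):
    """Hypothetical helper: the shape zero_pad should return, per its docstring."""
    m, n_H, n_W, n_C = shape
    # Only height (axis-1) and width (axis-2) grow by 2 * pad; m and n_C are unchanged
    return (m, n_H + 2 * pad, n_W + 2 * pad, n_C)


print(expected_pad_shape((4, 3, 3, 2), 3))  # (4, 9, 9, 2)
```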

When I print x, defined by x = np.random.randn(4, 3, 3, 2), I get:

```
[[[[ 1.26474388 -0.46484427]
   [ 0.78637928  0.26461514]
   [-0.36456334  1.33924563]]

  [[-0.59159267  1.66598765]
   [-0.96589543  0.51970341]
   [ 0.01959641  1.00936904]]

  [[ 1.37712806 -0.86416548]
   [-0.58832375 -0.23459444]
   [ 0.01654039  0.54634878]]]

 [[[-0.70913263  0.7648479 ]
   [-2.08136303  0.70314096]
   [ 0.56657471  0.22603908]]

  [[ 0.23298828 -0.71754524]
   [-0.60605412 -0.27958579]
   [-0.44631499 -1.53642443]]

  [[ 0.27432202 -0.0686808 ]
   [ 0.19000226  0.6177528 ]
   [-2.38790133 -1.70369936]]]

 [[[-0.42946574  0.99084826]
   [-0.88816092 -0.70354276]
   [-0.66946023 -0.82021568]]

  [[-0.45239545 -1.33858049]
   [-0.11618799 -0.91795932]
   [-0.29172917 -0.47293121]]

  [[ 0.36859416  0.18489882]
   [-0.68581694  0.49791882]
   [-0.78346873  0.45902505]]]

 [[[-1.84375242  1.1006082 ]
   [-1.46550872 -0.98344802]
   [ 0.32178131 -1.27372047]]

  [[-0.66542193 -1.78795278]
   [ 0.2814287  -0.79330958]
   [ 0.81639142  0.33453191]]

  [[ 0.84244292  1.19420491]
   [-1.00960616 -3.18973464]
   [ 0.74752147 -1.81289802]]]]
```

Doesn't that mean a 3x2 2D matrix with 3 channels, for 4 samples? If so, since padding should change the height and width of each image by the padding amount, don't we need to pad the last two dimensions of x = np.random.randn(4, 3, 3, 2) instead?


I see it as a 3x3 matrix with 2 channels, for each of 4 samples.

Let's simplify it to one sample, i.e., a 3x3x2 tensor.

I think the layout at the bottom is the same as what you see for one sample of x, and you read it as a 3x2 matrix with 3 channels. But it is not.
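A small sketch of why the printout is easy to misread, assuming a deterministic stand-in t = np.arange(18).reshape(3, 3, 2) in place of one random sample: NumPy prints the last axis innermost, so each [a b] pair is one pixel's two channel values, and each block of three rows corresponds to one height index, not one channel.

```python
import numpy as np

# Deterministic stand-in for one sample: (n_H, n_W, n_C) = (3, 3, 2)
t = np.arange(18).reshape(3, 3, 2)
print(t)

# Each innermost [a b] row is ONE pixel: its 2 channel values
print(t[0, 0])     # pixel at height 0, width 0 -> [0 1]

# Each printed block of 3 rows is one image row (a fixed height index)
print(t[0].shape)  # (3, 2): 3 width positions x 2 channels

# A channel is a 3x3 slice across the blocks, taken along axis-2
print(t[:, :, 0])
```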

I put those values into a 3D cube to show how they map to axis-0/1/2.

If we want to get the first channel's data, we fix the index along the third dimension (axis-2) and keep everything along axis-0 and axis-1.


You can see this is a 3x3 matrix, exactly the same as the first channel in the 3D cube figure.
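To make this concrete, the sketch below copies the first sample from the output printed earlier in the thread and slices out its first channel, which is the same indexing as x[0, :, :, 0] in the assignment's plotting code:

```python
import numpy as np

# First sample of x, copied from the output printed earlier: shape (3, 3, 2)
sample = np.array([
    [[ 1.26474388, -0.46484427],
     [ 0.78637928,  0.26461514],
     [-0.36456334,  1.33924563]],
    [[-0.59159267,  1.66598765],
     [-0.96589543,  0.51970341],
     [ 0.01959641,  1.00936904]],
    [[ 1.37712806, -0.86416548],
     [-0.58832375, -0.23459444],
     [ 0.01654039,  0.54634878]],
])

# First channel: fix axis-2 at index 0, keep all heights (axis-0) and widths (axis-1)
channel_0 = sample[:, :, 0]
print(channel_0.shape)  # (3, 3)
print(channel_0)
```

The first column of every printed [a b] pair ends up in channel_0, so the channel is spread across all three printed blocks rather than being one block.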

Hope this helps.


With your last statement, I understood exactly where I was wrong. Thank you very much for your explanations and help.