C2_W4_Assignment: test_back_prop fails on the b1 value

Hello classmates,

When I execute w4_unittest.test_back_prop(back_prop), I get the following results.

I have read the equations many times, but I cannot find my mistake. While there are many threads about the b1 value error, none of them shows the same result values as mine.

Can anybody give me some advice?

My result:

Wrong output values for gradient of b1 vector.
	 Expected: [[ 0.56665733]
 [ 0.46268776]
 [ 0.1063147 ]
 [-0.17481454]
 [ 0.11041817]
 [ 0.32025188]
 [-0.51827161]
 [ 0.08430878]
 [ 0.19341   ]
 [ 0.08339139]
 [-0.35949678]
 [-0.13053946]
 [ 0.19055422]
 [ 0.56405985]
 [ 0.13321988]] 
	Got: [[0.32110053]
 [0.21407996]
 [0.01130281]
 [0.03056012]
 [0.01219217]
 [0.10256127]
 [0.26860546]
 [0.00710797]
 [0.03740743]
 [0.00695412]
 [0.12923793]
 [0.01704055]
 [0.03631091]
 [0.31816351]
 [0.01774754]].
Wrong output values for gradient of b1 vector.
	 Expected: [[ 0.01864644]
 [-0.31966546]
 [-0.3564441 ]
 [-0.31703253]
 [-0.26702975]
 [ 0.14815984]
 [ 0.25794505]
 [ 0.24893135]
 [ 0.05895103]
 [-0.15348205]] 
	Got: [[0.00034769]
 [0.10218601]
 [0.1270524 ]
 [0.10050963]
 [0.07130489]
 [0.02195134]
 [0.06653565]
 [0.06196681]
 [0.00347522]
 [0.02355674]].
 14  Tests passed
 2  Tests failed

P.S. I debugged my code with w4_unittest.test_back_prop. All other values are OK.


Hi @sugaprho

This issue is pretty common, so here are the values that you can check against (in your case, the last part):
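If you want to print the same intermediate values from your own notebook, one option is a small helper like this (a hypothetical utility, not part of the assignment; remove any temporary calls before submitting):

import numpy as np

def debug_values(**arrays):
    # Print each array's shape and contents in the same format as below
    for name, arr in arrays.items():
        print(f"{name}.shape: {np.shape(arr)}")
        print(f"{name}.values:\n{np.asarray(arr)}\n")

# e.g. at the top of back_prop:
# debug_values(x=x, yhat=yhat, y=y, W1=W1, W2=W2, b1=b1, b2=b2)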

Check the inputs


x.shape:

(5778, 4)

x.values:

array([[0.  , 0.  , 0.25, 0.25],
       [0.  , 0.25, 0.25, 0.  ],
       [0.  , 0.  , 0.  , 0.  ],
       ...,
       [0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.  ]])

yhat.shape:

(5778, 4)

yhat.values:

array([[5.65860316e-06, 6.30488530e-06, 1.05841040e-05, 5.18449964e-06],
       [1.29186998e-04, 1.37714283e-04, 1.02187460e-04, 8.04556333e-05],
       [2.09992073e-06, 2.04610255e-06, 1.93481769e-06, 2.22081401e-06],
       ...,
       [9.18385839e-05, 1.17760693e-04, 1.47190763e-04, 1.63875601e-04],
       [3.92654489e-06, 3.05574159e-06, 4.32752127e-06, 4.67408286e-06],
       [7.41423404e-05, 1.31285979e-04, 1.57982820e-04, 1.25570871e-04]])

y.shape:

(5778, 4)

y.values:

array([[0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 0., 0., 0.],
       ...,
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

W1.shape:

(50, 5778)

W1.values:

array([[4.17022005e-01, 7.20324493e-01, 1.14374817e-04, ...,
        3.18453337e-02, 8.69477233e-03, 3.58537645e-01],
       [3.06338212e-02, 5.17847453e-01, 3.55390080e-03, ...,
        3.18255407e-01, 8.19716276e-01, 5.92131246e-01],
       [3.72634224e-02, 7.78301760e-01, 2.58389111e-01, ...,
        6.59538833e-01, 6.19214356e-01, 5.70114010e-01],
       ...,
       [3.00697829e-01, 5.34150368e-01, 9.82730945e-01, ...,
        3.50735793e-02, 8.00820428e-01, 7.97962643e-01],
       [1.04441188e-02, 3.75743267e-01, 8.00548200e-01, ...,
        9.53145499e-01, 3.97463811e-01, 2.59193158e-01],
       [9.40443797e-01, 6.20337024e-01, 3.49636080e-01, ...,
        2.12197620e-01, 7.34165670e-01, 4.04981356e-01]])

W2.shape:

(5778, 50)

W2.values:

array([[0.98756632, 0.26921735, 0.79967476, ..., 0.7092524 , 0.26348048,
        0.91307305],
       [0.32481516, 0.49150709, 0.93396834, ..., 0.87124107, 0.66809605,
        0.59724161],
       [0.33875982, 0.97059413, 0.71778828, ..., 0.50609534, 0.74589904,
        0.17921762],
       ...,
       [0.26514628, 0.12376321, 0.04835613, ..., 0.51204125, 0.82800244,
        0.83260737],
       [0.43858813, 0.22220155, 0.23391658, ..., 0.44519868, 0.31930962,
        0.41111395],
       [0.3361489 , 0.37257524, 0.84969136, ..., 0.22725725, 0.14900807,
        0.37457519]])

b1.shape:

(50, 1)

b1.values:

array([[0.20354182],
       [0.7393041 ],
       [0.52322409],
       ...,
       [0.95183274],
       [0.12313232],
       [0.71743542]])

b2.shape:

(5778, 1)

b2.values:

array([[0.26853656],
       [0.42872682],
       [0.34107181],
       ...,
       [0.97258741],
       [0.77647386],
       [0.7201184 ]])

batch_size:

4


Check your calculations

z1.shape:

(50, 4)

z1.values:

array([[0.58475793, 0.77387674, 0.68392272, 0.62577631],
       [1.42914862, 1.42776461, 1.10799043, 1.23087398],
       [1.09752993, 0.98610201, 0.99513927, 0.76984205],
       ...,
       [1.40200846, 1.40704489, 1.47177351, 1.2876826 ],
       [0.80057356, 0.60361842, 0.37715675, 0.63943922],
       [1.08343076, 1.23078049, 1.33236934, 1.21405054]])
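
(z1 is the hidden-layer pre-activation; assuming the standard forward computation from the notebook, the shapes above come from:)

z1 = np.dot(W1, x) + b1  # (50, 5778) @ (5778, 4) + (50, 1) -> (50, 4)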

Implemented for you

l1.shape:

# Compute l1 as W2^T (Yhat - Y) result:
(50, 4)

l1.values:

array([[ 1.90619151e-01,  5.04911863e-01,  2.39850923e-01,
         4.70598914e-01],
       [ 1.11155469e-01,  2.16875768e-02,  5.15676504e-02,
        -3.41496985e-01],
       [-3.26332279e-01,  5.81739107e-02,  1.32373299e-02,
        -1.63465397e-01],
       ...,
       [-2.56035248e-01, -2.60606873e-01, -1.61361246e-01,
        -3.53597305e-01],
       [-1.22656968e-01, -1.29136393e-01,  3.99540453e-01,
         5.15137770e-01],
       [-1.17463600e-02,  1.55212349e-01, -8.04714320e-02,
         4.84456794e-01]])

l1.shape:

# use "l1" to compute gradients below (implemented for you)
# in this (unfortunate) case all z1 are > 0, so result does not change

(50, 4)

l1.values:

array([[ 1.90619151e-01,  5.04911863e-01,  2.39850923e-01,
         4.70598914e-01],
       [ 1.11155469e-01,  2.16875768e-02,  5.15676504e-02,
        -3.41496985e-01],
       [-3.26332279e-01,  5.81739107e-02,  1.32373299e-02,
        -1.63465397e-01],
       ...,
       [-2.56035248e-01, -2.60606873e-01, -1.61361246e-01,
        -3.53597305e-01],
       [-1.22656968e-01, -1.29136393e-01,  3.99540453e-01,
         5.15137770e-01],
       [-1.17463600e-02,  1.55212349e-01, -8.04714320e-02,
         4.84456794e-01]])

Implemented for you

Now your calculations:

grad_W1.shape:

# compute the gradient for W1
(50, 5778)

grad_W1.values:

array([[ 0.04440311,  0.04654767,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [-0.01812058,  0.00457845,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [-0.00938925,  0.0044632 ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       ...,
       [-0.03218491, -0.02637301,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.05716739,  0.01690025,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.02524909,  0.00467131,  0.        , ...,  0.        ,
         0.        ,  0.        ]])

grad_W2.shape:

# Compute gradient of W2
(5778, 50)

grad_W2.values:

array([[ 4.66779085e-06,  8.79935716e-06,  6.73791242e-06, ...,
         9.76451498e-06,  3.91072793e-06,  8.57170370e-06],
       [-1.46113894e-01, -3.57138779e-01, -2.74272178e-01, ...,
        -3.50344892e-01, -2.00074256e-01, -2.70721867e-01],
       [ 1.38109376e-06,  2.69993832e-06,  1.98937019e-06, ...,
         2.88259543e-06,  1.26650284e-06,  2.51687351e-06],
       ...,
       [ 8.70130445e-05,  1.66045427e-04,  1.22388322e-04, ...,
         1.80526070e-04,  7.62271359e-05,  1.59876008e-04],
       [ 2.63636651e-06,  5.13063877e-06,  3.80689133e-06, ...,
         5.54812026e-06,  2.40223393e-06,  4.86387903e-06],
       [ 8.28954510e-05,  1.55752817e-04,  1.16179863e-04, ...,
         1.70720952e-04,  6.96205653e-05,  1.51213291e-04]])

grad_b1.shape:

# compute gradient for b1
(50, 1)

grad_b1.values:

array([[ 0.35149521],
       [-0.03927157],
       [-0.10459661],
       ...,
       [-0.25790017],
       [ 0.16572122],
       [ 0.13686284]])

grad_b2.shape:

# compute gradient for b2
(5778, 1)

grad_b2.values:

array([[ 6.93302302e-06],
       [-2.49887614e-01],
       [ 2.07541375e-06],
       ...,
       [ 1.30166410e-04],
       [ 3.99597265e-06],
       [ 1.22245503e-04]])

Cheers


Hi @arvyzukai,

I really appreciate your kind help.

I compared each variable’s value and found differences in grad_W1 and grad_b1, with one commonality: how the step function is applied.

I looked over the computation of z1 and l1 because those are given in advance. After looking closely at those values, I found that l1 is based on the following code.

l1 = np.dot(W2.T, (yhat - y))

Furthermore, the given code already applies the step function:

l1[z1 < 0] = 0 # use "l1" to compute gradients below

That finally made me understand which variable must be used when calculating grad_W1 and grad_b1.
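
For other classmates debugging the same thing, here is a minimal sketch of how the pieces fit together, consistent with the shapes in @arvyzukai’s post (assuming h = relu(z1) from the forward pass; treat this as a sketch, not the official solution):

# l1 = np.dot(W2.T, (yhat - y)) with l1[z1 < 0] = 0 already applied (given code)
grad_W1 = np.dot(l1, x.T) / batch_size                          # (50, 5778)
grad_W2 = np.dot(yhat - y, h.T) / batch_size                    # (5778, 50)
grad_b1 = np.sum(l1, axis=1, keepdims=True) / batch_size        # (50, 1)
grad_b2 = np.sum(yhat - y, axis=1, keepdims=True) / batch_size  # (5778, 1)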

By the way, even though I passed every unit test, my grad_W1 values differ from the ones in the comment above.

In my case,

grad_W1 [[ 0.04440311  0.04654767  0.         ...  0.          0.
   0.        ]
 [-0.01812058  0.00457845  0.         ...  0.          0.
   0.        ]
 [-0.00938925  0.0044632   0.         ...  0.          0.
   0.        ]
 ...
 [-0.03218491 -0.02637301  0.         ...  0.          0.
   0.        ]
 [ 0.05716739  0.01690025  0.         ...  0.          0.
   0.        ]
 [ 0.02524909  0.00467131  0.         ...  0.          0.
   0.        ]]

I am leaving the values above for other classmates who experience the same problem while debugging; hopefully they are the valid ones.

Thank you again @arvyzukai


Thank you for pointing out the difference (I had old code left over from helping other students). After your note, I realized the unit test actually does not test the grad_W1 values (since the x values in the unit test are all 0s), so any values would pass as long as they match the desired shape.
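
For instance (a tiny self-contained check): any grad_W1 formula of the form l1 @ x.T produces all zeros when x is all zeros, regardless of what l1 contains:

import numpy as np

l1 = np.random.rand(50, 4)             # arbitrary values
x = np.zeros((5778, 4))                # x all zeros, like in the unit test
print(np.abs(np.dot(l1, x.T)).max())   # 0.0 -> the grad_W1 value check is vacuous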

Cheers


Hi @arvyzukai and @sugaprho. I have the same problem. The picture in the assignment says we should use step(z1), but the implemented code does not calculate the step of z1; it just sets the l1 array to 0 wherever z1 is less than 0, and I think that is not correct. I searched for the correct way of implementing backprop for the CBOW model and found that we should calculate the step function for W2.T(yhat - y), not z1. I am confused. I would appreciate your help. (Sorry if my English is not so good!) Thank you.


That piece of code is the implementation of the step function. step(z1) is either 0 or 1, so multiplying by it means each l1 value either stays the same or goes to zero, which is exactly what l1[z1 < 0] = 0 does.
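
A quick self-contained NumPy check (made-up values) showing that the masking and the multiplication are the same operation:

import numpy as np

z1 = np.array([[ 0.5, -0.3], [-1.2,  2.0]])
l1 = np.array([[ 0.2,  0.7], [-0.4,  0.1]])

masked = l1.copy()
masked[z1 < 0] = 0                        # the given code

step_z1 = (z1 >= 0).astype(float)         # step(z1): 1 where z1 >= 0, else 0
print(np.allclose(masked, step_z1 * l1))  # True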


Just make sure that you don’t use the ReLU function when calculating l1.

What helped me solve my grad_b1 errors was:

What does step(z1) mean?
(definition by GPT: step(z) = 1 if z >= 0, else 0)
So multiplying l1 element-wise by step(z1) is equivalent to l1[z1 < 0] = 0, i.e., zeroing every l1 entry whose corresponding z1 is negative.

What does it mean to multiply by step(z1)?
Once you understand what step(z1) is, you realize that you don’t have to take the product l1 · z1, since the given l1 already has step(z1) applied to it.

In short: replacing l1 · z1 with l1 fixed my problem (incorrect grad_b1 values; 2 failed tests).
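
A tiny made-up example of why that substitution matters: once the mask is applied, multiplying l1 by step(z1) again changes nothing, but multiplying by z1 gives different numbers entirely:

import numpy as np

z1 = np.array([[0.5], [-0.3], [1.2]])
l1 = np.array([[0.2], [0.0], [-0.4]])  # l1[z1 < 0] = 0 already applied

step_z1 = (z1 >= 0).astype(float)
print(np.allclose(step_z1 * l1, l1))   # True -> the extra product is a no-op
print(l1 * z1)                         # [[0.1], [-0.], [-0.48]] -> not l1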

I deleted the post because I realized what I was doing wrong!

Thank you for this explanation! It helped me a lot!