C2_W4_Assignment: test_back_prop fails on the b1 value

Hello classmates,

When I execute w4_unittest.test_back_prop(back_prop), I get the following results.

I have read the equations many times, but I cannot find my mistake. While there are many threads about the b1 value error, none of them shows the same result values as mine.

Can anybody give me some advice?

My result:

Wrong output values for gradient of b1 vector.
	 Expected: [[ 0.56665733]
 [ 0.46268776]
 [ 0.1063147 ]
 [-0.17481454]
 [ 0.11041817]
 [ 0.32025188]
 [-0.51827161]
 [ 0.08430878]
 [ 0.19341   ]
 [ 0.08339139]
 [-0.35949678]
 [-0.13053946]
 [ 0.19055422]
 [ 0.56405985]
 [ 0.13321988]] 
	Got: [[0.32110053]
 [0.21407996]
 [0.01130281]
 [0.03056012]
 [0.01219217]
 [0.10256127]
 [0.26860546]
 [0.00710797]
 [0.03740743]
 [0.00695412]
 [0.12923793]
 [0.01704055]
 [0.03631091]
 [0.31816351]
 [0.01774754]].
Wrong output values for gradient of b1 vector.
	 Expected: [[ 0.01864644]
 [-0.31966546]
 [-0.3564441 ]
 [-0.31703253]
 [-0.26702975]
 [ 0.14815984]
 [ 0.25794505]
 [ 0.24893135]
 [ 0.05895103]
 [-0.15348205]] 
	Got: [[0.00034769]
 [0.10218601]
 [0.1270524 ]
 [0.10050963]
 [0.07130489]
 [0.02195134]
 [0.06653565]
 [0.06196681]
 [0.00347522]
 [0.02355674]].
 14  Tests passed
 2  Tests failed

P.S. I debugged my code with w4_unittest.test_back_prop. All other values are OK.


Hi @sugaprho

This issue is pretty common, so here are the values that you can check against (in your case, the last part):
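If you want to print the same intermediate values from your own notebook, one option is a small helper like this (a hypothetical utility, not part of the assignment; remove any temporary calls before submitting):

import numpy as np

def debug_values(**arrays):
    # Print each array's shape and contents in the same format as below
    for name, arr in arrays.items():
        print(f"{name}.shape: {np.shape(arr)}")
        print(f"{name}.values:\n{np.asarray(arr)}\n")

# e.g. at the top of back_prop:
# debug_values(x=x, yhat=yhat, y=y, W1=W1, W2=W2, b1=b1, b2=b2)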

Check the inputs


x.shape:

(5778, 4)

x.values:

array([[0.  , 0.  , 0.25, 0.25],
       [0.  , 0.25, 0.25, 0.  ],
       [0.  , 0.  , 0.  , 0.  ],
       ...,
       [0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.  ],
       [0.  , 0.  , 0.  , 0.  ]])

yhat.shape:

(5778, 4)

yhat.values:

array([[5.65860316e-06, 6.30488530e-06, 1.05841040e-05, 5.18449964e-06],
       [1.29186998e-04, 1.37714283e-04, 1.02187460e-04, 8.04556333e-05],
       [2.09992073e-06, 2.04610255e-06, 1.93481769e-06, 2.22081401e-06],
       ...,
       [9.18385839e-05, 1.17760693e-04, 1.47190763e-04, 1.63875601e-04],
       [3.92654489e-06, 3.05574159e-06, 4.32752127e-06, 4.67408286e-06],
       [7.41423404e-05, 1.31285979e-04, 1.57982820e-04, 1.25570871e-04]])

y.shape:

(5778, 4)

y.values:

array([[0., 0., 0., 0.],
       [1., 0., 0., 0.],
       [0., 0., 0., 0.],
       ...,
       [0., 0., 0., 0.],
       [0., 0., 0., 0.],
       [0., 0., 0., 0.]])

W1.shape:

(50, 5778)

W1.values:

array([[4.17022005e-01, 7.20324493e-01, 1.14374817e-04, ...,
        3.18453337e-02, 8.69477233e-03, 3.58537645e-01],
       [3.06338212e-02, 5.17847453e-01, 3.55390080e-03, ...,
        3.18255407e-01, 8.19716276e-01, 5.92131246e-01],
       [3.72634224e-02, 7.78301760e-01, 2.58389111e-01, ...,
        6.59538833e-01, 6.19214356e-01, 5.70114010e-01],
       ...,
       [3.00697829e-01, 5.34150368e-01, 9.82730945e-01, ...,
        3.50735793e-02, 8.00820428e-01, 7.97962643e-01],
       [1.04441188e-02, 3.75743267e-01, 8.00548200e-01, ...,
        9.53145499e-01, 3.97463811e-01, 2.59193158e-01],
       [9.40443797e-01, 6.20337024e-01, 3.49636080e-01, ...,
        2.12197620e-01, 7.34165670e-01, 4.04981356e-01]])

W2.shape:

(5778, 50)

W2.values:

array([[0.98756632, 0.26921735, 0.79967476, ..., 0.7092524 , 0.26348048,
        0.91307305],
       [0.32481516, 0.49150709, 0.93396834, ..., 0.87124107, 0.66809605,
        0.59724161],
       [0.33875982, 0.97059413, 0.71778828, ..., 0.50609534, 0.74589904,
        0.17921762],
       ...,
       [0.26514628, 0.12376321, 0.04835613, ..., 0.51204125, 0.82800244,
        0.83260737],
       [0.43858813, 0.22220155, 0.23391658, ..., 0.44519868, 0.31930962,
        0.41111395],
       [0.3361489 , 0.37257524, 0.84969136, ..., 0.22725725, 0.14900807,
        0.37457519]])

b1.shape:

(50, 1)

b1.values:

array([[0.20354182],
       [0.7393041 ],
       [0.52322409],
       ...,
       [0.95183274],
       [0.12313232],
       [0.71743542]])

b2.shape:

(5778, 1)

b2.values:

array([[0.26853656],
       [0.42872682],
       [0.34107181],
       ...,
       [0.97258741],
       [0.77647386],
       [0.7201184 ]])

batch_size:

4


Check your calculations

z1.shape:

(50, 4)

z1.values:

array([[0.58475793, 0.77387674, 0.68392272, 0.62577631],
       [1.42914862, 1.42776461, 1.10799043, 1.23087398],
       [1.09752993, 0.98610201, 0.99513927, 0.76984205],
       ...,
       [1.40200846, 1.40704489, 1.47177351, 1.2876826 ],
       [0.80057356, 0.60361842, 0.37715675, 0.63943922],
       [1.08343076, 1.23078049, 1.33236934, 1.21405054]])
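
(z1 is the hidden-layer pre-activation; assuming the standard forward computation from the notebook, the shapes above come from:)

z1 = np.dot(W1, x) + b1  # (50, 5778) @ (5778, 4) + (50, 1) -> (50, 4)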

Implemented for you

l1.shape:

# Compute l1 as W2^T (Yhat - Y) result:
(50, 4)

l1.values:

array([[ 1.90619151e-01,  5.04911863e-01,  2.39850923e-01,
         4.70598914e-01],
       [ 1.11155469e-01,  2.16875768e-02,  5.15676504e-02,
        -3.41496985e-01],
       [-3.26332279e-01,  5.81739107e-02,  1.32373299e-02,
        -1.63465397e-01],
       ...,
       [-2.56035248e-01, -2.60606873e-01, -1.61361246e-01,
        -3.53597305e-01],
       [-1.22656968e-01, -1.29136393e-01,  3.99540453e-01,
         5.15137770e-01],
       [-1.17463600e-02,  1.55212349e-01, -8.04714320e-02,
         4.84456794e-01]])

l1.shape:

# use "l1" to compute gradients below (implemented for you)
# in this (unfortunate) case all z1 are > 0, so result does not change

(50, 4)

l1.values:

array([[ 1.90619151e-01,  5.04911863e-01,  2.39850923e-01,
         4.70598914e-01],
       [ 1.11155469e-01,  2.16875768e-02,  5.15676504e-02,
        -3.41496985e-01],
       [-3.26332279e-01,  5.81739107e-02,  1.32373299e-02,
        -1.63465397e-01],
       ...,
       [-2.56035248e-01, -2.60606873e-01, -1.61361246e-01,
        -3.53597305e-01],
       [-1.22656968e-01, -1.29136393e-01,  3.99540453e-01,
         5.15137770e-01],
       [-1.17463600e-02,  1.55212349e-01, -8.04714320e-02,
         4.84456794e-01]])

Implemented for you

Now your calculations:

grad_W1.shape:

# compute the gradient for W1
(50, 5778)

grad_W1.values:

array([[ 0.04440311,  0.04654767,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [-0.01812058,  0.00457845,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [-0.00938925,  0.0044632 ,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       ...,
       [-0.03218491, -0.02637301,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.05716739,  0.01690025,  0.        , ...,  0.        ,
         0.        ,  0.        ],
       [ 0.02524909,  0.00467131,  0.        , ...,  0.        ,
         0.        ,  0.        ]])

grad_W2.shape:

# Compute gradient of W2
(5778, 50)

grad_W2.values:

array([[ 4.66779085e-06,  8.79935716e-06,  6.73791242e-06, ...,
         9.76451498e-06,  3.91072793e-06,  8.57170370e-06],
       [-1.46113894e-01, -3.57138779e-01, -2.74272178e-01, ...,
        -3.50344892e-01, -2.00074256e-01, -2.70721867e-01],
       [ 1.38109376e-06,  2.69993832e-06,  1.98937019e-06, ...,
         2.88259543e-06,  1.26650284e-06,  2.51687351e-06],
       ...,
       [ 8.70130445e-05,  1.66045427e-04,  1.22388322e-04, ...,
         1.80526070e-04,  7.62271359e-05,  1.59876008e-04],
       [ 2.63636651e-06,  5.13063877e-06,  3.80689133e-06, ...,
         5.54812026e-06,  2.40223393e-06,  4.86387903e-06],
       [ 8.28954510e-05,  1.55752817e-04,  1.16179863e-04, ...,
         1.70720952e-04,  6.96205653e-05,  1.51213291e-04]])

grad_b1.shape:

# compute gradient for b1
(50, 1)

grad_b1.values:

array([[ 0.35149521],
       [-0.03927157],
       [-0.10459661],
       ...,
       [-0.25790017],
       [ 0.16572122],
       [ 0.13686284]])

grad_b2.shape:

# compute gradient for b2
(5778, 1)

grad_b2.values:

array([[ 6.93302302e-06],
       [-2.49887614e-01],
       [ 2.07541375e-06],
       ...,
       [ 1.30166410e-04],
       [ 3.99597265e-06],
       [ 1.22245503e-04]])

Cheers


Hi @arvyzukai,

I really appreciate your kind help.

I compared each variable’s value and found differences in grad_W1 and grad_b1, with one commonality: how the step function is applied.

I looked over the computation of z1 and l1 because those are given in advance. After looking closely at those values, I found that l1 is based on the following code.

l1 = np.dot(W2.T, (yhat - y))

Furthermore, the given code already applies the step function:

l1[z1 < 0] = 0 # use "l1" to compute gradients below

That finally made me understand which variable must be used when calculating grad_W1 and grad_b1.
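
For other classmates debugging the same thing, here is a minimal sketch of how the pieces fit together, consistent with the shapes in @arvyzukai’s post (assuming h = relu(z1) from the forward pass; treat this as a sketch, not the official solution):

# l1 = np.dot(W2.T, (yhat - y)) with l1[z1 < 0] = 0 already applied (given code)
grad_W1 = np.dot(l1, x.T) / batch_size                          # (50, 5778)
grad_W2 = np.dot(yhat - y, h.T) / batch_size                    # (5778, 50)
grad_b1 = np.sum(l1, axis=1, keepdims=True) / batch_size        # (50, 1)
grad_b2 = np.sum(yhat - y, axis=1, keepdims=True) / batch_size  # (5778, 1)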

By the way, even though I passed every unit test, my grad_W1 values differ from the ones in the comment above.

In my case,

grad_W1 [[ 0.04440311  0.04654767  0.         ...  0.          0.
   0.        ]
 [-0.01812058  0.00457845  0.         ...  0.          0.
   0.        ]
 [-0.00938925  0.0044632   0.         ...  0.          0.
   0.        ]
 ...
 [-0.03218491 -0.02637301  0.         ...  0.          0.
   0.        ]
 [ 0.05716739  0.01690025  0.         ...  0.          0.
   0.        ]
 [ 0.02524909  0.00467131  0.         ...  0.          0.
   0.        ]]

I am leaving the values above for other classmates who experience the same problem while debugging; hopefully they are the valid ones.

Thank you again @arvyzukai


Thank you for pointing out the difference (I had old code left over from helping other students). After your note, I realized the unit test actually does not test the grad_W1 values (since the x values in the unit test are all 0s), so any values would pass as long as they match the desired shape.
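
For instance (a tiny self-contained check): any grad_W1 formula of the form l1 @ x.T produces all zeros when x is all zeros, regardless of what l1 contains:

import numpy as np

l1 = np.random.rand(50, 4)             # arbitrary values
x = np.zeros((5778, 4))                # x all zeros, like in the unit test
print(np.abs(np.dot(l1, x.T)).max())   # 0.0 -> the grad_W1 value check is vacuous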

Cheers


Hi @arvyzukai and @sugaprho. I have the same problem. The picture in the assignment says we should use step(z1), but the implemented code does not calculate the step of z1; it just sets the l1 array to 0 wherever z1 is less than 0, and I think that is not correct. I searched for the correct way of implementing backprop for the CBOW model and found that we should calculate the step function for W2.T(yhat - y), not z1. I am confused. I would appreciate your help. (Sorry if my English is not so good!) Thank you.


That piece of code is the implementation of the step function. step(z1) is either 0 or 1, so multiplying by it means each l1 value either stays the same or goes to zero, which is exactly what l1[z1 < 0] = 0 does.
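
A quick self-contained NumPy check (made-up values) showing that the masking and the multiplication are the same operation:

import numpy as np

z1 = np.array([[ 0.5, -0.3], [-1.2,  2.0]])
l1 = np.array([[ 0.2,  0.7], [-0.4,  0.1]])

masked = l1.copy()
masked[z1 < 0] = 0                        # the given code

step_z1 = (z1 >= 0).astype(float)         # step(z1): 1 where z1 >= 0, else 0
print(np.allclose(masked, step_z1 * l1))  # True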


Just make sure that you don’t use the ReLU function when calculating l1.

What helped me solve my grad_b1 errors was:

What does step(z1) mean?
(definition by GPT: step(z) = 1 if z >= 0, else 0)
So multiplying l1 element-wise by step(z1) is equivalent to l1[z1 < 0] = 0, i.e., zeroing every l1 entry whose corresponding z1 is negative.

What does it mean to multiply by step(z1)?
Once you understand what step(z1) is, you realize that you don’t have to take the product l1 · z1, since the given l1 already has step(z1) applied to it.

In short: replacing l1 · z1 with l1 fixed my problem (incorrect grad_b1 values; 2 failed tests).
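
A tiny made-up example of why that substitution matters: once the mask is applied, multiplying l1 by step(z1) again changes nothing, but multiplying by z1 gives different numbers entirely:

import numpy as np

z1 = np.array([[0.5], [-0.3], [1.2]])
l1 = np.array([[0.2], [0.0], [-0.4]])  # l1[z1 < 0] = 0 already applied

step_z1 = (z1 >= 0).astype(float)
print(np.allclose(step_z1 * l1, l1))   # True -> the extra product is a no-op
print(l1 * z1)                         # [[0.1], [-0.], [-0.48]] -> not l1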

I deleted the post because I realized what I was doing wrong!

Thank you for this explanation! It helped me a lot!