I’m having trouble finding the correct gradient formula for Exercise 4. I looked at the lab, where the formula is for a batch size of 1, and adjusted it by using np.mean() to account for a batch size > 1, but I’m not getting the correct outcome. It seems I need to multiply by 2, but if I do that, something else goes wrong. Please help!
Instead of using np.mean(), I used np.sum() and divided by batch_size, and now I get correct results. I don’t understand why.
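A quick NumPy check may explain what happened (the array and shapes below are made up for illustration, not the assignment’s variables): np.mean taken over the batch axis is exactly np.sum over that axis divided by batch_size, but np.mean with no axis argument divides by the total number of elements and collapses the shape, which leaves the gradient off by a constant factor.

```python
import numpy as np

batch_size = 4
# Toy "gradient" matrix: one row per parameter, one column per example in the batch.
a = np.arange(12, dtype=float).reshape(3, batch_size)

per_row_sum = np.sum(a, axis=1, keepdims=True) / batch_size   # shape (3, 1)
per_row_mean = np.mean(a, axis=1, keepdims=True)              # shape (3, 1)
print(np.allclose(per_row_sum, per_row_mean))                 # True: identical

# np.mean with no axis averages over *all* elements (divides by 3 * batch_size)
# and returns a scalar, so it cannot be the batch gradient.
print(np.mean(a))                                             # a single number
```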
The batch versions of the partial derivatives were mentioned in the video Training a CBOW Model: Backpropagation and Gradient Descent, around 2:50. Unfortunately they’re not in the readings or the labs, so you had to catch them in the video. The note on using keepdims=True when calculating grad_b1 also applies; it is required for batch processing.
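A minimal shape check shows why keepdims=True matters here (sizes and names below are illustrative only): without it, the summed gradient loses its column dimension, and the bias update then broadcasts into the wrong shape instead of raising an error.

```python
import numpy as np

hidden, batch_size = 50, 4                 # made-up sizes
l1 = np.random.rand(hidden, batch_size)    # hypothetical backprop term, one column per example
b1 = np.zeros((hidden, 1))                 # bias is a column vector

grad_no_kd = np.sum(l1, axis=1) / batch_size              # shape (50,)
grad_kd = np.sum(l1, axis=1, keepdims=True) / batch_size  # shape (50, 1)

# Updating the bias with the (50,) version silently broadcasts to (50, 50):
print((b1 - 0.03 * grad_no_kd).shape)   # (50, 50) — wrong, and no error is raised
print((b1 - 0.03 * grad_kd).shape)      # (50, 1)  — matches b1 as intended
```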
@Xixi_NXCR That’s because the test is wrong. See: Exercise 4 - w4_unittest.test_back_prop(back_prop) is wrong - #2 by balaji.ambresh
Hi, I got the same issue. I solved it using np.sum(a, axis=1, keepdims=True), as mentioned above. Note that you need to replace a with the term given by the gradient backpropagation formula for b1 or b2 (see the sketch below).
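In case it helps to see why summing the columns is the right move: each column of that term is one example’s contribution, so summing across the columns and dividing by the batch size is just the average of the per-example (batch size 1) gradients. A generic sketch, with made-up names dZ and m rather than the assignment’s variables:

```python
import numpy as np

m = 4                              # batch size (made up)
dZ = np.random.rand(5, m)          # hypothetical backprop term, one column per example

# Batch formula: sum over the columns, then divide by the batch size.
grad_b_batch = np.sum(dZ, axis=1, keepdims=True) / m

# Same thing computed with the batch-size-1 formula on each column, then averaged.
per_example = [dZ[:, i:i + 1] for i in range(m)]   # each slice has shape (5, 1)
grad_b_avg = sum(per_example) / m

print(np.allclose(grad_b_batch, grad_b_avg))       # True — the two agree
```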
@Fabio2
I am having a hard time comprehending this. Adding the corresponding column values changes the values, so how can it still give the correct results?