This is the third post of yours on this subject that I’ve seen so far. In this one you don’t show the result you actually get, but in one of the others it looks like you got the correct answer and then divided by 2 to get the average. The point is that the goal here is not to compute the average: it is the sum over the examples in each minibatch. Here’s a thread which explains why that is the case.
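To illustrate why the per-minibatch value should be a sum, here is a minimal NumPy sketch. It assumes binary cross-entropy as the loss and an epoch loop that divides by the total number of examples only once, at the end. The function names (`cross_entropy_sum`, `epoch_cost`, `mean_of_means`) are hypothetical and are not the assignment's own code. Summing within each minibatch and dividing once gives the same value as the full-batch average. Averaging each minibatch first and then averaging those averages does not give that value when the minibatches have different sizes.

```python
import numpy as np

def cross_entropy_sum(y_hat, y):
    """Sum (not mean) of binary cross-entropy over the examples in one minibatch."""
    return float(-np.sum(y * np.log(y_hat) + (1 - y) * np.log(1 - y_hat)))

def epoch_cost(minibatches):
    """Accumulate per-minibatch sums, then divide once by the total example count."""
    total_cost = 0.0
    m = 0
    for y_hat, y in minibatches:
        total_cost += cross_entropy_sum(y_hat, y)
        m += y.size
    return total_cost / m

def mean_of_means(minibatches):
    """Average each minibatch first, then average those averages.

    This is biased when the minibatches differ in size.
    """
    means = [cross_entropy_sum(y_hat, y) / y.size for y_hat, y in minibatches]
    return sum(means) / len(means)
```

Take minibatches of sizes 4 and 1, as happens when the last minibatch is smaller. In that case `epoch_cost` matches the full-batch mean, while `mean_of_means` gives the single example in the short batch as much weight as the whole first batch.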