I have a question regarding the size of the gradients variable at minute 0:52 of the “Gradients, metrics and validation” video in the 2nd section.
The instructor says that the size of the gradients variable is equal to the size of the batch (64 in this case), but according to the “Define Training Loop and Validate Model” video from the same week, tape.gradient actually returns something the size of the trainable variables. So I am a little confused about what tape.gradient returns: how would it know the size of the batch if it only receives the loss (a scalar, as I understand it) and the trainable variables as inputs to calculate the gradients?
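For reference, this is roughly the kind of training step I have in mind (a minimal sketch with a made-up model and random data, not the exact code from the video):

```python
import tensorflow as tf

# Roughly the setup I am asking about (made-up model and random data,
# not the exact code from the video).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()

x = tf.random.normal((64, 784))                          # one batch of 64 examples
y = tf.random.uniform((64,), maxval=10, dtype=tf.int64)  # 64 labels

with tf.GradientTape() as tape:
    predictions = model(x)                # forward pass on the whole batch
    loss = loss_object(y, predictions)    # a single scalar for the batch

# One gradient tensor per trainable variable, not one per example:
gradients = tape.gradient(loss, model.trainable_variables)
print(len(gradients), [g.shape for g in gradients])
# 2 [TensorShape([784, 10]), TensorShape([10])]
```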
Hi @Manuel_Montoya
As you said, tape.gradient returns the gradients of the loss with respect to the trainable variables. In theory, the gradients we are getting are the average gradients over all the examples, also known as the batch.
Maybe Lauren is saying that we are getting an average over the whole batch in the return of tape.gradient. But it is a little bit confusing, because the shape of the return isn’t affected by the size of the batch.
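Here is a quick check of that averaging idea (a minimal sketch with a tiny made-up layer, just to compare values): the gradient of the batch-mean loss is the same as the average of the per-example gradients, and both have the shapes of the variables.

```python
import tensorflow as tf

# Sketch with a tiny made-up layer: the gradient of the batch-mean loss
# equals the average of the per-example gradients.
tf.random.set_seed(0)
dense = tf.keras.layers.Dense(3)
dense.build((None, 5))                     # build w: (5, 3) and b: (3,)
mse = tf.keras.losses.MeanSquaredError()

x = tf.random.normal((8, 5))               # a small "batch" of 8 examples
y = tf.random.normal((8, 3))

# Gradient of the mean loss over the whole batch.
with tf.GradientTape() as tape:
    batch_loss = mse(y, dense(x))          # scalar: mean over the batch
batch_grads = tape.gradient(batch_loss, dense.trainable_variables)

# Average of the gradients computed one example at a time.
per_example = []
for i in range(8):
    with tf.GradientTape() as tape:
        loss_i = mse(y[i:i + 1], dense(x[i:i + 1]))
    per_example.append(tape.gradient(loss_i, dense.trainable_variables))
avg_grads = [tf.reduce_mean(tf.stack(g), axis=0) for g in zip(*per_example)]

# Both match, and both have the shapes of the variables: (5, 3) and (3,)
for g1, g2 in zip(batch_grads, avg_grads):
    print(g1.shape, float(tf.reduce_max(tf.abs(g1 - g2))))
```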
Maybe @gent.spah can confirm it.
Regards!
This needs in-depth investigation, but if we think about what the trainable variables are, we have to think about the structure of the neuron, i.e. g(z) = g(wx + b), so the trainable variables are w and b (weights and biases). Since every input x has to go into the neurons, I think the size of the w and b matrices is the size of the batch of x’s, and then it is averaged over many batches.
I think the logic goes something like this…
Hello @Pere_Martra and @Manuel_Montoya, on second thought I think the shapes of w and b depend only on the number of layers and units per layer (some distant memories are surfacing). So that’s why the w and b shapes remain the same.
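A quick way to see this (a minimal sketch with made-up layer sizes, not the course notebook):

```python
import tensorflow as tf

# Sketch with made-up layer sizes: w and b shapes come only from the
# number of input features and units per layer; there is no batch dimension.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16),   # w: (8, 16),  b: (16,)
    tf.keras.layers.Dense(4),    # w: (16, 4),  b: (4,)
])

for v in model.trainable_variables:
    print(v.name, v.shape)
# The batch size (e.g. 64) never shows up in these shapes; it only
# determines how many examples the loss is averaged over.
```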