In the video "Gradients, metrics, and validation", at minute 0:55, Laurence says: "Note that if the batch has 64 examples, the gradients variable contains 64 sets of gradients, one for each set of trainable variables. To line up the array of 64 gradients with the 64 trainable variables that the gradients will update, we can use Python’s zip function."
I don’t understand this. In my opinion, if the batch has 64 elements, I will calculate only one set of gradients, computed with respect to all the trainable variables. I can’t have 64 sets of gradients, otherwise I would effectively be running SGD example by example. This step is not clear to me. Thanks
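For concreteness, here is a minimal sketch of what I mean (the model and data are just placeholders, not the course notebook):

```python
import tensorflow as tf

# Placeholder model and batch, just to inspect what tape.gradient returns
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(1),
])
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal((64, 8))    # batch of 64 examples
y = tf.random.normal((64, 1))

with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x))  # one scalar loss, averaged over the 64 examples
grads = tape.gradient(loss, model.trainable_variables)

print(len(grads), len(model.trainable_variables))  # 2 2 (kernel and bias), not 64
for g, v in zip(grads, model.trainable_variables):
    print(v.name, g.shape)       # each gradient has the same shape as its variable
```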
hi @Ucar97
It’s because the gradient tape is applied to the variables in array form: the guide introduces it with y = x * x (i.e. x^2), and in TensorFlow the gradient tape is applied to whole arrays/tensors in the same way.
See the document below:
Introduction to gradients and automatic differentiation | TensorFlow Core
So if the batch has 64 examples, the gradient tape will have 64 sets for the 64 trainable variables.
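A minimal sketch along the lines of that guide (the numbers here are just illustrative examples, not the course code):

```python
import tensorflow as tf

# Scalar case from the guide: y = x * x, so dy/dx = 2x
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x
print(tape.gradient(y, x).numpy())   # 6.0

# The same pattern in array form, with several variables at once
w = tf.Variable(tf.random.normal((3, 2)))
b = tf.Variable(tf.zeros(2))
x = tf.constant([[1.0, 2.0, 3.0]])
with tf.GradientTape() as tape:
    y = x @ w + b
    loss = tf.reduce_mean(y ** 2)
grads = tape.gradient(loss, [w, b])
print([g.shape for g in grads])      # one gradient per variable: [(3, 2), (2,)]
```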
Regards
DP
I don’t understand what you mean. I tried asking ChatGPT, and its answer is:
"* Batch size & Gradient Calculations: In your case with a batch of 64 examples, if you’re training a model, the GradientTape essentially creates a computation graph for each operation involving the model’s weights (trainable variables) applied to all 64 examples in the batch. During backpropagation, the tape can then calculate the gradients for each of these operations across the batch. It’s important to note that even though the tape tracks all operations, when calling tape.gradient, it computes the gradients over the entire batch, not individually per example. This is optimized for memory and efficiency.
* Lists of Trainable Variables: The function tape.gradient outputs gradients corresponding to each of the model’s trainable variables. These are returned in the form of a list of tensors, where each tensor contains the gradient information for a particular weight matrix or parameter set in the model. You can then pass these gradients to the optimizer using optimizer.apply_gradients."