Video Gradients, metrics, and validation

In the video "Gradients, metrics, and validation", at minute 0:55, Laurence says: "Note that if the batch has 64 examples, the gradients variable contains 64 sets of gradients, one for each set of trainable variables. To line up the array of 64 gradients with the 64 trainable variables that the gradients will update, we can use Python’s zip function." I don’t understand this. In my opinion, if the batch has 64 examples, I will calculate only one set of gradients, computed with respect to all the trainable variables. I can’t have 64 sets of gradients, otherwise I would effectively be running per-example SGD. This step is not clear to me. Thanks
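For reference, the training step in question looks roughly like this (my own sketch with placeholder model, loss, and shapes, not the exact course code):

```python
import tensorflow as tf

# Placeholder model, loss, and optimizer, just to make the sketch runnable.
model = tf.keras.Sequential([tf.keras.layers.Dense(1)])
loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.keras.optimizers.SGD(learning_rate=0.01)

x_batch = tf.random.normal((64, 10))  # one batch of 64 examples
y_batch = tf.random.normal((64, 1))

with tf.GradientTape() as tape:
    predictions = model(x_batch, training=True)
    loss = loss_fn(y_batch, predictions)

# One gradient tensor per trainable variable, each already
# aggregated over all 64 examples in the batch.
gradients = tape.gradient(loss, model.trainable_variables)

# zip pairs each gradient with the variable it updates.
optimizer.apply_gradients(zip(gradients, model.trainable_variables))
```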

@Ucar97

Can you share a link to the lecture you are referring to?

Regards
DP


Sure: https://www.coursera.org/learn/custom-distributed-training-with-tensorflow/lecture/ogb0C/gradients-metrics-and-validation Thanks!

Hi @Ucar97

It’s because the gradient tape records operations in array form. For example, for y = x * x (that is, y = x^2), the tape applies the operation to the whole array of values at once.

See the below document

Introduction to gradients and automatic differentiation | TensorFlow Core: https://www.tensorflow.org/guide/autodiff

So if the batch has 64 examples, the gradient tape will have 64 sets of gradients for the trainable variables.
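Along the lines of that guide, here is a small sketch (not the course code) showing the tape differentiating y = x * x over a whole array at once:

```python
import tensorflow as tf

# x is a whole array, e.g. one value per example in a batch.
x = tf.Variable([1.0, 2.0, 3.0, 4.0])

with tf.GradientTape() as tape:
    y = x * x  # y = x^2, applied element-wise to the array

# dy/dx = 2x, one gradient value per element of x.
grad = tape.gradient(y, x)
print(grad)  # tf.Tensor([2. 4. 6. 8.], shape=(4,), dtype=float32)
```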

Regards
DP

I don’t understand what you mean. I tried asking ChatGPT, and its answer was:

"* Batch size & gradient calculations: In your case with a batch of 64 examples, if you’re training a model, the GradientTape essentially creates a computation graph for each operation involving the model’s weights (trainable variables) applied to all 64 examples in the batch. During backpropagation, the tape can then calculate the gradients for each of these operations across the batch. It’s important to note that even though the tape tracks all operations, when calling tape.gradient, it computes the gradients over the entire batch, not individually per example. This is optimized for memory and efficiency.

* Lists of trainable variables: The function tape.gradient outputs gradients corresponding to each of the model’s trainable variables. These are returned in the form of a list of tensors, where each tensor contains the gradient information for a particular weight matrix or parameter set in the model. You can then pass these gradients to the optimizer using optimizer.apply_gradients."
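To check this concretely, I ran a quick sketch (the layers and shapes are made up for illustration); the gradients list lines up with the trainable variables, not with the batch size:

```python
import tensorflow as tf

model = tf.keras.Sequential([tf.keras.layers.Dense(4),
                             tf.keras.layers.Dense(1)])
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal((64, 8))  # batch of 64 examples
y = tf.random.normal((64, 1))

with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x, training=True))

grads = tape.gradient(loss, model.trainable_variables)

# 4 gradients for 4 trainable variables (kernel and bias of each
# Dense layer), regardless of the batch size of 64.
print(len(grads), len(model.trainable_variables))  # prints: 4 4
```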