In the video "Gradients, metrics, and validation", at minute 0:55, Laurence says: "Note that if the batch has 64 examples, the gradients variable contains 64 sets of gradients, one for each set of trainable variables. To line up the array of 64 gradients with the 64 trainable variables that the gradients will update, we can use Python’s zip function."
I don’t understand this. In my opinion, if the batch has 64 elements, I will calculate only one set of gradients, computed with respect to all the trainable variables. I can’t have 64 sets of gradients, otherwise I would effectively be running SGD example by example. This step is not clear to me. Thanks
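For concreteness, here is a minimal sketch of what I mean (the model and data are just placeholders, not the course notebook):

```python
import tensorflow as tf

# Placeholder model and batch, just to inspect what tape.gradient returns
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(1),
])
loss_fn = tf.keras.losses.MeanSquaredError()

x = tf.random.normal((64, 8))    # batch of 64 examples
y = tf.random.normal((64, 1))

with tf.GradientTape() as tape:
    loss = loss_fn(y, model(x))  # one scalar loss, averaged over the 64 examples
grads = tape.gradient(loss, model.trainable_variables)

print(len(grads), len(model.trainable_variables))  # 2 2 (kernel and bias), not 64
for g, v in zip(grads, model.trainable_variables):
    print(v.name, g.shape)       # each gradient has the same shape as its variable
```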
hi @Ucar97
It’s because the gradient tape is applied to the variables in array form: the guide introduces it with y = x * x (i.e. x^2), and in TensorFlow the gradient tape is applied to whole arrays/tensors in the same way.
See the document below:
Introduction to gradients and automatic differentiation | TensorFlow Core
So if the batch has 64 examples, the gradient tape will have 64 sets for the 64 trainable variables.
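A minimal sketch along the lines of that guide (the numbers here are just illustrative examples, not the course code):

```python
import tensorflow as tf

# Scalar case from the guide: y = x * x, so dy/dx = 2x
x = tf.Variable(3.0)
with tf.GradientTape() as tape:
    y = x * x
print(tape.gradient(y, x).numpy())   # 6.0

# The same pattern in array form, with several variables at once
w = tf.Variable(tf.random.normal((3, 2)))
b = tf.Variable(tf.zeros(2))
x = tf.constant([[1.0, 2.0, 3.0]])
with tf.GradientTape() as tape:
    y = x @ w + b
    loss = tf.reduce_mean(y ** 2)
grads = tape.gradient(loss, [w, b])
print([g.shape for g in grads])      # one gradient per variable: [(3, 2), (2,)]
```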
Regards
DP
I don’t understand what you mean. I tried asking ChatGPT, and its answer is:
"* Batch size & Gradient Calculations: In your case with a batch of 64 examples, if you’re training a model, the GradientTape essentially creates a computation graph for each operation involving the model’s weights (trainable variables) applied to all 64 examples in the batch. During backpropagation, the tape can then calculate the gradients for each of these operations across the batch. It’s important to note that even though the tape tracks all operations, when calling tape.gradient, it computes the gradients over the entire batch, not individually per example. This is optimized for memory and efficiency.
* Lists of Trainable Variables: The function tape.gradient outputs gradients corresponding to each of the model’s trainable variables. These are returned in the form of a list of tensors, where each tensor contains the gradient information for a particular weight matrix or parameter set in the model. You can then pass these gradients to the optimizer using optimizer.apply_gradients."