I have a question regarding the size of the gradients variable at minute 0:52 of the “Gradients, metrics and validation” video in the 2nd section.
The instructor says that the size of the gradients variable is equal to the size of the batch (64 in this case), but according to the “Define Training Loop and Validate Model” video from the same week, tape.gradient actually returns something the size of the trainable variables. So I am a little confused about what tape.gradient returns: how would it know the size of the batch if it only receives the loss (a scalar, as I understand it) and the trainable variables as inputs to calculate the gradients?
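For reference, this is roughly the kind of training step I have in mind (a minimal sketch with a made-up model and random data, not the exact code from the video):

```python
import tensorflow as tf

# Roughly the setup I am asking about (made-up model and random data,
# not the exact code from the video).
model = tf.keras.Sequential([
    tf.keras.Input(shape=(784,)),
    tf.keras.layers.Dense(10, activation="softmax"),
])
loss_object = tf.keras.losses.SparseCategoricalCrossentropy()

x = tf.random.normal((64, 784))                          # one batch of 64 examples
y = tf.random.uniform((64,), maxval=10, dtype=tf.int64)  # 64 labels

with tf.GradientTape() as tape:
    predictions = model(x)                # forward pass on the whole batch
    loss = loss_object(y, predictions)    # a single scalar for the batch

# One gradient tensor per trainable variable, not one per example:
gradients = tape.gradient(loss, model.trainable_variables)
print(len(gradients), [g.shape for g in gradients])
# 2 [TensorShape([784, 10]), TensorShape([10])]
```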
Hi @Manuel_Montoya
As you said, tape.gradient returns the gradients of the loss with respect to the trainable variables. In theory, the gradients we are getting are the average gradients over all the examples, also known as the batch.
Maybe Lauren is saying that we are getting an average over the whole batch in the return of tape.gradient. But it is a little bit confusing, because the shape of the return isn’t affected by the size of the batch.
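Here is a quick check of that averaging idea (a minimal sketch with a tiny made-up layer, just to compare values): the gradient of the batch-mean loss is the same as the average of the per-example gradients, and both have the shapes of the variables.

```python
import tensorflow as tf

# Sketch with a tiny made-up layer: the gradient of the batch-mean loss
# equals the average of the per-example gradients.
tf.random.set_seed(0)
dense = tf.keras.layers.Dense(3)
dense.build((None, 5))                     # build w: (5, 3) and b: (3,)
mse = tf.keras.losses.MeanSquaredError()

x = tf.random.normal((8, 5))               # a small "batch" of 8 examples
y = tf.random.normal((8, 3))

# Gradient of the mean loss over the whole batch.
with tf.GradientTape() as tape:
    batch_loss = mse(y, dense(x))          # scalar: mean over the batch
batch_grads = tape.gradient(batch_loss, dense.trainable_variables)

# Average of the gradients computed one example at a time.
per_example = []
for i in range(8):
    with tf.GradientTape() as tape:
        loss_i = mse(y[i:i + 1], dense(x[i:i + 1]))
    per_example.append(tape.gradient(loss_i, dense.trainable_variables))
avg_grads = [tf.reduce_mean(tf.stack(g), axis=0) for g in zip(*per_example)]

# Both match, and both have the shapes of the variables: (5, 3) and (3,)
for g1, g2 in zip(batch_grads, avg_grads):
    print(g1.shape, float(tf.reduce_max(tf.abs(g1 - g2))))
```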
Maybe @gent.spah can confirm it.
Regards!
This needs in-depth investigation, but if we think about what the trainable variables are, we have to think about the structure of the neuron, i.e. g(z) = g(wx + b), so the trainable variables are w and b (weights and biases). Since every input x has to go into the neurons, I think the size of the w and b matrices is the size of the batch of x’s, and then it is averaged over many batches.
I think the logic goes something like this…
Hello @Pere_Martra and @Manuel_Montoya, on second thought I think the shapes of w and b depend only on the number of layers and units per layer (some distant memories are surfacing). So that’s why the w and b shapes remain the same.
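A quick way to see this (a minimal sketch with made-up layer sizes, not the course notebook):

```python
import tensorflow as tf

# Sketch with made-up layer sizes: w and b shapes come only from the
# number of input features and units per layer; there is no batch dimension.
model = tf.keras.Sequential([
    tf.keras.Input(shape=(8,)),
    tf.keras.layers.Dense(16),   # w: (8, 16),  b: (16,)
    tf.keras.layers.Dense(4),    # w: (16, 4),  b: (4,)
])

for v in model.trainable_variables:
    print(v.name, v.shape)
# The batch size (e.g. 64) never shows up in these shapes; it only
# determines how many examples the loss is averaged over.
```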