DLS 2 module 3 - TensorFlow programming exercise 3.3

Hi,

I’m confused because I hit an error in section 3.3, “Train the Model”, of the programming exercise, even though all the prior code has executed and passed all tests. This is despite the model() function requiring no additional code from me.

When I run the code block:
parameters, costs, train_acc, test_acc = model(new_train, new_y_train, new_test, new_y_test, num_epochs=100)

The error I get is:
ValueError: No gradients provided for any variable: ['Variable:0', 'Variable:0', 'Variable:0', 'Variable:0', 'Variable:0', 'Variable:0'].

It would seem that the gradients I’m supplying to optimizer.apply_gradients aren’t defined, but I’m not sure why that happens when all the other code runs successfully. Any pointers greatly appreciated!


ValueError                                Traceback (most recent call last)
&lt;ipython-input-...&gt; in &lt;module&gt;
----> 1 parameters, costs, train_acc, test_acc = model(new_train, new_y_train, new_test, new_y_test, num_epochs=100)

&lt;ipython-input-...&gt; in model(X_train, Y_train, X_test, Y_test, learning_rate, num_epochs, minibatch_size, print_cost)
     74             grads = tape.gradient(minibatch_total_loss, trainable_variables)
     75             print(grads) ##CAH added
---> 76             optimizer.apply_gradients(zip(grads, trainable_variables))
     77             epoch_total_loss += minibatch_total_loss
     78

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py in apply_gradients(self, grads_and_vars, name, experimental_aggregate_gradients)
    511       ValueError: If none of the variables have gradients.
    512     """
--> 513     grads_and_vars = _filter_grads(grads_and_vars)
    514     var_list = [v for (_, v) in grads_and_vars]
    515

/opt/conda/lib/python3.7/site-packages/tensorflow/python/keras/optimizer_v2/optimizer_v2.py in _filter_grads(grads_and_vars)
   1269   if not filtered:
   1270     raise ValueError("No gradients provided for any variable: %s." %
   1271                      ([v.name for _, v in grads_and_vars],))
   1272   if vars_with_empty_grads:
   1273     logging.warning(

ValueError: No gradients provided for any variable: ['Variable:0', 'Variable:0', 'Variable:0', 'Variable:0', 'Variable:0', 'Variable:0'].

Hi @chenrynyc ,

It could be that the kernel has become inactive at some point. Try refreshing the kernel and rerunning the code from the start.

From the menu tab at the top of your notebook:

Kernel → Restart & Clear Output
Cell → Run All

Thanks. I had tried restarting the kernel, and again tried what you suggested, but had the same outcome.

In case it is helpful, I confirmed that the model() call reaches a state where the line

grads = tape.gradient(minibatch_total_loss, trainable_variables)

assigns

[None, None, None, None, None, None]

to the variable grads.

thanks

That error means that you have probably included some numpy functions in the compute graph, which breaks the graph as far as computing gradients is concerned, because automatic differentiation only works on TF operations. For example, if you use np.transpose or logits.T to transpose the inputs to the cost function, that will have this effect.
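Here’s a small standalone sketch of the effect, in case it helps. The shapes and variable names are made up purely for illustration and are not the notebook’s code:

import numpy as np
import tensorflow as tf

# Toy parameter and data, purely for illustration
W = tf.Variable([[1.0, 2.0], [3.0, 4.0]])
X = tf.constant([[1.0, 0.0], [0.0, 1.0]])
Y = tf.constant([[1.0, 0.0], [0.0, 1.0]])

with tf.GradientTape() as tape:
    logits = tf.matmul(W, X)
    # np.transpose silently converts the EagerTensor to a numpy array,
    # so the loss below is no longer connected to W on the tape
    loss = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(
        tf.transpose(Y), np.transpose(logits), from_logits=True))
print(tape.gradient(loss, [W]))   # -> [None]

with tf.GradientTape() as tape:
    logits = tf.matmul(W, X)
    # Identical math, but tf.transpose keeps the whole computation in TF
    loss = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(
        tf.transpose(Y), tf.transpose(logits), from_logits=True))
print(tape.gradient(loss, [W]))   # -> a 2 x 2 gradient tensor

The forward values of the two losses agree, which is why the earlier unit tests still pass; only the gradient computation can tell the difference.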


Thank you!
Spot on: I’d included np.transpose in the cost function. Everything works properly now that I’ve swapped in tf.transpose.

Much appreciated, @paulinpaloalto!

This is a pretty subtle point and they don’t talk about it in the notebook. It’s misleading because the tests in the notebook only check the output values and those are correct. It’s not until you try back prop that things halt and catch fire.

It’s been a while since I watched the lectures on this, but I’m guessing they probably don’t talk about this there either. One of the many advantages of TensorFlow (and other “frameworks” like PyTorch) is that they all do some version of “automatic differentiation” to compute the gradients and manage back prop for you. All we have to do is write forward propagation and define the loss function and then TF can do back propagation for us. We still have some other “knobs to turn” like which optimizer to use and the like, but all the hard work happens “under the covers”. That’s the Good News. The corresponding Bad News is that you have to be careful that the entire computation is done using TF primitives or else you hit a path in the compute graph for which no gradients are available.
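To make that concrete, here is a minimal sketch of the pattern. The layer sizes, names, and learning rate are invented for illustration and are not the notebook’s exact code; the point is just that we only write forward prop and the loss inside the tape, and TF does the rest:

import tensorflow as tf

# Illustrative one-layer "network"
W = tf.Variable(tf.random.normal((2, 3)), name="W")
b = tf.Variable(tf.zeros((2, 1)), name="b")
optimizer = tf.keras.optimizers.Adam(learning_rate=0.0001)

X = tf.random.normal((3, 5))                      # 5 examples, 3 features each
Y = tf.one_hot([0, 1, 0, 1, 1], depth=2, axis=0)  # labels, shape (2, 5)

with tf.GradientTape() as tape:
    # Forward prop and the loss are the only things we write by hand,
    # and everything here must be a TF op so the tape can record it
    Z = tf.math.add(tf.linalg.matmul(W, X), b)
    loss = tf.reduce_mean(tf.keras.losses.categorical_crossentropy(
        tf.transpose(Y), tf.transpose(Z), from_logits=True))

# Automatic differentiation walks the recorded graph to get the gradients ...
grads = tape.gradient(loss, [W, b])
# ... and the optimizer we chose applies the parameter update
optimizer.apply_gradients(zip(grads, [W, b]))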

I recognized the syndrome because I’d seen it before on this specific exercise, although it has probably come up only a single-digit number of times in the 5+ years this course has been live. It’s worth considering what they could say in the notebook to explain this point without getting too deep into the weeds.
