I am seeing some strange behavior in the week 3 reinforcement learning assignment, specifically in the computation of the loss between the y targets and the Q-values. I will keep this descriptive so as not to violate the forum rules.
When doing assignments, I like working step by step to better understand what's going on, and later I reformulate things into a one-line solution. I am now at the stage where I have the correct one-line solution; however, it stops working when I leave a particular line un-commented, even though that line should have no effect on the rest of the calculation. The line just takes the states and runs them through the target network again. Its result is not used anywhere else in the solution, I just print it out to inspect it, yet doing so causes the validation to fail with this error:
AssertionError Traceback (most recent call last)
<ipython-input-16-794445d1269d> in <module>
1 # UNIT TEST
----> 2 test_compute_loss(compute_loss)
~/work/public_tests.py in test_compute_loss(target)
57
58
---> 59 assert np.isclose(loss, 0.6991737), f"Wrong value. Expected {0.6991737}, got {loss}"
60
61 # Test when episode terminates
AssertionError: Wrong value. Expected 0.6991737, got 0.7275815010070801
If I comment out the duplicate computation of the network activations, the test passes. In theory, running exactly the same input through a network twice shouldn't change the results, so why is that the case here?
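That theoretical point is easy to check with a toy example. Here is a minimal numpy sketch (a hypothetical fixed-weight layer, not the assignment's actual network) showing that pure inference is deterministic: the same input twice gives bit-identical outputs:

```python
import numpy as np

# A tiny fixed-weight "network": inference involves no randomness,
# so running the same input through it twice must give identical results.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))   # hypothetical weights, fixed after creation
b = rng.normal(size=(4,))

def forward(x):
    # One dense layer with ReLU activation
    return np.maximum(x @ W + b, 0.0)

x = rng.normal(size=(5, 8))
out1 = forward(x)
out2 = forward(x)
assert np.array_equal(out1, out2)  # deterministic: identical outputs
```

If a second identical call changes anything downstream, the cause has to be some hidden state outside the forward pass itself.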
I get your point. I have been in situations similar to yours, and it turned out that something was simply overlooked. In that sense, this is a very good debugging exercise, which is exactly the kind of thing an ML engineer should be able to handle.
I am assuming the exercise in question is exercise 2, the compute_loss function, given that the error you shared comes from a test for exercise 2.
First, if I were you, I would compare the values of all intermediate variables before and after un-commenting that line and look for differences. Taking the template code offered in the Hints section as an example, I would compare the values of all the underlined variables:
Then I would find the first variable that differs after the un-commented line and investigate how that line could have caused the change.
Additionally, to make the comparison as fair as possible, I would restart the kernel, run the notebook with the line commented out, and save the values of the intermediate variables; then I would restart the kernel again and run it with the line un-commented. Using a fresh kernel for each run ensures there is no dependence on earlier kernel activity, so any difference can be attributed to that un-commented line.
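That save-and-compare workflow can be sketched with a couple of helpers (the function names here are hypothetical, not part of the assignment):

```python
import numpy as np

def save_intermediates(tag, **arrays):
    """Save each named intermediate tensor to disk, e.g. after run A."""
    for name, arr in arrays.items():
        np.save(f"{tag}_{name}.npy", np.asarray(arr))

def compare_intermediates(tag_a, tag_b, names):
    """After a kernel restart and run B, load both runs and report diffs."""
    for name in names:
        a = np.load(f"{tag_a}_{name}.npy")
        b = np.load(f"{tag_b}_{name}.npy")
        same = a.shape == b.shape and np.allclose(a, b)
        print(f"{name}: {'match' if same else 'DIFFERS'}")
```

In run A you would call something like `save_intermediates("commented", max_qsa=max_qsa, y_targets=y_targets)`, restart the kernel, save again under a second tag in run B, and then compare.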
I don't know what was overlooked in your case. Maybe it's something about the un-commented line, or maybe it's some provided code in the assignment, but either way I think we should be able to identify it. Even if the problem were in the provided code, consider how often an ML engineer needs to work with existing code written by teammates; figuring such things out is part of the job.
I've commented out the line that computed a_hat, which I previously had before computing y_targets and which was changing the results. The interesting thing is what happens to max_qsa. If I copy the exact code that computes max_qsa from the exercise, it changes the outcome. The same happens when I compute q_values: I have the same statement for q_values_prior and q_values_post, and yet their values are different:
I have also investigated the problem and found the reason, so let's discuss it. From what I see in your last post, you are almost there; what I don't see is how you used those results to derive the next step of your investigation. Here is part of my thought process:
There are three components that compute max_qsa: (1) tf.reduce_max, (2) target_q_network, and (3) next_states.
To get a different max_qsa, at least one of them must change.
Given your debug code, you should already be able to tell whether next_states has changed.
I could print tf.reduce_max to make sure it is a TensorFlow function, and if so, I would trust that its behavior doesn't change (but I would keep in mind that this is trust, so if I can't find the reason elsewhere, I'll come back to it; consider it a lower priority).
What about target_q_network? The name alone guarantees nothing; I would have to look into it.
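To make those three components concrete, here is a numpy stand-in (the assignment itself uses tf.reduce_max and a Keras target_q_network; the stub network below is purely hypothetical):

```python
import numpy as np

def target_q_network(states):
    # Hypothetical stub: a fixed linear map from 3 state features
    # to 4 action values. The real assignment passes a Keras model.
    W = np.arange(12.0).reshape(3, 4)
    return states @ W

next_states = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])
q_values = target_q_network(next_states)  # components (2) and (3)
max_qsa = np.max(q_values, axis=-1)       # component (1)
# max_qsa -> [3., 7.]
```

If max_qsa changes between two runs, at least one of the three inputs to this pipeline must have changed; the debugging question is which one.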
@schadd, because this is a forum for learning (different from Stack Overflow, in my opinion) and I am the kind of mentor who believes one learns from debugging work, I will hold back from telling you the answer. I hope you won't misunderstand me.
I am hitting a plateau trying to figure this out. Normally I would hop into the debugger to find the issue, but you can't do that on a remote notebook. I've been trying to set up a local environment so I can run the example in a debugger, but that sent me down an endless rabbit hole of version incompatibilities and dependency issues while replicating the notebook's exact environment (e.g. pyvirtualdisplay not being compatible with Windows at all, and the pinned gym version quietly being incompatible with Python 3.11 because its numpy dependency isn't).
Based on my intuition, I would suspect that target_q_network is different. It could be something like the list of cached activations changing because I added an unexpected set of activations with my manual tomfoolery. However, contrary to what the docstring indicates, during the test you don't actually get a Keras network as input, but a randomized numpy array wrapped in a function. So I can't print and check activations or gradients, since there are none. Maybe because I did a manual step, I now get a different randomized array even with a fixed seed?
Your intuition that target_q_network is different is correct, and since you mentioned the random arrays, I assume you have read the test function defined inside "public_tests.py". The thing about the seed is that it is set only once, at the beginning, and never set again between any two random number generations. This means that if we call the pseudo target_q_network twice, it gives two different outputs.
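The seed behavior can be demonstrated in isolation. A minimal numpy sketch (not the actual test code):

```python
import numpy as np

# Seed set once, as in the test file: consecutive draws continue one
# global stream, so two "identical" calls return different values.
np.random.seed(0)
a = np.random.rand(3)
b = np.random.rand(3)
assert not np.array_equal(a, b)   # same call, different outputs

# Re-seeding before each draw is what it would take to make them match:
np.random.seed(0)
c = np.random.rand(3)
np.random.seed(0)
d = np.random.rand(3)
assert np.array_equal(c, d)
```

So a "network" implemented as a fresh draw from the global RNG is not deterministic across calls, unlike a real trained network.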
I wanted to introduce you to the breakpoint() function, Python's built-in entry point into the interactive debugger (pdb by default), but when I tried it myself on the assignment it kept killing the kernel, so unfortunately we can't demonstrate it there.
Jupyter's notebook style is notorious for encouraging the worst programming practices and, in my personal experience, also for keeling over any time you look at the kernel funny.
In any case, stubborn me tried again with a completely fresh Python venv and got most of the code running in a proper dev environment. With that, it became quite evident that I was unknowingly adding an extra invocation of np.random between the tests, which they weren't expecting, causing the test that came after to get a different np.random result, since the tests were written around a single global fixed seed.
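The off-by-one-draw effect can be sketched in a few lines (hypothetical values, not the assignment's tests):

```python
import numpy as np

# Run A: what the test suite expects when nothing extra touches the RNG.
np.random.seed(0)
expected = [np.random.rand(), np.random.rand()]

# Run B: an extra "harmless" debug call consumes one draw, so the test
# that comes after it now sees the third value instead of the second.
np.random.seed(0)
first = np.random.rand()       # matches expected[0]
extra = np.random.rand()       # the un-commented debug line's draw
observed = np.random.rand()    # what the next test actually receives
assert first == expected[0]
assert observed != expected[1]
```

This is exactly why the extra call to the pseudo target_q_network shifted the value compute_loss produced for the test that followed.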
Thanks for the support. I was doubting my understanding of networks for a moment, but it's really just a quirk in how the tests are written.
I couldn’t get the full screenshot, but this is what the feedback said:
Code Cell UNQ_C1: The value of your variable ‘q_network’ is correct.
Code Cell UNQ_C1: The value of your variable ‘target_q_network’ is correct.
Code Cell UNQ_C1: The value of your variable ‘optimizer’ is correct.
Code Cell UNQ_C2 failed: compute_loss test 1 failed
If you see many functions being marked as incorrect, try to trace back your steps & identify if there is an incorrect function that is being used in other steps.
This dependency may be the cause of the errors.
I don't, but I have noticed some issues with the lab. Thank you for the information! One question remains about the lab: is it possible to reset the entire lab completely?