I am seeing some strange behavior in the week 3 reinforcement learning assignment, specifically in the computation of the loss between the y targets and the Q-values. I will keep this descriptive so as not to violate the forum rules.
When doing assignments, I like working step by step to better understand what's going on, and later I reformulate things into a one-line solution. I am now at the stage where I have the correct one-line solution; however, it stops working when I leave a particular line un-commented, even though that line should have no effect on the rest of the calculation. The line just takes the states and runs them through the target network again. Its result is not used anywhere else in the solution, I just print it out to inspect it, yet doing so causes the validation to fail with this error:
AssertionError Traceback (most recent call last)
<ipython-input-16-794445d1269d> in <module>
1 # UNIT TEST
----> 2 test_compute_loss(compute_loss)
~/work/public_tests.py in test_compute_loss(target)
57
58
---> 59 assert np.isclose(loss, 0.6991737), f"Wrong value. Expected {0.6991737}, got {loss}"
60
61 # Test when episode terminates
AssertionError: Wrong value. Expected 0.6991737, got 0.7275815010070801
If I comment out the duplicate computation of the network activations, the test passes. In theory, running exactly the same input through a network twice shouldn't change the results, so why is that the case here?
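That theoretical point is easy to check with a toy example. Here is a minimal numpy sketch (a hypothetical fixed-weight layer, not the assignment's actual network) showing that pure inference is deterministic: the same input twice gives bit-identical outputs:

```python
import numpy as np

# A tiny fixed-weight "network": inference involves no randomness,
# so running the same input through it twice must give identical results.
rng = np.random.default_rng(0)
W = rng.normal(size=(8, 4))   # hypothetical weights, fixed after creation
b = rng.normal(size=(4,))

def forward(x):
    # One dense layer with ReLU activation
    return np.maximum(x @ W + b, 0.0)

x = rng.normal(size=(5, 8))
out1 = forward(x)
out2 = forward(x)
assert np.array_equal(out1, out2)  # deterministic: identical outputs
```

If a second identical call changes anything downstream, the cause has to be some hidden state outside the forward pass itself.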
I get your point. I have been in situations similar to yours, and it turned out that something was simply overlooked. In that sense, this is a very good debugging exercise, which is exactly the kind of thing an ML engineer should be able to handle.
I am assuming the exercise in question is exercise 2, the compute_loss function, given that the error you shared comes from a test for exercise 2.
First, if I were you, I would compare the values of all intermediate variables before and after un-commenting that line and look for differences. Taking the template code offered in the Hints section as an example, I would compare the values of all the underlined variables:
Then I would find the first variable that differs after the un-commented line and investigate how that line could have caused the change.
Additionally, to make the comparison as fair as possible, I would restart the kernel, run the notebook with the line commented out, and save the values of the intermediate variables; then I would restart the kernel again and run it with the line un-commented. Using a fresh kernel for each run ensures there is no dependence on earlier kernel activity, so any difference can be attributed to that un-commented line.
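That save-and-compare workflow can be sketched with a couple of helpers (the function names here are hypothetical, not part of the assignment):

```python
import numpy as np

def save_intermediates(tag, **arrays):
    """Save each named intermediate tensor to disk, e.g. after run A."""
    for name, arr in arrays.items():
        np.save(f"{tag}_{name}.npy", np.asarray(arr))

def compare_intermediates(tag_a, tag_b, names):
    """After a kernel restart and run B, load both runs and report diffs."""
    for name in names:
        a = np.load(f"{tag_a}_{name}.npy")
        b = np.load(f"{tag_b}_{name}.npy")
        same = a.shape == b.shape and np.allclose(a, b)
        print(f"{name}: {'match' if same else 'DIFFERS'}")
```

In run A you would call something like `save_intermediates("commented", max_qsa=max_qsa, y_targets=y_targets)`, restart the kernel, save again under a second tag in run B, and then compare.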
I don't know what was overlooked in your case. Maybe it's something about the un-commented line, or maybe it's some provided code in the assignment, but either way I think we should be able to identify it. Even if the problem were in the provided code, consider how often an ML engineer needs to work with existing code written by teammates; figuring such things out is part of the job.
I've commented out the line that computed a_hat, which I previously had before computing y_targets and which was changing the results. The interesting thing is what happens to max_qsa. If I copy the exact code that computes max_qsa from the exercise, it changes the outcome. The same happens when I compute q_values: I have the same statement for q_values_prior and q_values_post, and yet their values are different:
I have also investigated the problem and found the reason, so let's discuss it. From what I see in your last post, you are almost there; what I don't see is how you used those results to derive the next step of your investigation. Here is part of my thought process:
There are three components that compute max_qsa: (1) tf.reduce_max, (2) target_q_network, and (3) next_states.
To get a different max_qsa, at least one of them must change.
Given your debug code, you should already be able to tell whether next_states has changed.
I could print tf.reduce_max to make sure it is a TensorFlow function, and if so, I would trust that its behavior doesn't change (but I would keep in mind that this is trust, so if I can't find the reason elsewhere, I'll come back to it; consider it a lower priority).
What about target_q_network? The name alone guarantees nothing; I would have to look into it.
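To make those three components concrete, here is a numpy stand-in (the assignment itself uses tf.reduce_max and a Keras target_q_network; the stub network below is purely hypothetical):

```python
import numpy as np

def target_q_network(states):
    # Hypothetical stub: a fixed linear map from 3 state features
    # to 4 action values. The real assignment passes a Keras model.
    W = np.arange(12.0).reshape(3, 4)
    return states @ W

next_states = np.array([[1.0, 0.0, 0.0],
                        [0.0, 1.0, 0.0]])
q_values = target_q_network(next_states)  # components (2) and (3)
max_qsa = np.max(q_values, axis=-1)       # component (1)
# max_qsa -> [3., 7.]
```

If max_qsa changes between two runs, at least one of the three inputs to this pipeline must have changed; the debugging question is which one.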
@schadd, because this is a forum for learning (different from Stack Overflow, in my opinion) and I am the kind of mentor who believes one learns from debugging work, I will hold back from telling you the answer. I hope you won't misunderstand me.
I am hitting a plateau trying to figure this out. Normally I would hop into the debugger to find the issue, but you can't do that on a remote notebook. I've been trying to set up a local environment so I can run the example in a debugger, but that sent me down an endless rabbit hole of version incompatibilities and dependency issues while replicating the notebook's exact environment (e.g. pyvirtualdisplay not being compatible with Windows at all, and the pinned gym version quietly being incompatible with Python 3.11 because its numpy dependency isn't).
Based on my intuition, I would suspect that target_q_network is different. It could be something like the list of cached activations changing because I added an unexpected set of activations with my manual tomfoolery. However, contrary to what the docstring indicates, during the test you don't actually get a Keras network as input, but a randomized numpy array wrapped in a function. So I can't print and check activations or gradients, since there are none. Maybe because I did a manual step, I now get a different randomized array even with a fixed seed?
Your intuition that target_q_network is different is correct, and since you mentioned the random arrays, I assume you have read the test function defined inside "public_tests.py". The thing about the seed is that it is set only once, at the beginning, and never set again between any two random number generations. This means that if we call the pseudo target_q_network twice, it gives two different outputs.
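The seed behavior can be demonstrated in isolation. A minimal numpy sketch (not the actual test code):

```python
import numpy as np

# Seed set once, as in the test file: consecutive draws continue one
# global stream, so two "identical" calls return different values.
np.random.seed(0)
a = np.random.rand(3)
b = np.random.rand(3)
assert not np.array_equal(a, b)   # same call, different outputs

# Re-seeding before each draw is what it would take to make them match:
np.random.seed(0)
c = np.random.rand(3)
np.random.seed(0)
d = np.random.rand(3)
assert np.array_equal(c, d)
```

So a "network" implemented as a fresh draw from the global RNG is not deterministic across calls, unlike a real trained network.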
I wanted to introduce you to the breakpoint() function, Python's built-in entry point into the interactive debugger (pdb by default), but when I tried it myself on the assignment it kept killing the kernel, so unfortunately we can't demonstrate it there.
Jupyter's notebook style is notorious for encouraging the worst programming practices and, in my personal experience, also for keeling over any time you look at the kernel funny.
In any case, stubborn me tried again with a completely fresh Python venv and got most of the code running in a proper dev environment. With that, it became quite evident that I was unknowingly adding an extra invocation of np.random between the tests, which they weren't expecting, causing the test that came after to get a different np.random result, since the tests were written around a single global fixed seed.
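The off-by-one-draw effect can be sketched in a few lines (hypothetical values, not the assignment's tests):

```python
import numpy as np

# Run A: what the test suite expects when nothing extra touches the RNG.
np.random.seed(0)
expected = [np.random.rand(), np.random.rand()]

# Run B: an extra "harmless" debug call consumes one draw, so the test
# that comes after it now sees the third value instead of the second.
np.random.seed(0)
first = np.random.rand()       # matches expected[0]
extra = np.random.rand()       # the un-commented debug line's draw
observed = np.random.rand()    # what the next test actually receives
assert first == expected[0]
assert observed != expected[1]
```

This is exactly why the extra call to the pseudo target_q_network shifted the value compute_loss produced for the test that followed.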
Thanks for the support. I was doubting my understanding of networks for a moment, but it's really just a quirk in how the tests are written.
I couldn’t get the full screenshot, but this is what the feedback said:
Code Cell UNQ_C1: The value of your variable ‘q_network’ is correct.
Code Cell UNQ_C1: The value of your variable ‘target_q_network’ is correct.
Code Cell UNQ_C1: The value of your variable ‘optimizer’ is correct.
Code Cell UNQ_C2 failed: compute_loss test 1 failed
If you see many functions being marked as incorrect, try to trace back your steps & identify if there is an incorrect function that is being used in other steps.
This dependency may be the cause of the errors.
I don't, but I have noticed some issues with the lab. Thank you for the information! One question remains about the lab: is it possible to reset the entire lab completely?