Why is the output of the same function changing?

You can recreate the problem by defining max_qsa and max_qsa_1 as shown in the code below. But if I comment out either of them, I get the correct value and the unit test passes. So what is happening here? Why do I get different values from seemingly the same function? I stored the two results in different variables, so how are they interacting with each other?

Even if we assume there is some relation between the two, what I don't understand is why, when I use max_qsa to calculate y_targets, the unit test fails just because I did not comment out max_qsa_1, which is calculated after max_qsa.

max_qsa = tf.reduce_max(target_q_network(next_states), axis=-1)
max_qsa_1 = tf.math.reduce_max(target_q_network(next_states), axis=-1)
print(next_states.shape, "\n", max_qsa.shape, max_qsa, "\n", max_qsa_1.shape, max_qsa_1)

The output I get:

(64, 8)
(64,) tf.Tensor(
[0.99091566 0.7803635 0.9108061 0.8211219 0.97421265 0.8894425
0.89632255 0.8730944 0.9247408 0.8849665 0.717595 0.8369978
0.6383553 0.8805386 0.59138715 0.3028583 0.7417048 0.678377
0.9825898 0.8164424 0.78726345 0.9588025 0.790637 0.68895507
0.9257748 0.7466575 0.9305598 0.9786728 0.66059834 0.5379877
0.6962701 0.6873031 0.9795902 0.76472664 0.53483224 0.8009946
0.931447 0.9192375 0.9985205 0.97182006 0.84173685 0.91101795
0.72564256 0.8846819 0.92810905 0.97691613 0.71035457 0.9904062
0.5268519 0.9332023 0.9594706 0.31268117 0.953677 0.75806165
0.8905068 0.97132045 0.9225045 0.8797221 0.44434816 0.69102734
0.87332314 0.98412657 0.25284147 0.6399746 ], shape=(64,), dtype=float32)
(64,) tf.Tensor(
[0.59054655 0.70252305 0.9370804 0.9380285 0.5514316 0.9473759
0.96769524 0.66188836 0.80674684 0.6864434 0.9898835 0.9351531
0.94776404 0.629853 0.67449886 0.7802023 0.86997616 0.68371034
0.9071091 0.86848813 0.8974983 0.6919839 0.82846814 0.96987057
0.95526224 0.765184 0.66618806 0.62512106 0.9498464 0.9615725
0.60589635 0.9410133 0.84332323 0.8922739 0.98141044 0.64990115
0.6508313 0.92758095 0.9666125 0.68206626 0.79186386 0.9466767
0.854238 0.89493954 0.65039814 0.98667747 0.71392804 0.9117632
0.94249886 0.8301565 0.7423834 0.9029359 0.17272747 0.7783009
0.97623825 0.94857275 0.9306444 0.9854449 0.7251667 0.3268307
0.79966086 0.7393878 0.98624676 0.6192133 ], shape=(64,), dtype=float32)

Irrelevant backstory

I was doing the C3_W3_A1 lab. We needed the max of Q(s, a), and I did not notice that max_qsa was already defined in the code, so I searched online and used tf.math.reduce_max instead of the tf.reduce_max that was used to define max_qsa. Because I did not see the already defined max_qsa, I used tf.math.reduce_max directly in the code I had to implement.

The problem appeared because I had effectively applied reduce_max twice to the same variable, even though the two uses should not be related: the first result is stored in max_qsa, and the second is used directly in the y_targets code.

Hello @tinted, try calling tf.reduce_max twice and you will also see that the outputs differ, so the difference does not come from tf.reduce_max (or tf.math.reduce_max) but from something in test_compute_loss. Open public_tests.py to inspect it and you should be able to find the reason.
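As a quick standalone check (this snippet is not from the lab code), tf.math.reduce_max is just another name for tf.reduce_max, and both are deterministic for a fixed input tensor, so the differing values must come from target_q_network(next_states) itself:

import numpy as np
import tensorflow as tf

q_values = tf.constant(np.random.rand(64, 4), dtype=tf.float32)  # a fixed input tensor

a = tf.reduce_max(q_values, axis=-1)
b = tf.math.reduce_max(q_values, axis=-1)

print(np.allclose(a.numpy(), b.numpy()))  # True: same op, same input, same result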

Raymond


I went through public_tests.py line by line and could not find anything that would cause this weird behaviour.

loss = target((states, actions, rewards, next_states, done_vals), 0.995, q_network_random, target_q_network_random)

assert np.isclose(loss, 0.6991737), f"Wrong value. Expected {0.6991737}, got {loss}"

This is where it fails, which means it is related to the compute_loss function defined in the assignment notebook.

I think this is either a bug in TensorFlow's reduce_max function or a behaviour of this function that I don't understand.

Error
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-15-794445d1269d> in <module>
      1 # UNIT TEST
----> 2 test_compute_loss(compute_loss)

~/work/public_tests.py in test_compute_loss(target)
     57 
     58 
---> 59     assert np.isclose(loss, 0.6991737), f"Wrong value. Expected {0.6991737}, got {loss}"
     60 
     61     # Test when episode terminates

AssertionError: Wrong value. Expected 0.6991737, got 0.7275815010070801


Here is a hint. Think about the effect of those np.random.rand calls.


Thank you for not giving the answer straight away. This way I understood things better because I had to think more.

Anyway, the problem is the initialization of random values, right? Even though we use a set random seed, if I run any of the functions that use random values more than once for testing purposes, it will change the random seed for the next thing that uses that randomness, right?
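A minimal sketch of that effect (fake_network_call is just a hypothetical stand-in, not anything from the lab): with the seed set once at the top, every extra draw shifts what later code receives.

import numpy as np

def fake_network_call():
    # stands in for a test network whose forward pass draws from np.random
    return np.random.rand()

np.random.seed(0)
fake_network_call()          # consumes the 1st number of the seeded sequence
loss_a = np.random.rand()    # gets the 2nd number

np.random.seed(0)
fake_network_call()
fake_network_call()          # an extra call consumes one more number...
loss_b = np.random.rand()    # ...so this line now gets the 3rd number instead

print(loss_a == loss_b)      # False: the extra call shifted the sequence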

You are welcome, @tinted!

So far so good…

I would say “it will generate a different set of numbers”.

For the following, it would be better for you to confirm with a computer scientist.

The usual understanding is that each seed represents a "sequence of numbers" (not because those numbers are hard-coded one by one somewhere, but because the sequence can be generated deterministically by some pre-defined formula). First, you seed a random number generator (RNG); the seed effectively determines which "sequence of numbers" to use. Each time you need a new number, the RNG "picks the next unused number from the sequence", which is then post-processed to fit the generation requirement. So there is no need to change the seed: keep the same seed, and it gives you all the numbers you need.

The idea of a "sequence" was used in the hope that the message is easier to digest than repeatedly emphasizing how the numbers are generated by some formula.
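For what it's worth, here is a minimal NumPy sketch of this "sequence" picture: re-seeding with the same seed replays exactly the same numbers in the same order.

import numpy as np

np.random.seed(42)
first_run = [np.random.rand() for _ in range(5)]

np.random.seed(42)               # same seed -> same sequence from the start
second_run = [np.random.rand() for _ in range(5)]

print(first_run == second_run)   # True: identical numbers in the same order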

Cheers,
Raymond


The sequence of numbers makes more sense. Thanks for the explanation.

Yes, or we might say it your way, with some emphasis on the recursive nature of the RNG that you wanted to talk about.