I don’t know why such things keep happening to me …
So I’m trying to program out the “backpropagation” in Week 3’s programming assignment.
The test case consists of two parts:
- A test that just prints results to STDOUT, which the student can then visually compare with what’s on the screen.
- A test called backward_propagation_test() in public_test.py, which compares the student-generated values against “shall-be” values using
assert np.allclose(output["dW1"], expected_output["dW1"]), "Wrong values for dW1"
etc.
It turns out that my code passes (1) (i.e. the output is as expected) but fails (2).
After trying out various things for a long time, it turns out that test (2) is really wonky.
It initializes all parameters to random values (but with the random seed set, so those random values are consistently the same on each run), as well as the cache, calls the user’s function once, then tests whether dW1, db1, dW2, db2 match its expectations.
However, this makes no sense:
cache = {'A1': np.random.randn(9, 7),
         'A2': np.random.randn(1, 7),
         'Z1': np.random.randn(9, 7),
         'Z2': np.random.randn(1, 7)}
These values must be set to something from a valid feed-forward run!
Am I supposed to do that myself? Am I just confused?
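For reference, here is a minimal sketch of how the fixture could build a consistent cache from an actual forward pass instead, assuming the assignment’s tanh hidden layer and sigmoid output; the input X and the parameter shapes are made up here just to match the (9, 7) / (1, 7) shapes above:

import numpy as np

np.random.seed(2)
X = np.random.randn(3, 7)                      # hypothetical input: 3 features, 7 examples
W1, b1 = np.random.randn(9, 3), np.random.randn(9, 1)
W2, b2 = np.random.randn(1, 9), np.random.randn(1, 1)

# Forward pass, so that A1 == tanh(Z1) and A2 == sigmoid(Z2) by construction
Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
Z2 = W2 @ A1 + b2
A2 = 1 / (1 + np.exp(-Z2))
cache = {'A1': A1, 'A2': A2, 'Z1': Z1, 'Z2': Z2}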
The fun part is that the test case passes if one uses the A1 from the cache to compute the local derivatives of g[1], as indicated under “tips”. I did not do that, which is why I ran into problems.
The test passes with either of
local_g1_derivatives = 1 - np.power(A1, 2)
local_g1_derivatives = 1 - A1**2
But the equivalent based on Z1 does not:
local_g1_derivatives = 1 - np.tanh(Z1)**2
Although db2 and dW2 match what is expected, db1 and dW1 do not.
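That is exactly what the random cache forces: 1 - A1**2 and 1 - np.tanh(Z1)**2 only agree when A1 really is tanh(Z1), which the random fixture does not guarantee; dW2 and db2 still match because they never touch g[1]’s derivative. A quick self-contained check (shapes as in the test, names purely illustrative):

import numpy as np

np.random.seed(2)
A1 = np.random.randn(9, 7)        # random, NOT produced by a forward pass
Z1 = np.random.randn(9, 7)        # independent of A1, so A1 != np.tanh(Z1)

from_A1 = 1 - A1**2               # what the expected grading values are based on
from_Z1 = 1 - np.tanh(Z1)**2      # mathematically the same derivative, but only if A1 == tanh(Z1)

print(np.allclose(from_A1, from_Z1))   # False -> dW1/db1 diverge from the expected values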
P.S.
Here are some light fixes to the same backward_propagation_test(target). Y is built like this:
Y = (np.random.randn(1, 7) > 0)
This creates an array of booleans. Let’s keep the contract “Y is an array of numerics” as promised:
Y = (np.random.randn(1, 7) > 0).astype(np.uint8)  # keep promise to deliver numerics, not booleans
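The difference is just the dtype, as a quick check shows (illustrative only):

import numpy as np

Y_bool = np.random.randn(1, 7) > 0
print(Y_bool.dtype)                    # bool
print(Y_bool.astype(np.uint8).dtype)   # uint8, i.e. actual numerics as promised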
The tests should be ordered so that the results that come early in the dataflow diagram are examined first. That way the student learns that there are already problems with dZ2, for example. Otherwise, if dW1 is tested first, it will obviously be wrong whenever dZ2 is already wrong, and the student will look in the wrong place. Hence:
# Type and shape as expected?
# Test the db2, dW2 first to give the user info about failure early in processing
assert type(output["db2"]) == np.ndarray, f"Wrong type for db2. Expected: {np.ndarray}"
assert type(output["dW2"]) == np.ndarray, f"Wrong type for dW2. Expected: {np.ndarray}"
assert output["db2"].shape == expected_output["db2"].shape, f"Wrong shape for db2."
assert output["dW2"].shape == expected_output["dW2"].shape, f"Wrong shape for dW2."
assert type(output["db1"]) == np.ndarray, f"Wrong type for db1. Expected: {np.ndarray}"
assert type(output["dW1"]) == np.ndarray, f"Wrong type for dW1. Expected: {np.ndarray}"
assert output["db1"].shape == expected_output["db1"].shape, f"Wrong shape for db1."
assert output["dW1"].shape == expected_output["dW1"].shape, f"Wrong shape for dW1."
# Content as expected?
# Test the db2, dW2 first to give the user info about failure early in processing
assert np.allclose(output["db2"], expected_output["db2"]), "Wrong values for db2"
assert np.allclose(output["dW2"], expected_output["dW2"]), "Wrong values for dW2"
assert np.allclose(output["db1"], expected_output["db1"]), "Wrong values for db1"
assert np.allclose(output["dW1"], expected_output["dW1"]), "Wrong values for dW1"