Getting wrong output for dZ2 using the backward test case

Ah, ok, I added some instrumentation and now I realize the issue:

Prof Ng’s formulas are just fine. The problem is that if you use his formulas directly at the output layer and skip calling linear_activation_backward, you run into the fact that the test case input values are just randomly generated. That means they don’t satisfy the mathematical relationships that real values produced by forward propagation would satisfy.

linear_activation_backward, on the other hand, is a general function that works for any layer, so it has to use the general formula for dZ^{[l]}, with an elementwise product:

dZ^{[l]} = dA^{[l]} \ast g^{[l]'}(Z^{[l]})
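For a sigmoid layer, that general step looks something like this minimal sketch (not the assignment’s exact helper, just an illustration): it recomputes the activation from the cached Z instead of trusting any separately supplied A value.

import numpy as np

def sigmoid_backward_sketch(dA, Z):
    # General backward step for a sigmoid layer:
    # recompute A = sigmoid(Z) from the cached Z, then apply
    # dZ = dA * g'(Z) = dA * sigmoid(Z) * (1 - sigmoid(Z)) elementwise.
    s = 1 / (1 + np.exp(-Z))
    return dA * s * (1 - s)

That is effectively what happens inside linear_activation_backward at the output layer: the dZ it computes comes from the cached Z2, not from AL.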

But what you are doing by using the formula:

dZ^{[2]} = A^{[2]} - Y

is that you’ve “short-circuited” the general calculation shown above with the special simplifications that hold at the output layer. Because the output activation is sigmoid, the derivative of the activation function there is:

g'(Z) = g(Z) * (1 - g(Z)) = A * (1 - A)
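If the cached values were actually consistent, i.e. A^{[2]} really were sigmoid(Z^{[2]}), and dA^{[2]} were the usual cross-entropy gradient (I’m assuming the standard cross-entropy cost here), then the general formula would collapse to exactly that shortcut:

dA^{[2]} = -\left(\frac{Y}{A^{[2]}} - \frac{1 - Y}{1 - A^{[2]}}\right)

dZ^{[2]} = dA^{[2]} \ast A^{[2]}(1 - A^{[2]}) = -\left(Y(1 - A^{[2]}) - (1 - Y)A^{[2]}\right) = A^{[2]} - Y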

But if you look at the way the test inputs are generated (as we discussed on that other thread), it is this:

import numpy as np

def L_model_backward_test(target):
    np.random.seed(3)
    AL = np.random.randn(1, 2)      # random values, not the output of any sigmoid
    Y = np.array([[1, 0]])

    A1 = np.random.randn(4,2)
    W1 = np.random.randn(3,4)
    b1 = np.random.randn(3,1)
    Z1 = np.random.randn(3,2)
    linear_cache_activation_1 = ((A1, W1, b1), Z1)

    A2 = np.random.randn(3,2)
    W2 = np.random.randn(1,3)
    b2 = np.random.randn(1,1)
    Z2 = np.random.randn(1,2)       # drawn independently of AL, so sigmoid(Z2) != AL
    linear_cache_activation_2 = ((A2, W2, b2), Z2)

    caches = (linear_cache_activation_1, linear_cache_activation_2)
    # ... (rest of the test function omitted)

Since AL and Z2 are just random values, it is not the case that sigmoid(Z2) = AL. You can see that clearly from the fact that the values of AL are not even between 0 and 1, as any output of sigmoid would be.
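To double check that, you can reproduce the same draws yourself (a quick sketch; the names just mirror the test function above):

import numpy as np

# Reproduce the same random draws, in the same order, as L_model_backward_test
np.random.seed(3)
AL = np.random.randn(1, 2)
A1, W1, b1, Z1 = (np.random.randn(4, 2), np.random.randn(3, 4),
                  np.random.randn(3, 1), np.random.randn(3, 2))
A2, W2, b2 = np.random.randn(3, 2), np.random.randn(1, 3), np.random.randn(1, 1)
Z2 = np.random.randn(1, 2)

print(((0 < AL) & (AL < 1)).all())             # False: AL is not even in (0, 1)
print(np.allclose(1 / (1 + np.exp(-Z2)), AL))  # False: sigmoid(Z2) is not AL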

So even if your shortcut version is mathematically correct, it may not pass the test cases unless you write the code using the general formulas, which do not rely on any special relationships that are particular to the output layer.
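Here is a small self-contained illustration of the consequence (a sketch with a made-up seed, again assuming the cross-entropy gradient for the dA term):

import numpy as np

def sigmoid(Z):
    return 1 / (1 + np.exp(-Z))

Y = np.array([[1, 0]])
np.random.seed(1)                                      # any seed, just for illustration
Z2 = np.random.randn(1, 2)

# Consistent values, as real forward propagation would produce:
A2 = sigmoid(Z2)                                       # A2 really is sigmoid(Z2)
dA2 = -(np.divide(Y, A2) - np.divide(1 - Y, 1 - A2))   # cross-entropy dA^{[2]}
dZ2_general = dA2 * sigmoid(Z2) * (1 - sigmoid(Z2))    # general formula, from cached Z2
print(np.allclose(dZ2_general, A2 - Y))                # True: the shortcut agrees

# AL drawn independently of Z2, like the test case:
AL = np.random.randn(1, 2)
dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
dZ2_general = dAL * sigmoid(Z2) * (1 - sigmoid(Z2))    # what the general code computes
print(np.allclose(dZ2_general, AL - Y))                # False: the shortcut disagrees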

There have been other instances of this, e.g. this one from Week 3.