W 4 A1 | Ex- 5 | Wrong shape and output

This is L_model_forward, which manages the forward propagation logic through all the layers of the network, so it’s pretty important to have a clear picture of the overall process before you start writing the code. One way to get a concrete idea in a particular case like this is to work through the “dimensional analysis” of what happens at each layer. The way to do that is to start by writing down the shapes of all the input objects that the test case gives us. In this case, that is the routine

L_model_forward_test_case_2hidden

You can find that by clicking “File → Open” and then opening the file testCases.py. You could do that for yourself, but I’ll save you the trouble:

import numpy as np  # testCases.py imports numpy at the top of the file

def L_model_forward_test_case_2hidden():
    np.random.seed(6)
    X = np.random.randn(5,4)
    W1 = np.random.randn(4,5)
    b1 = np.random.randn(4,1)
    W2 = np.random.randn(3,4)
    b2 = np.random.randn(3,1)
    W3 = np.random.randn(1,3)
    b3 = np.random.randn(1,1)
  
    parameters = {"W1": W1,
                  "b1": b1,
                  "W2": W2,
                  "b2": b2,
                  "W3": W3,
                  "b3": b3}
    
    return X, parameters

So from that, we have the following important shapes:

X is 5 x 4, so we have 5 input features and 4 samples.
W1 is 4 x 5 and b1 is 4 x 1, so layer 1 has 4 output neurons.
W2 is 3 x 4 and b2 is 3 x 1, so layer 2 has 3 output neurons.
W3 is 1 x 3 and b3 is 1 x 1, so layer 3 has 1 output neuron.
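If you want to confirm those shapes for yourself, one quick sketch is just to rebuild the test-case objects and print the shapes (this is standalone, not part of the assignment code):

```python
import numpy as np

np.random.seed(6)
X = np.random.randn(5, 4)
parameters = {"W1": np.random.randn(4, 5), "b1": np.random.randn(4, 1),
              "W2": np.random.randn(3, 4), "b2": np.random.randn(3, 1),
              "W3": np.random.randn(1, 3), "b3": np.random.randn(1, 1)}

# print the shape of every object the test case hands to L_model_forward
print("X:", X.shape)
for l in range(1, 4):
    print(f"W{l}:", parameters[f"W{l}"].shape,
          f"b{l}:", parameters[f"b{l}"].shape)
```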

Now consider what happens when you do the “linear activation” at layer 1. Here is the formula:

Z1 = W1 \cdot X + b1

So that dot product is 4 x 5 dot 5 x 4, which gives Z1 as 4 x 4. Adding b1 won’t change the shape, because the 4 x 1 vector “broadcasts” across the 4 columns. A1 is the output of the layer 1 activation (relu) applied to Z1. The activation functions are always applied “elementwise”, meaning that the shape doesn’t change. So A1 is 4 x 4.
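Here’s what that layer 1 step looks like as a standalone numpy check (just a sketch, not the assignment’s linear_activation_forward):

```python
import numpy as np

np.random.seed(6)
X = np.random.randn(5, 4)    # 5 features, 4 samples
W1 = np.random.randn(4, 5)
b1 = np.random.randn(4, 1)

Z1 = np.dot(W1, X) + b1      # (4,5) dot (5,4) -> (4,4); b1 broadcasts across columns
A1 = np.maximum(0, Z1)       # relu is elementwise, so the shape is unchanged
print(Z1.shape, A1.shape)    # (4, 4) (4, 4)
```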

Now we do layer 2:

Z2 = W2 \cdot A1 + b2

We get 3 x 4 dotted with 4 x 4 which gives 3 x 4 output. So both Z2 and A2 will be 3 x 4.

Then at layer 3 we have:

Z3 = W3 \cdot A2 + b3

which will be 1 x 3 dotted with 3 x 4 which gives 1 x 4 output for Z3 and A3.
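Putting all three layers together, here is a sketch of the whole shape chain. Note this is not the assignment’s L_model_forward, and the seeding order here differs from testCases.py, so only the shapes match, not the values. I’m assuming relu on the hidden layers and sigmoid at the output, as in the assignment:

```python
import numpy as np

np.random.seed(6)
A = np.random.randn(5, 4)  # X
Ws = {1: np.random.randn(4, 5), 2: np.random.randn(3, 4), 3: np.random.randn(1, 3)}
bs = {1: np.random.randn(4, 1), 2: np.random.randn(3, 1), 3: np.random.randn(1, 1)}

shapes = {}
for l in (1, 2, 3):
    Z = np.dot(Ws[l], A) + bs[l]
    # relu on the hidden layers, sigmoid on the output layer
    A = np.maximum(0, Z) if l < 3 else 1 / (1 + np.exp(-Z))
    shapes[l] = (Z.shape, A.shape)
    print(f"layer {l}: Z is {Z.shape}, A is {A.shape}")
```

Running that prints 4 x 4 at layer 1, 3 x 4 at layer 2, and 1 x 4 at layer 3, which matches the analysis above.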

Ok, that now gives us the complete picture of the dimensions that should occur at each step. Now compare that to what you get: your AL value (which should be A3) is 3 x 4. So how could that happen? Note that 3 x 4 is the shape of A2. So one possibility is that you skipped the processing for layer 3.
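To make that concrete, here is one hypothetical way such a bug could look: a loop that is off by one, so the layer 3 step never runs and the “final” A is really A2 (this is my own illustration, not your code):

```python
import numpy as np

np.random.seed(6)
A = np.random.randn(5, 4)  # X
Ws = [np.random.randn(4, 5), np.random.randn(3, 4), np.random.randn(1, 3)]
bs = [np.random.randn(4, 1), np.random.randn(3, 1), np.random.randn(1, 1)]

L = 3
for l in range(L - 1):  # bug: covers layers 1 and 2 only, skipping layer 3
    A = np.maximum(0, np.dot(Ws[l], A) + bs[l])

print(A.shape)  # (3, 4) -- the shape of A2, not of A3
```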

But at least now you have something concrete to compare with your results, which is why “dimensional analysis” is always a recommended way to start debugging in a situation like this.
