My network has:

- Layer 1: (20, 12288)
- Layer 2: (7, 20)
- Layer 3: (5, 7)
- Layer 4: (1, 5)
- Backpropagation and Gradients: The gradients (dW and db) are computed for all layers correctly during the first iteration, and the parameters (W and b) for all layers are updated without issue.
- Problem: In subsequent iterations, the parameters for the last layer (layer 4) are not updated, even though gradients for all layers, including the last one, are computed.
- I checked the shapes of gradients and parameters before and after updates for all layers. They seem consistent during the first iteration.
- Here’s the log of updates from my implementation:
Iteration 0:
Updating layer 1
Before: W1 shape (20, 12288)
After: W1 shape (20, 12288)
...
Updating layer 4
Before: W4 shape (1, 5)
After: W4 shape (1, 5)
Iteration 1:
Updating layer 1
Before: W1 shape (5, 10)
After: W1 shape (5, 10)
...
Updating layer 3
Before: W3 shape (1, 6)
After: W3 shape (1, 6)
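For reference, here is a minimal sketch of what the parameter shapes should look like at every iteration, assuming the layer sizes above correspond to layer_dims = [12288, 20, 7, 5, 1] (an assumption reconstructed from the shapes listed, not taken from the actual code):

```python
# Hypothetical layer sizes reconstructed from the shapes listed above.
layer_dims = [12288, 20, 7, 5, 1]

# For a fully connected network, W[l] has shape (layer_dims[l], layer_dims[l-1])
# and b[l] has shape (layer_dims[l], 1). These shapes should stay constant
# across iterations; only the values change during training.
for l in range(1, len(layer_dims)):
    print(f"W{l}: {(layer_dims[l], layer_dims[l - 1])}   b{l}: {(layer_dims[l], 1)}")
```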
Why would layer 4 parameters be updated correctly in iteration 0 but not in subsequent iterations, and why do the dimensions change after the first iteration?
You see values after iteration 0, but how do you know they are correct? They could just be the results of the initialization. The function update_parameters is correct, because it is just given to you, so that means you must not be using it properly. E.g. the variable names you are passing are incorrect, or you are assigning the return value to the wrong variable name. Also, that doesn’t look right: why is the shape of W3 different from (5, 7)?
I printed them out before and after the update function, and I also debugged the code using pdb, so I verified it in pdb as well.
How do you know what the correct values are after iteration 0? Did you consider the suggestions I made for what the bug is? E.g. that you assign the return value from update_parameters to the wrong variable name (maybe params instead of parameters).
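To make that failure mode concrete, here is a minimal, self-contained sketch; the update_parameters below is just a toy stand-in for the helper in dnn_app_utils_v3.py, not the real implementation:

```python
def update_parameters(parameters, grads, learning_rate):
    # Toy stand-in for the provided helper: returns a NEW updated dictionary.
    return {k: v - learning_rate * grads["d" + k] for k, v in parameters.items()}

parameters = {"W1": 1.0, "b1": 0.0}
grads = {"dW1": 0.5, "db1": 0.1}

for i in range(3):
    # Buggy: the updated values land in `params`, but the next iteration
    # still reads the stale `parameters`, so training never makes progress.
    params = update_parameters(parameters, grads, learning_rate=0.1)
    print(i, parameters["W1"])  # prints 1.0 every time

# Correct: feed the result back into the same variable the loop reads from.
# parameters = update_parameters(parameters, grads, learning_rate=0.1)
```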
Yes, I checked and it’s assigned to parameters and not params. I want to confirm one thing: relu_backward and sigmoid_backward are part of dnn_app_utils_v3, so I don’t need to define them explicitly, right?
Yes, that’s true, but you don’t need me to confirm that: just click “File → Open” and then open the file dnn_app_utils_v3.py.
Maybe it’s time for me to look at your code. We can’t do that on a public thread, but I will send you a DM about how to proceed with that.
Note that you should not hand copy any of your functions from the “Step by Step” exercise here. They are provided for you in that file.
To close the loop on the public thread: the problem was that several of the “helper” functions were “hand created” instead of using the ones provided in dnn_app_utils_v3.py, and the hand-created update_parameters only updated the parameters for layers 1 and 2.
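For anyone who hits the same symptom, here is a minimal sketch of that kind of bug and its fix (hypothetical code, not the learner’s actual implementation):

```python
import numpy as np

def update_parameters_buggy(parameters, grads, learning_rate):
    # Hand-rolled version that hard-codes two layers: W3/b3 and W4/b4
    # are silently never touched, which matches the symptom in this thread.
    for l in range(1, 3):
        parameters["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] -= learning_rate * grads["db" + str(l)]
    return parameters

def update_parameters_fixed(parameters, grads, learning_rate):
    # Derive the number of layers from the dictionary itself and loop over all of them.
    L = len(parameters) // 2
    for l in range(1, L + 1):
        parameters["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] -= learning_rate * grads["db" + str(l)]
    return parameters

# Tiny demonstration with 4 layers of dummy parameters and unit gradients.
params = {f"W{l}": np.ones((2, 2)) for l in range(1, 5)}
params.update({f"b{l}": np.ones((2, 1)) for l in range(1, 5)})
grads = {"d" + k: np.ones_like(v) for k, v in params.items()}

update_parameters_buggy(params, grads, learning_rate=0.1)
print(params["W2"][0, 0], params["W4"][0, 0])  # -> 0.9 1.0 : layer 4 was never updated

update_parameters_fixed(params, grads, learning_rate=0.1)
print(params["W4"][0, 0])  # -> 0.9 : now layer 4 moves as well
```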