My network has 4 layers with the following weight shapes (an initialization sketch follows the list):
- Layer 1: (20, 12288)
- Layer 2: (7, 20)
- Layer 3: (5, 7)
- Layer 4: (1, 5)
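
For reference, the parameters are initialized with these shapes, roughly as in the sketch below (the function and variable names are illustrative, not my exact code):

```python
import numpy as np

# Illustrative sketch of the initialization (names are mine, not the real code).
# layer_dims[0] = 12288 is the flattened input size; the rest match the shapes above.
layer_dims = [12288, 20, 7, 5, 1]

def initialize_parameters(layer_dims):
    parameters = {}
    L = len(layer_dims) - 1  # number of layers = 4
    for l in range(1, L + 1):
        # W_l has shape (layer_dims[l], layer_dims[l-1]); b_l has shape (layer_dims[l], 1)
        parameters["W" + str(l)] = np.random.randn(layer_dims[l], layer_dims[l - 1]) * 0.01
        parameters["b" + str(l)] = np.zeros((layer_dims[l], 1))
    return parameters
```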
- Backpropagation and gradients: The gradients (`dW` and `db`) are computed correctly for all layers during the first iteration, and the parameters (`W` and `b`) for all layers are updated without issue.
- Problem: In subsequent iterations, the parameters for the last layer (layer 4) are no longer updated, even though gradients are still computed for all layers, including the last one. I checked the shapes of the gradients and parameters before and after the updates for all layers, and they are consistent during the first iteration.

Here is the log of updates from my implementation (a simplified sketch of my update step follows the log):
```
Iteration 0:
  Updating layer 1
  Before: W1 shape (20, 12288)
  After: W1 shape (20, 12288)
  ...
  Updating layer 4
  Before: W4 shape (1, 5)
  After: W4 shape (1, 5)

Iteration 1:
  Updating layer 1
  Before: W1 shape (5, 10)
  After: W1 shape (5, 10)
  ...
  Updating layer 3
  Before: W3 shape (1, 6)
  After: W3 shape (1, 6)
```
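
For context, the update step is meant to follow the standard gradient-descent pattern sketched below (a minimal sketch, not my exact code; the function name, the way `L` is derived, and the learning rate are all illustrative):

```python
def update_parameters(parameters, grads, learning_rate=0.0075):
    # parameters holds W1..WL and b1..bL, so the layer count is len(parameters) // 2
    L = len(parameters) // 2
    for l in range(1, L + 1):  # intended to update every layer, 1 through L inclusive
        parameters["W" + str(l)] -= learning_rate * grads["dW" + str(l)]
        parameters["b" + str(l)] -= learning_rate * grads["db" + str(l)]
    return parameters
```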
Why would the layer 4 parameters be updated correctly in iteration 0, but never even appear in the update log in subsequent iterations?