Paul sir! Please correct me if I am wrong.
Suppose we have a three-layer model (two hidden layers and one output layer). The chain-rule expressions for dZ1, dW1, and db1 are:
\frac{dL}{dZ1} = \frac{dL}{dA3} \times \frac{dA3}{dZ3}\times \frac{dZ3}{dA2}\times \frac{dA2}{dZ2}\times \frac{dZ2}{dA1}\times \frac{dA1}{dZ1}
\frac{dL}{dW1} = \frac{dL}{dA3} \times \frac{dA3}{dZ3}\times \frac{dZ3}{dA2}\times \frac{dA2}{dZ2}\times \frac{dZ2}{dA1}\times \frac{dA1}{dZ1}\times\frac{dZ1}{dW1}
\frac{dL}{db1} = \frac{dL}{dA3} \times \frac{dA3}{dZ3}\times \frac{dZ3}{dA2}\times \frac{dA2}{dZ2}\times \frac{dZ2}{dA1}\times \frac{dA1}{dZ1}\times\frac{dZ1}{db1}
In dW1, we do not take the derivative w.r.t. any other weights like W2 or W3, right? The same goes for b.
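One way to sanity-check these chain-rule products is to compute them by hand and compare against a finite-difference gradient. The sketch below makes several assumptions that are not in the post: one unit per layer (so every factor is a scalar and the × really is ordinary multiplication), sigmoid activations in all three layers, and a squared-error loss L = 0.5 (A3 − y)². The helper names (`forward`, `grads_layer1`, `numeric_grad`) are hypothetical. Note that W3 and W2 show up as the local derivatives dZ3/dA2 and dZ2/dA1, but only as fixed values; nothing is differentiated with respect to them.

```python
import math

# Assumptions (not from the original post): one unit per layer, sigmoid
# activations everywhere, squared-error loss L = 0.5 * (A3 - y)**2.

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(x, W1, b1, W2, b2, W3, b3):
    # Z_k = W_k * A_{k-1} + b_k and A_k = sigmoid(Z_k), with A_0 = x
    Z1 = W1 * x + b1
    A1 = sigmoid(Z1)
    Z2 = W2 * A1 + b2
    A2 = sigmoid(Z2)
    Z3 = W3 * A2 + b3
    A3 = sigmoid(Z3)
    return Z1, A1, Z2, A2, Z3, A3

def total_loss(x, y, p):
    A3 = forward(x, **p)[-1]
    return 0.5 * (A3 - y) ** 2

def grads_layer1(x, y, p):
    Z1, A1, Z2, A2, Z3, A3 = forward(x, **p)
    dL_dA3 = A3 - y
    dA3_dZ3 = A3 * (1 - A3)   # sigmoid'(Z3)
    dZ3_dA2 = p["W3"]         # W3 enters only as a fixed factor
    dA2_dZ2 = A2 * (1 - A2)   # sigmoid'(Z2)
    dZ2_dA1 = p["W2"]         # likewise, W2 is only a factor here
    dA1_dZ1 = A1 * (1 - A1)   # sigmoid'(Z1)
    dL_dZ1 = dL_dA3 * dA3_dZ3 * dZ3_dA2 * dA2_dZ2 * dZ2_dA1 * dA1_dZ1
    dL_dW1 = dL_dZ1 * x       # dZ1/dW1 = x (the input A0)
    dL_db1 = dL_dZ1 * 1.0     # dZ1/db1 = 1
    return dL_dZ1, dL_dW1, dL_db1

def numeric_grad(x, y, p, name, h=1e-6):
    # Central finite difference: nudge one parameter, hold all others fixed
    plus = dict(p, **{name: p[name] + h})
    minus = dict(p, **{name: p[name] - h})
    return (total_loss(x, y, plus) - total_loss(x, y, minus)) / (2 * h)

p = {"W1": 0.5, "b1": 0.1, "W2": -0.3, "b2": 0.2, "W3": 0.8, "b3": -0.1}
x, y = 1.5, 0.7
dZ1, dW1, db1 = grads_layer1(x, y, p)
print(abs(dW1 - numeric_grad(x, y, p, "W1")) < 1e-7)  # True
print(abs(db1 - numeric_grad(x, y, p, "b1")) < 1e-7)  # True
```

With vector-valued layers the same chain holds, but the scalar products become matrix operations (transposed weight matrices, element-wise activation derivatives, and an outer product with the previous activation for dW).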