Things work out differently at the output layer because the derivative of the sigmoid activation and the derivative of the cross entropy loss combine in a way that simplifies very nicely. Here's a thread which shows that. At the inner layers of the network, you don't get that nice simplification.
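For reference, here is a quick sketch of that simplification, assuming a sigmoid output unit $a = \sigma(z)$ and the binary cross entropy loss on a single example:

$$
L(a, y) = -\big[\, y \ln a + (1 - y)\ln(1 - a) \,\big]
$$

$$
\frac{\partial L}{\partial a} = -\frac{y}{a} + \frac{1 - y}{1 - a} = \frac{a - y}{a(1 - a)},
\qquad
\frac{da}{dz} = \sigma(z)\big(1 - \sigma(z)\big) = a(1 - a)
$$

$$
\frac{\partial L}{\partial z} = \frac{\partial L}{\partial a}\cdot\frac{da}{dz} = a - y
$$

The $a(1 - a)$ factors cancel, which is why the output-layer gradient reduces to $dZ^{[L]} = A^{[L]} - Y$. At the hidden layers no such cancellation happens, so the $g'(Z^{[l]})$ term remains in the backpropagation formulas.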
But note that derivations involving calculus are beyond the scope of these courses. If you have some math background, here's a thread with links to more information on the derivations of backpropagation.