dJ/dA Gradient: Same First Input or Not for All Activations

Hi Sir,

For dJ/dA[L], the derivative of the cost function with respect to A[L] (the activations of the final layer), we have the formula below. We obtained this formula by plugging the sigmoid function into the cost function. Will the formula be the same for every activation function we plug into the cost function, or could the equation below come out differently for different activation functions after deriving? If it is different, where can I find the derived equations for all the activations covered in the lecture video? Please kindly help with this.

dA[L] = -(Y / A[L] - (1 - Y) / (1 - A[L]))
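For context, here is a minimal NumPy sketch of how this quantity is typically computed, assuming the cost J is the binary cross-entropy, J = -(1/m) * sum(Y*log(A[L]) + (1-Y)*log(1-A[L])); the array names AL and Y are illustrative, not from any particular assignment.

```python
import numpy as np

# Example activations of the final layer and the corresponding labels
# (assumed shapes: (1, m), as in a binary-classification output layer).
AL = np.array([[0.8, 0.3, 0.6]])
Y = np.array([[1.0, 0.0, 1.0]])

# Derivative of the binary cross-entropy cost with respect to AL.
# Note this step differentiates the cost, not the activation:
# dJ/dAL = -(Y/AL - (1-Y)/(1-AL))
dAL = -(np.divide(Y, AL) - np.divide(1 - Y, 1 - AL))
print(dAL)
```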

Dear Mentor, can someone please help answer this? Does the above formula remain the same for computing dAL whatever activation function is used in the output layer,

or does the above formula work only when the sigmoid function is used in the output layer?

Dear Mentor, can you please help with this question?

@afofonjka
@suki
@roannalun
@sjfischer
@petrifast
@yanivh
@aimr