I feel that an induction step on Z looks simpler than an induction step on A…

Here is a summary.

specs:

a(l)=g(z(l))

z(l)=w(l)'a(l-1)+b(l)

j=-(y log(a)+(1-y) log(1-a))

J=sum(j)/m
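
In case it helps to see the specs run, here is a minimal NumPy sketch of the forward pass and cost. It assumes g is the sigmoid (the post does not say which activation is used) and that each example is one column. The names sigmoid, forward and cost are my own, and the lists W and B are 0-indexed, so W[l-1] holds w(l).

```python
import numpy as np

def sigmoid(z):
    # Assumed choice of g; the post leaves g unspecified.
    return 1.0 / (1.0 + np.exp(-z))

def forward(A0, W, B):
    """Run the forward pass; returns lists Z and A indexed by layer.

    A[0] is the input (one example per column), Z[0] is unused.
    W[l-1] has shape (n_{l-1}, n_l), B[l-1] has shape (n_l, 1).
    """
    Z, A = [None], [A0]
    for w, b in zip(W, B):
        z = w.T @ A[-1] + b        # z(l) = w(l)'a(l-1) + b(l)
        Z.append(z)
        A.append(sigmoid(z))       # a(l) = g(z(l))
    return Z, A

def cost(A_last, Y):
    """J = sum(j)/m with j the per-example cross-entropy."""
    m = Y.shape[1]
    j = -(Y * np.log(A_last) + (1 - Y) * np.log(1 - A_last))
    return j.sum() / m
```

For example, forward(A0, W, B) with A0 of shape (3, m), W = [w1, w2] of shapes (3, 2) and (2, 1), and matching biases gives A[2] of shape (1, m), which can be passed to cost together with Y.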

back prop induction:

dZ(l-1)=w(l)dZ(l) * g'(Z(l-1)) [* is elementwise; stores j-derivatives columnwise, each example is 1 column]

dw(l-1)=A(l-2)dZ(l-1)'/m [J-derivative]

db(l-1)=sum(dZ(l-1) across its m columns, i.e. over examples)/m [J-derivative]
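
A rough NumPy sketch of the induction above, with a finite-difference check, in case anyone wants to verify it numerically. Assumptions that are mine and not in the post: g is the sigmoid, so g'(Z) = A * (1 - A), and the base case is dZ(L) = A(L) - Y, which is what the cross-entropy j gives with a sigmoid output layer. All function names are hypothetical, and W[l-1] holds w(l) because Python lists are 0-indexed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(A0, W, B):
    # z(l) = w(l)'a(l-1) + b(l), a(l) = g(z(l)); one example per column.
    Z, A = [None], [A0]
    for w, b in zip(W, B):
        Z.append(w.T @ A[-1] + b)
        A.append(sigmoid(Z[-1]))
    return Z, A

def cost(A_last, Y):
    m = Y.shape[1]
    return -(Y * np.log(A_last) + (1 - Y) * np.log(1 - A_last)).sum() / m

def backward(A0, Y, W, B):
    """Gradients of J via the induction on dZ."""
    m = Y.shape[1]
    Z, A = forward(A0, W, B)
    L = len(W)
    dW, dB = [None] * L, [None] * L
    dZ = A[L] - Y                                  # assumed base case (sigmoid output)
    for l in range(L, 0, -1):
        dW[l - 1] = A[l - 1] @ dZ.T / m            # dw(l) = A(l-1) dZ(l)' / m
        dB[l - 1] = dZ.sum(axis=1, keepdims=True) / m  # sum across example columns
        if l > 1:
            g_prime = A[l - 1] * (1 - A[l - 1])    # sigmoid derivative at Z(l-1)
            dZ = (W[l - 1] @ dZ) * g_prime         # dZ(l-1) = w(l) dZ(l) * g'(Z(l-1))
    return dW, dB

def numeric_dw(A0, Y, W, B, l, i, k, eps=1e-6):
    """Central-difference estimate of dJ/dw(l)[i, k], for checking only."""
    Wp = [w.copy() for w in W]
    Wm = [w.copy() for w in W]
    Wp[l - 1][i, k] += eps
    Wm[l - 1][i, k] -= eps
    Jp = cost(forward(A0, Wp, B)[1][-1], Y)
    Jm = cost(forward(A0, Wm, B)[1][-1], Y)
    return (Jp - Jm) / (2 * eps)
```

Comparing an entry of backward(...)[0] against numeric_dw for the same layer and index is a quick way to confirm the shapes and the 1/m factors line up. Note that dZ never needs A's derivative stored separately, which is the simplification the Z-based step buys.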

That said, I don't know much about the literature or any specific advantages the original representation may provide. I look forward to your views.