The Chain Rule deals with the composition of functions, so how each derivative is handled depends on what the function is. Some steps are matrix products (the linear step, e.g. $Z1 = W1 \cdot X + b1$) and some are “elementwise” operations, e.g. the activation functions. So for example $\frac{\partial A1}{\partial Z1}$ is just the derivative of the layer 1 activation function evaluated at Z1. Because the activation was applied elementwise, that factor enters the chain rule as an elementwise product rather than a dot product.
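To make the distinction concrete, here is a minimal NumPy sketch of one layer. Everything in it is an assumption for illustration: the shapes, the random data, the choice of tanh as the layer 1 activation, and the incoming gradient `dA1`, which in a real network would come from the later layers. The 1/m factor assumes the cost is averaged over the m examples, which is the convention the course uses.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical shapes: 3 input features, 4 hidden units, 5 examples.
X = rng.standard_normal((3, 5))
W1 = rng.standard_normal((4, 3))
b1 = np.zeros((4, 1))

# Forward pass for layer 1.
Z1 = W1 @ X + b1      # linear step: a matrix product
A1 = np.tanh(Z1)      # activation: applied elementwise (tanh is an assumption)

# Stand-in for the gradient of the cost w.r.t. A1 from the later layers.
dA1 = rng.standard_normal(A1.shape)

# Elementwise step: dA1/dZ1 is g'(Z1) = 1 - tanh(Z1)^2, with the same shape
# as Z1, so the chain rule here is an elementwise product.
dZ1 = dA1 * (1 - A1 ** 2)

# Linear step: here the chain rule becomes matrix products with transposes.
m = X.shape[1]
dW1 = (dZ1 @ X.T) / m
db1 = dZ1.sum(axis=1, keepdims=True) / m
```

Note that `*` is used for the activation step and `@` for the linear step: that is the whole difference between the two cases described above.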

This is beyond the scope of this course: Prof Ng does not really cover the underlying calculus. Here’s a thread with lots of links to supplementary material about the mathematics of back propagation.