First of all, sorry for asking so many questions! My question is about Andrew's explanation of why it is important to normalize the outputs of the activation functions, which he does by subtracting the mean and dividing by the standard deviation. I believe this is the definition of standardization, not normalization, since the latter means rescaling the values to the range [0, 1]. I checked these two definitions in these two links: link 1 and link 2.
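To make the distinction concrete, here is a minimal sketch in plain Python contrasting the two operations on a toy list of activations. The function names and the sample values are hypothetical. The `eps` term in `standardize` is included because batch normalization adds a small constant like this for numerical stability. Batch norm also applies a learned scale and shift afterwards, which this sketch leaves out.

```python
import statistics

def min_max_normalize(values):
    """Rescale values linearly into [0, 1] (min-max normalization).

    Assumes the values are not all equal; otherwise hi - lo is zero.
    """
    lo, hi = min(values), max(values)
    return [(v - lo) / (hi - lo) for v in values]

def standardize(values, eps=1e-8):
    """Subtract the mean and divide by the standard deviation (z-score).

    Uses the population variance plus a small eps, so the result has
    mean 0 and (almost exactly) variance 1, but no fixed [0, 1] range.
    """
    mu = statistics.fmean(values)
    var = statistics.pvariance(values, mu)
    return [(v - mu) / (var + eps) ** 0.5 for v in values]

activations = [2.0, 4.0, 6.0, 8.0]  # hypothetical activation outputs
print(min_max_normalize(activations))  # values between 0 and 1
print(standardize(activations))        # mean 0, variance ~1, can be negative
```

The min-max version always lands in [0, 1], while the standardized version is centered at 0 and can contain negative values, which is the difference the question is about.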
Could you correct me if I have something wrong?
Thanks for your patience!