Random initialization

The cost starts very high. This is because with large random-valued weights, the last activation (sigmoid) outputs results that are very close to 0 or 1 for some examples, and when it gets such an example wrong it incurs a very high loss for that example. Indeed, the cross-entropy loss for one example is -y \log(a) - (1-y) \log(1-a), so when the prediction is confidently wrong — a \rightarrow 0 with y = 1, or a \rightarrow 1 with y = 0 — a \log(0) term appears and the loss goes to infinity.
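A minimal numeric sketch of this effect (the helper names and the weight scale of 10 are illustrative choices, not taken from the assignment):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def cross_entropy(a, y):
    # Per-example binary cross-entropy: -y*log(a) - (1-y)*log(1-a)
    return -(y * np.log(a) + (1.0 - y) * np.log(1.0 - a))

# Large weights push |z| far from 0, so the sigmoid saturates near 0 or 1.
rng = np.random.default_rng(0)
w = 10.0 * rng.standard_normal(4)   # large random initialization
x = rng.standard_normal(4)
a = sigmoid(w @ x)

# If the saturated output disagrees with the true label, log(a) or
# log(1 - a) approaches log(0) and the loss blows up.
y = 0.0 if a >= 0.5 else 1.0        # deliberately pick the "wrong" label
print(a, cross_entropy(a, y))
```

With a deeply saturated but wrong prediction, e.g. a near 1e-6 when y = 1, the loss is already around 14, and it diverges as a approaches 0.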

This is the line mentioned in the observations of the programming assignment. If w is large, z will be large, so a will be closer to one. How can it be zero? Please explain.

Note that the weights are both positive and negative, as are the inputs. So if the absolute values of the weights are large, the z values can be large in absolute value and either positive or negative. As z \rightarrow -\infty, \sigma(z) \rightarrow 0.
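The saturation in both directions can be checked directly (z = \pm 20 is just an illustrative magnitude):

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# z large in absolute value, with either sign:
print(sigmoid(20.0))    # very close to 1
print(sigmoid(-20.0))   # very close to 0: sigma(z) -> 0 as z -> -infinity
```

So a large negative z, which large negative weights (or negative inputs) readily produce, drives the sigmoid output toward zero, not one.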