I remember Andrew mentioning that if the weight matrix is "greater than" the identity matrix (in a general sense), the model will have exploding gradients: the activations will increase drastically layer by layer, and something similar will happen to the gradients as well.
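For concreteness, here is a minimal NumPy sketch of that first idea. It assumes a deep network with *linear* activations and every weight matrix set to 1.5 * I; the layer count, width, and the factor 1.5 are my own illustrative choices, not from the lecture:

```python
import numpy as np

# Illustrative sketch (my assumptions): a 50-layer network with linear
# activations, where each weight matrix is 1.5 * I, i.e. slightly
# "greater than" the identity.
np.random.seed(0)
n, L = 4, 50
W = 1.5 * np.eye(n)       # each W^[l] = 1.5 * I
a = np.random.randn(n)    # input activations

for l in range(L):
    a = W @ a             # linear activation: a^[l] = W a^[l-1]

print(np.linalg.norm(a))  # grows like 1.5^50 -> activations explode
```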

But with the sigmoid activation function, a larger W gives a larger Z, which leads to very small gradients, because the slope of the sigmoid at large |Z| is close to zero (the curve is almost parallel to the x-axis).
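A quick sketch of this second point, using the standard identity sigma'(z) = sigma(z) * (1 - sigma(z)); the sample z values are just for illustration:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# For large |z|, the slope sigma'(z) = sigma(z) * (1 - sigma(z)) is tiny.
for z in [0.0, 2.0, 5.0, 10.0]:
    s = sigmoid(z)
    print(f"z={z:5.1f}  sigmoid={s:.6f}  slope={s * (1 - s):.2e}")
# at z = 10.0 the slope is ~4.5e-05 -> gradients through this unit vanish
```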

These two ideas seem to contradict each other.