In the weights initialisation lecture, Prof. Andrew says that if we have a large number of input units, z will be very large because we sum a lot of terms: z = w1x1 + w2x2 + … + wnxn. But the terms will not necessarily be positive, so even though we have a lot of inputs, some of them may cancel each other out, and therefore z will not be very large.

This is all about statistical behavior. Yes, the weights can be both positive and negative, and the inputs may be as well (depending on the nature of your inputs and what your activation functions are in the hidden layers). So some terms may cancel each other out. But that is not a guarantee, right? Sometimes they do and sometimes they don't. The concept of Expected Value is a statistical way to analyze this type of question: even if the expected value of z is zero because positive and negative terms balance on average, the spread of z around zero (its variance) still grows with the number of terms, so a typical z can have a large magnitude.
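A quick simulation (not from the lecture, just a sketch assuming weights and inputs drawn from a standard normal) makes the expected-value point concrete: the mean of z stays near zero because terms cancel on average, but the standard deviation of z grows like the square root of the number of inputs n.

```python
import numpy as np

rng = np.random.default_rng(0)

# For each input size n, draw random weights and inputs and compute
# z = w1*x1 + ... + wn*xn many times. E[z] is ~0 (terms cancel on
# average), but the spread of z grows like sqrt(n), so with many
# inputs a typical |z| is still large.
for n in (10, 100, 1000):
    z = np.array([rng.standard_normal(n) @ rng.standard_normal(n)
                  for _ in range(10_000)])
    print(f"n={n:5d}  mean(z)={z.mean():+.3f}  std(z)={z.std():.2f}")
```

The mean column hovers near 0 for every n, while the std column grows roughly as sqrt(n), which is exactly why initialization schemes scale the weight variance by 1/n: it keeps the spread of z roughly constant regardless of the number of inputs.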