Understanding the Add & Norm layer

Could you please explain why these layers are residual (they skip around the main sublayers) and why they are used at all?
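For concreteness, the step being asked about can be sketched roughly as below. This is only a minimal sketch, assuming the post-norm form LayerNorm(x + Sublayer(x)) and leaving out the learned scale and shift. The names `layer_norm`, `add_and_norm`, and `toy_sublayer` are hypothetical and not taken from any library.

```python
import math

def layer_norm(x, eps=1e-5):
    """Normalize a vector to zero mean and (almost) unit variance.
    Assumption: no learned scale/shift parameters, to keep the sketch short."""
    mean = sum(x) / len(x)
    var = sum((v - mean) ** 2 for v in x) / len(x)
    return [(v - mean) / math.sqrt(var + eps) for v in x]

def add_and_norm(x, sublayer):
    """Residual ("Add") followed by normalization ("Norm")."""
    y = sublayer(x)                           # output of the main sublayer
    summed = [a + b for a, b in zip(x, y)]    # skip connection: x + Sublayer(x)
    return layer_norm(summed)

def toy_sublayer(x):
    # Hypothetical stand-in for self-attention or the feed-forward network.
    return [0.5 * v for v in x]

out = add_and_norm([1.0, 2.0, 3.0, 4.0], toy_sublayer)
```

The input `x` reaches the output both through the sublayer and directly through the addition, which is the "skipping" part of the question.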

This is one link for that, but there are many other posts like this.

So why are they used in transformers in particular? And how can one tell that vanishing/exploding gradients are actually fixed by skipping layers?

Also, do I understand correctly that this step is more or less optional for transformers?