Why do we use b in the equation?
It is not entirely clear which equation you are referring to; I assume you mean y = a*x + b. In that case b is the intercept (or bias): the prediction when x is zero. You could see it as the average target value.
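If all features (X) are normalized to have mean zero, then the average case is the one where all X's are zero. The a*x term vanishes and the only term left is b. For a least-squares fit, b then equals the mean of the target values. Without b, the line would be forced through the origin, which usually fits the data worse.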
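To make this concrete, here is a small sketch in plain Python. It assumes ordinary least squares with a single feature and uses made-up toy data; the helper names `fit_line` and `center` are hypothetical. It fits the line twice, once on the raw x and once on mean-centered x, and compares the intercept with the mean of y.

```python
# Minimal sketch (assumption: ordinary least squares, one feature).
# Fits y = a*x + b in closed form and checks that b equals mean(y)
# once x has been centered to mean zero.

def fit_line(xs, ys):
    """Return (a, b) minimizing sum((y - (a*x + b))**2)."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov_xy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    a = cov_xy / var_x
    b = mean_y - a * mean_x  # intercept: the line passes through (mean_x, mean_y)
    return a, b

def center(xs):
    """Shift values so they have mean zero (hypothetical helper)."""
    mean_x = sum(xs) / len(xs)
    return [x - mean_x for x in xs]

# Made-up toy data, for illustration only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [3.1, 4.9, 7.2, 8.8, 11.0]

a_raw, b_raw = fit_line(xs, ys)
a_c, b_c = fit_line(center(xs), ys)
mean_y = sum(ys) / len(ys)

print(f"raw x:      a={a_raw:.3f}, b={b_raw:.3f}")
print(f"centered x: a={a_c:.3f}, b={b_c:.3f}, mean(y)={mean_y:.3f}")
```

Centering leaves the slope a unchanged and only moves the intercept, so with centered x the fitted b lands exactly on the mean of y. Also scaling x (full standardization) would change a but not b.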