Why not teach softmax to cover both binary and multiclass classification?

Why not teach Softmax to cover both binary and multiclass classification?
It seems a better use of abstraction. There is nothing special about the number two.
Binary classification is an instance of multiclass classification.
Look at the equation below; you don’t need binary classification as a special case. Teaching it separately is a waste of time.
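For reference, here is a sketch of the standard identity behind this claim. It is not necessarily the exact equation the post had in mind, and the symbols $z_1$, $z_2$ (the two class logits) and $\sigma$ (the logistic sigmoid) are notation assumed here. With only two classes, the softmax probability of the second class is

$$
\mathrm{softmax}(z)_2 = \frac{e^{z_2}}{e^{z_1} + e^{z_2}} = \frac{1}{1 + e^{-(z_2 - z_1)}} = \sigma(z_2 - z_1).
$$

So the two-class softmax is a sigmoid applied to the difference of the logits; fixing $z_1 = 0$ gives the usual binary form $\sigma(z_2)$.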

We should teach computer scientists and ML engineers how to generalize. Too many one z, two zzz…

This course starts from the basics to the more advanced…

Softmax is better suited for multiclass classification.

Softmax is as basic as binary classification, and more intuitive.
As an exercise, albeit an anecdotal one, I will try to teach both to my 15-year-old son.
Now I have to keep two mental models in my head when just one would do. This is still math; elegance matters.

Agreed, binary classification is just an instance of multiclass classification.
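
A quick numerical check of that point, as a minimal sketch rather than anything from the course: the helper names `softmax` and `sigmoid` are illustrative, and the second logit is pinned to 0 by assumption. It compares a two-class softmax over the logits `[0, z]` with the sigmoid of `z`.

```python
import math


def softmax(logits):
    """Return softmax probabilities for a list of logits."""
    # Subtracting the largest logit avoids overflow and leaves the result unchanged.
    m = max(logits)
    exps = [math.exp(v - m) for v in logits]
    total = sum(exps)
    return [e / total for e in exps]


def sigmoid(z):
    """Logistic sigmoid, the usual binary-classification output."""
    return 1.0 / (1.0 + math.exp(-z))


# Two-class softmax over [0, z] should match sigmoid(z) for any z.
for z in (-3.0, -0.5, 0.0, 1.2, 4.0):
    p_softmax = softmax([0.0, z])[1]
    p_sigmoid = sigmoid(z)
    assert math.isclose(p_softmax, p_sigmoid, rel_tol=1e-12)
    print(f"z={z:+.1f}  softmax={p_softmax:.6f}  sigmoid={p_sigmoid:.6f}")
```

The same `softmax` function handles three or more classes without modification, which is the sense in which the binary case is just one instance of the general one.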