Why is matrix multiplication more efficient than looping?

Matrix multiplication is nothing more than multiplying the elements of the input vector by the corresponding parameters and summing the results. In other words, it should not be any faster than a loop, since it does the same work behind the scenes.

So why is it actually faster?

Maybe see "What is NumPy?" in the NumPy v1.23 Manual.

It contains nuggets like this one: Vectorization describes the absence of any explicit looping, indexing, etc., in the code - these things are taking place, of course, just “behind the scenes” in optimized, pre-compiled C code.
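
To make that concrete, here is a minimal sketch, assuming NumPy is installed. The function name matvec_loop and the 300x300 size are made up for illustration. Both versions compute the same sums of products; the only difference is whether the loop runs in the Python interpreter or inside NumPy's compiled code.

```python
import time

import numpy as np


def matvec_loop(W, x):
    """Matrix-vector product written with explicit Python loops."""
    rows, cols = W.shape
    y = np.zeros(rows)
    for i in range(rows):
        total = 0.0
        for j in range(cols):
            # Every iteration goes through the Python interpreter.
            total += W[i, j] * x[j]
        y[i] = total
    return y


# Illustrative data; the sizes are arbitrary.
rng = np.random.default_rng(0)
W = rng.standard_normal((300, 300))
x = rng.standard_normal(300)

start = time.perf_counter()
y_loop = matvec_loop(W, x)
loop_seconds = time.perf_counter() - start

start = time.perf_counter()
y_vec = W @ x  # same arithmetic, but the loops run in compiled code
vec_seconds = time.perf_counter() - start

print("same result:", np.allclose(y_loop, y_vec))
print(f"loop: {loop_seconds:.5f} s, vectorized: {vec_seconds:.5f} s")
```

The two results should agree up to floating-point rounding. The vectorized call is usually far faster, though the exact ratio depends on your hardware and on the linear algebra library your NumPy build uses.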