
And the formula for p(x⃗) is given by this expression:
Now let's take the first part of this expression:

In this part, x1 is the first feature (n = 1) of each training example x(i), so x1 is a vector of m values, where m is the total number of training examples.
Now consider what happens when we put the vector x1 into this formula:

How is the “(x - µ)” part calculated, given that x here is a vector of length m and µ is a scalar? And how is p(x) then reduced to a scalar value?
For m training examples, it doesn’t.
The resulting output will be a vector of m numbers (probabilities), not a scalar.
Hello @Ammar_Jawed,
On top of @Mujassim_Jamal’s answer, I would like to add that there are two angles here.
Computer program
If it is a computer program and you have implemented the equations with (e.g.) numpy, then you can subtract a scalar from an array. This operation is valid thanks to numpy’s “broadcasting” capability. For more on numpy broadcasting, check out this doc or run some examples like the one below. The result of the subtraction is another array of the same shape.
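import numpy as np

a = np.array([1., 2., 3.])  # array of shape (3,)
b = 3.                      # scalar
# Broadcasting subtracts b from every element of a
print(a - b)                # elements are -2, -1, 0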
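Applied to the question, here is a minimal sketch, assuming the formula is the usual per-feature Gaussian density used in anomaly detection. The data in X and the helper name gaussian_density are made up for illustration. It shows that feeding in all m examples gives back m values, one per example, and that the broadcast version matches a plain scalar-by-scalar loop.

```python
import numpy as np

def gaussian_density(x, mu, var):
    """Univariate Gaussian density, evaluated element-wise on x (hypothetical helper)."""
    return np.exp(-(x - mu) ** 2 / (2.0 * var)) / np.sqrt(2.0 * np.pi * var)

# Hypothetical data: m = 4 training examples (rows), n = 2 features (columns).
X = np.array([[1.0, 10.0],
              [2.0, 12.0],
              [3.0, 11.0],
              [4.0, 13.0]])

mu = X.mean(axis=0)   # per-feature mean, shape (n,)
var = X.var(axis=0)   # per-feature variance, shape (n,)

# First feature only: x1 has m values, mu[0] is a scalar.
x1 = X[:, 0]
p_x1 = gaussian_density(x1, mu[0], var[0])   # shape (m,), not a scalar

# Full p(x): multiply the per-feature densities for each example.
p = np.prod(gaussian_density(X, mu, var), axis=1)   # shape (m,)

# Same computation with explicit scalar-scalar operations.
m, n = X.shape
p_loop = np.array([
    np.prod([gaussian_density(X[i, j], mu[j], var[j]) for j in range(n)])
    for i in range(m)
])

print(p_x1.shape, p.shape, np.allclose(p, p_loop))
```

For a single example, p is a scalar; for m examples, you get m scalars collected in an array.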
Mathematics
If you are thinking like a mathematician, then, if I remember correctly, you can’t subtract a scalar from a vector. Essentially, we are not substituting the whole vector into x, but one element of the vector at a time, so that each subtraction is a scalar-scalar subtraction.
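To spell that out, here is a sketch assuming the expression in question is the usual per-feature Gaussian density (the symbols σ and the index j are my assumption about the notation). For the i-th training example, every quantity on the right-hand side is a scalar:

```latex
p\left(x_1^{(i)};\, \mu_1, \sigma_1^2\right)
  = \frac{1}{\sqrt{2\pi}\,\sigma_1}
    \exp\!\left( -\frac{\left(x_1^{(i)} - \mu_1\right)^2}{2\sigma_1^2} \right),
  \qquad i = 1, \dots, m

p\left(\vec{x}^{(i)}\right)
  = \prod_{j=1}^{n} p\left(x_j^{(i)};\, \mu_j, \sigma_j^2\right)
```

So each training example gets its own scalar p, and m training examples give m such scalars.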
Cheers,
Raymond