X is a matrix in which each column is one training example.
How so? I thought the dimension of X was (n, m), where n = number of training examples and m = number of input features.
It is a matter of convention. If you instead define X with shape (number of training examples, number of input features), so that each row is one training example, the equations change. For example:
A = W X + b
would become
A = X W + b (with W and b transposed to match the new shapes), and so on.
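To make the two conventions concrete, here is a minimal NumPy sketch. The sizes and the names `n_features`, `n_units`, `n_examples`, `X_cols`, and `X_rows` are made up for illustration and are not from the course. It only checks that both layouts give the same numbers, up to a transpose.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical sizes chosen only for this illustration.
n_features, n_units, n_examples = 3, 2, 5

# Convention 1: each COLUMN of X is one training example.
X_cols = rng.normal(size=(n_features, n_examples))  # (features, examples)
W = rng.normal(size=(n_units, n_features))          # (units, features)
b = rng.normal(size=(n_units, 1))                   # column vector, broadcast over examples
A_cols = W @ X_cols + b                             # (units, examples)

# Convention 2: each ROW of X is one training example.
X_rows = X_cols.T                                   # (examples, features)
A_rows = X_rows @ W.T + b.T                         # (examples, units)

# Both conventions compute the same values, just transposed.
same = np.allclose(A_cols, A_rows.T)
print(A_cols.shape, A_rows.shape, same)
```

Note that in the row-per-example layout the weight matrix appears as `W.T`. If you store W with shape (features, units) from the start, the expression reads simply as X W + b.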