X is a matrix in which each column is one training example.

How so? I thought the dimension of X was (n, m), where n = number of training examples and m = number of input features.

It is a matter of convention. If you instead define X with shape (number of training examples, number of input features), so that each row is one training example, the equations change. For example:

A = W X + b

would become

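A = X W + b, etc., where W and b are now the transposes of the ones in the first form, and A comes out transposed as well.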
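
To make the two conventions concrete, here is a minimal NumPy sketch. The sizes and the names n_features, n_examples, n_units, X_cols and X_rows are assumptions chosen for illustration, not taken from the course. It only checks that both layouts give the same activations up to a transpose.

```python
import numpy as np

rng = np.random.default_rng(0)
n_features, n_examples, n_units = 3, 5, 2  # illustrative sizes (assumed)

# Convention 1: each COLUMN of X is one training example.
X_cols = rng.normal(size=(n_features, n_examples))
W = rng.normal(size=(n_units, n_features))
b = rng.normal(size=(n_units, 1))
A_cols = W @ X_cols + b        # shape (n_units, n_examples)

# Convention 2: each ROW of X is one training example.
X_rows = X_cols.T              # shape (n_examples, n_features)
A_rows = X_rows @ W.T + b.T    # shape (n_examples, n_units)

# Both conventions hold the same numbers, just transposed.
same = np.allclose(A_cols.T, A_rows)
print(A_cols.shape, A_rows.shape, same)
```

Note that in the second form the weight matrix has shape (n_features, n_units) and the bias is a row vector, which is why the order of the product flips.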
