In this graph, the computation is W times X: the input feature X is placed on the right side of W. But in the same video, another example places the input feature on the left side of W (second image). Why the difference? Which order is correct?
You have to perform a dot product of the weight matrix (the model weights) and the input features. Wx + b is the common notation for a single input x (lower case); X (upper case) represents a batch of inputs.
In the second picture, notice how each input is a row and the model weights are arranged as a column. So a batch of input data has shape (batch_size, num_features), and the model weights have shape (num_features, 1). The matrix multiplication of X and W then has shape (batch_size, 1), as shown on the right side.
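A minimal NumPy sketch of those shapes (the sizes 4 and 3 are just illustrative assumptions):

```python
import numpy as np

batch_size, num_features = 4, 3
X = np.random.rand(batch_size, num_features)  # each row is one input
W = np.random.rand(num_features, 1)           # model weights as a column
b = 0.5

out = X @ W + b   # XW + b for the whole batch
print(out.shape)  # (4, 1): one output per input in the batch
```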
You can perform the multiplication in either order, as long as the operation computes the intended dot product and the results are interpreted correctly.
So does this mean both WX + b and XW + b are correct, depending on how the rows and columns are arranged in W and X?
That is correct. Libraries like TensorFlow use an input shape of (batch_size, num_features).
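To see that the two orders produce the same numbers for a single input, here is a quick NumPy check (the feature count is an arbitrary assumption; note that transposing both operands swaps the order of the product):

```python
import numpy as np

num_features = 3
x = np.random.rand(num_features, 1)      # single input as a column vector
W = np.random.rand(1, num_features)      # weights as a row, for W x
b = 0.5

wx = W @ x + b        # W x + b, shape (1, 1)
xw = x.T @ W.T + b    # x^T W^T + b: same product with operands transposed

print(np.allclose(wx, xw))  # True: both orders give the same result
```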