import numpy as np

w = np.array([[1.], [2.]])                     # shape (2, 1)
b = 2.
X = np.array([[1., 2., -1.], [3., 4., -3.2]])  # shape (2, 3)
Y = np.array([[1, 0, 1]])                      # shape (1, 3)
Are the shapes of X, w, and Y inconsistent? X is (2, 3), i.e. 2 instances with 3 parameters in each instance. Shouldn't it be the other way around, (3, 2), to match w and Y? What am I missing here?
The design matrix X is organized with the individual training examples (m) in the columns and the features (n) in the rows, so the matrix X is n x m, not m x n. Here we have two features and three examples. This is an unconventional ordering, but Prof Ng feels that it is more natural for learners, and he's the boss!
Please study the expression for the activation (A) in the second bullet point in the "Hints" to Exercise 5. The matrices are indeed conformable according to the rules of matrix algebra; a quick shape check is sketched below.
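For what it's worth, here is a minimal shape check in numpy, using the values quoted above and assuming the activation from the exercise hints, A = sigmoid(w.T X + b):

import numpy as np

w = np.array([[1.], [2.]])                     # (n_x, 1) = (2, 1)
b = 2.
X = np.array([[1., 2., -1.], [3., 4., -3.2]])  # (n_x, m) = (2, 3)
Y = np.array([[1, 0, 1]])                      # (1, m)   = (1, 3)

Z = np.dot(w.T, X) + b                         # (1, 2) dot (2, 3) -> (1, 3); scalar b is broadcast
A = 1 / (1 + np.exp(-Z))                       # sigmoid, elementwise, still (1, 3)
print(Z.shape, A.shape, Y.shape)               # (1, 3) (1, 3) (1, 3)

So w.T and X are conformable, and A comes out with the same shape as Y, which is exactly what the cost computation needs.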
Prof Ng very clearly taught us that X is an (m, n) matrix. Why should this example be different? Will the formula Y = w.T * X + b work in this case? The propagate function needs to be used in the cat classification example later, which will also have X as (m, n), won't it? Won't that lead to inconsistency? As it is, I am struggling with shapes and broadcasting!
In standard matrix algebra notation, an arbitrary matrix has m rows and n columns. Most deep learning practitioners organize their feature matrices (where the training data is kept), say X, with the examples (i.e. observations) in rows and the individual features in columns. This is most likely how the "m training examples" convention arose.
Prof. Ng organizes the feature matrix X the other way around. In this Specialization, the "m training examples" language convention is adhered to, and n_x refers to the number of data features. This means that X is an (n_x, m) matrix for our purposes.
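As an illustration (the array here is made up, not taken from the assignment): if your raw data happens to be loaded in the more common rows-as-examples layout, a single transpose puts it into the (n_x, m) convention used throughout this Specialization:

import numpy as np

X_rows = np.array([[1., 3.],       # example 1: two features
                   [2., 4.],       # example 2
                   [-1., -3.2]])   # example 3
print(X_rows.shape)                # (m, n_x) = (3, 2)

X = X_rows.T                       # transpose once to the course convention
print(X.shape)                     # (n_x, m) = (2, 3)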
You may want to review the lecture material on “vectorization” and also brush up on your linear algebra. For example, here is the Wikipedia entry for matrix multiplication.
Got it. Thanks. I still think it is better to stick to a single convention!