The Z calculation for the neural network is:

Z = WX + b

Please note that here W is NOT transposed. However, for logistic regression it is transposed:

z = wᵀx + b

What's the reason for flipping the weight matrix?
We apply the dot product between W and X to form the expression w1x1 + w2x2 + ... + wnxn for each example x. The dot product requires that the number of columns in the first matrix match the number of rows in the second: A(p, q) · B(q, r).
In logistic regression, w and x have the same orientation because there is a one-to-one relation: w holds one weight per input feature, and each column of X holds the features of one example. Therefore we transpose w to make the two dot-product compatible, so that the dot product yields a single value for each example x. That value is the example's y value.
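A minimal numpy sketch of the logistic-regression case (the sizes and variable names here are illustrative, not from the original post):

```python
import numpy as np

n_features, m_examples = 4, 10                 # illustrative sizes
w = np.random.randn(n_features, 1)             # weights: (n_features, 1), same orientation as one example
b = 0.0
X = np.random.randn(n_features, m_examples)    # each column is one example

# w and X both have n_features rows, so w must be transposed first:
# (1, n_features) . (n_features, m_examples) -> (1, m_examples), one value per example
z = np.dot(w.T, X) + b
print(z.shape)  # (1, 10)
```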
In a neural network, the matrix orientations are different. The dimensions of the first layer's W are (number of units in the first layer) × (number of input features). That means each row corresponds to one unit and holds a weight for each of the input features, a many-to-many relation. The dimensions of X are (number of features) × (number of examples). Since the matrices are already dot-product compatible, there's no need to transpose either one. Similarly, in the second, third, and deeper layers the dimensions of W are (number of units in the layer) × (number of units in the previous layer), which makes them compatible with the previous layer's A matrix for the dot product. The final layer's W has dimensions (number of output classes) × (number of units in the previous layer), which maps the computed values to the output classes.
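A sketch of the same shape bookkeeping for a two-layer network (the layer sizes below are made up for illustration):

```python
import numpy as np

n_x, n_h, n_y, m = 4, 5, 3, 10           # input features, hidden units, output classes, examples

X  = np.random.randn(n_x, m)             # (features, examples)
W1 = np.random.randn(n_h, n_x)           # (units in layer 1, input features)
b1 = np.zeros((n_h, 1))
W2 = np.random.randn(n_y, n_h)           # (output classes, units in layer 1)
b2 = np.zeros((n_y, 1))

# W1 already has n_x columns and X has n_x rows, so no transpose is needed
Z1 = np.dot(W1, X) + b1                  # (n_h, m)
A1 = np.tanh(Z1)
Z2 = np.dot(W2, A1) + b2                 # (n_y, m): one column of class scores per example
print(Z1.shape, Z2.shape)                # (5, 10) (3, 10)
```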
Thanks for your answer. My question is why are the orientations different? Seems like a strange convention.