The Z calculation for a neural network is `Z = WX + b`.

Please note that here W is NOT transposed.

However, for logistic regression it is transposed: `z = w.T x + b`.

What's the reason for flipping the weight matrix?


We apply the dot product between W and X to form `w1x1 + w2x2 + ... + wnxn` for each example x. The dot product requires that the number of columns in the first matrix match the number of rows in the second: `A (p, q) . B (q, r)`.

In logistic regression, W and X are initialized with the same orientation, since there is a one-to-one relation between weights and features. Therefore, we transpose W to make the two matrices dot-product compatible, so that the dot product yields a single value for each example x. That value is the example's z (which the sigmoid then maps to its predicted y).
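A minimal NumPy sketch of the logistic regression case, assuming illustrative sizes (n features, m examples) and w stored as a column vector, which is why the transpose is needed:

```python
import numpy as np

n, m = 3, 5                 # n features, m examples (illustrative sizes)
w = np.random.randn(n, 1)   # weights as a column vector: same orientation as X
X = np.random.randn(n, m)   # each column is one example, shape (n, m)
b = 0.0

# w and X share the same orientation, so w must be transposed first:
# (1, n) . (n, m) -> (1, m), i.e. one z value per example
Z = w.T @ X + b
print(Z.shape)              # (1, 5)
```

Without the transpose, `w @ X` would be a `(n, 1) . (n, m)` product, which NumPy rejects because the inner dimensions do not match.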

In a neural network, the matrix orientations are different. The dimensions of the first layer's W are `number of units in first layer × number of input features`. That means each row corresponds to one unit's activation and holds a weight for each of the input features - a many-to-many relation. The dimensions of X are `number of features × number of examples`. Since the matrices are already dot-product compatible, there's no need to transpose either one. Similarly, in the second, third, and deeper layers, the dimensions of W are `number of units in the layer × number of units in previous layer`, which makes them compatible for the dot product with the previous layer's A matrix. The final layer's W has dimensions `number of output classes × number of units in previous layer`, which maps the computed values to the output classes.
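The layer-by-layer shapes above can be sketched the same way; the layer sizes here are hypothetical, chosen only to make the dimensions visible:

```python
import numpy as np

n_x, m = 3, 5                    # input features, examples (illustrative)
n_1, n_2 = 4, 2                  # hypothetical sizes of layers 1 and 2

X  = np.random.randn(n_x, m)     # features × examples
W1 = np.random.randn(n_1, n_x)   # units in layer 1 × input features
W2 = np.random.randn(n_2, n_1)   # units in layer 2 × units in layer 1
b1 = np.zeros((n_1, 1))
b2 = np.zeros((n_2, 1))

# Already compatible, no transpose: (n_1, n_x) . (n_x, m) -> (n_1, m)
Z1 = W1 @ X + b1
A1 = np.tanh(Z1)
# Next layer consumes the previous A: (n_2, n_1) . (n_1, m) -> (n_2, m)
Z2 = W2 @ A1 + b2
print(Z1.shape, Z2.shape)        # (4, 5) (2, 5)
```

Each W is created already "pre-flipped" (rows = units of this layer, columns = units feeding into it), so the chain of products works without any transposes.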


Thanks for your answer. My question is: why are the orientations different in the first place? It seems like a strange convention.