Here’s another thread from a while ago that walks through how we get from w being an n_x x 1 column vector in the Logistic Regression case to the way the W weight matrices work here, and why we no longer need the transpose for real Neural Networks.
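If it helps to see the shapes concretely, here is a small NumPy sketch of the idea. It is only an illustration: the sizes `n_x`, `n_1`, and `m`, the random values, and the names `unit_weights`, `W1`, and `b1` are all made up for the example. It assumes the usual convention that X is n_x x m with one sample per column, and that each row of W holds the transposed weight vector of one unit in the layer.

```python
import numpy as np

rng = np.random.default_rng(0)
n_x, n_1, m = 4, 3, 5  # hypothetical sizes: features, hidden units, samples

X = rng.standard_normal((n_x, m))  # one sample per column

# Logistic Regression: w is a column vector of shape (n_x, 1),
# so the transpose is needed to make the product with X work.
w = rng.standard_normal((n_x, 1))
b = 0.5
z_lr = w.T @ X + b  # shape (1, m)

# Neural Network layer: take one (n_x, 1) weight vector per unit,
# transpose each one, and stack them as the rows of W1.
unit_weights = [rng.standard_normal((n_x, 1)) for _ in range(n_1)]
W1 = np.vstack([w_i.T for w_i in unit_weights])  # shape (n_1, n_x)
b1 = np.zeros((n_1, 1))

# The transpose is already built into how W1 is laid out,
# so the forward step is W1 @ X with no explicit .T
Z1 = W1 @ X + b1  # shape (n_1, m)

# Row i of Z1 matches what unit i would compute on its own with w_i.T @ X
for i, w_i in enumerate(unit_weights):
    assert np.allclose(Z1[i], (w_i.T @ X).ravel())

print(z_lr.shape, W1.shape, Z1.shape)
```

In other words, stacking the transposed per-unit vectors as rows is what lets the layer formula drop the transpose.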