Two-part question:
(1) Why is the equation Z = wX + b mathematically possible? If w is a scalar and X is a vector, then wX is a vector. I’m assuming b is a scalar (we’re not explicitly told this, but it has the same notation as the scalar w). Isn’t this equation saying to add a vector and a scalar, which isn’t possible?
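To make the shapes in (1) concrete, here is a minimal sketch in plain Python. It assumes one common convention, namely that "+ b" is shorthand for adding b times a vector of ones, so b is added to every component of wX. The source of the equation doesn't state this, and the values of w, b, and X below are made up for illustration.

```python
# Made-up values for illustration only.
w = 2.0                 # scalar weight
b = 0.5                 # scalar bias
X = [1.0, 2.0, 3.0]     # one feature observed at three datapoints

# Strictly, vector + scalar is undefined. Under the assumed convention,
# "wX + b" is shorthand for wX + b*ones, where ones = (1, 1, 1).
ones = [1.0] * len(X)
Z_explicit = [w * x + b * one for x, one in zip(X, ones)]

# The shorthand form: add b to each component of wX.
Z_shorthand = [w * x + b for x in X]

print(Z_explicit)
print(Z_shorthand)
```

Both lists come out identical, which is the sense in which the shorthand would be well defined if that convention is the one intended.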
(2) What geometric transformation is taking place when we do this calculation? Here is how I’m trying to conceptualize it (I apologize in advance for this turning into a stream of consciousness; it feels like I have all the pieces of the puzzle, they’re just jumbled up):
We have a collection of datapoints represented by a matrix, which we can also think of as a collection of vectors where the tail is the origin and the head is the datapoint. We want to project a line with randomly “initialized parameters” onto each vector, then compute the average, so the resulting line after taking all of the projections is sort of an average of all lines running through each datapoint? If that’s accurate, and the dot product (or matrix-vector multiplication) tells us the projection of the vector created by the randomly initialized parameters onto each vector whose head is a datapoint, wouldn’t a vector that results in a dot product of 1 actually get us closer to the best-fit line between the datapoints (as opposed to minimizing some function to zero)?
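For the "minimizing some function" part, here is a minimal sketch of how a candidate line is typically scored. It assumes mean squared error as the function being minimized (the question doesn't say which function is used), and the data, the helper names `predict` and `mse`, and the parameter values are all made up for illustration.

```python
# Made-up data: y is exactly 2*x + 1.
X = [0.0, 1.0, 2.0, 3.0]
y = [1.0, 3.0, 5.0, 7.0]

def predict(w, b, xs):
    """Evaluate Z = w*x + b at every datapoint."""
    return [w * x + b for x in xs]

def mse(w, b, xs, ys):
    """Mean squared difference between predictions and targets."""
    preds = predict(w, b, xs)
    return sum((p - t) ** 2 for p, t in zip(preds, ys)) / len(xs)

loss_guess = mse(0.5, 0.0, X, y)   # arbitrary starting parameters
loss_true = mse(2.0, 1.0, X, y)    # parameters that generated the data

print(loss_guess, loss_true)
```

In this sketch each datapoint gets a predicted value that is compared to its own target in y; the score falls as the line gets closer to the data, and nothing is compared against a dot product of 1.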