Conceptualizing Week 3's Assignment

Two-part question:

(1) Why is the equation Z = wX + b mathematically possible? If w is a scalar and X is a vector, then wX is a vector… and I’m assuming b is a scalar (we’re not explicitly told this, but it has the same notation as the scalar w). Isn’t this equation saying to add a vector and a scalar, which isn’t possible?

(2) What geometric transformation is taking place when we do this calculation? Here is how I’m trying to conceptualize it (and I apologize in advance for this turning into a stream of consciousness… it feels like I have all the pieces of the puzzle, they’re just jumbled up):

We have a collection of datapoints represented by a matrix… which we can also think of as a collection of vectors whose tails sit at the origin and whose heads sit at the datapoints… and we want to project a line with randomly “initialized parameters” onto each vector… then compute the average, so the resulting line after taking all of the projections is sort of an average of all the lines running through each datapoint? If that’s accurate… and the dot product (or matrix-vector multiplication) tells us the projection of the vector created by the randomly initialized parameters onto each datapoint vector… wouldn’t a vector that results in a dot product of 1 actually get us closer to the best-fit line between the data points? (As opposed to minimizing some function to zero?) :face_with_spiral_eyes: :face_with_spiral_eyes: :face_with_spiral_eyes:


The expression (wX + b) is shorthand for a vector dot product followed by scalar addition.
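A minimal NumPy sketch of that case (the specific numbers here are made up for illustration): when w and x are both 1-D vectors, their dot product is a plain scalar, so adding b is ordinary scalar addition and no vector-plus-scalar question arises.

```python
import numpy as np

w = np.array([0.5, -1.0, 2.0])   # weights, one per feature, shape (3,)
x = np.array([1.0, 2.0, 3.0])    # one example's features, shape (3,)
b = 4.0                          # scalar bias

# np.dot of two 1-D arrays returns a scalar, so "+ b" is scalar + scalar.
z = np.dot(w, x) + b
print(z)  # 0.5*1 + (-1.0)*2 + 2.0*3 + 4.0 = 8.5
```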

Or, if w and X are matrices, then b will be a vector, and it’s a vector addition (NumPy broadcasts b across the examples).
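A sketch of the matrix case under the common convention that examples are stacked as columns of X (again, the numbers are illustrative): W @ X is a matrix, and broadcasting adds the same bias column to every example.

```python
import numpy as np

W = np.array([[0.5, -1.0, 2.0],
              [1.0,  0.0, 1.0]])   # shape (2, 3): 2 outputs, 3 features
X = np.array([[1.0, 4.0],
              [2.0, 5.0],
              [3.0, 6.0]])         # shape (3, 2): 2 examples as columns
b = np.array([[4.0],
              [0.5]])              # shape (2, 1): one bias per output row

# W @ X has shape (2, 2); broadcasting stretches b along the example
# axis, so each column (each example) gets the same bias vector added.
Z = W @ X + b
print(Z)
```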

You don’t really need to view it as a geometric transformation. w * X simply computes, for each example (in X), the weighted sum of its features using the per-feature weights (in w). It’s a more efficient implementation of the linear regression prediction than nested for-loops. NumPy is very good at matrix algebra.
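To make the "weighted sum, not geometry" point concrete, here is a sketch (with arbitrary random data) showing that the nested-loop version and the vectorized version compute the same thing:

```python
import numpy as np

rng = np.random.default_rng(0)
w = rng.standard_normal(4)        # weights for 4 features
X = rng.standard_normal((4, 5))   # 5 examples stored as columns
b = 2.0

# Nested-loop version: for each example i, accumulate w[j] * X[j, i].
z_loop = np.zeros(5)
for i in range(5):
    total = 0.0
    for j in range(4):
        total += w[j] * X[j, i]
    z_loop[i] = total + b

# Vectorized version: one matrix-vector product does the same work.
z_vec = w @ X + b

print(np.allclose(z_loop, z_vec))  # True
```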