Course 4 Week 1, Step_by_Step: Confusion over "dot product" vs. convolution

I could not wrap my head around the blue highlighted section in the assignment labelled "What you should remember". It seems like it should be driving the points home, but they don't make sense to me given what has come before.

It states in the 1st and 3rd bullet points that convolutions extract features from an input image by taking the dot product between input data and 3D filters.

As far as I can reason about it, this is vastly different from the convolution operation (the sum of the elements of the Hadamard product), and I do not see any equivalence.

Do I misunderstand this section, or is it simply misstated?

Yes, I agree that maybe this is a little confusing the way they have stated it. But think about the implications of taking the Hadamard product followed by the sum: that's equivalent to what a dot product does, right? It's the product of the corresponding elements followed by taking the sum of those products. So you're right that you can't accomplish the atomic operation of convolution with a literal dot product call on the 3D inputs, but it is logically equivalent.
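To see the claim in the simplest setting, here is a quick sketch with made-up 1D vectors (not the assignment's variables): elementwise multiply followed by a sum gives exactly the same number as a dot product.

```python
import numpy as np

# Hypothetical small vectors, just to illustrate the equivalence.
a = np.array([1.0, 2.0, 3.0])
b = np.array([4.0, 5.0, 6.0])

# Hadamard (elementwise) product, then sum: 1*4 + 2*5 + 3*6 = 32
hadamard_then_sum = np.sum(a * b)

# Dot product of the same two vectors: also 32
dot = np.dot(a, b)

print(hadamard_then_sum, dot)  # 32.0 32.0
```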

Thanks for your reply!

I think I’m getting a bit of what you mean. I tried to experiment with what you’ve said in python.

What I still don't understand is: say we're looking at a single convolutional step. Taking a dot product directly yields different values (and a different sum) than the element-wise multiplication, unless the activations and filter weights are reshaped into a single row and a single column respectively.

Is the notion that dot product implies this reshaping in this context, or have I managed to further confuse myself on this topic?

I just mean that the phrase "dot product" means that you first do an elementwise multiply and then you add up the results to get the answer. That is logically what we are doing in a single step of a convolution, except that the objects in question are not vectors: they are 3D tensors. But the operation is the same: elementwise multiply followed by sum, right? So the two operations are not "vastly different" (your words); they are actually very similar, other than the shapes of the input objects.
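Here is what that single convolution step looks like in code, using made-up shapes and names (a 3x3x3 input patch and filter, which are not necessarily the assignment's actual dimensions): elementwise multiply the two 3D tensors, sum over all entries, and add the bias.

```python
import numpy as np

# Hypothetical 3D tensors standing in for one slice of the input
# volume and one filter; b is a scalar bias.
rng = np.random.default_rng(0)
a_slice = rng.standard_normal((3, 3, 3))
W = rng.standard_normal((3, 3, 3))
b = 0.5

# One convolution step: elementwise multiply, sum all 27 products,
# then add the bias. Logically a "dot product" of two 3D tensors.
z = np.sum(a_slice * W) + b
print(z)
```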

If you wanted to actually implement the convolution step with an explicit dot product, you could do it by "unrolling" the two input tensors into vectors, and you'd get the same answer. You can try it if you want, just to convince yourself of the intent of their description. But that's not really the point here: they aren't telling you to implement it that way. They are just describing, from a logical standpoint, what is happening.
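A quick sketch of that unrolling, again with made-up tensors: flattening both 3D arrays to 1D and calling a dot product gives exactly the same number as elementwise-multiply-and-sum.

```python
import numpy as np

# Hypothetical 3x3x3 input patch and filter.
rng = np.random.default_rng(1)
a_slice = rng.standard_normal((3, 3, 3))
W = rng.standard_normal((3, 3, 3))

# Elementwise multiply followed by sum (one convolution step, no bias).
conv_step = np.sum(a_slice * W)

# "Unroll" both tensors into vectors and take an actual dot product.
unrolled = np.dot(a_slice.ravel(), W.ravel())

print(np.isclose(conv_step, unrolled))  # True
```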

This is really not a big deal. You should just move on through the assignment and implement everything according to their guidance and see what all you learn.
