Losing my mind with nc' filters in pointwise convolution

When he explains convolving the input (or the previous layer's output), with a size of 4 x 4 x 3, with a 1 x 1 x 3 filter, it is completely understandable how he gets one 4 x 4 x 1 matrix. However, I get lost when he talks about nc’ = 5 and then getting 4 x 4 x 5.

How?

Are the other four matrices behind the first one the same as the first one? How are the nc’ filters used in the computation? @paulinpaloalto @andrewng

Thanks to anyone who can help me with this!

Prof Ng explains that in the lecture; listen again at around 9:35. Unlike the “depthwise” step, pointwise convolutions work the same as “normal” convolutions: the number of filters determines the number of output channels, and each filter's depth must match the number of input channels. Each filter is learned separately (has a different purpose), and the total number of them you define (choose) determines the number of output channels. The number of filters is what Prof Ng calls a “hyperparameter”, meaning simply a choice you need to make. In this case he has chosen 5 filters, so the total dimensions of W for the second “pointwise” step are 1 x 1 x 3 x 5, right?
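Here is a minimal sketch (mine, not from the lecture) that confirms those shapes in tf.keras; the random input stands in for the 4 x 4 x 3 output of the depthwise step, and filters=5 is the nc’ hyperparameter:

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(1, 4, 4, 3).astype("float32")  # batch of one 4 x 4 x 3 volume

# Pointwise convolution: kernel_size=1; filters=5 is the nc' hyperparameter
pointwise = tf.keras.layers.Conv2D(filters=5, kernel_size=1, use_bias=False)

y = pointwise(x)
print(y.shape)                 # (1, 4, 4, 5)
print(pointwise.kernel.shape)  # (1, 1, 3, 5), i.e. W is 1 x 1 x 3 x 5
```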

Thank you, Paul.

I understand that part about choosing the hyperparameter of 5 filters; the question is more how the other four are being computed. I understand the first “layer”, but how are the others computed?

Does the depthwise filter change size?

Appreciate your help with this confusion!

I don’t understand your point. This is exactly like a “normal” convolution: it just happens that the filter size is f = 1, right? So you have five different filters each shaped 1 x 1 x 3 and you apply each one individually just as you normally apply a “conv” filter. Each one gives you a 4 x 4 x 1 output and there are 5 of them, so you end up with an output that is 4 x 4 x 5. What is mysterious or confusing about that?
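If it helps, here is the same idea spelled out in plain NumPy (a sketch with made-up random values): five separate 1 x 1 x 3 filters, each applied individually to produce a 4 x 4 x 1 slice, stacked into 4 x 4 x 5:

```python
import numpy as np

x = np.random.rand(4, 4, 3)           # input volume (height, width, n_c)
filters = np.random.rand(5, 1, 1, 3)  # five filters, each 1 x 1 x 3

slices = []
for f in filters:  # apply each filter individually, like any "conv" filter
    # a 1 x 1 convolution is a dot product over the 3 channels at each pixel
    slices.append(np.einsum("hwc,c->hw", x, f[0, 0]))

out = np.stack(slices, axis=-1)
print(out.shape)                      # (4, 4, 5)
```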

And note that we don’t “choose” the filter values, just the number of filters. The filter values are “parameters”, meaning that they are learned through back propagation, just like normal. Absolutely plain vanilla compared to everything else we’ve learned up to this point, right?

The depthwise filters are completely separate: that is the previous step. It is completely independent. Those parameters are also learned through back propagation. But of course back propagation propagates through all the layers using the Chain Rule just like normal, so what happens in the later layers affects the earlier layers.
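A hedged sketch of the two steps together in tf.keras (I am assuming the lecture's 6 x 6 x 3 input so that the depthwise step yields 4 x 4 x 3), showing that both sets of filters are ordinary trainable parameters:

```python
import numpy as np
import tensorflow as tf

x = np.random.rand(1, 6, 6, 3).astype("float32")

# Depthwise step: one 3 x 3 filter per input channel, channels kept separate
depthwise = tf.keras.layers.DepthwiseConv2D(kernel_size=3, use_bias=False)
# Pointwise step: mixes channels; nc' = 5
pointwise = tf.keras.layers.Conv2D(filters=5, kernel_size=1, use_bias=False)

h = depthwise(x)  # (1, 4, 4, 3): channels processed independently
y = pointwise(h)  # (1, 4, 4, 5)

# Both layers' filters are learned through back propagation like any weights
print([w.shape for w in depthwise.trainable_weights])  # (3, 3, 3, 1)
print([w.shape for w in pointwise.trainable_weights])  # (1, 1, 3, 5)
```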

Got it! Gosh, the reason was that I only saw one (1 x 1 x 3) filter in the picture!

Yes, I guess the picture could have been more complete. But he literally said all the necessary words in the lecture to explain what he means here. It’s the area around 9:35 into that lecture.

Thanks, Paul, and sorry for bothering you with such a silly doubt.

Hi Paulinpaloalto, I had a similar question to the one DrRobot asked. Your answer clarified it a bit, but I want to confirm one more thing. In the 1 x 1 x 3 pointwise filter above, the values are (2, 2, 2). Those values should be the same in all 3 positions, since it is actually just 1 filter, isn’t it?

best,

Yes, that is just an example and maybe not a very good one. Typically you would not expect the filter values to all be the same in all positions or even be integers for that matter. There is no reason why that would actually happen in “real life” as all those values are learned through back propagation and start from random initializations for symmetry breaking.

You can see that the values produced match, though:

1 * 2 + 4 * 2 + 7 * 2 = 24
2 * 2 + 5 * 2 + 8 * 2 = 30

and so forth …
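For what it's worth, here is a quick NumPy check of those two dot products (assuming, per the slide, that the three channel values at the first two positions are [1, 4, 7] and [2, 5, 8], with the example filter fixed at [2, 2, 2]):

```python
import numpy as np

f = np.array([2, 2, 2])                  # the slide's example pointwise filter
print(np.dot(np.array([1, 4, 7]), f))    # 24
print(np.dot(np.array([2, 5, 8]), f))    # 30
```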

I just realized you are saying that these convolutional filters (not max pooling) are also trainable variables via back prop. But edge detectors are not trainable filters? Then are the weights applied to the filters and the filters themselves both trained through back prop?

The “edge filters” that Prof Ng shows in Week 1 of Convnets are just a demonstration of how conv layers can detect things. That is “old school” and nobody uses hand-coded filters like that these days: you just randomly initialize the filters and let back prop learn what it needs to in order to solve the problem at hand.
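To make the contrast concrete, a small sketch (my own illustration, not from the course): a hand-coded vertical-edge filter is fixed, while a Conv2D layer's filters start from a random initialization and are updated by back prop like any other weights:

```python
import numpy as np
import tensorflow as tf

# "Old school" hand-coded vertical-edge (Sobel) filter: fixed, never trained
sobel_x = np.array([[1, 0, -1],
                    [2, 0, -2],
                    [1, 0, -1]], dtype="float32")

# Modern approach: randomly initialized filters, learned by back propagation
conv = tf.keras.layers.Conv2D(filters=8, kernel_size=3)
conv.build(input_shape=(None, 64, 64, 3))  # triggers random initialization

print(conv.trainable)     # True: these filters are ordinary learned weights
print(conv.kernel.shape)  # (3, 3, 3, 8)
```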

What is the difference between Pointwise Convolution and the 1x1 Convolutions that we learnt earlier? It seems like they are no different. Is there a reason they are named differently? If so, what is the difference?

Yes, I believe those are just two names for the same operation. It’s not unusual for people to have multiple ways to name or describe the same operation or phenomenon.