We saw in the videos that the depthwise separable convolution ends up giving us an output of the same size as the normal convolution, at a reduced cost. But are the computed numbers in the two outputs actually equal?? Or is it calculating a different thing?
It gets to an output of the same shape by a different method, with a different number of parameters. But the parameters in both methods are trainable, right? That’s the whole point. In theory, my guess is that you can learn the same result either way. I don’t remember Prof Ng actually saying that in the lectures, and I have not tried to prove it, but it seems intuitively reasonable. But in math, intuitions don’t always play out. Does anyone else have any references on this?
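For what it’s worth, here’s a minimal sketch of the comparison. It’s not from the lectures, and it uses PyTorch rather than the course’s TensorFlow, with made-up layer sizes: a normal convolution versus a depthwise conv followed by a 1x1 pointwise conv. The output shapes match, the parameter counts don’t, and with independent random initializations the actual output numbers differ.

```python
import torch
import torch.nn as nn

torch.manual_seed(0)

# Hypothetical sizes (not from the lecture): 3 input channels, 8 output channels, 3x3 filters
n_in, n_out, f = 3, 8, 3
x = torch.randn(1, n_in, 32, 32)

# Normal convolution: each output channel has its own f x f x n_in filter
normal = nn.Conv2d(n_in, n_out, kernel_size=f, padding=1, bias=False)

# Depthwise separable: a per-channel (groups=n_in) f x f conv, then a 1x1 pointwise conv
depthwise = nn.Conv2d(n_in, n_in, kernel_size=f, padding=1, groups=n_in, bias=False)
pointwise = nn.Conv2d(n_in, n_out, kernel_size=1, bias=False)

y_normal = normal(x)
y_sep = pointwise(depthwise(x))

n_params = lambda m: sum(p.numel() for p in m.parameters())
print(y_normal.shape, y_sep.shape)                # both torch.Size([1, 8, 32, 32])
print(n_params(normal))                           # f*f*n_in*n_out = 3*3*3*8 = 216
print(n_params(depthwise) + n_params(pointwise))  # f*f*n_in + n_in*n_out = 27 + 24 = 51
print(torch.allclose(y_normal, y_sep))            # False -- the numbers themselves differ
```

Of course the "numbers differ" check only says the two untrained layers compute different things, not whether training could make them agree, which is the open question here.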
But on second thought, there’s an obvious counterargument: If that were true, then why would anyone ever do the more expensive method? Why wouldn’t all ConvNets be built using the “depthwise” technique? So maybe my intuition is worth exactly what you paid for it …
Actually, that is the real question: why don’t they use this method in all the ConvNets? I think there must be a price paid in order to get the lower computational cost.
I think that the argument is that if you have significantly fewer parameters, that fundamentally says that the function you end up with has a lot fewer degrees of freedom and is (hence) less complex. In other words, my first “conjecture” above, that you can get to the same result by either method with the appropriate level of training, is not at all clear from general principles.
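To put a number on “significantly fewer parameters”: with $f \times f$ filters, $n_c$ input channels and $n_{c'}$ output channels (which I believe matches the notation from the cost comparison in the lecture, but please correct me if not), the per-layer parameter counts work out to:

```latex
% Per-layer parameter counts, ignoring biases
\underbrace{f^2 \, n_c \, n_{c'}}_{\text{normal}}
\qquad\text{vs.}\qquad
\underbrace{f^2 \, n_c}_{\text{depthwise}} + \underbrace{n_c \, n_{c'}}_{\text{pointwise}}
\qquad\Rightarrow\qquad
\frac{f^2 n_c + n_c n_{c'}}{f^2 n_c n_{c'}} = \frac{1}{n_{c'}} + \frac{1}{f^2}
```

With the toy numbers from the sketch above ($f = 3$, $n_c = 3$, $n_{c'} = 8$) that ratio is $1/8 + 1/9 \approx 0.24$, i.e. roughly a 4x reduction in trainable parameters, consistent with the 51 vs 216 counts it prints. So the two methods are searching over function families of quite different sizes.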
Here’s a thread which shows the comparison between the two styles of convolution more graphically, which may make things a bit more intuitive.