In the lecture, Andrew shows that a depthwise convolution can significantly reduce the computational cost by using n_c filters of size n_f x n_f, compared to a normal convolution (where each filter is n_f x n_f x n_c). However, after taking a closer look at these two methods, I’m wondering: is there any difference in how well they parallelize?
The point is not a difference in parallelizability (if that’s a word) between normal convolutions and depthwise + pointwise convolutions: it’s the reduction in the total number of trainable parameters and the lower overall compute cost. When two approaches vectorize equally well, the one with fewer operations is cheaper.
Please review the lecture where Prof. Ng compares the number of operations between the two approaches.
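To make the operation count concrete, here is a minimal sketch that counts multiplications for both approaches. The function names and the example dimensions (a 4 x 4 output, 3 x 3 filters, 3 input channels, 5 output channels) are my own choices for illustration, and the count ignores additions and bias terms.

```python
def normal_conv_mults(n_out, n_f, n_c, n_c_out):
    """Multiplications for a normal convolution.

    Each of the n_c_out filters has n_f * n_f * n_c weights and is
    applied at n_out * n_out output positions.
    """
    return n_f * n_f * n_c * n_out * n_out * n_c_out


def separable_conv_mults(n_out, n_f, n_c, n_c_out):
    """Multiplications for a depthwise step followed by a pointwise step."""
    # Depthwise: one n_f x n_f filter per input channel.
    depthwise = n_f * n_f * n_out * n_out * n_c
    # Pointwise: n_c_out filters of size 1 x 1 x n_c.
    pointwise = n_c * n_out * n_out * n_c_out
    return depthwise + pointwise


# Illustrative dimensions (assumed, not taken from the posts above).
n_out, n_f, n_c, n_c_out = 4, 3, 3, 5

normal = normal_conv_mults(n_out, n_f, n_c, n_c_out)
separable = separable_conv_mults(n_out, n_f, n_c, n_c_out)

print(normal)     # 2160
print(separable)  # 672
print(separable / normal)
```

With these numbers the separable version needs about 31% of the multiplications. In general the ratio works out to 1/n_c_out + 1/n_f^2, so the saving grows as the number of output channels increases.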