Hi again @yinshan and @Yousif,
I wanted to give the pen-and-paper exercise a go:
Standard convolution
has $(3 \times 3 \times 3 + 1) \times 128 = 28 \times 128 = 3584$ parameters.
Depthwise convolution
has $(3 \times 3 \times 1 + 1) \times 3 = 10 \times 3 = 30$ parameters.
Pointwise convolution
has $(1 \times 1 \times 3 + 1) \times 128 = 4 \times 128 = 512$ parameters.
Depthwise separable convolution
thus has 30 + 512 = 542 parameters, compared to 3584 for the standard convolution.
Moreover, $542 / 3584 \approx 0.15$.
Hence, the depthwise separable convolution has only about 15% of the parameters of the standard convolution (for this example)!
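In case anyone wants to double-check the arithmetic, here is a minimal sketch in plain Python. The setup is read off the numbers above rather than from the exercise text (a 3x3 kernel, 3 input channels, 128 output channels, one bias per filter), and `conv_params` is just a hypothetical helper name, not a library function.

```python
def conv_params(kernel_h, kernel_w, in_channels, filters, bias=True):
    """Count parameters of a conv layer.

    Each filter has kernel_h * kernel_w * in_channels weights,
    plus one bias term if bias is True.
    """
    return (kernel_h * kernel_w * in_channels + int(bias)) * filters


# Assumed setup, inferred from the arithmetic in the post:
# 3x3 kernel, 3 input channels, 128 output channels, biases included.
k, c_in, c_out = 3, 3, 128

# Standard convolution: 128 filters, each spanning all 3 input channels.
standard = conv_params(k, k, c_in, c_out)

# Depthwise step: one single-channel 3x3 filter per input channel.
depthwise = conv_params(k, k, 1, c_in)

# Pointwise step: 128 filters of size 1x1 across the 3 channels.
pointwise = conv_params(1, 1, c_in, c_out)

separable = depthwise + pointwise
ratio = separable / standard

print(f"standard:  {standard}")
print(f"depthwise: {depthwise}")
print(f"pointwise: {pointwise}")
print(f"separable: {separable}")
print(f"ratio:     {ratio:.4f}")
```

Changing `k`, `c_in`, or `c_out` shows how the saving depends on the layer shape; the 15% figure is specific to this example.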
Thanks to Kunlun Bai for the amazing graphics: