When Andrew compared the computational cost of a normal convolution with that of MobileNets, the cost he gave for the depthwise step was (3 * 3) * (4 * 4 * 3).

I have a problem with that. Isn't that the computation for just one slice, so we would need to multiply by 3 again?

Hi Muhammad-Elmallah,

As Andrew explains starting at 5:16 in the MobileNet video, each filter in the depthwise step is applied to one input channel only. One filter costs (4x4)x(3x3) multiplications: 16 output positions times 3x3 multiplications per position. There are 3 filters, one per input channel, so the total is 3x(4x4)x(3x3) = 432. That is the same number as (3 * 3) * (4 * 4 * 3): the trailing 3 in that expression is already the factor for the 3 channels, so no further multiplication by 3 is needed.
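
If it helps to check the arithmetic, here is a minimal Python sketch that counts the multiplications for the example in this thread (3x3 filter, 4x4 output, 3 channels). The function name `depthwise_mults` and its parameters are made up for illustration; they are not from the lecture.

```python
def depthwise_mults(filter_size, out_size, channels):
    """Count multiplications in a depthwise convolution step.

    Hypothetical helper: assumes a square filter and a square output,
    with one filter applied to each input channel.
    """
    per_position = filter_size * filter_size   # 3x3 multiplications per output position
    positions = out_size * out_size            # 4x4 output positions per channel
    per_filter = per_position * positions      # cost of one filter on one channel
    return per_filter * channels               # one filter per input channel


per_channel = depthwise_mults(3, 4, 1)  # a single slice
total = depthwise_mults(3, 4, 3)        # all 3 channels

print(per_channel)  # 144
print(total)        # 432
```

The single-slice cost is 144, and the expression (3 * 3) * (4 * 4 * 3) evaluates to 432, so it already covers all three channels.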

