Question about computation cost

In the Week 2 video lecture on Inception networks, when calculating the computational cost, Prof Ng counted only the multiplications and neglected the additions. But convolving a(l) with a filter also involves summing all the element-wise products after multiplying, and that summation cost was left out of the total computation cost.

For every multiplication, there is an addition. If there are fewer multiplications, then there are also fewer additions, right? So the number of multiplications is a good proxy for the total compute cost. There is one more addition for the bias term, but that is not where the real differences in cost are.

Assume we have an input a(l) of dimension 3×3×1 and one filter of dimension 3×3×1. That gives 9 multiplications and 8 additions, yet according to the lecture the computational cost is 9. Are the 8 additions neglected because their cost is so low?

Yes. Addition is very cheap compared to multiplication.
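To make the counting concrete, here is a small sketch (the function name is illustrative, not from the lecture) of how many multiplications and additions one output value of a convolution takes:

```python
# Hypothetical sketch: count multiplications and additions when convolving
# one input slice with one filter of the given shape.
import math

def conv_cost(filter_shape):
    """Return (multiplications, additions) for one output value of a convolution."""
    k = math.prod(filter_shape)  # number of elements in the filter
    mults = k                    # one multiply per filter element
    adds = k - 1                 # summing k products takes k - 1 additions
    return mults, adds

print(conv_cost((3, 3, 1)))  # (9, 8): 9 multiplies, 8 adds (9 adds counting the bias)
```

So the addition count always tracks the multiplication count to within one, which is why counting multiplications alone is a fair proxy.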


Well, it’s actually 9 multiplications and 9 additions, because we also add the bias value, right?

Tom’s point is correct, of course, but there is a higher-level point I was trying to make in my first reply, and I didn’t express it as clearly as I could have:

The point is not to pin down the precise computational cost to the individual CPU cycle. The point is to compare a normal convolution with a “depthwise separable” convolution to see why the depthwise version is more efficient. By definition of the convolution operations we are doing, the numbers of multiplications and additions are of the same scale in either case, so for comparing the scale of the two operations it is both sufficient and actually precise to compare only the number of multiplications.

Notice that Prof Ng’s conclusion is that you can see, from the ratio of the number of multiplications in one case versus the other, that depthwise separable convolutions are a little less than 1/3 the cost of the normal “plain vanilla” convolution, at least in the specific example he uses. Since we are just computing a ratio, that conclusion would not be changed by including the additions, right?