In the Week 2 lecture on 1 x 1 convolutions, Prof. Andrew Ng says that we can reduce the number of channels while keeping the height and width of the input intact. But we can do the same with any other convolution by setting padding = ‘SAME’ and using fewer filters, and we can also follow it with a ReLU (as we saw applied to the conv layers in the ResNet architecture).
What is unique about a 1 x 1 convolution in that case?
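To make the question concrete, here is a minimal sketch (assuming TensorFlow/Keras, with made-up shapes rather than anything from the lecture): both a 3 x 3 convolution with padding = ‘same’ and fewer filters and a 1 x 1 convolution keep the spatial size and shrink the channel count.

```python
import tensorflow as tf

# Hypothetical 28 x 28 x 192 activation volume (batch size 1)
x = tf.random.normal([1, 28, 28, 192])

# A "normal" conv with 'same' padding and fewer filters also reduces channels
conv3x3 = tf.keras.layers.Conv2D(32, 3, padding="same", activation="relu")
# The 1 x 1 (pointwise) conv from the lecture
conv1x1 = tf.keras.layers.Conv2D(32, 1, padding="same", activation="relu")

print(conv3x3(x).shape)   # (1, 28, 28, 32)
print(conv1x1(x).shape)   # (1, 28, 28, 32)
```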
The mechanics of applying a 1 x 1 convolution are the same as for any normal convolution with filter size > 1, but the difference is that there is no interaction between neighboring “pixels” (although it’s a mistake to think of them as pixels in the internal layers of the network). That’s why they call it a “pointwise” convolution.
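A rough way to see that “pointwise” behavior in code (plain NumPy, shapes chosen purely for illustration): a 1 x 1 convolution is just the same channel-mixing matrix applied independently at every spatial position, so neighboring positions never interact.

```python
import numpy as np

h, w, c_in, c_out = 4, 4, 192, 32
volume = np.random.randn(h, w, c_in)      # input activation volume
weights = np.random.randn(c_in, c_out)    # c_out filters, each of shape 1 x 1 x c_in

# Mix channels at each (i, j) position; height and width pass through untouched
out = np.einsum("hwc,cd->hwd", volume, weights)
print(out.shape)   # (4, 4, 32)
```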
So sure, you can get the same output dimensions with many different convolutions, but pointwise convolutions have a very specific application in combination with depthwise convolutions: together they form the depthwise separable convolution, which is the whole point Prof Ng is explaining in these lectures. A depthwise convolution followed by a pointwise convolution gives you behavior similar to a single “normal” convolution, but at significantly lower computational cost.
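Here is a back-of-the-envelope comparison of multiplication counts, along the lines of the example Prof Ng works through in the lecture (the specific numbers below are my own, chosen just to show the effect):

```python
h, w = 4, 4            # output spatial size
f = 3                  # filter size
c_in, c_out = 3, 5     # input / output channels

normal    = f * f * c_in * c_out * h * w   # single standard convolution
depthwise = f * f * c_in * h * w           # one f x f filter per input channel
pointwise = c_in * c_out * h * w           # 1 x 1 conv to mix the channels

print(normal)                  # 2160
print(depthwise + pointwise)   # 672, roughly 3x cheaper even at this tiny scale
```

The savings grow with the number of channels, which is why depthwise separable convolutions are the building block of architectures like MobileNet.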