Why use Strided Convolutions?

What’s the purpose of using strides when doing a convolution? Why or when would I choose a strided convolution over a normal convolution operation?

Welcome to the community.

In TensorFlow, all convolutional layers have a default stride of one: for Conv1D the default is 1, for Conv2D it is (1,1), and so on. Because a filter is usually much smaller than the image, we need to slide the filter across the image to cover all pixels, and the stride is the step size of that slide.

Basically, convolving with stride=1 evaluates the filter at every position, which gives the most detailed features of an image. Using a larger stride has both advantages and disadvantages.

  1. With a larger stride, the filter is applied at fewer positions. This reduces the computational requirements, i.e., the cost, including time.
  2. The output is smaller than the input. If a filter is 2x2 and uses stride=(2,2) on a 2D image, the output has about half the height and half the width, so its size is roughly 1/4 of the original. This reduces the computational requirements in the following layers as well. (A convolution like this can also be used to create a downsampled, “blurred” image for art, etc.)
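To put rough numbers on the two points above, here is a minimal sketch in plain Python. It assumes no padding (what TensorFlow calls "valid" padding), a single channel, and a single square filter. The helper names `conv_output_size` and `conv2d_cost` are made up for this illustration and are not part of any library.

```python
def conv_output_size(input_size, kernel_size, stride=1):
    """Output length along one axis, assuming no padding."""
    return (input_size - kernel_size) // stride + 1


def conv2d_cost(height, width, kernel, stride):
    """Output shape and multiplication count for one square filter.

    Simplifying assumptions: single input channel, single filter,
    no padding, same stride in both directions.
    """
    out_h = conv_output_size(height, kernel, stride)
    out_w = conv_output_size(width, kernel, stride)
    # Each output value needs kernel * kernel multiplications.
    multiplications = out_h * out_w * kernel * kernel
    return out_h, out_w, multiplications


# A 64x64 image with a 2x2 filter, stride 1 versus stride 2.
dense = conv2d_cost(64, 64, kernel=2, stride=1)
strided = conv2d_cost(64, 64, kernel=2, stride=2)

print("stride 1:", dense)    # (63, 63, 15876)
print("stride 2:", strided)  # (32, 32, 4096)
print("ratio of output values:", (strided[0] * strided[1]) / (dense[0] * dense[1]))
```

With these assumptions, stride 2 produces a 32x32 output instead of 63x63, so both the output size and the number of multiplications drop to roughly 1/4.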

As you can see, the major impact is on computational cost. On the other hand, there is a drawback. If an image is highly detailed and contains lots of important features, then using a larger stride will likely lose some of those features. If an image is very flat, like a clear sky, then a larger stride may be acceptable.
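The drawback can be seen in a toy 1D case. This is only an illustrative sketch in plain Python: `conv1d_valid` is a hypothetical helper (no padding, and it computes cross-correlation, which is what deep learning frameworks call convolution), and the signal and kernel are made-up values.

```python
def conv1d_valid(signal, kernel, stride=1):
    """Slide `kernel` over `signal` without padding, stepping by `stride`."""
    k = len(kernel)
    return [
        sum(signal[start + j] * kernel[j] for j in range(k))
        for start in range(0, len(signal) - k + 1, stride)
    ]


# A signal with a rising edge (0 -> 1) and a falling edge (1 -> 0).
signal = [0, 0, 0, 1, 1, 1, 0, 0]
# Simple edge detector: responds with +1 on a rise and -1 on a fall.
edge_kernel = [-1, 1]

dense = conv1d_valid(signal, edge_kernel, stride=1)
strided = conv1d_valid(signal, edge_kernel, stride=2)

print(dense)    # [0, 0, 1, 0, 0, -1, 0]  both edges detected
print(strided)  # [0, 1, 0, 0]            the falling edge is skipped
```

With stride 2, the filter never lands on the position where the falling edge sits, so that feature is missing from the output. On a flat signal (all zeros, say) both strides would give all-zero outputs and nothing would be lost.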

So, basically, it is a trade-off between “computational cost” and “detailed feature map creation”.