Question

You are working with 3D data. You are building a network layer whose input volume has size 32x32x32x16 (this volume has 16 channels), and applies convolutions with 32 filters of dimension 3x3x3x16 (no padding, stride 1). What is the resulting output volume?

Shouldn’t the answer be 30x30x30x1? There is no such option among the choices, though.

You have:

- An input of size 32 x 32 x 32 x 16 (n x n x n x n_c)
- 32 filters of size 3 x 3 x 3 x 16 (f x f x f x n_c).

Notice that each filter has the same number of channels as the input (n_c = 16 in both cases). These have to match for the convolution to be possible.

The formula to calculate the dimension of the output is given by:

- n_out = floor((n + 2*p - f) / s) + 1

Here p is the padding and s is the stride, so in this case **n_out** = floor((32 + 2*0 - 3) / 1) + 1 = floor(29 / 1) + 1 = 29 + 1 = **30**.
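If it helps to check the arithmetic, here is a minimal sketch of that formula in Python. The function name `conv_output_size` and its argument names are just my own labels for n, f, p and s, not anything from a library.

```python
def conv_output_size(n: int, f: int, p: int = 0, s: int = 1) -> int:
    """Output length along one spatial axis: floor((n + 2p - f) / s) + 1."""
    # Integer floor division implements the floor() in the formula.
    return (n + 2 * p - f) // s + 1


# Values from the question: n = 32, f = 3, no padding, stride 1.
print(conv_output_size(32, 3, p=0, s=1))  # 30
```

The same value applies to each of the three spatial axes, since the input and the filters are cubic.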

The number of output channels (n’_c) always equals the number of filters, as each filter produces its own output channel. So, **n’_c = 32**.

Putting everything together, you have your **output dimension = 30 x 30 x 30 x 32** (n_out x n_out x n_out x n’_c).
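To tie the whole thing together, here is a rough sketch that computes the full output shape, assuming channels-last shapes like the ones in the question. `conv3d_output_shape` is a hypothetical helper written for this post, not a framework function. It also checks that the filter channels match the input channels.

```python
def conv3d_output_shape(input_shape, filter_shape, num_filters, padding=0, stride=1):
    """Shape of a 3D convolution output, assuming channels-last layout.

    input_shape:  (n, n, n, n_c)
    filter_shape: (f, f, f, n_c)
    """
    *spatial, in_channels = input_shape
    *kernel, filter_channels = filter_shape

    # Each filter must span every input channel, so the counts have to agree.
    if in_channels != filter_channels:
        raise ValueError(
            f"filter has {filter_channels} channels, input has {in_channels}"
        )

    # Apply floor((n + 2p - f) / s) + 1 to every spatial axis.
    out_spatial = tuple(
        (n + 2 * padding - f) // stride + 1 for n, f in zip(spatial, kernel)
    )

    # One output channel per filter.
    return out_spatial + (num_filters,)


print(conv3d_output_shape((32, 32, 32, 16), (3, 3, 3, 16), num_filters=32))
# (30, 30, 30, 32)
```

Note that the last dimension comes from the number of filters (32), not from the filter depth (16), which is why 30x30x30x1 is not the answer.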


Thanks for such a detailed explanation! Totally forgot about the number of filters.
