Andrew explains at 5:30 in this video that by keeping the number of channels the same, you add another layer that provides nonlinearity, which lets the network learn a more complex function.
But you can still get nonlinearity even if you instead take, say, a 32x32x64 volume as input and apply a "same" convolution with 64 3x3 filters, producing a 32x32x64 output volume. Right?
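To check the shape arithmetic in the question, here is a minimal sketch (plain Python; `conv_output_shape` is a hypothetical helper, not from the course) showing that both a 1x1 convolution and a 3x3 "same" convolution with 64 filters preserve a 32x32x64 volume. In either case, the nonlinearity comes from the activation (e.g. ReLU) applied after the convolution:

```python
def conv_output_shape(h, w, c_in, n_filters, k, stride=1, padding="same"):
    """Return (h_out, w_out, c_out) for a single conv layer."""
    if padding == "same":
        # "same" padding preserves spatial size (for stride 1)
        h_out = -(-h // stride)  # ceil division
        w_out = -(-w // stride)
    else:  # "valid": no padding
        h_out = (h - k) // stride + 1
        w_out = (w - k) // stride + 1
    # channel count of the output equals the number of filters
    return h_out, w_out, n_filters

# 1x1 convolution keeping 64 channels:
print(conv_output_shape(32, 32, 64, n_filters=64, k=1))  # (32, 32, 64)

# 3x3 "same" convolution with 64 filters, as in the question:
print(conv_output_shape(32, 32, 64, n_filters=64, k=3))  # (32, 32, 64)
```

Both produce a 32x32x64 output, so the difference between the two choices is not whether you get nonlinearity (the activation supplies that either way) but the number of parameters: each 1x1 filter has 1*1*64 weights, while each 3x3 filter has 3*3*64.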