Can 1x1 convolutions be applied "arbitrarily" without any cost?

It seems to me that 1x1 convolutions can be used to reshape activations arbitrarily. See, for example, "3.2 - The Convolutional Block" in the exercises, or the parts about 1x1 convolutions. I find this odd, because it seems to imply that the structure of the activations is essentially arbitrary, which doesn’t make sense to me.

So are there any restrictions on, or costs to, applying 1x1 convolutions? Or is it kind of like how, in topology, a mug is for all intents and purposes a donut?

Hi, @Quincy_Rondei !

What exactly do you mean by “the structure of the activations”?

As you said, there are no restrictions. Any model has its own “structure”, which only becomes meaningful after training, once the optimization algorithm has settled on how features are extracted along the model's layers.

Then, by comparing results and metrics across different models, you can tell whether a given choice of layer shapes (1x1 conv or any other) is better or worse.

I changed the title to better reflect my question.

I think my issue is with whether the output of a 1x1 convolution is merely a reshuffled version of the input or not.

What I originally understood was that a 1x1 convolution does nothing but change the shape of the previous layer's activations. If that were the case, the shape of those activations would seem arbitrary to me.

However, rewatching the video, I get more of a “translation” intuition: you can’t always translate something literally, but instead have to express it idiomatically (e.g., “hay” in Spanish, “there is” in English, and “es gibt” in German).
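To make the “reshuffle versus translation” distinction concrete, here is a minimal NumPy sketch. It is not taken from the course material: the sizes (a 4x4 volume with 8 channels, 3 filters) and the random weights are assumptions chosen only for illustration. It treats a 1x1 convolution as one small weight matrix applied to each pixel's channel vector, which is a learned linear recombination of channels, not a rearrangement of existing values.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical activation volume: height 4, width 4, 8 channels.
x = rng.standard_normal((4, 4, 8))

# A 1x1 convolution with 3 filters is a single weight matrix (C_in x C_out)
# plus a bias, shared by every spatial position.
w = rng.standard_normal((8, 3))
b = rng.standard_normal(3)

# Map each pixel's 8-channel vector to a 3-channel vector.
y = np.einsum("hwc,cd->hwd", x, w) + b

# Same computation pixel by pixel, to show that no spatial mixing happens.
y_loop = np.empty((4, 4, 3))
for i in range(4):
    for j in range(4):
        y_loop[i, j] = x[i, j] @ w + b

print(y.shape)                 # (4, 4, 3)
print(np.allclose(y, y_loop))  # True
print(x.size, y.size)          # 128 48
```

A pure reshape would keep all 128 values and only rearrange them. Here the output has 48 values, each a weighted sum of the 8 input channels at that pixel, and in a real network a nonlinearity such as ReLU would typically follow.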

Yes, you are right.
They are widely used to “condense” the input into fewer channels, reducing the model's size and computation without losing much information.
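On the question of cost: a 1x1 convolution is not free, since it has its own weights and multiplications, but it is cheap compared with what it can save. The rough count below is a sketch under assumed sizes (a 28x28x192 input, a 5x5 “same” convolution with 32 filters, and a 16-channel 1x1 bottleneck); these numbers and the helper names `conv_multiplies` and `conv_params` are hypothetical and do not come from this thread.

```python
def conv_multiplies(h, w, c_in, c_out, k):
    """Multiplications for a k x k 'same' convolution on an h x w x c_in input."""
    return h * w * c_out * k * k * c_in


def conv_params(c_in, c_out, k):
    """Weights plus biases of a k x k convolution layer."""
    return k * k * c_in * c_out + c_out


# Direct 5x5 convolution: 192 channels -> 32 channels.
direct = conv_multiplies(28, 28, 192, 32, 5)

# Same output shape, but with a 1x1 bottleneck down to 16 channels first.
bottleneck = (conv_multiplies(28, 28, 192, 16, 1)
              + conv_multiplies(28, 28, 16, 32, 5))

print(direct)                   # 120422400
print(bottleneck)               # 12443648
print(conv_params(192, 16, 1))  # 3088
```

Under these assumptions the bottleneck version needs roughly a tenth of the multiplications, at the price of 3088 extra parameters in the 1x1 layer. Whether squeezing to 16 channels loses useful information is an empirical question that shows up in the model's metrics.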