I recently stumbled upon an article on TransformerFTC, where they discuss downsampling features and upsampling them again using a funnel-like architecture.
My question is: what is the difference between changing the number of embedding dimensions between blocks and changing the number of dimensions with a layer such as nn.Linear? Which approach is better, and why? I tried asking ChatGPT, but I couldn't get an intuitive answer.
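To make the second option concrete, this is roughly what I have in mind (the sizes and layer choices below are placeholders I made up, not taken from the article):

```python
import torch
import torch.nn as nn

# Hypothetical illustration: shrinking and restoring the hidden size between
# blocks with learned projections (nn.Linear). All dimensions are made up.
class LinearFunnel(nn.Module):
    def __init__(self):
        super().__init__()
        self.block1 = nn.TransformerEncoderLayer(d_model=512, nhead=8, batch_first=True)
        self.down = nn.Linear(512, 256)   # learned down-projection
        self.block2 = nn.TransformerEncoderLayer(d_model=256, nhead=8, batch_first=True)
        self.up = nn.Linear(256, 512)     # learned up-projection

    def forward(self, x):                 # x: (batch, seq_len, 512)
        x = self.block1(x)
        x = self.down(x)                  # (batch, seq_len, 256)
        x = self.block2(x)
        return self.up(x)                 # back to (batch, seq_len, 512)

x = torch.randn(2, 10, 512)
print(LinearFunnel()(x).shape)            # torch.Size([2, 10, 512])
```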
Hi @nikhilsos,
Are you talking about the autoencoder-style bottleneck, where we reduce the dimension and then increase it again to see whether the model has learned anything about the features at the reduced dimension?
Could you also let me know whether this query is related to any course, just to be specific?
Regards
DP
Yes, it is about reducing the dimension and then increasing it again to see whether the model has learned anything about the features at the reduced dimension.
But after studying the paper in more detail, my understanding is that the reduction in feature dimension is done not by feed-forward networks but by max/mean pooling. In the decoder part, nn.Upsample is used in 'nearest' mode.
I fail to understand the difference: what is the benefit of changing the feature dimension with pooling and nn.Upsample versus using an FFN?
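For concreteness, here is a minimal sketch of the two options as I understand them. The sizes are made up, and I am assuming the pooling/upsampling acts along the token (sequence) axis; please correct me if that is not what the paper does:

```python
import torch
import torch.nn as nn

x = torch.randn(2, 16, 512)                 # (batch, seq_len, hidden)

# Option 1: parameter-free resizing along the sequence axis:
# mean pooling to downsample, nearest-neighbour upsampling to restore length.
pool = nn.AvgPool1d(kernel_size=2, stride=2)
up = nn.Upsample(scale_factor=2, mode='nearest')
h = pool(x.transpose(1, 2))                 # (batch, hidden, seq_len / 2)
h_up = up(h).transpose(1, 2)                # back to (batch, 16, 512)

# Option 2: a learned feed-forward projection that changes the hidden size
# instead of the sequence length; unlike pooling, it has trainable weights.
down_ffn = nn.Linear(512, 256)
up_ffn = nn.Linear(256, 512)
z = up_ffn(down_ffn(x))                     # (batch, 16, 512)

print(h_up.shape, z.shape)
```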
I am fairly new and have a limited understanding, so sorry if I failed to articulate this better. Here's the paper if anyone is interested.
55820361.pdf (205.2 KB)