In the lectures, Sharon teaches that Transposed Convolutions have the checkerboard problem, i.e., the pixels in the center get visited more often than those at the edges. We encountered a similar issue with the Convolution Operation, and to avoid it we used the Padding Operation. So why don’t we simply use Padding + Transposed Convolution instead of Upsampling + Convolution, which Sharon says is becoming more popular these days? Thanks in advance!
Hey @mentor, can you please answer my question!
The main thing to keep in mind here is that Convolution reduces the size of the image and Transposed Convolution increases it. So with convolution, you can put the padding around the outside edge of the entire image, but where would you put the padding for Transposed Convolution?
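To make that concrete, here is a toy 1-D sketch of the output-size arithmetic. The transposed-convolution formula follows the convention used by frameworks like PyTorch, where the padding argument crops the output border rather than adding input rows for the kernel to visit:

```python
def conv_out(n, k, s=1, p=0):
    # Ordinary convolution: padding adds rows around the input,
    # so border pixels get visited as often as interior ones.
    return (n + 2 * p - k) // s + 1

def tconv_out(n, k, s=1, p=0):
    # Transposed convolution (PyTorch convention): "padding" here
    # *crops* the output border instead of padding the input, so it
    # cannot even out the visit counts the way input padding does.
    return (n - 1) * s - 2 * p + k

print(conv_out(6, 3, p=1))    # 6 -> 6: padding preserves the size
print(tconv_out(6, 3, p=0))   # 6 -> 8: output grows
print(tconv_out(6, 3, p=1))   # 6 -> 6: padding only trims the result
```

So in the transposed case there is no "outside edge" of the input to pad; the padding parameter just throws away part of the output.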
Thanks a lot for your reply. I now see what I was missing when thinking about the idea of Padding + Transposed Convolution.
But I have one more doubt. When Sharon says that we use Upsampling with Convolution, what kind of Upsampling do we use? I tried using Nearest Neighbors Upsampling with Convolution, and though the output changed, essentially the same problem remains: the corner cells are influenced by only 1 cell from the input, the edge cells by 2 cells from the input, and so on. The only difference was that instead of being influenced by that 1 input cell a single time, a corner cell is now influenced by the same 1 cell multiple times. I have attached an image of the same.
P.S.: Please ignore my handwriting and any mathematical mistake I might have made
Another excellent question, @Elemento!
The problem is that our example is a little too simplistic to see the full checkerboard effect. With a stride of 1 and filter size of 2, only the corner and edge cells have a smaller number of visits, which, as you point out, is very similar to the problem we see with convolution.
But if you remember the example image, the checkerboard problem shows up throughout the whole image, not just at the edges. This kind of checkerboard problem can happen with transposed convolution when you have a stride > 1.
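A quick way to check this numerically (a toy 1-D sketch, not the course's exact example): count how many input cells touch each output cell of a transposed convolution.

```python
import numpy as np

def transposed_conv_visits(n, k, s):
    # Count how many input cells influence each output cell of a 1-D
    # transposed convolution with input length n, kernel size k, stride s.
    out_len = (n - 1) * s + k
    counts = np.zeros(out_len, dtype=int)
    for i in range(n):
        counts[i * s : i * s + k] += 1  # each input cell paints a k-wide patch
    return counts

# Stride 1, kernel 2 (the simple example): only the two border cells
# are visited less often; the interior is uniform.
print(transposed_conv_visits(4, 2, 1))   # [1 2 2 2 1]

# Stride 2, kernel 3: the visit counts alternate 1/2 across the whole
# interior -- the checkerboard pattern, not just an edge effect.
print(transposed_conv_visits(4, 3, 2))   # [1 1 2 1 2 1 2 1 1]
```

In 2-D the counts are the outer product of the 1-D counts, which is exactly the checkerboard you see in the article's images.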
There’s an excellent article referenced in the course that gives a nice explanation of this. There’s even a little slider you can adjust to help you visualize the different effects with different strides and filter sizes:
Thanks a lot for the reference. I read this one well before this doubt occurred to me, so the article had just slipped my mind. In the article, they mention that using NN-Upsampling followed by Convolution provided the best results.
I am assuming it must be because the calculations change in a way that reduces the effect of the checkerboard problem, since they haven’t mentioned anything about NN Upsampling + Convolution having the same ultimate effect of some cells being influenced by more input cells than others.
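To check my assumption, here is a tiny 1-D sketch with an all-ones kernel (my own toy setup, not the article's experiment), which isolates how much total kernel weight reaches each output cell:

```python
import numpy as np

x = np.ones(4)          # toy 1-D "image"
kernel = np.ones(3)     # all-ones kernel isolates the coverage effect

# Transposed convolution, stride 2: scatter each input into a k-wide patch.
s, k = 2, 3
t_out = np.zeros((len(x) - 1) * s + k)
for i, v in enumerate(x):
    t_out[i * s : i * s + k] += v * kernel
print(t_out)   # alternating 1/2 in the interior -> checkerboard

# NN upsampling (x2) followed by an ordinary "valid" convolution:
up = np.repeat(x, 2)
n_out = np.convolve(up, kernel, mode="valid")
print(n_out)   # constant everywhere -> uniform coverage
```

So with NN Upsampling + Convolution every output cell receives exactly k kernel contributions, and the interior non-uniformity disappears; only the border effect you can already handle with padding remains.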
Additionally, I have conveniently chosen the kernel size and other parameters such that they fit my query, but if we change them, the effect can certainly be reduced.
Anyway, thanks a lot once again, and if possible, please take a look at some of my other queries that haven’t been answered yet.