ResNet for 32x32 images

I am trying to program a ResNet for 32x32 images, and I came upon this tutorial https://towardsdatascience.com/resnets-for-cifar-10-e63e900524e0, but I am a bit confused about what it is saying. In the layer 1 section, it says to use filters of size {16, 32, 64} respectively, but the output is only 16 channels.

Could someone please reply? Thanks.

I really need help. I have no idea how to move forward, and I would really appreciate it if someone would help me.

You are asking questions that are not about the course material. Please realize that the mentors here are not paid to answer your questions. We are volunteers, so no one owes you an answer to a question like that. I just want to be clear about the expectations here. With that said, there are lots of people who are probably happy to discuss this type of question with you, but you can’t demand an answer on a fixed time scale. If you need the answer in a hurry, your best bet is to grow your own skills at solving this type of problem. Note in particular that the blog post you linked provides the full implementations for you to examine.

Note that the way ResNets work is driven by the “skip” connections, right? So within one “block”, you preserve the spatial dimensions and the number of channels, because they need to mesh with the skip connection. So the output they show as 32 x 32 x 16 is just the output of the first “block” or layer, right? Then you feed it into layer 2 and you’ll end up with 16 x 16 x 32 as the output of that. And so forth with reduced spatial and increased channel dimensions in the 3rd layer.
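To make the shape bookkeeping concrete, here is a minimal sketch in plain Python. It assumes “same” padding everywhere, a stride of 2 at the start of layers 2 and 3, and that the third layer ends at 8 x 8 x 64 (the continuation of the pattern described above). The helper name `same_conv_output` and the `stages` list are made up for illustration and are not from the blog post.

```python
import math


def same_conv_output(height, width, filters, stride):
    """Output shape of a 'same'-padded convolution.

    With 'same' padding the spatial size is ceil(input / stride),
    and the channel count equals the number of filters.
    """
    return (math.ceil(height / stride), math.ceil(width / stride), filters)


# Assumed plan: (number of filters, stride of the first convolution in the layer)
stages = [(16, 1), (32, 2), (64, 2)]

shape = (32, 32, 3)  # a 32 x 32 RGB input image
stage_outputs = []
for filters, stride in stages:
    shape = same_conv_output(shape[0], shape[1], filters, stride)
    stage_outputs.append(shape)
    print(shape)
```

So {16, 32, 64} is one filter count per layer, not three filter counts inside layer 1.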

Thank you for responding. I didn’t mean to demand anything, and I’m sorry if that’s how it came across. What I don’t understand is that they say to use 3 different filters in Layer 1, of sizes {16, 32, 64}, which would result in an output with 64 channels.

No, that’s not what they said. The point is that the inputs are images, so they have 3 channels. That means the very first “block” of layers takes 32 x 32 x 3 inputs. So each filter in the very first Conv2D will have shape 3 x 3 x 3, and there are 16 of them. The filters in the rest of that block will have shape 3 x 3 x 16, and the output of that first layer is 32 x 32 x 16, because you need “same” padding in order to mesh with the skip connection.

Then in the next block the filters will be 3 x 3 x 16 and there will be 32 of them. And so forth.
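As a rough check on those filter shapes, the sketch below lists the kernel tensor shapes in the (height, width, input channels, number of filters) layout that Keras uses for `Conv2D` weights, and counts the weights in each. The helper names `conv_kernel_shape` and `num_weights` are hypothetical, and biases are ignored.

```python
def conv_kernel_shape(kernel_size, in_channels, filters):
    """Kernel tensor shape in Keras Conv2D layout: (kh, kw, in_channels, filters)."""
    return (kernel_size, kernel_size, in_channels, filters)


def num_weights(shape):
    """Number of scalar weights in a kernel of the given shape (no bias)."""
    total = 1
    for dim in shape:
        total *= dim
    return total


# First convolution: 16 filters of shape 3 x 3 x 3 applied to the RGB image.
first_conv = conv_kernel_shape(3, in_channels=3, filters=16)
# Later convolutions in the first layer: 16 filters of shape 3 x 3 x 16.
layer1_conv = conv_kernel_shape(3, in_channels=16, filters=16)
# First convolution of the next layer: 32 filters of shape 3 x 3 x 16.
layer2_first_conv = conv_kernel_shape(3, in_channels=16, filters=32)

for name, shape in [("first conv", first_conv),
                    ("layer 1 conv", layer1_conv),
                    ("layer 2 first conv", layer2_first_conv)]:
    print(name, shape, num_weights(shape))
```

The depth of each filter always matches the channel count of its input, while the number of filters sets the channel count of the output.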

Oh, OK. Thanks for clarifying that. Does that mean there are only 6 layers? I thought it’s supposed to be ResNet50.

I don’t know. You’ll need to read the blog post and the code a bit more carefully.

[Image: diagram of layer 1 from the blog post]
They gave this diagram for layer 1, so I guess each layer comprises 6 sublayers?

That looks like 4 layers to me. The 3 x 16 blue blocks are just the filters being depicted. Or maybe you could consider it 2 actual Conv2D layers and that last one is just the output of the addition of the output of layer 2 and the skip connection to produce the final output.

But really, they gave you the code. Why not read it and see?

Where’s the code? All I saw was a step-through of the problem.

Oh I see. Thanks very much for your time.

Do you mind sending me a screenshot? I live in another country so the website is blocked.

It’s just a GitHub link. If you can’t get to GitHub, I don’t want to be in a position of having to send you everything you could possibly want from GitHub. As I mentioned above, this is a) not part of the course that I’m helping you with and b) I don’t get paid to do this. I hope you will understand.

My suggestion is that you try using a VPN if you are on some restrictive network that won’t let you get to GitHub.

I got to the site (my free VPN started working), but it says the site doesn’t exist. Welp

Ok, sorry, that was a dead end. Sigh.

But I think the description of how to build the network is pretty clear from the write-up in the blog post. So just build it with TF, based on what we did in the various exercises like the ResNet assignment and the U-Net assignment (C4 W3 A2).
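For orientation only, here is a minimal Keras sketch of that kind of network, assuming TensorFlow 2.x. It is not the blog post’s code: the names `residual_block`, `build_cifar_resnet`, and `blocks_per_stage` are made up, and the batch normalization placement and the 1 x 1 convolution on the shortcut when the shape changes are assumptions (the blog post or paper may handle the shortcut differently).

```python
import tensorflow as tf
from tensorflow.keras import layers


def residual_block(x, filters, stride=1):
    """Two 3x3 convolutions plus a skip connection (hypothetical helper)."""
    shortcut = x
    y = layers.Conv2D(filters, 3, strides=stride, padding="same", use_bias=False)(x)
    y = layers.BatchNormalization()(y)
    y = layers.ReLU()(y)
    y = layers.Conv2D(filters, 3, padding="same", use_bias=False)(y)
    y = layers.BatchNormalization()(y)
    # Assumption: when the spatial size or channel count changes, project the
    # shortcut with a 1x1 convolution so the two tensors can be added.
    if stride != 1 or x.shape[-1] != filters:
        shortcut = layers.Conv2D(filters, 1, strides=stride, padding="same",
                                 use_bias=False)(x)
        shortcut = layers.BatchNormalization()(shortcut)
    y = layers.Add()([y, shortcut])
    return layers.ReLU()(y)


def build_cifar_resnet(blocks_per_stage=3, num_classes=10):
    """Three groups of residual blocks with 16, 32, and 64 filters."""
    inputs = tf.keras.Input(shape=(32, 32, 3))
    x = layers.Conv2D(16, 3, padding="same", use_bias=False)(inputs)
    x = layers.BatchNormalization()(x)
    x = layers.ReLU()(x)
    for stage, filters in enumerate([16, 32, 64]):
        for block in range(blocks_per_stage):
            # Halve the spatial size at the start of the 2nd and 3rd groups.
            stride = 2 if (stage > 0 and block == 0) else 1
            x = residual_block(x, filters, stride)
    x = layers.GlobalAveragePooling2D()(x)
    outputs = layers.Dense(num_classes)(x)  # logits, no softmax
    return tf.keras.Model(inputs, outputs)


model = build_cifar_resnet()
```

Calling `model.summary()` prints the output shape of every layer, which is a quick way to compare against the shapes in the write-up.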

You may also check out the ResNet paper, in particular Figure 3 and Table 1, which show the architecture. I think the paper gives a pretty clear explanation of what is going on.