Convolutional sliding windows

If it is a multiclass classifier, then you need softmax. But note that Prof Ng always uses the “from_logits=True” mode of the loss functions, which means that the softmax happens internally in the loss calculation.
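To see what “the softmax happens inside the loss” means, here is a minimal numpy sketch (the logits and labels are made up for illustration) comparing cross-entropy computed from softmax probabilities with the numerically equivalent log-sum-exp form that a `from_logits=True` loss uses internally:

```python
import numpy as np

# Hypothetical raw scores (logits) for 4 classes, batch of 2.
logits = np.array([[2.0, 1.0, 0.1, -1.0],
                   [0.5, 2.5, 0.3,  0.0]])
labels = np.array([0, 1])  # true class indices

def softmax(z, axis=-1):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Route 1: apply softmax first, then take -log of the true-class probability.
probs = softmax(logits)
loss_from_probs = -np.log(probs[np.arange(len(labels)), labels])

# Route 2: what from_logits=True effectively computes (log-sum-exp - logit).
loss_from_logits = (np.log(np.exp(logits).sum(axis=1))
                    - logits[np.arange(len(labels)), labels])

print(np.allclose(loss_from_probs, loss_from_logits))  # True
```

Both routes give the same loss, which is why you must not add a softmax layer yourself when the loss is already configured with `from_logits=True`, otherwise the softmax is applied twice.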

Andrew’s and my examples both use 4 filters, which is meant for multi-class classification.

You may want to ask yourself this: are you fine-tuning a model, or are you using the model for prediction?

If you are fine-tuning, then the way Paul mentioned is better than appending a softmax layer to the end, but you need to make sure to specify the correct axis.

If you are predicting, then you need to know whether the model comes with a softmax; if not, you may add a softmax layer to the end, but make sure you have set the correct axis to softmax over. I believe with no more than a few trials you will figure out the correct setting. After all, you don’t have more than 4 axes to choose from :wink:
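To illustrate why the axis matters, here is a numpy sketch (the tensor shape is a made-up example) of softmaxing a per-pixel score map of shape `(batch, height, width, classes)`. Only the class axis gives you probabilities that sum to 1 per pixel:

```python
import numpy as np

def softmax(z, axis):
    e = np.exp(z - z.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

# Hypothetical per-pixel class scores: (batch, height, width, classes).
scores = np.arange(36, dtype=float).reshape(1, 3, 3, 4)

# Softmax over the last axis (the class axis): each pixel's 4 class
# probabilities sum to 1.
probs = softmax(scores, axis=-1)
print(np.allclose(probs.sum(axis=-1), 1.0))  # True

# Softmax over a wrong axis (here: width) normalizes across positions
# instead of classes, so per-pixel class scores no longer sum to 1.
wrong = softmax(scores, axis=2)
print(np.allclose(wrong.sum(axis=-1), 1.0))  # False
```

In Keras you would express the same choice with `tf.keras.layers.Softmax(axis=-1)`, where `axis=-1` (the default) is correct for channels-last outputs.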

When it doesn’t come in a straightforward way, it’s time for you to practice what you have learnt to create a solution :wink:


As for converting a dense layer to a Conv layer, if you are fine-tuning a pre-trained model, you may just replace those dense layers with Conv layers. However, if you really want to just convert a Dense layer into a Conv layer, then I can give you some hints.

For an input of (None, 14, 14, 32), before you append a Dense layer to it you will flatten it out so it becomes (None, 6272). If the Dense layer has 1 neuron, then it has 14 x 14 x 32 + 1 = 6272 + 1 trainable parameters.

If you replace that Dense layer with a Conv layer, then that equivalent Conv layer should have 1 filter of size 14 x 14 (spanning all 32 input channels). How many trainable parameters does this layer have? The answer is 14 x 14 x 32 + 1 = 6272 + 1 again.

You therefore should see that the trainable parameters in a Dense layer can be rearranged to become trainable parameters in a Conv layer that gives you the exact same result as the Dense layer.
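A small numpy sketch of that rearrangement (the weights here are random, just for demonstration): a Dense layer on the flattened input and a single full-size Conv filter holding the same 6272 + 1 parameters produce the exact same output.

```python
import numpy as np

rng = np.random.default_rng(0)

# A hypothetical feature map of shape (14, 14, 32), as in the example above.
x = rng.standard_normal((14, 14, 32))

# Dense layer with 1 neuron on the flattened input:
# weight matrix (6272, 1) plus one bias -> 6272 + 1 parameters.
w_dense = rng.standard_normal((14 * 14 * 32, 1))
b = 0.5
dense_out = x.reshape(-1) @ w_dense + b  # shape (1,)

# Equivalent Conv layer: one 14x14 filter spanning all 32 channels.
# Since the filter covers the whole input, a "valid" convolution is a
# single multiply-and-sum over the same 6272 + 1 parameters, rearranged.
w_conv = w_dense.reshape(14, 14, 32)
conv_out = (x * w_conv).sum() + b  # scalar

print(np.allclose(dense_out[0], conv_out))  # True
```

The payoff of the Conv version is that on a larger input the same filter slides to every valid position, giving you the dense layer's output at each location without re-flattening, which is the whole point of convolutional sliding windows.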

This is all the hints I can give you, so good luck :wink:


PS: If you get my idea but haven’t tried it before, then I suggest you start your experiments with a simpler CNN, instead of going into EfficientNet right away.


I’m slightly confused about the video, because it says FC (fully connected), but he refers to them as 5x5 filters at 7:12 of the video, and they seem to be filters based on what I see.

As Andrew said, they are filters.

I thought fully connected layers aren’t filters

An FC layer isn’t a filter, but at 7:12 he is talking about filters. There is nothing wrong with the narration.

and I agree with you that they are filters.

But the diagram says FC, doesn’t that mean Fully Connected?

FC means fully connected, but at that point he is talking about filters.

Yes but he is talking about filters while showing the FC

That doesn’t make them FC layers when we know we are talking about filters.

But he was walking through the network, explaining what each layer was.

Why is there an FC in the diagram if he does not mention there being any FCs?

@sickopickle, I cannot guess why they are reusing the graph without taking “FC” away. But at 7:12, we are talking about filters.

OK, thanks. Was the original network an FC?

Ok, I think I understand what is happening, after watching the first part of the video. Thanks again for your help!

You are welcome @sickopickle. I hope you understand that while I can clarify and assure you what 7:12 is talking about, I can’t explain why FC is there because I didn’t design the slide.

PS: I have opened a ticket for the course to review regarding those FC labels.

Should the Conv layer have a softmax activation?

Also, I found that the second model (for the whole canvas) outputs a tensor of shape (None, 15, 31, 21), even though I would expect it to output a tensor of shape (None, 456, 956, 21), values which I obtained from 500-45+1 and 1000-45+1. Is this because the internal convolutional layers of EfficientNet have strides of greater than 1?

Hey @sickopickle, I can’t give a different answer from this one. Please take a look again and make your decision. You are familiar with the from_logits=True thing, right?

Can you share the summary of your model 2?