C4_W2_Video_Classic_Networks: Confusion about last 2 Convo layers in LeNet-5?

I’m confused about the last conv layer. I don’t understand how a 14x14x6 input became a 10x10x16 output. I understand the width and height, but not the depth (6 to 16). My understanding is that if we had 2 filters, the depth would be 6x2 = 12. Is that wrong?

Hello @evilyoda,

A layer with 16 filters will output a depth of 16. Each filter has a depth equal to the depth of its input.

Therefore, if we had 2 filters, it would be 10x10x2, regardless of the depth of the input.
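To make the shape rule concrete, here is a minimal Python sketch. It is not from the course materials: the helper name `conv_output_shape` is made up for illustration, and it assumes a 5x5 filter with stride 1 and no padding, which is what turns 14 into 10 in this layer.

```python
def conv_output_shape(input_shape, f, n_filters, stride=1, pad=0):
    """Shape of a conv layer's output (hypothetical helper, for illustration only).

    input_shape is (height, width, depth). The input depth sets the depth of
    each filter, but it does not appear in the output shape at all.
    """
    h, w, _depth_in = input_shape
    h_out = (h + 2 * pad - f) // stride + 1
    w_out = (w + 2 * pad - f) // stride + 1
    # Output depth is simply the number of filters.
    return (h_out, w_out, n_filters)


print(conv_output_shape((14, 14, 6), f=5, n_filters=16))  # (10, 10, 16)
print(conv_output_shape((14, 14, 6), f=5, n_filters=2))   # (10, 10, 2)
```

Note that the input depth of 6 never multiplies anything in the result, which is why 2 filters give a depth of 2 and not 12.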

Cheers,
Raymond

Got it, but how is the operation actually done? Let’s say the input has a depth of 16 and we have 2 filters. Do we merge the input into a depth of 1 and then apply the filter? Or do we apply the filter to each depth and then merge the results?

If the input is 12x13x16, then a filter of size 3 will have a shape of 3x3x16. You put the filter in the upper left corner of the input, perform an element-wise multiplication between the filter and the 3x3x16 patch of the input it covers, and then sum all the multiplication results up. That sum is one number in the output. Then you move the filter one step to the right and repeat the process, row by row, until the filter has gone through the whole input.
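The walk-through above can be sketched in NumPy as follows. This is only an illustration under some assumptions: stride 1, no padding, no bias, random values for the input and filter, and a function name (`conv_single_filter`) that I made up.

```python
import numpy as np


def conv_single_filter(x, w):
    """Slide one filter over the input (stride 1, no padding, no bias).

    x: input of shape (H, W, C)
    w: one filter of shape (f, f, C); its depth must match the input depth
    Returns a 2D array of shape (H - f + 1, W - f + 1).
    """
    H, W, C = x.shape
    f, f2, C_w = w.shape
    assert f == f2 and C == C_w, "filter must be square and match input depth"

    out = np.zeros((H - f + 1, W - f + 1))
    for i in range(H - f + 1):          # move down
        for j in range(W - f + 1):      # move right
            patch = x[i:i + f, j:j + f, :]     # f x f x C patch under the filter
            out[i, j] = np.sum(patch * w)      # multiply element-wise, sum everything
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((12, 13, 16))   # the 12x13x16 input from the post
w = rng.standard_normal((3, 3, 16))     # one 3x3x16 filter
y = conv_single_filter(x, w)
print(y.shape)  # (10, 11)
```

The output has no depth axis left: the 16 channels were collapsed by the sum at each position.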

Then we move on to the next filter.

Is this clear?

I’m trying to understand the depth. First things first: for an NxNx16 input, our filter is FxFx16 (the filter depth has to be the same, right?). If we have 2 FxFx16 filters, what would the output depth be?

The first filter will produce N’ x N’ x 1. The second filter will produce another N’ x N’ x 1. Together they produce N’ x N’ x 2.

Remember I said we do an element-wise multiplication and then sum all the results up? By "all", I want to emphasize that it means all results across all depths. Therefore, after the sum, the depth becomes 1.

1 filter produces a result of depth 1; 16 filters produce a result of depth 16.
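Here is a small NumPy sketch of that idea, using the LeNet-5 shapes from the original question. The function name `conv_layer`, the random values, and the choices of stride 1, no padding and no bias are my own assumptions for illustration. It also checks that "apply the filter to each depth and then merge" gives the same number as summing over all depths in one go.

```python
import numpy as np


def conv_layer(x, filters):
    """Apply several filters to one input (stride 1, no padding, no bias).

    x:       input of shape (H, W, C)
    filters: array of shape (n_filters, f, f, C)
    Returns an array of shape (H - f + 1, W - f + 1, n_filters).
    """
    H, W, C = x.shape
    n_filters, f, _, C_w = filters.shape
    assert C == C_w, "each filter's depth must match the input depth"

    out = np.zeros((H - f + 1, W - f + 1, n_filters))
    for k in range(n_filters):                   # one output channel per filter
        for i in range(H - f + 1):
            for j in range(W - f + 1):
                patch = x[i:i + f, j:j + f, :]
                out[i, j, k] = np.sum(patch * filters[k])  # sum across ALL depths
    return out


rng = np.random.default_rng(0)
x = rng.standard_normal((14, 14, 6))          # 14x14x6 input
filters = rng.standard_normal((16, 5, 5, 6))  # 16 filters, each 5x5x6
y = conv_layer(x, filters)
print(y.shape)  # (10, 10, 16)

# Same number computed channel by channel, then merged by adding:
per_channel = [np.sum(x[0:5, 0:5, c] * filters[0, :, :, c]) for c in range(6)]
print(np.isclose(sum(per_channel), y[0, 0, 0]))  # True
```

So for each filter, every input channel is multiplied by its own slice of the filter, and the per-channel results are added into a single number. The input is never merged into depth 1 before filtering.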

I just managed to get to my laptop. Here is an example.

I don’t want to draw all 16 channels. However, if you draw something yourself, feel free to share it and I can have a look. Note that every filter sums over all of the depths of the input.

Thanks, that is exactly where I was confused. Totally clear now!