How do we compute loss between the outputs of two layers?

learner1tk · February 4, 2024, 7:15pm

x1 = Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
x1 = MaxPooling2D((2, 2), padding='same')(x1)
x2 = Conv2D(64, (3, 3), activation='relu', padding='same')(x1)
x2 = MaxPooling2D((2, 2), padding='same')(x2)
x3 = Conv2D(128, (3, 3), activation='relu', padding='same')(x2)
x3 = MaxPooling2D((2, 2), padding='same')(x3)

Say I wish to compute an L2 loss between x1 and x3. How would I do that? I want the deeper layers to guide the feature maps of the intermediate layers. Since the shapes of x1 and x3 are different, it is hard to compute a loss between them directly, and simple reshaping does not help. Please help.

TMosh · February 4, 2024, 7:36pm

“Loss” involves computing the difference between some expected and predicted values.

Do you have any expected values for your intermediate layers?

I don’t think a comparison between x1 and x3 would be considered a “loss”. And if the number of units is different, such a comparison is not possible.
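To make the shape mismatch concrete, here is a minimal sketch that traces the output shape of each Conv2D + MaxPooling2D stage from the question. The 32x32 input size is an assumption (the thread does not give one), and the helper names `same_pool_out` and `trace_shapes` are hypothetical.

```python
import math


def same_pool_out(size, pool=2):
    """Spatial size after MaxPooling2D(pool, padding='same') with stride == pool."""
    return math.ceil(size / pool)


def trace_shapes(height, width, filters=(32, 64, 128)):
    """Return the (H, W, C) shape after each Conv2D('same') + MaxPooling2D('same') stage."""
    shapes = []
    for f in filters:
        # Conv2D with padding='same' and stride 1 keeps H and W; only the
        # channel count changes. The pooling layer then halves H and W.
        height, width = same_pool_out(height), same_pool_out(width)
        shapes.append((height, width, f))
    return shapes


# Assumed 32x32 input, purely for illustration.
for name, shape in zip(("x1", "x2", "x3"), trace_shapes(32, 32)):
    print(name, shape, "elements:", math.prod(shape))
```

With these assumptions x1 and x3 differ not only in shape but also in total element count, which is why a plain reshape cannot line them up.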

learner1tk · February 4, 2024, 7:56pm

I’m trying to make the feature maps of the shallow layers learn from the feature maps of the deeper layers (self-distillation). This method computes such a loss.

paulinpaloalto · February 4, 2024, 8:28pm

Sorry, I have never heard of “self-distillation” and will google it in a second, so maybe I’m not the right person to answer this question. But “back propagation” is the normal way that the earlier layers learn from what is happening in the later layers, right? That’s exactly what back propagation does: the loss is computed at the output layer and then the gradients are calculated at every layer and that enables us to adjust the parameters at all layers using the Chain Rule. The gradients at each layer show how the current layer’s parameters need to change in order to get better results at the next layer.

learner1tk · February 4, 2024, 8:54pm

I’ll attach a link, I’m having trouble figuring it out myself. If anyone has any inputs please help!
[1905.08094] Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation (arxiv.org)

paulinpaloalto · February 5, 2024, 5:42am

Hi, Sara.

Thanks for the link! I took a quick look at the paper, but do not claim to understand enough to implement anything yet. The paper was published in 2019 and they claim in the text that they’ll “soon” release their code on github. Did you try searching on github to see if they actually did publish any implementation code?

rmwkwok · February 5, 2024, 6:28am

Hello @learner1tk,

I can give you some idea but not the details. However, you can google with keywords like “tensorflow” “custom loss” “multiple outputs” for more discussions.

You need to define a custom loss function/class. A loss function accepts y_true and y_pred as input arguments.

y_true is provided by you in the training data, while y_pred is the output of the network.

You need to think about how to construct the network’s output that will send everything you need into the custom loss function, including, e.g., x1 - x3.

Then you implement the loss function that takes all outputs into account.
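As a rough sketch of that idea (not the paper's implementation), the model below exposes an aligned difference between x1 and x3 as a second output and penalizes it with a custom loss. The input size, the 10-class head, the pooling plus 1x1 convolution used to align x1 with x3, the loss weight, and the names `feature_diff` and `feature_l2` are all assumptions made for illustration.

```python
import numpy as np
import tensorflow as tf
from tensorflow.keras import layers, Model

# Assumed input size; the thread does not specify one.
inputs = layers.Input(shape=(32, 32, 3))
x1 = layers.Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
x1 = layers.MaxPooling2D((2, 2), padding='same')(x1)   # (16, 16, 32)
x2 = layers.Conv2D(64, (3, 3), activation='relu', padding='same')(x1)
x2 = layers.MaxPooling2D((2, 2), padding='same')(x2)   # (8, 8, 64)
x3 = layers.Conv2D(128, (3, 3), activation='relu', padding='same')(x2)
x3 = layers.MaxPooling2D((2, 2), padding='same')(x3)   # (4, 4, 128)

# Hypothetical alignment branch: shrink x1 spatially by a factor of 4 and
# project it to 128 channels so that it has the same shape as x3.
x1_aligned = layers.AveragePooling2D((4, 4))(x1)        # (4, 4, 32)
x1_aligned = layers.Conv2D(128, (1, 1))(x1_aligned)     # (4, 4, 128)

# Second model output: the element-wise difference the custom loss penalizes.
feature_diff = layers.Subtract(name='feature_diff')([x1_aligned, x3])

# Assumed 10-class classification head on top of x3.
pooled = layers.GlobalAveragePooling2D()(x3)
logits = layers.Dense(10, name='logits')(pooled)

model = Model(inputs, [logits, feature_diff])


def feature_l2(y_true, y_pred):
    """Per-sample mean squared difference; y_true is a dummy and is ignored."""
    return tf.reduce_mean(tf.square(y_pred), axis=[1, 2, 3])


model.compile(
    optimizer='adam',
    loss=[tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
          feature_l2],
    loss_weights=[1.0, 0.1],  # assumed weighting between the two terms
)

# Tiny random batch, only to show the expected inputs and targets.
x = np.random.rand(8, 32, 32, 3).astype('float32')
y_cls = np.random.randint(0, 10, size=(8,))
y_dummy = np.zeros((8, 4, 4, 128), dtype='float32')  # never read by feature_l2
model.fit(x, [y_cls, y_dummy], epochs=1, verbose=0)
```

Keras still requires a target for every output, which is why a zero-filled dummy is passed for `feature_diff`. As written, gradients from the feature term flow into x3 as well as x1; whether the deeper features should be held fixed as a "teacher" is a separate design decision that this sketch leaves open.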

Cheers,
Raymond
