How do we compute loss between the outputs of two layers?

```python
from tensorflow.keras.layers import Input, Conv2D

inputs = Input(shape=(32, 32, 3))  # example input shape
x1 = Conv2D(32, (3, 3), activation='relu', padding='same')(inputs)
x2 = Conv2D(64, (3, 3), activation='relu', padding='same')(x1)
x3 = Conv2D(128, (3, 3), activation='relu', padding='same')(x2)
```

Say I wish to compute an L2 loss between x1 and x3. How would I do that? I want the deeper layers to guide the feature maps of the intermediate layers. Since the shapes of x1 and x3 are different, it is hard to directly compute a loss between them, and simple reshaping does not help. Please help!


“Loss” involves computing the difference between some expected and predicted values.

Do you have any expected values for your intermediate layers?

I don’t think a comparison between x1 and x3 would be considered a “loss”. And if the number of units is different, such a comparison is not possible.


I’m trying to make the feature maps of the shallow layers learn from the feature maps of the deeper layers (self-distillation). This method computes such a loss.
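In case it helps, here is roughly the setup I am aiming for, sketched as a subclassed Keras model with the extra loss wired in via `add_loss`. The 1x1 projection to make the channel counts match, the layer sizes, and the 0.1 loss weight are placeholders I made up, not something from the paper:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

class SelfDistillNet(Model):
    """Toy CNN where a projection of the shallow feature map (x1)
    is pulled towards the deep feature map (x3) by an extra L2 loss."""

    def __init__(self, num_classes=10, hint_weight=0.1):
        super().__init__()
        self.c1 = layers.Conv2D(32, 3, activation='relu', padding='same')
        self.c2 = layers.Conv2D(64, 3, activation='relu', padding='same')
        self.c3 = layers.Conv2D(128, 3, activation='relu', padding='same')
        # 1x1 conv so x1's 32 channels match x3's 128 channels
        self.proj = layers.Conv2D(128, 1, padding='same')
        self.pool = layers.GlobalAveragePooling2D()
        self.head = layers.Dense(num_classes, activation='softmax')
        self.hint_weight = hint_weight

    def call(self, x):
        x1 = self.c1(x)
        x3 = self.c3(self.c2(x1))
        # treat the deep features as a fixed teacher for the shallow branch
        hint = tf.reduce_mean(tf.square(self.proj(x1) - tf.stop_gradient(x3)))
        self.add_loss(self.hint_weight * hint)
        return self.head(self.pool(x3))
```

With this, `model.fit` would optimize the classification loss plus the weighted hint loss, and `tf.stop_gradient` keeps the deep layers from being dragged towards the shallow ones. Is this the right direction?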

Sorry, I have never heard of “self-distillation” and will google it in a second, so maybe I’m not the right person to answer this question. But “back propagation” is the normal way that the earlier layers learn from what is happening in the later layers, right? That’s exactly what back propagation does: the loss is computed at the output layer and then the gradients are calculated at every layer, and that enables us to adjust the parameters at all layers using the Chain Rule. The gradients at each layer show how the current layer’s parameters need to change in order to get better results at the next layer.

[1905.08094] Be Your Own Teacher: Improve the Performance of Convolutional Neural Networks via Self Distillation (arxiv.org)

Hi, Sara.

Thanks for the link! I took a quick look at the paper, but do not claim to understand enough to implement anything yet. The paper was published in 2019, and they claim in the text that they’ll “soon” release their code on GitHub. Did you try searching on GitHub to see if they actually did publish any implementation code?

Hello @learner1tk,

I can give you some idea, but not the details. You can google keywords like “tensorflow” “custom loss” “multiple outputs” for more discussions.

You need to define a custom loss function/class. A loss function accepts `y_true` and `y_pred` as input arguments.

`y_true` is provided by you in the training data, while `y_pred` is the output of the network.
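For example, a minimal custom loss function looks like this (a generic L2 / mean-squared loss, just to show the shape of the interface):

```python
import tensorflow as tf

def l2_loss(y_true, y_pred):
    # plain L2 (mean squared) difference between targets and predictions
    return tf.reduce_mean(tf.square(y_true - y_pred))
```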

You need to think about how to construct the network’s output that will send everything you need into the custom loss function, including, e.g., `x1 - x3`.

Then you implement the loss function so that it takes all of those outputs into account.
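For example, a rough sketch of that idea. Here the toy network, the 1x1 projection used to make the two feature maps comparable, and the 0.1 loss weight are all placeholders I made up, not something from your code:

```python
import tensorflow as tf
from tensorflow.keras import layers, Model

inputs = layers.Input(shape=(16, 16, 3))
x1 = layers.Conv2D(32, 3, activation='relu', padding='same')(inputs)
x2 = layers.Conv2D(64, 3, activation='relu', padding='same')(x1)
x3 = layers.Conv2D(128, 3, activation='relu', padding='same')(x2)

# 1x1 conv so x1's channels match x3's, then expose the difference
# as a second model output that the custom loss can see
x1_proj = layers.Conv2D(128, 1, padding='same')(x1)
diff = layers.Subtract()([x1_proj, x3])

preds = layers.Dense(10, activation='softmax')(
    layers.GlobalAveragePooling2D()(x3))
model = Model(inputs, [preds, diff])

def hint_loss(y_true, y_pred):
    # y_true is a dummy target; the loss only penalizes the difference
    return tf.reduce_mean(tf.square(y_pred))

model.compile(optimizer='adam',
              loss=['sparse_categorical_crossentropy', hint_loss],
              loss_weights=[1.0, 0.1])
```

At training time you then pass a dummy (e.g. all-zeros) target for the `diff` output, since only its mean squared value matters.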

Cheers,
Raymond