# C4 Week 3 loss function for object localisation

In the Week 3 video ‘Object Localisation’, Andrew mentioned a loss function that contains ‘if’ statements (around the 10:30 mark). I’m wondering whether there are existing implementations of loss functions like this, i.e., ones that combine object detection and localisation.

I tried to implement this myself and encountered shape issues. Below is a minimal working example:

``````
import numpy as np
import tensorflow as tf
from tensorflow.keras.layers import Input, Dense

# dummy variables
X_dummy = np.array([1.0, 0.0, 1.0])
# here the first element = 1 if an object is detected, = 0 if not
Y_dummy = np.array([[1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])

def loss_loc(y_true, y_pred):
    # if the object is detected, use MSE for all pred
    loss_w_obj = tf.keras.losses.MeanSquaredError()(y_true, y_pred)
    # if the object is not detected, just find the squared difference for the first element
    loss_wo_obj = (y_true[0] - y_pred[0]) ** 2

    # case 1: object is not detected, y_true[0] = 0
    loss1 = tf.keras.backend.switch(
        tf.keras.backend.equal(y_true[0], 0.0), loss_wo_obj, 1e9)

    # case 2: object is detected, y_true[0] = 1
    loss2 = tf.keras.backend.switch(
        tf.keras.backend.equal(y_true[0], 1.0), loss_w_obj, 1e9)

    return tf.math.minimum(loss1, loss2)

# compile and fit the model
input_dummy = Input(shape=[1,], name='input')
X_dummy_2 = Dense(units=2)(input_dummy)
model = tf.keras.models.Model(input_dummy, X_dummy_2)
# compile call reconstructed: the optimizer is an assumption, the custom loss is what matters here
model.compile(optimizer='adam', loss=loss_loc)
model.fit(X_dummy, Y_dummy, epochs=1, verbose=1)
``````

Model summary:

``````
_________________________________________________________________
Layer (type)                Output Shape              Param #
=================================================================
input (InputLayer)          [(None, 1)]               0

dense_6 (Dense)             (None, 2)                 4

=================================================================
Total params: 4
Trainable params: 4
Non-trainable params: 0
_________________________________________________________________
``````

The loss function works if I just pass arrays into it (e.g., loss_loc(Y_dummy[0],Y_dummy[0])) but once I start training the model, I get the following error:

``````
ValueError: in user code:

    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/keras/engine/training.py", line 1160, in train_function  *
        return step_function(self, iterator)
    File "<ipython-input-2-da8c334ac548>", line 18, in loss_loc  *
        loss2=tf.keras.backend.switch(
    File "/Library/Frameworks/Python.framework/Versions/3.10/lib/python3.10/site-packages/keras/backend.py", line 5223, in switch
        raise ValueError(

    ValueError: Rank of `condition` should be less than or equal to rank of `then_expression` and `else_expression`. ndim(condition)=1, ndim(then_expression)=0
``````

It seems like a shape issue, but I couldn’t figure out what TensorFlow is doing during batch processing. Any help is appreciated. Thanks!
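
One way to read the error: during training Keras passes a whole batch, so `y_true` has shape `(batch, 2)` and `y_true[0]` is the first sample (rank 1) rather than the presence flag, while the `MeanSquaredError` object returns a single scalar (rank 0). Below is a minimal sketch of a batch-aware version, assuming the same two-column target layout as in the example (column 0 is the presence flag). The name `loss_loc_batched` and the sample values are hypothetical, and this is one possible approach rather than a reference implementation. It replaces the two `switch` calls with a per-sample `tf.where`:

```python
import tensorflow as tf

def loss_loc_batched(y_true, y_pred):
    """Per-sample loss for targets laid out as [presence, coordinate]."""
    y_true = tf.cast(y_true, y_pred.dtype)
    # Column 0 holds the object-presence flag for every sample: shape (batch,)
    presence = y_true[:, 0]
    sq_err = tf.square(y_true - y_pred)            # shape (batch, 2)
    # Object present: mean squared error over both columns
    loss_w_obj = tf.reduce_mean(sq_err, axis=-1)   # shape (batch,)
    # No object: only the presence column contributes
    loss_wo_obj = sq_err[:, 0]                     # shape (batch,)
    # Element-wise "if": choose the matching loss for each sample
    return tf.where(tf.equal(presence, 1.0), loss_w_obj, loss_wo_obj)

# Hypothetical values, only to show the per-sample behaviour
y_true = tf.constant([[1.0, 0.0], [0.0, 0.0], [1.0, 0.0]])
y_pred = tf.constant([[0.5, 1.0], [0.5, 3.0], [1.0, 0.0]])
per_sample = loss_loc_batched(y_true, y_pred)
print(per_sample.numpy())
```

In the second sample the large error in the second column is ignored because the presence flag is 0. The function can be passed as `loss=loss_loc_batched` to `model.compile`; Keras then averages the per-sample values over the batch.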

In principle, those two tasks need different loss functions because the magnitudes and shapes of the quantities involved are different. It might be possible to combine both tasks in one loss function, but a model trained with it is likely to perform poorly, because a single loss has to accommodate two very different tasks.

An example of an algorithm that used a unified loss function in object detection is YOLO v1. See the expression below. It has separate components for localization, classification and object presence. The designers went to some effort to scale and weight each component in order that they could be combined. Still, they resorted to pretraining on just the classification task first, then extending that model using transfer learning to include the localization component.
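
For reference, the multi-part loss from the YOLO v1 paper has the following form, as I recall it (worth checking against the paper itself). Here $S \times S$ is the grid, $B$ is the number of boxes predicted per cell, hats mark predictions, $\mathbb{1}_{ij}^{\text{obj}}$ selects the box predictor $j$ in cell $i$ that is responsible for an object, $\mathbb{1}_{i}^{\text{obj}}$ indicates that an object appears in cell $i$, and $\lambda_{\text{coord}}$ and $\lambda_{\text{noobj}}$ are the weighting factors (5 and 0.5 in the paper):

$$
\begin{aligned}
\mathcal{L} ={} & \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ (x_i - \hat{x}_i)^2 + (y_i - \hat{y}_i)^2 \right] \\
& + \lambda_{\text{coord}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left[ \left(\sqrt{w_i} - \sqrt{\hat{w}_i}\right)^2 + \left(\sqrt{h_i} - \sqrt{\hat{h}_i}\right)^2 \right] \\
& + \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{obj}} \left( C_i - \hat{C}_i \right)^2 \\
& + \lambda_{\text{noobj}} \sum_{i=0}^{S^2} \sum_{j=0}^{B} \mathbb{1}_{ij}^{\text{noobj}} \left( C_i - \hat{C}_i \right)^2 \\
& + \sum_{i=0}^{S^2} \mathbb{1}_{i}^{\text{obj}} \sum_{c \in \text{classes}} \left( p_i(c) - \hat{p}_i(c) \right)^2
\end{aligned}
$$

The indicator terms play the role of the ‘if’ statements: the coordinate and class terms only contribute where an object is present.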

x_i and y_i are the box center coordinates, w_i and h_i are the box width and height, C_i is the confidence score (object presence), and p_i(c) is the conditional probability of class c given that an object is present.

Thank you for the reply. I was hoping to use something like the approach in the Cross Validated question “What is a good loss function for object localisation and classification using a cnn?”, balancing out the detection/localisation components.
Now that I think about it, it’s probably easier to train separate NNs for detection and localisation.