Custom loss function

Hello learners. I want to define a custom loss function rather than use one of the predefined losses that Keras provides, and I want the loss function to use my training examples X_train in addition to the default inputs y_true and y_pred. To do so, I wrote a loss function that itself has an inner loss function. However, the model does not work; I receive an error in the first iteration of the fitting stage. I suspect this is because of the shape or type of X_train (it is a NumPy matrix), since the code works when I delete the terms associated with X_train. I tried changing the shape and format of X_train to a tensor, because I use TensorFlow as the backend, but it still does not work. I was wondering if you have had a similar experience or have any suggestions.

Here is my code:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense


def custom_loss(X_loss):
    def loss(y_true, y_pred):
        # residual terms that combine the predicted outputs with the input features
        q1 = y_pred[:, 1] + y_pred[:, 4] - X_loss[:, 0]
        q2 = y_pred[:, 2] + y_pred[:, 3] + y_pred[:, 7] + y_pred[:, 17] + y_pred[:, 22] - X_loss[:, 1]
        q3 = y_pred[:, 4] + y_pred[:, 5] + y_pred[:, 7] + y_pred[:, 8] + y_pred[:, 9] + y_pred[:, 19] - X_loss[:, 2]
        q4 = y_pred[:, 3] + y_pred[:, 10] + y_pred[:, 11] + y_pred[:, 13]
        q5 = y_pred[:, 14]
        q6 = y_pred[:, 5] + y_pred[:, 7] + y_pred[:, 10] + y_pred[:, 15] + y_pred[:, 17] + y_pred[:, 20] + y_pred[:, 22]
        q7 = y_pred[:, 11] + y_pred[:, 16] + y_pred[:, 19] + y_pred[:, 21]
        q8 = y_pred[:, 5] + y_pred[:, 7] + y_pred[:, 8] + y_pred[:, 20] + y_pred[:, 21] + y_pred[:, 22] + y_pred[:, 23]

        # mean squared error plus heavily weighted penalty terms
        loss_t = tf.reduce_mean(tf.square(y_true - y_pred)) + 1000 * tf.reduce_mean(tf.square(q1)) + \
            1000 * tf.reduce_mean(tf.square(q2)) + 1000 * tf.reduce_mean(tf.square(q3)) + \
            1000 * tf.reduce_mean(tf.square(q4)) + 1000 * tf.reduce_mean(tf.square(q5)) + \
            1000 * tf.reduce_mean(tf.square(q6)) + 1000 * tf.reduce_mean(tf.square(q7)) + \
            1000 * tf.reduce_mean(tf.square(q8))
        return loss_t
    return loss

model = Sequential([
    tf.keras.Input(shape=(n2,)),
    Dense(16, activation='relu'),
    Dense(64, activation='relu'),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(25, activation='relu')
], name="Model")


# Compile the model with the custom loss function
model.compile(loss=custom_loss(X_train), optimizer=tf.keras.optimizers.Adam(0.001))

# Train the model
model.fit(X_train, Y_train, epochs=5000)

Hello @Nasim_Deljouyi

Is your X_loss a constant? Or does it vary from mini-batch to mini-batch?

Raymond

It is actually constant. Later, in the compile section, I use X_train as the input argument of the custom loss. It is a NumPy array with constant values (defined before the neural-network modeling), and its dimensions are 7000×8, where 8 is the number of input features and 7000 is the number of training samples.

Please share a copy of the full error traceback here. A screenshot is better, but a text copy should also do.



X_loss[:, 0] has a shape of (7000,), right? And y_pred[:, 4] has a shape of (batch_size,), right?

The batch size here is the number of all samples, which is actually 6999. Yes, X_loss[:, 0] (i.e., X_train[:, 0]) has a shape of (6999,) (I rounded it to 7000 earlier), and I don’t know how to determine the shape of y_pred.

The first number in the shape of y_pred is the batch size, which defaults to 32 (see this doc).

So you are subtracting with incompatible shapes.
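
For example (a minimal sketch with placeholder tensors; the shapes are taken from your numbers):

import tensorflow as tf

y_pred_col = tf.zeros([32])    # one column of y_pred for a default mini-batch of 32
x_loss_col = tf.zeros([6999])  # one column of X_loss, covering all 6999 training samples

y_pred_col - x_loss_col        # raises InvalidArgumentError: incompatible shapes [32] vs. [6999]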


Oh, thanks for your help. So, do you have any suggestions on how to modify that? Do I need to set batch_size=6999 in model.fit?

You can do it that way, but do you want to learn mini-batch-wise?

Yes, I really want to learn about batch size; I don’t have much information about it. I also need to go into more detail about TensorFlow; I always have problems when it comes to changing the shape and type of NumPy arrays to tensors. I would be happy if you could recommend some easy-to-understand references.

By the way, thank you so much for your help.

A few things:

  1. If you want to do it mini-batch-wise, then X_loss is no longer a constant, because it should change from batch to batch. Therefore, we also can’t pass X_loss into the loss function in the way you are doing it now. Here is what you can change:

    a. Learn to use the TensorFlow Functional API to build a model instead of using Sequential. See this for an introduction, or search for more if needed. For example, we can build a model without Sequential as shown in the screenshot below. See how we use x from place to place to make all the “connections”.
    (screenshot of a Functional API example omitted)

    b. Take X_loss as a second Input. (Yes, you can have more than one Input to a model).

    c. Build a Lambda layer which takes X_loss and the output from your last Dense layer as inputs. A Lambda layer lets you define a custom function to process the inputs. You can do these things in the Lambda layer to produce a lambda_layer_output (see the end of this reply).

    d. Concatenate the lambda_layer_output with the output from your last Dense layer to form a new, final output.

    e. The loss function will still take y_true and y_pred as inputs, but your y_pred now contains two things: (1) lambda_layer_output and (2) the output from your last Dense layer. Unpack them, and do the last step of computing the loss (see the end of this reply).

  2. Regarding shapes, here is how I generally handle it:

    a. implement a custom loss function

    b. add tf.print to print each intermediate variable’s shape, e.g.
    tf.print(tf.shape(variable))

    c. find some small samples of y_true and y_pred, pass them through the loss function, and check if it produces the expected returns.

    d. Don’t use it for model training until you have done step c to verify that it works as expected (a small sketch of steps b and c is shown right after this list).
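
For example, here is a minimal sketch of steps b and c with small made-up tensors. The shapes, values, and the simplified loss (only the q1 term is kept, for brevity) are only for illustration:

import tensorflow as tf

def custom_loss(X_loss):
    def loss(y_true, y_pred):
        q1 = y_pred[:, 1] + y_pred[:, 4] - X_loss[:, 0]
        tf.print(tf.shape(q1))  # step b: print the intermediate variable's shape
        return tf.reduce_mean(tf.square(y_true - y_pred)) + 1000 * tf.reduce_mean(tf.square(q1))
    return loss

# step c: small made-up samples -- 4 examples, 8 input features, 25 outputs
X_small = tf.random.uniform([4, 8])
y_true_small = tf.random.uniform([4, 25])
y_pred_small = tf.random.uniform([4, 25])

# call the loss directly (outside model.fit) and check the returned value
print(custom_loss(X_small)(y_true_small, y_pred_small))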

Finally, for the steps I suggest in point number 1, you might not understand them until you try them out, and you might not be able to finish them unless you have studied enough examples from the internet. You might not make it work on your first try. However, these are all learning processes that I recommend you go through.

Cheers,
Raymond

What your lambda function can do:

        q1 = y_pred[:, 1] + y_pred[:, 4] - X_loss[:, 0]
        q2 = y_pred[:, 2] + y_pred[:, 3] + y_pred[:, 7] + y_pred[:, 17] + y_pred[:, 22] - X_loss[:, 1]
        q3 = y_pred[:, 4] + y_pred[:, 5] + y_pred[:, 7] + y_pred[:, 8] + y_pred[:, 9] + y_pred[:, 19] - X_loss[:, 2]
        q4 = y_pred[:, 3] + y_pred[:, 10] + y_pred[:, 11] + y_pred[:, 13]
        q5 = y_pred[:, 14]
        q6 = y_pred[:, 5] + y_pred[:, 7] + y_pred[:, 10] + y_pred[:, 15] + y_pred[:, 17] + y_pred[:, 20] + y_pred[:, 22]
        q7 = y_pred[:, 11] + y_pred[:, 16] + y_pred[:, 19] + y_pred[:, 21]
        q8 = y_pred[:, 5] + y_pred[:, 7] + y_pred[:, 8] + y_pred[:, 20] + y_pred[:, 21] + y_pred[:, 22] + y_pred[:, 23]
    
        lambda_layer_output =  1000 * tf.reduce_mean(tf.square(q1)) + \
        1000 * tf.reduce_mean(tf.square(q2)) + 1000 * tf.reduce_mean(tf.square(q3)) + \
        1000 * tf.reduce_mean(tf.square(q4)) + 1000 * tf.reduce_mean(tf.square(q5)) + \
        1000 * tf.reduce_mean(tf.square(q6)) + 1000 * tf.reduce_mean(tf.square(q7)) + \
        1000 * tf.reduce_mean(tf.square(q8))

What your new custom loss function can do:

# unpack y_pred to extract lambda_layer_output and the output from your last Dense layer
loss_t = tf.reduce_mean(tf.square(y_true - output_from_your_last_dense_layer)) + lambda_layer_output 
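
Putting the pieces together, here is a minimal sketch of steps 1.a to 1.e. It is only one possible wiring, not a finished solution: the layer and function names are illustrative, n2 = 8 is assumed from the thread, only q1 and q2 are written out, and, so that the shapes line up for Concatenate, the Lambda layer keeps a per-example penalty (shape (batch, 1)) which the loss function then averages:

import tensorflow as tf
from tensorflow import keras
from tensorflow.keras import layers

n2 = 8  # assumed number of input features, as described in the thread

def penalty_fn(tensors):
    # tensors is a list: [output of the last Dense layer, the X_loss batch]
    dense_out, x_loss = tensors
    q1 = dense_out[:, 1] + dense_out[:, 4] - x_loss[:, 0]
    q2 = dense_out[:, 2] + dense_out[:, 3] + dense_out[:, 7] + dense_out[:, 17] + dense_out[:, 22] - x_loss[:, 1]
    # ... build q3 to q8 in the same way ...
    penalty = 1000 * (tf.square(q1) + tf.square(q2))  # per-example penalty, shape (batch,)
    return tf.expand_dims(penalty, axis=-1)           # shape (batch, 1)

input_1 = keras.Input(shape=(n2,))   # the usual features
input_2 = keras.Input(shape=(n2,))   # X_loss, supplied batch by batch (step b)
x = layers.Dense(16, activation='relu')(input_1)
x = layers.Dense(64, activation='relu')(x)
x = layers.Dense(128, activation='relu')(x)
x = layers.Dense(64, activation='relu')(x)
dense_out = layers.Dense(25, activation='relu')(x)

lambda_out = layers.Lambda(penalty_fn)([dense_out, input_2])       # step c
final_out = layers.Concatenate(axis=-1)([dense_out, lambda_out])   # step d, shape (batch, 26)

model = keras.Model(inputs=[input_1, input_2], outputs=final_out)

def loss(y_true, y_pred):
    dense_part = y_pred[:, :25]   # step e: unpack the original 25 predictions
    penalty_part = y_pred[:, 25]  # and the per-example penalty from the Lambda layer
    return tf.reduce_mean(tf.square(y_true - dense_part)) + tf.reduce_mean(penalty_part)

model.compile(loss=loss, optimizer=keras.optimizers.Adam(0.001))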

I appreciate your help. Thanks so much.

You are welcome @Nasim_Deljouyi!

Hi @rmwkwok. I have rebuilt my model using functional API as follows:

from tensorflow import keras
from tensorflow.keras import layers

input_1 = keras.Input(shape=(n2,))
input_2 = keras.Input(shape=(n2,))
dense_1 = layers.Dense(16, activation='relu')(input_1)
dense_2 = layers.Dense(64, activation='relu')(dense_1)
dense_3 = layers.Dense(128, activation='relu')(dense_2)
dense_4 = layers.Dense(64, activation='relu')(dense_3)
output = layers.Dense(25, activation='relu')(dense_4)

However, I have some questions. Can you please elaborate on steps 1.c to 1.e a little more? I got confused. Shouldn’t a Lambda layer be in the format of "tf.keras.layers.Lambda(custom function)(inputs)"? The custom function is what you wrote at the end of the reply; does it return lambda_layer_output? Therefore, do I not need to define a custom loss function anymore?
And I don’t need to pass input_2 through dense_1 to dense_4, right?

Hello @Nasim_Deljouyi

If you read carefully, there are two functions, and neither of them alone can totally replace your loss function.

I believe I have written down pretty clearly that there will be a lambda function, and there will be a loss function.

It makes sense to return something, and do you need to return lambda_layer_output?

I don’t understand how you came to that conclusion.

I think you will need to figure it out yourself. Think about why you need input_2, and where you want to use it.

@Nasim_Deljouyi, the approach I shared with you will let you supply a small batch of X_loss each time, since you want to do it mini-batch-wise. However, you will still need to figure out how to connect all the dots. Good luck!
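
For instance, with a two-input model like the one sketched above, Keras slices both arrays into matching mini-batches during fitting (the batch size of 32 here is just an example):

# X_train and X_loss are sliced into matching mini-batches
model.fit([X_train, X_loss], Y_train, epochs=5000, batch_size=32)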

Cheers,
Raymond

Thanks so much for the clarification.

Best,
Nasim

Hi Raymond. I hope you are doing well.
If you recall, you suggested this approach for training models that incorporate the input of training examples into the loss function in a mini-batch-wise manner. I have implemented this approach and another approach to do so, and I get the same results from both when using a fixed seed for the random number generators. However, when I set the batch size equal to the number of training examples, I got the same results from both approaches but different from the approach I previously used and provided here, in which X_loss was assumed to be constant and was only able to work for the batch size equal to the training examples. I was wondering if you or anyone here has dealt with this issue before and can help me with it.