Incorporating equations governing input-output pairs in neural networks

Hi all. I am working on a neural network problem. The number of input features is 8 and the number of outputs is 25; the large number of outputs is what makes the problem complicated. I have 6999 training examples, so the dimensions of X_train and Y_train are 6999 × 8 and 6999 × 25. I built the following neural network:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

Model

n2 = 8  # number of input features

model_1 = Sequential([
    tf.keras.Input(shape=(n2,)),
    Dense(16, activation='relu'),
    Dense(64, activation='relu'),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(25, activation='relu')
], name="Model_1")

Compile (Loss)

model_1.compile(loss=tf.keras.losses.MeanSquaredError(),
                optimizer=tf.keras.optimizers.Adam(0.001))

Fit

model_1.fit(X_train, Y_train, epochs=5000)

However, I know that certain equations hold between my inputs and outputs. Written in terms of the training data, they are:

Y_train[:, 1] + Y_train[:, 4] = X_train[:, 0]
Y_train[:, 2] + Y_train[:, 3] + Y_train[:, 7] + Y_train[:, 17] + Y_train[:, 22] = X_train[:, 1]
Y_train[:, 4] + Y_train[:, 5] + Y_train[:, 7] + Y_train[:, 8] + Y_train[:, 9] + Y_train[:, 19] = X_train[:, 2]
Y_train[:, 3] + Y_train[:, 10] + Y_train[:, 11] + Y_train[:, 13] = X_train[:, 3]
Y_train[:, 14] = X_train[:, 4]
Y_train[:, 5] + Y_train[:, 7] + Y_train[:, 10] + Y_train[:, 15] + Y_train[:, 17] + Y_train[:, 20] + Y_train[:, 22] = X_train[:, 5]
Y_train[:, 11] + Y_train[:, 16] + Y_train[:, 19] + Y_train[:, 21] = X_train[:, 6]
Y_train[:, 5] + Y_train[:, 7] + Y_train[:, 8] + Y_train[:, 20] + Y_train[:, 21] + Y_train[:, 22] + Y_train[:, 23] = X_train[:, 7]
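
(For context, these are all linear constraints, so they can be written compactly as Y Aᵀ = X for a fixed 0/1 matrix. Below is only a sketch of how I sanity-check that the training data actually satisfies them; the matrix name A and the check itself are my own additions, not part of the model.)

import numpy as np

# Encode the 8 constraints as a 0/1 matrix A of shape (8, 25), so that the
# equations above read Y_train @ A.T == X_train, row by row.
A = np.zeros((8, 25))
A[0, [1, 4]] = 1
A[1, [2, 3, 7, 17, 22]] = 1
A[2, [4, 5, 7, 8, 9, 19]] = 1
A[3, [3, 10, 11, 13]] = 1
A[4, [14]] = 1
A[5, [5, 7, 10, 15, 17, 20, 22]] = 1
A[6, [11, 16, 19, 21]] = 1
A[7, [5, 7, 8, 20, 21, 22, 23]] = 1

# Sanity check on the training data: the residuals should be numerically zero.
residual = Y_train @ A.T - X_train   # shape (6999, 8)
print(np.abs(residual).max())        # expect a value close to 0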

I tried to use the idea of a physics-informed neural network and include these equations in my loss function, so I wrote the following code:

import tensorflow as tf
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense

Loss function

def custom_loss(X_loss):
    # X_loss is the full X_train here, so the row indices only line up with
    # y_pred when a single batch contains all training examples in order.
    def loss(y_true, y_pred):
        q1 = y_pred[:, 1] + y_pred[:, 4] - X_loss[:, 0]
        q2 = y_pred[:, 2] + y_pred[:, 3] + y_pred[:, 7] + y_pred[:, 17] + y_pred[:, 22] - X_loss[:, 1]
        q3 = y_pred[:, 4] + y_pred[:, 5] + y_pred[:, 7] + y_pred[:, 8] + y_pred[:, 9] + y_pred[:, 19] - X_loss[:, 2]
        q4 = y_pred[:, 3] + y_pred[:, 10] + y_pred[:, 11] + y_pred[:, 13] - X_loss[:, 3]
        q5 = y_pred[:, 14] - X_loss[:, 4]
        q6 = y_pred[:, 5] + y_pred[:, 7] + y_pred[:, 10] + y_pred[:, 15] + y_pred[:, 17] + y_pred[:, 20] + y_pred[:, 22] - X_loss[:, 5]
        q7 = y_pred[:, 11] + y_pred[:, 16] + y_pred[:, 19] + y_pred[:, 21] - X_loss[:, 6]
        q8 = y_pred[:, 5] + y_pred[:, 7] + y_pred[:, 8] + y_pred[:, 20] + y_pred[:, 21] + y_pred[:, 22] + y_pred[:, 23] - X_loss[:, 7]

        # Data-fit term (MSE) plus one squared-residual penalty per constraint
        loss_t = tf.reduce_mean(tf.square(y_true - y_pred)) + \
                 tf.reduce_mean(tf.square(q1)) + \
                 tf.reduce_mean(tf.square(q2)) + tf.reduce_mean(tf.square(q3)) + \
                 tf.reduce_mean(tf.square(q4)) + tf.reduce_mean(tf.square(q5)) + \
                 tf.reduce_mean(tf.square(q6)) + tf.reduce_mean(tf.square(q7)) + \
                 tf.reduce_mean(tf.square(q8))
        return loss_t
    return loss

Model

model = Sequential([
    tf.keras.Input(shape=(n2,)),
    Dense(16, activation='relu'),
    Dense(64, activation='relu'),
    Dense(128, activation='relu'),
    Dense(64, activation='relu'),
    Dense(25, activation='relu')
], name="Model")

Compile the model with the custom loss function

model.compile(loss=custom_loss(X_train), optimizer=tf.keras.optimizers.Adam(0.001))

Train the model

model.fit(X_train, Y_train, batch_size=6999, epochs=5000)

Of course, I still need to rework the code so that it can handle batch sizes other than the number of training examples, but even with batch_size equal to the full training set, my first model (before adding the equations to the loss function) performs much better than the second one (after adding them). In fact, the results from the second model are quite bad. I expected the model's performance to improve by forcing the outputs to satisfy those equations. For example, regarding equation number 5, I always want the 15th element of the output to equal the 5th element of the input. Do you think the problem is with my implementation (I am working on rewriting it another way), or does the idea simply not work here because the system of equations does not have a unique solution (there are more unknowns than equations)? Is there any other way I can force the output of the model to satisfy those equations?
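
(For reference, one batch-safe variant I am experimenting with is to append the 8 input columns to the targets, so the loss always sees the X rows that match the current batch. This is only a sketch: the Y_train_aug name, the equal weighting of the penalty term, and the batch size of 64 are my own assumptions.)

import numpy as np
import tensorflow as tf

# Stack the inputs onto the targets so every batch carries its own X rows
# inside y_true. Y_train_aug has shape (6999, 25 + 8).
Y_train_aug = np.concatenate([Y_train, X_train], axis=1).astype("float32")

# A is the same (8, 25) 0/1 constraint matrix used in the check above.
A_tf = tf.constant(A, dtype=tf.float32)

def constrained_loss(y_true, y_pred):
    y_obs = y_true[:, :25]   # the real targets for this batch
    x_obs = y_true[:, 25:]   # the matching input rows for this batch
    data_term = tf.reduce_mean(tf.square(y_obs - y_pred))
    # Constraint residuals for this batch: y_pred @ A^T should equal x_obs
    residual = tf.matmul(y_pred, A_tf, transpose_b=True) - x_obs
    penalty = tf.reduce_mean(tf.square(residual))
    return data_term + penalty

model.compile(loss=constrained_loss, optimizer=tf.keras.optimizers.Adam(0.001))
model.fit(X_train, Y_train_aug, batch_size=64, epochs=5000)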

Hi Nasim_Deljouyi,

This kind of issue is sometimes resolved by adding a memory component to a neural network. You can find an example of this here. You may also search for "adding a memory component to a neural network" and see if you can find some inspiration. I know they have been working on this at the Allen Institute in Seattle. Good luck!

Thank you so much @reinoudbosch

Best,
Nasim