Loss values do not change

Hi Learners. I have developed a simple deep-learning model for a simple training example generated from y = 2*x - 1. However, most of the time when I run the model, the loss does not change and stays stuck at a specific value. This happens in 6 or 7 out of 10 runs, even when I change the number of layers and neurons, the number of epochs, and the learning rate. Does anyone know the reason behind it?

My entire code is as follows:

import numpy as np
x = np.array([[-1.0, 0.0 , 1.0, 2.0, 3.0, 4.0, 5.0]])
y = np.array([[-3.0, -1.0, 1.0, 3.0, 5.0, 7.0, 9.0]])

x = np.reshape(x,(7,1))
y = np.reshape(y,(7,1))

from tensorflow import keras
from keras import Model
from keras.models import Sequential
from keras.layers import Input, Dense

input_1 = Input(shape=(1,))
dense_1 = Dense(4, activation='relu')(input_1)
dense_2 = Dense(2, activation='relu')(dense_1)
dense_3 = Dense(1, activation='relu')(dense_2)
model_2 = Model(inputs= input_1, outputs = dense_3)

model_2.compile(loss = keras.losses.MeanSquaredError(),
optimizer = keras.optimizers.Adam(0.001))

model_2.fit(x,y, epochs=500)
print(model_2.predict(np.array([[10.0]])))

One issue is that an NN with three hidden layers may be totally puzzled by how simple the problem you’ve asked it to solve is.

The cost functions for NNs have local minima, and you may not always find the lowest-cost solution.
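To make the "stuck loss" symptom concrete: with a ReLU on the output layer, if the output unit's pre-activation happens to initialize negative over the whole input range, the model predicts a constant 0, the gradient through the ReLU is 0 everywhere, and the loss can never move. A small numpy sketch, with hypothetical weight values chosen to trigger the dead unit:

```python
import numpy as np

def relu(z):
    return np.maximum(0.0, z)

x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = 2.0 * x - 1.0

# Hypothetical unlucky initialization: every pre-activation is negative.
w, b = -0.5, -1.0
z = w * x + b
pred = relu(z)

# ReLU's gradient is 0 wherever the pre-activation is negative, so no
# sample contributes any gradient signal and the weights never update.
grad_mask = (z > 0).astype(float)
print(pred)       # all zeros -> predictions frozen
print(grad_mask)  # all zeros -> loss stuck at its initial value
```

That matches the "6 or 7 times out of 10" behavior: whether the loss moves at all depends on the random initialization of the output unit.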

All you need for this example is linear regression, no hidden layers, and no ReLU.
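For what it's worth, plain least squares recovers the line exactly. A minimal numpy sketch of the no-hidden-layer solution:

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0, 9.0])

# Least-squares fit of y = w*x + b via the normal equations.
A = np.column_stack([x, np.ones_like(x)])
(w, b), *_ = np.linalg.lstsq(A, y, rcond=None)
print(w, b)  # recovers w = 2.0, b = -1.0 (up to floating-point error)
```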

Perhaps try a more complicated data set (maybe a nice parabola), and only use one hidden layer. Give that a try. A parabola is non-linear, and that’s what hidden layers are good at solving.
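A rough numpy check of why a parabola is the better test case: the best-fitting straight line through y = x² still leaves a large error, so a non-linearity (a hidden layer) is genuinely needed there.

```python
import numpy as np

# A non-linear data set: y = x^2 on a small grid.
x = np.linspace(-3.0, 3.0, 25)
y = x ** 2

# Best least-squares line through the parabola.
A = np.column_stack([x, np.ones_like(x)])
coef, *_ = np.linalg.lstsq(A, y, rcond=None)
line_mse = np.mean((A @ coef - y) ** 2)
print(line_mse)  # far from zero -> no line fits this data
```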


For straightforward relationships, even the parabola, it’s unlikely you’ll need anything close to 500 epochs, either. Or even 50. My recommendation would be to dial that way back. Early experiments could use 5 or 10. They will complete quickly, then, if the model seems to be learning, you can add more and see what extra benefit they provide. You can also read about implementing your own callbacks that can stop training when accuracy reaches a certain threshold.
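As an illustration of the stop-early idea (without Keras), here is a plain-numpy gradient-descent loop on the same data that quits as soon as the loss crosses a threshold; the learning rate and threshold are arbitrary choices for the sketch, and a Keras callback would do the equivalent by setting `self.model.stop_training = True`.

```python
import numpy as np

x = np.array([-1.0, 0.0, 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0, 9.0])

# Full-batch gradient descent on y = w*x + b with an early-stop check.
w, b, lr = 0.0, 0.0, 0.05
for epoch in range(500):
    pred = w * x + b
    loss = np.mean((pred - y) ** 2)
    if loss < 1e-4:  # early-stop threshold
        break
    grad_w = 2.0 * np.mean((pred - y) * x)
    grad_b = 2.0 * np.mean(pred - y)
    w -= lr * grad_w
    b -= lr * grad_b
print(epoch, w, b)  # stops long before the 500-epoch cap
```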

Yes, you are right; I didn't notice that I was using 'relu' instead of 'linear', and fixing that solved the problem. I also removed the hidden layers. However, I still get different results each time I run. I know this is due to the stochastic nature of the algorithm, but out of 10 runs, only 2 ended up with the result I expected within 1500 epochs; the other 8 finished with a high loss value, like over 15.

This is an easy problem and I know what output to expect, but for more complex problems I would not be able to tell the outputs apart, even after running many, many times. I would greatly appreciate any advice on this.

Also the following is the updated code:

import numpy as np
x = np.array([[-1.0, 0.0 , 1.0, 2.0, 3.0, 4.0, 5.0]])
y = np.array([[-3.0, -1.0, 1.0, 3.0, 5.0, 7.0, 9.0]])

x = np.reshape(x,(7,1))
y = np.reshape(y,(7,1))

from tensorflow import keras
from keras import Model
from keras.models import Sequential
from keras.layers import Input, Dense

input_1 = Input(shape=(1,))

dense_1 = Dense(1, activation='linear')(input_1)

dense_2 = Dense(2, activation='relu')(dense_1)

dense_3 = Dense(1, activation='linear')(input_1)
model_2 = Model(inputs= input_1, outputs = dense_3)

model_2.compile(loss = keras.losses.MeanSquaredError(),
optimizer = keras.optimizers.Adam(0.001))

model_2.fit(x,y, epochs=1500)
print(model_2.predict(np.array([[10.0]])))

Thanks for your response. Actually, the problem I had was that the loss value was not changing at all, from the first epoch to the last. I changed the activation and removed the hidden layers, but the model also needed more epochs to learn. I increased the number of epochs to 1500, and still, most of the time training finishes with a high loss value. I don't know why; when I used to code machine-learning algorithms entirely by myself using only numpy, I did not face such issues.

I'm looking at your code right now. I'll have more in a few minutes.


So a couple things I notice, without trying to get it to run yet. First, you import Sequential, but don’t use it to define the model. Second, you have input_1 as the input to dense_3, so the other Dense layers aren’t connected to the output of the model.

Yes, I commented out the dense layers. And does it matter if I import Sequential but don't use it? I actually wanted to use it to rewrite the model with the Sequential API.

Your data set is still a simple linear function.

Thanks a lot. I changed the optimizer to ‘sgd’ and it worked.

from tensorflow import keras
from keras import Model
from keras.models import Sequential
from keras.layers import Input, Dense

import numpy as np


x = np.array([-1.0, 0.0 , 1.0, 2.0, 3.0, 4.0, 5.0])
y = np.array([-3.0, -1.0, 1.0, 3.0, 5.0, 7.0, 9.0])

model = Sequential(
    [ 
        Dense(units = 1, input_shape=[1]),
    ])

model.compile(loss = 'mean_squared_error',
              optimizer = 'sgd',
              metrics=["mse"])

model.fit(x,y, epochs=15)
print(model.predict(np.array([[7.0]])))

That should be enough to get a proof of concept running, though as @TMosh points out, it's not a good use of the horsepower of a neural net.

Really appreciate your help @ai_curious and @TMosh.

Best,
Nasim