Hi, I’ve just been exploring a bit, and tried a simple model with no inputs and no weights, expecting it to adjust the bias to the mean of the y values.
model = tf.keras.models.Sequential([
    tf.keras.layers.Dense(1, input_shape=[0]),
])
This doesn’t seem to work, however: it behaves as if the gradients and loss are calculated for the first batch but then never updated (i.e. the reported loss does not change from epoch to epoch, and the bias increases linearly with epochs, initially moving in the right direction but then overshooting).
Does anyone happen to know why it does this?
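For context, here’s the behaviour I was expecting, sketched in plain NumPy rather than Keras (just an illustrative sketch): with MSE loss and a prediction that is only a bias term, gradient descent should drive the bias towards the mean of y.
import numpy as np

y = np.array([1.0, 2.0, 3.0, 4.0])
b = 0.0    # the "bias", starting from zero
lr = 0.1   # learning rate

for _ in range(200):
    grad = 2.0 * np.mean(b - y)   # d/db of mean((b - y)**2)
    b -= lr * grad

print(b, y.mean())   # b ends up very close to the mean of y (2.5)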
Hi, @David_Harris2!
What you’re trying to do seems interesting. I have a couple of questions:
- Which loss function are you using?
- Does the model output change during your training?
- Are the gradient values always the same?
Thanks for your quick reply.
This is my code. It’s tracking loss and bias by epoch at the moment; I’ll look at gradients by epoch when I get a chance. Incidentally, if it’s tweaked to have one weight and no bias (using a vector of ones as x), it works perfectly.
import tensorflow as tf
import numpy as np
import matplotlib.pyplot as plt

nrecs = 10000
x_train = np.ones([nrecs, 0])  # Training dataset with no features
y_train = np.ones([nrecs, 1])  # Labels (all ones); the bias should converge to their mean (1.0)

bias_history = []
model = tf.keras.models.Sequential([tf.keras.layers.Dense(1, input_shape=[0])])
model.compile(loss='mean_squared_error', optimizer=tf.optimizers.RMSprop())

# Record the bias at the end of every epoch
history = model.fit(x_train, y_train, epochs=50, verbose=False, batch_size=100,
                    callbacks=[tf.keras.callbacks.LambdaCallback(
                        on_epoch_end=lambda epoch, logs: bias_history.append(model.layers[0].get_weights()[1]))])

plt.plot(history.history["loss"])
plt.show()

bias_history_np = np.array(bias_history)
plt.plot(bias_history_np)
plt.show()
I can’t yet see a way of tracking the gradients that are being applied without rerunning it as a custom training loop and explicitly using a GradientTape().
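If I do get round to rerunning it, something like the following rough sketch is what I have in mind for logging the bias gradient once per epoch. It assumes model, x_train and y_train are built exactly as in the script above, and I haven’t run it yet:
import tensorflow as tf

loss_fn = tf.keras.losses.MeanSquaredError()
optimizer = tf.optimizers.RMSprop()
grad_history = []

dataset = tf.data.Dataset.from_tensor_slices((x_train, y_train)).batch(100)

for epoch in range(50):
    for x_batch, y_batch in dataset:
        with tf.GradientTape() as tape:
            preds = model(x_batch, training=True)
            loss = loss_fn(y_batch, preds)
        grads = tape.gradient(loss, model.trainable_variables)
        optimizer.apply_gradients(zip(grads, model.trainable_variables))
    # trainable_variables is [kernel, bias], so grads[-1] is the bias gradient
    grad_history.append(grads[-1].numpy())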
Hi, I have a bit more detail for you now.
I’m going to (hesitantly) suggest that this may be a bug in TensorFlow: the model fails to converge when run with tf.config.run_functions_eagerly(False) (the default, where functions are executed as graphs), but identical code converges perfectly to the correct solution when run with tf.config.run_functions_eagerly(True). As far as I understand it, changing this option should not affect the results (or can it?).
Is this plausible, and if it is should I try to raise it?
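For reference, this is roughly how I’m comparing the two modes (a sketch; x_train and y_train are built exactly as in the script I posted above):
import tensorflow as tf

for eager in (True, False):
    tf.config.run_functions_eagerly(eager)
    model = tf.keras.models.Sequential([tf.keras.layers.Dense(1, input_shape=[0])])
    model.compile(loss='mean_squared_error', optimizer=tf.optimizers.RMSprop())
    model.fit(x_train, y_train, epochs=25, verbose=False, batch_size=100)
    print(eager, model(tf.constant([[]])).numpy())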
To give more definite answers to your questions:
- loss='mean_squared_error'
- Yes. Before training, model(tf.constant([[]])) gives 0.0. After training for 25 epochs with run_functions_eagerly(True) it gives 1.0 (the correct result); after 25 epochs with run_functions_eagerly(False) it gives 2.51 (an overshoot).
- I’ve failed to find a way to track a history of gradients by epoch, I’m afraid.
Hi, @David_Harris2.
That seems a little odd, since eager mode should only change how the computations are executed (as graphs or not). It’s probably something we’re missing about what happens “under the hood”… or maybe a bug.