"C1_W1_Lab_1 Ungraded Lab: The Hello World of Deep Learning" giving nan for 100 data points

Hi,
I was going through the Hello World ungraded lab (C1_W1_Lab_1_hello_world_nn.ipynb) and wanted to make the model more accurate, so I tried adding more points (100) to learn the function “y = 7*x”. However, it works for 10 points but fails for 100 points, where the loss comes back as NaN.

Here is my code:

import tensorflow as tf
import numpy as np
from tensorflow import keras


model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer="sgd", loss="mean_squared_error")

xs = []
ys = []

for i in range(100):
    xs.append(i)
    ys.append(i*7)

print(xs)
print(ys)

model.fit(xs, ys, epochs=500)

print(model.predict([10]))

Try this after printing your xs and ys, in place of your fit and predict calls:

from sklearn.preprocessing import StandardScaler
x_scaler = StandardScaler()
y_scaler = StandardScaler()

xs_new = x_scaler.fit_transform(np.array(xs).reshape(-1, 1))
ys_new = y_scaler.fit_transform(np.array(ys).reshape(-1, 1))
model.fit(xs_new, ys_new, epochs=500, verbose=1)

test_x = np.array([10]).reshape(-1, 1)
print(y_scaler.inverse_transform(model.predict(x_scaler.transform(test_x))))

Thank you for the prompt response, but could you please point out the issue in my code? It works when I keep the training size at 10 elements, so I cannot understand why it behaves this way when I increase the size of the input data.

There is no syntax error in your code if that’s what you’re asking.

That said, read the Hint in the assignment here: tensorflow-1-public/C1_W1_Assignment.ipynb at main · https-deeplearning-ai/tensorflow-1-public · GitHub

Scaling features helps.

Unfortunately, I am unable to get sklearn working on my system. But I feel my data points are not big enough to require scaling, and the relationship is a straightforward linear one.

What’s the problem with sklearn on your system? Why does pip install fail?

Have you tried using numpy to perform scaling?
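
If sklearn won't install, the same standardization is only a few lines of numpy. Here is a rough sketch (the variable names are just illustrative), reusing your model, xs, and ys from above:

# Standardize x and y by hand with numpy
xs_arr = np.array(xs, dtype=np.float32).reshape(-1, 1)
ys_arr = np.array(ys, dtype=np.float32).reshape(-1, 1)

x_mean, x_std = xs_arr.mean(), xs_arr.std()
y_mean, y_std = ys_arr.mean(), ys_arr.std()

xs_scaled = (xs_arr - x_mean) / x_std
ys_scaled = (ys_arr - y_mean) / y_std

model.fit(xs_scaled, ys_scaled, epochs=500, verbose=1)

# Scale the test input the same way, then undo the scaling on the prediction
test_x = (np.array([[10.0]]) - x_mean) / x_std
print(model.predict(test_x) * y_std + y_mean)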

To put things in perspective, these were the model weights without scaling, captured by stopping training as soon as it hit NaN.
Here’s the reference: tf.keras.callbacks.TerminateOnNaN  |  TensorFlow Core v2.7.0

[array([[-5.404947e+18]], dtype=float32), array([-8.655783e+16], dtype=float32)]

These are the weights with scaling:
[array([[1.0000013]], dtype=float32), array([5.485496e-09], dtype=float32)]
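
For reference, here is a minimal sketch of how the weights above can be captured, assuming the original model and data (the linked callback stops training at the first NaN loss):

# Stop training as soon as the loss becomes NaN, then inspect the weights
model.fit(xs, ys, epochs=500,
          callbacks=[tf.keras.callbacks.TerminateOnNaN()])
print(model.get_weights())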

For your unscaled input, the default learning rate value used by the sgd optimizer is much too high. The result is that, rather than converging on the minimum value, each iteration of the model is leaping further and further away from the minimum. That is, when the system is trying to improve the model, it’s actually making it worse.

One way to resolve this is to take shorter steps when improving the model, i.e. to reduce the learning rate. If you replace the "sgd" optimizer with tf.keras.optimizers.SGD(learning_rate=0.0001), that should, for this particular example, allow you to train on your unscaled features. We override the default learning rate with one that is much smaller.

There are alternative optimizers, like 'adam', that also seem to do okay here and are worth investigating.
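
For example, something along these lines (just a sketch, not the lab's solution) trains on the unscaled data without the loss blowing up, since Adam adapts its effective step size per parameter:

# Adam adapts its per-parameter step sizes, so the unscaled inputs
# no longer push the weights toward overflow
model.compile(optimizer="adam", loss="mean_squared_error")
model.fit(xs, ys, epochs=500)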


I think @Danny_Yoo has it right. The inputs as defined in this model don’t need to be scaled. Scaling comes into play when there are orders-of-magnitude differences between features: the classic example is the number of rooms in a house, which is on the order of ones (6, 7, 8, etc.), versus the cost of the house, which is in the hundreds of thousands. In that case, scaling the cost down before it is fed to the model makes the gradients better behaved.

Setting the initial learning rate to a very small number using something like this…

opt = tf.keras.optimizers.SGD(learning_rate=0.0001)
...
model.compile(optimizer=opt, loss=...)

…will solve the NaN/Inf problem during optimization. You can also greatly reduce the number of epochs: with such a simple, linear relationship, you will stop seeing improvement after only a few epochs. 5 is plenty here; 500 is far more than needed.
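
Putting both changes together with the code from the first post, a minimal sketch (nothing official, just the same model with the smaller learning rate and fewer epochs) looks like this:

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Same single-neuron model as in the first post
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])

# Much smaller learning rate so updates don't overshoot on unscaled inputs
opt = tf.keras.optimizers.SGD(learning_rate=0.0001)
model.compile(optimizer=opt, loss="mean_squared_error")

# y = 7 * x, reshaped to (num_samples, 1) so the shapes are unambiguous
xs = np.arange(100, dtype=np.float32).reshape(-1, 1)
ys = 7.0 * xs

# A handful of epochs is enough for a single linear relationship
model.fit(xs, ys, epochs=5)

print(model.predict(np.array([[10.0]])))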

I tried the exact same thing when going through this. I wanted to add (100,199) to see if the prediction for x=10 would be better. If this is an example of how to choose your datasets, it would be a great thing to cover early in the course, as it is a good example of the knowledge gap someone new to this area will run into.