"C1_W1_Lab_1 Ungraded Lab: The Hello World of Deep Learning" giving nan for 100 data points

Hi,
I was going through the Hello World ungraded lab (C1_W1_Lab_1_hello_world_nn.ipynb) and wanted to make the model more accurate, so I tried adding more points (100) to learn the function “y = 7*x”. However, it works for 10 points but fails for 100 points, where the loss comes back as NaN.

Here is my code:

import tensorflow as tf
import numpy as np
from tensorflow import keras


model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])
model.compile(optimizer="sgd", loss="mean_squared_error")

xs = []
ys = []

for i in range(100):
    xs.append(i)
    ys.append(i*7)

print(xs)
print(ys)

model.fit(xs, ys, epochs=500)

print(model.predict([10]))

Try this after printing your xs and ys, in place of your fit and predict calls:

from sklearn.preprocessing import StandardScaler
x_scaler = StandardScaler()
y_scaler = StandardScaler()

xs_new = x_scaler.fit_transform(np.array(xs).reshape(-1, 1))
ys_new = y_scaler.fit_transform(np.array(ys).reshape(-1, 1))
model.fit(xs_new, ys_new, epochs=500, verbose=1)

test_x = np.array([10]).reshape(-1, 1)
print(y_scaler.inverse_transform(model.predict(x_scaler.transform(test_x))))

Thank you for the prompt response, but could you please point out the issue in my code? It works when I keep the training size at 10 elements, so I cannot understand why it behaves this way when I increase the size of the input data.

There is no syntax error in your code if that’s what you’re asking.

That said, read the Hint in the assignment here: tensorflow-1-public/C1_W1_Assignment.ipynb at main · https-deeplearning-ai/tensorflow-1-public · GitHub

Scaling features helps.

Unfortunately, I am unable to get sklearn working on my system. But I feel my data points are not big enough to require scaling, and the relationship is a straightforward linear one.

What’s the problem with sklearn on your system? Why does pip install fail?

Have you tried using numpy to perform scaling?
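
If sklearn won't install, the same standardization is only a few lines of numpy. Here is a rough sketch (the variable names are just illustrative), reusing your model, xs, and ys from above:

# Standardize x and y by hand with numpy
xs_arr = np.array(xs, dtype=np.float32).reshape(-1, 1)
ys_arr = np.array(ys, dtype=np.float32).reshape(-1, 1)

x_mean, x_std = xs_arr.mean(), xs_arr.std()
y_mean, y_std = ys_arr.mean(), ys_arr.std()

xs_scaled = (xs_arr - x_mean) / x_std
ys_scaled = (ys_arr - y_mean) / y_std

model.fit(xs_scaled, ys_scaled, epochs=500, verbose=1)

# Scale the test input the same way, then undo the scaling on the prediction
test_x = (np.array([[10.0]]) - x_mean) / x_std
print(model.predict(test_x) * y_std + y_mean)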

To put things in perspective, these were the model weights without scaling, captured by stopping training as soon as it hit NaN.
Here’s the reference: tf.keras.callbacks.TerminateOnNaN  |  TensorFlow Core v2.7.0

[array([[-5.404947e+18]], dtype=float32), array([-8.655783e+16], dtype=float32)]

These are the weights with scaling:
[array([[1.0000013]], dtype=float32), array([5.485496e-09], dtype=float32)]
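
For reference, here is a minimal sketch of how the weights above can be captured, assuming the original model and data (the linked callback stops training at the first NaN loss):

# Stop training as soon as the loss becomes NaN, then inspect the weights
model.fit(xs, ys, epochs=500,
          callbacks=[tf.keras.callbacks.TerminateOnNaN()])
print(model.get_weights())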

For your unscaled input, the default learning rate value used by the sgd optimizer is much too high. The result is that, rather than converging on the minimum value, each iteration of the model is leaping further and further away from the minimum. That is, when the system is trying to improve the model, it’s actually making it worse.

One way to resolve this is to take shorter steps when improving the model, i.e. to reduce the learning rate. If you replace the "sgd" optimizer with tf.keras.optimizers.SGD(learning_rate=0.0001), that should, for this particular example, allow you to train on your unscaled features. We override the default learning rate with one that is much smaller.

There are alternative optimizers, like 'adam', that also seem to do okay here and are worth investigating.
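
For example, something along these lines (just a sketch, not the lab's solution) trains on the unscaled data without the loss blowing up, since Adam adapts its effective step size per parameter:

# Adam adapts its per-parameter step sizes, so the unscaled inputs
# no longer push the weights toward overflow
model.compile(optimizer="adam", loss="mean_squared_error")
model.fit(xs, ys, epochs=500)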


I think @Danny_Yoo has it right. The inputs as defined in this model don’t need to be scaled. Scaling comes into play when there are orders-of-magnitude differences between features: the classic example is the number of rooms in a house, which is on the order of ones (6, 7, 8, etc.), versus the cost of the house, which is in the hundreds of thousands. In that case, scaling the cost down before it is fed to the model makes the gradients better behaved.

Setting the initial learning rate to a very small number using something like this…

opt = tf.keras.optimizers.SGD(learning_rate=0.0001)
...
model.compile(optimizer=opt, loss=...)

…will solve the NaN/Inf problem during optimization. You can also greatly reduce the number of epochs: with such a simple, linear relationship, you will stop seeing improvement after only a few epochs. 5 is plenty here; 500 is far more than needed.
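
Putting both changes together with the code from the first post, a minimal sketch (nothing official, just the same model with the smaller learning rate and fewer epochs) looks like this:

import numpy as np
import tensorflow as tf
from tensorflow import keras

# Same single-neuron model as in the first post
model = tf.keras.Sequential([keras.layers.Dense(units=1, input_shape=[1])])

# Much smaller learning rate so updates don't overshoot on unscaled inputs
opt = tf.keras.optimizers.SGD(learning_rate=0.0001)
model.compile(optimizer=opt, loss="mean_squared_error")

# y = 7 * x, reshaped to (num_samples, 1) so the shapes are unambiguous
xs = np.arange(100, dtype=np.float32).reshape(-1, 1)
ys = 7.0 * xs

# A handful of epochs is enough for a single linear relationship
model.fit(xs, ys, epochs=5)

print(model.predict(np.array([[10.0]])))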

I tried the exact same thing when going through this. I wanted to add (100,199) to see if the prediction for x=10 would be better. If this is an example of how to choose your datasets, it would be a great thing to cover early in the course, as it is a good example of the knowledge gap someone new to this area will run into.