# Why TensorFlow does not give the same result

Hello

I tried to replicate the previous assignment (Course 2 - Week 01 - Assignment 01) using TensorFlow. However, the result is not similar to the result in Assignment 01, and it also does not converge. I hope to get some help from fellows on:

1. Why does the result not converge, given the same setup as in assignment 1 (W1)?
2. Why is the result not similar?

Here is the complete code:

Cell 1/7

``````import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
import tensorflow as tf
from public_tests import *

%matplotlib inline
plt.rcParams['figure.figsize'] = (7.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

``````

Cell 2/7

``````train_X, train_Y, test_X, test_Y = load_dataset()
``````

Cell 3/7

``````def forward_propagation(X, parameters):

    # retrieve parameters
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]

    # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID
    Z1 = W1 @ X + b1
    A1 = tf.keras.activations.relu(Z1)
    Z2 = W2 @ A1 + b2
    A2 = tf.keras.activations.relu(Z2)
    Z3 = W3 @ A2 + b3
    A3 = tf.keras.activations.sigmoid(Z3)

    return A3
``````

Cell 4/7

``````def compute_loss(A3, Y):
    m = tf.cast(tf.shape(Y)[1], tf.float32)  # Ensure the shape parameter is a float type for division
    loss = (-1. / m) * tf.reduce_sum(Y * tf.math.log(A3 + 1e-9) + (1 - Y) * tf.math.log(1 - A3 + 1e-9))
    loss = tf.squeeze(loss)  # Removes dimensions of size 1 from the shape of the tensor.
    return loss
``````

Cell 5/7

``````def initialize_parameters_random_tf(layers_dims):
    np.random.seed(3)  # Setting the seed for reproducibility
    parameters = {}
    L = len(layers_dims)  # number of layers in the network

    for l in range(1, L):
        # Using np.random.randn for initialization
        W = np.random.randn(layers_dims[l], layers_dims[l-1]).astype(np.float32) * 10
        b = np.zeros((layers_dims[l], 1), dtype=np.float32)

        # Converting to tf.Variable for TensorFlow compatibility
        parameters[f'W{l}'] = tf.Variable(W, trainable=True)
        parameters[f'b{l}'] = tf.Variable(b, trainable=True)

    return parameters
``````

Cell 6/7

``````def fit_check(X, Y, epochs=15000):

    # Step 01: Variables setup:
    optimizer = tf.keras.optimizers.SGD(0.01)

    X = tf.convert_to_tensor(X, dtype=tf.float32)
    Y = tf.convert_to_tensor(Y, dtype=tf.float32)

    parameters = initialize_parameters_random_tf([X.shape[0], 10, 5, 1])
    trainable_variables = list(parameters.values())

    # Step 02: Iteration
    for epoch in range(epochs):

        # Forward propagation and loss, inside the tape so gradients can be traced:
        with tf.GradientTape() as tape:
            A3 = forward_propagation(X, parameters)
            loss = compute_loss(A3, Y)

        # Calculate gradients with respect to the parameters
        grads = tape.gradient(loss, trainable_variables)

        # Update the parameters
        optimizer.apply_gradients(zip(grads, trainable_variables))

        if epoch % 1000 == 0:
            print(f"Epoch {epoch}, Loss: {loss.numpy()}")
``````

Cell 7/7

``````fit_check(train_X, train_Y)
``````

Here is the result I get

``````Epoch 0, Loss: 10.361632347106934
Epoch 1000, Loss: 10.361632347106934
Epoch 2000, Loss: 10.361632347106934
Epoch 3000, Loss: 10.361632347106934
Epoch 4000, Loss: 10.361632347106934
Epoch 5000, Loss: 10.361632347106934
Epoch 6000, Loss: 10.361632347106934
Epoch 7000, Loss: 10.361632347106934
Epoch 8000, Loss: 10.361632347106934
Epoch 9000, Loss: 10.361632347106934
Epoch 10000, Loss: 10.361632347106934
Epoch 11000, Loss: 10.361632347106934
Epoch 12000, Loss: 10.361632347106934
Epoch 13000, Loss: 10.361632347106934
Epoch 14000, Loss: 10.361632347106934
``````

If you're doing this in TensorFlow, you don't need functions like forward_propagation() or iterations or gradient tape.

You just create a model with the layers you want to use - like tfl.Dense(…) - with a specific activation.

Then you compile and fit the model to some data set.

TensorFlow does all of the detailed work for you.
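That compile-and-fit workflow can be sketched roughly like this (a sketch, not the assignment's code: the layer sizes mirror the manual version, and `train_X`/`train_Y` are assumed to be in the lab's features-first layout, hence the transposes in the commented `fit` call):

```python
import numpy as np
import tensorflow as tf

# Same 10 -> 5 -> 1 architecture as the manual forward_propagation,
# but expressed as Keras layers; compile()/fit() handle gradients and updates.
model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation="relu"),
    tf.keras.layers.Dense(5, activation="relu"),
    tf.keras.layers.Dense(1, activation="sigmoid"),
])
model.compile(optimizer=tf.keras.optimizers.SGD(0.01),
              loss=tf.keras.losses.BinaryCrossentropy())

# Keras expects samples-first data, so the (features, m) lab arrays
# would be transposed before fitting:
# model.fit(train_X.T, train_Y.T, epochs=100)
```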


Thank you for your reply. I just finished Specialization 02, so that is the best I can do.

However, my main concern is that the TensorFlow code and the code in the assignment are kept similar, yet they give totally different outputs. That is the issue I am mainly looking into.


Well, there are several things that you are doing differently than the code you copied from the TensorFlow Introduction exercise:

1. You are not using a real TF cost function. You could use the TF `BinaryCrossentropy` loss function there. But note that you have to play the same games to deal with the fact that the TF functions all expect "samples first" data orientation. But I would expect they can compute gradients of your loss function as well.
2. You are handling the elements of the `parameters` dictionary differently by using a list of the "values()" in the dictionary. They get specific references to the entries of the dictionary and use those. There are some subtleties about how object references in python work. It looks like the fundamental problem is that your parameters are just not getting updated, which is why I'm pointing out this difference. Not sure it's the real cause of the issue, but something to consider carefully.
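On point 1, here is a minimal sketch of swapping in the built-in loss (illustrative values only; the transposes account for the lab's (features, m) orientation versus the samples-first layout TF expects):

```python
import tensorflow as tf

# BinaryCrossentropy averages over samples and expects samples-first
# tensors, while the lab keeps data as (1, m) - hence the transposes.
bce = tf.keras.losses.BinaryCrossentropy()

A3 = tf.constant([[0.9, 0.2, 0.8]])  # predictions, shape (1, m)
Y = tf.constant([[1.0, 0.0, 1.0]])   # labels, shape (1, m)

loss = bce(tf.transpose(Y), tf.transpose(A3))
```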

Here's a thread which talks about some of the pitfalls with object references. But note that it's warning you about the exact opposite scenario: in your case here you want to be referencing the global objects that are the elements of the `parameters` dictionary. In the case discussed on that other thread, the point is you want to break the link between the global values and the values you are modifying locally.


But the fundamental point here is that this is still very "early days" in terms of how to use TF. It might not be a productive use of your time to figure out this particular case. The other way to invest your time would be to proceed with DLS Course 4 and really get more exposure to how to use TF to implement models.


Hello @quoc

From your work, you are definitely better than that.

Don't invent this `10`. The lab used `np.sqrt(2 / layers_dims[l-1])`.

Cheers,
Raymond


I did not invent that. The instructions ask to scale by 10.


Ok, I did the experiment and my theory 2) above is not the problem. Here's a test cell that uses the python "`id()`" function to show how object references work w.r.t. dictionaries, a `copy()` of a dictionary and a `deepcopy` of a dictionary. Then I added a few more steps to show that taking the list of the values still gives you a reference to the actual entries in the dictionary.

To understand the output that follows, note that the output of `id()` is essentially the address of the object in memory translated to a decimal number. So if the ids are the same, it means that both variables point to the same actual object in memory. If they are different, then it's a separate copy of the object.

Here's the test cell:

``````# Experiment with copy and deepcopy
import copy  # needed for copy.deepcopy below

np.random.seed(42)
A = np.random.randint(0, 10, (3,4))
print(f"A = {A}")
print(f"id(A) = {id(A)}")
B = A
print(f"id(B) = {id(B)}")
origDict = {}
print(f"id(origDict) = {id(origDict)}")
origDict["A"] = A
print(f"id(origDict) = {id(origDict)}")
B = origDict["A"]
print(f"id(B) = {id(B)}")
copyDict = origDict.copy()
print(f"id(copyDict) = {id(copyDict)}")
copyB = copyDict["A"]
print(f"id(copyB) = {id(copyB)}")
deepcopyDict = copy.deepcopy(origDict)
print(f"id(deepcopyDict) = {id(deepcopyDict)}")
deepcopyB = deepcopyDict["A"]
print(f"id(deepcopyB) = {id(deepcopyB)}")
listOrig = list(origDict.values())
print(f"type(listOrig) {type(listOrig)}")
print(f"id(listOrig) {id(listOrig)}")
listCopyA = listOrig[0]
print(f"type(listCopyA) {type(listCopyA)}")
print(f"id(listCopyA) {id(listCopyA)}")
``````

And here is the output I get when I run that:

``````A = [[6 3 7 4]
[6 9 2 6]
[7 4 3 7]]
id(A) = 132417685968176
id(B) = 132417685968176
id(origDict) = 132417685967648
id(origDict) = 132417685967648
id(B) = 132417685968176
id(copyDict) = 132417686207136
id(copyB) = 132417685968176
id(deepcopyDict) = 132417685967408
id(deepcopyB) = 132417685968896
type(listOrig) <class 'list'>
id(listOrig) 132417685969008
type(listCopyA) <class 'numpy.ndarray'>
id(listCopyA) 132417685968176
``````

You really have to look at all that in detail and follow the meaning of each step to "get" what it all means, but the bottom line is that the reference `listCopyA` does actually point to the real object that is indexed by the dictionary.

So this is all educational, but the net result is that it doesn't explain anything about why your code generates a different result. Whatever this issue actually is, my theory 2) was not it.


Oh, @quoc, my bad. I take my words back.

I forgot that the lab had demoed a bad case with `10`.

Please check out `initialize_parameters_he`, and use the multiplication factor there instead of `10`; then your code should work fine.

Cheers,
Raymond
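For reference, a sketch of what that looks like: the He-style factor from `initialize_parameters_he` replaces the `* 10`, with the rest of the posted initializer kept unchanged (the function name here is just illustrative):

```python
import numpy as np

def initialize_he(layers_dims, seed=3):
    """He-style initialization: scale by sqrt(2 / fan_in) instead of 10."""
    np.random.seed(seed)  # for reproducibility, as in the posted code
    parameters = {}
    for l in range(1, len(layers_dims)):
        parameters[f"W{l}"] = (np.random.randn(layers_dims[l], layers_dims[l - 1])
                               * np.sqrt(2.0 / layers_dims[l - 1])).astype(np.float32)
        parameters[f"b{l}"] = np.zeros((layers_dims[l], 1), dtype=np.float32)
    return parameters
```

With this factor the weights start small enough that the sigmoid output layer is not saturated, so the gradients are non-zero and the loss can actually move.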

Hello @quoc,

Please let me know if your code is still not working after changing the multiplication factor as I said, okay?

Cheers,
Raymond