Why does TensorFlow not give the same result?

Hello

I am trying to replicate a previous assignment (Course 2 - Week 01 - Assignment 01) using TensorFlow. However, the result is not only different from the result in Assignment 01, it also does not converge. I hope to get some help from fellows on:

  1. Why does the result not converge, given the same setup as in Assignment 1 (W1)?
  2. Why is the result not similar?

Here is the complete code:

Cell 1/7

import numpy as np
import matplotlib.pyplot as plt
import sklearn
import sklearn.datasets
import tensorflow as tf
from public_tests import *

from init_utils import load_dataset, plot_decision_boundary

%matplotlib inline
plt.rcParams['figure.figsize'] = (7.0, 4.0) # set default size of plots
plt.rcParams['image.interpolation'] = 'nearest'
plt.rcParams['image.cmap'] = 'gray'

%load_ext autoreload
%autoreload 2

Cell 2/7

train_X, train_Y, test_X, test_Y = load_dataset()

Cell 3/7

def forward_propagation(X, parameters):

    # retrieve parameters
    W1 = parameters["W1"]
    b1 = parameters["b1"]
    W2 = parameters["W2"]
    b2 = parameters["b2"]
    W3 = parameters["W3"]
    b3 = parameters["b3"]
    
    # LINEAR -> RELU -> LINEAR -> RELU -> LINEAR -> SIGMOID
    Z1 = W1@X + b1
    A1 = tf.keras.activations.relu(Z1)
    Z2 = W2@A1 + b2
    A2 = tf.keras.activations.relu(Z2)
    Z3 = W3@A2 + b3
    A3 = tf.keras.activations.sigmoid(Z3)
        
    return A3

Cell 4/7

def compute_loss(A3, Y):
    m = tf.cast(tf.shape(Y)[1], tf.float32)  # Ensure the shape parameter is a float type for division
    loss = (-1./m) * tf.reduce_sum(Y * tf.math.log(A3 + 1e-9) + (1 - Y) * tf.math.log(1 - A3 + 1e-9))
    loss = tf.squeeze(loss)  # Removes dimensions of size 1 from the shape of the tensor.
    return loss

Cell 5/7

def initialize_parameters_random_tf(layers_dims):
    np.random.seed(3)  # Setting the seed for reproducibility
    parameters = {}
    L = len(layers_dims)  # number of layers in the network
    
    for l in range(1, L):
        # Using np.random.randn for initialization
        W = np.random.randn(layers_dims[l], layers_dims[l-1]).astype(np.float32) * 10
        b = np.zeros((layers_dims[l], 1), dtype=np.float32)
        
        # Converting to tf.Variable for TensorFlow compatibility
        parameters[f'W{l}'] = tf.Variable(W, trainable=True)
        parameters[f'b{l}'] = tf.Variable(b, trainable=True)

    return parameters

Cell 6/7

def fit_check(X, Y, epochs=15000):
    
    # Step 01: Variables setup:
    optimizer = tf.keras.optimizers.SGD(0.01)
    
    X = tf.convert_to_tensor(X, dtype=tf.float32)
    Y = tf.convert_to_tensor(Y, dtype=tf.float32)
    
    parameters = initialize_parameters_random_tf([X.shape[0], 10, 5, 1])
    
    # Step 02: Iteration
    for epoch in range(epochs):
        with tf.GradientTape() as tape:
                
            # Forward propagation:
            A3 = forward_propagation(X, parameters)
            
            # Loss calculation
            loss = compute_loss(A3, Y)
            
        # Calculate gradients with respect to the parameters
        grads = tape.gradient(loss, list(parameters.values()))
        
        # Update the parameters
        optimizer.apply_gradients(zip(grads, list(parameters.values())))
        
        if epoch % 1000 == 0:
            print(f"Epoch {epoch}, Loss: {loss.numpy()}")

Cell 7/7

fit_check(train_X, train_Y)

Here is the result I get:

Epoch 0, Loss: 10.361632347106934
Epoch 1000, Loss: 10.361632347106934
Epoch 2000, Loss: 10.361632347106934
Epoch 3000, Loss: 10.361632347106934
Epoch 4000, Loss: 10.361632347106934
Epoch 5000, Loss: 10.361632347106934
Epoch 6000, Loss: 10.361632347106934
Epoch 7000, Loss: 10.361632347106934
Epoch 8000, Loss: 10.361632347106934
Epoch 9000, Loss: 10.361632347106934
Epoch 10000, Loss: 10.361632347106934
Epoch 11000, Loss: 10.361632347106934
Epoch 12000, Loss: 10.361632347106934
Epoch 13000, Loss: 10.361632347106934
Epoch 14000, Loss: 10.361632347106934

If you’re doing this in TensorFlow, you don’t need functions like forward_propagation() or iterations or gradient tape.

You just create a model with the layers you want to use - like tfl.Dense(…) - with a specific activation.

Then you compile and fit the model to some data set.

TensorFlow does all of the detailed work for you.
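
For example, here is a minimal sketch of that approach (the layer sizes mirror the [10, 5, 1] network above; the exact arguments are only illustrative, and the data from load_dataset() is transposed because Keras expects "samples first"):

import tensorflow as tf

model = tf.keras.Sequential([
    tf.keras.layers.Dense(10, activation='relu'),
    tf.keras.layers.Dense(5, activation='relu'),
    tf.keras.layers.Dense(1, activation='sigmoid'),
])
model.compile(optimizer=tf.keras.optimizers.SGD(0.01),
              loss=tf.keras.losses.BinaryCrossentropy())
# train_X has shape (features, m) and train_Y has shape (1, m), so transpose both
model.fit(train_X.T, train_Y.T, epochs=1000, verbose=0)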


Thank you for your reply. I have just finished Course 02 of the Specialization, so that is the best I can do.

However, my main concern is that the TensorFlow code and the code in the assignment are kept as similar as possible, yet they give totally different output. That is the issue I am more interested in.


Well, there are several things that you are doing differently than the code you copied from the TensorFlow Introduction exercise:

  1. You are not using a real TF cost function. You could use the TF BinaryCrossentropy loss function there, but note that you have to play the same games to deal with the fact that the TF loss functions all expect “samples first” data orientation (a sketch of that swap is shown right after this list). That said, I would expect TF can compute gradients of your hand-written loss function as well.
  2. You are handling the elements of the parameters dictionary differently by using a list of the “values()” in the dictionary. The code in that exercise gets specific references to the entries of the dictionary and uses those. There are some subtleties about how object references in Python work. It looks like the fundamental problem is that your parameters are just not getting updated, which is why I’m pointing out this difference. I’m not sure it’s the real cause of the issue, but it is something to consider carefully.
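
For point 1, here is a minimal sketch of that swap (the name compute_loss_bce is only illustrative; the transposes handle the samples-first orientation mentioned above):

import tensorflow as tf

bce = tf.keras.losses.BinaryCrossentropy()  # expects probabilities, which matches the sigmoid output A3

def compute_loss_bce(A3, Y):
    # A3 and Y are shaped (1, m) in this thread, while Keras losses want (m, 1), so transpose both
    return bce(tf.transpose(Y), tf.transpose(A3))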

Here’s a thread which talks about some of the pitfalls with object references. But note that it’s warning you about the exact opposite scenario: in your case here you want to be referencing the global objects that are the elements of the parameters dictionary. In the case discussed on that other thread, the point is you want to break the link between the global values and the values you are modifying locally.


But the fundamental point here is that this is still very “early days” in terms of how to use TF. It might not be a productive use of your time to figure out this particular case. The other way to invest your time would be to proceed with DLS Course 4 and really get more exposure to how to use TF to implement models.


Hello @quoc

From your work, you are definitely better than that. :wink:

Don’t invent this 10 :wink: The lab used np.sqrt(2 / layers_dims[l-1]).

Cheers,
Raymond


I did not invent that. The instructions ask to scale by 10.


Ok, I did the experiment and my theory 2) above is not the problem. Here’s a test cell that uses the Python “id()” function to show how object references work w.r.t. dictionaries, a copy() of a dictionary, and a deepcopy of a dictionary. Then I added a few more steps to show that taking the list of the values still gives you a reference to the actual entries in the dictionary.

To understand the output that follows, note that the output of id() is essentially the address of the object in memory translated to a decimal number. So if the ids are the same, it means that both variables point to the same actual object in memory. If they are different, then it’s a separate copy of the object.

Here’s the test cell:

# Experiment with copy and deepcopy
import copy  # needed for copy.deepcopy() below

np.random.seed(42)
A = np.random.randint(0, 10, (3,4))
print(f"A = {A}")
print(f"id(A) = {id(A)}")
B = A
print(f"id(B) = {id(B)}")
origDict = {}
print(f"id(origDict) = {id(origDict)}")
origDict["A"] = A
print(f"id(origDict) = {id(origDict)}")
B = origDict["A"]
print(f"id(B) = {id(B)}")
copyDict = origDict.copy()
print(f"id(copyDict) = {id(copyDict)}")
copyB = copyDict["A"]
print(f"id(copyB) = {id(copyB)}")
deepcopyDict = copy.deepcopy(origDict)
print(f"id(deepcopyDict) = {id(deepcopyDict)}")
deepcopyB = deepcopyDict["A"]
print(f"id(deepcopyB) = {id(deepcopyB)}")
listOrig = list(origDict.values())
print(f"type(listOrig) {type(listOrig)}")
print(f"id(listOrig) {id(listOrig)}")
listCopyA = listOrig[0]
print(f"type(listCopyA) {type(listCopyA)}")
print(f"id(listCopyA) {id(listCopyA)}")

And here is the output I get when I run that:

A = [[6 3 7 4]
 [6 9 2 6]
 [7 4 3 7]]
id(A) = 132417685968176
id(B) = 132417685968176
id(origDict) = 132417685967648
id(origDict) = 132417685967648
id(B) = 132417685968176
id(copyDict) = 132417686207136
id(copyB) = 132417685968176
id(deepcopyDict) = 132417685967408
id(deepcopyB) = 132417685968896
type(listOrig) <class 'list'>
id(listOrig) 132417685969008
type(listCopyA) <class 'numpy.ndarray'>
id(listCopyA) 132417685968176

You really have to look at all that in detail and follow the meaning of each step to “get” what it all means, but the bottom line is that the reference listCopyA does actually point to the real object that is indexed by the dictionary.
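
As a quick check of the practical consequence (a small follow-up, assuming the variables from the test cell above are still defined), an in-place write through listCopyA is visible through origDict, while the deepcopy is unaffected:

# listCopyA and origDict["A"] are the same ndarray, so this write shows up in both
listCopyA[0, 0] = 99
print(origDict["A"][0, 0])      # 99
print(deepcopyDict["A"][0, 0])  # still 6, because deepcopy created a separate array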

So this is all educational, but the net result is that it doesn’t explain anything about why your code generates a different result. Whatever this issue actually is, my theory 2) was not it. :disappointed_relieved:


Oh, @quoc, my bad. I take my words back.

I forgot that the lab had demoed a bad case with 10.

Please check out initialize_parameters_he, and use the multiplication factor there instead of 10 :wink:, then your code should work fine.
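
For reference, here is a minimal sketch of that change applied to the function in Cell 5 (the name initialize_parameters_he_tf is only illustrative; only the scaling factor differs):

import numpy as np
import tensorflow as tf

def initialize_parameters_he_tf(layers_dims):
    np.random.seed(3)
    parameters = {}
    L = len(layers_dims)  # number of layers in the network

    for l in range(1, L):
        # He initialization: scale by sqrt(2 / fan_in) instead of the deliberately bad 10
        W = (np.random.randn(layers_dims[l], layers_dims[l-1]) * np.sqrt(2. / layers_dims[l-1])).astype(np.float32)
        b = np.zeros((layers_dims[l], 1), dtype=np.float32)

        parameters[f'W{l}'] = tf.Variable(W, trainable=True)
        parameters[f'b{l}'] = tf.Variable(b, trainable=True)

    return parameters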

Cheers,
Raymond

Hello @quoc,

Please let me know if your code is still not working after changing the multiplication factor as I said, okay? :wink: :wink:

Cheers,
Raymond