Location of initial random parameters relative to the loop for NNs

Through other questions I’ve asked here, I’ve learned that it is a good idea to use this:

tf.random.set_seed(1234) # to initialize the parameters/weights.

before running through a NN (Sequential, .compile, .fit, and .predict). This makes sense. But I have a question about the situation in section “7 - Iterate to find optimal regularization value” in the class 2, week 3 assignment.

In that section, the random seed is set outside of the lambda for loop. If you move the
tf.random.set_seed(1234)
to inside of the loop, then the probabilities for the NN using the first lambda are the same whether you set the seed inside or outside of the loop (which makes sense), but the probabilities for the NNs beyond the first lambda differ. To make my question more concrete, I rewrote the script/loop here twice, simplified the NN architecture, and gave new training inputs to show what I mean. The first script/loop sets the seed outside of the loop, like in the assignment; the second one sets it inside of the loop. In both I capture the probabilities, and after both I compare them. My question is below that.

###########################################
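# Imports needed to run this outside of the assignment notebook:
import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Make new inputs/training data, for faster run time: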

X_train = np.array([[1,2.2,3.2,5],[5,.9,1.2,9.5],[4,3,6,9],[-.3,-2.5,1.1,5.1],[-.6,-2.6,10.1,5.1]])
y_train = np.array([1,1,0,1,2])

# Script/loop number 1:

tf.random.set_seed(1234)      # Outside of loop
lambdas = [0.01, 0.05]             # Only 2 lambdas now
models=[None] * len(lambdas)
probs_list_random_out_ofloop = []               # Added to capture the probabilities
for i in range(len(lambdas)):
    lambda_ = lambdas[i]
    models[i] =  Sequential( [
            Dense(8, activation = 'relu', kernel_regularizer=tf.keras.regularizers.l2(lambda_)),
            Dense(4, activation = 'relu', kernel_regularizer=tf.keras.regularizers.l2(lambda_)),
            Dense(3, activation = 'linear') ] )
    models[i].compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=tf.keras.optimizers.Adam(0.01),)
    models[i].fit(X_train,y_train, epochs=20 )
    probs = tf.nn.softmax(models[i].predict(X_train)).numpy()   # softmax matches the from_logits=True multiclass loss
    probs_list_random_out_ofloop.append(probs)            # Capture probs here

# Script/loop number 2:

lambdas = [0.01, 0.05]                                     # Only 2 lambdas now
models=[None] * len(lambdas)
probs_list_random_inloop = []               # Added to capture the probabilities
for i in range(len(lambdas)):
    tf.random.set_seed(1234)      # Inside of loop
    lambda_ = lambdas[i]
    models[i] =  Sequential( [
            Dense(8, activation = 'relu', kernel_regularizer=tf.keras.regularizers.l2(lambda_)),
            Dense(4, activation = 'relu', kernel_regularizer=tf.keras.regularizers.l2(lambda_)),
            Dense(3, activation = 'linear') ] )
    models[i].compile(
        loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
        optimizer=tf.keras.optimizers.Adam(0.01),)
    models[i].fit(X_train,y_train, epochs=20 )
    probs = tf.nn.softmax(models[i].predict(X_train)).numpy()   # softmax matches the from_logits=True multiclass loss
    probs_list_random_inloop.append(probs)            # Capture probs here

print('first iteration prob comparison:\n',probs_list_random_out_ofloop[0] == probs_list_random_inloop[0])
print('second iteration prob comparison:\n',probs_list_random_out_ofloop[1] == probs_list_random_inloop[1])

###########################################

The second NN, the one trained with the second lambda, gives a different answer (different probabilities) depending on whether the seed is set inside or outside of the loop.

Question: Does it matter where/when we initialize? To me it makes sense that you initialize the parameters before each .fit (within the loop). Any thoughts or explanation of this?

Thanks.
ps - apologies for not knowing how to put code into the question in a different format, and instead just pasting the text directly in.

Hello Navead @naveadjensen!!

That’s a very good question!

I recommend setting the random seed inside the for loop, because that way every model begins with the same initial parameters and the only thing that changes between runs is the lambda value. Consequently, any difference among the models can be attributed to lambda, which aligns with the objective of the experiment.
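
To illustrate the mechanism, here is a minimal sketch that uses NumPy's global generator as a stand-in for TensorFlow's global seed. It is an analogy, not TensorFlow's actual internals, and `init_weights` is a hypothetical helper that plays the role of a layer initializer. The idea it shows: if you seed once, the second model continues the same random stream and so draws different starting weights; if you reseed in every iteration, each model restarts the stream and begins from identical weights.

```python
import numpy as np

def init_weights(shape):
    # Hypothetical stand-in for a layer initializer:
    # it draws from NumPy's global random generator.
    return np.random.uniform(-1.0, 1.0, size=shape)

# Case 1: seed once, outside the loop.
# The second "model" continues the stream where the first one stopped.
np.random.seed(1234)
init_outside = [init_weights((4, 8)) for _ in range(2)]

# Case 2: reseed inside the loop.
# Every "model" restarts the stream from the same point.
init_inside = []
for _ in range(2):
    np.random.seed(1234)
    init_inside.append(init_weights((4, 8)))

# The first model starts from the same weights in both cases.
first_matches = np.array_equal(init_outside[0], init_inside[0])
# The second model only matches if the seed was reset before it.
second_matches = np.array_equal(init_outside[1], init_inside[1])
# With reseeding, both models share identical starting weights.
inside_identical = np.array_equal(init_inside[0], init_inside[1])

print(first_matches, second_matches, inside_identical)
```

This mirrors what you observed: the first iteration agrees in both scripts, and the second iteration agrees only when the seed is reset before that model is built.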

Cheers,
Raymond

PS: I edited your post to format your code. To see what I changed, open your post in the editor.