How is training finding different weights for ReLU units that are all initialized to 0?

Jessica-G · March 9, 2024, 8:39pm

I was playing around with initial weights in C2_W2_Multiclass_TF. I expected that setting all the weights to 0 before training would result in every unit being trained to the same parameters, as explained in DeepLearning Initializing Neural Networks: “Initializing all the weights with zeros leads the neurons to learn the same features during training.”

But that didn’t happen! After setting all the weights to 0, then training, the training result was different parameters for each unit and it found approximately the same parameters as the original training (before setting initial weights to 0).

How did it do that? What is happening?! Please help me if you can, this is driving me completely nuts.

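import numpy as np
import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential(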
    [
        Dense(2, activation = 'relu',   name = "L1"),
        Dense(4, activation = 'linear', name = "L2")
    ]
)

model.compile(
    loss=tf.keras.losses.SparseCategoricalCrossentropy(from_logits=True),
    optimizer=tf.keras.optimizers.Adam(0.01),
)

# The model won't allow .set_weights until fit has been run once
model.fit(
    X_train,y_train,
    epochs=200
)

# Look at the resulting weights
model.get_weights()

# Resulting weights
[array([[ 1.22,  0.6 ],
        [ 0.92, -1.7 ]], dtype=float32),
 array([1.59, 1.5 ], dtype=float32),
 array([[-2.01, -3.07,  1.3 ,  0.33],
        [-2.83,  1.09, -1.89,  0.69]], dtype=float32),
 array([ 3.18,  0.21, -1.31, -2.61], dtype=float32)]

# Set all the weights and biases to 0 for all units in both layers
model.set_weights([
    np.array([[0.0, 0.0],
              [0.0, 0.0]]),            # L1 kernel
    np.array([0.0, 0.0]),              # L1 bias
    np.array([[0.0, 0.0, 0.0, 0.0],
              [0.0, 0.0, 0.0, 0.0]]),  # L2 kernel
    np.array([0.0, 0.0, 0.0, 0.0])     # L2 bias
])

# Re-train
model.fit(
    X_train,y_train,
    epochs=100
)

# Print the new weights
model.get_weights()

# Resulting weights
[array([[ 1.28,  0.45],
        [ 0.71, -1.69]], dtype=float32),
 array([1.48, 1.51], dtype=float32),
 array([[-2.06, -2.33,  1.11,  0.42],
        [-1.9 ,  1.3 , -1.94,  0.59]], dtype=float32),
 array([ 2.47, -0.52, -0.46, -2.14], dtype=float32)]

paulinpaloalto · March 9, 2024, 9:07pm

I think your experiment is just invalid. TF must be carrying something over that messes up the results, e.g. non-zero gradient information still left over from the first training run (Adam, for example, keeps running averages of past gradients as optimizer state, and set_weights does not reset that). It is unusual to run the training once, then set the weights, and then train again. A better way to run the experiment would be to use the kernel_initializer keyword argument when you define the layers. E.g.

model = Sequential(
    [
        Dense(2, activation = 'relu',   kernel_initializer = 'zeros', name = "L1"),
        Dense(4, activation = 'linear', kernel_initializer = 'zeros', name = "L2")
    ]
)

If you do that, then you don’t have to do the disruptive thing of running the training twice. Please give that a try and see if it changes the results. Note that you also need to initialize the bias values to be zeros, but that is the default. If you want to be sure, you can also add

bias_initializer = 'zeros'

on both layers. For more info, here’s the docpage for Dense. Note that you can also break symmetry with zero weights and non-zero biases, as long as the biases differ from unit to unit.

Here’s a thread from DLS that discusses Symmetry Breaking and why it is not needed in Logistic Regression, but is needed for real Neural Networks.
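To make the symmetry argument concrete, here is a minimal NumPy sketch of one forward/backward pass for a network with the same shape as the one in this thread (2 ReLU units, then 4 linear outputs with softmax cross-entropy). The `grads` helper, the random inputs, and the labels are all made up for illustration and are not taken from the lab; the ReLU derivative at exactly 0 is assumed to be 0.

```python
import numpy as np

def grads(W1, b1, W2, b2, X, y):
    """One forward/backward pass: ReLU hidden layer, softmax cross-entropy output."""
    z1 = X @ W1 + b1
    a1 = np.maximum(z1, 0.0)
    z2 = a1 @ W2 + b2
    # Numerically stable softmax over the logits
    z2s = z2 - z2.max(axis=1, keepdims=True)
    p = np.exp(z2s)
    p /= p.sum(axis=1, keepdims=True)
    m = X.shape[0]
    # Gradient of the mean cross-entropy loss with respect to the logits
    dz2 = p.copy()
    dz2[np.arange(m), y] -= 1.0
    dz2 /= m
    dW2 = a1.T @ dz2
    db2 = dz2.sum(axis=0)
    da1 = dz2 @ W2.T
    dz1 = da1 * (z1 > 0)  # ReLU derivative, taken as 0 at z1 == 0
    dW1 = X.T @ dz1
    db1 = dz1.sum(axis=0)
    return dW1, db1, dW2, db2

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 2))              # hypothetical inputs
y = np.array([0, 1, 2, 3, 0, 1, 2, 0])   # hypothetical, unbalanced labels

W1 = np.zeros((2, 2)); b1 = np.zeros(2)
W2 = np.zeros((2, 4)); b2 = np.zeros(4)

# Case 1: everything zero. The hidden activations are 0, so dW2 is 0,
# and W2 is 0, so nothing flows back to W1 or b1. Only db2 is non-zero.
dW1, db1, dW2, db2 = grads(W1, b1, W2, b2, X, y)

# Case 2: zero kernels but distinct non-zero hidden biases.
# The two hidden units now get different gradients on their outgoing weights.
b1_distinct = np.array([0.5, 1.0])
dW1_b, db1_b, dW2_b, db2_b = grads(W1, b1_distinct, W2, b2, X, y)
```

In the all-zero case only the output bias receives a gradient, which matches the observation that the kernels stay at zero when the model is freshly initialized with zeros. In the second case the rows of `dW2_b` differ, so the two hidden units stop being interchangeable after the first update.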

Jessica-G · March 9, 2024, 9:35pm

Thank you Paul! Initializing the model that way did the trick - after training, all the weights were still at zero. The world makes sense!

I also read through the DLS thread you linked - a great explanation. I enjoyed the extra information to get my head around how these initializations affect gradient descent.
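For anyone wondering why the original set_weights experiment still moved away from zero, one plausible explanation (an assumption, not something verified in this thread) is optimizer state carried over from the first fit call. The sketch below uses a textbook Adam update written in NumPy, not the Keras implementation, and the leftover moment values `m_left` and `v_left` are invented for illustration. It shows that with a zero gradient, a fresh Adam state leaves zero weights at zero, while stale moment estimates push them to different non-zero values.

```python
import numpy as np

def adam_step(w, g, m, v, t, lr=0.01, beta1=0.9, beta2=0.999, eps=1e-7):
    """One textbook Adam update; returns new weights and updated moments."""
    m = beta1 * m + (1 - beta1) * g
    v = beta2 * v + (1 - beta2) * g * g
    m_hat = m / (1 - beta1 ** t)   # bias-corrected first moment
    v_hat = v / (1 - beta2 ** t)   # bias-corrected second moment
    w = w - lr * m_hat / (np.sqrt(v_hat) + eps)
    return w, m, v

w = np.zeros((2, 2))   # weights reset to zero
g = np.zeros((2, 2))   # gradient is zero, as in an all-zero ReLU network

# Fresh optimizer: both moment estimates start at zero.
w_fresh, _, _ = adam_step(w, g, np.zeros_like(w), np.zeros_like(w), t=1)

# Stale optimizer: hypothetical moments left over from an earlier run.
m_left = np.array([[0.02, -0.01], [0.005, 0.03]])
v_left = np.array([[1e-3, 4e-4], [2e-4, 9e-4]])
w_stale, _, _ = adam_step(w, g, m_left, v_left, t=1001)
```

Here `w_fresh` stays at zero while `w_stale` does not, and its entries differ from unit to unit. Once the weights are non-zero and asymmetric, ordinary gradients take over, which would be consistent with the retrained model ending up near the original solution.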

paulinpaloalto · March 9, 2024, 9:39pm

Whew! It’s great to hear that the experiment now produces the results that we expected. It always feels better when things make sense.

rmwkwok · March 10, 2024, 5:58am

Can’t agree more with you, Paul.
