I think your experiment is just invalid. TF must be doing something else that messes up the results, e.g. non-zero gradients (or leftover optimizer state) from the first training run. It is strange to run the training once, then set the weights, and then train again. The better way to run the experiment would be to use the kernel_initializer keyword argument when you define the layers. E.g.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential(
    [
        Dense(2, activation = 'relu', kernel_initializer = 'zeros', name = "L1"),
        Dense(4, activation = 'linear', kernel_initializer = 'zeros', name = "L2")
    ]
)
If you do that, then you don’t have to do the disruptive thing of running the training twice. Please give that a try and see if it changes the results. Note that you also need to initialize the bias values to be zeros, but that is the default. If you want to be sure, you can also add
bias_initializer = 'zeros'
on both layers. For more info, here’s the docpage for Dense. Note that you can also break symmetry with zero weights and non-zero biases.
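If you want to see that "zero weights, non-zero biases" variant in code, here is a rough sketch. The layer sizes are just the ones from the example above, and the RandomUniform range for the biases is an arbitrary choice on my part (small positive values so the ReLU units are active from the start):

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Zero weights, but small distinct positive biases in L1: the hidden units
# produce different activations from the very first step, so gradient descent
# can drive them apart even though the weights all start at zero.
model = Sequential(
    [
        Dense(2, activation = 'relu',
              kernel_initializer = 'zeros',
              bias_initializer = tf.keras.initializers.RandomUniform(minval = 0.01, maxval = 0.1),
              name = "L1"),
        Dense(4, activation = 'linear',
              kernel_initializer = 'zeros',
              bias_initializer = 'zeros',
              name = "L2")
    ]
)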
Here’s a thread from DLS that discusses Symmetry Breaking and why it is not needed in Logistic Regression, but is needed for real Neural Networks.
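And just to illustrate why the all-zero initialization is a problem for a real network, here is a sketch you can run directly. The input dimension of 3 and the random training data are placeholders I made up for the demonstration:

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(3,)),
        tf.keras.layers.Dense(2, activation = 'relu',
                              kernel_initializer = 'zeros',
                              bias_initializer = 'zeros', name = "L1"),
        tf.keras.layers.Dense(4, activation = 'linear',
                              kernel_initializer = 'zeros',
                              bias_initializer = 'zeros', name = "L2")
    ]
)
model.compile(optimizer = 'adam', loss = 'mse')

# placeholder random data, only there to drive a few training steps
X = np.random.randn(64, 3).astype("float32")
y = np.random.randn(64, 4).astype("float32")
model.fit(X, y, epochs = 5, verbose = 0)

W1, b1 = model.get_layer("L1").get_weights()
# With ReLU and all-zero init, W1 never receives a non-zero gradient, so it
# stays at zero and the two hidden units remain identical; only the output
# bias of L2 can learn anything.
print(W1)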