I think your experiment is just invalid. TF must be doing something else that messes up the results, e.g. non-zero gradients (or leftover optimizer state) from the first training run. It is strange to run the training once, then set the weights, and then train again. The better way to run the experiment would be to use the kernel_initializer keyword argument when you define the layers. E.g.
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

model = Sequential(
    [
        Dense(2, activation = 'relu', kernel_initializer = 'zeros', name = "L1"),
        Dense(4, activation = 'linear', kernel_initializer = 'zeros', name = "L2")
    ]
)
If you do that, then you don’t have to do the disruptive thing of running the training twice. Please give that a try and see if it changes the results. Note that you also need to initialize the bias values to be zeros, but that is the default. If you want to be sure, you can also add
bias_initializer = 'zeros'
on both layers. For more info, here’s the docpage for Dense. Note that you can also break symmetry with zero weights and non-zero biases.
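If you want to see that "zero weights, non-zero biases" variant in code, here is a rough sketch. The layer sizes are just the ones from the example above, and the RandomUniform range for the biases is an arbitrary choice on my part (small positive values so the ReLU units are active from the start):

import tensorflow as tf
from tensorflow.keras import Sequential
from tensorflow.keras.layers import Dense

# Zero weights, but small distinct positive biases in L1: the hidden units
# produce different activations from the very first step, so gradient descent
# can drive them apart even though the weights all start at zero.
model = Sequential(
    [
        Dense(2, activation = 'relu',
              kernel_initializer = 'zeros',
              bias_initializer = tf.keras.initializers.RandomUniform(minval = 0.01, maxval = 0.1),
              name = "L1"),
        Dense(4, activation = 'linear',
              kernel_initializer = 'zeros',
              bias_initializer = 'zeros',
              name = "L2")
    ]
)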
Here’s a thread from DLS that discusses Symmetry Breaking and why it is not needed in Logistic Regression, but is needed for real Neural Networks.
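And just to illustrate why the all-zero initialization is a problem for a real network, here is a sketch you can run directly. The input dimension of 3 and the random training data are placeholders I made up for the demonstration:

import numpy as np
import tensorflow as tf

model = tf.keras.Sequential(
    [
        tf.keras.Input(shape=(3,)),
        tf.keras.layers.Dense(2, activation = 'relu',
                              kernel_initializer = 'zeros',
                              bias_initializer = 'zeros', name = "L1"),
        tf.keras.layers.Dense(4, activation = 'linear',
                              kernel_initializer = 'zeros',
                              bias_initializer = 'zeros', name = "L2")
    ]
)
model.compile(optimizer = 'adam', loss = 'mse')

# placeholder random data, only there to drive a few training steps
X = np.random.randn(64, 3).astype("float32")
y = np.random.randn(64, 4).astype("float32")
model.fit(X, y, epochs = 5, verbose = 0)

W1, b1 = model.get_layer("L1").get_weights()
# With ReLU and all-zero init, W1 never receives a non-zero gradient, so it
# stays at zero and the two hidden units remain identical; only the output
# bias of L2 can learn anything.
print(W1)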