Weight Initialization

Syed_Hamza_Mohiuddin · May 11, 2024, 3:13am

I have a question about identical, constant weight initialization.
I have read in the notes on this website, and in other resources, that if a network is initialized with identical constant weights, then every neuron learns the same weights.
However, when I tried this in TensorFlow, I got different weights for each neuron after training.
Here’s my code:
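import tensorflow as tf

# OUT_STEPS and num_features are constants defined elsewhere in my code.
multi_linear_model = tf.keras.Sequential([
    # Take only the last time step: [batch, time, features] -> [batch, 1, features]
    tf.keras.layers.Lambda(lambda x: x[:, -1:, :]),
    # A single Dense layer with all weights and biases initialized to zero
    tf.keras.layers.Dense(OUT_STEPS * num_features,
                          kernel_initializer=tf.initializers.zeros(),
                          bias_initializer=tf.initializers.zeros()),
    # Reshape the output to [batch, OUT_STEPS, features]
    tf.keras.layers.Reshape([OUT_STEPS, num_features])
])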

I am working with time series data. The first Lambda layer just extracts the last time step. The second layer is a Dense layer whose number of neurons is set by some constants (OUT_STEPS * num_features).
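To make the shapes concrete, here is a minimal NumPy sketch of what the three layers compute. It is not TensorFlow itself: the sizes BATCH, TIME, NUM_FEATURES and OUT_STEPS are hypothetical stand-ins for the post's constants, and the Dense layer is written out as the matrix multiply it performs over the last axis.

```python
import numpy as np

# Hypothetical sizes for illustration; the real model uses its own constants.
BATCH, TIME, NUM_FEATURES, OUT_STEPS = 4, 10, 3, 5

x = np.random.default_rng(0).normal(size=(BATCH, TIME, NUM_FEATURES))

# Lambda layer: slicing with -1: keeps the time axis (length 1).
last = x[:, -1:, :]                         # (BATCH, 1, NUM_FEATURES)
# Indexing with -1 instead would drop that axis.
dropped = x[:, -1, :]                       # (BATCH, NUM_FEATURES)

# Dense layer: a matrix multiply over the last axis, here with zero weights and bias.
W = np.zeros((NUM_FEATURES, OUT_STEPS * NUM_FEATURES))
b = np.zeros(OUT_STEPS * NUM_FEATURES)
dense_out = last @ W + b                    # (BATCH, 1, OUT_STEPS * NUM_FEATURES)

# Reshape layer: split the Dense output into OUT_STEPS rows of NUM_FEATURES each.
y = dense_out.reshape(BATCH, OUT_STEPS, NUM_FEATURES)

print(last.shape, dropped.shape, dense_out.shape, y.shape)
```

Before any training, the zero-initialized model outputs all zeros, which matches the observation that the weights print as zero right after the model is built.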

The important part is that I set the weights and biases to zero, but after training I am getting different weights for each neuron in this layer.

Can someone explain why? Pleas…

TMosh · May 11, 2024, 4:22am

TensorFlow automatically initializes the weights to small random values as soon as you create the layer objects.

Syed_Hamza_Mohiuddin · May 11, 2024, 4:24am

I passed zero initializers for the weights and biases. After creating the model, I also checked that the weights and biases were zero.
But after training, I am getting different weights for each neuron. I can give more details if needed.

TMosh · May 11, 2024, 4:33am

See my previous reply. When you run the code and it actually creates the layer objects, it randomly initializes the weights.

Syed_Hamza_Mohiuddin · May 11, 2024, 4:38am

Are you saying that the kernel_initializer argument has no effect when defining a Dense layer?
I printed the weight values after creating the Sequential model.
They were zero.

TMosh · May 11, 2024, 4:44am

Interesting. I did not study your code in detail. It has some methods that are unusual.

TMosh · May 11, 2024, 4:44am

Perhaps we should wait for another mentor to contribute.

paulinpaloalto · May 11, 2024, 11:06am

I think the point here is that your “network” is not really a neural network: it is the equivalent of Logistic Regression. Or Linear Regression, since it looks like you are also not including an output activation from what we can see there. It is simply one Dense layer. Here’s an article which demonstrates why zero initialization does not prevent Logistic Regression from learning a valid solution. In DLS Course 2 Week 2, Prof Ng makes the point that you can look at Logistic Regression as a “trivial” neural network, but the fact that it has only a single layer changes the math.
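A minimal NumPy sketch of that point, assuming a squared-error loss, plain batch gradient descent and synthetic data (none of which come from the original post): a single linear layer with every weight and bias set to zero still learns a different weight vector for each output neuron, because each neuron's gradient is driven by its own target column.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))       # 200 samples, 3 input features
true_W = rng.normal(size=(3, 4))    # hypothetical targets: 4 outputs, each with its own weights
Y = X @ true_W

# One Dense (linear) layer: every weight and bias starts at exactly zero.
W = np.zeros((3, 4))
b = np.zeros(4)

lr = 0.1
for _ in range(500):
    pred = X @ W + b                # forward pass, shape (200, 4)
    err = pred - Y                  # gradient of 0.5 * squared error w.r.t. pred
    grad_W = X.T @ err / len(X)     # column j depends only on output j's own error
    grad_b = err.mean(axis=0)
    W -= lr * grad_W
    b -= lr * grad_b

print(np.round(W, 3))
```

On the very first step the gradient for column j is -X^T Y[:, j] / N, which differs across j whenever the target columns differ. That is what breaks the symmetry here without any random initialization.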

Once you go to multiple cascaded Dense layers, that will no longer be true and Symmetry Breaking will be required.
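The symmetry argument can be written out for one hidden layer. This uses the layer-superscript notation from DLS as an assumption (W^{[l]} has one row per unit of layer l), with g the hidden activation and L the loss for one example:

```latex
\begin{aligned}
z^{[1]} &= W^{[1]} x + b^{[1]}, \qquad h = g\big(z^{[1]}\big), \qquad \hat{y} = W^{[2]} h + b^{[2]}, \qquad \delta_j = \frac{\partial L}{\partial \hat{y}_j} \\[4pt]
\frac{\partial L}{\partial W^{[2]}_{jk}} &= \delta_j \, h_k \\[4pt]
\frac{\partial L}{\partial W^{[1]}_{ki}} &= \Big(\sum_j \delta_j \, W^{[2]}_{jk}\Big)\, g'\big(z^{[1]}_k\big)\, x_i \\[4pt]
\frac{\partial L}{\partial b^{[1]}_{k}} &= \Big(\sum_j \delta_j \, W^{[2]}_{jk}\Big)\, g'\big(z^{[1]}_k\big)
\end{aligned}
```

If all rows of W^{[1]} are equal, all entries of b^{[1]} are equal, and all columns of W^{[2]} are equal (as with any constant initialization), then every z^{[1]}_k and h_k is the same, so none of the right-hand sides depends on k. The update therefore preserves the symmetry, and by induction it persists for the whole of training (summing over a batch does not change this). With a single Dense layer the gradient is simply \partial L / \partial W_{ji} = \delta_j x_i, and under squared error \delta_j = \hat{y}_j - y_j differs for each output j because the targets differ.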

You could easily test my theory by adding a second Dense layer also with zero initializations and see what happens. That may or may not be what you want for your actual solution, but my bet is that it would clear up the theoretical point you are making here.
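The suggested experiment can also be sketched in NumPy as a stand-in for the Keras version. The assumptions are one tanh hidden layer, a linear output, squared-error loss, plain gradient descent, and a hypothetical constant c used for every weight and bias. Following the Keras kernel layout, W1 has shape (inputs, units), so each column is one hidden unit.

```python
import numpy as np

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 3))
Y = X @ rng.normal(size=(3, 4))     # hypothetical 4-output regression target

n_hidden = 5
c = 0.5                             # hypothetical constant; any single value behaves the same way
W1 = np.full((3, n_hidden), c)      # column k = incoming weights of hidden unit k
b1 = np.full(n_hidden, c)
W2 = np.full((n_hidden, 4), c)      # row k = outgoing weights of hidden unit k
b2 = np.full(4, c)

lr = 0.05
for _ in range(200):
    H = np.tanh(X @ W1 + b1)        # hidden activations, shape (200, n_hidden)
    pred = H @ W2 + b2
    err = (pred - Y) / len(X)       # gradient of 0.5 * mean squared error w.r.t. pred
    grad_W2 = H.T @ err
    grad_b2 = err.sum(axis=0)
    dZ = (err @ W2.T) * (1 - H ** 2)  # backpropagate through tanh: tanh'(z) = 1 - tanh(z)^2
    grad_W1 = X.T @ dZ
    grad_b1 = dZ.sum(axis=0)
    W1 -= lr * grad_W1
    b1 -= lr * grad_b1
    W2 -= lr * grad_W2
    b2 -= lr * grad_b2

# Hidden units are still copies of the first one after training.
print(np.allclose(W1, W1[:, :1]), np.allclose(W2, W2[:1, :]))
```

The weights do move away from c, but the hidden units stay identical to one another. Starting from small random values instead, the same check would report False. The comparison uses np.allclose rather than exact equality to allow for floating-point rounding.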

Syed_Hamza_Mohiuddin · May 11, 2024, 3:55pm

sure

Syed_Hamza_Mohiuddin · May 11, 2024, 4:06pm

Thanks a lot.
I experimented by adding an additional layer and trying different constant values for initialization. It worked as expected. I also managed to work out the theory for regression and why it won't face this symmetry issue. The only thing left is understanding the theoretical reasoning for a multi-layer perceptron.
I wasn't aware that a single layer doesn't face this issue.
Thanks again…

paulinpaloalto · May 12, 2024, 1:43am

Thanks for doing the further investigations and sharing your results.

Taking DLS Course 1 would be a good way to learn about Multi-layer Perceptrons, although Prof Ng uses the more modern terminology and calls them Fully Connected Neural Networks.

Syed_Hamza_Mohiuddin · May 12, 2024, 7:02am

Sure. I am planning on revisiting some of the concepts taught in the course. If I had to summarize this issue, I would say that there's no backpropagation happening when there's only a single layer. Correct me if I am wrong.

Nevertheless, I am grateful that these forums with amazing people exist.
May God guide and bless you abundantly.

paulinpaloalto · May 12, 2024, 4:29pm

Gradients still get computed and applied. It’s just that you only have three functions involved: the linear function, the sigmoid activation and the cross entropy loss. But it’s still “back propagation” even with one layer: it goes “backwards” from the loss to the activation to the linear coefficients. There is nothing to update (no “parameters”) in the sigmoid and loss functions, so we only update the w and b from the linear function.
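The three-function chain described above can be sketched in NumPy for one-layer logistic regression, using hypothetical data and labels. The backward pass goes from the loss to the sigmoid to the linear function, and a centered finite-difference check confirms the chain-rule gradient. Starting from w = 0, the gradient is already non-zero, so learning can proceed.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss(w, b, X, y):
    # Mean binary cross-entropy of sigmoid(Xw + b)
    a = sigmoid(X @ w + b)
    return -np.mean(y * np.log(a) + (1 - y) * np.log(1 - a))

def backward(w, b, X, y):
    z = X @ w + b             # 1) linear function
    a = sigmoid(z)            # 2) sigmoid activation
    dz = a - y                # 3) dLoss/dz for cross-entropy after a sigmoid
    dw = X.T @ dz / len(y)    # chain rule back to the only parameters: w ...
    db = dz.mean()            # ... and b
    return dw, db

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 2))
y = (X[:, 0] + X[:, 1] > 0).astype(float)   # hypothetical labels
w, b = np.zeros(2), 0.0                     # zero initialization

dw, db = backward(w, b, X, y)

# Centered finite differences as an independent check of the analytic gradient
eps = 1e-6
num_dw = np.array([(loss(w + eps * e, b, X, y) - loss(w - eps * e, b, X, y)) / (2 * eps)
                   for e in np.eye(2)])
num_db = (loss(w, b + eps, X, y) - loss(w, b - eps, X, y)) / (2 * eps)

print(dw, num_dw, db, num_db)
```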

There’s no learning without applying gradients. Well, if you’re doing Linear Regression, then there actually is a “closed form” solution called the Normal Equation. But you’ll see Gradient Descent used in Linear Regression in cases with relatively high dimensions because the computational complexity of the Normal Equation is higher than Gradient Descent, so there can be cases in which GD is a cheaper way to get your solution. Once you graduate to Logistic Regression and multi-layer networks, there is no longer a closed form solution and you’ve got no choice other than some form of Gradient Descent basically. There are other iterative approximation methods like Newton’s Method, but they have a similar flavor (using derivatives to push the solution in a better direction repetitively).
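For Linear Regression, the two routes can be compared directly. This is a minimal NumPy sketch on hypothetical synthetic data: the Normal Equation is solved with np.linalg.solve (a common choice over forming the inverse explicitly), and plain batch gradient descent on the squared error is run from zero.

```python
import numpy as np

rng = np.random.default_rng(1)
X = np.c_[np.ones(100), rng.normal(size=(100, 2))]   # column of ones for the bias term
y = X @ np.array([1.0, -2.0, 0.5]) + 0.1 * rng.normal(size=100)

# Closed form (Normal Equation): solve (X^T X) theta = X^T y
theta_ne = np.linalg.solve(X.T @ X, X.T @ y)

# Iterative alternative: batch gradient descent on 0.5 * mean squared error
theta_gd = np.zeros(3)
for _ in range(2000):
    theta_gd -= 0.1 * X.T @ (X @ theta_gd - y) / len(y)

print(theta_ne, theta_gd)
```

Both reach the same coefficients here. The closed form needs a linear solve whose cost grows roughly with the cube of the number of features, while each gradient step only needs matrix-vector products.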

TMosh · May 12, 2024, 5:02pm

You are correct in one way. “backpropagation” usually refers to how the gradients are computed in the hidden layer of a neural network. So since you have no hidden layer, backpropagation isn’t used.

With a simple linear or logistic regression, you still have to compute the gradients in order to find the weights that give the minimum cost. TensorFlow does this for you automatically.

Syed_Hamza_Mohiuddin · October 8, 2024, 5:34am

Thanks everyone. I missed the notifications. @paulinpaloalto @TMosh
