Initial Parameter Values in Neural Networks (Deep Learning Special Course, Course 1 Week 3)

Shiori_YAMASHITA · October 24, 2022, 8:08am

In the third week of this course, I learned that the weights of the linear transformation part of the NN should not be initialized as zero matrices, because every hidden unit would then be updated in exactly the same way, which defeats the purpose of having separate features.
However, even if the weight matrix (W) is a zero matrix, wouldn't learning stop being symmetric if different initial values were set for the bias parameter (b) that is added afterwards?
So why do we care that the initial value of W is not a zero matrix? That would make sense if b always starts as a zero vector, but in that case, why is b initialized to a zero vector?

Mubsi · October 24, 2022, 9:07am

Hi @Shiori_YAMASHITA,

Think of it this way:

For simplicity, consider X = 1, W1 = W2 = W3 = 15 and b1 = b2 = b3 = 2.

For every Wx + b, the answer will be 17. What this means is, as you mentioned, there are no separating features.

Now consider X = 1, W1 = 3, W2 = 5, W3 = 15 and b1 = b2 = b3 = 2

The values now become: W1x + b1 = 5, W2x + b2 = 7, W3x + b3 = 17. Now as you can see, different values of W gave us different values, even when b was consistent.
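The two cases above can be reproduced in a few lines. This is only a sketch of the arithmetic in this post, using NumPy; the variable names are illustrative and not from the course notebooks.

```python
import numpy as np

x = 1.0
b = np.array([2.0, 2.0, 2.0])  # same bias for all three units

w_same = np.array([15.0, 15.0, 15.0])  # identical weights
w_diff = np.array([3.0, 5.0, 15.0])    # different weights

z_same = w_same * x + b
z_diff = w_diff * x + b

print(z_same.tolist())  # [17.0, 17.0, 17.0] -> the units are indistinguishable
print(z_diff.tolist())  # [5.0, 7.0, 17.0]   -> the units differ even with a shared b
```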

Now we know that W is a matrix, while b contributes just a single value per unit. When the Ws are initialised randomly, giving every unit the same value of b does not have much effect on them; the values of W will still remain different from each other (which is what we aim to achieve).

This is why we care more about having random values of W and don’t care much even if the b’s are set to 0.

Hope I made sense,
Mubsi

Shiori_YAMASHITA · October 24, 2022, 1:51pm

Hi, Mubsi san,
Thank you for your reply!

I’m sorry that I couldn’t convey the intent of the question well enough.

Indeed, even if the initial value of b is constant for each feature, the parameter update proceeds well if the initial value of W is random.
However, by the same reasoning, even if the initial value of W is constant, the outputs will not be symmetric as long as b is random, so learning should still progress. That was the point of my question.
For example, X=1, W1=W2=0, b1=1, b2=2.
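As a quick check of this example, here is a minimal sketch with NumPy. The tanh activation is an assumption made for illustration; the example itself only specifies X, W and b.

```python
import numpy as np

x = 1.0
W = np.array([0.0, 0.0])  # W1 = W2 = 0
b = np.array([1.0, 2.0])  # b1 = 1, b2 = 2

z = W * x + b   # pre-activations depend only on b here
a = np.tanh(z)  # assumed activation, for illustration only

print(z.tolist())  # [1.0, 2.0]
print(a[0] != a[1])  # True: the two units already produce different outputs
```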
In other words, a non-zero initial W is a sufficient condition for successful learning, but it is not a necessary one. Why is a random starting value for W preferred over a random starting value for b?
I don’t think the reply I just received answers that question. Or is my understanding insufficient?

Best,
Shiori


paulinpaloalto · October 24, 2022, 2:52pm

Yes, you are correct that you can “break symmetry” by making the W values constant and the b values random. My guess is that randomizing W became the common practice because it gives better convergence in most cases. You can try some experiments and see whether you notice any difference. Here’s a thread from a while back that discusses Symmetry Breaking in more detail.
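One such experiment could look like the sketch below. Everything in it is an assumption chosen for illustration: a tiny AND dataset, a network with two tanh hidden units and a sigmoid output, plain batch gradient descent, and the hypothetical helper names `train` and `rows_identical`. It compares an all-zero initialization with zero W and distinct biases (b1 = 1, b2 = 2, as in the example above), and only checks whether the hidden units stay identical, not which scheme converges better.

```python
import numpy as np

def train(W1, b1, W2, b2, X, Y, lr=0.5, steps=200):
    """Batch gradient descent on a 1-hidden-layer net (tanh hidden, sigmoid output)."""
    W1, b1, W2, b2 = (p.astype(float).copy() for p in (W1, b1, W2, b2))
    m = X.shape[1]
    for _ in range(steps):
        # Forward pass
        A1 = np.tanh(W1 @ X + b1)
        A2 = 1.0 / (1.0 + np.exp(-(W2 @ A1 + b2)))
        # Backward pass for the cross-entropy loss
        dZ2 = A2 - Y
        dW2 = dZ2 @ A1.T / m
        db2 = dZ2.sum(axis=1, keepdims=True) / m
        dZ1 = (W2.T @ dZ2) * (1.0 - A1 ** 2)
        dW1 = dZ1 @ X.T / m
        db1 = dZ1.sum(axis=1, keepdims=True) / m
        # Parameter update
        W1 -= lr * dW1
        b1 -= lr * db1
        W2 -= lr * dW2
        b2 -= lr * db2
    return W1, b1, W2, b2

def rows_identical(W):
    """True if every hidden unit (row of W) has the same weights."""
    return bool(np.allclose(W, W[0]))

# Toy data: logical AND of two binary inputs (illustrative only)
X = np.array([[0.0, 0.0, 1.0, 1.0],
              [0.0, 1.0, 0.0, 1.0]])
Y = np.array([[0.0, 0.0, 0.0, 1.0]])

W1_init = np.zeros((2, 2))
W2_init = np.zeros((1, 2))
b2_init = np.zeros((1, 1))

# Case A: all parameters start at zero
W1_zero, _, _, _ = train(W1_init, np.zeros((2, 1)), W2_init, b2_init, X, Y)

# Case B: zero weights, but distinct hidden biases
W1_bias, _, _, _ = train(W1_init, np.array([[1.0], [2.0]]), W2_init, b2_init, X, Y)

print(rows_identical(W1_zero))  # True: the hidden units never become different
print(rows_identical(W1_bias))  # False: distinct biases break the symmetry
```

In case B the first update only changes W2 and b2, because the gradient reaching the hidden layer is multiplied by W2, which is still zero. From the second step onward the rows of W1 receive different gradients.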

Note that there are a number of different possible random initialization algorithms. They show us a very simple one in Week 3 and Week 4 of Course 1, but it turns out those straightforward algorithms do not always work very well. Prof Ng will show us some more sophisticated initialization algorithms and discuss these issues in more detail in Course 2, so stay tuned for that. I point this out to give some background on my comment that there may be a reason for not using the bias values for symmetry breaking. Initialization affects how well training converges, and there is no single “silver bullet” solution that works best in all cases.
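To make the contrast concrete, here is a hedged sketch of two schemes. The thread does not name them, so both are assumptions: small Gaussian weights scaled by 0.01 as the "very simple" scheme, and He-style scaling by sqrt(2 / n_in) as one example of a more sophisticated scheme. The function names are hypothetical. Both keep b at zero and rely on W to break symmetry.

```python
import numpy as np

def init_small_random(n_out, n_in, rng, scale=0.01):
    """Small Gaussian weights, zero biases (assumed 'simple' scheme)."""
    W = rng.standard_normal((n_out, n_in)) * scale
    b = np.zeros((n_out, 1))
    return W, b

def init_he(n_out, n_in, rng):
    """He-style scaling: weight std depends on the number of inputs."""
    W = rng.standard_normal((n_out, n_in)) * np.sqrt(2.0 / n_in)
    b = np.zeros((n_out, 1))
    return W, b

rng = np.random.default_rng(0)
W_small, b_small = init_small_random(4, 1000, rng)
W_he, b_he = init_he(4, 1000, rng)

# The fixed scale ignores layer width; the He-style scale adapts to it.
print(W_small.std(), W_he.std())
```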

Shiori_YAMASHITA · October 24, 2022, 3:27pm

Thanks Paulinpaloalto san,

I’m looking forward to learning more about this in the next course and to reading the thread you linked!

Best,
Shiori

